What to Look For in a Robot Data Provider

The robot training data market is young and underregulated. Unlike traditional AI data labeling, robot demonstration data requires physical equipment, skilled operators, and task-specific knowledge that most data labeling companies do not have. Evaluating providers requires different criteria than you would use for image labeling or text annotation services.

Four dimensions matter most:

  • Robot variety: a policy trained on a single arm type generalizes poorly.
  • Operator quality: smooth, consistent demonstrations train better policies than jerky or hesitant ones.
  • Data format flexibility: receiving data in a format incompatible with your training framework wastes weeks on conversion.
  • Legal clarity: explicit terms on IP ownership and confidentiality.

Start by defining your requirements precisely: which robot arm(s), task description with success criteria, environment setup, number of demonstrations, target format, and timeline. A provider who asks these questions before quoting is more credible than one who quotes immediately.

Key Evaluation Questions to Ask Every Provider

  • Which specific robot models do you operate? Get a list with arm models, gripper options, and available robot hands. Vague answers ("various industrial arms") are a red flag.
  • What is your demonstration rejection rate, and how is it measured? Target: <15% rejection. A good provider tracks rejection reasons (operator error, hardware fault, task infeasible) separately.
  • Do you perform annotation? Some tasks benefit from annotated contact events, success/failure labels, or language descriptions per episode. Ask whether annotation is included or priced separately.
  • What data format do you deliver, and can you export to HDF5, LeRobot Parquet, and RLDS? Requiring format conversion on your side adds 1–3 weeks of engineering work. See our format guide.
  • Can I see 5–10 sample episodes before committing? Any serious provider has a demo dataset. Review it for smoothness, camera framing, gripper timing, and episode-level consistency (a quick quantitative check is sketched after this list).
  • What are your NDA and IP terms? Specifically: does your contract prohibit the provider from using your task data to train their own models?
  • What is your environment replication process? If your task requires a specific tabletop layout, object set, or background, how do they replicate it? Do they require you to ship objects, or can they source proxies?
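
A quick quantitative screen complements eyeballing sample videos. Here is a minimal Python sketch, assuming joint trajectories come as NumPy arrays at a known sample rate; the jerk metric and the flagging threshold are illustrative choices, not an industry standard.

```python
import numpy as np

def smoothness_score(joint_positions: np.ndarray, hz: float = 30.0) -> float:
    """Mean absolute jerk across all joints; lower is smoother.

    joint_positions: (T, num_joints) array of joint angles in radians,
    sampled at `hz`. The metric and threshold here are illustrative;
    calibrate against demonstrations you already trust.
    """
    dt = 1.0 / hz
    velocity = np.diff(joint_positions, axis=0) / dt
    acceleration = np.diff(velocity, axis=0) / dt
    jerk = np.diff(acceleration, axis=0) / dt
    return float(np.mean(np.abs(jerk)))

# Flag episodes whose jerk is far above the batch median.
episodes = [np.random.rand(300, 7) for _ in range(10)]  # stand-in for real data
scores = [smoothness_score(ep) for ep in episodes]
flagged = [i for i, s in enumerate(scores) if s > 3 * np.median(scores)]
```

Comparing scores against the batch median catches outlier episodes without requiring an absolute threshold up front.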

Pricing Models in the Robot Data Market

Pricing varies enormously depending on task complexity, robot type, required demonstrations, and quality level. Understand the model before comparing quotes — a low per-demo price with a high rejection rate may cost more in practice than a higher per-demo price with quality guarantees.
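
One way to normalize quotes is to compute the cost per accepted demonstration. A minimal sketch, with hypothetical numbers:

```python
def effective_cost_per_demo(price: float, rejection_rate: float,
                            billed_when_rejected: bool) -> float:
    """Cost per *accepted* demonstration. Keeping N good demos takes
    N / (1 - rejection_rate) attempts, so billed rejections inflate
    the real per-demo price."""
    if billed_when_rejected:
        return price / (1.0 - rejection_rate)
    return price  # rejected episodes are not billed, so the quote holds

# Hypothetical quotes: the cheaper headline price loses on usable data.
cheap = effective_cost_per_demo(25.0, 0.40, billed_when_rejected=True)     # ~$41.67
premium = effective_cost_per_demo(35.0, 0.10, billed_when_rejected=False)  # $35.00
```

On these made-up numbers the $25 quote is effectively about $41.67 per usable demo, which is why the rejection-billing question matters.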

| Pricing Model | Typical Range | Best For | Watch Out For |
| --- | --- | --- | --- |
| Per-demonstration | $15–$200/demo | Well-defined tasks with clear success criteria | Rejection rate not included in price; quality variance |
| Per-hour (operator) | $150–$500/hr | Exploratory tasks, novel robot setups | Efficiency varies; no guaranteed demos per hour |
| Project-based (fixed) | $5K–$100K+ | Large defined dataset with full spec | Scope creep if task spec is loose; no iteration |
| Volume tier | $80/demo → $25/demo at 500+ | High-volume production datasets | Only works if first batch quality is verified |
| Subscription / retainer | $3K–$15K/month | Ongoing data collection pipelines | Overkill for one-time research; good for flywheels |

For a typical manipulation research project (single task, 200 demonstrations, bimanual, medium complexity), expect $8K–$25K at per-demo pricing or $15K–$40K at hourly rates depending on setup time. Factor in: environment setup (charged once), operator training on your task (1–3 hours typically), quality review, and format conversion.

SVRC prices manipulation data at $25–$80 per demonstration depending on task complexity and robot configuration, with volume discounts at 500+ demos. All prices include QA review, metadata, and multi-format export.

Contract Terms: What Must Be in Your Agreement

The following terms are non-negotiable. Walk away from any provider who will not include them.

  • Full IP transfer: "All demonstration data, including raw sensor logs, video, joint trajectories, and derived annotations, are the exclusive intellectual property of [Your Company]." No "license" language — you must own it outright.
  • No training on your data: "Provider shall not use Client's task specifications, demonstration data, or derived data to train, fine-tune, or benchmark any Provider model or third-party model." This clause is often missing in early contracts.
  • Rejection rate guarantee: Define the rejection criteria in the contract. "Provider guarantees a maximum 15% rejection rate; rejected episodes are not billed." This aligns incentives.
  • Data retention and deletion: After delivery, the provider should delete all copies within 30 days. Specify the destruction certificate requirement for sensitive tasks.
  • Confidentiality covers task specification: Your task design may be proprietary. The NDA must cover not just data but the task description, environment setup, and object list.

Red Flags: When to Walk Away

  • Cannot disclose rejection rate — means their quality process is immature or rejection is high.
  • Proprietary format only — "We deliver in our platform format" with no standard export means you are locked to their analysis tools or face expensive conversion.
  • No sample episodes available — a provider who has collected robot data at scale has samples. No samples = no track record.
  • Fewer than 2 distinct robot types — a provider running only one arm type cannot offer the diversity needed for generalization unless you specifically need single-arm data.
  • Operator credentials unavailable — "trained teleoperators" without specifics means you cannot assess quality. Good providers describe their operator selection and training process and can demonstrate operator-consistency metrics.
  • Vague IP language — phrases like "non-exclusive license" or "Provider retains derivative rights" are unacceptable for proprietary task data.
  • No task feasibility assessment — a credible provider will tell you if your task is too difficult, requires special hardware, or has a low expected success rate before you pay.

Provider Comparison

| Provider | Robot Variety | Format Output | IP Clarity | Typical Price/Demo | Best For |
| --- | --- | --- | --- | --- | --- |
| SVRC (us) | High (8+ arms, humanoids, hands) | HDF5, LeRobot, RLDS | Full transfer standard | $25–$80 | Research + commercial, flexible format |
| Scale AI Robotics | Medium (UR, xArm focus) | Custom + some standard | Good (enterprise) | $40–$150 | Large commercial contracts |
| DIY (internal) | Whatever you own | Your choice | N/A | $5–$15 (labor only) | Maximum control, if you have bandwidth |
| Academic lab partnership | Varies | Usually HDF5/custom | Requires explicit agreement | $0–$20 | Cost-sensitive, relationship-dependent |

RFP Checklist for Data Collection Vendors

When issuing a Request for Proposal to data collection vendors, include all of the following specifications. Ambiguity in the RFP leads to mismatched expectations and wasted budget.

| RFP Section | Required Information | Why It Matters |
| --- | --- | --- |
| Task specification | Written task description with success/failure criteria, object list, workspace diagram | Prevents scope creep and misinterpretation |
| Robot requirements | Specific arm model(s), gripper type, end-effector, any sensor requirements (F/T, tactile) | Policies are hardware-specific; wrong arm = useless data |
| Volume | Number of demonstrations, episode length range, total hours of data | Determines pricing, timeline, and operator staffing |
| Data format | HDF5, LeRobot Parquet, RLDS, or custom; camera resolution, frame rate, joint recording rate | Format conversion costs weeks; specify upfront |
| Quality criteria | Maximum rejection rate, smoothness threshold, success criteria per episode | Aligns incentives; prevents billing for bad data |
| Timeline | Delivery date, milestone checkpoints, pilot batch size | Prevents open-ended projects; enables early quality review |
| IP and confidentiality | Full IP transfer, no training on your data, NDA covering task spec | Protects your proprietary task design and data |
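
It can also help to attach a machine-readable version of the spec so nothing is left to interpretation. The structure below is purely illustrative; the field names are hypothetical, not a standard schema:

```python
# Field names are hypothetical, not a standard schema; adapt to your vendor.
rfp_spec = {
    "task": {
        "description": "Pick mug from rack, place upright on tray",
        "success_criteria": "Mug upright on tray; no object or robot collisions",
        "objects": ["ceramic mug", "dish rack", "serving tray"],
    },
    "robot": {"arm": "xArm7", "gripper": "parallel-jaw", "sensors": ["wrist F/T"]},
    "volume": {"demonstrations": 200, "episode_length_s": [10, 30]},
    "data_format": {
        "container": "LeRobot Parquet",
        "rgb": {"resolution": [640, 480], "fps": 60},
        "joint_rate_hz": 30,
    },
    "quality": {"max_rejection_rate": 0.15, "rejected_episodes_billed": False},
    "timeline": {"pilot_batch": 50, "delivery_weeks": 6},
    "legal": {"full_ip_transfer": True, "no_training_on_client_data": True},
}
```

A spec like this doubles as the acceptance checklist when the data arrives.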

Evaluation Scoring Matrix

Score each vendor on a 1–5 scale across these dimensions, then weight by importance for your project:

| Criterion | Weight (Suggested) | How to Assess |
| --- | --- | --- |
| Operator quality | 30% | Review 10 sample episodes; measure smoothness, consistency, success rate |
| QA process | 20% | Ask for documented QA pipeline; automated + human review steps |
| Data format flexibility | 15% | Can they deliver in HDF5, LeRobot, and RLDS? Or only proprietary? |
| Hardware coverage | 15% | Number of arm types, gripper options, sensor availability |
| Turnaround time | 10% | Quoted timeline for your volume; ask for references who received on time |
| IP and legal terms | 10% | Full IP transfer? No training on your data? Standard contract review |
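
The arithmetic is a plain weighted average. A minimal sketch using the suggested weights above (the vendor scores are made up):

```python
# Weights mirror the suggested table above; each score is on the 1-5 scale.
WEIGHTS = {
    "operator_quality": 0.30,
    "qa_process": 0.20,
    "format_flexibility": 0.15,
    "hardware_coverage": 0.15,
    "turnaround_time": 0.10,
    "ip_legal_terms": 0.10,
}

def vendor_score(scores: dict[str, float]) -> float:
    """Weighted average; assumes one 1-5 score per criterion."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

vendor_a = vendor_score({
    "operator_quality": 4, "qa_process": 5, "format_flexibility": 3,
    "hardware_coverage": 4, "turnaround_time": 3, "ip_legal_terms": 5,
})  # -> 4.05
```

Fix the weights before scoring any vendor so the ranking cannot be rationalized backwards.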

SVRC's Data Collection Process

For reference, here is how SVRC executes a typical data collection project from inquiry to delivery:

  1. Task feasibility assessment (free, 1–2 days): We review your task specification and return an assessment covering estimated success rate, recommended robot configuration, and a demonstration count estimate.
  2. Pilot batch (50–100 demos, 1 week): We collect a small pilot batch on the agreed hardware. You review the data for quality, format compatibility, and task correctness before committing to the full volume.
  3. Production collection (2–6 weeks): Certified operators collect the full demonstration volume. Every episode passes automated QA (smoothness check, format validation) plus human review (task success verification, annotation if requested).
  4. Delivery and iteration: Data delivered in your requested format(s) via secure download or direct upload to your cloud storage. We include metadata (operator ID, timestamp, hardware config) and a quality report. If the data does not meet spec, we re-collect at no additional cost.

All SVRC data collection uses hardware from our store catalog including OpenArm 101 (6-DOF, 500g payload, $4,500), DK1 bimanual systems, and third-party arms (UR, xArm, Kinova). Data is recorded at 30 Hz joint positions + 60 fps RGB and delivered in HDF5, LeRobot Parquet, or RLDS format.
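
On receipt, it is worth sanity-checking stream lengths and rates before training. Here is a minimal sketch for an HDF5 delivery, assuming h5py is installed; the dataset keys are illustrative, so check the schema that ships with your data:

```python
import h5py  # assumes an HDF5 delivery and h5py installed

# Dataset keys below are illustrative; check the schema shipped with your data.
with h5py.File("episode_0001.hdf5", "r") as f:
    joints = f["observations/joint_positions"][:]  # (T, num_joints) at 30 Hz
    rgb = f["observations/rgb"]                    # (N, H, W, 3) at 60 fps
    duration_s = joints.shape[0] / 30.0
    # At 60 fps the video should hold roughly twice as many frames as
    # there are 30 Hz joint samples; large drift means misaligned streams.
    assert abs(rgb.shape[0] / 60.0 - duration_s) < 0.5, "stream lengths disagree"
    print(f"{duration_s:.1f}s episode, {joints.shape[1]} joints, {rgb.shape[0]} frames")
```

Running a check like this on the pilot batch (step 2 above) catches format mismatches before production collection starts.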
