The capture happens once. The same data shape that powers replay also powers training. Export sealed sessions as a dataset and feed them to whatever model you’re fine-tuning.

What’s in a dataset

Each row is a frame/action pair:
{
  "session_id": "ses_01HXY...",
  "t_ms": 12480,
  "frame": {
    "image_url": "s3://...frame_01HXY.webp",
    "viewport": { "w": 1440, "h": 900 },
    "dom": { "...": "..." },
    "accessibility_tree": "..."
  },
  "action": {
    "type": "click",
    "target": { "role": "button", "name": "Approve" },
    "x": 612,
    "y": 388
  },
  "context": {
    "workflow": "process_invoice",
    "prior_action": { "type": "input_text", "field": "amount", "value": "[masked]" },
    "outcome": "success"
  }
}
Pairs are ordered by (session_id, t_ms) so a model can learn temporal dependencies (this action follows that screen) without you doing the join yourself.
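As a rough illustration of what that ordering buys you, the sketch below splits an already-ordered row stream into per-session sequences and pairs each row with the one that follows it. The `DatasetRow` type is a trimmed-down version of the row shown above, and `groupBySession` and `transitions` are hypothetical helpers written for this example, not part of the Nusomi SDK.

```typescript
// Trimmed-down row shape; the real rows carry frame and context fields too.
interface DatasetRow {
  session_id: string;
  t_ms: number;
  action: { type: string };
}

// Hypothetical helper: split a row stream into per-session sequences.
// Assumes the rows already arrive ordered by (session_id, t_ms),
// so a single pass is enough and no sort or join is needed.
function groupBySession(rows: DatasetRow[]): DatasetRow[][] {
  const sessions: DatasetRow[][] = [];
  let current: DatasetRow[] = [];
  let currentId: string | null = null;
  for (const row of rows) {
    if (currentId !== null && row.session_id !== currentId) {
      sessions.push(current);
      current = [];
    }
    currentId = row.session_id;
    current.push(row);
  }
  if (current.length > 0) sessions.push(current);
  return sessions;
}

// Hypothetical helper: pair each row with its successor in the same session,
// i.e. "this action follows that screen".
function transitions(session: DatasetRow[]): [DatasetRow, DatasetRow][] {
  const pairs: [DatasetRow, DatasetRow][] = [];
  let prev: DatasetRow | undefined;
  for (const row of session) {
    if (prev !== undefined) pairs.push([prev, row]);
    prev = row;
  }
  return pairs;
}

// Made-up rows for illustration only.
const sampleRows: DatasetRow[] = [
  { session_id: "ses_a", t_ms: 0, action: { type: "input_text" } },
  { session_id: "ses_a", t_ms: 12480, action: { type: "click" } },
  { session_id: "ses_b", t_ms: 300, action: { type: "click" } },
];
const sampleSessions = groupBySession(sampleRows);
const samplePairs = sampleSessions.map((s) => transitions(s));
```

Because the grouping never looks backwards, the same loop works on a streamed shard without holding the whole dataset in memory.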

Output formats

| Format | Best for |
| --- | --- |
| parquet | Analytical pipelines, Spark, DuckDB. Default. |
| webdataset | Vision training (PyTorch / JAX). Tar shards with frame WebPs and JSONL sidecars. |
| jsonl | Quick iteration, manual inspection, custom loaders. |
| arrow | In-memory training (Polars, HuggingFace datasets). |

Exporting

const exp = await nusomi.exports.create({
  workflow: "process_invoice",
  filter: {
    outcome: "success",
    since: "2026-01-01",
  },
  format: "webdataset",
  destination: {
    kind: "s3",
    bucket: "my-training-bucket",
    prefix: "nusomi/process_invoice/v1/",
    region: "us-east-1",
  },
});

await exp.wait(); // resolves when shards are uploaded
console.log(exp.manifest_url);
Supported destinations: s3, gcs, azure_blob, signed_url (Nusomi-hosted, time-limited).

Filters

| Filter | Notes |
| --- | --- |
| workflow | One or more workflow slugs. |
| outcome | success \| error \| abandoned. |
| since / until | ISO timestamps or relative durations (30d). |
| actor.kind | human \| model \| script. |
| min_duration_ms / max_duration_ms | Wall-clock bounds. |
| tag | Sessions carrying a specific tag. |
| path | Memory-graph subpath ID. Lets you train on a specific path through the workflow. |
| exclude_session_ids | Manually drop runs you don’t want in the set. |

Frame sampling

By default, only frames that have an event attached are exported (the action-bearing frames). Override with frame_sampling:
{
  format: "webdataset",
  frame_sampling: {
    mode: "every_n_ms",
    interval_ms: 100,    // 10 fps
    keep_action_frames: true, // always include event frames
  },
}
Available modes: event_only (default), every_n_ms, keyframes_only (only frames where the screen changed materially), all (full capture rate, costly).
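To make the interaction between every_n_ms and keep_action_frames concrete, here is a small simulation of one plausible selection rule. It is an assumption made for illustration, not Nusomi's actual sampler: a frame is kept when at least interval_ms has passed since the last interval-sampled frame, and action-bearing frames are additionally kept when keep_action_frames is true. `CapturedFrame` and `sampleEveryNMs` are hypothetical names.

```typescript
// Hypothetical, simplified view of a captured frame.
interface CapturedFrame {
  t_ms: number;
  hasEvent: boolean; // true when an action is attached to this frame
}

// Assumed semantics (not confirmed by the Nusomi docs):
// - keep a frame once intervalMs has elapsed since the last interval-sampled frame
// - if keepActionFrames is set, also keep every frame that carries an event
// Frames are expected in ascending t_ms order.
function sampleEveryNMs(
  frames: CapturedFrame[],
  intervalMs: number,
  keepActionFrames: boolean,
): CapturedFrame[] {
  const kept: CapturedFrame[] = [];
  let nextDue = Number.NEGATIVE_INFINITY;
  for (const frame of frames) {
    const due = frame.t_ms >= nextDue;
    if (due) nextDue = frame.t_ms + intervalMs;
    if (due || (keepActionFrames && frame.hasEvent)) kept.push(frame);
  }
  return kept;
}

// Made-up capture timeline: one action-bearing frame at 80 ms.
const timeline: CapturedFrame[] = [
  { t_ms: 0, hasEvent: false },
  { t_ms: 40, hasEvent: false },
  { t_ms: 80, hasEvent: true },
  { t_ms: 100, hasEvent: false },
  { t_ms: 140, hasEvent: false },
  { t_ms: 200, hasEvent: false },
];

const withActions = sampleEveryNMs(timeline, 100, true).map((f) => f.t_ms);
const withoutActions = sampleEveryNMs(timeline, 100, false).map((f) => f.t_ms);
```

Under this rule the 80 ms action frame survives only when keep_action_frames is true, which is the case the option exists for: a fixed interval can otherwise step over the exact frame an action happened on.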

Masking

All masking applied at capture time persists into the export. For example, if you set mask_pattern: "credit_card" on the workspace, exported frames have the matched cards blurred and the corresponding events carry [masked] placeholders. See security/masking.

Manifest

Every export produces a manifest:
{
  "export_id": "exp_01HZ...",
  "workflow": "process_invoice",
  "format": "webdataset",
  "rows": 184312,
  "shards": 24,
  "shard_size_bytes": 268435456,
  "frames": 184312,
  "actions": 184312,
  "filter_summary": "...",
  "destination": "s3://my-training-bucket/nusomi/process_invoice/v1/",
  "created_at": "2026-05-07T14:08:11Z"
}
Use the manifest to reproduce or version your training set.
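One lightweight way to use the manifest for reproducibility is to compare the content-describing fields of two manifests and ignore the fields that change on every run. The sketch below assumes the field names from the sample manifest above; `ExportManifest`, `CONTENT_FIELDS`, and `manifestDiff` are hypothetical names, and matching counts are only a cheap sanity check, not proof that two exports are byte-identical.

```typescript
// Field names taken from the sample manifest; types are assumptions.
interface ExportManifest {
  export_id: string;
  workflow: string;
  format: string;
  rows: number;
  shards: number;
  shard_size_bytes: number;
  frames: number;
  actions: number;
  filter_summary: string;
  destination: string;
  created_at: string;
}

// Fields that describe the dataset contents rather than a particular run.
// export_id, created_at, and destination are expected to differ between runs.
const CONTENT_FIELDS = [
  "workflow",
  "format",
  "rows",
  "frames",
  "actions",
  "filter_summary",
] as const;

// Hypothetical helper: list the content fields on which two manifests disagree.
function manifestDiff(a: ExportManifest, b: ExportManifest): string[] {
  return CONTENT_FIELDS.filter((field) => a[field] !== b[field]);
}

// Made-up manifests for illustration.
const firstRun: ExportManifest = {
  export_id: "exp_first",
  workflow: "process_invoice",
  format: "webdataset",
  rows: 184312,
  shards: 24,
  shard_size_bytes: 268435456,
  frames: 184312,
  actions: 184312,
  filter_summary: "outcome=success",
  destination: "s3://my-training-bucket/nusomi/process_invoice/v1/",
  created_at: "2026-05-07T14:08:11Z",
};
const rerun: ExportManifest = { ...firstRun, export_id: "exp_second", created_at: "2026-05-08T09:00:00Z" };
const drifted: ExportManifest = { ...rerun, rows: 190000, frames: 190000, actions: 190000 };

const rerunDiff = manifestDiff(firstRun, rerun);
const driftedDiff = manifestDiff(firstRun, drifted);
```

An empty diff suggests the re-run matched the original; a non-empty one points at the fields to investigate, typically a filter that was not frozen.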

Determinism

Exports are deterministic for a given filter: re-run the same export call and you’ll get the same rows. An open-ended or relative time window keeps matching newly sealed sessions, so to version a dataset across time, freeze the filter (especially until) and tag the export.
const v1 = await nusomi.exports.create({
  // ...remaining options (workflow, filter, format, destination) go here
  tag: "process_invoice@v1",
});
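Freezing the time window can be done on the client before calling the export. The sketch below resolves relative durations to absolute ISO timestamps and pins until to the current time when it is missing. It assumes that a relative value such as 30d means "30 days before now" and handles only the d unit shown in the filter table; `TimeWindow`, `resolveTime`, and `freezeWindow` are hypothetical helpers, not SDK functions.

```typescript
// Hypothetical subset of the export filter: just the time bounds.
interface TimeWindow {
  since?: string;
  until?: string;
}

interface FrozenWindow {
  since?: string;
  until: string;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Resolve an ISO timestamp or a relative duration like "30d" to an absolute ISO string.
// Assumption: "Nd" means N days before `now`; other units are not handled here.
function resolveTime(value: string, now: Date): string {
  const match = /^(\d+)d$/.exec(value);
  if (match !== null) {
    const days = Number(match[1]);
    return new Date(now.getTime() - days * DAY_MS).toISOString();
  }
  const parsed = new Date(value);
  if (isNaN(parsed.getTime())) {
    throw new Error("Unrecognized time value: " + value);
  }
  return parsed.toISOString();
}

// Pin both bounds to absolute timestamps so the same filter matches the same
// sessions no matter when the export is re-run.
function freezeWindow(window: TimeWindow, now: Date): FrozenWindow {
  const frozen: FrozenWindow = {
    until: window.until !== undefined ? resolveTime(window.until, now) : now.toISOString(),
  };
  if (window.since !== undefined) {
    frozen.since = resolveTime(window.since, now);
  }
  return frozen;
}

// Example with a fixed "now" so the result is reproducible.
const exampleNow = new Date("2026-05-07T00:00:00Z");
const frozenRelative = freezeWindow({ since: "30d" }, exampleNow);
const frozenAbsolute = freezeWindow({ since: "2026-01-01", until: "2026-03-01" }, exampleNow);
```

Storing the frozen window alongside the export tag keeps the exact bounds of each dataset version on record.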