

Two things come out of a session: frames (what the screen looked like) and events (what happened, structured). The two are linked in time: every event points to the frame it occurred on.

Frames

Frames are screen captures sampled at up to 30 fps. Each frame is more than a screenshot — it carries the structure needed for replay and training.
| Field | Notes |
| --- | --- |
| `id` | `frm_<id>` |
| `session_id` | Parent session |
| `t_ms` | Milliseconds since session start |
| `image_url` | Time-bounded signed URL to the WebP frame |
| `dom` | DOM snapshot (browser surfaces); `null` for native desktop |
| `accessibility_tree` | macOS AX / Windows UIA / Linux AT-SPI tree |
| `viewport` | `{ w, h, scale }` |
| `app` | Foreground app metadata (`bundle_id`, `name`, `version`) |
| `cursor` | `{ x, y }` |
Why both pixels and DOM/AX? Pixels are the ground truth for training a vision model. DOM/AX is what makes replay deterministic when the underlying app updates and the pixels shift.
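
The frame record above can be sketched as a TypeScript type. This is a sketch assembled from the fields listed in the table; the type name and the exact SDK typings are assumptions, not the published API.

```typescript
// Sketch of a frame record, based on the fields documented above.
// The name `Frame` and these exact typings are illustrative assumptions.
interface Frame {
  id: string;                        // "frm_<id>"
  session_id: string;                // parent session
  t_ms: number;                      // milliseconds since session start
  image_url: string;                 // time-bounded signed URL to the WebP frame
  dom: string | null;                // DOM snapshot; null for native desktop
  accessibility_tree: object | null; // macOS AX / Windows UIA / Linux AT-SPI
  viewport: { w: number; h: number; scale: number };
  app: { bundle_id: string; name: string; version: string };
  cursor: { x: number; y: number };
}

// Example: a browser frame, so `dom` is populated.
const frame: Frame = {
  id: "frm_abc123",
  session_id: "ses_xyz",
  t_ms: 4_200,
  image_url: "https://example.com/frames/frm_abc123.webp",
  dom: "<html><body>checkout</body></html>",
  accessibility_tree: null,
  viewport: { w: 1440, h: 900, scale: 2 },
  app: { bundle_id: "com.google.Chrome", name: "Chrome", version: "124.0" },
  cursor: { x: 512, y: 384 },
};
```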

Events

Events are structured actions extracted from the recording. Each one is a typed payload anchored to a specific frame.

Event types

| Type | Payload | Emitted when |
| --- | --- | --- |
| `click` | `{ x, y, target, modifiers }` | Mouse click (left or right) |
| `keypress` | `{ key, modifiers, target }` | Keyboard input on a focused element |
| `input_text` | `{ field, value, target }` | A field's value changed (debounced) |
| `navigate` | `{ from_url, to_url, source }` | Browser navigation |
| `app_focus` | `{ from_app, to_app }` | Foreground app switched |
| `submit` | `{ form, payload }` | Form submission detected |
| `validation_error` | `{ field, message }` | Inline validation surfaced |
| `retry` | `{ of_event_id, after_ms }` | Same action repeated within 30s |
| `wait` | `{ duration_ms, reason }` | UI hung: modal, spinner, network |
| `success` | `{ marker }` | Tagged success state reached |
| `error` | `{ kind, message }` | Tagged or detected failure |
| `tag` | `{ name, data }` | User-supplied `session.tag()` |
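
Because every event is a typed payload keyed on `type`, the table above maps naturally onto a discriminated union. A sketch covering three of the event types (payload shapes taken from the table; the `SessionEvent` name and wrapper fields are assumptions, not the SDK's published typings):

```typescript
// Sketch: a discriminated union over a few event types from the table.
// `SessionEvent` is an illustrative name, not a published SDK type.
type SessionEvent =
  | { type: "click"; frame_id: string; t_ms: number;
      payload: { x: number; y: number; target: string; modifiers: string[] } }
  | { type: "validation_error"; frame_id: string; t_ms: number;
      payload: { field: string; message: string } }
  | { type: "wait"; frame_id: string; t_ms: number;
      payload: { duration_ms: number; reason: string } };

// Narrowing on `type` gives a correctly typed payload in each branch.
function describe(ev: SessionEvent): string {
  switch (ev.type) {
    case "click":
      return `click at (${ev.payload.x}, ${ev.payload.y})`;
    case "validation_error":
      return `validation failed on ${ev.payload.field}: ${ev.payload.message}`;
    case "wait":
      return `UI waited ${ev.payload.duration_ms}ms (${ev.payload.reason})`;
  }
}

const msg = describe({
  type: "validation_error",
  frame_id: "frm_1",
  t_ms: 9_000,
  payload: { field: "email", message: "invalid address" },
});
```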

Frame anchoring

Every event has a frame_id and a t_ms. Two events that happen on the same frame share frame_id but differ in t_ms (sub-frame ordering is preserved). The training-data export joins them automatically — see datasets.
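
The export does this join for you, but the same frame-to-events mapping can be rebuilt client-side from `frame_id` and `t_ms` alone. A minimal sketch (field names from the text above; the helper is hypothetical):

```typescript
// Sketch: rebuild the frame -> events mapping client-side.
// Events sharing a frame_id keep their sub-frame order via t_ms.
type Ev = { frame_id: string; t_ms: number; type: string };

function groupByFrame(events: Ev[]): Map<string, Ev[]> {
  const byFrame = new Map<string, Ev[]>();
  // Sort a copy by t_ms so ties on frame_id preserve sub-frame ordering.
  for (const ev of [...events].sort((a, b) => a.t_ms - b.t_ms)) {
    const bucket = byFrame.get(ev.frame_id) ?? [];
    bucket.push(ev);
    byFrame.set(ev.frame_id, bucket);
  }
  return byFrame;
}

const grouped = groupByFrame([
  { frame_id: "frm_2", t_ms: 120, type: "keypress" },
  { frame_id: "frm_1", t_ms: 33, type: "click" },
  { frame_id: "frm_1", t_ms: 35, type: "input_text" },
]);
```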

Querying

```typescript
// All events for a session
const events = await nusomi.events.query(sessionId);

// Just the validation errors
const errors = await nusomi.events.query(sessionId, {
  type: "validation_error",
});

// Events between two frame timestamps
const slice = await nusomi.events.query(sessionId, {
  t_ms: { gte: 12_000, lte: 18_000 },
});

// Events in the form of {frame, event} pairs (training-shaped)
const pairs = await nusomi.events.query(sessionId, { include: "frame" });
```
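
Query results are plain arrays, so anything not expressible as a filter can be done client-side. A sketch that tallies `validation_error` events per field (payload shape from the event-types table; the helper name is hypothetical):

```typescript
// Sketch: tally validation_error events per field from a query result.
// The payload shape { field, message } comes from the event-types table.
type ValidationError = {
  type: "validation_error";
  payload: { field: string; message: string };
};

function errorsByField(errors: ValidationError[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const ev of errors) {
    counts[ev.payload.field] = (counts[ev.payload.field] ?? 0) + 1;
  }
  return counts;
}

const counts = errorsByField([
  { type: "validation_error", payload: { field: "email", message: "invalid" } },
  { type: "validation_error", payload: { field: "email", message: "required" } },
  { type: "validation_error", payload: { field: "zip", message: "too short" } },
]);
```

Fields with high counts are a quick signal for where users (or agents) get stuck in a flow.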

Streaming live

If you need events as they happen (live ops dashboards, real-time agent supervision), subscribe to the session stream:

```typescript
for await (const ev of nusomi.events.stream(sessionId)) {
  console.log(ev.type, ev.t_ms, ev.payload);
}
```

See streaming for the SSE wire format.
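
Since the stream is an async iterable, you can stop consuming it as soon as a terminal event arrives. A sketch, with a stubbed generator standing in for `nusomi.events.stream(sessionId)` (the stub and the helper name are illustrative assumptions):

```typescript
// Sketch: consume a live stream until a terminal event (success or error).
// fakeStream stands in for nusomi.events.stream(sessionId) so the example
// is self-contained; it yields the same shape of events.
type StreamEvent = { type: string; t_ms: number };

async function* fakeStream(): AsyncGenerator<StreamEvent> {
  yield { type: "click", t_ms: 100 };
  yield { type: "wait", t_ms: 900 };
  yield { type: "success", t_ms: 1_500 };
  yield { type: "click", t_ms: 2_000 }; // never consumed
}

async function watchUntilDone(
  stream: AsyncIterable<StreamEvent>,
): Promise<StreamEvent[]> {
  const seen: StreamEvent[] = [];
  for await (const ev of stream) {
    seen.push(ev);
    if (ev.type === "success" || ev.type === "error") break; // terminal
  }
  return seen;
}
```

Breaking out of `for await` closes the underlying iterator, so the subscription is torn down cleanly.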

Event quality

Event extraction runs on a mix of accessibility-tree introspection and a vision pass for surfaces without an AX tree. Quality numbers from public benchmarks:
| Surface | F1 |
| --- | --- |
| Modern web (Chromium) | 0.97 |
| Native macOS apps | 0.93 |
| Native Windows (UIA) | 0.91 |
| Linux GTK / Qt | 0.88 |
| Citrix / RDP / VNC | 0.78 |
Lower-quality surfaces are flagged in the event stream — check event.confidence if you need to filter.
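
A minimal sketch of that filter, assuming `confidence` is a 0-to-1 number on each event (the field is mentioned above; the threshold of 0.9 is an arbitrary choice, not a recommendation):

```typescript
// Sketch: drop low-confidence events, e.g. before a training export.
// `confidence` is documented above; 0.9 is an arbitrary example threshold.
type Ev = { type: string; confidence: number };

function highConfidence(events: Ev[], threshold = 0.9): Ev[] {
  return events.filter((ev) => ev.confidence >= threshold);
}

const kept = highConfidence([
  { type: "click", confidence: 0.97 },
  { type: "keypress", confidence: 0.78 }, // e.g. from a Citrix surface
  { type: "submit", confidence: 0.93 },
]);
```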