Two things come out of a session: frames (what the screen looked like) and events (what happened, structured). They are linked in time: every event is anchored to the frame it occurred on, and a single frame can carry several events.
## Frames
Frames are screen captures sampled at up to 30 fps. Each frame is more than a screenshot — it carries the structure needed for replay and training.
| Field | Notes |
|---|---|
| `id` | `frm_<id>` |
| `session_id` | Parent session |
| `t_ms` | Milliseconds since session start |
| `image_url` | Time-bounded signed URL to the WebP frame |
| `dom` | DOM snapshot (browser surfaces); `null` for native desktop |
| `accessibility_tree` | macOS AX / Windows UIA / Linux AT-SPI tree |
| `viewport` | `{ w, h, scale }` |
| `app` | Foreground app metadata (`bundle_id`, `name`, `version`) |
| `cursor` | `{ x, y }` |
Why both pixels and DOM/AX? Pixels are the ground truth for training a vision model. DOM/AX is what makes replay deterministic when the underlying app updates and the pixels shift.
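Modeled in application code, a frame record might look like the following sketch. The field names mirror the table above; the exact types are assumptions for illustration, not the official SDK typings.

```ts
// Hypothetical shape of a frame record; names follow the docs table,
// types are assumptions rather than published SDK definitions.
interface Viewport { w: number; h: number; scale: number; }
interface AppInfo { bundle_id: string; name: string; version: string; }

interface Frame {
  id: string;                        // "frm_<id>"
  session_id: string;                // parent session
  t_ms: number;                      // milliseconds since session start
  image_url: string;                 // time-bounded signed URL to the WebP frame
  dom: object | null;                // DOM snapshot; null for native desktop
  accessibility_tree: object | null; // AX / UIA / AT-SPI tree
  viewport: Viewport;
  app: AppInfo;
  cursor: { x: number; y: number };
}

// A minimal illustrative value (all identifiers are made up):
const frame: Frame = {
  id: "frm_123",
  session_id: "ses_abc",
  t_ms: 12_000,
  image_url: "https://example.com/frame.webp",
  dom: null, // native desktop surface
  accessibility_tree: { role: "window" },
  viewport: { w: 1440, h: 900, scale: 2 },
  app: { bundle_id: "com.apple.Safari", name: "Safari", version: "17.0" },
  cursor: { x: 640, y: 360 },
};
```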
## Events
Events are structured actions extracted from the recording. Each one is a typed payload anchored to a specific frame.
### Event types
| Type | Payload | Emitted when |
|---|---|---|
| `click` | `{ x, y, target, modifiers }` | Mouse click (left or right) |
| `keypress` | `{ key, modifiers, target }` | Keyboard input on a focused element |
| `input_text` | `{ field, value, target }` | A field's value changed (debounced) |
| `navigate` | `{ from_url, to_url, source }` | Browser navigation |
| `app_focus` | `{ from_app, to_app }` | Foreground app switched |
| `submit` | `{ form, payload }` | Form submission detected |
| `validation_error` | `{ field, message }` | Inline validation surfaced |
| `retry` | `{ of_event_id, after_ms }` | Same action repeated within 30s |
| `wait` | `{ duration_ms, reason }` | UI hung (modal, spinner, network) |
| `success` | `{ marker }` | Tagged success state reached |
| `error` | `{ kind, message }` | Tagged or detected failure |
| `tag` | `{ name, data }` | User-supplied `session.tag()` |
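One way to consume these typed payloads in application code is a TypeScript discriminated union on `type`. This is a sketch for illustration; the field names come from the table, but the typings are assumptions, not published SDK types, and most variants are omitted.

```ts
// Discriminated union over event type; payload fields mirror the table.
// Assumed typings for illustration only (remaining variants omitted).
type SessionEvent =
  | { type: "click"; payload: { x: number; y: number; target: string; modifiers: string[] } }
  | { type: "keypress"; payload: { key: string; modifiers: string[]; target: string } }
  | { type: "navigate"; payload: { from_url: string; to_url: string; source: string } }
  | { type: "validation_error"; payload: { field: string; message: string } };

// Narrowing on `type` gives the right payload shape in each branch.
function describe(ev: SessionEvent): string {
  switch (ev.type) {
    case "click":
      return `click at (${ev.payload.x}, ${ev.payload.y})`;
    case "validation_error":
      return `validation failed on ${ev.payload.field}: ${ev.payload.message}`;
    default:
      return ev.type;
  }
}
```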
## Frame anchoring
Every event has a frame_id and a t_ms. Two events that happen on the same frame share frame_id but differ in t_ms (sub-frame ordering is preserved). The training-data export joins them automatically — see datasets.
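The join can be pictured with a small sketch over plain records. The real export does this server-side; the field names (`frame_id`, `t_ms`) mirror the tables above, and the record shapes are simplified for illustration.

```ts
// Group events under their frame and preserve sub-frame ordering by t_ms.
// Simplified record shapes; not SDK types.
type Ev = { frame_id: string; t_ms: number; type: string };
type Frm = { id: string; t_ms: number };

function joinFrames(frames: Frm[], events: Ev[]): { frame: Frm; events: Ev[] }[] {
  const byFrame = new Map<string, Ev[]>();
  for (const ev of events) {
    const bucket = byFrame.get(ev.frame_id) ?? [];
    bucket.push(ev);
    byFrame.set(ev.frame_id, bucket);
  }
  return frames.map((frame) => ({
    frame,
    // two events on the same frame share frame_id but differ in t_ms
    events: (byFrame.get(frame.id) ?? []).sort((a, b) => a.t_ms - b.t_ms),
  }));
}

const pairs = joinFrames(
  [{ id: "frm_1", t_ms: 0 }, { id: "frm_2", t_ms: 33 }],
  [
    { frame_id: "frm_1", t_ms: 12, type: "keypress" },
    { frame_id: "frm_1", t_ms: 4, type: "click" },
    { frame_id: "frm_2", t_ms: 40, type: "navigate" },
  ],
);
// pairs[0].events: click (4 ms) then keypress (12 ms)
```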
## Querying
```ts
// All events for a session
const events = await nusomi.events.query(sessionId);

// Just the validation errors
const errors = await nusomi.events.query(sessionId, {
  type: "validation_error",
});

// Events between two frame timestamps
const slice = await nusomi.events.query(sessionId, {
  t_ms: { gte: 12_000, lte: 18_000 },
});

// Training-shaped { frame, event } pairs
const pairs = await nusomi.events.query(sessionId, { include: "frame" });
```
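When a predicate falls outside what the query options express, filtering a fetched batch client-side is straightforward. A minimal sketch, assuming events shaped like the table above (no SDK required; `filterEvents` is a hypothetical helper, not part of the API):

```ts
// Client-side filtering over plain event records.
// Field names (type, t_ms) mirror the docs; this is not SDK code.
type Ev = { type: string; t_ms: number };

function filterEvents(
  events: Ev[],
  opts: { type?: string; gte?: number; lte?: number },
): Ev[] {
  return events.filter(
    (ev) =>
      (opts.type === undefined || ev.type === opts.type) &&
      (opts.gte === undefined || ev.t_ms >= opts.gte) &&
      (opts.lte === undefined || ev.t_ms <= opts.lte),
  );
}

const batch: Ev[] = [
  { type: "click", t_ms: 1_000 },
  { type: "validation_error", t_ms: 13_500 },
  { type: "click", t_ms: 20_000 },
];
const inWindow = filterEvents(batch, { gte: 12_000, lte: 18_000 });
// inWindow contains only the validation_error at 13,500 ms
```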
## Streaming live
If you need events as they happen (live ops dashboards, real-time agent supervision), subscribe to the session stream:
```ts
for await (const ev of nusomi.events.stream(sessionId)) {
  console.log(ev.type, ev.t_ms, ev.payload);
}
```
See streaming for the SSE wire format.
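As a rough sketch of what a stream helper does under the hood, here is a minimal parser for generic SSE `data:` lines. It assumes one JSON-encoded event per `data:` line, which is an assumption for illustration; the actual wire format is defined on the streaming page.

```ts
// Parse SSE-formatted text into event objects.
// Generic SSE framing only; assumes one JSON event per "data:" line.
function parseSSEChunk(chunk: string): any[] {
  return chunk
    .split("\n")
    .filter((line) => line.startsWith("data: "))
    .map((line) => JSON.parse(line.slice("data: ".length)));
}

const parsed = parseSSEChunk(
  'data: {"type":"click","t_ms":4}\n\ndata: {"type":"keypress","t_ms":12}\n\n',
);
// parsed[0].type is "click"; parsed[1].t_ms is 12
```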
## Event quality
Event extraction runs on a mix of accessibility-tree introspection and a vision pass for surfaces without an AX tree. Quality numbers from public benchmarks:
| Surface | F1 |
|---|---|
| Modern web (Chromium) | 0.97 |
| Native macOS apps | 0.93 |
| Native Windows (UIA) | 0.91 |
| Linux GTK / Qt | 0.88 |
| Citrix / RDP / VNC | 0.78 |
Lower-quality surfaces are flagged in the event stream — check event.confidence if you need to filter.
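A confidence filter over a batch might look like the following sketch. The `confidence` field is per the note above; the 0.9 threshold and the helper name are arbitrary choices for illustration, not recommended values or SDK code.

```ts
// Drop events below a confidence threshold, e.g. before a training export.
// Threshold is an arbitrary illustration; tune per surface.
type ScoredEv = { type: string; confidence: number };

function confidentOnly(events: ScoredEv[], threshold = 0.9): ScoredEv[] {
  return events.filter((ev) => ev.confidence >= threshold);
}

const kept = confidentOnly([
  { type: "click", confidence: 0.97 },   // modern web surface
  { type: "click", confidence: 0.74 },   // Citrix-grade surface
]);
// kept contains only the 0.97-confidence event
```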