Two things come out of a session: frames (what the screen looked like) and events (what happened, structured). They are linked in time: every event is anchored to the frame it occurred on, and a single frame can carry several events.
## Frames
Frames are screen captures sampled at up to 30 fps. Each frame is more than a screenshot — it carries the structure needed for replay and training.
| Field | Notes |
|---|---|
| `id` | `frm_<id>` |
| `session_id` | Parent session |
| `t_ms` | Milliseconds since session start |
| `image_url` | Time-bounded signed URL to the WebP frame |
| `dom` | DOM snapshot (browser surfaces); `null` for native desktop |
| `accessibility_tree` | macOS AX / Windows UIA / Linux AT-SPI tree |
| `viewport` | `{ w, h, scale }` |
| `app` | Foreground app metadata (`bundle_id`, `name`, `version`) |
| `cursor` | `{ x, y }` |
Why both pixels and DOM/AX? Pixels are the ground truth for training a vision model. DOM/AX is what makes replay deterministic when the underlying app updates and the pixels shift.
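Modeled in application code, a frame record might look like the following sketch. The field names mirror the table above; the exact types are assumptions for illustration, not the official SDK typings.

```ts
// Hypothetical shape of a frame record; names follow the docs table,
// types are assumptions rather than published SDK definitions.
interface Viewport { w: number; h: number; scale: number; }
interface AppInfo { bundle_id: string; name: string; version: string; }

interface Frame {
  id: string;                        // "frm_<id>"
  session_id: string;                // parent session
  t_ms: number;                      // milliseconds since session start
  image_url: string;                 // time-bounded signed URL to the WebP frame
  dom: object | null;                // DOM snapshot; null for native desktop
  accessibility_tree: object | null; // AX / UIA / AT-SPI tree
  viewport: Viewport;
  app: AppInfo;
  cursor: { x: number; y: number };
}

// A minimal illustrative value (all identifiers are made up):
const frame: Frame = {
  id: "frm_123",
  session_id: "ses_abc",
  t_ms: 12_000,
  image_url: "https://example.com/frame.webp",
  dom: null, // native desktop surface
  accessibility_tree: { role: "window" },
  viewport: { w: 1440, h: 900, scale: 2 },
  app: { bundle_id: "com.apple.Safari", name: "Safari", version: "17.0" },
  cursor: { x: 640, y: 360 },
};
```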
## Events
Events are structured actions extracted from the recording. Each one is a typed payload anchored to a specific frame.
### Event types
| Type | Payload | Emitted when |
|---|---|---|
| `click` | `{ x, y, target, modifiers }` | Mouse click (left or right) |
| `keypress` | `{ key, modifiers, target }` | Keyboard input on a focused element |
| `input_text` | `{ field, value, target }` | A field's value changed (debounced) |
| `navigate` | `{ from_url, to_url, source }` | Browser navigation |
| `app_focus` | `{ from_app, to_app }` | Foreground app switched |
| `submit` | `{ form, payload }` | Form submission detected |
| `validation_error` | `{ field, message }` | Inline validation surfaced |
| `retry` | `{ of_event_id, after_ms }` | Same action repeated within 30s |
| `wait` | `{ duration_ms, reason }` | UI hung (modal, spinner, network) |
| `success` | `{ marker }` | Tagged success state reached |
| `error` | `{ kind, message }` | Tagged or detected failure |
| `tag` | `{ name, data }` | User-supplied `session.tag()` |
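One way to consume these typed payloads in application code is a TypeScript discriminated union on `type`. This is a sketch for illustration; the field names come from the table, but the typings are assumptions, not published SDK types, and most variants are omitted.

```ts
// Discriminated union over event type; payload fields mirror the table.
// Assumed typings for illustration only (remaining variants omitted).
type SessionEvent =
  | { type: "click"; payload: { x: number; y: number; target: string; modifiers: string[] } }
  | { type: "keypress"; payload: { key: string; modifiers: string[]; target: string } }
  | { type: "navigate"; payload: { from_url: string; to_url: string; source: string } }
  | { type: "validation_error"; payload: { field: string; message: string } };

// Narrowing on `type` gives the right payload shape in each branch.
function describe(ev: SessionEvent): string {
  switch (ev.type) {
    case "click":
      return `click at (${ev.payload.x}, ${ev.payload.y})`;
    case "validation_error":
      return `validation failed on ${ev.payload.field}: ${ev.payload.message}`;
    default:
      return ev.type;
  }
}
```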
## Frame anchoring
Every event has a frame_id and a t_ms. Two events that happen on the same frame share frame_id but differ in t_ms (sub-frame ordering is preserved). The training-data export joins them automatically — see datasets.
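The join can be pictured with a small sketch over plain records. The real export does this server-side; the field names (`frame_id`, `t_ms`) mirror the tables above, and the record shapes are simplified for illustration.

```ts
// Group events under their frame and preserve sub-frame ordering by t_ms.
// Simplified record shapes; not SDK types.
type Ev = { frame_id: string; t_ms: number; type: string };
type Frm = { id: string; t_ms: number };

function joinFrames(frames: Frm[], events: Ev[]): { frame: Frm; events: Ev[] }[] {
  const byFrame = new Map<string, Ev[]>();
  for (const ev of events) {
    const bucket = byFrame.get(ev.frame_id) ?? [];
    bucket.push(ev);
    byFrame.set(ev.frame_id, bucket);
  }
  return frames.map((frame) => ({
    frame,
    // two events on the same frame share frame_id but differ in t_ms
    events: (byFrame.get(frame.id) ?? []).sort((a, b) => a.t_ms - b.t_ms),
  }));
}

const pairs = joinFrames(
  [{ id: "frm_1", t_ms: 0 }, { id: "frm_2", t_ms: 33 }],
  [
    { frame_id: "frm_1", t_ms: 12, type: "keypress" },
    { frame_id: "frm_1", t_ms: 4, type: "click" },
    { frame_id: "frm_2", t_ms: 40, type: "navigate" },
  ],
);
// pairs[0].events: click (4 ms) then keypress (12 ms)
```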
## Querying
```ts
// All events for a session
const events = await nusomi.events.query(sessionId);

// Just the validation errors
const errors = await nusomi.events.query(sessionId, {
  type: "validation_error",
});

// Events between two frame timestamps
const slice = await nusomi.events.query(sessionId, {
  t_ms: { gte: 12_000, lte: 18_000 },
});

// Training-shaped { frame, event } pairs
const pairs = await nusomi.events.query(sessionId, { include: "frame" });
```
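When a predicate falls outside what the query options express, filtering a fetched batch client-side is straightforward. A minimal sketch, assuming events shaped like the table above (no SDK required; `filterEvents` is a hypothetical helper, not part of the API):

```ts
// Client-side filtering over plain event records.
// Field names (type, t_ms) mirror the docs; this is not SDK code.
type Ev = { type: string; t_ms: number };

function filterEvents(
  events: Ev[],
  opts: { type?: string; gte?: number; lte?: number },
): Ev[] {
  return events.filter(
    (ev) =>
      (opts.type === undefined || ev.type === opts.type) &&
      (opts.gte === undefined || ev.t_ms >= opts.gte) &&
      (opts.lte === undefined || ev.t_ms <= opts.lte),
  );
}

const batch: Ev[] = [
  { type: "click", t_ms: 1_000 },
  { type: "validation_error", t_ms: 13_500 },
  { type: "click", t_ms: 20_000 },
];
const inWindow = filterEvents(batch, { gte: 12_000, lte: 18_000 });
// inWindow contains only the validation_error at 13,500 ms
```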
## Streaming live
If you need events as they happen (live ops dashboards, real-time agent supervision), subscribe to the session stream:
```ts
for await (const ev of nusomi.events.stream(sessionId)) {
  console.log(ev.type, ev.t_ms, ev.payload);
}
```
See streaming for the SSE wire format.
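As a rough sketch of what a stream helper does under the hood, here is a minimal parser for generic SSE `data:` lines. It assumes one JSON-encoded event per `data:` line, which is an assumption for illustration; the actual wire format is defined on the streaming page.

```ts
// Parse SSE-formatted text into event objects.
// Generic SSE framing only; assumes one JSON event per "data:" line.
function parseSSEChunk(chunk: string): any[] {
  return chunk
    .split("\n")
    .filter((line) => line.startsWith("data: "))
    .map((line) => JSON.parse(line.slice("data: ".length)));
}

const parsed = parseSSEChunk(
  'data: {"type":"click","t_ms":4}\n\ndata: {"type":"keypress","t_ms":12}\n\n',
);
// parsed[0].type is "click"; parsed[1].t_ms is 12
```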
## Event quality
Event extraction runs on a mix of accessibility-tree introspection and a vision pass for surfaces without an AX tree. Quality numbers from public benchmarks:
| Surface | F1 |
|---|---|
| Modern web (Chromium) | 0.97 |
| Native macOS apps | 0.93 |
| Native Windows (UIA) | 0.91 |
| Linux GTK / Qt | 0.88 |
| Citrix / RDP / VNC | 0.78 |
Lower-quality surfaces are flagged in the event stream — check event.confidence if you need to filter.
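A confidence filter over a batch might look like the following sketch. The `confidence` field is per the note above; the 0.9 threshold and the helper name are arbitrary choices for illustration, not recommended values or SDK code.

```ts
// Drop events below a confidence threshold, e.g. before a training export.
// Threshold is an arbitrary illustration; tune per surface.
type ScoredEv = { type: string; confidence: number };

function confidentOnly(events: ScoredEv[], threshold = 0.9): ScoredEv[] {
  return events.filter((ev) => ev.confidence >= threshold);
}

const kept = confidentOnly([
  { type: "click", confidence: 0.97 },   // modern web surface
  { type: "click", confidence: 0.74 },   // Citrix-grade surface
]);
// kept contains only the 0.97-confidence event
```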