
Module: terminal

Embedded terminals for working on a machine from inside the app.

Status: frontend implemented (D2). The terminal.instance panel (xterm.js) binds to a backend PTY session over the terminal channel — input/resize up, output down — and kills its PTY on close. runCommand opens a terminal pre-running a command (via an initialCommand pane param). The agent's gated terminal.exec tool is D3.

Contributions to the layout shell

  • Panels: terminal.instance (one PTY per instance, default: bottom dock tab group).
  • Commands: terminal.new, terminal.kill (active), terminal.clear (active), terminal.focusNext/focusPrev (cycle, via the layoutController.focusPane seam).
  • Keybindings: mod+k → terminal.clear, scoped to terminal.instance: it clears the focused terminal (the iTerm/VS Code convention) while shadowing the global command-palette shortcut, which still works everywhere else. See keybinding scopes.
  • Services for other modules: runCommand(cmd, opts) (exported as runTerminalCommand) lets modules (e.g. agent chat showing a suggested command) open a terminal pre-running a command — always visibly, never hidden execution.
  • Settings: terminal.fontFamily (enum of common monospace fonts; falls back to the system monospace if the chosen font isn't installed) and terminal.fontSize (px). Read live via useSetting in TerminalPane; changes apply to open terminals immediately (xterm re-fits and the PTY is resized).

Backend surface

backend/modules/terminal/ owns the PTYs (backend implemented). A TerminalManager lives per /ws connection and routes the shared socket's terminal channel; the frontend (xterm.js) renders it. The PTY always runs where the backend runs: there is no client-side shell in either layout, so browser and desktop are byte-for-byte identical and agents and humans share the same terminals.

Cross-platform PTYs unify on ptyprocess (POSIX) and pywinpty (Windows ConPTY), which expose the same PtyProcess.spawn API behind pty.py. The default shell is PowerShell on Windows. On POSIX the backend prefers $SHELL, but that is often unset when the app is launched from a GUI launcher rather than a terminal (and /bin/bash doesn't exist on every distro, e.g. NixOS), so it falls back through common shells down to /bin/sh.

Each PTY's blocking read runs in a thread so it never stalls the event loop. Sessions are killed when the socket closes (resume-on-reconnect is a later enhancement). PTY spawn/IO failures are sent to the pane as an error event and shown inline ([terminal error] …) rather than leaving a blank, dead terminal.
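The default-shell selection described above can be sketched roughly as follows. This is an illustration, not the module's actual code: the function name resolve_shell, the injectable exists check, and the exact fallback list are assumptions (the text only says "common shells down to /bin/sh").

```python
import os
import sys
from typing import Callable, Mapping, Sequence

# Hypothetical fallback order; only the final /bin/sh is stated in the text.
POSIX_FALLBACKS: Sequence[str] = ("/bin/bash", "/bin/zsh", "/bin/sh")


def is_executable(path: str) -> bool:
    """True if path is an existing file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def resolve_shell(
    env: Mapping[str, str] = os.environ,
    platform: str = sys.platform,
    exists: Callable[[str], bool] = is_executable,
    fallbacks: Sequence[str] = POSIX_FALLBACKS,
) -> str:
    """Pick the shell to spawn in a new PTY."""
    if platform.startswith("win"):
        return "powershell.exe"  # PowerShell is the Windows default
    preferred = env.get("SHELL")  # often unset when started from a GUI launcher
    if preferred and exists(preferred):
        return preferred
    for candidate in fallbacks:
        if exists(candidate):
            return candidate
    return "/bin/sh"  # last resort
```

Passing env, platform, and exists as parameters is only a testing convenience: it lets the fallback order be exercised without touching the real filesystem.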

Channel protocol ({channel:'terminal', event, data}):

| Direction | event | data |
| --- | --- | --- |
| client→server | start / input / resize / kill | {id, …} |
| server→client | started / output / exit / error | {id, …} |
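As a rough sketch of the envelope above, the helpers below build and validate terminal-channel messages. JSON as the wire encoding and the helper names make_message and parse_client_message are assumptions. The source elides every payload field except id, so any extra fields are passed through untouched and the cols/rows fields in the usage lines are purely hypothetical.

```python
import json
from typing import Any, Dict, Tuple

CHANNEL = "terminal"
CLIENT_EVENTS = frozenset({"start", "input", "resize", "kill"})
SERVER_EVENTS = frozenset({"started", "output", "exit", "error"})


def make_message(event: str, terminal_id: str, **extra: Any) -> str:
    """Serialize a {channel, event, data} envelope; data always carries id."""
    if event not in CLIENT_EVENTS | SERVER_EVENTS:
        raise ValueError(f"unknown terminal event: {event!r}")
    envelope = {"channel": CHANNEL, "event": event, "data": {"id": terminal_id, **extra}}
    return json.dumps(envelope)


def parse_client_message(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Validate a client-to-server message and return (event, data)."""
    message = json.loads(raw)
    if message.get("channel") != CHANNEL:
        raise ValueError("not a terminal-channel message")
    event = message.get("event")
    if event not in CLIENT_EVENTS:
        raise ValueError(f"not a client-to-server event: {event!r}")
    data = message.get("data")
    if not isinstance(data, dict) or "id" not in data:
        raise ValueError("data must be an object containing id")
    return event, data


# Usage: cols and rows are hypothetical payload fields, not taken from the source.
raw = make_message("resize", "t1", cols=120, rows=40)
event, data = parse_client_message(raw)
```

Rejecting server-only events on the inbound path keeps a client from injecting output or exit messages into its own session.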

Agent integration

Status: implemented (D3). The agent surface is declared on the terminal.instance panel. Verified live: the agent ran a command through the gated terminal.exec tool, which prompted with the rendered {command} shell specifier and, on approval, ran the command in a visible terminal. The terminal exposes agent tools and getAgentContext, and is the most security-sensitive tool surface in the app:

  • Read (ungated): terminal.list() and terminal.read(id) (recent scrollback); the active terminal's cwd via getAgentContext().
  • Gated: terminal.exec(id?, command). There is no separate "authority" switch — what the agent may run is purely a function of the permission mode and rules: in plan/default the command is prefilled and prompts before running; in autonomous it runs (subject to deny/ask rules and the rm -rf / circuit breaker). Execution is always visible in a real terminal pane — never hidden.
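
The mode-dependent behavior of terminal.exec can be sketched as a small decision function. This is an illustrative Python sketch, not the app's permission engine: the names decide and trips_circuit_breaker, the use of fnmatch-style globs for deny/ask rules, and the precedence (circuit breaker, then deny, then mode, then ask) are all assumptions. The read-only allowlist is left out to keep the example short.

```python
from fnmatch import fnmatchcase
from typing import Sequence


def trips_circuit_breaker(command: str) -> bool:
    """Very rough check for a recursive, forced rm aimed at the root directory."""
    tokens = command.split()
    if not tokens or tokens[0] != "rm":
        return False
    flags = "".join(t.lstrip("-") for t in tokens[1:] if t.startswith("-"))
    return "r" in flags and "f" in flags and "/" in tokens[1:]


def decide(
    mode: str,
    command: str,
    deny: Sequence[str] = (),
    ask: Sequence[str] = (),
) -> str:
    """Return 'deny', 'prompt', or 'run' for a terminal.exec request."""
    if trips_circuit_breaker(command):
        return "deny"
    if any(fnmatchcase(command, rule) for rule in deny):
        return "deny"
    if mode in ("plan", "default"):
        return "prompt"  # command is prefilled and waits for approval
    if mode == "autonomous":
        if any(fnmatchcase(command, rule) for rule in ask):
            return "prompt"
        return "run"
    raise ValueError(f"unknown permission mode: {mode!r}")
```

In every branch that ends in 'run', the command would still execute in a visible terminal pane; the function only decides whether a prompt comes first.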

Matching a shell command safely is the bulk of the permission engine's work. The terminal's specifiers reuse Claude Code's Bash/PowerShell rule logic: * globs with word boundaries (terminal.exec(npm run *)), compound-command splitting on && || ; | (a rule must match each subcommand), wrapper stripping (timeout, nice, …), and a built-in read-only allowlist (ls, cat, pwd, …) that runs without a prompt in every mode.
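
The four matching steps above can be illustrated with a deliberately simplified Python sketch. It is not Claude Code's rule logic or this app's engine: the splitter ignores quoting, the wrapper-argument heuristic is a guess, the allowlist and wrapper sets contain only the names the text mentions, and all function names are hypothetical.

```python
import re
from typing import List, Sequence

READ_ONLY = {"ls", "cat", "pwd"}  # only the commands named in the text
WRAPPERS = {"timeout", "nice"}    # only the wrappers named in the text
# Tokens assumed to belong to a wrapper: options, or durations such as 10 or 5s.
_WRAPPER_ARG = re.compile(r"^(-.*|\d+(\.\d+)?[smhd]?)$")


def split_compound(command: str) -> List[str]:
    """Split on && || ; | (naive: operators inside quotes are not handled)."""
    parts = re.split(r"&&|\|\||;|\|", command)
    return [p.strip() for p in parts if p.strip()]


def strip_wrappers(subcommand: str) -> str:
    """Drop leading wrappers and their arguments, e.g. 'timeout 10 ls' -> 'ls'."""
    tokens = subcommand.split()
    while tokens and tokens[0] in WRAPPERS:
        tokens.pop(0)
        while tokens and _WRAPPER_ARG.match(tokens[0]):
            tokens.pop(0)
    return " ".join(tokens)


def rule_matches(pattern: str, command: str) -> bool:
    """Match a * glob against the whole command; the literal space before *
    is what enforces the word boundary ('npm run *' rejects 'npm runx')."""
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.fullmatch(regex, command) is not None


def evaluate(command: str, allow_rules: Sequence[str]) -> str:
    """Return 'allow' only if every subcommand is read-only or matches a rule."""
    subcommands = split_compound(command)
    if not subcommands:
        return "ask"
    for sub in subcommands:
        core = strip_wrappers(sub)
        if not core:
            return "ask"
        if core.split()[0] in READ_ONLY:
            continue
        if any(rule_matches(rule, core) for rule in allow_rules):
            continue
        return "ask"
    return "allow"
```

Requiring every subcommand to pass is the point of the splitting step: with the rule npm run *, the command npm run build && curl example.com still falls through to a prompt because the curl part matches nothing.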

Browser vs desktop

| Concern | Browser | Desktop |
| --- | --- | --- |
| Where the shell runs | on the backend host (local dev: your machine; remote backend: the server, so make this visible in the panel title) | localhost backend = your machine |
| Shell defaults | backend host's default shell | same (PowerShell on Windows) |
| Copy/paste, scrollback, themes | identical (xterm.js) | identical |
| Keybinding capture | browser reserves some chords (e.g. Ctrl+W), so the shell keybinding service must not bind those for terminal focus | full capture available |

Security note: exposing the backend beyond localhost exposes shell execution on that host. Any future remote-access story must address auth at the backend boundary, not in this module.