I was working on a Linux distribution installer and ran into a familiar problem: every existing installer stack I looked at felt larger and more institutional than what I wanted to ship. I did not want a framework that dragged an entire desktop mental model behind it. I wanted a small bootable Linux, graphics because it is 2026 and installers need graphics, and one focused application that owns the screen until the installation is done.
That led me to Cage, a Wayland kiosk compositor. Cage is refreshingly direct: run one graphical application, make it full-screen, and get out of the way. It also says the quiet part out loud: it is a kiosk, not a security boundary.
I do not love when a kiosk compositor has to say that. I also appreciate the honesty. A compositor can remove window management from the user experience; it does not automatically contain the process it launches, the browser it embeds, the kernel interfaces it exposes, or the local services the UI talks to.
So I flipped a coin and built the first version anyway: Cage launches Chromium; Chromium renders a TypeScript/CSS/HTML application; the web app talks to a Python backend that performs the privileged work. My immediate use case was an installer, but the design target is broader: a real-world kiosk that can run one serious local application without pretending the compositor is the sandbox. It sounds like a bad idea until you see the result. The UI is fast, legible, and genuinely pleasant to use.
That is when the real work starts. If this stack is going to be more than a pretty demo, it needs a threat model.
This is the first part of a longer analysis. This post defines the kiosk attack surface for Debian Trixie + Wayland + Cage + Chromium. Later parts will go deeper into Cage source-level findings, Chromium kiosk configuration, Wayland security properties, and the backend boundary.
What the stack actually is
The stack I am analyzing is deliberately small:
- Debian Trixie as the base system.
- A minimal boot environment with enough graphics support to start Wayland.
- Cage as the single-application compositor.
- Chromium as the renderer and interaction shell.
- A TypeScript/CSS/HTML frontend.
- A Python backend that performs privileged local operations.
The installer is only the first use case. The useful problem is the general-purpose kiosk pattern: any appliance UI, registration terminal, factory console, internal control panel, lab station, demo unit, or field laptop that boots into one graphical application and pretends the rest of Linux is not there.
The security mistake is treating those as simpler than a desktop. They are not. A kiosk is a desktop with fewer visible controls and fewer escape routes by convention. The kernel, device nodes, IPC surfaces, browser engine, GPU stack, filesystem, network, and update path are all still present.
What the attacker wants
For a real-world kiosk, the attacker's goals are unusually sharp. The installer case is just the most obvious high-impact example:
| Goal | Why it matters |
|---|---|
| Escape the browser UI | Reach a shell, a file chooser, devtools, another URL, or a privileged local endpoint. |
| Influence the backend | Turn a UI-level bug into privileged local actions: file writes, device control, account changes, package changes, or arbitrary command execution. |
| Tamper with kiosk-controlled output | Change what the kiosk provisions, submits, displays, installs, unlocks, prints, enrolls, or writes to local state. |
| Persist in the live environment | Survive restarts long enough to affect later sessions, later users, or later provisioning runs. |
| Exfiltrate secrets | Session tokens, enrollment credentials, operator input, Wi-Fi secrets, API keys, logs, or local provisioning material. |
| Abuse hardware access | Keystroke injection, USB storage, DMA-adjacent device paths, camera/microphone if present, or unexpected serial devices. |
For an installer, some of those actions are disk and bootloader operations. For another kiosk, they may be printer jobs, access-control API calls, ticket issuance, payment device workflows, factory terminal actions, or SCADA dashboard operations. The shape is the same.
Trust boundaries
The stack has four boundaries that matter.
The first is the browser boundary. Chromium is a huge renderer, JavaScript runtime, media stack, networking stack, GPU client, and policy engine. It can be configured into a reasonable kiosk shell, but it should not be mistaken for a small UI toolkit. The web app is untrusted until proven otherwise because DOM input, navigation, origin rules, clipboard, downloads, file handling, devtools, extensions, and command-line flags all change the attack surface.
The second is the Cage boundary. Cage controls what appears on screen and which Wayland client owns the session. It is useful for UX containment. It is not, by itself, a sandbox for the application it launches. If Chromium can access a file, open a socket, talk to D-Bus, invoke a helper, or reach a privileged backend, Cage does not make those operations safe.
The third is the backend boundary. This is the important one. The Python backend is where harmless clicks become privileged local actions. In an installer that might mean “partition this disk” or “install these packages.” In a kiosk it might mean “unlock this door,” “submit this form,” “print this document,” or “change this device state.” Every web-to-backend edge needs authentication, authorization, state validation, and strict command construction. The backend is not a convenience layer. It is the broker.
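To make "strict command construction" concrete, here is a minimal Python sketch for the installer's format step. Everything named in it is an assumption for illustration: the `FormatRequest` type, the device-name pattern, the filesystem allowlist, and the `/usr/sbin` tool paths are not taken from the real backend. The point is only the shape: the backend validates structured input and builds an argv list itself, and nothing from the UI ever reaches a shell.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for block device names; a real backend would match
# whatever its own disk scan can actually return.
_DEVICE_NAME = re.compile(r"(sd[a-z]\d*|vd[a-z]\d*|nvme\d+n\d+(p\d+)?)")

# Hypothetical allowlist: filesystem label -> tool name (paths assumed).
ALLOWED_FILESYSTEMS = {"ext4": "mkfs.ext4", "xfs": "mkfs.xfs"}


@dataclass(frozen=True)
class FormatRequest:
    device_name: str  # e.g. "sda1", must come from the backend's own scan
    filesystem: str   # must be a key of ALLOWED_FILESYSTEMS


def build_format_argv(req: FormatRequest, scanned: frozenset[str]) -> list[str]:
    """Return an argv list meant for subprocess.run(argv, shell=False)."""
    if not _DEVICE_NAME.fullmatch(req.device_name):
        raise ValueError("malformed device name")
    if req.device_name not in scanned:
        raise ValueError("device was not offered by the backend scan")
    tool = ALLOWED_FILESYSTEMS.get(req.filesystem)
    if tool is None:
        raise ValueError("filesystem not allowed")
    # Each argument is its own list element; no string is ever shell-parsed.
    return [f"/usr/sbin/{tool}", f"/dev/{req.device_name}"]
```

A request such as `FormatRequest("sda1; rm -rf /", "ext4")` fails the pattern check and never becomes a command. The list form matters as much as the validation: even a value that slipped through would arrive as a single argument, not as shell syntax.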
The fourth is the OS boundary. UID separation, mount options, seccomp, AppArmor, network policy, systemd unit restrictions, device permissions, and immutable live media are the things that still work when the UI fails.
Primary attack vectors
1. Boot and image integrity
The kiosk starts before the application has a chance to defend itself. If the boot image or appliance image can be modified, the game is over before Cage starts. Secure Boot, signed images, verified initramfs contents, read-only media, and reproducible release artifacts matter more than kiosk flags.
For deployed kiosks, the same point shows up as update integrity. If an attacker can swap the kiosk image or push an unsigned update, the compositor choice is irrelevant.
2. Physical input
Kiosks should assume hostile hands. That is true whether the kiosk installs an OS, checks in a visitor, controls a device, or exposes an internal workflow.
Key combinations, virtual terminals, SysRq, USB keyboards, barcode scanners that act as keyboards, touchscreens that generate unexpected gestures, accessibility shortcuts, and hotplugged input devices are all part of the UI. The first hardening question is not “can the user click out of Chromium?” It is “what can any input device make the system do?”
Disable virtual terminal switching where appropriate. Remove shells from reachable TTYs. Treat all hotplugged HID devices as hostile. If physical USB is needed, separate that requirement from arbitrary keyboard injection.
3. Chromium navigation and web platform escape
Chromium is both the UI and the largest attack surface in the design.
The obvious kiosk failures are familiar: address bar exposure, context menus, downloads, print dialogs, file pickers, external protocol handlers, password manager surfaces, certificate interstitials, error pages, devtools, crash restore, translation bubbles, extension surfaces, and “open with” flows.
The less obvious failures are web-origin problems. If the kiosk UI is loaded from http://localhost, what else can reach that origin? If it is loaded from file://, how do relative file reads behave? If it talks to 127.0.0.1, can another local process bind first? If it uses WebSockets, is the origin checked? If it uses a bearer token, where is it stored? If it renders backend error messages, can those messages become HTML?
For this stack, Chromium should be treated as an untrusted client. The backend should never trust it because it is “the only window.”
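As a sketch of what "never trust it" can look like at the request level, the function below checks the `Host` header, the `Origin` header, and a per-boot bearer token before any handler runs. The origin, port, and token delivery are assumptions for illustration; I am also assuming privileged endpoints accept only POST or WebSocket requests, since browsers do not reliably send `Origin` on plain same-origin GET requests.

```python
import hmac
import secrets

# Assumed values: the real origin depends on how the kiosk UI is served.
EXPECTED_HOST = "127.0.0.1:8080"
EXPECTED_ORIGIN = "http://127.0.0.1:8080"

# Hypothetical per-boot token, generated by the backend at startup and
# handed to the UI through a channel other local processes cannot read.
SESSION_TOKEN = secrets.token_urlsafe(32)


def is_request_allowed(headers: dict[str, str], token: str = SESSION_TOKEN) -> bool:
    """Check Host, Origin, and bearer token on an incoming request."""
    h = {k.lower(): v for k, v in headers.items()}
    # A wrong Host header suggests DNS rebinding or a misdirected client.
    if h.get("host") != EXPECTED_HOST:
        return False
    # A missing or foreign Origin means the request did not come from the UI.
    if h.get("origin") != EXPECTED_ORIGIN:
        return False
    presented = h.get("authorization", "")
    expected = f"Bearer {token}"
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(presented.encode(), expected.encode())
```

None of these checks is sufficient alone. The header checks stop other web origins; the token is what stops another local process that can reach the socket, and it only helps if that process cannot read the token.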
4. The local backend API
The backend is the highest-risk application code in the system.
Avoid generic RPC. Avoid “run this command” endpoints. Avoid passing shell fragments from the frontend. Make each privileged operation a narrow, typed endpoint with a state machine behind it. For an installer that means scan disks, choose disk, confirm destructive action, partition, format, install, configure bootloader, create user, finalize. For a kiosk, the verbs change but the rule does not.
Every endpoint should answer three questions before acting:
- Is the caller allowed to invoke this operation?
- Is the kiosk currently in a state where this operation is valid?
- Are all target paths, devices, package names, and configuration values derived from allowlisted structured data rather than UI-provided strings?
The browser can be beautiful. The backend must be boring.
5. D-Bus, portals, and desktop leftovers
A minimal Wayland kiosk often inherits more desktop plumbing than intended. D-Bus session buses, system buses, XDG portals, notification services, file managers, polkit agents, secret services, print services, update agents, and network managers can all exist even when no desktop shell is visible.
That does not make them bad. It makes them inventory. If Chromium or the backend can talk to a service, that service is part of the kiosk attack surface.
The correct posture is explicit allowlisting: only the services needed for graphics, input, networking, and the kiosk workload should exist in the live environment.
6. Filesystem and device exposure
A real kiosk often needs dangerous access by design: local devices, credentials, printers, card readers, update channels, internal APIs, or, in the installer case, disks and bootloaders. That does not mean the browser process needs any of that.
Run Chromium as an unprivileged user with a disposable profile directory. Keep the backend separate. Make the backend the only process with access to privileged devices and target resources. Mount the live system read-only where possible. Put temporary state on tmpfs. Make logs explicit: useful enough to debug, not full of secrets.
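For the logging point, one possible backstop is a `logging.Filter` that redacts values following well-known key names before a record is written. The key list and the `key=value` convention here are assumptions about how the backend formats its messages. This is a safety net, not a substitute for keeping secrets out of log calls in the first place.

```python
import logging
import re

# Hypothetical key names; extend to match what the backend actually handles.
_SECRET = re.compile(r"(?i)\b(password|psk|token|api[_-]?key)=(\S+)")


class RedactSecrets(logging.Filter):
    """Replace the value of known secret keys in the formatted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first, so secrets passed as %-style arguments are caught too.
        message = record.getMessage()
        record.msg = _SECRET.sub(r"\1=[REDACTED]", message)
        record.args = ()
        return True  # keep the record, only its text changes


# Attach to the handler so records from every logger pass through it.
handler = logging.StreamHandler()
handler.addFilter(RedactSecrets())
logging.getLogger("kiosk.backend").addHandler(handler)
```

Attaching the filter to the handler rather than to one logger matters: logger-level filters do not apply to records propagated from child loggers.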
7. Network and captive environments
Different kiosks need different network policies. An installer may need package repositories, mirrors, enrollment, or updates. A registration terminal may need one API origin. A factory console may need only an internal control plane. Those are different policies.
Define allowed egress before boot: application origin, API endpoints, time sync if needed, update endpoints if used, and nothing else. Block arbitrary browsing at the network layer, not just in Chromium configuration.
If the UI has any “open documentation,” “report issue,” or “learn more” links, they are navigation surfaces. Treat them as such.
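If the backend ever resolves or opens a URL on the UI's behalf, the same allowlist can be enforced in application code as a second layer. The sketch below compares scheme, host, and port against an explicit set; the hostnames are placeholders, and this check complements network-layer blocking rather than replacing it.

```python
from urllib.parse import urlsplit

# Placeholder destinations; a real kiosk would list its own origins.
ALLOWED_DESTINATIONS = {
    ("https", "api.kiosk.example", 443),
    ("https", "updates.kiosk.example", 443),
}

_DEFAULT_PORTS = {"https": 443, "http": 80}


def is_allowed_url(url: str) -> bool:
    """Return True only for URLs whose scheme, host, and port are allowlisted."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return False
    # Reject userinfo outright: "https://good.example@evil.test/" is a classic trick.
    if parts.username is not None or parts.password is not None:
        return False
    if port is None:
        port = _DEFAULT_PORTS.get(parts.scheme)
    host = (parts.hostname or "").lower()
    return (parts.scheme, host, port) in ALLOWED_DESTINATIONS
```

Comparing the parsed tuple exactly, instead of testing prefixes or substrings, is the design choice that matters. A prefix check would accept `https://api.kiosk.example.evil.test/`; this one does not.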
Initial hardening posture
The first version of the hardening checklist is short on purpose.
| Area | Baseline |
|---|---|
| Chromium user | Dedicated unprivileged user, fresh profile per boot, no saved state. |
| Backend user | Separate service identity; privileged only where specific operations require it. |
| Backend API | Local-only, origin-checked, token-bound, typed endpoints, no generic command execution. |
| Filesystem | Read-only live root where practical; tmpfs for browser profile and transient state. |
| Devices | Chromium gets no privileged device access; backend gets only the narrow device access the kiosk workload needs. |
| Network | Default deny egress with an explicit allowlist for the kiosk origin, APIs, updates, or repositories. |
| Desktop services | No unused portals, agents, file managers, shells, or notification helpers. |
| Input | Audit VT switching, SysRq, hotkeys, HID hotplug, accessibility shortcuts, and crash paths. |
| Updates | Signed image/update path; no unsigned content controlling kiosk behavior. |
This is not complete. It is the minimum posture before looking at source-level bugs.
About Cage specifically
Cage’s “not a security boundary” position is the right starting point. A kiosk compositor can be correct and still not be a sandbox. It owns the display relationship between Wayland client and output. It does not automatically solve browser compromise, backend authorization, filesystem reachability, device permissions, D-Bus exposure, kernel bugs, malicious USB devices, or an unsigned boot image.
That distinction matters because it keeps the analysis honest. The question is not “can Cage make Chromium safe?” It cannot. The question is “what can Cage reliably simplify, and what boundaries must be built elsewhere?”
That is where the next part starts: source-level behavior, assumptions, sharp edges, and the recommendations that fall out of reading the compositor as a component in a hostile kiosk rather than as a desktop convenience.
Why not a Rust webview?
I looked at Rust-based alternatives because the shape is attractive: a smaller application, stronger language defaults, and less of Chromium’s historic weight. The problem is rendering. For a production kiosk UI, I need boring support for modern HTML, CSS, JavaScript, accessibility, fonts, input methods, and layout behavior. “Almost good enough” is not good enough for the thing that drives a real-world local workflow.
That may change. I would like it to change. A smaller, memory-safe rendering stack would be a better long-term fit for this class of UI. For now, Chromium is the pragmatic option, so the work is constraining it rather than pretending it is small.
The series map
This first post defines the attack surface. The follow-up work breaks into four tracks:
- Cage source audit notes: each finding, why it matters, and how to avoid building the wrong trust model around it.
- Chromium kiosk configuration: flags, profile policy, disabled surfaces, navigation control, and why Chromium is still uncomfortable in this role.
- Wayland security analysis: what Wayland improves, what it intentionally does not solve, and what matters for kiosk threat maps.
- Backend hardening: Python service design, local API authorization, privileged operation brokers, and kiosk state machines.
The kiosk can be beautiful. The local application can be clean. Neither property is a boundary. The boundary is the part we deliberately build.
Sneak peek: the installer actually boots
Here is the practical end state behind this threat model: the kiosk-style installer running inside a QEMU virtual machine with a virtual video card. The security analysis in this series is about making this kind of setup boring enough to trust, not about proving that a browser in a kiosk is magically safe.