EXPANDED MEMORY
A multimodal cognitive workspace engineered to reduce interaction fatigue. It indexes personal data clusters through low-latency voice intent pipelines and uses subtle spatial micro-gestures for contextual manipulation across a continuous semantic memory layer.
- Role
- Creative Director & Technologist
- Year
- 2025
- Client
- Tech Lab
- Discipline
- Tech Lab
PROJECT OVERVIEW
A multimodal cognitive architecture inspired by cinematic computing, engineered to overcome the physical limits of spatial interaction. By shifting heavy input to voice-driven intent and reserving spatial micro-gestures for fluid data manipulation, the system establishes a zero-fatigue environment for human-computer collaboration.
Expanded Memory is a modern response to interactive sci-fi interfaces—from Minority Report's gestural fantasy to today's ambient stack of XR glasses, on-device hand tracking, and advanced LLMs. Where early motion sensors demanded exhausting, full-arm choreography, contemporary tooling makes spatial cognition a practical design medium rather than a demo novelty.
The Ergonomics of Free Space
Voice Cognition.
Low-latency bidirectional voice pipelines stream natural language directly into localized memory clusters over LiveKit's WebRTC transport.
Spatial Gestures.
Camera-based hand tracking translates cinematic interface concepts into raw frontend input events across the spatial data plane.
Cognitive Core.
A semantic vector engine continuously indexes, clusters, and surfaces cross-references from your personal data ecosystem.
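As a rough illustration of how a semantic vector engine can surface cross-references, the sketch below ranks stored items by cosine similarity to a query embedding. The `MemoryItem` shape and the `surfaceRelated` helper are hypothetical names introduced here, and the embeddings are assumed to come from some upstream model; nothing in this block is taken from the project's code.

```typescript
// Hypothetical shape for an indexed memory item (an assumption for this sketch).
interface MemoryItem {
  id: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors; returns 0 if either is a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions must match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k items most similar to the query, best match first.
function surfaceRelated(query: number[], items: MemoryItem[], k: number): MemoryItem[] {
  return items
    .map((item) => ({ item, score: cosineSimilarity(query, item.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.item);
}
```

A linear scan like this is only reasonable for small personal collections; a larger index would likely need an approximate nearest-neighbor structure instead.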
The core challenge in free-space interaction is well documented in HCI as Gorilla Arm Syndrome—the physical exhaustion caused by prolonged, high-effort gestures held away from the body. The design problem is precise: how do we preserve the cinematic impact of spatial UI without fatiguing the user within minutes?
Expanded Memory reframes spatial input as a complement to voice, not a replacement. Abstract commands migrate to speech; the hands remain available for precise, low-amplitude manipulation—keeping the interaction loop sustainable for extended work sessions.
The Multimodal Solution
The architecture splits intent across two parallel channels. Voice commands route through real-time, low-latency WebRTC pipelines (LiveKit), translating natural language into structured semantic actions without demanding the user hold a pose.
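The step from natural language to a structured semantic action can be sketched as a small parser over a finished transcript. This is a minimal stand-in that assumes the speech-to-text stage has already produced a string; the `SemanticAction` vocabulary and the phrase patterns are hypothetical, and the rule-based matching is a simplification chosen for this sketch, not a description of how the project resolves intent.

```typescript
// Hypothetical action vocabulary (an assumption; the project's schema is not documented).
type SemanticAction =
  | { kind: "search"; query: string }
  | { kind: "open"; target: string }
  | { kind: "link"; from: string; to: string }
  | { kind: "unknown"; transcript: string };

// Map a transcript to a structured action using simple phrase patterns.
function parseIntent(transcript: string): SemanticAction {
  // Trim whitespace and drop trailing sentence punctuation from the recognizer.
  const text = transcript.trim().replace(/[.!?]+$/, "");

  const search = /^(?:search for|find)\s+(.+)$/i.exec(text);
  if (search) return { kind: "search", query: search[1] };

  const open = /^(?:open|show)\s+(.+)$/i.exec(text);
  if (open) return { kind: "open", target: open[1] };

  const link = /^link\s+(.+?)\s+to\s+(.+)$/i.exec(text);
  if (link) return { kind: "link", from: link[1], to: link[2] };

  // Anything unrecognized is passed through so the caller can decide what to do.
  return { kind: "unknown", transcript: text };
}
```

Returning a discriminated union keeps the downstream handler exhaustive: a `switch` on `kind` lets the compiler flag any action type that is not handled.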
Hand tracking is reserved for micro-gestures only—subtle pinch, flick, and scroll bounds executable with an elbow resting on a desk. This division of labor keeps spatial interaction expressive while dramatically reducing the physical cost of continuous use.
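One way to keep a pinch low-amplitude and stable is to measure the thumb-to-index distance relative to palm size and apply hysteresis, so small tremors do not toggle the state. The sketch below assumes a hand tracker that reports 3D landmark positions for the thumb tip, index tip, wrist, and middle-finger base; the `PinchDetector` class and both thresholds are illustrative assumptions, not values from the project.

```typescript
interface Point3 {
  x: number;
  y: number;
  z: number;
}

function distance(a: Point3, b: Point3): number {
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
}

// Illustrative thresholds (assumptions), expressed as a fraction of palm size.
const PINCH_START = 0.25; // fingers must close below this ratio to start a pinch
const PINCH_END = 0.4; // fingers must open past this ratio to end it

class PinchDetector {
  private pinching = false;

  // Returns "start" or "end" on a state change, otherwise null.
  update(
    thumbTip: Point3,
    indexTip: Point3,
    wrist: Point3,
    middleBase: Point3
  ): "start" | "end" | null {
    // Normalizing by palm size makes the gesture independent of camera distance.
    const palm = distance(wrist, middleBase);
    if (palm === 0) return null;

    const ratio = distance(thumbTip, indexTip) / palm;
    if (!this.pinching && ratio < PINCH_START) {
      this.pinching = true;
      return "start";
    }
    if (this.pinching && ratio > PINCH_END) {
      this.pinching = false;
      return "end";
    }
    return null;
  }
}
```

The gap between the two thresholds is the design choice that matters here: a single cutoff would flicker whenever the fingers hover near it.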
The Lightweight Browser Sandbox
The engineering outcome is a high-fidelity, client-side spatial interaction sandbox, built on a Vite and React 19 architecture with performance-optimized local processing pipelines.
Semantic synchronization, gesture bounds, and voice intent resolve in the browser without round-trips to heavy backend render farms, which makes the system deployable as a lightweight proof of concept for a production-grade multimodal workflow.

- Demonstrated a zero-fatigue multimodal loop: voice for intent, micro-gestures for manipulation
- Shipped a browser-native spatial sandbox with sub-20 ms trace ingestion and 60 fps interaction targets
- Established a reusable interaction design system for cinematic computing in real product contexts