IndexTech Lab / 2025
EM / 001
[ GESTURAL AI / VOICE COGNITION NETWORK ]

EXPANDED MEMORY

A multimodal cognitive workspace engineered to mitigate interaction fatigue. It indexes personal data clusters through low-latency voice intent pipelines and uses subtle spatial micro-gestures for fluid contextual manipulation across a continuous semantic memory layer.

Tech Lab / Interaction Design System

Role: Creative Director & Technologist
Year: 2025
Client: Tech Lab
Discipline: Tech Lab

PROJECT OVERVIEW

A multimodal cognitive architecture inspired by cinematic computing, engineered to overcome the physical limits of spatial interaction. By shifting heavy input to voice-driven intent and reserving spatial micro-gestures for fluid data manipulation, the system establishes a zero-fatigue environment for human-computer collaboration.

Expanded Memory is a modern response to interactive sci-fi interfaces, tracing a line from Minority Report's gestural fantasy to today's ambient stack of XR glasses, on-device hand tracking, and advanced LLMs. Where early motion sensors demanded exhausting, full-arm choreography, contemporary tooling makes spatial cognition a practical design medium rather than a demo novelty.

The Ergonomics of Free Space

The core challenge in free-space interaction is well documented in HCI as Gorilla Arm Syndrome—the physical exhaustion caused by prolonged, high-effort gestures held away from the body. The design problem is precise: how do we preserve the cinematic impact of spatial UI without fatiguing the user within minutes?

Expanded Memory reframes spatial input as a complement to voice, not a replacement. Abstract commands migrate to speech; the hands remain available for precise, low-amplitude manipulation—keeping the interaction loop sustainable for extended work sessions.

The Multimodal Solution

SPATIAL / MESHTRACKING
C04: HEAD-TRACK
S11: PINCH
C13: SWIPE
C13: GRASP
L07: STANCE
L08: STRIDE
MESH 32PT / θ 0° / SCAN OK
KINETIC VECTOR LOG // SESSION #4471
USER >"Recall the synth layout notes from last winter's audio session."
SYS >[ Accessing Sound Design Logs — 2025-11 ]
SYS >[ Vector Match Verified — 0.0021s ]
OUT >Darkwave synthesis configurations, Ableton track routings, modular hardware cluster schemas.
AWAITING INPUT_

The architecture splits intent across two parallel channels: voice and hand tracking. Voice commands are carried over a real-time, low-latency WebRTC pipeline (LiveKit) and translated from natural language into structured semantic actions, without requiring the user to hold a pose.
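To make "structured semantic actions" concrete, here is a minimal sketch of the step that turns a finished transcript into a typed action. Everything in it is an assumption for illustration: the `SemanticAction` shape, the `VERB_RULES` table, and the `parseIntent` name do not come from the project, and a rule table stands in for whatever language model or intent service the real pipeline uses. Transport (LiveKit) is deliberately left out.

```typescript
// Hypothetical action schema; the real project's schema is not documented here.
type SemanticAction =
  | { kind: "recall"; query: string }
  | { kind: "open"; target: string }
  | { kind: "unknown"; transcript: string };

// Illustrative verb table mapping a leading verb to an action kind.
const VERB_RULES: Array<{ pattern: RegExp; kind: "recall" | "open" }> = [
  { pattern: /^(?:recall|find|remember)\s+(.+)$/i, kind: "recall" },
  { pattern: /^(?:open|show)\s+(.+)$/i, kind: "open" },
];

function parseIntent(transcript: string): SemanticAction {
  // Normalize: trim whitespace and drop trailing sentence punctuation.
  const text = transcript.trim().replace(/[.!?]+$/, "");

  for (const { pattern, kind } of VERB_RULES) {
    const rest = pattern.exec(text)?.[1];
    if (rest !== undefined) {
      return kind === "recall"
        ? { kind: "recall", query: rest }
        : { kind: "open", target: rest };
    }
  }

  // Nothing matched: hand the raw text back so the caller can ask for clarification.
  return { kind: "unknown", transcript: text };
}
```

Because the result is a discriminated union, downstream code can `switch` on `kind` and the compiler will flag any action type that is left unhandled.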

Hand tracking is reserved for micro-gestures only: subtle pinch, flick, and scroll motions, bounded tightly enough to perform with an elbow resting on a desk. This division of labor keeps spatial interaction expressive while dramatically reducing the physical cost of continuous use.
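A low-amplitude pinch can be sketched as a distance check between two fingertip landmarks, with hysteresis so small tremors do not toggle the gesture on and off. This is an illustrative sketch, not the project's detector: the `Point3` shape, the `updatePinch` name, and both thresholds are assumptions, and the coordinates are taken to be normalized values from whichever hand tracker is in use.

```typescript
interface Point3 {
  x: number;
  y: number;
  z: number;
}

// Assumed thresholds in normalized landmark units; tune per tracker and camera.
const PINCH_ON = 0.04; // fingertips must come closer than this to start a pinch
const PINCH_OFF = 0.07; // fingertips must separate beyond this to end it

function distance(a: Point3, b: Point3): number {
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
}

// Returns the next pinch state from the previous state and the current fingertips.
// The gap between PINCH_ON and PINCH_OFF is the hysteresis band.
function updatePinch(wasPinching: boolean, thumbTip: Point3, indexTip: Point3): boolean {
  const d = distance(thumbTip, indexTip);
  return wasPinching ? d < PINCH_OFF : d < PINCH_ON;
}
```

Called once per tracking frame with the previous result, the function holds a pinch through small separations and only releases it once the fingers clearly part, which suits gestures made with a resting forearm.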

The Lightweight Browser Sandbox

The engineering outcome is a high-fidelity, client-side spatial interaction sandbox built on a Vite-driven React 19 architecture, with gesture and voice processing handled by local, performance-optimized pipelines.

Semantic synchronization, gesture bounds, and voice intent resolve in the browser without round-trips to heavy backend render farms—making the system deployable as a lightweight proof of a production-grade multimodal workflow.
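One way semantic recall can stay in the browser is a brute-force cosine-similarity scan over an in-memory set of embeddings. The sketch below assumes that approach purely for illustration: the `MemoryEntry` shape and the `topMatch` name are hypothetical, the embeddings are assumed to be precomputed elsewhere, and the project's actual vector store may work differently.

```typescript
// Hypothetical in-memory record: an id plus a precomputed embedding vector.
interface MemoryEntry {
  id: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Embedding dimensions must match");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((ai, i) => {
    const bi = b[i] ?? 0;
    dot += ai * bi;
    normA += ai * ai;
    normB += bi * bi;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // A zero vector has no direction; treat it as matching nothing.
  return denom === 0 ? 0 : dot / denom;
}

// Linear scan for the closest entry; returns null when the store is empty.
function topMatch(
  query: number[],
  store: MemoryEntry[],
): { id: string; score: number } | null {
  let best: { id: string; score: number } | null = null;
  for (const entry of store) {
    const score = cosineSimilarity(query, entry.embedding);
    if (best === null || score > best.score) {
      best = { id: entry.id, score };
    }
  }
  return best;
}
```

A linear scan is adequate for a personal-scale store of a few thousand entries; a larger corpus would call for an approximate nearest-neighbor index instead.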

Expanded Memory technical infrastructure — TanStack Start, LiveKit, Framer Motion, semantic vector store

Demonstrated a zero-fatigue multimodal loop: voice for intent, micro-gestures for manipulation

Shipped a browser-native spatial sandbox with sub-20ms trace ingestion and 60fps interaction targets

Established a reusable interaction design system for cinematic computing in real product contexts