RESEARCH
WE PUBLISH WHAT WE SHIP.
Research stays public when it settles an architecture decision we then deploy — a routing pattern, an evaluation method, a reliability tradeoff. Each piece carries a real result.
THE WORK
RESEARCH WITH A RESULT ATTACHED.
Two papers, two production decisions. Each one led to a routing or evaluation pattern we now run for clients — with the number it moved.
Multi-Agent AI
16 min · 2025
Multi-agent calendar intelligence: hybrid LLM + CP-SAT for executive scheduling
29.4%
cost reduction
Executive calendar management represents a constraint satisfaction problem characterized by high dimensionality, conflicting objectives, and dynamic updates. Traditional LLMs fail on complex scheduling (0.6% success on TravelPlanner). This research introduces the Cognitive Temporal Orchestration (CTO) framework—a hybrid architecture integrating heterogeneous LLM orchestration (GPT-5, Gemini 3 Pro, Claude Sonnet 4.5) with CP-SAT constraint programming. Through 81 test scenarios, we demonstrate 100% orchestration success, 100% high-value event identification, and 29.4% cost reduction. Critical analysis reveals 99% of latency originates from LLM inference, fundamentally informing optimization strategies. We validate three cognitive modules establishing a methodology for evaluating evolution from reactive assistants to proactive wealth management systems.
Why it matters: route reasoning across models and let a solver own the constraints — cheaper than one big model, and the schedule actually holds.
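To make the division of labor concrete, here is a minimal sketch of the "solver owns the constraints" half. It is not the CTO framework's code: a brute-force search from the Python standard library stands in for CP-SAT, and the meeting names, durations, priorities, and the `solve` helper are all hypothetical. In a hybrid setup the priorities might come from an LLM; the search then guarantees that no two meetings overlap.

```python
from itertools import combinations, product

# Hypothetical requests: (name, length in 30-minute slots, priority).
# The priorities are hard-coded here; an LLM could supply them instead.
REQUESTS = [
    ("board prep", 2, 3),
    ("investor call", 2, 5),
    ("1:1", 1, 1),
]
DAY_SLOTS = 4  # deliberately tiny day, so not every meeting fits


def overlaps(start_a, len_a, start_b, len_b):
    """True when two half-open slot ranges share at least one slot."""
    return start_a < start_b + len_b and start_b < start_a + len_a


def solve(requests, day_slots):
    """Brute-force stand-in for a constraint solver.

    Each request gets a start slot or None (skipped). Overlapping plans
    are rejected; among valid plans, the highest total priority wins.
    """
    options = [
        [None] + list(range(day_slots - length + 1))
        for _, length, _ in requests
    ]
    best_score, best_plan = -1, {}
    for starts in product(*options):
        placed = [
            (s, length)
            for s, (_, length, _) in zip(starts, requests)
            if s is not None
        ]
        if any(overlaps(a, la, b, lb)
               for (a, la), (b, lb) in combinations(placed, 2)):
            continue  # hard constraint: no double-booking
        score = sum(
            priority
            for s, (_, _, priority) in zip(starts, requests)
            if s is not None
        )
        if score > best_score:
            best_score = score
            best_plan = {name: s for s, (name, _, _) in zip(starts, requests)}
    return best_score, best_plan


score, plan = solve(REQUESTS, DAY_SLOTS)
print(score, plan)
```

With these made-up inputs the two high-priority meetings fill the day and the 1:1 is skipped. Exhaustive search only works at toy scale; a real constraint solver would take over the same hard constraints for realistic calendars.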
Read the paper
Clinical AI
10 min · 2025
Comparative LLM analysis for clinical decision support: routing across Gemini-3-Pro and GPT-5.1
94.7%
system reliability
This comprehensive evaluation of the Vitruviana Hybrid AI Architecture for clinical decision support analyzes model selection patterns, service integration, and clinical outcomes across 100+ automated tests. The hybrid architecture achieved 94.7% system reliability with intelligent task routing, demonstrating 100% optimal routing decisions and directing complex clinical reasoning to Gemini 3 Pro (67% of tasks) and structured tasks to GPT-5.1 (33% of tasks).
Why it matters: pick the model per task instead of standardizing on one — you get higher reliability without paying for the frontier model on every call.
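Per-task routing can be sketched as a simple lookup. This is an illustration under stated assumptions, not the Vitruviana architecture's implementation: the task kinds, the model label strings, the default choice, and the `route` and `routing_share` helpers are all hypothetical, and no model is actually called.

```python
from dataclasses import dataclass

# Hypothetical task kinds mapped to illustrative model labels.
# These strings are placeholders, not real API model identifiers.
ROUTES = {
    "complex_reasoning": "gemini-3-pro",
    "structured": "gpt-5.1",
}
DEFAULT_MODEL = "gemini-3-pro"  # assumed fallback for unrecognized task kinds


@dataclass(frozen=True)
class Task:
    kind: str
    prompt: str


def route(task: Task) -> str:
    """Return the model label for a task, falling back to the default."""
    return ROUTES.get(task.kind, DEFAULT_MODEL)


def routing_share(tasks):
    """Fraction of tasks sent to each model, for auditing the router."""
    if not tasks:
        return {}
    counts = {}
    for task in tasks:
        model = route(task)
        counts[model] = counts.get(model, 0) + 1
    return {model: n / len(tasks) for model, n in counts.items()}


tasks = [
    Task("complex_reasoning", "differential diagnosis"),
    Task("complex_reasoning", "treatment tradeoffs"),
    Task("structured", "extract lab values"),
]
print(routing_share(tasks))
```

Keeping the routing table as plain data makes the split easy to audit: the share per model can be logged and compared against the intended distribution.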
Read the paper
WANT THIS RUNNING IN YOUR OPS? START WITH THE AUDIT.
THE PATTERN ON THIS PAGE, POINTED AT YOUR WORKFLOW.
These papers became routing and evaluation patterns we now run in production. Bring the workflow you'd want them applied to — we map which pattern fits, what it moves, and the costed plan to ship it. The audit credits toward the build.