Subly / 2025–2026 / AI Product Design

Designing an AI-Orchestrated Audio Description System

Shipped 2 versions in 8 sprints as sole designer. Designed the full AI pipeline, including all prompt engineering, and led the shift from script-first control to automation-first delivery.

Company
Subly
Role
Senior Product Designer
Scope
UX, Prompt Engineering, AI Pipeline Design
Duration
Nov 2025 – Feb 2026 · 8 sprints

Enterprise deals were stalling on one missing feature.

AI-powered Audio Description was the last piece of Subly's accessibility suite, and enterprise deals were stalling without it. As sole designer and product lead, I owned it end-to-end: problem definition, AI pipeline and prompt architecture, UX, and shipping two complete versions in 8 sprints.

Subly's full accessibility suite showing where Audio Description fits within the product ecosystem

Subly's accessibility suite. Audio Description was the last missing piece for enterprise compliance.

System architecture diagram showing the four-stage audio description generation pipeline, from video upload through LLM multi-step orchestration to final output

The full pipeline I designed: from video upload through LLM orchestration to final audio-described output

Automate first. Give control when needed, not the other way around.

Users can't judge AI output they haven't experienced.

Version 1 asked users to edit the AI script before hearing it. In testing, nobody could judge script quality without audio context. Version 2 flipped the model: deliver the complete audio-described video first, then expose the script for optional editing.

Before

V1: Script First

Users had to review before output. Script quality was impossible to judge without audio context.

  1. Generate script
  2. Review & edit
  3. Generate voice
  4. Embed
After

V2: Automation First

Output first. Script exposed for transparency. Regeneration if needed.

  1. Generate full audio-described video
  2. Expose script for transparency
  3. Allow regeneration if needed
Version 1 script-first interface showing 3-step flow: generate script, review and edit, generate voice

V1: Script-first. Users must review the AI script before hearing any output

Version 2 automation-first interface with Standard or Extended AD selection and one-click generation

V2: Automation-first. Choose type, add context, generate. Script exposed after for transparency

Not a feature. An orchestration layer.

Audio description wasn't built as a standalone feature. I designed it as an orchestration layer on top of Subly's existing transcription infrastructure: four stages from video upload to final output.

01 Video upload → automatic transcription
02 Frame extraction for visual context
03 Three-stage LLM orchestration (context → generate → refine)
04 TTS rendering → audio mixing → embedding
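The four stages above can be sketched as a simple sequential pipeline. This is an illustrative sketch only: the function names, the `PipelineState` fields, and the stub bodies are my assumptions, not Subly's actual implementation.

```python
# Hypothetical sketch of the four-stage orchestration layer.
# Each stage takes and returns a shared pipeline state; the real stages
# would call transcription, frame-extraction, LLM, and TTS services.

from dataclasses import dataclass, field

@dataclass
class PipelineState:
    video_path: str
    transcript: str = ""
    frames: list = field(default_factory=list)
    descriptions: list = field(default_factory=list)
    output_path: str = ""

def transcribe(state: PipelineState) -> PipelineState:
    # Stage 01: reuse existing transcription infrastructure (stubbed here)
    state.transcript = f"transcript of {state.video_path}"
    return state

def extract_frames(state: PipelineState) -> PipelineState:
    # Stage 02: sample frames to give the LLM visual context (stubbed)
    state.frames = [f"frame_{i}" for i in range(3)]
    return state

def orchestrate_llm(state: PipelineState) -> PipelineState:
    # Stage 03: three-step LLM orchestration (context -> generate -> refine),
    # producing timed descriptions (stubbed)
    state.descriptions = [("00:04", "A person enters the room.")]
    return state

def render_and_mix(state: PipelineState) -> PipelineState:
    # Stage 04: TTS rendering, audio mixing, embedding (stubbed)
    state.output_path = state.video_path.replace(".mp4", "_ad.mp4")
    return state

STAGES = [transcribe, extract_frames, orchestrate_llm, render_and_mix]

def run_pipeline(video_path: str) -> PipelineState:
    state = PipelineState(video_path)
    for stage in STAGES:
        state = stage(state)
    return state

result = run_pipeline("demo.mp4")
```

Modeling the pipeline as an ordered list of stage functions over shared state is what makes it an orchestration layer rather than a monolithic feature: each stage can be swapped or retried independently.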

I designed the prompts. Not just the interface.

As sole designer, I owned the entire prompt architecture behind the audio description pipeline. This wasn't prompt tweaking. It was structural design that directly shaped output quality and reliability.

01 Context stage: Feeds the LLM the transcript, extracted frames, and explicit constraints ("only describe what is visually present")
02 Generation stage: Produces timed audio descriptions fitted within natural speech gaps in the video
03 Refinement stage: Cross-references generated descriptions against source material to catch hallucinations before output

The first iteration used a two-step approach that hallucinated heavily. The AI described objects and actions that weren't in the video. Restructuring into three stages with explicit guardrails at each step reduced hallucinations significantly, making output reliable enough for enterprise accessibility compliance.

Three things I'll carry to the next AI product.

Presentation order shapes perceived quality. Showing output first made users trust the AI more, even when the underlying model was the same.
Prompt architecture is UX architecture. How you structure the AI pipeline directly determines what the user experiences.
In regulated enterprise contexts, design for trust first. Transparency and control aren't features. They're the foundation.