Audio as a gameplay pillar: why Ashlight runs on FMOD instead of Unity's AudioSource

Ashlight's audio isn't decoration; it's a core gameplay pillar. It has to deliver frame-aligned legendary telegraphs, an AI hearing system that reads player footsteps, and a Lucidity scheduler that fires phantom cues without ever lying to the player during real combat. Here's the middleware decision, the 10-RTPC contract that binds Unity to FMOD, and why the most interesting audio code in the game lives outside our codebase.

Arthur Dutra · 10 min read

Ashlight is a horror game where audio is listed as a core gameplay pillar alongside combat, traversal, and stealth. That's a sentence games say all the time and mostly don't mean. We mean it. Here's what that commitment forces on the audio system:

  • Every legendary enemy attack broadcasts a non-diegetic stinger and a creature-specific diegetic vocalization, frame-aligned with the visual telegraph. Miss the alignment by 50ms and the player learns to mistrust audio cues for the rest of their playthrough.
  • The parry clash carries a sacred sonic signature. No other system in the game is allowed to use it.
  • Every player action produces a noise footprint the AI hearing system reads. If a footstep doesn't ping the enemy detector, that's not "subtle stealth", that's a bug.
  • A six-layer vertical adaptive score crossfades between Room Tone, Melodic Bed, Tension, Combat, Rupture, and Boss Override, beat-synced and gated by combat state.
  • And the signature mechanic: Lucidity false cues. As the player's sanity drops, the audio channel itself becomes unreliable: phantom footsteps from impossible directions, the player's own footsteps echoing with delay, reverse-reverb stings on input, brief stereo inversion. None of this is post-processing. It's runtime DSP modulated by gameplay state, scheduled by a system that knows which cues are protected and which aren't.

A code-driven AudioSource.Play() model can't deliver any of that without rebuilding middleware in scripts. The question was never whether to use middleware. It was which one.

Unity's built-in audio stack: why it isn't on the table

I'll lead with the rejected option because the reasons map directly to features we need:

What Ashlight needs                     | What AudioSource + AudioMixer give us
----------------------------------------|---------------------------------------------------------------
Designer-facing authoring tool          | Nothing. Every variation, randomization, ramp is programmer time.
Beat-synced vertical adaptive music     | No native concept. Groups + snapshots, no layered beat-sync.
DSP-level control for the Lucidity cues | Not exposed without writing native plugins.
Bank-based streaming per biome          | Reimplement on top of Addressables.
Live mix tuning against a running game  | No. Tuning means rebuild.

The cost of "free" is engineering time we don't have, and an audio team blocked behind the programming queue every time they want to tune a curve. Rejected.

FMOD vs. Wwise: the honest answer

This is the harder question. Both can deliver everything Ashlight needs. The relevant axis is velocity at our team size.

  • Team familiarity. An indie-scale team should not learn a new middleware mid-preproduction unless capability forces it. Familiarity is the deciding vote when capability is roughly equivalent.
  • Authoring velocity. FMOD Studio's event-based model maps cleanly to our event taxonomy (event:/Player/Combat/Sabre/Swing_Light_01). Ship-from-authoring-to-game is minutes, not hours.
  • Licensing posture. FMOD's indie tier is the friendlier path at our current revenue stage. Wwise's free tier exists; commercial terms add friction we don't need.

If we were a 30-person studio with a dedicated audio programmer, this choice would deserve a re-review. At our size, FMOD is the right call. (A custom audio engine was mentioned only to close the door. Building a DSP graph, event scheduler, parameter system, bank streamer, and authoring tool ourselves would consume more engineering budget than the rest of the game combined, which defeats the purpose of using Unity.)

The architecture in one diagram

[Figure: Audio middleware architecture. Top-down data flow from Unity Game State → FMOD and Unity Bridge → FMOD Studio Events & Banks → Mix Output. Diagram source: ShaderArchitecture.svg.]

State flows downward only. Unity writes parameters and triggers events; the only thing it reads back from FMOD is playback state. FMOD never reaches into Unity. Every arrow in that diagram points in one direction.

This isn't a stylistic choice; it's what makes the contract layer versionable. If state flowed both ways, we'd never be able to refactor either side independently.

The 10 RTPCs are a versioned interface

The single most important architectural rule on the project: the 10 RTPCs and 5 Snapshots are a versioned contract. Adding or removing one is an architecture-level change.

RTPC             | Range / type                                          | Owner (Unity)          | Drives (FMOD)
-----------------|-------------------------------------------------------|------------------------|--------------------------------------
Player_Health    | 0–100                                                 | LifeManager.cs         | breath, heartbeat
Player_Will      | 0–100                                                 | WillSystem.cs          | Censer ember, Catalisador buildup
Player_Lucidity  | 0–100                                                 | LuciditySystem.cs      | false-cue scheduler, music corruption
Player_Stealth   | enum: Open / Concealed / Detected                     | StealthSystem.cs       | player movement footprint mix
Player_Rhythm    | 0–10                                                  | RhythmTracker.cs       | Perfect Rhythm undertone
Censer_State     | enum: Active / Hidden / Extinguished                  | Censer.cs              | ambient bed + Censer layer
Combat_State     | enum: Exploration / Tension / Combat / Rupture / Boss | CombatStateMachine.cs  | music-layer gating
Biome_ID         | enum                                                  | BiomeManager.cs        | ambient + music routing
Surface_Material | enum                                                  | SurfaceDetector.cs     | footstep selection
Noise_Occlusion  | 0–1                                                   | SonicPresenceSystem.cs | masking coefficient for self-sounds

Why this discipline matters: a single RTPC drives behavior across dozens of FMOD events. Renaming Player_Lucidity without telling the audio designer breaks the false-cue scheduler, the music corruption layer, the ambient pressure bed, and the Catalisador backlash response simultaneously. Treating these as a versioned interface (rather than ad-hoc parameter passing) is what prevents one rename from cascading into a week of debugging.
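One way to make "versioned interface" concrete is to keep the RTPC table as data and have a small check reject any write the contract doesn't describe. The sketch below is a tool-side illustration in Python, not code from the Ashlight project: the Rtpc class, CONTRACT_VERSION, and validate() are hypothetical names, and only the parameter names, ranges, owners, and enum labels come from the table.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

CONTRACT_VERSION = 1  # hypothetical: bump on any add, remove, or rename


@dataclass(frozen=True)
class Rtpc:
    name: str
    owner: str                                 # the Unity script listed as owner
    lo: float = 0.0                            # continuous range (unused for enums)
    hi: float = 0.0
    labels: Optional[Tuple[str, ...]] = None   # enum labels, or None if continuous


# Eight of the ten rows from the table. Biome_ID and Surface_Material are left
# out because the article does not list their enum labels.
CONTRACT = {r.name: r for r in (
    Rtpc("Player_Health", "LifeManager.cs", 0, 100),
    Rtpc("Player_Will", "WillSystem.cs", 0, 100),
    Rtpc("Player_Lucidity", "LuciditySystem.cs", 0, 100),
    Rtpc("Player_Rhythm", "RhythmTracker.cs", 0, 10),
    Rtpc("Noise_Occlusion", "SonicPresenceSystem.cs", 0, 1),
    Rtpc("Player_Stealth", "StealthSystem.cs",
         labels=("Open", "Concealed", "Detected")),
    Rtpc("Censer_State", "Censer.cs",
         labels=("Active", "Hidden", "Extinguished")),
    Rtpc("Combat_State", "CombatStateMachine.cs",
         labels=("Exploration", "Tension", "Combat", "Rupture", "Boss")),
)}


def validate(name: str, value: Union[float, str]) -> Union[float, str]:
    """Reject any write that the contract does not describe."""
    spec = CONTRACT.get(name)
    if spec is None:
        raise KeyError(f"{name} is not part of contract v{CONTRACT_VERSION}")
    if spec.labels is not None:
        if value not in spec.labels:
            raise ValueError(f"{name}: {value!r} is not one of {spec.labels}")
        return value
    if isinstance(value, str) or not spec.lo <= value <= spec.hi:
        raise ValueError(f"{name}: {value!r} is outside {spec.lo}..{spec.hi}")
    return float(value)
```

With a check like this in CI or in an editor script, a renamed or out-of-range parameter fails loudly at the boundary instead of silently breaking events on the FMOD side.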

The five Snapshots follow the same rule:

Snapshot_Combat              ← Combat_State == Combat
Snapshot_Rupture             ← Combat_State == Rupture
Snapshot_Lucidity_Collapse   ← Player_Lucidity == 0
Snapshot_Cinematic           ← explicit (no clean RTPC representation)
Snapshot_Menu                ← explicit

Snapshots are activated by parameter conditions where possible; explicit C# calls only when no RTPC fits the trigger. No exceptions, in either direction.
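The rule reads naturally as a table of conditions plus a short allow-list. The following Python sketch only models the rule for illustration; the real conditions are authored in FMOD Studio, and the names SNAPSHOT_CONDITIONS, EXPLICIT_SNAPSHOTS, and active_snapshots() are hypothetical.

```python
# (snapshot name, condition over the current parameter values)
SNAPSHOT_CONDITIONS = (
    ("Snapshot_Combat", lambda p: p["Combat_State"] == "Combat"),
    ("Snapshot_Rupture", lambda p: p["Combat_State"] == "Rupture"),
    ("Snapshot_Lucidity_Collapse", lambda p: p["Player_Lucidity"] == 0),
)

# The only snapshots code may request directly: no RTPC fits their trigger.
EXPLICIT_SNAPSHOTS = frozenset({"Snapshot_Cinematic", "Snapshot_Menu"})


def active_snapshots(params, explicit=frozenset()):
    """Return the set of snapshots that should be active right now."""
    not_allowed = set(explicit) - EXPLICIT_SNAPSHOTS
    if not_allowed:
        # "No exceptions, in either direction": parameter-driven snapshots
        # must never be forced on from code.
        raise ValueError(f"not explicitly triggerable: {sorted(not_allowed)}")
    driven = {name for name, condition in SNAPSHOT_CONDITIONS if condition(params)}
    return driven | set(explicit)
```

For example, active_snapshots({"Combat_State": "Rupture", "Player_Lucidity": 0}) yields both Snapshot_Rupture and Snapshot_Lucidity_Collapse, while passing "Snapshot_Combat" as an explicit request raises an error.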

The Lucidity false-cue scheduler lives in FMOD, not Unity

This is the part most likely to surprise programmers who haven't shipped middleware-driven audio. The horror system that picks when a phantom footstep fires from an impossible direction doesn't live in our codebase. It lives in the FMOD project.

Three FMOD mechanisms make this work:

  1. Programmer Instruments. FMOD's mechanism for events whose audio content is selected at runtime. The scheduler picks a phantom-footstep sample from a pool keyed on Surface_Material + Biome_ID, so a phantom step in a stone corridor sounds different from one in dense forest, automatically.
  2. RTPC-gated trigger probability. Player_Lucidity modulates the fire rate per Lucidity band: 0 cues/min at B0 (full lucidity), up to 3 cues/min at B4 (near collapse). All gating is parameter-driven; no C# polling, no per-frame work.
  3. Contract-protected exclusions. Some cues are system mimicry: they pretend to be real game events. These have hard interlocks against firing during real combat, gated by the Combat_State RTPC. The scheduler will literally refuse to fake a Legendary telegraph stinger when a real Legendary fight is happening, because that breaks the audio contract the player has learned to trust.

Unity's role in all of this: write Player_Lucidity and Combat_State. That's it. The audio designer owns the scheduler, the cue pools, the cooldowns, the contract-protection logic, the per-band fire rates. They can iterate on horror pacing without ever opening Unity.

This is the architectural choice I'm proudest of. Letting the audio designer own the horror system without making them write C# is the entire reason for choosing middleware.
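The scheduler is authored in FMOD Studio rather than written as code, but the shape of its gating can be sketched in a few lines. Everything below is illustrative Python under stated assumptions: the article gives only the endpoints (0 cues/min at B0, up to 3 at B4), so the equal 20-point bands, the linear rates in between, the Poisson-style probability, and the set of states treated as "real combat" are all guesses, and every function name is hypothetical.

```python
import math

MAX_RATE_PER_MIN = 3.0   # from the article: up to 3 cues/min at B4
BAND_COUNT = 5           # B0..B4
# Assumption: these Combat_State values count as "real combat".
REAL_COMBAT = frozenset({"Combat", "Rupture", "Boss"})


def lucidity_band(lucidity: float) -> int:
    """Map Player_Lucidity (0-100) to a band index; B0 is full lucidity.

    Equal 20-point bands are an assumption, not the shipped thresholds.
    """
    if not 0 <= lucidity <= 100:
        raise ValueError(f"Player_Lucidity out of range: {lucidity}")
    return min(BAND_COUNT - 1, int((100 - lucidity) // 20))


def cues_per_minute(lucidity: float) -> float:
    """Fire rate for the band, interpolated linearly between B0 and B4."""
    return MAX_RATE_PER_MIN * lucidity_band(lucidity) / (BAND_COUNT - 1)


def fire_probability(lucidity: float, window_seconds: float) -> float:
    """Chance of at least one cue in the window, modelling cues as Poisson arrivals."""
    rate_per_second = cues_per_minute(lucidity) / 60.0
    return 1.0 - math.exp(-rate_per_second * window_seconds)


def may_fire(is_mimicry: bool, combat_state: str) -> bool:
    """Contract-protected exclusion: mimicry cues never fire during real combat."""
    return not (is_mimicry and combat_state in REAL_COMBAT)
```

Note what the sketch takes as input: Player_Lucidity and Combat_State, the same two values Unity writes. Nothing else about the game has to cross the boundary.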

The mix isn't a Unity concern

Three consequences of the same principle:

  1. Priority-1 events are never ducked. That covers Legendary telegraphs, Parry Clash, and Lucidity Rupture. The rule is enforced via a "no-duck" routing flag at the bus level, not by C# logic. We never want the wrong line of code to silence a frame-critical telegraph.
  2. Sidechain rules are authored in FMOD. The Bacamarte (heavy ranged weapon) shot sidechaining the Music Bus down 6 dB for 400 ms is configured on the event itself. Unity has no code involvement.
  3. Snapshot transitions are RTPC-gated. When Combat_State flips to Rupture, Snapshot_Rupture activates automatically through FMOD's snapshot conditions. Unity doesn't call "enable rupture snapshot"; it just sets the parameter.

If Unity were calling MixerGroup.SetVolume() every time gameplay state changed, the audio designer would lose authority over their own mix. Keeping that boundary intact is what makes long-term iteration sustainable.
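For readers who don't think in decibels, here is what the Bacamarte rule (Music Bus down 6 dB for 400 ms) means as a linear gain. This is a worked arithmetic sketch in Python only: the real curve lives on the FMOD event, the instant attack and release are a simplifying assumption, and both function names are hypothetical.

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)


def music_bus_gain(ms_since_shot: float,
                   duck_db: float = -6.0,
                   hold_ms: float = 400.0) -> float:
    """Linear gain on the Music Bus at a given time after the shot.

    Assumes an instant attack and release; a real sidechain would ramp.
    """
    if 0.0 <= ms_since_shot < hold_ms:
        return db_to_gain(duck_db)
    return 1.0
```

A 6 dB cut is roughly half amplitude: db_to_gain(-6.0) is about 0.501, so the score drops to around half its level while the shot plays, then returns to unity.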

What it costs

The trade-offs are real and you should know them before signing on:

  • Two source-of-truth surfaces. The Unity project and the FMOD project version together but live as separate artifacts. Binary FMOD project files require lock conventions to avoid merge conflicts.
  • Onboarding cost. Audio team members need FMOD literacy. New programmers need to learn the RTPC contract and resist reaching for AudioSource.
  • Build pipeline complexity. The build gains an FMOD bank-build step, and bank streaming has to be tested per platform.
  • Debugging surface widens. A bug in the false-cue scheduler is debugged in FMOD Studio's profiler, not Unity's. The audio designer's autonomy comes with the cost of cross-tool debugging.
  • Performance budget. Voice-limit groups per bus, priority eviction, bank pre-loading on biome approach: all need active management. Target is 60–120 simultaneous voices in worst-case combat on mid-range hardware.

The cost is bounded. The expressivity it buys is the design language of every Lucidity false cue, every Legendary telegraph, every Censer state transition. Worth the tooling tax.
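To show what "priority eviction" under a voice limit involves, here is a minimal Python sketch of one possible policy. It is not FMOD's algorithm (FMOD Studio has its own stealing settings) and not Ashlight's configuration. The Voice class, admit(), and the tie-break toward the oldest voice are assumptions; the only detail borrowed from the article is that priority 1 is the most important.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    name: str
    priority: int      # 1 is most important, matching "Priority-1" above
    started_at: float  # seconds since some common clock


def admit(active, incoming, limit):
    """Try to start `incoming` under a voice limit.

    Returns (new_active_list, dropped_voice_or_None). The dropped voice is
    either an evicted active voice or the incoming voice itself.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if len(active) < limit:
        return active + [incoming], None
    # Least important active voice; among equals, the oldest goes first.
    victim = max(active, key=lambda v: (v.priority, -v.started_at))
    if incoming.priority > victim.priority:
        return list(active), incoming   # incoming is least important: reject it
    survivors = [v for v in active if v is not victim]
    return survivors + [incoming], victim
```

Under this policy a Legendary telegraph arriving at a full bus displaces an ambient one-shot, while a low-priority rustle arriving at the same moment is simply never started.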

The narrow place Unity's built-in audio still belongs

Honest answer: nowhere in shipped code. Mixing FMOD and Unity's built-in audio creates contract gaps the fairness rules can't tolerate. Unity's AudioMixer cannot sidechain an FMOD bus. Snapshot states can't unify across the two systems. Spatialization, occlusion, and reverb drift apart over development. A loudness budget measured in one is invisible to the other.

The two legitimate AudioSource uses during development:

  • Editor preview of placeholder assets where attaching FMOD is overkill, since the asset will be replaced before ship.
  • Boot-time chrome if a sound is needed before FMOD banks finish loading. Currently no such case exists in Ashlight.

Both are exceptions documented in PRs, not patterns the team falls back to. The default for any new audio work is FMOD.

What I'd do again

The split. The contract. Especially the contract.

If I were starting this from scratch tomorrow, the one artifact I'd write first (before installing FMOD, before authoring a single event, before touching code) is the RTPC + Snapshot table. The hardest discipline on a project like this isn't writing the audio middleware integration. It's deciding what Unity is allowed to say to FMOD, and what FMOD is allowed to know about the game. Get that surface right and the rest of the system falls out cleanly. Get it wrong and you spend two years digging out from under cross-cutting parameters.

For anyone building a similar system: pick the middleware that lets your audio designer work without you. Define the parameters that cross the boundary, version them, and resist every temptation to peek across the boundary from either direction. Your audio team will outpace your code team by a factor of three, and your game will sound like a game whose audio is actually a gameplay pillar, because it is.