KVR Audio

Post by **JonnyBMG** » Tue Jun 16, 2026 8:36 pm

Hi all,

I've been building an effect I've wanted for years and figured this is the right crowd to talk shop with about the architecture.

The problem: masking in dense mixes is a whole-project phenomenon, but a plugin on a single track only ever sees that one track. Static EQ is blind to the rest of the session; dynamic EQ and sidechain spacing work but have to be routed by hand, pair by pair. On a 40-track orchestral session that just doesn't scale.

The approach I'm taking (Spectral Engine): one instance per track, and the instances share a picture of the whole project. Each instance publishes its short-term spectral envelope into a low-latency shared buffer; every instance then reads the combined picture and carves dynamically, only where and when instruments actually collide.

A few design decisions I've landed on:

- Psychoacoustic, critical-band model rather than raw FFT-bin comparison: masking is perceptual, so the arbitration happens in a Bark-style representation.
- Minimum-phase carving, deliberately. I tested linear-phase/FFT carving and the pre-ring/smearing on transient-rich acoustic material wasn't worth it. Minimum-phase keeps it clean and keeps latency sane.
- Role-based arbitration: each track gets a role (Lead / Support / Background) so the system knows what should yield to what instead of two tracks ducking each other into a hole.
- Targeting real-time use with many linked instances, so the shared-state path has to stay cheap and lock-light.
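
To make the critical-band and role ideas above concrete, here is a minimal C++ sketch. It assumes Traunmüller's approximation of the Bark scale (without its edge corrections), 24 bands, and a plain energy threshold as the collision test; `hzToBark`, `bandIndex`, `TrackState`, and `bandsToCarve` are hypothetical names for illustration, not the plugin's actual code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// All names here are hypothetical and only illustrate the idea.
constexpr std::size_t kNumBands = 24;  // assumed: roughly one band per Bark

enum class Role { Background = 0, Support = 1, Lead = 2 };

// Traunmueller's approximation of the Bark scale (edge corrections omitted).
inline double hzToBark(double hz) {
    assert(hz >= 0.0);
    return 26.81 * hz / (1960.0 + hz) - 0.53;
}

// Map a frequency to a band index, clamped to [0, kNumBands - 1].
inline std::size_t bandIndex(double hz) {
    const double z = hzToBark(hz);
    if (z <= 0.0) return 0;
    const auto idx = static_cast<std::size_t>(z);
    return idx < kNumBands ? idx : kNumBands - 1;
}

struct TrackState {
    Role role;
    std::array<float, kNumBands> energy;  // linear energy per band
};

// A track yields in a band only to strictly higher-priority tracks, and only
// when both tracks have energy above the threshold in that band.
inline std::array<bool, kNumBands> bandsToCarve(
        const TrackState& self,
        const std::vector<TrackState>& others,
        float threshold) {
    std::array<bool, kNumBands> carve{};
    for (const auto& other : others) {
        if (static_cast<int>(other.role) <= static_cast<int>(self.role)) continue;
        for (std::size_t b = 0; b < kNumBands; ++b) {
            if (other.energy[b] > threshold && self.energy[b] > threshold)
                carve[b] = true;
        }
    }
    return carve;
}
```

Because equal roles never yield to each other in this sketch, two tracks with the same role cannot duck each other into a hole; a real arbiter would presumably return a continuous gain per band rather than a boolean.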

Where I'd love input from this forum:
How have you handled instance-to-instance communication in a host-agnostic way? I'm weighing the usual trade-offs around shared memory vs. a broker, dealing with PDC/latency differences between tracks, and instances coming and going mid-session. Curious what's bitten others here.

Status: in development, macOS VST3/AU, currently verified in Cubase, more DAWs and Windows to follow. There's a short write-up of the concept and a beta sign-up here if you want to follow along: spectralengineaudio.com

Happy to go deeper on any of the above.

Post by **mystran** » Wed Jun 17, 2026 1:06 pm

Well, since the plugin binary is normally loaded just once (even with bridges you'd usually expect instances of the same plugin to live in the same bridge process), in principle it's relatively simple to share information between instances through shared memory. The thing to watch out for is that there are really no guarantees about relative processing times, and with multi-threading the order in which multiple instances are processed (perhaps concurrently) might vary from block to block.

For this reason I would not really build something that relies on one plugin instance using information from other instances within the same block. I'm not sure if there's a totally reliable way to even keep track of which block you've processed in which instance in order to align things. Statistics (or just visual feedback, etc) is probably fine with this method, but for anything that requires any kind of latency considerations it's going to be problematic.

Side-chaining in general would be a lot more predictable, but setting it up can indeed be anything from slightly tedious to completely impossible depending on the host.

Post by **JonnyBMG** » Wed Jun 17, 2026 3:14 pm

Thanks mystran,
that's exactly the constraint the whole thing is built around, and you've articulated it better than I did in the first post.

You're completely right that you can't rely on one instance consuming another's data within the same block, and that with multi-threaded hosts the processing order (and concurrency) is unpredictable block to block. So I deliberately don't do that. There's no "align the blocks" bookkeeping, because the arbitration never needs sample alignment.

The shared state is a slow, control-rate picture, not a per-block signal path. Each instance publishes a short-term spectral envelope; a background worker on each instance reads the combined picture every ~20–40 ms and computes the carve, which the audio thread then applies with attack/release smoothing on the order of tens to hundreds of ms (minimum-phase). Masking is a perceptual, relatively stationary property on that timescale, so per-block ordering, concurrency, even an occasional stale or skipped frame all sit comfortably below the time constants that actually matter. It's exactly the "statistics is probably fine" regime you described. The frames are published through a seqlock so a reader never sees a torn envelope; a few ms of staleness is a non-issue, and concurrent access is lock-free either way.
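
For readers who haven't met a seqlock before, here is a minimal single-writer sketch in C++. It assumes 24 bands per envelope and stores the payload as relaxed atomics so the racing reads are well-defined; `EnvelopeSlot`, `publish`, and `tryRead` are hypothetical names, and this shows the general pattern rather than the plugin's actual implementation. It also runs in-process; placing it in a cross-process shared segment additionally assumes the atomics are lock-free on the target platform.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBands = 24;  // assumed envelope size

struct EnvelopeSlot {
    std::atomic<std::uint32_t> seq{0};               // odd while a write is in flight
    std::array<std::atomic<float>, kBands> bands{};  // payload as relaxed atomics
};

// Writer side: exactly one writer per slot (the owning instance).
inline void publish(EnvelopeSlot& slot, const std::array<float, kBands>& env) {
    const std::uint32_t s = slot.seq.load(std::memory_order_relaxed);
    assert((s & 1u) == 0u);                             // no write already in flight
    slot.seq.store(s + 1, std::memory_order_relaxed);   // mark write in progress
    std::atomic_thread_fence(std::memory_order_release);
    for (std::size_t i = 0; i < kBands; ++i)
        slot.bands[i].store(env[i], std::memory_order_relaxed);
    slot.seq.store(s + 2, std::memory_order_release);   // even again: frame complete
}

// Reader side: returns false if a write overlapped the copy. The caller then
// simply keeps its previous frame instead of spinning.
inline bool tryRead(const EnvelopeSlot& slot, std::array<float, kBands>& out) {
    const std::uint32_t before = slot.seq.load(std::memory_order_acquire);
    if (before & 1u) return false;                      // writer is mid-update
    for (std::size_t i = 0; i < kBands; ++i)
        out[i] = slot.bands[i].load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    return slot.seq.load(std::memory_order_relaxed) == before;
}
```

Returning `false` instead of retrying fits the control-rate design: a skipped frame just means the reader reuses the last good envelope for one more tick.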

The honest flip side: this is not sample-accurate cross-track processing, and it can't be. I'm not doing phase-locked cancellation between tracks — it's perceptual de-masking at control rate. For dense acoustic/orchestral material that's the right trade; if I needed surgical, sample-locked interaction I'd be back to sidechains.
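
The attack/release smoothing that turns a control-rate target into a gain curve can be sketched as a one-pole smoother. This is a generic textbook form under assumed parameters (time constants in ms, an update rate in Hz); `GainSmoother` is a hypothetical name and the real carve filter is not shown here.

```cpp
#include <cassert>
#include <cmath>

// One-pole smoother with separate attack and release time constants.
// state holds the current gain reduction in dB (0 = no carve).
struct GainSmoother {
    double attackCoef;
    double releaseCoef;
    double state = 0.0;

    // rateHz is how often step() is called (sample rate or block rate).
    GainSmoother(double attackMs, double releaseMs, double rateHz)
        : attackCoef(std::exp(-1000.0 / (attackMs * rateHz))),
          releaseCoef(std::exp(-1000.0 / (releaseMs * rateHz))) {
        assert(attackMs > 0.0 && releaseMs > 0.0 && rateHz > 0.0);
    }

    // Move state toward targetDb: attack when the reduction grows,
    // release when it shrinks.
    double step(double targetDb) {
        const double c = (targetDb > state) ? attackCoef : releaseCoef;
        state = targetDb + c * (state - targetDb);
        return state;
    }
};
```

With this form the smoother covers about 63% of a step after one time constant, so a target that arrives a few ms late or is skipped once moves the output very little.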

On latency/PDC, since that was the other half of your point: the analysis is taken at the pre-delay input, so for a given render every track's published envelope is time-coherent at the source (same playhead). Output alignment I leave to host PDC, and on top of that I normalize reported latency across instances — each instance pads up to the project-wide max so they all report the same number, which keeps things aligned even in hosts with weaker PDC. The control path tolerates the residual offset because it's slow.
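
The latency normalization described above amounts to a small calculation, sketched here in C++. It assumes each instance knows its own intrinsic latency in samples and can see everyone else's; `LatencyPlan` and `normalizeLatency` are hypothetical names, not part of any plugin API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct LatencyPlan {
    std::uint32_t reported;  // latency each instance reports to the host
    std::uint32_t padding;   // extra delay the instance adds internally
};

// Every instance reports the project-wide maximum and pads its own path
// by the difference, so all instances end up with identical total latency.
inline std::vector<LatencyPlan> normalizeLatency(
        const std::vector<std::uint32_t>& intrinsicSamples) {
    std::vector<LatencyPlan> plans;
    if (intrinsicSamples.empty()) return plans;
    const std::uint32_t maxLatency =
        *std::max_element(intrinsicSamples.begin(), intrinsicSamples.end());
    plans.reserve(intrinsicSamples.size());
    for (const std::uint32_t own : intrinsicSamples)
        plans.push_back({maxLatency, maxLatency - own});
    return plans;
}
```

For intrinsic latencies of 0, 64, and 256 samples, all three instances would report 256 and pad by 256, 192, and 0 samples respectively.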

Instances coming and going mid-session is handled with lock-free slot allocation (CAS on an allocation mask), plus a heartbeat and owner-PID liveness check so that a crashed or removed instance's slot gets reclaimed automatically. There is also an ABI version guard on the shared block, and a graceful in-process fallback if the OS refuses the shared segment (sandboxed hosts): it degrades to "no cross-talk," never to a broken state.
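
A minimal sketch of CAS-based slot allocation on a bitmask, in C++. It assumes at most 64 instances (one bit each in a 64-bit mask) and a millisecond heartbeat timestamp; `acquireSlot`, `releaseSlot`, and `isStale` are hypothetical names, and the owner-PID check and ABI guard are left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr int kMaxSlots = 64;  // assumed: one bit per instance

// Claim the lowest free slot. Returns its index, or -1 if all are taken.
inline int acquireSlot(std::atomic<std::uint64_t>& mask) {
    std::uint64_t cur = mask.load(std::memory_order_relaxed);
    for (;;) {
        if (cur == ~std::uint64_t{0}) return -1;        // every slot in use
        int idx = 0;
        while (cur & (std::uint64_t{1} << idx)) ++idx;  // first clear bit
        const std::uint64_t want = cur | (std::uint64_t{1} << idx);
        if (mask.compare_exchange_weak(cur, want,
                                       std::memory_order_acq_rel,
                                       std::memory_order_relaxed))
            return idx;
        // CAS failed: cur now holds the latest mask, so loop and retry.
    }
}

// Clear the slot's bit; used on normal teardown and when reclaiming.
inline void releaseSlot(std::atomic<std::uint64_t>& mask, int idx) {
    assert(idx >= 0 && idx < kMaxSlots);
    mask.fetch_and(~(std::uint64_t{1} << idx), std::memory_order_acq_rel);
}

// A slot counts as stale when its heartbeat is older than the timeout.
inline bool isStale(std::uint64_t lastBeatMs, std::uint64_t nowMs,
                    std::uint64_t timeoutMs) {
    return nowMs > lastBeatMs && nowMs - lastBeatMs > timeoutMs;
}
```

A reclaimer in this scheme would scan the set bits, check `isStale` (and, in the design above, the owner PID) for each, and call `releaseSlot` on the dead ones.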

And yes — sidechaining is the predictable option, but the manual N×N routing is precisely the pain I'm trying to remove; on a 40-track session it's a non-starter, which is the whole reason for going down the shared-state road in the first place. Happy to go deeper on any of it.

In Development: Cross-track de-masking: sharing spectral state between instances to unmask a whole mix