I've been building an effect I've wanted for years, and I figured this is the right crowd to talk architecture with.
The problem: masking in dense mixes is a whole-project phenomenon, but a plugin on a single track only ever sees that one track. Static EQ is blind to the rest of the session; dynamic EQ and sidechain spacing work, but they have to be routed by hand, pair by pair. On a 40-track orchestral session, that just doesn't scale.
The approach I'm taking (Spectral Engine): one instance per track, and the instances share a picture of the whole project. Each instance publishes its short-term spectral envelope into a low-latency shared buffer; every instance then reads the combined picture and carves dynamically, only where and when instruments actually collide.
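To make the publish/read idea concrete, here's a minimal in-process sketch. `SharedPicture`, `others_power` and `collision_bands` are hypothetical names, a plain dict stands in for the real low-latency shared buffer, and envelopes are assumed to be linear power per band (so the rest of the mix is approximated by summing powers, which holds for uncorrelated sources). It only shows the data flow: each instance writes its own slot, reads everyone else's, and flags the bands where both it and the rest of the mix are active.

```python
from typing import Dict, List


class SharedPicture:
    """Hypothetical stand-in for the cross-instance shared buffer.

    Each instance owns one slot holding its short-term spectral envelope
    (linear power per band); every instance can read all slots.
    """

    def __init__(self, num_bands: int) -> None:
        self.num_bands = num_bands
        self.slots: Dict[str, List[float]] = {}

    def publish(self, instance_id: str, envelope: List[float]) -> None:
        if len(envelope) != self.num_bands:
            raise ValueError("envelope has the wrong number of bands")
        self.slots[instance_id] = list(envelope)

    def others_power(self, instance_id: str) -> List[float]:
        """Per-band sum of every *other* instance's power (uncorrelated-source approximation)."""
        total = [0.0] * self.num_bands
        for other_id, env in self.slots.items():
            if other_id == instance_id:
                continue
            for b, p in enumerate(env):
                total[b] += p
        return total


def collision_bands(own: List[float], others: List[float], floor: float) -> List[int]:
    """Bands where this track and the rest of the mix are both above a floor."""
    return [b for b, (p, q) in enumerate(zip(own, others)) if p > floor and q > floor]


# Usage: three instances publish, the "horn" instance looks for collisions.
picture = SharedPicture(num_bands=4)
picture.publish("cello", [1.0, 0.5, 0.0, 0.0])
picture.publish("horn", [0.0, 0.8, 0.6, 0.0])
picture.publish("vox", [0.0, 0.0, 0.9, 0.2])
others = picture.others_power("horn")                      # [1.0, 0.5, 0.9, 0.2]
print(collision_bands(picture.slots["horn"], others, 0.1))  # [1, 2]
```

Carving then only touches the flagged bands, so a track that sits alone in a region is left untouched.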
A few design decisions I've landed on:
- Psychoacoustic, critical-band model rather than raw FFT-bin comparison: masking is perceptual, so the arbitration happens in a Bark-style representation.
- Minimum-phase carving, deliberately. I tested linear-phase/FFT carving, and the pre-ringing and smearing on transient-rich acoustic material weren't worth it. Minimum-phase keeps it clean and keeps latency sane.
- Role-based arbitration: each track gets a role (Lead / Support / Background), so the system knows what should yield to what instead of two tracks ducking each other into a hole.
- Targeting real-time use with many linked instances, so the shared-state path has to stay cheap and lock-light.
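On the critical-band point: one common way to get from FFT bins to a Bark-style representation is to map each bin's centre frequency through the Zwicker & Terhardt approximation and sum power per integer Bark band. This is a sketch of that generic technique, not necessarily how Spectral Engine does it; `hz_to_bark`, `bins_to_bark_bands` and the 24-band layout are assumptions for illustration.

```python
import math
from typing import List


def hz_to_bark(f_hz: float) -> float:
    """Zwicker & Terhardt (1980) analytic approximation of the Bark scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bins_to_bark_bands(power: List[float], sample_rate: float, num_bands: int = 24) -> List[float]:
    """Fold a one-sided FFT power spectrum (length n_fft // 2 + 1) into Bark bands.

    Each bin's power goes to the band floor(bark(f)); anything past the last
    band (near Nyquist at high sample rates) is clamped into the top band.
    """
    n_fft = 2 * (len(power) - 1)
    bands = [0.0] * num_bands
    for k, p in enumerate(power):
        f = k * sample_rate / n_fft
        band = min(int(hz_to_bark(f)), num_bands - 1)
        bands[band] += p
    return bands


# Usage: a flat 1024-point spectrum at 48 kHz. Low bands collect few bins,
# high bands collect many, which is the point of a perceptual scale.
flat = [1.0] * 513
print([round(x) for x in bins_to_bark_bands(flat, 48000.0)])
```

Comparing tracks band by band in this space keeps the arbitration cheap (24 numbers per instance instead of hundreds of bins), which also helps the shared-state budget.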
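For the minimum-phase bullet, the textbook building block is a peaking-EQ biquad with negative gain, using the RBJ Audio EQ Cookbook formulas. For positive Q and gain, its poles and zeros all sit inside the unit circle, so it is minimum-phase, adds no latency and has no pre-ringing. This is a sketch of that standard filter, not a claim about Spectral Engine's internals; `peaking_cut`, `magnitude_db` and `Biquad` are hypothetical names.

```python
import cmath
import math
from typing import List, Tuple


def peaking_cut(f0: float, gain_db: float, q: float, fs: float) -> Tuple[List[float], List[float]]:
    """RBJ Audio EQ Cookbook peaking biquad, normalized so that a[0] == 1."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    b = [1.0 + alpha * a_lin, -2.0 * cos_w0, 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * cos_w0, 1.0 - alpha / a_lin]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def magnitude_db(b: List[float], a: List[float], f: float, fs: float) -> float:
    """Evaluate |H(e^jw)| in dB at frequency f."""
    z_inv = cmath.exp(-1j * 2.0 * math.pi * f / fs)
    h = (b[0] + b[1] * z_inv + b[2] * z_inv ** 2) / (a[0] + a[1] * z_inv + a[2] * z_inv ** 2)
    return 20.0 * math.log10(abs(h))


class Biquad:
    """Transposed direct form II; state persists across process() calls (audio blocks)."""

    def __init__(self, b: List[float], a: List[float]) -> None:
        self.b, self.a = b, a
        self.z1 = 0.0
        self.z2 = 0.0

    def process(self, x: List[float]) -> List[float]:
        b0, b1, b2 = self.b
        _, a1, a2 = self.a
        out = []
        for s in x:
            y = b0 * s + self.z1
            self.z1 = b1 * s - a1 * y + self.z2
            self.z2 = b2 * s - a2 * y
            out.append(y)
        return out


# Usage: a 6 dB cut at 1 kHz, Q = 2, at 48 kHz.
b, a = peaking_cut(1000.0, -6.0, 2.0, 48000.0)
print(round(magnitude_db(b, a, 1000.0, 48000.0), 3))  # -6.0
```

In a dynamic carve, `gain_db` would follow the collision amount per band; recomputing coefficients per block (or smoothing them) avoids zipper noise.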
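The role bullet boils down to a per-band priority rule. Here's one possible policy, purely as a sketch: the names, the 6 dB ceiling and the "equal roles split the cut" rule are my assumptions, not the plugin's actual logic. A track cuts fully when a higher-priority track is active in the same band, cuts half as much when it only collides with an equal role, and never cuts for lower roles. That way two colliding tracks can't both dig the full amount and leave a hole.

```python
from enum import IntEnum
from typing import Dict, List


class Role(IntEnum):
    # Higher value = higher priority (hypothetical ordering).
    BACKGROUND = 0
    SUPPORT = 1
    LEAD = 2


def band_cuts_db(me: str, roles: Dict[str, Role], active: Dict[str, List[bool]],
                 max_cut_db: float = 6.0) -> List[float]:
    """Per-band gain change (dB) for track `me` under a simple role policy."""
    my_role = roles[me]
    cuts = [0.0] * len(active[me])
    for b, mine in enumerate(active[me]):
        if not mine:
            continue  # nothing of ours in this band, nothing to carve
        rivals = [roles[t] for t, bands in active.items() if t != me and bands[b]]
        if any(r > my_role for r in rivals):
            cuts[b] = -max_cut_db          # yield fully to a higher-priority track
        elif any(r == my_role for r in rivals):
            cuts[b] = -max_cut_db / 2.0    # equal roles share the cut
    return cuts


roles = {"vox": Role.LEAD, "gtr": Role.SUPPORT, "keys": Role.SUPPORT, "pad": Role.BACKGROUND}
active = {
    "vox":  [True,  False, False],
    "gtr":  [True,  True,  False],
    "keys": [False, True,  False],
    "pad":  [True,  False, True],
}
print(band_cuts_db("gtr", roles, active))  # [-6.0, -3.0, 0.0]
```

A Lead never yields under this rule, and a Background track is only touched where something above it is actually playing.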
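On keeping the shared-state path lock-light: since each instance is the only writer of its own slot, a seqlock-style sequence counter is one well-known fit. The writer never waits, and readers retry (or keep their last good copy) if they catch a write in progress. The sketch below only illustrates the protocol. The slot layout and `SeqlockSlot` are hypothetical, a `bytearray` stands in for a shared-memory segment, and Python provides none of the memory-ordering guarantees a real implementation needs (in C++ that means `std::atomic` with acquire/release ordering).

```python
import struct
from typing import List, Optional


class SeqlockSlot:
    """Single-writer / many-reader slot using a sequence counter.

    Hypothetical layout: [u64 sequence][num_bands x f32 envelope].
    Odd sequence = write in progress; even = consistent.
    """

    HEADER = struct.Struct("<Q")

    def __init__(self, num_bands: int) -> None:
        self.body = struct.Struct(f"<{num_bands}f")
        self.buf = bytearray(self.HEADER.size + self.body.size)

    def _seq(self) -> int:
        return self.HEADER.unpack_from(self.buf, 0)[0]

    def write(self, envelope: List[float]) -> None:
        seq = self._seq()
        self.HEADER.pack_into(self.buf, 0, seq + 1)   # mark write in progress
        self.body.pack_into(self.buf, self.HEADER.size, *envelope)
        self.HEADER.pack_into(self.buf, 0, seq + 2)   # mark consistent again

    def try_read(self) -> Optional[List[float]]:
        before = self._seq()
        if before % 2:
            return None  # writer is mid-update
        data = list(self.body.unpack_from(self.buf, self.HEADER.size))
        if self._seq() != before:
            return None  # torn read: the writer moved on while we copied
        return data

    def read(self, max_retries: int = 4) -> Optional[List[float]]:
        """Bounded retries so a reader on the audio thread never spins forever."""
        for _ in range(max_retries):
            data = self.try_read()
            if data is not None:
                return data
        return None


slot = SeqlockSlot(num_bands=3)
slot.write([0.5, 0.25, 1.0])
print(slot.read())  # [0.5, 0.25, 1.0]
```

The bounded `read()` matters in a real-time context: if it gives up, the reader just reuses the previous envelope for one block.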
How have you handled instance-to-instance communication in a host-agnostic way? I'm weighing the usual trade-offs around shared memory vs. a broker, dealing with PDC/latency differences between tracks, and instances coming and going mid-session. Curious what's bitten others here.
Status: in development; macOS VST3/AU, currently verified in Cubase, with more DAWs and Windows to follow. There's a short write-up of the concept and a beta sign-up here if you want to follow along: spectralengineaudio.com
Happy to go deeper on any of the above.
