References for how voice / note allocation systems work from first principles?

Post

Does anyone know of a reference for how voice / note allocation systems work from first principles? In Miller Puckette's book there are two tantalizing sections, but they don't go super deep.

http://msp.ucsd.edu/techniques/v0.11/bo ... ode65.html
http://msp.ucsd.edu/techniques/v0.11/bo ... ode66.html

There's a section in Cipriani's Electronic Music and Sound Design Volume 2, but it's more of a discussion of the Max/MSP poly~ object, which is not what I want.

I'm looking specifically for details on how they work, ideas on how to build them from scratch, and the algorithms and data structures under the hood: how note protection and voice stealing work and how you could implement them in code, pseudocode, Max, or Pd. (I will be building something in RNBO, the newest Max/MSP platform, but I don't expect to find anything out there specific to that. Gen is also an alternative.)

I.e., first-note priority, last-note priority, highest-note priority, lowest-note priority, arbitrary note stealing, quietest-note stealing, closest-distance-to-new-note stealing, etc.

Post

For each strategy (you listed quite a few) there are numerous ways to implement it. The strategy needs data for its decisions, so that data needs to be available. The algorithms and required data structures themselves will evolve naturally once you start developing. So methinks this is a case of "just do it".
:shrug:

NB: maybe start by building a voice allocation method for a mono synth, then expand from there. The reverse (reducing a good poly method to mono) seems more difficult imho.

Post

BertKoor wrote: Tue Feb 07, 2023 7:17 am NB: maybe start with building a voice allocation method for a mono synth, then expand from there. The reverse (reducing a good poly method to mono) seems more difficult imho.
I'd treat mono vs. poly allocation as two completely separate problems.

For mono alloc, a FIFO will give you first-note priority and a stack will give you last-note priority. An array of flags representing the currently held keys can give you lowest/highest-note priority depending on which direction you scan from. Don't fall into the "asymptotic complexity" trap here: these structures are small and can all be stored as arrays of integers just fine. Doing a scan-and-compact to clean up dead notes from a last-note-priority stack whenever you see a note-off might sound very scary, but it's not nearly as scary when you consider that the average stack size in practice is usually something like 1-3, and even the absolute worst case (all keys held) is only 128.
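A minimal Python sketch of the mono structures described above, assuming MIDI note numbers 0-127. One press-ordered list serves as both the FIFO (read from the front) and the stack (read from the back), and a 128-entry flag array is scanned for lowest/highest priority. The class name and the mode strings are made up for illustration.

```python
class MonoNotePriority:
    """Mono note-priority tracker built from small arrays.

    `order` keeps held notes in press order: its front gives first-note
    priority (FIFO) and its back gives last-note priority (stack).
    `held` is a 128-entry flag array scanned for lowest/highest priority.
    """

    def __init__(self):
        self.order = []              # held MIDI notes, oldest press first
        self.held = [False] * 128    # held[n] is True while key n is down

    def note_on(self, note):
        if not self.held[note]:
            self.held[note] = True
            self.order.append(note)

    def note_off(self, note):
        if self.held[note]:
            self.held[note] = False
            # Scan and compact: rebuild the list without the released note,
            # preserving the order of the remaining notes.
            self.order = [n for n in self.order if n != note]

    def current(self, mode):
        """Note the mono voice should play, or None if no key is held."""
        if not self.order:
            return None
        if mode == "first":
            return self.order[0]
        if mode == "last":
            return self.order[-1]
        if mode == "lowest":
            return next(n for n in range(128) if self.held[n])
        if mode == "highest":
            return next(n for n in range(127, -1, -1) if self.held[n])
        raise ValueError(f"unknown priority mode: {mode}")
```

After pressing 60, 64, 55 in that order, "first" gives 60, "last" gives 55, "lowest" gives 55 and "highest" gives 64; releasing 55 makes "last" fall back to 64, which is the usual last-note-priority behaviour.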

As for polyphonic alloc, your goal on note-on is usually to find the "best" voice to allocate. The obvious way to do this is to loop over the voices, score each one, and keep it as the current best candidate if its score beats the previous best. You can early-out if a voice is completely unallocated. Your voice allocation scheme then becomes a matter of choosing the heuristics you score on. Probably the simplest heuristic that gives reasonable results (and doesn't care at all what your actual synth looks like) is to measure the time since the last note-on/note-off for each voice and pick the one whose note-off was longest ago, or, if all the keys are held, the one whose note-on was longest ago. You can choose more involved heuristics, though.
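A sketch of the score-and-pick loop with the simple time-based heuristic, assuming each voice stores timestamps from some monotonically increasing counter (samples or control ticks). The `Voice` fields and function names are hypothetical; swapping in a different `score` function is how you would try more involved heuristics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Voice:
    note: Optional[int] = None   # None = completely unallocated
    gate: bool = False           # True while the key is held
    on_time: int = 0             # timestamp of the last note-on
    off_time: int = 0            # timestamp of the last note-off


def score(v: Voice, now: int) -> tuple:
    """Bigger tuple = better steal candidate. Released voices always beat
    held ones; within each group, the older the event, the better."""
    if not v.gate:
        return (1, now - v.off_time)
    return (0, now - v.on_time)


def find_voice(voices: list, now: int) -> int:
    best, best_score = 0, None
    for i, v in enumerate(voices):
        if v.note is None:
            return i                 # early out: a free voice can't be beaten
        s = score(v, now)
        if best_score is None or s > best_score:
            best, best_score = i, s
    return best
```

Returning a tuple from `score` keeps the "released beats held" rule and the age comparison in a single ordering, so the loop itself never changes when the heuristic does.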

Post

I use an integer counter 'voice age'. Once a new voice is triggered/allocated the age-counter is set to 0 and status is set to A (attack). The age-counter is incremented with every tick of the modulation interrupt.

Possible voice statuses of the ADSR envelopes are:
A
D
S
R
OFF (when the release stage has already passed and the voice is no longer playing)

1) First I search all voices for one with status OFF. If there is one, I return its voice number.
2) If all voices are in use, I return the oldest one (=biggest age counter value) with status R.
3) If there is none with status R, I return the oldest one with status S.
4) If there is none with status S, I return the oldest one with status D.
5) If there is none with status D, I return the oldest one.
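The five steps above translate almost directly into code. A sketch, assuming the stage and age of every voice are readable as plain arrays; the `Stage` enum and `pick_voice` name are made up for illustration.

```python
from enum import Enum


class Stage(Enum):
    A = "attack"
    D = "decay"
    S = "sustain"
    R = "release"
    OFF = "off"


# After OFF, steal from these stages in this order; A is the last resort.
STEAL_ORDER = (Stage.R, Stage.S, Stage.D)


def pick_voice(stages, ages):
    """stages[i]: envelope stage of voice i.
    ages[i]: voice age counter (reset to 0 on trigger, +1 per modulation tick)."""
    # 1) A voice that is completely off is free to use.
    for i, st in enumerate(stages):
        if st is Stage.OFF:
            return i
    # 2)-4) Oldest voice in R, then S, then D.
    for wanted in STEAL_ORDER:
        candidates = [i for i, st in enumerate(stages) if st is wanted]
        if candidates:
            return max(candidates, key=lambda i: ages[i])
    # 5) Everything is in attack: take the oldest overall.
    return max(range(len(stages)), key=lambda i: ages[i])
```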

For cases 2-5 you might need some additional 'click removal' code. A common practice in trackers back in the day was to add a ramp.
You can also quickly fade out voices that need to be killed.
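One way to do the quick fade-out, sketched under the assumption that the stolen voice ramps its own gain linearly to zero over a short, fixed number of samples and only then switches to the new note. The class name and the default `fade_len` of 64 samples (about 1.3 ms at 48 kHz) are arbitrary choices for illustration.

```python
class FadingVoice:
    """Hypothetical voice wrapper: when stolen, it ramps its own output
    linearly to zero over `fade_len` samples, then switches to the new note."""

    def __init__(self, fade_len=64):
        self.fade_len = fade_len
        self.note = None      # note currently sounding
        self.pending = None   # note waiting for the fade-out to finish
        self.fade_pos = 0     # samples of fade remaining; 0 = not fading

    def start(self, new_note):
        if self.note is None:
            self.note = new_note         # free voice: no fade needed
        elif self.fade_pos > 0:
            self.pending = new_note      # already fading: just retarget
        else:
            self.pending = new_note      # stolen voice: begin the fade-out
            self.fade_pos = self.fade_len

    def next_gain(self):
        """Gain to apply to this voice's output for the next sample."""
        if self.fade_pos == 0:
            return 1.0
        self.fade_pos -= 1
        gain = self.fade_pos / self.fade_len   # reaches 0.0 on the last fade sample
        if self.fade_pos == 0:
            # A real synth would retrigger oscillators/envelopes here.
            self.note, self.pending = self.pending, None
        return gain
```

The trade-off of this approach is that the new note starts `fade_len` samples late; the alternative is to keep a spare "dying voice" that carries the fade while the new note starts immediately.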

Post

Markus Krause wrote: Tue Feb 07, 2023 1:46 pm I use an integer counter 'voice age'. Once a new voice is triggered/allocated the age-counter is set to 0 and status is set to A (attack). The age-counter is incremented with every tick of the modulation interrupt.
I use basically the same strategy, except rather than tracking ADSR stages I just track states as [KEY,PEDAL,RELEASED,OFF] where "PEDAL" means that a note-off was received, but the hold pedal is keeping the note from transitioning into release. I suppose this is somewhat of a subjective preference, but personally I find that it feels more "predictable" playing live if pedal-held voices are stolen in preference to those held explicitly with a key (e.g., the obvious case would be holding a bass drone and then exceeding the voice count while playing chords higher up with the pedal held).
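A sketch of the [KEY, PEDAL, RELEASED, OFF] scheme, assuming voices are stolen in state order OFF, RELEASED, PEDAL, KEY and, within one state, by the oldest state change (that tie-break is an assumption borrowed from the time-based heuristic, not something stated in the post). The class and method names, including the `voice_finished` callback from the synth, are hypothetical.

```python
from enum import IntEnum


class State(IntEnum):
    # Lower value = stolen first.
    OFF = 0
    RELEASED = 1
    PEDAL = 2
    KEY = 3


class PedalAllocator:
    def __init__(self, num_voices):
        self.state = [State.OFF] * num_voices
        self.note = [None] * num_voices
        self.stamp = [0] * num_voices   # clock value of the last state change
        self.clock = 0
        self.pedal_down = False

    def _set(self, i, st):
        self.clock += 1
        self.state[i] = st
        self.stamp[i] = self.clock

    def note_on(self, note):
        # Lowest state wins; within a state, the oldest state change wins.
        i = min(range(len(self.state)), key=lambda k: (self.state[k], self.stamp[k]))
        self.note[i] = note
        self._set(i, State.KEY)
        return i

    def note_off(self, note):
        for i, n in enumerate(self.note):
            if n == note and self.state[i] == State.KEY:
                self._set(i, State.PEDAL if self.pedal_down else State.RELEASED)

    def pedal(self, down):
        self.pedal_down = down
        if not down:
            for i, st in enumerate(self.state):
                if st == State.PEDAL:
                    self._set(i, State.RELEASED)

    def voice_finished(self, i):
        # Called by the synth when voice i's release has fully died away.
        self._set(i, State.OFF)
```

With three voices, a held bass note on voice 0 and two pedal-held chord notes on voices 1 and 2, a new note steals voice 1 (the older pedal-held voice) and leaves the drone alone, which is the live-playing case described above.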

Post

Thank you, I'll definitely read through these.

Post

mystran wrote: Tue Feb 07, 2023 2:09 pm
Markus Krause wrote: Tue Feb 07, 2023 1:46 pm I use an integer counter 'voice age'. Once a new voice is triggered/allocated the age-counter is set to 0 and status is set to A (attack). The age-counter is incremented with every tick of the modulation interrupt.
I use basically the same strategy, except rather than tracking ADSR stages I just track states as [KEY,PEDAL,RELEASED,OFF] where "PEDAL" means that a note-off was received, but hold pedal is keeping the note from transitioning into release. I suppose this is somewhat of a subjective preference, but I find personally that it feels more "predictable" playing live if pedal-held voices are stolen with preference over those held explicitly with a key (eg. the obvious case would be holding a bass drone and then exceeding voice count with pedal held playing chords higher up).
This strategy seems less dependent on the envelope generator than the other method suggested. In general, it seems that EGs tend to have an output that signals when all the stages have completed, which is definitely the case in Max.

Post

For the sake of documenting my path, this SOS article is really informative regarding note priorities on a mono synth:

https://www.soundonsound.com/techniques ... s-triggers

Post

mystran wrote: Tue Feb 07, 2023 7:56 am
BertKoor wrote: Tue Feb 07, 2023 7:17 am NB: maybe start with building a voice allocation method for a mono synth, then expand from there. The reverse (reducing a good poly method to mono) seems more difficult imho.
I'd treat mono vs. poly allocation as two completely separate problems.
If I wanted it to be a switchable system to go from mono to duo to poly, would you set up three different trees and switch between them or would you build one tree and put all the options into it?

Post

Markus Krause wrote: Tue Feb 07, 2023 1:46 pm I use an integer counter 'voice age'. Once a new voice is triggered/allocated the age-counter is set to 0 and status is set to A (attack). The age-counter is incremented with every tick of the modulation interrupt.
I mentioned in the other comment that this seems to strongly link the voice allocation to the envelope generator. I'm curious whether it comes down to one of those things where ideally you'd keep them separate, but inevitably I'll end up having to build my own ADSR as well to get the functionality I need to make this work?

Post

You can simplify the logic by looking just into the NoteOn/Off states:

Voice NoteOn (or Gate High)
Voice NoteOff (or Gate Low)
Voice idle

So you'd

1) first search for any voice that is idle
2) if none is idle, pick the one whose NoteOff was issued the longest time ago
3) if none has had a NoteOff issued, pick the one whose NoteOn was issued the longest time ago

This way you can forgo knowledge of envelope states.

Post

You may want flags to track highest_note and lowest_note, and make them exempt from note stealing, since these are typically the most important notes, and the notes we're most sensitive to. You can kill inner voices with much less consequence than outer voices.

You can take this even further:
(While still keeping the NoteOn > NoteOff > idle priority) rather than simply killing the note with the oldest NoteOff or NoteOn time, create a "proximity_score" for each active note by adding the distances to the closest active notes above and below, and kill the note with the lowest score (still keeping the highest and lowest notes exempt). If there is a tie for the lowest proximity score, kill the one closest to the median pitch. And if two notes are equally close to the median, kill the higher of the two (to best maintain chord weight).

This method contextualizes the notes and prioritizes them for weight and density. Also, it doesn't require tracking the age of any notes.
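A sketch of the proximity scheme, assuming it is run over the notes in whichever state is being stolen from (e.g. only NoteOn voices when nothing is idle or released) and that notes are distinct MIDI numbers. The function name is hypothetical, and mapping the returned note back to its voice is left out.

```python
from statistics import median


def pick_by_proximity(notes):
    """notes: distinct MIDI note numbers of the active notes.
    Returns the note to steal, or None when only the (exempt) outer
    notes exist."""
    s = sorted(notes)
    if len(s) < 3:
        return None
    mid = median(s)
    candidates = []
    for k in range(1, len(s) - 1):             # skip lowest and highest
        proximity = (s[k] - s[k - 1]) + (s[k + 1] - s[k])
        # Sort keys: lowest proximity, then closest to median, then higher pitch.
        candidates.append((proximity, abs(s[k] - mid), -s[k], s[k]))
    return min(candidates)[3]
```

For a chord of 36, 60, 64, 67, 72 the inner scores are 28, 7 and 8, so 64 (the most crowded inner note) is stolen while the bass and top notes survive.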

Another method for minimizing intrusiveness would be to simply steal from the note with the lowest volume level. That wouldn't be so good with a slow attack, but you could resolve it with Markus's envelope tracking method. (You would only need to track the attack stage and give those notes priority.)
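A sketch of quietest-note stealing with attack-stage protection, assuming each voice can report its current output level and whether it is still in its attack stage. Falling back to the quietest voice overall when every voice is attacking is an assumption not covered in the post; the function name is hypothetical.

```python
def pick_quietest(levels, in_attack):
    """levels[i]: current output level of voice i (e.g. envelope * velocity).
    in_attack[i]: True while voice i is still in its attack stage.
    Returns the quietest voice that is not attacking; if every voice is
    attacking, falls back to the quietest voice overall."""
    everyone = list(range(len(levels)))
    pool = [i for i in everyone if not in_attack[i]] or everyone
    return min(pool, key=lambda i: levels[i])
```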

Post

jamcat wrote: Tue Feb 07, 2023 10:06 pm Another method for minimizing intrusiveness would be to simply steal from the note with the lowest volume level. That wouldn't be so good with a slow attack, but you could resolve it with Markus's envelope tracking method. (You would only need to track the attack stage and give those notes priority.)
Stealing the oldest voice that is in release/decay also, in practice, results in stealing the one with the lowest volume, since those voices are ramping downwards (if you use ADSRs).
It is also a method that sounds predictable to the listener, which is a point that should be considered as well.

I have a question:
How do you guys handle 'click removal' when there are no free voices?

Post

Markus Krause wrote: Wed Feb 08, 2023 6:06 am Stealing the oldest one that is in release/decay in practice also results in stealing the one with the lowest volume as they ramp downwards (if you use ADSRs).
Not necessarily, if the voice is very dynamic and the performance has a broad range of velocities. Also, you still need to track the age of every note until the end if you are killing the oldest.
