KVR Audio

Post by **JustinJ** » Sat Sep 24, 2022 11:22 pm

2DaT wrote: ↑Sat Sep 24, 2022 8:33 pm One solution would be to process extra "fake" elements. Best choice would be to copy a valid state from the voice that is being processed. That way you avoid convergence problems and in case of gathers, it will access "hot" memory.

Ok, this makes sense. I was figuring I'd have to keep the scalar functions around to process remainders or I'd have to mask out the unused lanes somehow.

I've just tried some quick tests with this on my scalar vs SSE wavetable testbed. The scalar version still beats the SSE version when there's just one remainder, but anything after that is a win.

Currently, I've got an array of active voice indexes that I'm using to gather the data for processing. All I have to do is make sure the used parts of this array are a multiple of four and then any remainder voices point to, say, the last active voice. It's then possible to always just process four at a time without any special logic.
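A minimal sketch of the padding idea described above, assuming the active voices are kept as a `std::vector<int>` of indices. The helper name `padActiveVoices` and the `lanes` parameter are hypothetical and not taken from the poster's code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: pads a list of active voice indices so that its length
// becomes a multiple of `lanes` (4 for SSE) by repeating the last active voice.
// Returns the number of real (non-padding) voices.
inline std::size_t padActiveVoices(std::vector<int>& active, std::size_t lanes = 4)
{
    const std::size_t realCount = active.size();
    if (realCount == 0 || lanes == 0)
        return realCount;                        // nothing to pad

    const std::size_t remainder = realCount % lanes;
    if (remainder != 0) {
        const int last = active.back();          // copy before the vector grows
        active.insert(active.end(), lanes - remainder, last);
    }
    return realCount;
}
```

With five active voices `{7, 2, 9, 4, 5}` the array becomes `{7, 2, 9, 4, 5, 5, 5, 5}`, so the processing loop can always step four indices at a time. The returned count matters because the padded lanes recompute a voice that is already being processed, so their output should be left out of the mix.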

Well, except each of my voices can have additional phasors for super-saw style processing and super-saw parameters can be modulated, i.e. change per voice. That'll take some figuring out.

Thanks for the pointer.

Post by **Richard_Synapse** » Mon Sep 26, 2022 9:01 am

mystran wrote: ↑Wed Sep 14, 2022 9:05 pm I haven't personally noticed anything too significant running my code on M1, even though my code isn't really aware of efficiency cores. I think I might be creating threads for those too, though perhaps the macOS scheduler is intelligent enough not to schedule CPU-heavy real-time threads on those(?), so I can't really share anything too useful about that... but yeah, the efficiency cores might be challenging to use for DSP purposes.

Just creating threads does not work properly on M1 unfortunately, whether real-time or not. The proper way seems to be to get the workgroup from the main RT thread or in some other way, then in the auxiliary threads, join() then leave() that workgroup. Note that this workgroup feature was added in macOS 11, so backwards compatibility is something to watch out for as well.
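The join/leave pattern mentioned above could be wrapped roughly as follows. This is a sketch under the assumption that join() and leave() refer to `os_workgroup_join` and `os_workgroup_leave` from `<os/workgroup.h>`. The class name `WorkgroupScope` and the `WorkgroupHandle` alias are hypothetical, and how the workgroup is obtained from the host or audio device is outside the sketch.

```cpp
#if defined(__APPLE__)
  #include <os/workgroup.h>
  using WorkgroupHandle = os_workgroup_t;
#else
  using WorkgroupHandle = void*;   // placeholder so the sketch compiles elsewhere
#endif

// Hypothetical RAII helper: joins the calling thread to a workgroup on
// construction and leaves it on destruction. It has to be created on the
// auxiliary thread itself, because joining affects the calling thread.
class WorkgroupScope
{
public:
    explicit WorkgroupScope(WorkgroupHandle wg) : workgroup_(wg)
    {
#if defined(__APPLE__)
        if (workgroup_ != nullptr) {
            // Workgroups need macOS 11 or later; older systems skip the join.
            if (__builtin_available(macOS 11.0, iOS 14.0, *)) {
                joined_ = (os_workgroup_join(workgroup_, &token_) == 0);
            }
        }
#endif
    }

    ~WorkgroupScope()
    {
#if defined(__APPLE__)
        if (joined_) {
            if (__builtin_available(macOS 11.0, iOS 14.0, *)) {
                os_workgroup_leave(workgroup_, &token_);
            }
        }
#endif
    }

    WorkgroupScope(const WorkgroupScope&) = delete;
    WorkgroupScope& operator=(const WorkgroupScope&) = delete;

    bool joined() const { return joined_; }
    WorkgroupHandle handle() const { return workgroup_; }

private:
    WorkgroupHandle workgroup_;
    bool joined_ = false;
#if defined(__APPLE__)
    os_workgroup_join_token_s token_{};
#endif
};
```

A worker thread would construct a `WorkgroupScope` around its processing and check `joined()`. On systems older than macOS 11, or when no workgroup is available, the scope does nothing and the thread runs as before.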

Richard

Post by **mystran** » Mon Sep 26, 2022 9:15 am

Richard_Synapse wrote: ↑Mon Sep 26, 2022 9:01 am
mystran wrote: ↑Wed Sep 14, 2022 9:05 pm I haven't personally noticed anything too significant running my code on M1, even though my code isn't really aware of efficiency cores. I think I might be creating threads for those too, though perhaps the macOS scheduler is intelligent enough not to schedule CPU-heavy real-time threads on those(?), so I can't really share anything too useful about that... but yeah, the efficiency cores might be challenging to use for DSP purposes.
Just creating threads does not work properly on M1 unfortunately, whether real-time or not. The proper way seems to be to get the workgroup from the main RT thread or in some other way, then in the auxiliary threads, join() then leave() that workgroup. Note that this workgroup feature was added in macOS 11, so backwards compatibility is something to watch out for as well.

I've been just using pthreads for creation and Mach thread policy to bump them to realtime, but I admit I haven't really stress-tested any of it excessively. I didn't do anything with the code when I recompiled for M1.
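For readers unfamiliar with the approach, a rough sketch of promoting a pthread to real-time scheduling through the Mach thread policy API might look like this. It assumes the policy meant is `THREAD_TIME_CONSTRAINT_POLICY`. The helper names and the choice of half a buffer as the computation budget are assumptions for illustration, not the poster's actual settings.

```cpp
#include <cstdint>

#if defined(__APPLE__)
  #include <mach/mach.h>
  #include <mach/mach_time.h>
  #include <mach/thread_policy.h>
  #include <pthread.h>
#endif

// Duration of one audio buffer in nanoseconds (0 for an invalid sample rate).
inline std::uint64_t bufferNanos(std::uint32_t frames, double sampleRate)
{
    if (sampleRate <= 0.0)
        return 0;
    return static_cast<std::uint64_t>(1.0e9 * frames / sampleRate);
}

// Hypothetical helper: requests time-constraint scheduling for the calling
// thread. Returns false on failure and on non-Apple platforms.
inline bool makeCurrentThreadRealtime(std::uint32_t frames, double sampleRate)
{
#if defined(__APPLE__)
    mach_timebase_info_data_t tb;
    if (mach_timebase_info(&tb) != KERN_SUCCESS || tb.numer == 0)
        return false;

    // Convert nanoseconds to Mach absolute-time ticks: ticks = ns * denom / numer.
    const std::uint64_t ticks = bufferNanos(frames, sampleRate) * tb.denom / tb.numer;
    if (ticks == 0)
        return false;

    thread_time_constraint_policy_data_t policy;
    policy.period      = static_cast<std::uint32_t>(ticks);      // one buffer
    policy.computation = static_cast<std::uint32_t>(ticks / 2);  // assumed budget
    policy.constraint  = static_cast<std::uint32_t>(ticks);      // deadline
    policy.preemptible = 1;

    return thread_policy_set(pthread_mach_thread_np(pthread_self()),
                             THREAD_TIME_CONSTRAINT_POLICY,
                             reinterpret_cast<thread_policy_t>(&policy),
                             THREAD_TIME_CONSTRAINT_POLICY_COUNT) == KERN_SUCCESS;
#else
    (void)frames;
    (void)sampleRate;
    return false;
#endif
}
```

Each worker would call `makeCurrentThreadRealtime(bufferSize, sampleRate)` once after `pthread_create`, from inside the new thread.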

Post by **Richard_Synapse** » Mon Sep 26, 2022 9:22 am

mystran wrote: ↑Mon Sep 26, 2022 9:15 am I've been just using pthreads for creation and Mach thread policy to bump them to realtime, but I admit I haven't really stress-tested any of it excessively.

Do you have the "old" M1 or one of the new machines? It seems the new ones are specifically problematic.

Btw the problem seems to be unrelated to the new efficiency cores (if I measured that correctly), rather it seems to be related to scheduling somehow.

Richard

Post by **mystran** » Mon Sep 26, 2022 9:30 am

Richard_Synapse wrote: ↑Mon Sep 26, 2022 9:22 am
mystran wrote: ↑Mon Sep 26, 2022 9:15 am I've been just using pthreads for creation and Mach thread policy to bump them to realtime, but I admit I haven't really stress-tested any of it excessively.
Do you have the "old" M1 or one of the new machines? It seems the new ones are specifically problematic.

M1 Air 2020 model (bought recently though, so no idea about revisions), running macOS 12.4.

Modern synth architecture SIMD + Multithreading