Fathom Synth Development Thread

VST, AU, AAX, CLAP, etc. Plugin Virtual Instruments Discussion

Post

Actually I'll be doing GPU CUDA first, because vector SIMD only gives you an x4 gain for most Intel processors and x8 max for some newer ones.

GPU CUDA however gives you an x32, x64 or x128 gain using NVIDIA graphics cards, and similar gains for AMD cards using CUDA equivalents such as OpenCL.

So the gain from offloading the CPU to the GPU is massive, much more than SIMD.

I will probably do SIMD as well but after GPU.

Post

FathomSynth wrote: Wed Feb 20, 2019 5:24 pm Actually I'll be doing GPU CUDA first, because vector SIMD only gives you an x4 gain for most Intel processors and x8 max for some newer ones.

GPU CUDA however gives you an x32, x64 or x128 gain using NVIDIA graphics cards, and similar gains for AMD cards using CUDA equivalents such as OpenCL.

So the gain from offloading the CPU to the GPU is massive, much more than SIMD.

I will probably do SIMD as well but after GPU.
Thanks for the info! :tu:

Adding a GPU (and a new PSU to handle it) should be cheaper than upgrading my CPU, mobo, and memory to arrive at the current Intel vector support level. :D

But I have tried to avoid discrete GPUs in a DAW machine because (1) they're unnecessary if the integrated GPU can get the job done, (2) they're an extra expense in the DAW budget and I'd rather have an extra SSD, (3) GPUs and their drivers can be an additional point of failure to troubleshoot, (4) they add power draw and heat output, and (5) they add fan noise if not the silent type.
Windows 10 and too many plugins

Post

I really want to do both for those reasons. But most people have an NVIDIA graphics card already, and if they don't I'll still be doing the equivalent for any card. Graphics cards with GPU compute have been around for ages, so the chances of anyone having a card so old that the processing can't be sent to it are about as high as someone having a 32-bit system.

I wasn't sure if I wanted to get this deep into it, but the real reason is the instruction set. SIMD is not really threading, it's just a wider register, so you can't increase the processing power unless you have operations where all indexes into buffers are identical across the four lanes. I tried to convert my code to SIMD before, but I discovered the hard way that it can't be done (at least I couldn't do it), because when you build waveforms from buffers of sine waves the index for each lane is different, which makes SIMD useless. I know there are synths such as Sylenth which use SIMD, but I have no idea how they did it.
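
For anyone curious what the indexing problem looks like, here is a minimal scalar sketch (hypothetical names, not Fathom's actual code): every voice reads the shared sine table at its own phase index, so four voices need four unrelated loads per sample, which a plain 4-wide SIMD load cannot express. Newer AVX2 CPUs do offer gather instructions (e.g. `_mm256_i32gather_ps`) for exactly this kind of indexed load, which may be how SIMD synths handle it.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical sketch: one shared sine table, four voices with
// independent phase increments (a different pitch per voice).
constexpr double kPi = 3.14159265358979323846;
constexpr std::size_t kTableSize = 2048;

float g_sineTable[kTableSize];

void initTable() {
    for (std::size_t i = 0; i < kTableSize; ++i)
        g_sineTable[i] = static_cast<float>(std::sin(2.0 * kPi * i / kTableSize));
}

// Render one block. The table index depends on each voice's own
// phase, so the four loads per sample hit unrelated table slots --
// this is the divergent indexed access that a single aligned SIMD
// load cannot do (AVX2 gather instructions can).
void renderBlock(float* out, std::size_t n, double* phase, const double* inc) {
    for (std::size_t s = 0; s < n; ++s) {
        float mix = 0.0f;
        for (int v = 0; v < 4; ++v) {
            std::size_t idx = static_cast<std::size_t>(phase[v]) % kTableSize;
            mix += g_sineTable[idx];   // index differs per voice
            phase[v] += inc[v];
        }
        out[s] = mix * 0.25f;          // simple voice averaging
    }
}
```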

If any of you know a developer who got SIMD working for a synth engine and they are willing to give away the secret then I'll be happy to add the SIMD first. Otherwise I'll be doing the GPU method first.

With GPU processing you not only have a much greater multiplier, but you have a wider instruction set, which means (I think) that you can do indexed memory access, which is what is needed to build the waveform buffers from the sine buffers.
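
As a rough illustration of the work being offloaded (a sketch under my own assumptions, not Fathom's actual engine): additive resynthesis sums a shared sine table, indexed per partial, weighted by a partial-amplitude array. On a GPU each output sample (or each partial) maps naturally onto one thread, and the indexed table reads are ordinary memory accesses there. The loop below is roughly what a single GPU thread would compute for one sample:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical additive-resynthesis kernel body: output sample s is
// a sum over partials, each reading the shared sine table at its own
// index. Partial p+1 runs at (p+1) times the base frequency.
float renderSample(const std::vector<float>& sineTable,
                   const std::vector<float>& partialAmps,
                   double baseIncrement,   // phase step of partial 1, in table slots
                   std::size_t sampleIndex) {
    const std::size_t tableSize = sineTable.size();
    float sum = 0.0f;
    for (std::size_t p = 0; p < partialAmps.size(); ++p) {
        double phase = baseIncrement * (p + 1) * sampleIndex;
        std::size_t idx = static_cast<std::size_t>(phase) % tableSize;
        sum += partialAmps[p] * sineTable[idx];   // indexed read per partial
    }
    return sum;
}
```

On a GPU, thousands of these per-sample sums can run concurrently, which is where the claimed multiplier comes from.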

The x4 advantage of SIMD is useful, but it's not so much that it will change the basic power of the synth; it will just increase the polyphony. However, the multiplier is so huge with GPU, like x128 or more, that it actually changes what can be done, such as real-time partial noise and real spectral morphing. Plus I don't know of any synths on the market which do it yet, so if Fathom has GPU support it will literally be the fastest synth on the market by a wide margin.

Post

Hi Everett,

Looking forward to the upcoming, erm, "cover"age? Now I'm not all that used to modular synths, so I'm just as likely to plug things together that will sound awful, or at least create feedback. Does Fathom have a way to set a maximum limit on volume and/or pitch, so that I don't cause mega clipping or noise or crazy high pitches accidentally?

Post

Fathom has both connection feedback safeguards and amplitude safeguards.

Fathom will not allow you to connect components in any way that would create a feedback loop; if you try, the connection will simply snap off.

Volume limits are implemented in the code at every possible level.

All filters have input and output checks which catch bad float values and out of range amplitudes.
All signal flow objects have a surge protector on the output which catches any amplitude out of range.
The instrument as a whole has a surge protector on the master output to the host.
All digital delays check for and prevent any potential feedback loops.
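
A surge protector of this kind can be sketched in a few lines (an illustrative guess at the idea, not Fathom's actual code): reject bad float values entirely, and clamp anything outside the legal amplitude range.

```cpp
#include <cmath>

// Hypothetical per-sample "surge protector": NaN or infinite values
// become silence, and any amplitude outside [-1, 1] is hard-clamped
// back into range before reaching the next stage or the host.
inline float surgeProtect(float x) {
    if (!std::isfinite(x)) return 0.0f;   // NaN or +/-inf -> silence
    if (x > 1.0f)  return 1.0f;
    if (x < -1.0f) return -1.0f;
    return x;
}
```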

It's possible to produce some harsh sounds if you do it intentionally using either the distortion page or FM, but it is highly unlikely that you would do it accidentally. All dials in all objects are set with initial default values which should be pleasing to sensitive ears, mine included.

Post

But does it go to "11"???


https://youtu.be/4xgx4k83zzc
Windows 10 and too many plugins

Post

dial 11.png

Post

Preview of the new skin? :D
Windows 10 and too many plugins

Post

I love the GPU idea! I have a GTX 1080 in my PC that is idling when I am not gaming. It would be awesome to use it in music making. How much extra performance can I expect from it in Fathom?


FathomSynth wrote: Wed Feb 20, 2019 6:07 pm I really want to do both for those reasons. But most people have an NVIDIA graphics card already, and if they don't I'll still be doing the equivalent for any card. Graphics cards with GPU compute have been around for ages, so the chances of anyone having a card so old that the processing can't be sent to it are about as high as someone having a 32-bit system.

I wasn't sure if I wanted to get this deep into it, but the real reason is the instruction set. SIMD is not really threading, it's just a wider register, so you can't increase the processing power unless you have operations where all indexes into buffers are identical across the four lanes. I tried to convert my code to SIMD before, but I discovered the hard way that it can't be done (at least I couldn't do it), because when you build waveforms from buffers of sine waves the index for each lane is different, which makes SIMD useless. I know there are synths such as Sylenth which use SIMD, but I have no idea how they did it.

If any of you know a developer who got SIMD working for a synth engine and they are willing to give away the secret then I'll be happy to add the SIMD first. Otherwise I'll be doing the GPU method first.

With GPU processing you not only have a much greater multiplier, but you have a wider instruction set, which means (I think) that you can do indexed memory access, which is what is needed to build the waveform buffers from the sine buffers.

The x4 advantage of SIMD is useful, but it's not so much that it will change the basic power of the synth; it will just increase the polyphony. However, the multiplier is so huge with GPU, like x128 or more, that it actually changes what can be done, such as real-time partial noise and real spectral morphing. Plus I don't know of any synths on the market which do it yet, so if Fathom has GPU support it will literally be the fastest synth on the market by a wide margin.

Post

nvm

Post

FathomSynth wrote: Wed Feb 20, 2019 6:07 pm The x4 advantage of SIMD is useful, but it's not so much that it will change the basic power of the synth; it will just increase the polyphony. However, the multiplier is so huge with GPU, like x128 or more, that it actually changes what can be done, such as real-time partial noise and real spectral morphing. Plus I don't know of any synths on the market which do it yet, so if Fathom has GPU support it will literally be the fastest synth on the market by a wide margin.
I don't want to be a party pooper but I think the reason that you don't see many audio plugins using the GPU for processing is that doing so would introduce further (uncertain) latencies. If you want to process audio on the GPU the audio data has to be transferred to GPU memory which takes time. Once it has been processed on the GPU it has to be transferred back to memory that's accessible by the CPU so that it can be finally copied into the output buffers.

Some other interesting arguments against this approach can be found in this comment:
The effort involved in using a GPU - particularly getting data in and out - is likely to far exceed any benefit you get. Furthermore, the capabilities of inexpensive personal computers - and also tablets and mobile devices - are more than enough for many digital audio applications. AMD seem to have a solution looking for a problem. For sure, the existing music and digital audio software industry is not about to start producing software that only targets a limited sub-set of hardware.
I'm still keeping my fingers crossed that SOUL will deliver what it promises:
https://www.youtube.com/watch?v=-GhleKNaPdk
Good luck with your endeavors!
Passed 303 posts. Next stop: 808.

Post

Good question about the GPU latency and I knew eventually a technically advanced user would notice this.

In my own humble opinion the latency problem between CPU and GPU is a good thing for Fathom, because it will, as you have pointed out, deter most plugin developers from attempting it. Which is great for Fathom, because if my plan works it will make Fathom the only synth which uses it, and therefore easily the fastest synth on the planet.

To understand how I will solve this problem you need to know a little about my background. I started out in embedded firmware not application software. Most of the work I did early in my career was for communication interfaces and most of the products I worked on had extreme throughput requirements using relatively slow processors which required heavy use of threads and queues.

For instance, at one company I worked for there was a terrible communication problem where the product was dropping packets because the processor could not keep up with the com interface. Three or four senior-level application developers tried to solve it and gave up, recommending that the product be redesigned with a faster CPU. When I took my turn on the problem it took me less than a day to add a simple message queue to the input, and the problem was solved. The reason it worked is that the CPU was too slow only for very limited time periods but more than fast enough on average. The CPU-to-GPU interface has very similar properties.

The CPU-to-GPU latency problem is (I think) solvable in the case of audio, for the following reasons.

1. The requirements for throughput and bandwidth are extreme, but the requirements for synchronization are not. This means that if the communication interface used to transfer blocks between the CPU and GPU can itself be queued, and can use multiple parallel buffers, then it could work. For instance, if only two buffers are used, then one buffer can be processed while the other is swapping data.

2. It's possible to pin memory shared between the CPU and GPU, which speeds up the transfer by a massive amount.

3. Common sense. Gamers have notoriously fast reflexes and extreme requirements for graphics speed and tactile response. The fact that the GPU can keep up with CPU data during the gaming experience means that common sense would dictate that audio requirements, which are on the same order of magnitude, should at least in theory be manageable. One might say that the bandwidth requirements are worse for audio, but this is not the case, since the gamer's world, defined by massive 3D lattices, must also be transferred to the GPU in real time; it is therefore reasonable to assume that the bandwidth for audio, being only one-dimensional on the time axis, should if anything be far less.

4. Knowledge of exactly what must be done. The whole point of offloading work from the CPU to the GPU in the first place is basically to render the waveforms from a massive array of partial amplitudes, and the whole reason for this is modulation. The final goal is smooth audio. Let's take a worst-case scenario: assume we are using multiple transfer buffers and that the total latency is an extreme 10 milliseconds. If certain modulation points, such as the hard edge of an envelope, were delayed by 10 milliseconds, that is less than the length of a 128th note at 120 BPM (about 15.6 ms). Not exactly the end of the world.

5. The whole problem assumes that nothing can be pre-computed, which is not the case. The plugin has a priori knowledge of what waveforms are needed by the modulation data. This means the GPU can be used ahead of time when possible and is not necessarily limited to real time while the host transport is playing. Furthermore, only the partial arrays themselves need to be transferred in real time; the sine tables only need to be transferred once.

6. The problem is a time-shift problem, not a cumulative bandwidth problem. The limit imposed by the latency is constant, not cumulative, because there is no limit to the size of the buffers between the CPU and GPU. Keep in mind that the GPU is a massively parallel processor, and this applies to the transfer buffers themselves as well.
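
Point 1 above, processing one buffer while the other is in flight, is classic double buffering. A minimal CPU-side sketch of the idea (hypothetical class, no actual GPU API calls):

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical double-buffer ("ping-pong") scheme: while one buffer
// is owned by the processing stage (the GPU, in the real design),
// the other is being filled with the next block; the roles swap at
// each block boundary. Transfer latency is hidden as long as each
// stage finishes within one block period.
class PingPong {
public:
    explicit PingPong(std::size_t blockSize)
        : bufs_{{std::vector<float>(blockSize), std::vector<float>(blockSize)}} {}

    // Buffer currently being filled by the CPU.
    std::vector<float>& fillBuffer()    { return bufs_[active_]; }
    // Buffer currently owned by the processing stage.
    std::vector<float>& processBuffer() { return bufs_[1 - active_]; }

    // Swap roles at the block boundary.
    void swap() { active_ = 1 - active_; }

private:
    std::array<std::vector<float>, 2> bufs_;
    int active_ = 0;
};
```

More buffers deepen the queue and tolerate larger latency spikes, at the cost of a larger constant delay, which matches the "time shift, not cumulative bandwidth" argument above.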

Also, it's interesting to note that vector SIMD on the CPU side may not be useful for computing waveforms from partials, due to the indexing problems, but SIMD is perfectly suited to filling the GPU buffers on the CPU side. So in this case CUDA and SIMD could actually be used in combination.
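
The reason SIMD suits this step is that filling the transfer buffers is a same-index operation: lane i of every vector reads and writes offset i, with no divergent table lookups. A loop in that shape (plain C++ here for illustration; compilers typically auto-vectorize it, and it maps directly onto `_mm_mul_ps`-style intrinsics):

```cpp
#include <cstddef>

// Same-index buffer fill: out[i] depends only on in[i], so the loop
// vectorizes cleanly -- unlike the wavetable-lookup case, no lane
// needs its own index. This is the kind of work that could prepare
// the CPU->GPU transfer buffers.
void fillScaled(float* out, const float* in, float gain, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * gain;   // identical operation in every lane
}
```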

Post

I can't be the only one who has no idea what you're really talking about anymore, but I've got a big ol' GPU and a lamentably old i7. If you can lighten the CPU load, I know I'll be able to get a lot more out of this powerful synth. Thanks Everett!

Post

Even if using the GPU comes with a "latency tax", it would be awesome if it could increase the number of instances and voices of Fathom I can use. If there were a setting to "engage GPU latency mode", I could use it on tracks in song playback mode and delay the other tracks a few milliseconds to synchronize. That would kill the limit on the number of Fathom tracks my CPU can handle. Keep going! I will definitely upgrade to that version.

Post

FathomSynth wrote: Fri Feb 22, 2019 8:36 pm Good question about the GPU latency and I knew eventually a technically advanced user would notice this.
This comment just broke my brain in the best possible way. We really need more software developers like you in the world of audio and synthesis.
