GPU AUDIO Technology Thread

VST, AU, AAX, CLAP, etc. Plugin Virtual Effects Discussion

Post

Greetings!

We are BRAINGINES, a startup that made an early prototype of GPU AUDIO technology.

It runs on your laptop with 0% CPU usage, using the GPU as the primary source of processing power (which offers significantly more than a DSP or CPU does); it's scalable, cheap, laptop-thin and comes with lots of additional features.

We want to use this section to share our showcases with people who might be interested in such a technology coming to market. We will open a beta version at some point; for now, all we can say is that it will be by the end of 2018.

The first post we want to make in this section is a low-latency showcase (96 kHz / 96 samples, with 96 threads used to process within a 1 ms input buffer) at 0% CPU usage, plus the eGPU power-scaling feature. Here (https://youtu.be/phuQKMBKrIw) you can see the new Big Things.

Cheers,
BRAINGINES team

Post

Yawn... no offense, but that "demo" was not at all impressive. On a modern 2017 laptop with an i7, running a simple flanger, delay and chorus with some filtering will literally show almost 0% CPU usage in Reaper (according to the Windows Task Manager).

If you want to truly show that your thing is working and not just vaporware, then show a HUGE project with 40 tracks, each with 10 or so plugins.

Also... going the GPU route means you are the ones who need to make the plugins too, right? The whole point of native processing, and why it's so attractive, is that there are literally tens of thousands of plugins to choose from. Proprietary systems always fail because they can't compete with this... unless you are Universal Audio and can ride the "we make super quality plugins!" bandwagon.
"Wisdom is wisdom, regardless of the idiot who said it." -an idiot

Post

If someone could think of a way to power any VST using a GPU - that would certainly grab my attention.
It's easy if you know how

Post

There have been a few GPU audio libraries but, as I remember, the bottleneck was moving data to and from the GPU. So it'll be interesting to see how a library can manage a large number of VSTs and VSTis (if that is the aim of the product).
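
To put a rough number on that, here's a purely illustrative CUDA sketch (mine, nothing to do with this product) that times a naive per-block round trip; at 96 kHz a 96-sample block gives you only about 1 ms, and launch plus driver overhead can eat a surprising slice of it:

```cuda
// Illustrative only: time a naive per-block round trip of one 96-sample buffer.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void gain(float *x, int n, float g)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= g;            // trivial stand-in for real DSP
}

int main()
{
    const int n = 96;                // one block at 96 kHz is ~1 ms of audio
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float *dev;
    cudaMalloc(&dev, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);   // upload
    gain<<<1, n>>>(dev, n, 0.5f);                                       // process
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);   // download
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Round trip for one 96-sample block: %.3f ms\n", ms);

    cudaFree(dev);
    return 0;
}
```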

Post

Lesha wrote:If someone could think of a way to power any VST using a GPU - that would certainly grab my attention.
That is never going to happen. Plugins that have already been compiled into x86/x64 machine code can't realistically have their DSP code separated out and translated to run on a GPU after the fact. (Even if that weren't nearly impossible technically, it would, more importantly, also be against the product licenses in 99.99% of cases.)

Post

Basilbasil wrote: made an early prototype of GPU AUDIO technology.
Have you also identified an actual problem that this will solve for users? Have you developed any non-trivial audio DSP algorithms that will greatly benefit from this? (One can run hundreds of EQs, compressors and delays purely on native CPU power, so there's not necessarily anything interesting about those running on a GPU.)

Anyway, best of luck with it!

Post

Did I hear correctly, 1ms latency?

Would be nice on older machines
Amazon: why not use an alternative

Post

By running chains of in-house effects in a container like this there is an opportunity to avoid the GPU/CPU memory domain transfers for each link in the chain, which is what often ruins performance at scale. What I see here looks immediately suited to a mix-rack plugin (in the style of something like Slate Digital's VMR, for instance, where you can easily be linking 5 or more modules up in series), where everything can stay wholly within the GPU domain. That makes it easier to get some benefit from moving the audio there in the first place, since you're going to do a meaningful amount of work once it arrives. Without this it's difficult to get much benefit; as has been said already, most plugins use only a small proportion of the CPU these days anyway. Even some of the heaviest plugins I use are really not hitting a modern CPU by more than a few percent, and most much less.
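
To sketch roughly what I mean (just an illustration of the idea, not anyone's actual code): upload a block once, run each module of the rack as a kernel on the same device buffer, and only download at the end, so the length of the chain adds no extra transfers.

```cuda
// Sketch of keeping a whole module chain on the GPU: one upload, one download.
#include <cuda_runtime.h>

__global__ void softClip(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = tanhf(x[i]);                      // saturation module
}

__global__ void gainStage(float *x, int n, float g)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= g;                               // make-up gain module
}

// Process one block through a two-module "rack" without intermediate copies.
// (hostIn/hostOut would be pinned memory in practice for truly async copies.)
void processChain(const float *hostIn, float *hostOut, float *devBuf, int n,
                  cudaStream_t stream)
{
    cudaMemcpyAsync(devBuf, hostIn, n * sizeof(float),
                    cudaMemcpyHostToDevice, stream);

    int threads = 128, blocks = (n + threads - 1) / threads;
    softClip <<<blocks, threads, 0, stream>>>(devBuf, n);          // module 1
    gainStage<<<blocks, threads, 0, stream>>>(devBuf, n, 0.8f);    // module 2
    // ...further modules chain here with no host/device transfer between them.

    cudaMemcpyAsync(hostOut, devBuf, n * sizeof(float),
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);
}
```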

So that's the single-plugin, single-channel concept. What about multiple plugins on multiple channels? Assuming this already works with multiple plugin instances, when you have a lot of instances running I would be interested to see the CPU usage of your GPU Server utility (this doesn't appear to be in view in Task Manager). Running a server like this is a good idea, so you have centralised control over the GPU and the transfers of audio going to and from it, because lots of instances of GPU-enabled plugins fighting over resources is unlikely to scale well. The rest of my post assumes this is your approach, as it seems to me the obvious way to do it; sorry if it isn't.

Being able to batch up all the separate channels of audio sent to the server from multiple plugin instances, move them over to the GPU in one big transfer, divide them back out on the GPU for individual processing, and then batch them up again for the return trip is probably the best way to make use of the GPU efficient. I could have done this in my GPU plugins, but I didn't see the point, because most people don't need more than a couple of reverbs running at once, so the gains are minimal when only a few plugins are fighting for a shared resource. If you want to support 20, 50 or 100 concurrent GPU-enabled plugins scattered randomly across a session and make a proper go at competing with UAD/AAX-DSP as a means of moving load to the GPU, then a coordinated GPU server is really the only way forward.
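
Something along these lines is what I have in mind (a hypothetical sketch only; the names and layout are my guesses, not this product's design):

```cuda
// Hypothetical server-side batching: every plugin instance writes its block into
// its own slice of one pinned staging buffer, and the server issues a single
// host-to-device copy per audio callback instead of one copy per instance.
#include <cuda_runtime.h>
#include <cstddef>

struct GpuAudioServer {
    int numInstances = 0, blockSize = 0;
    float *staging = nullptr;   // pinned host memory, instances laid out contiguously
    float *device  = nullptr;   // matching device-side buffer
    cudaStream_t stream{};

    void init(int instances, int block) {
        numInstances = instances;
        blockSize    = block;
        size_t bytes = size_t(instances) * block * sizeof(float);
        cudaHostAlloc(&staging, bytes, cudaHostAllocDefault);  // pinned => async copies
        cudaMalloc(&device, bytes);
        cudaStreamCreate(&stream);
    }

    // Slice of the staging buffer that plugin instance 'id' writes its block into.
    float *slotFor(int id) { return staging + size_t(id) * blockSize; }

    // One big transfer for everybody; the per-instance kernels then run against
    // 'device', and the results are batched back the same way.
    void uploadAll() {
        cudaMemcpyAsync(device, staging,
                        size_t(numInstances) * blockSize * sizeof(float),
                        cudaMemcpyHostToDevice, stream);
    }

    void downloadAll() {
        cudaMemcpyAsync(staging, device,
                        size_t(numInstances) * blockSize * sizeof(float),
                        cudaMemcpyDeviceToHost, stream);
        cudaStreamSynchronize(stream);
    }
};
```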

It's common for demos like this to focus on what seems to be an impressive metric, but personally I'm not very interested in the CPU use of Reaper here, because my guess from this demo is that you're moving audio from Reaper over to the gpu-server.exe and letting the server handle the more demanding tasks of coordinating multiple streams of audio to and from the GPU and marshalling the various GPU computation activities. If that is the case, really all the work is being done by the server. All the plugin is doing is transferring data to and from the server, which of course is going to register in Reaper as virtually nil effort, as it's just a basic memory copy with some synchronisation. The GPU drivers in the background take some of the load too (in my tests, a non-trivial amount at times). Regardless of whether my guess about your design is correct, total system load is the key metric, because the system incurs CPU load whenever GPUs are in use; what shows in the DAW reflects only part of the work and can be quite misleading. Such performance assessments should be done using lots of instances of more complex plugin designs spread over many channels (larger memory and computation requirements, e.g. complex saturation, reverbs, etc.) with an equivalent CPU/GPU design and honest CPU optimisations (e.g. using Intel IPP, hand-coded intrinsics, etc., as would be done with any commercial plugin release).

I think the GPU server concept provides an interesting proof of concept, with the potential to scale well because you can properly control the GPU resource. The concern with your approach is that you're going to need to put a lot of effort into high-quality plugin development if you wish to make it a first-party system with lots of plugins on the menu, because otherwise I don't see the benefit. These days UAD appeals largely on the strength of its excellent library of plugins coded by very skilled DSP engineers, and much less on its ability to offload processing to external DSP chips (which was the original appeal, back when its library was a lot smaller and its plugins quite basic). Unless you can create plugins that simply could not be done on the CPU, or are unavailable on the CPU (with no good equivalents either), the commercial viability of building a platform here is questionable. Maybe that isn't your goal, but I'm just raising the point. The alternative route is the Avid AAX DSP way, where a strong SDK is given to third parties to write plugins against (to some extent this applies to UAD too, but they're clearly much stricter about who is invited in). If you can build a strong first-party lineup, and/or a compelling third-party platform, then moving a lot of plugin load en masse to the GPU via a well-coordinated server could be very appealing.

I'd encourage you, at a very early stage, to consider macOS and any of Apple's eGPU quirks in your designs. If possible use OpenCL rather than CUDA, given Apple seems to have very little interest in Nvidia these days. The Mac is such an important segment of the audio market that it needs to be part of your plan early on; if this turns into a great success and you've fundamentally limited it to Windows, you'll kick yourself when you realise your revenue could have been double.

I wish you the best of luck with the development of this as a platform; at this stage I suspect most of the engineering challenge is behind you.

Post

liquidsonics wrote:If possible use OpenCL rather than CUDA...
I used CUDA for around six months and have recently been diving into OpenCL. They are massively different in practice due to things like vendor support. AMD hasn't touched their OpenCL Windows support in years. They seem to have switched all their internal resources over to ROCm and HIP (their version of CUDA), which they only support on Linux, and they have removed most links to their Windows OpenCL material from their website.

Intel updates their OpenCL support regularly but also frequently breaks features in my limited experience.

Apple just announced that OpenCL is a “legacy” technology they are deprecating.

Nvidia’s official OpenCL support only goes up to version 1.2, and that excludes features that I have found to be required in order to achieve low latency.

I am just bringing this up because it sounds like you might have some firsthand experience. Is there something that I am missing about OpenCL?

Post

skrasms wrote:Apple just announced that OpenCL is a “legacy” technology they are deprecating.
My thinking was along the lines of targeting something with cross-platform support so that the plugins would only need to be written once. A couple of days later, Apple stuck a knife in that one for me :lol:
skrasms wrote:AMD hasn’t touched their OpenCL Windows support in years.
Disappointing. It seems we don't have a good way to achieve cross-platform GPU plugin development, then, unless an SDK were written that abstracts the features useful for audio. That seems an unlikely prospect.
skrasms wrote:Nvidia’s official OpenCL support only goes up to version 1.2, and that excludes features that I have found to be required in order to achieve low latency.
I'd be interested to know what features you're thinking of. I'm not familiar enough with what's in which versions in comparison to CUDA.
skrasms wrote:I am just bringing this up because it sounds like you might have some firsthand experience.

I have a CUDA convolution plugin, written some years ago now. I was always interested in ways to improve its low latency performance. The server architecture discussed here always felt like a good way to go to minimise transfers to the GPU as the number of instances scales. That said, my interest in investing time here was always tempered by my desire to also support macOS (which was tricky as Apple moved around between GPU vendors and never supported external GPUs until recently). Porting it to OpenCL didn't ever appeal to me, and now even less so.
skrasms wrote:Is there something that I am missing about OpenCL?
No, it sounds like you have more experience of OpenCL than I do.

Post

liquidsonics wrote:
skrasms wrote:Nvidia’s official OpenCL support only goes up to version 1.2, and that excludes features that I have found to be required in order to achieve low latency.
I'd be interested to know what features you're thinking of. I'm not familiar enough with what's in which versions in comparison to CUDA.
In my case I am passing around small data buffers with a lot of processing, and the limiting GPU latency factor almost always ends up being the drivers. Windows graphics drivers may be fast on average, but the worst-case numbers are terrible. Something that typically takes 10 µs may sometimes unpredictably take 5 ms just due to the driver.

The way around that is to do everything possible to avoid using the drivers at run time. For example, have the GPU itself launch kernels so that the driver on the CPU is not involved. Another example is shared virtual memory atomics, which allow the host and device to communicate asynchronously without going through the driver. With OpenCL, you need at least version 2.0 for both of those.
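
Roughly what that looks like in CUDA terms (an illustrative sketch only, assuming a device that supports mapped/zero-copy host memory; the OpenCL 2.0 SVM route is analogous): launch one persistent kernel up front and hand it blocks through host-visible flags, so no kernel launch or copy call goes through the driver per audio block.

```cuda
// Persistent-kernel sketch: per-block work is handed over via flags in mapped
// host memory, so the driver is not involved for each audio block. Note that on
// a GPU driving a display the OS watchdog limits how long a kernel may run, so
// a real design has to account for that.
#include <cuda_runtime.h>

// ctrl[0]: 0 = idle, 1 = block ready, 2 = shut down.
__global__ void persistentDsp(volatile int *ctrl, float *buf, int n, float gain)
{
    __shared__ int cmd;
    while (true) {
        if (threadIdx.x == 0) {
            while (ctrl[0] == 0) { }        // spin until the host signals
            cmd = ctrl[0];
        }
        __syncthreads();
        if (cmd == 2) return;               // host asked us to exit

        for (int i = threadIdx.x; i < n; i += blockDim.x)
            buf[i] *= gain;                 // trivial stand-in for real DSP

        __syncthreads();
        __threadfence_system();             // make buf writes visible to the host
        if (threadIdx.x == 0) ctrl[0] = 0;  // tell the host this block is done
    }
}

void hostSide()
{
    cudaSetDeviceFlags(cudaDeviceMapHost); // must precede any other CUDA call

    const int n = 96;
    int *ctrl;  float *buf;
    cudaHostAlloc(&ctrl, sizeof(int),       cudaHostAllocMapped);
    cudaHostAlloc(&buf,  n * sizeof(float), cudaHostAllocMapped);
    *ctrl = 0;

    int *dCtrl; float *dBuf;
    cudaHostGetDevicePointer(&dCtrl, ctrl, 0);
    cudaHostGetDevicePointer(&dBuf,  buf,  0);

    persistentDsp<<<1, n>>>((volatile int *)dCtrl, dBuf, n, 0.5f);  // launched once

    for (int block = 0; block < 4; ++block) {
        for (int i = 0; i < n; ++i) buf[i] = 1.0f;      // write the next audio block
        *(volatile int *)ctrl = 1;                      // hand it to the GPU
        while (*(volatile int *)ctrl != 0) { }          // wait: no driver call involved
    }
    *(volatile int *)ctrl = 2;                          // shut the kernel down
    cudaDeviceSynchronize();
    cudaFreeHost(ctrl);
    cudaFreeHost(buf);
}
```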

I really hoped I was missing something about OpenCL, because it would be great if there was a good cross-platform option.

Post

skrasms wrote:In my case I am passing around small data buffers with a lot of processing, and the limiting GPU latency factor almost always ends up being the drivers. Windows graphics drivers may be fast on average, but the worst-case numbers are terrible. Something that typically takes 10 µs may sometimes unpredictably take 5 ms just due to the driver.
My findings too, hence the challenge of getting down to latencies in single-digit milliseconds with rock-solid performance and low CPU overhead. GPU drivers are just not optimised with this use in mind. Putting everything in a server process hides all of this from the host, which was the only metric given, and that's why I'm interested to see the overall system load of the demonstrated technology, especially at very low GPU transfer block sizes.

Post

Wrong forum? The development section is over here:

viewforum.php?f=33


Post

That was a couple of wasted minutes I'll never get back.

I suggest you come back here whenever you've got an actual product for the end-user.
"Preamps have literally one job: when you turn up the gain, it gets louder." Jamcat, talking about presmp-emulation plugins.

Post

https://earlyaccess.gpu.audio
