Convolution Reverb for NVidia and ATI GPUs - saving CPU time

VST, AU, AAX, CLAP, etc. Plugin Virtual Effects Discussion

Post

jupiter8 wrote: The loading browser takes like 10 seconds to open.
The loading browser is just a system function that I call; there's nothing I can do about it there. Do you have a CD-ROM inserted that is hard to read, or something like that? I do nothing more than open the file dialog.
https://k1v.nilsschneider.de - Kawai K1 emulated as VSTi/AU
https://heatvst.com - Android Synthesizer with full VST integration
https://gpuimpulsereverb.de - Use your GPU as reverberation DSP

Post

I've found it can't load a 32-bit file saved in type 3 mode (where the internal structure is tag=3, align=4*num_channels, bits=32). An example of these files can be found at http://www.memi.de/echochamber/responses/index.html (http://www.memi.de/echochamber/response ... _32bit.zip). The internal structure of these files is such that they need no scaling; just load them straight into your float array (I guess you could probably work that out for yourself).
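
In case it helps, here's a minimal sketch (not your loader, just an illustration) of how the two variants could be told apart. Offsets assume a canonical 44-byte header; real files can carry extra chunks, so a proper chunk walk is safer:

// Distinguish integer PCM (tag 1) from IEEE float (tag 3) WAVE data.
#include <cstdio>
#include <cstdint>

int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    FILE *f = std::fopen(argv[1], "rb");
    if (!f) return 1;

    uint8_t h[44];
    if (std::fread(h, 1, 44, f) != 44) { std::fclose(f); return 1; }
    std::fclose(f);

    uint16_t formatTag     = h[20] | (h[21] << 8);  // 1 = PCM, 3 = IEEE float
    uint16_t numChannels   = h[22] | (h[23] << 8);
    uint16_t blockAlign    = h[32] | (h[33] << 8);  // 4 * numChannels for float32
    uint16_t bitsPerSample = h[34] | (h[35] << 8);

    if (formatTag == 3 && bitsPerSample == 32)
        std::printf("float32, %u channels, align %u: copy straight into the float array, no scaling\n",
                    (unsigned)numChannels, (unsigned)blockAlign);
    else if (formatTag == 1)
        std::printf("integer PCM, %u bit: convert and scale by 1 / 2^(bits - 1)\n",
                    (unsigned)bitsPerSample);
    return 0;
}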

Matt

Pi is exactly three

Post

Weeehaaaaa ... that's just great !!!! :)

Just tested it on a GeForce 8800 GT:
A 3-second impulse gave me 1% GPU load (compared to about 6% of one core of an E6600 processor when used in a conventional convolution reverb).



I spotted a bug:
If I load a new impulse file, the dry and wet settings are reset (even though the faders stay in their position). I then have to nudge them briefly with the mouse to get the real setting back.
Edit: It's the same with the in/out settings.

I'm really looking forward to this technology. I hope there'll be a version without latency. That would be great.

I think it's great that people are finally starting to develop audio effect plugins on the GPU. I think that's the future of DSP for audio... I hope others will follow.

Post

btw, wouldn't it be better to implement softsynths on the GPU? imho it would help reduce the latency problem, as you don't have to do the whole round trip for the audio signal. you would only need to send control signals, which don't need to be sample accurate, and the processing could be done e.g. per voice.

i don't really have experience with GPU programming, so these are just some things i'd be interested in.

Post

The latency problem is caused by the algorithm, not by the data transfer. The transfer is less than 1 megabyte per second for one instance, which is nothing for the PCIe bus.
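(As a rough sanity check, assuming a 44.1 kHz stereo stream copied to the card and back as 32-bit floats: 44100 samples/s x 4 bytes x 2 channels x 2 directions ≈ 0.7 MB/s per instance.)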

A softsynth is a problem because it can't be computed in parallel. It would work for oscillators, and a GPU would really excel there, but filters are a problem: sample n depends on the output of sample n-1, which can't be done on a GPU that computes every sample independently of the others.
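
A minimal sketch of what I mean, with a one-pole lowpass as the example (names and the coefficient are placeholders, not plugin code):

// Every output needs the previous output, so this loop can't be split
// across GPU threads the way the independent multiply-adds of a
// convolution can.
void onePoleLowpass(const float *in, float *out, int numSamples, float a)
{
    float y = 0.0f;                 // state carried from sample to sample
    for (int n = 0; n < numSamples; ++n)
    {
        y += a * (in[n] - y);       // y[n] depends on y[n-1]
        out[n] = y;
    }
}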

I'll take a look at the issues that have been mentioned, thanks for reporting.

32-bit float WAV files should already work, as I tested with files that I have here.
https://k1v.nilsschneider.de - Kawai K1 emulated as VSTi/AU
https://heatvst.com - Android Synthesizer with full VST integration
https://gpuimpulsereverb.de - Use your GPU as reverberation DSP

Post

Works well for me in Tracktion, with between 1 and 3% GPU load on a 9600GT (which you can pick up for not much more than $100). Given how insanely powerful GPUs are, I've been wondering how long it would take for people to start looking into this. The BionicFX thing was over a year ago, after all.

Post

Nils Schneider wrote: 32-bit float WAV files should already work, as I tested with files that I have here.
There is more than one 32-bit float format: the kind where the internal audio format tag is set to 1 works, but where it is set to 3 it doesn't. Check out the files I linked.
Matt

Pi is exactly three

Post

Nils Schneider wrote: The latency problem is caused by the algorithm, not by the data transfer. The transfer is less than 1 megabyte per second for one instance, which is nothing for the PCIe bus.

A softsynth is a problem because it can't be computed in parallel. It would work for oscillators, and a GPU would really excel there, but filters are a problem: sample n depends on the output of sample n-1, which can't be done on a GPU that computes every sample independently of the others.

I'll take a look at the issues that have been mentioned, thanks for reporting.

32-bit float WAV files should already work, as I tested with files that I have here.
thanks for the explanation.

and for the filter, isn't there a way to store the samples in GPU RAM so you can use sample n-1?

Post

cYrus wrote:
and for the filter, isn't there a way to store the samples in GPU RAM so you can use sample n-1?
The problem is that a GPU derives its huge performance from the fact that it can do a lot of calculations in parallel, while a filter that needs the previous output to calculate the next output is inherently a serial process.

If you think of a GPU as a large group of women, they'll be able to make a large number of babies in just 9 months... but you'll still have to wait 9 months for the first one to finish. :P

Post

mystran wrote:
cYrus wrote:
and for the filter, isn't there a way to store the samples in GPU RAM so you can use sample n-1?
The problem is that a GPU derives its huge performance from the fact that it can do a lot of calculations in parallel, while a filter that needs the previous output to calculate the next output is inherently a serial process.

If you think of a GPU as a large group of women, they'll be able to make a large number of babies in just 9 months... but you'll still have to wait 9 months for the first one to finish. :P
hehe, yeah i know. but like you said, it would help to make a large number of babies (a lot of OSCs, synths, ...). so even if it takes some time (serially) to process an OSC, it can handle a lot of OSCs/synths at once.

i think it makes more sense to look at it in a real-world project where you use a lot of different processing tools: even if each of them is a serial process, the GPU would be able to run a lot of them in parallel.
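
a rough CUDA sketch of that idea (all names and the naive sawtooth are just placeholders, nothing from the actual plugin): one thread renders one voice, each voice stays serial inside its thread, but hundreds of voices run at once.

// One thread renders one voice for a whole audio block. The loop inside
// each thread is serial, but all voices run in parallel across the GPU.
__global__ void renderVoices(float *out, const float *phaseInc,
                             int numVoices, int blockSize)
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= numVoices) return;

    float phase = 0.0f;   // a real synth would keep this state between blocks
    for (int n = 0; n < blockSize; ++n)
    {
        out[v * blockSize + n] = 2.0f * phase - 1.0f;   // naive sawtooth
        phase += phaseInc[v];
        if (phase >= 1.0f) phase -= 1.0f;
    }
}

// Host side (allocation and error checking omitted):
//   renderVoices<<<(numVoices + 127) / 128, 128>>>(d_out, d_phaseInc, numVoices, blockSize);
// The per-voice buffers could then be mixed down and filtered, e.g. on the CPU.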

Post

cYrus wrote: hehe, yeah i know. but like you said, it would help to make a large number of babies (a lot of OSCs, synths, ...). so even if it takes some time (serially) to process an OSC, it can handle a lot of OSCs/synths at once.

i think it makes more sense to look at it in a real-world project where you use a lot of different processing tools: even if each of them is a serial process, the GPU would be able to run a lot of them in parallel.
Well, I personally don't know for sure, but I am under the impression that simply getting a simple serial process to run at normal audio rates in real time on a GPU could be a problem (though shaders keep getting fancier, so if it isn't possible yet, maybe it will be tomorrow). On the other hand, I guess one could still do crazy unison wavetables or maybe additive stuff on the GPU and just process it with CPU filters afterward...

Post

found 2 bugs in ableton live:

1. when the plugin window is open and i switch to another application, i can't go back to live until i press restore on the windows taskbar. when i close the plugin window, everything is fine.

2. when dry is at 0%, loading a new impulse sets it to 100%. the same thing happens with wet at 0%.


btw, thanks for giving us the first usable cuda tool ;)

all the best
lesha



/edit
bug nr. 2 applies to any level in between
Last edited by Lesha on Mon Aug 25, 2008 5:41 pm, edited 1 time in total.
It's easy if you know how

Post

It works without any problems under Cubase SX3 (on my 8600 GTS), but when I try to load it in REAPER the host freezes. Hope this helps; keep up the good work.

Greetings
Juggernaut

Post

I can confirm the dry/wet fader resetting.

I get occasional massive spikes in the GPU usage readout, well over 100%, which freeze and then drop quickly back to the correct level (i.e. 1-5%). This occurs when changing the dry/wet level. I'm using IRs from Kontakt (some short, some long, some 16-bit, some 24-bit, in other words no clear pattern), but I will test with some free IRs and report back if I can confirm.
This could of course easily be specific to my system, so don't worry about it until others confirm.

For those getting the "no CUDA capable device" message, you may simply have ForceWare drivers that are not CUDA enabled. I would suggest trying the drivers found at Laptopvideo2go, and reading the instructions on replacing the .inf before installing. I am using these and did not need to install the CUDA toolkit.

Post

Great job, Nils! Thank you :)
I have started 4 copies of the plug-in on my 8400 GS and everything works! In my host (Cubase), after closing and reopening the plug-in's interface, the name of the loaded impulse and the GPU load are no longer displayed. Also, I cannot load more than 4 instances of the plug-in: the host crashes, even though there are still GPU resources left.
However, I am very glad that CUDA development in the field of VST plug-ins is moving along quite successfully! :love:
