Convolution Reverb for NVidia and ATI GPUs - saving CPU time

VST, AU, etc. plug-in Virtual Effects discussion
2 posts since 13 Sep, 2011

Post Wed Sep 14, 2011 7:39 am

Nils Schneider wrote: It may be the case that the system ID has changed,

You are totally right, thank you. I verified that the system ID changes whenever I enable or disable an unused LAN port in Windows' device manager.
Nils Schneider wrote: You may want to re-download the demo and you will be happy again :)
No, thank you very much. I don't like your home-made challenge-response "copy protection," especially considering its brittleness, as illustrated by my own experience with it.

In contrast, LiquidSonics follows a comparatively friendly strategy, with a personalized license file that identifies me as the purchaser, but is not tied to any particular hardware at all. This no-nonsense approach is the primary reason why today I bought their slightly more expensive (60 GBP) Reverberate bundle, and not your plugin.

I imagine that you might not want to throw away all the hard work that went into your "copy protection" code, but I would also hazard a guess that it is costing you more in lost paying customers than it saves you by preventing piracy.
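
To illustrate why a hardware-derived system ID is brittle, here is a minimal Python sketch. The fingerprint scheme (hashing the MAC addresses of the enabled network adapters) is purely an assumption for illustration; how the plugin actually computes its system ID is not stated in this thread. The `system_id` and `license_id` helpers and the sample MAC and e-mail values are hypothetical.

```python
import hashlib

def system_id(adapter_macs):
    """Hypothetical fingerprint: hash of the MACs of the *enabled* adapters."""
    joined = ",".join(sorted(adapter_macs))
    return hashlib.sha256(joined.encode("ascii")).hexdigest()[:16]

def license_id(purchaser_email):
    """Hypothetical personalized license: depends only on the purchaser."""
    return hashlib.sha256(purchaser_email.encode("utf-8")).hexdigest()[:16]

both_ports = ["00:11:22:33:44:55", "00:11:22:33:44:66"]
one_port = ["00:11:22:33:44:55"]  # the unused LAN port has been disabled

# The hardware-derived ID changes as soon as the adapter list changes...
id_changed = system_id(both_ports) != system_id(one_port)
# ...while the purchaser-derived ID is the same on any machine.
license_stable = license_id("buyer@example.com") == license_id("buyer@example.com")

print(id_changed, license_stable)
```

Under this assumed scheme, any change to the set of enabled adapters invalidates the ID, whereas a license keyed only to the purchaser is unaffected by hardware changes.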

1 posts since 21 Jul, 2012

Post Sat Jul 21, 2012 10:08 am

I tried the demo version and I have a problem.
The plugin tries to use the CPU, not the GPU.
When I try to load an impulse, I get an "openCL error code 7".
My video card is an ATI Radeon HD 5750 with the latest drivers (12.6); the OS is XP.
Do you think my system is not compatible, or can the demo plugin simply not run on my system?

Best Regards
Jean Michel

1189 posts since 3 Mar, 2009 from Colorado Springs

Post Sat Jul 21, 2012 2:51 pm

Compyfox wrote: Just tested this reverb, and I'm a bit... baffled so to speak.

First and foremost, after an upgrade of my drivers, this thing works like it should. But not as expected. I expected a GPU load of about 5% per instance. Yet I got 25% with short IRs from StudioCat at either 44kHz or 48kHz. I'm running an i7 with 32-bit Windows, tested in Cubase 5 and Cubase 6 at a sampling rate of 48kHz and a bit depth of 24 bits.

On top of that... I can only use one instance at a time? Loading a second instance of this reverb with its own independent IR resulted in a GPU load of 88% on instance one, and way over 95% on instance two, resulting in a slowdown of my rig. Again, it doesn't matter if it's a 44kHz IR or a 48kHz IR.

It is an interesting concept, no doubt, but at the moment not usable for my needs. Maybe it's due to my GPU (a GeForce 9500GT with 32 cores and 1GB RAM), maybe it's the code. I can't tell what's going on.

I can, however, assume that the plugin eats about 8 cores per instance (25% of 32), unless somebody has a CUDA core monitor to see how many cores are actually used up.

Guess I'll wait for an update then.

Here's the deal... Not all cards are created equal. GPGPU, and especially CUDA, lives or dies on the performance of the actual SMs/CUDA cores and the amount, speed, and bus width of your VRAM.

Basically your card has what is, for today, an incredibly tiny number of old and slow CUDA cores. Those usage numbers are pretty much what I'd expect. nVidia's GPGPU didn't really kick off 'til the GT200 chips, which at their highest had 200+ CUDA cores of a much more solid nature. Then came the Fermi architecture (GF100, followed by GF110). The high-end consumer cards of that generation (GTX 480, GTX 570, GTX 580) had several key improvements to the CUDA architecture, along with hot-clocked CUDA cores (the shader clock runs at twice the core base clock), a yet higher core count, and a higher base clock for that hot clock to work from.

A GTX 570 will outperform the entry-level Fermi-based Quadro (workstation) card in Adobe's Mercury Engine because it's a GF110 part with lots of fast, relatively high-precision CUDA cores. GF110 consumer cards run double-precision floating point at 1/8 the single-precision rate, which, while still artificially handicapped for market segmentation reasons (it would be 1/2 otherwise), is really superb for a consumer card.

Modern cards based on the Kepler architecture are extremely slow at GPGPU by comparison, with the SMX modules (the blocks that house the CUDA cores) optimized heavily toward gaming tasks. They'll handle shader performance like nobody's business, tessellation, all the goodies from DX9 through DX11.1, extremely well... But don't expect impressive CUDA performance from them. nVidia doesn't allow it. They have reasons.

I have a GTX 580 as a CUDA processor in my system and a quick check shows... 512 CUDA cores running at over 1400MHz each, with 1.5 gigs of GDDR5 on a 384-bit memory bus. That's good for a memory bandwidth just shy of 200GB/sec. So it's got a lot of power and it can get at its memory quickly. When a CUDA app runs out of GPU memory, the usual fallback is "stick it back on the processor," and I'd imagine ATI's OpenCL GPGPU is similar. So if people just can't use it, blame the card for not having sufficient oomph to process what you're trying to run through it.
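
For anyone wondering where "just shy of 200GB/sec" comes from, here is a rough back-of-the-envelope sketch in Python. The 384-bit bus width is from the post above; the effective memory data rate of about 4008 MT/s is an assumption (the reference GTX 580 spec as I understand it, not something stated in this thread), and `memory_bandwidth_gb_s` is just a hypothetical helper name.

```python
def memory_bandwidth_gb_s(bus_width_bits, effective_rate_mt_s):
    """Peak theoretical bandwidth = bytes per transfer * transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_rate_mt_s * 1e6 / 1e9

# 384-bit bus (from the post), ~4008 MT/s effective GDDR5 rate (assumed).
gtx580 = memory_bandwidth_gb_s(384, 4008)
print(f"{gtx580:.1f} GB/s")  # prints "192.4 GB/s"
```

This is a theoretical peak; real-world throughput for a convolution workload will be lower and depends on access patterns.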

A quick bit of trivia - nVidia learned a hard lesson from Fermi, and as a result the followup, Kepler, has very badly kneecapped CUDA performance in its SMX (new name, similar function) modules on the GK104 chips currently available. They actually went with a more ATI-like solution - still unified shaders, but no longer hot clocked. And to prevent people from using consumer cards as powerhouses in prosumer setups, they've limited both single-precision and double-precision operations very substantially in the GTX 680 and GTX 670. It won't be until GK110 comes out later that we see what a Kepler card oriented specifically toward CUDA/GPGPU in general can do.

Also, workstation cards vs. gaming cards: workstation cards have more memory, ECC-protected memory (after all, while background radiation may not strike twice in the same location, it pays to have high-value calculations done correctly the first time in high-performance simulations), and fewer optimizations toward gaming (shader performance tends to suck). What they're optimized for instead depends on whether it's a workstation generalist (Quadro, FirePro) or a discrete compute card (Tesla): either high levels of high-precision antialiasing for working with CAD, etc., or no display output at all.

All this GPGPU stuff is really cool and I'd love to see more developments done using it, but people need to understand that if you're going to use the GPU as a DSP card, you'll be limited by the capabilities of the GPU itself. It's been many years since the 9500 GT came out, and with only 32 CUDA cores of that generation, it's scarcely better, if it is better at all, than the HD 4000 graphics on Intel's Ivy Bridge chips. Graphics performance doesn't map one-to-one to GPGPU performance: ATI, for example, has traditionally swept nVidia for OpenCL performance, while CUDA is the API that has lots of money behind it for top-down market penetration. Still, if your card struggles to run modern games or just can't run many of them, or if it's two generations old at this point, keep in mind that there have been a lot of GPGPU advancements since then, and you shouldn't be too taken aback when it won't deliver performance substantially better than, or different from, native.
Last edited by Agreed on Sat Jul 21, 2012 3:03 pm, edited 1 time in total.
My Guitar Software Review Blog
Now featuring: Amplitube 4 w/ MESA/Boogie Col. & Fender Col. 2
Upcoming: S-Gear v2.6

1189 posts since 3 Mar, 2009 from Colorado Springs

Post Sat Jul 21, 2012 3:00 pm

If you have a PCI-e 2.0 interface of at least 8X (or a PCI-e 1.1 interface of 16X) you can pick up a GTX 560Ti-448 as a pretty good entry point into consumer CUDA that doesn't cost an arm and a leg. It's not a GF114 part like the standard, non-448 GTX 560Ti (budget Fermi, much more limited CUDA performance, though it still has great PhysX performance... not germane to this conversation, but if you're a gamer, it's a relatively low power draw card that'll let you turn on PhysX without your framerate eating crap). The GTX 560Ti-448 is actually a GF110-based card, but with one fewer SM and some other minor reductions in performance compared to a GTX 570. The 448 stands for "448 cores" (much like the GT200-based GTX 260-216, which had a higher shader module/CUDA core count than its GTX 260 predecessor and performed much better). That's not as high as 512, but for audio work like this I think you'll find that 448 CUDA cores from the Fermi generation are GREAT, very cost-effective DSP if you can find one for a low price.
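
The reason a PCI-e 2.0 8X slot and a PCI-e 1.1 16X slot are interchangeable here can be checked with a quick sketch. The per-lane figures (roughly 250 MB/s for PCI-e 1.x and 500 MB/s for 2.0, per direction) are the commonly quoted nominal values, and `link_bandwidth_mb_s` is a hypothetical helper, so treat this as an approximation rather than a measurement.

```python
# Approximate nominal bandwidth per lane, per direction, in MB/s.
PER_LANE_MB_S = {"1.1": 250, "2.0": 500}

def link_bandwidth_mb_s(generation, lanes):
    """Nominal one-direction link bandwidth for a given PCI-e generation."""
    return PER_LANE_MB_S[generation] * lanes

gen2_x8 = link_bandwidth_mb_s("2.0", 8)
gen1_x16 = link_bandwidth_mb_s("1.1", 16)
print(gen2_x8, gen1_x16)  # prints "4000 4000"
```

Both configurations come out at about 4 GB/s in each direction, which is why either one is a reasonable minimum.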

Also, just in general, for gaming it's a poor decision to get versions of cards that have double the reference amount of VRAM. For CUDA, however, since insufficient VRAM will usually shunt processing back to the CPU rather than the GPU and its SMs/CUDA cores, it can be a good buy.
