Compyfox wrote:Just tested this reverb, and I'm a bit... baffled, so to speak.
First and foremost, after an upgrade of my drivers, this thing works like it should. But not as expected: I expected a GPU load of about 5% per instance, yet I got 25% with short IRs from StudioCat at either 44.1kHz or 48kHz. I'm running an i7 with 32-bit Windows, tested in Cubase 5 and Cubase 6 at a sampling rate of 48kHz and a bit depth of 24 bits.
On top of that... I can only use one instance at a time? Loading a second instance of this reverb with its own independent IR resulted in a GPU load of 88% on instance one and well over 95% on instance two, slowing my rig to a crawl. Again, it doesn't matter whether it's a 44.1kHz IR or a 48kHz IR.
It's an interesting concept, no doubt, but at the moment it's not usable for my needs. Maybe it's due to my GPU (a GeForce 9500GT with 32 cores and 1GB RAM), maybe it's the code. I can't tell what's going on.
I can, however, assume that one plugin eats about 8 cores per instance. Unless somebody has a CUDA core monitor to see how many cores are actually in use.
Guess I'll wait for an update then.
Here's the deal... all cards are not created equal. GPGPU in general, and CUDA in particular, lives or dies on the performance and count of the actual SMs/CUDA cores, and on the amount, speed, and bus width of your VRAM.
Basically, your card has what is, by today's standards, an incredibly tiny number of old and slow CUDA cores. Those usage numbers are pretty much what I'd expect. nVidia's GPGPU push didn't really kick off 'til the GT200 chips, which at the high end had 200+ CUDA cores of a much more capable design. Then came the GF100 chip, the Fermi architecture: the high-end consumer cards of that generation (GTX 480, GTX 570, GTX 580) brought several key improvements to CUDA, including hot-clocking the CUDA cores (the shader domain runs at roughly twice the core clock, so each core retires two operations per core-clock cycle), plus yet higher core counts and a higher base clock for that hot clock to work from.

A GTX 570 will outperform the entry-level Fermi-based Quadro (workstation) card in Adobe's Mercury Playback Engine because it's a GF110 part with lots of fast, relatively high-precision CUDA cores. GF110 offers double-precision floating point at 1/8 the single-precision rate, which, while still artificially handicapped (it would be 1/4 otherwise) for market-segmentation reasons, is really superb for a consumer card.

Modern cards based on the Kepler architecture are extremely slow at GPGPU by comparison, optimized heavily toward using the SMX modules (which bundle the CUDA cores) for gaming tasks. They'll put out shader performance like nobody's business, tessellation, all the goodies from DX9 through DX11.1... but don't expect impressive CUDA performance from them. nVidia doesn't allow it. They have reasons.
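To put rough numbers on the hot clock and that 1/8 DP rate, here's a back-of-the-envelope sketch in Python. The 1544MHz shader clock and the convention of counting a fused multiply-add as two flops are my assumptions for a reference-clocked GTX 580, not anything from the plugin or its docs:

```python
# Rough peak-throughput estimate for a GF110 part (GTX 580).
# Assumptions: 512 CUDA cores, 1544 MHz hot (shader) clock,
# one fused multiply-add (= 2 flops) per core per shader cycle.
cores = 512
shader_clock_hz = 1544e6          # hot clock, roughly 2x the core clock
flops_per_core_per_cycle = 2      # FMA counts as two operations

sp_gflops = cores * shader_clock_hz * flops_per_core_per_cycle / 1e9
dp_gflops = sp_gflops / 8         # consumer GF110: DP capped at 1/8 the SP rate

print(f"peak SP: ~{sp_gflops:.0f} GFLOPS")   # ~1581
print(f"peak DP: ~{dp_gflops:.0f} GFLOPS")   # ~198
```

Same arithmetic for the 9500 GT's 32 pre-Fermi, lower-clocked cores lands well under a tenth of that, which is why a handful of reverb instances saturates it.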
I have a GTX 580 as a dedicated CUDA processor in my system, and a quick check shows 512 CUDA cores running at over 1400MHz each, with 1.5 gigs of GDDR5 on a 384-bit memory bus. That's good for memory bandwidth just shy of 200GB/sec, so it's got a lot of power and it can feed itself quickly. The fault tolerance for running out of memory in the CUDA API is essentially "stick it back on the processor," and I'd imagine ATI's OpenCL GPGPU is similar. So if people just can't use it, blame the card for not having sufficient oomph to process what you're trying to run through it.
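That "just shy of 200GB/sec" figure falls straight out of the bus width and memory clock. A quick sketch, assuming the 1002MHz base memory clock of a reference GTX 580:

```python
# GDDR5 transfers data on both edges of two clocks: 4 transfers per base cycle.
bus_width_bits = 384
mem_clock_hz = 1002e6             # reference GTX 580 memory clock (assumed)
transfers_per_cycle = 4           # GDDR5 is quad-pumped

bandwidth_gbs = bus_width_bits / 8 * mem_clock_hz * transfers_per_cycle / 1e9
print(f"~{bandwidth_gbs:.1f} GB/s")   # ~192.4 GB/s
```

Run the same numbers on a 9500 GT (128-bit bus, much slower DDR2/GDDR3) and you're an order of magnitude lower, which matters as much as the core count for streaming IR data through the card.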
A quick bit of trivia: nVidia learned a hard lesson from Fermi, and as a result the follow-up, Kepler, has very badly kneecapped CUDA performance in its SMX (new name, similar function) modules on the GK104 chips currently available. They actually went with a more ATI-like design: still unified shaders, but no longer hot-clocked. And to keep people from using consumer cards as compute powerhouses in prosumer setups, they've cut compute throughput very substantially on the GTX 680 and GTX 670, with double precision especially hard-hit (1/24 the single-precision rate). It won't be until the big compute-oriented Kepler chip (GK110) arrives later that we see what a Kepler card aimed specifically at CUDA/GPGPU can do.
Also, workstation cards vs. gaming cards: more memory, and ECC-protected memory at that (background radiation may never strike the same spot twice, but it pays to have high-value calculations come out right the first time in high-performance simulations). They carry fewer optimizations toward gaming (shader performance tends to suck) and more toward their niche: for a workstation generalist (Quadro, FireGL) that means things like high levels of high-precision antialiasing for CAD work; for a dedicated compute card (Tesla) it can mean no GPU output at all.
All this GPGPU stuff is really cool and I'd love to see more development using it, but people need to understand that if you're going to use the GPU as a DSP card, you'll be limited by the capabilities of the GPU itself. It's been many years since the 9500 GT came out, and with only 32 CUDA cores of that generation it's scarcely better, if it's better at all, than the HD4000 graphics built into Intel's Ivy Bridge chips. Graphics performance doesn't map one-to-one onto GPGPU performance (ATI, for example, has traditionally swept nVidia on OpenCL, while CUDA is the API with the money behind it for top-down market penetration), but if your card struggles with modern games or flat-out can't run many of them, and it's two generations old at this point, there have been a lot of GPGPU computing advancements since then, and you shouldn't be taken aback when it won't deliver performance substantially better than native processing.
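For a sense of scale on why a 32-core card chokes, here's a hypothetical sketch of what convolution reverb costs in raw arithmetic. The IR length and sample rate are made-up but typical figures; nothing here comes from the actual plugin, whose internals we don't know:

```python
import math

# Hypothetical workload: one mono 2-second IR at 48 kHz.
sample_rate = 48_000
ir_taps = 2 * sample_rate         # 96,000 taps

# Naive time-domain convolution: one multiply-add (2 flops) per tap per sample.
direct_gflops = sample_rate * ir_taps * 2 / 1e9

# FFT-based convolution (e.g. overlap-save) scales like log2(N) multiply-adds
# per sample instead of N -- orders of magnitude cheaper. Constant factors
# vary by implementation; this ratio only captures the scaling.
fft_ratio = ir_taps / math.log2(ir_taps)

print(f"direct: ~{direct_gflops:.1f} GFLOP/s sustained")  # ~9.2
print(f"FFT-based is on the order of {fft_ratio:.0f}x cheaper in flops")
```

A sustained ~9 GFLOP/s per channel for brute-force convolution is nothing to a GTX 580 but a real load for 2008-era entry silicon, and even with an FFT-based engine, per-block kernel-launch and PCIe transfer overhead doesn't shrink with the card, which would fit the "25% per instance regardless of IR rate" symptom above.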