- KVRian
- 1331 posts since 26 Apr, 2004, from UK
As usual, the issue is which kinds of algorithms run efficiently on these boards, and audio algorithms are not among them.
- KVRAF
- 6256 posts since 30 Dec, 2004, from London uk
It's basically a standalone CUDA board. CUDA isn't that useful for realtime audio. Audio is a serial task, not well suited to CUDA. The problem is apparently latency. If you want serious realtime audio DSP, SHARC is what you want:
http://www.analog.com/en/processors-dsp ... index.html
Less powerful but more popular is Arduino :
http://www.instructables.com/id/Lo-fi-A ... tar-Pedal/
http://playground.arduino.cc/Main/ArduinoSynth
- KVRist
- 493 posts since 30 Jan, 2009, from UK
I've usually found the worst factor in audio with CUDA to be getting low latency: passing data between GPU memory and system memory takes a little too much CPU effort to be worthwhile in small blocks. If this system has a shared CPU and GPU memory architecture, that may well no longer be a problem.
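To put rough numbers on the small-block problem: a block of N samples at sample rate fs spans N/fs seconds, so 64 samples at 48 kHz is about 1.33 ms. The sketch below just does that arithmetic; the 0.2 ms per-block transfer overhead is a made-up figure for illustration, not a measurement of any GPU.

```python
def block_latency_ms(block_size, sample_rate_hz):
    """Time span covered by one block of audio, in milliseconds."""
    return 1000.0 * block_size / sample_rate_hz


def overhead_share(block_size, sample_rate_hz, overhead_ms):
    """Fraction of the block period eaten by a fixed per-block overhead."""
    return overhead_ms / block_latency_ms(block_size, sample_rate_hz)


if __name__ == "__main__":
    # overhead_ms=0.2 is a hypothetical value, chosen only to show the trend.
    for n in (32, 64, 256, 1024):
        period = block_latency_ms(n, 48000)
        share = overhead_share(n, 48000, overhead_ms=0.2)
        print(f"{n:5d} samples: {period:7.3f} ms per block, overhead {share:6.1%}")
```

The fixed cost stays the same while the block period shrinks, so the smaller the block, the larger the share of each period spent on moving data rather than processing it.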
Whether it is worth putting tight audio loops with feedback on a GPU is a totally different issue, and the answer is probably to go SHARC, but block-based operations that don't need feedback, like convolution, are interesting on this kind of platform.
That said, it may not be worth it. I'm able to do very long convolutions on mobile processors in Mobile Convolution without using more than 10-20% of a modern iOS CPU, since the vector maths optimisation in a decent ARM chip is so good, and that is probably embedded enough to satisfy my whims. I've not tried, but it's probably possible to squeeze impressive performance out of a BeagleBone Black or similar.
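A minimal sketch of why convolution suits block-based processing: with overlap-add, each input block is convolved on its own and only the tails are summed, so no block needs the output of another. This is plain time-domain Python for clarity; a real implementation would typically do the per-block convolution with FFTs and vector instructions, and the function names here are my own.

```python
def convolve(x, h):
    """Direct linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def overlap_add(x, h, block_size):
    """Convolve x with h one block at a time, summing the overlapping tails."""
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        # Each block's partial result is independent of every other block.
        for k, v in enumerate(convolve(block, h)):
            y[start + k] += v
    return y


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 5.0]
    impulse_response = [1.0, -1.0, 0.5]
    print(overlap_add(signal, impulse_response, block_size=2))
    print(convolve(signal, impulse_response))
```

Both calls give the same result, which is the point: the per-block work can be handed to whatever does it fastest, at the cost of one block of latency.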
- KVRist
- 389 posts since 28 Jul, 2003
On Jetson the CPU and GPU memory are in the same address space, and the CUDA API offers a zero-copy transfer mode between host and GPU. If some transfer function of interest can be approximated with a CNN then a GPU could be useful; however, given what has been achieved with SHARC DSPs in Fractal Audio gear, for instance, DSP is a sure path.
-
- KVRist
- 152 posts since 21 Sep, 2015, from Grenoble
Have you considered an Intel SoC + OpenCL? Their GPU is integrated and shares memory with the CPU.
-
- KVRist
- 115 posts since 25 Sep, 2001, from Paris, France
You should have a look at this: http://bela.io/
Powerful enough for convolution and sub millisecond latency + open source
Lorcan | lmdsp audio plug-ins
- KVRAF
- 1698 posts since 29 May, 2012
On jetson the cpu and gpu memory are on the same address space and the Cuda api offers a zero copy transfer mode between host and gpu.
Hi,
Does this 'zero copy' feature also mean 'near zero latency'? Is processing latency affected by the fact that the operating system or some other app may also be using the same GPU? It's a shared resource, after all. I wonder how fast/frequently that resource may change hands.
Thanks
~stratum~
- KVRian
- 774 posts since 13 Mar, 2012
Have Jetson TK1 Pro here right in front of me.
"Pro" is the "automotive-grade" version of that platform, looks like this

And.. I like it
ARM performance is better than in other similar SoCs.
Actually there are not a lot of quad-core (+ battery-saving core) Cortex-A15 SoCs around.
The TK1 is way better than e.g. a dual-core A15 TI OMAP5.
Don't have an Allwinner A80 for comparing.. think that one would come pretty close according to the specs.
So for a Cortex-A15 based SoC, the TK1 is probably the fastest chip you can get right now.
(If you need more power you need to select a different architecture).
NEON performance scales similarly to the ARM. Top A15-based system I have seen so far.
Now.. when it comes to audio you must know that the TK1 has no special DSP extensions (such as a DAC).
It has NEON, but that's it.
If you can use CUDA for your audio processing, do it. IMHO using CUDA is the only reason that justifies paying the TK1 price. If you cannot/don't want to use it, better look at Texas Instruments, Qualcomm, Allwinner, .. SoCs. They usually cost less.
On my "pro board" there are AK4618VQ + AD1937 audio codecs. I have not used the AK4618VQ yet, but the AD1937 (192 kHz, 24-bit) is really good. So check what chip is on your system.. it will affect your audio quality significantly .. like.. using the most advanced algorithms on the most powerful platform to process your audio .. and then converting it on a $0.09 no-name DAC to noise with some music mixed in.
About Intel:
Intel is popular on PC, but that's not the case for embedded.
Their old Atoms suck balls. With Gen7 we have seen some major improvements, but they are still behind TI, QCT & co on almost all KPIs..
Last edited by PurpleSunray on Thu Mar 16, 2017 6:31 am, edited 8 times in total.
~~ ॐ http://soundcloud.com/mfr ॐ ~~
- KVRian
- 774 posts since 13 Mar, 2012
Does this 'zero copy' feature also mean 'near zero latency'? Is processing latency affected by the fact that the operating system or some other app may also be using the same GPU? It's a shared resource, after all. I wonder how fast/frequently that resource may change hands.
You are on an embedded system here, which is different from running an app on a Windows PC.
One big difference:
You don't want to boot into the OS desktop shell; you want to boot your app (and display it on screen).
So the OS does not need to be the owner of the GPU as a shared resource.
You can also run the OS without any window system and render your app into the framebuffer directly via GBM / KMS / DRI. So you can be the "owner" of the GPU resource if you want (OK, the "owner" will be the driver; you own the pipeline that sends commands to the driver).
~~ ॐ http://soundcloud.com/mfr ॐ ~~
- KVRian
- 1331 posts since 26 Apr, 2004, from UK
PurpleSunray wrote:If you can use CUDA for your audio processing, do it. IMHO using CUDA is the only reason that justifies paying the TK1 price. If you cannot/don't want to use it, better look at Texas Instruments, Qualcomm, Allwinner, .. SoCs. They usually cost less.
Which means, everything is limited to FIR and waveshapers...
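A small sketch of the distinction, using my own toy functions rather than anything from a real plug-in: a waveshaper and a FIR filter compute each output sample from inputs only, so the samples can be evaluated in any order (or all at once), while a feedback filter in its direct form needs the previous output before it can produce the next one.

```python
import math


def waveshaper(x, drive=2.0):
    """Memoryless: y[n] depends only on x[n]."""
    return [math.tanh(drive * s) for s in x]


def fir_sample(x, h, n):
    """One FIR output sample; reads inputs only, never earlier outputs."""
    return sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)


def fir(x, h, order=None):
    """Compute FIR outputs in any index order; the result is the same."""
    order = range(len(x)) if order is None else order
    y = [0.0] * len(x)
    for n in order:
        y[n] = fir_sample(x, h, n)
    return y


def one_pole(x, a=0.5):
    """Feedback filter y[n] = x[n] + a*y[n-1]; direct form runs in sequence."""
    y, prev = [], 0.0
    for s in x:
        prev = s + a * prev  # needs the previous output first
        y.append(prev)
    return y


if __name__ == "__main__":
    x = [1.0, 0.0, 0.0, 0.0]
    print(fir(x, [1.0, 2.0]))
    print(fir(x, [1.0, 2.0], order=reversed(range(len(x)))))
    print(one_pole(x))
```

Running the FIR backwards gives the same output as running it forwards, which is what makes it easy to spread across many GPU threads; the one-pole loop has no such freedom when written this way.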
- KVRian
- 774 posts since 13 Mar, 2012
Miles1981 wrote:Which means, everything is limited to FIR and waveshapers...
... and analysis (FFT runs super-fast on CUDA).
~~ ॐ http://soundcloud.com/mfr ॐ ~~
-
- KVRist
- 245 posts since 7 Feb, 2017
PurpleSunray wrote:Miles1981 wrote:Which means, everything is limited to FIR and waveshapers...
... and analysis (FFT runs super-fast on CUDA).
My experience was that only batched CUDA FFTs run fast. Anything that minimized memory transfers between host and device, or avoided touching too much global memory for that matter, improved performance.
- KVRian
- 774 posts since 13 Mar, 2012
nonnaci wrote:Anything to minimize memory transfers between host-device and touching too much of global memory for that matter improved performance.
What are memory transfers between host and device?
We are talking about a SoC here. There is no data transfer like over the PCIe bus in a PC architecture. No idea how they manage CUDA buffers in detail, but graphics buffers are mapped to the CPU in uncached mode and then they simply switch domain when the driver needs to access them.
~~ ॐ http://soundcloud.com/mfr ॐ ~~