Tracktion and CPU Optimization, part 2

Discussion about: tracktion.com

Post

This thread is meant for the engineers at Tracktion. I would like to document this stuff for myself and for future users. I wrote about this before, and here is a follow-up after more testing.

I have a big PC and the CPU is seriously underutilized at less than 7% when Tracktion claims 75% and starts stuttering.

Specs For Context
(Skip this part if you don’t need specs)
My PC has a Ryzen 7 CPU with 8 cores. I use it to edit 4k video. It packs a punch.

When I work in Tracktion I keep an eye on the CPU indicator. When Tracktion says we are at 60%, my PC reports 5% actual utilization. When Tracktion starts stuttering at around 75%, the actual CPU reported by the Win10 Resource Monitor is at less than 7%, with no single core at even 30% utilization.

Tracktion and all plugins are on an SSD. I have a GPU with enough horsepower for an F1 car (not that T7 would need it). RAM is at around 30% utilization. As far as I can see, there is no bottleneck in the PC specs.


Optimizing: What I know So Far
My understanding is that T7 uses all the cores. The settings say 16 cores, which is a bit odd since the processor only has 8, but it does have 16 threads. So I'm guessing that every two threads share a core. This could be a bottleneck.

I believe T7 is sending each track to a ‘core’ (CPU thread). This means in theory that I could max out the first 8 tracks and each should have plenty of power to handle any plugin configuration. How does this work specifically?

I ask these questions in order to be conscious about track and plugin placements:

- For a CPU with 8 cores: do tracks 1-8 go onto individual cores? Would tracks 9-16 go onto the same cores in the same order (thus cutting their processing power in half)?
- For a CPU with 16 threads (like Ryzen 7): does that mean that with 16 tracks, each goes onto a CPU thread? And how is that handled by the 8 physical cores?
- Where does the Master go?

More observations:
- For a test project with 4 tracks, I noticed via Resource Monitor that CPU thread #1 is doing nothing, while threads 2 and 4 are doing most of the lifting (still under 30% utilization, though). Is this because some other system task has thread #1 reserved, so T7 is given access to threads 2 and 4 but not the other threads? If so, what recommendations should I follow to make sure more cores/threads are available to Tracktion?

- Using the T7 CPU Usage dialog, I’m not seeing the Master bus. Plugins in that space appear as “(Inside Rack)”. Those don’t seem to be measured. I also have a rack with reverb on one of the tracks. When I drop a hungry tube saturator onto the master, it crashes the audio even though it is the only plugin on the Master. Does this mean that all racks are on the same thread/core, including the Master bus?

- When I route a track into another track so I can put other plugins there (like the saturation), it feels like the load remains on the previous track. Does this mean that tracks that feed into another track also share the same thread/core?

Finally, I tried adjusting the buffer settings, to no avail. I increased the sample count and latency allowances significantly, but it seems to have no effect. What settings would you recommend to make maximum use of a powerful CPU?

I’m happy to perform any tests and report back and upload logs or anything that’s needed. Please point me in the right direction.

Post

If you haven't seen it already this post from ScanAudio has some interesting info on Ryzen and DAWs.
http://www.scanproaudio.info/2017/03/02 ... for-audio/

Over at the KVR Computer Setup and System Configuration forum, several users report good results with Ryzen, though I get that Waveform may be a unique case. You might find some insights there. Kaine is very respected over there.

Post

Thank you for that link. Their benchmarking suggests that Ryzen 7 has a bottleneck at the buffering stage: their tests showed the audio breaking up at less than 100% CPU utilization, and even 15-20% lower than on Intel architectures.

Another big question I have is what exactly is Tracktion measuring and reporting as CPU utilization? What exactly is hitting 75% utilization within T7 when audio breaks, since my PC reports the CPU at 6%?

The benchmarking link has a test that shows audio breaking at 70-85% 'load'. Is this the same type of load that Tracktion is reporting, or the same type of load the Win10 Resource Monitor is reporting (around 6%)?
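As a rough illustration of why the two numbers can differ so much: DAW "CPU" meters typically report DSP load, the fraction of each buffer period spent inside the audio callback, not OS-wide CPU utilization averaged over all cores. This is a minimal sketch under assumed values (44.1 kHz, 512-sample buffers); `dsp_load` is a hypothetical name, not a Tracktion function.

```python
SAMPLE_RATE = 44100
BUFFER_SIZE = 512  # samples per audio callback (assumed for illustration)

def dsp_load(process_seconds):
    """DAW-style 'CPU' meter: fraction of one buffer period spent processing.

    The callback must produce BUFFER_SIZE samples within the buffer period;
    audio breaks up as this ratio approaches 1.0, even while an OS-wide CPU
    meter (which averages over all logical cores) still reads a few percent.
    """
    buffer_period = BUFFER_SIZE / SAMPLE_RATE  # ~11.6 ms at 44.1 kHz / 512
    return process_seconds / buffer_period

# Example: one worker thread spending 8.7 ms of an ~11.6 ms window reads ~75%,
# while one busy thread out of 16 shows roughly 6% in Task Manager.
print(round(dsp_load(0.0087), 2))  # -> 0.75
```

This would explain a Tracktion meter at 75% alongside Resource Monitor at 6%: one deadline-bound audio thread can be nearly out of time while the machine as a whole is almost idle.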

I'll continue looking into this and reporting findings. I would really like to hear insights from the Tracktion team.

Post

I think data transport from the hard drive plays an important role too.
My notebook has an Intel Rapid Storage system driver that seems to optimize a lot of that.
It is not always clear where the wait time for the disk controller shows up, but it can certainly make the CPU appear lazy or idle.
That would be another piece of this complex problem.

Post

re: transport from the hard drive
That's a point to consider. I figured that in my case being on an SSD should be enough to not have to worry about it, but there could be a bottleneck in the data transfer regardless.

I have enough RAM that I would want Tracktion to just load all files and data to RAM so there would be no drive transfer needed, but looking at the small RAM utilization it seems that Tracktion 7 is doing most stuff from the disk.

But monitoring disk usage, I'm seeing less than 3 MB/s read by T7 during playback, which leaves more than 10x headroom for any hard drive. In my case, Resource Monitor puts disk utilization at less than 7%. I figure this is probably why T7 doesn't bother loading everything into RAM.
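A quick back-of-the-envelope check supports this: uncompressed audio streaming rates are tiny compared with SSD bandwidth. This sketch assumes 24-bit (3-byte) stereo audio at 44.1 kHz; `stream_rate_mb_s` is an illustrative helper, and real projects vary in bit depth and channel count.

```python
def stream_rate_mb_s(tracks, sample_rate=44100, bytes_per_sample=3, channels=2):
    """Raw streaming rate in MB/s for uncompressed multitrack playback.

    Assumes 24-bit (3-byte) samples and stereo files; treat this as a rough
    sanity check rather than a measurement of what any DAW actually reads.
    """
    return tracks * sample_rate * bytes_per_sample * channels / 1_000_000

# Even 16 stereo 24-bit tracks stream at ~4.2 MB/s, a tiny fraction of what
# an SSD (hundreds of MB/s) or even a spinning disk can sustain.
print(round(stream_rate_mb_s(16), 1))  # -> 4.2
```

So a ~3 MB/s observed read rate is entirely consistent with disk not being the bottleneck here.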

Post

I don't see the point of this. The version of Tracktion you're using is two versions old, and anything you're talking about could have been obsoleted twice over by newer versions.
Anything directed 'at the engineers' should be based on the behaviour of Waveform 9 to be relevant.
OFFICIAL free competition: if you see martiu post 'why you mad homie' or similar after this post, you are a prizewinner, and a free reverb plugin is yours. PM martiu immediately, and ask for your prize. He'll give it to you, baby!

Post

Then I would like to hear an answer for W9. All the points stand: how are tracks, racks and the master routed onto cores, what are the bottlenecks to look out for, and how do I optimize for Ryzen 7? If Waveform has been optimized significantly, I'd like to hear about it.

But even as a user of T7, I don't think it is over the top to ask questions about my software. I believe this stuff should have been documented regardless. I'm not asking to start a philosophical discussion - I actually am using T7 and want to know.

Post

The problem is that this is a moving target. We're constantly optimizing things when we spot bottlenecks, so any advice given may go out of date fairly rapidly. The other thing is that if you're re-structuring Edits to take advantage of how we've optimised internally, then we've failed somewhere...

Having said that, (and with the caveat that this is subject to change) here's roughly how we do things:

• The number of cores available to W9 is the number of logical cores; this takes into account hyperthreading, which doubles the count of physical cores. Take a look at Windows Task Manager and you'll see the same thing. It's just how modern CPUs work.
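For what it's worth, this logical-core count is what standard APIs report too. A minimal check (the comment about Ryzen 7 is an assumption about a typical SMT configuration, not a guarantee for every CPU):

```python
import os

# Logical (hardware-thread) count, the same number Task Manager and the
# DAW settings report.
logical = os.cpu_count() or 1

# On an SMT/hyperthreaded CPU this is typically 2x the physical core count,
# e.g. an 8-core Ryzen 7 reports 16. Two logical cores sharing one physical
# core do NOT double throughput; they share the core's execution resources.
print(logical)
```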

• We parallelise processing by track (where possible), so essentially multiple tracks can be processed by different cores at the same time. There are some exceptions to this, such as submix tracks, which need to be processed in series.
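To illustrate the shape of that scheduling (this is a toy sketch, not Tracktion's implementation; `process_track` and `render_block` are hypothetical names): independent tracks fan out to a worker pool, while a submix acts as a join point that can only run after all of its input tracks have finished.

```python
from concurrent.futures import ThreadPoolExecutor

def process_track(name):
    # Stand-in for one track's serial chain: clips -> plugin 1 -> plugin 2 -> ...
    return f"{name}:done"

def render_block(tracks, submix_children):
    with ThreadPoolExecutor() as pool:
        # Independent tracks (including the submix's children) run in parallel,
        # each on whatever worker thread the pool hands them.
        done = dict(zip(tracks, pool.map(process_track, tracks)))
    # The submix is a join point: it can only run once every child has
    # finished, so it is processed in series after the parallel phase.
    return process_track("submix(" + ",".join(done[c] for c in submix_children) + ")")

print(render_block(["drums", "bass", "vox1", "vox2"], ["vox1", "vox2"]))
```

This also matches the earlier observation in the thread that routing one track into another keeps the load on a single thread: the downstream track cannot start until its feed is complete.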

• Audio file reading is done by memory mapping the files and using the OS to page in the sections about to be read.
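Memory-mapped reading looks roughly like this in miniature (a generic sketch of the technique, not Tracktion's code; the file here is fake data written to a temp directory):

```python
import mmap
import os
import tempfile

# Write a stand-in "audio file" to disk (4096 bytes of fake sample data).
path = os.path.join(tempfile.mkdtemp(), "clip.raw")
with open(path, "wb") as f:
    f.write(bytes(range(256)) * 16)

# Memory-map the file: no explicit read() calls. The OS pages data in on
# first access and can read ahead for the sections about to be played.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        block = mm[1024:1024 + 512]  # the bytes backing one playback buffer

print(len(block))  # -> 512
```

One consequence of this approach: disk traffic shows up as OS page-in activity rather than large application reads, and little of the project needs to be resident in RAM, which fits the low RAM utilization observed earlier in the thread.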

• If you want to free up resources, the best thing to do is simply freeze some tracks.

• In our experience, it's almost always plugins that use the most CPU so maybe prod them to optimise a bit if you've got some particularly hungry ones.

Post

Thank you for jumping in. For clarity:

How is the master handled, and how do track racks play into things?

About Submixes
I understand these run on the same core. Is that because of sync/timing concerns?

Does that mean that if I have a vocal submix with 4 tracks of processing, all of these tracks and effects are being handled by the same CPU thread?


Tracktion CPU Meter
Can you give me an idea of what this is measuring? It seems to be more than just the CPU utilization reported by Win10. Are you doing a calculation that includes buffering/bandwidth?


Re: Hungry plugins
It seems that the best sounding ones are the hungriest. Usually doing some oversampling and saturation processing. I expect that as computers get more powerful, plugin designers will get bolder with the use of CPU instead of the other way around. But at the same time, CPUs with 8-16 cores are becoming the norm.

Post

>> Audio file reading is done by memory mapping the files and using the OS to page in the sections about to be read.

So this is the point where tuning at the OS level becomes very relevant.
My Intel enhanced storage driver seems to support this concept well.
The Win7 Task Manager tells me that T6 already uses more than 80% of my i7 as a whole (4 physical cores).

Another crucial point seems to be: set the audio device to exclusive Tracktion use.

Post

Is the track-based thread philosophy rooted in JUCE?

It might be a very difficult project for later, but innovation in processing the net graph structure, including a very general latency-sync algorithm and a new paradigm that assigns threads at plugin granularity when that would help a congested track (perhaps some smart bucket-brigade technique), would hopefully be the real big deal.

Post

It's not really anything to do with JUCE; the audio graph is largely proprietary (apart from the plugin hosting, obviously). It's more that the engine didn't used to be multi-threaded, and per-track threading was a straightforward way to add it.

Yes, we would like to re-work the audio graph to be processed at a more granular level; it's just a very difficult and time-consuming job which we absolutely don't have time to do at the moment. I did write about the process on here in detail a while ago, though.

Post

Re: track based thread philosophy...

From the discussions I have seen on the subject, per-track, per-core audio processing is a necessity of processing the audio chain sequentially: left to right, input to output, including any plugins.

All audio events on a single track's signal chain depend on the computation that precedes them, so there is just no way to do these operations out of order, or even in parallel. That makes multi-threading a single track, i.e. processing its insert plugins on multiple cores, a moot question.

This concept would also lead one to realize that audio summing operations involved in submix tracks would have a similar dependency.

I would be interested in how the master bus or track plays into this as well. Would you dedicate a single core to it? I would assume the master bus summing calculations would also need to be done on one core.
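The dependency argument above can be sketched concretely (a toy example, not any DAW's actual code; `run_chain`, `gain`, and `clip` are made-up stand-ins for insert plugins):

```python
from functools import reduce

def run_chain(samples, plugins):
    # Each plugin consumes the previous plugin's output, so a single track's
    # chain is inherently serial: plugin 2 cannot run before plugin 1.
    return reduce(lambda buf, plugin: plugin(buf), plugins, samples)

gain = lambda buf: [s * 0.5 for s in buf]                  # halve level
clip = lambda buf: [max(-1.0, min(1.0, s)) for s in buf]   # hard limit

# Different tracks have no mutual dependency, so these two calls could run
# on different cores at the same time.
track_a = run_chain([0.8, 2.4], [gain, clip])   # -> [0.4, 1.0]
track_b = run_chain([0.2, -0.6], [gain, clip])  # -> [0.1, -0.3]

# The master is a join point: summing needs every track's final output,
# so it naturally runs on one thread after the per-track work finishes.
master = [a + b for a, b in zip(track_a, track_b)]
print(master)
```

The same join-point structure applies to submixes, which is presumably why they are processed in series with their inputs.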

Post

Processors are going the way of slower clock speed and more cores. Eventually it will be paramount to be able to utilize those cores aggressively.

Without presuming to know how a modern DAW is wired inside, I understand that compromises need to be made to keep latency down and tracks synced. But for editing (not recording), I don't see why we aren't using a more powerful scheme based on render buffers.

For instance, in edit mode, the DAW could start rendering 1 sec before actually playing sound to the speakers. This would allow a ton of processing to happen, and the ability to compute the audio far enough to send the audio to additional tracks to be processed by a separate CPU thread.

We already have the technology to do that; the freeze plugin works that way (making a rendered buffer). I'm suggesting a smaller render buffer of about a second, which would be enough to reach the Aux Send and start the next track as a separate process fed from that rendered buffer.

For the user this means a very short delay when you first press play. Not a big deal at all. But the ability to break up tracks and plugins can be massive in terms of usability and maximizing processor usage.

The next level beyond that could allow the computer to process mixes that simply couldn't run in real time. Using this buffering scheme, it could estimate how far behind live playback is falling (based on CPU usage and dropped samples) and expand the upfront delay to cope. So if I load a project that needs 15% more than my CPU can handle in live playback, no problem: use the buffer system and apply a 4-second delay before sending audio to the speakers.
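The proposed scheme is essentially a producer/consumer pipeline with a lookahead FIFO. Here is a minimal sketch under assumed numbers (512-sample blocks, ~1 second of lookahead); `renderer` and the block counts are illustrative, and a real engine would feed an audio device instead of counting samples.

```python
import queue
import threading
import time

BLOCK = 512              # samples per rendered block (assumed)
LOOKAHEAD_BLOCKS = 86    # ~1 s of audio at 44.1 kHz rendered ahead of playback

fifo = queue.Queue(maxsize=LOOKAHEAD_BLOCKS)

def renderer(n_blocks):
    # Runs as fast as the CPU allows; it may be slower than real time in
    # bursts, as long as the FIFO never drains during playback.
    for _ in range(n_blocks):
        fifo.put([0.0] * BLOCK)  # placeholder for an expensively rendered block

t = threading.Thread(target=renderer, args=(100,))
t.start()

# Playback starts only once the lookahead is filled: that wait is the short
# user-visible delay after pressing play.
while fifo.qsize() < LOOKAHEAD_BLOCKS:
    time.sleep(0.001)

played = 0
for _ in range(100):
    block = fifo.get()   # in a real DAW this would feed the audio device
    played += len(block)
t.join()
print(played)  # -> 51200
```

Growing `LOOKAHEAD_BLOCKS` is exactly the "expand the delay upfront" idea: a bigger FIFO buys the renderer more slack at the cost of a longer wait before sound starts.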

Post

The only way that could work well in real-time processing is if you had some sort of quantum computer that could look into the future to see events that haven't happened yet, then return those events back to the present for processing, so that you get your processed output 'now'.

The alternative is to create a buffer that collects events that have already happened and process them as needed. That is not real-time but more of a rendering process. It works great for video games, video editing, and huge number-crunching tasks, but not so much for real-time audio.
