Quartz/CoreGraphics is broken. What to do now?


Post

I am using Cairo software rendering with my Youlean Loudness Meter plugin. I tried to set up the OpenGL backend on Windows, but it was really complicated and didn't work in the end. I also tried hiring someone to do that job, but that didn't work out either. On the Mac I have not tried it yet, but with MoltenGL it might be easier. So, this stuff can be a real pain to set up.

Now about rendering: hardware-accelerated graphics will be awesome for fills, but strokes will always use some CPU. In my experience some implementations are better than others, but all suck. The Cairo OpenGL backend on Windows uses a lot of CPU to calculate the stroke path, so if you are stroking you will basically get performance similar to Cairo software rendering.

Now, Cairo actually has path caching, so it will help for static content, but a (good) alternative for graphics, NanoVG, does not have path caching (unless they have added it lately), and it will perform even worse even though it is hardware-accelerated as well.

For plugin GUIs the best solution is NanoVG, since it supports OpenGL and Metal backends, is a very light library and is easy to integrate. If you want the best performance and features, there is nothing better than Skia, but good luck setting that up.

You can take a look at iPlug2. There you can check and benchmark NanoVG Metal, OpenGL, Cairo Quartz and Cairo software rendering. https://github.com/iPlug2/iPlug2

In the end, since my plugin uses a lot of stroking, I decided that the best way to improve performance is actually to reduce the number of points drawn. You never want to draw more points than there are pixels on the screen. If you don't have a lib to downscale the arrays dynamically, send me a PM and I will give you mine.
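A minimal sketch of the "never more points than pixels" idea, assuming a plain array of samples and a known graph width in pixels. This is not Youlean's library; `decimateToPixels` and `MinMax` are hypothetical names. It keeps the minimum and maximum of each bucket so that peaks are not lost when the data is reduced.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One screen column: the lowest and highest sample that fall into it.
struct MinMax {
    float lo;
    float hi;
};

// Reduce `samples` to at most `pixelWidth` columns (hypothetical helper).
// Each column stores the min and max of its bucket so peaks stay visible.
std::vector<MinMax> decimateToPixels(const std::vector<float>& samples,
                                     std::size_t pixelWidth)
{
    std::vector<MinMax> columns;
    if (samples.empty() || pixelWidth == 0)
        return columns;

    const std::size_t n = samples.size();
    const std::size_t count = std::min(n, pixelWidth);
    columns.reserve(count);

    for (std::size_t i = 0; i < count; ++i) {
        // Bucket i covers [begin, end); integer maths keeps buckets contiguous.
        const std::size_t begin = i * n / count;
        const std::size_t end = (i + 1) * n / count;
        assert(begin < end); // holds because count <= n

        const auto range = std::minmax_element(samples.begin() + begin,
                                               samples.begin() + end);
        columns.push_back({*range.first, *range.second});
    }
    return columns;
}
```

Each column can then be drawn as one short vertical line from `lo` to `hi`, so the stroke cost scales with the width of the graph rather than with the length of the data.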

Post

Unfortunately Arne's fix only works from 10.8 and we're still on 10.7 due to 32-bit Carbon support. We might have to drop Carbon and/or 32-bit. Ah well, maybe it's time...

I'll first try to reduce everything else. We already have a very good optimizePath() function and a super fast path2Outline, which I'll try with CGFillPath, which seems a lot faster than CGStrokePath. I'll add a path optimization accuracy parameter to our graphs so that we can measure performance vs. visual tradeoffs.
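The thread does not show how optimizePath() works, so the following is only an illustration of what an accuracy parameter can look like. It swaps in the well-known Ramer-Douglas-Peucker simplification, where `tolerance` is the maximum allowed deviation in the same units as the points (for example pixels). `Point` and `simplifyPath` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point {
    double x;
    double y;
};

// Perpendicular distance from p to the infinite line through a and b.
static double distanceToLine(const Point& p, const Point& a, const Point& b)
{
    const double dx = b.x - a.x;
    const double dy = b.y - a.y;
    const double len = std::hypot(dx, dy);
    if (len == 0.0)
        return std::hypot(p.x - a.x, p.y - a.y);
    return std::fabs(dy * (p.x - a.x) - dx * (p.y - a.y)) / len;
}

// Mark the points between first and last that must survive simplification.
static void simplifyRange(const std::vector<Point>& pts, std::size_t first,
                          std::size_t last, double tolerance,
                          std::vector<bool>& keep)
{
    if (last <= first + 1)
        return;

    double maxDist = 0.0;
    std::size_t index = first;
    for (std::size_t i = first + 1; i < last; ++i) {
        const double d = distanceToLine(pts[i], pts[first], pts[last]);
        if (d > maxDist) {
            maxDist = d;
            index = i;
        }
    }

    // Only split when the farthest point deviates more than the tolerance.
    if (maxDist > tolerance) {
        keep[index] = true;
        simplifyRange(pts, first, index, tolerance, keep);
        simplifyRange(pts, index, last, tolerance, keep);
    }
}

// Larger tolerance: fewer points, faster drawing, less visual accuracy.
std::vector<Point> simplifyPath(const std::vector<Point>& pts, double tolerance)
{
    assert(tolerance >= 0.0);
    if (pts.size() < 3)
        return pts;

    std::vector<bool> keep(pts.size(), false);
    keep[0] = true;
    keep[pts.size() - 1] = true;
    simplifyRange(pts, 0, pts.size() - 1, tolerance, keep);

    std::vector<Point> result;
    for (std::size_t i = 0; i < pts.size(); ++i)
        if (keep[i])
            result.push_back(pts[i]);
    return result;
}
```

Running the same graph through several tolerance values and timing the draw call for each is one way to measure the performance versus visual quality tradeoff mentioned above.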

We also got the recommendation to use CALayers instead of CGImages. Maybe it'll help if we just draw our graphs into CALayers and let them render asynchronously.

Another idea is to start profiling UI drawing right in our render function and maybe display options for the user to switch certain features on or off to improve performance.
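One way such in-place profiling could be sketched, assuming the render function is split into named stages. `DrawProfiler`, `ScopedDrawTimer` and the stage names are all hypothetical; the timer measures wall-clock time with `std::chrono::steady_clock`, so it only sees CPU-side cost and not work the GPU does asynchronously.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Accumulated cost of one named drawing stage (stage names are hypothetical).
struct StageStats {
    double totalMs = 0.0;
    unsigned calls = 0;
};

class DrawProfiler {
public:
    void add(const std::string& stage, double ms)
    {
        assert(ms >= 0.0);
        StageStats& s = stats_[stage];
        s.totalMs += ms;
        ++s.calls;
    }

    // Average cost per call in milliseconds, or 0 if the stage never ran.
    double averageMs(const std::string& stage) const
    {
        const auto it = stats_.find(stage);
        if (it == stats_.end() || it->second.calls == 0)
            return 0.0;
        return it->second.totalMs / it->second.calls;
    }

    // True when a stage costs more than the budget, i.e. a candidate for
    // an option the user can switch off.
    bool overBudget(const std::string& stage, double budgetMs) const
    {
        return averageMs(stage) > budgetMs;
    }

private:
    std::map<std::string, StageStats> stats_;
};

// RAII timer: measures from construction to destruction of the object.
class ScopedDrawTimer {
public:
    ScopedDrawTimer(DrawProfiler& profiler, std::string stage)
        : profiler_(profiler), stage_(std::move(stage)), start_(Clock::now()) {}

    ~ScopedDrawTimer()
    {
        const std::chrono::duration<double, std::milli> elapsed =
            Clock::now() - start_;
        profiler_.add(stage_, elapsed.count());
    }

    ScopedDrawTimer(const ScopedDrawTimer&) = delete;
    ScopedDrawTimer& operator=(const ScopedDrawTimer&) = delete;

private:
    using Clock = std::chrono::steady_clock;
    DrawProfiler& profiler_;
    std::string stage_;
    Clock::time_point start_;
};
```

Inside the render function, a block such as `{ ScopedDrawTimer t(profiler, "spectrum"); /* draw the spectrum */ }` would record that stage, and `overBudget("spectrum", 2.0)` could drive whether the corresponding option is suggested to the user.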

We'll see...

Post

Yes, please drop 32-bit and Carbon for the sake of a performance speedup :)

Post

Hanz Meyzer wrote: Fri Mar 15, 2019 1:34 pm Yes, please drop 32-bit and Carbon for the sake of a performance speedup :)
Well, it's not even clear how much work it really is and how much gain.

I have some brutal ideas though. We could just layer two NSViews. One for all things background, knobs, graphs and imagery, one on top of it which just does text. We'd just need to design the UI in a way that text is never covered by anything else.

Post

I have a 2018 Mac mini and performance regarding OpenGL is horrible. It eats up so much CPU. Looking forward to whatever changes you implement. :)

Post

flocked wrote: Fri Mar 15, 2019 5:03 pm I have a 2018 Mac mini and performance regarding OpenGL is horrible. It eats up so much CPU. Looking forward to whatever changes you implement. :)
Have you tried an older Mac SDK? I ask because there are reports that the latest one slows down buffer swapping.

Post

Youlean wrote: Fri Mar 15, 2019 1:02 pm Now about rendering: hardware-accelerated graphics will be awesome for fills, but strokes will always use some CPU. In my experience some implementations are better than others, but all suck. The Cairo OpenGL backend on Windows uses a lot of CPU to calculate the stroke path, so if you are stroking you will basically get performance similar to Cairo software rendering.
I could be wrong, but I'm under the impression that Cairo tessellates the geometry into either triangles or spans on the CPU. The thing is, for end-to-end performance (fresh paths, can't cache) and complex paths (and strokes tend to be somewhat of a worst case), this is essentially the same basic algorithm as the slowest part of a software scanline rasteriser. In fact, for complex enough strokes (hello FFT spectrum plots), the actual pixel processing cost is usually close to irrelevant in comparison.

In fact, it turns out that it is potentially faster to just rasterise an alpha mask directly and upload it as a texture. I believe this is what Skia does(?) for complex paths, at least when NV_path_rendering is not available. I'm also under the impression that Skia, at least in the past, had some issues (especially on mobile devices) with upload bandwidth, because you might end up having to upload quite a bit more data than what you'd need if you just rendered it all on the CPU. Still, this approach has its benefits when you can cache and/or want to mix things with other GPU content.

The NV_path_rendering approach of stencil+cover can be done without the fancy extension as well, and it's actually reasonably fast. That's what Qt has been doing for ages, unless they've changed it recently. The problem with this approach is that it can get kinda fill-rate heavy, takes a lot of state switching and usually relies on hardware MSAA for anti-aliasing, so it relies somewhat more heavily on the quality of the GPU available. I'd also imagine it probably works very poorly with tile-rendering mobile GPUs.

Personally, I've pretty much just given up and use a custom all-software rendering pipeline.

Post

Interesting thread!
Here is a different approach that we developed recently because of the same problems. We're getting 60FPS on iOS, Win and Mac without problems.

At the bottom there is a simple OGL (later Vulkan, I guess) based sprite renderer. It renders sprites from a sprite atlas. The sprite atlas is composed of text, knobs, etc.: vector graphics rendered on the CPU (using a slow software renderer) upon first use, when the size is known, rasterised to pixel-perfect size and uploaded to the texture atlas dynamically.

The UI code composing these sprites is an immediate-mode UI, where everything is rendered every frame. The UI code is heavily inspired by game development, where you think of the screen as composed of "slots" that need to be filled by some (changing and moving) sprites. Thus, the sprite hierarchy forms a kind of virtual DOM, like in React.

It's still a work in progress, and there are drawbacks, but I like the feel of it and it performs well even though we haven't done any optimizations yet.
JamOrigin.com

Like us on Facebook.com/JamOrigin and follow us on Twitter @JamOrigin

Post

JamOrigin wrote: Sat Mar 16, 2019 8:43 am The UI code composing these sprites is an immediate-mode UI, where everything is rendered every frame.
It's probably worth keeping in mind that immediate mode UIs on mobile devices (whether it's a phone or a laptop) are great at draining battery, even when the application is otherwise idle. One can argue whether that's a huge concern for an audio application that is probably going to be draining a lot of battery anyway, but in general minimizing the amount of work that runs on a fixed timer definitely has a non-trivial impact on battery life.

Post

Yes, surely the GPU will drain some battery at 60 FPS, but that's also the case with CoreGraphics if you want to render at a reasonably smooth frame rate, even more so.

In our framework the OpenGL renderer runs in its own thread, so for audio meters and stuff we could do with say 10 FPS, similar to what you can do with CoreGraphics, but at lower power drain.

EDIT:
An interesting optimization that we haven't implemented yet is to diff the entire sprite hierarchy each frame. In our case a sprite is just a bounding box and a texture ID (~20 bytes). For most audio plugins, consecutive frames are almost identical (some meters or envelopes moved), so the amount of data to upload to the GPU is very sparse. I believe React does exactly this with its virtual DOM tree.
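A rough sketch of what such a per-frame diff could look like, assuming the sprite hierarchy has been flattened into a list whose slot order is stable between frames. That flattening is an assumption made here, and `Sprite` and `changedSlots` are hypothetical names rather than JamOrigin's code. With four floats for the bounding box and a 32-bit texture ID, the struct is 20 bytes on typical platforms, which matches the size mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A sprite as described in the post: a bounding box plus a texture id.
struct Sprite {
    float x, y, w, h;
    std::uint32_t textureId;
};

inline bool operator==(const Sprite& a, const Sprite& b)
{
    return a.x == b.x && a.y == b.y && a.w == b.w && a.h == b.h &&
           a.textureId == b.textureId;
}

// Return the indices of slots in `current` that differ from `previous`.
// Slots beyond the end of `previous` count as changed (newly added).
// Slots that disappeared are not reported; the caller can compare sizes.
std::vector<std::size_t> changedSlots(const std::vector<Sprite>& previous,
                                      const std::vector<Sprite>& current)
{
    std::vector<std::size_t> changed;
    for (std::size_t i = 0; i < current.size(); ++i) {
        if (i >= previous.size() || !(previous[i] == current[i]))
            changed.push_back(i);
    }
    // Sanity check: we can never report more slots than the frame has.
    assert(changed.size() <= current.size());
    return changed;
}
```

Only the slots returned by `changedSlots` would need to be re-uploaded to the GPU buffer; if the list is empty, the previous frame can be reused as is.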

Post

I also seriously recommend Skia as the graphics solution for DAWs and VST plugins. It's not a simple library, but 2D graphics is not a simple problem, and the difficulty of setting up your development environment is a piece of cake compared to solving the shortcomings of all other alternatives like NanoVG. I believe the Skia team has pushed the envelope of 2D graphics on all fronts.

A disadvantage is that the latest version requires MacOS 10.10+, which in this industry would drop support for around 5-15% of Mac users.

OpenGL on MacOS 10.14 is hilariously bad. Especially now that everyone has >2560x1600 pixel displays with (IMO) less-than-sufficient GPUs to drive that resolution. It seems that 9% CPU on one core is the best anyone can get for drawing a triangle at 60 fps. Windows and Linux can do it with double-buffering in less than 1% CPU. I don't really see a solution to fixing OpenGL on Mojave, so I'd recommend just using Skia+Vulkan+MoltenVK or MetalNanoVG. The Metal backend of Skia is in development but far from completion, although this is good to keep in mind in a year when it's time to drop MacOS <10.10 support.

As a side note, if anyone wants to move toward browser-based audio software, the most performant option is to maintain an SVG document as your render state. This will take advantage of Skia caching in Firefox and Chrome, which you don't get in canvas 2d or webgl unless you write your own bounding box cache manually.
VCV Rack, the Eurorack simulator

Post

mystran wrote: Thu Mar 14, 2019 10:35 pm
Urs wrote: Thu Mar 14, 2019 10:29 pm
mystran wrote: Thu Mar 14, 2019 10:08 pm Write an OpenGL code-path (deprecated or not) that resolves the pressing issues, then file a TODO item about replacing said code-path with a Metal alternative "when time permits." :D
Argh. I really wish for something a tad more permanent. I've done OpenGL 20 years ago, I know the pains.
[…]

Keep in mind that once you require Metal, you'll also require macOS 10.11 minimum. Whether that's a concern for you, I don't know... but it's something to keep in mind.
macOS 10.11 minimum on a recent enough machine, compatible with macOS 10.14 Mojave. I believe that is:

MacBook (Early 2015 or newer)
MacBook Pro (Mid 2012 or newer)
MacBook Air (Mid 2012 or newer)
Mac mini (Late 2012 or newer)
iMac (Late 2012 or newer)
Mac Pro (Late 2013)

For instance, my MacBook Pro is from 2011, is running macOS 10.13 High Sierra and is neither Metal- nor Mojave-compatible.

Post

JamOrigin wrote: Sat Mar 16, 2019 8:43 am Interesting thread!
Here is a different approach that we developed recently because of the same problems. We're getting 60FPS on iOS, Win and Mac without problems.

At the bottom there is a simple OGL (later Vulkan, I guess) based sprite renderer. It renders sprites from a sprite atlas. The sprite atlas is composed of text, knobs, etc.: vector graphics rendered on the CPU (using a slow software renderer) upon first use, when the size is known, rasterised to pixel-perfect size and uploaded to the texture atlas dynamically.

The UI code composing these sprites is an immediate-mode UI, where everything is rendered every frame. The UI code is heavily inspired by game development, where you think of the screen as composed of "slots" that need to be filled by some (changing and moving) sprites. Thus, the sprite hierarchy forms a kind of virtual DOM, like in React.

It's still a work in progress, and there are drawbacks, but I like the feel of it and it performs well even though we haven't done any optimizations yet.
I'm doing something very similar.

I have a layer of abstraction so that I can use DirectX 11 on Windows 7+ and OpenGL 3.2 on macOS 10.9+. Very light on CPU (barely tickles the average computer at 60 fps) but you have to code as though you're writing a 2D game using a texture atlas and sprite batching - only render "paths" when you can't use "sprites" - you certainly wouldn't render text as paths, you would pre-rasterize it into a texture sitting on the GPU. A modern PC can render millions of sprites per frame, whereas a plugin will be rendering a few hundred. If anyone is interested, Microsoft provide an example of a sprite batcher (a key building block for 2D graphics on the GPU) in C++: https://github.com/Microsoft/DirectXTK/wiki/SpriteBatch.
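To make the batching idea concrete, here is a small CPU-side sketch that is independent of any graphics API. It is not DirectXTK's SpriteBatch and not the poster's code; `SpriteQuad`, `Vertex`, `Batch` and `buildBatches` are hypothetical names. It walks the sprites in draw order and starts a new batch only when the texture changes, so sprites that share one atlas end up in a single vertex buffer and a single draw call.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A sprite to draw: screen rectangle, atlas rectangle (UVs) and texture id.
struct SpriteQuad {
    float x, y, w, h;
    float u0, v0, u1, v1;
    std::uint32_t texture;
};

struct Vertex {
    float x, y, u, v;
};

// Everything in one batch can be submitted with a single draw call.
struct Batch {
    std::uint32_t texture;
    std::vector<Vertex> vertices;
};

// Group consecutive sprites that share a texture, preserving draw order.
std::vector<Batch> buildBatches(const std::vector<SpriteQuad>& sprites)
{
    std::vector<Batch> batches;
    for (const SpriteQuad& s : sprites) {
        assert(s.w >= 0.0f && s.h >= 0.0f);

        if (batches.empty() || batches.back().texture != s.texture)
            batches.push_back(Batch{s.texture, {}});

        std::vector<Vertex>& v = batches.back().vertices;
        const float x1 = s.x + s.w;
        const float y1 = s.y + s.h;

        // Two triangles per sprite, six vertices in total.
        v.push_back({s.x, s.y, s.u0, s.v0});
        v.push_back({x1, s.y, s.u1, s.v0});
        v.push_back({x1, y1, s.u1, s.v1});
        v.push_back({s.x, s.y, s.u0, s.v0});
        v.push_back({x1, y1, s.u1, s.v1});
        v.push_back({s.x, y1, s.u0, s.v1});
    }
    return batches;
}
```

The backend behind the abstraction layer (DirectX 11, OpenGL or later Metal) would then upload each batch's vertices and issue one draw per batch. With a single atlas, a few hundred sprites collapse into one batch.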

GPU-based rendering is not something you can just shim into existing plugin code and expect to get good performance. When you're using the GPU, both your architecture and your graphics need to be designed around the way the GPU wants you to do things.

I use a combination of ActionScript and Python to compile texture atlases directly from Adobe Illustrator artwork at predefined pixel densities and spit out C++ files with PNGs embedded as byte arrays, ready to decompress and send to the GPU on startup. The render code is pretty simple compared to the build pipeline for setting it up. A good tip I was given is to do as much heavy lifting at build time as possible, especially if you're dealing with multiple platforms. The downside is that it will be a lot of effort to make the necessary changes to your code; the upside is that it will be very fast if it's done right.
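The poster's pipeline uses ActionScript and Python and is not shown in the thread. Purely as an illustration of the "embed the PNG as a byte array" step, here is a sketch of a build-time helper written in C++ that turns a buffer of bytes into C++ source text. `emitByteArray` and the symbol naming are hypothetical, and reading the PNG file from disk is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Produce C++ source that defines `symbol` as an array holding `bytes`,
// plus `<symbol>_size` with the length. Hypothetical build-time helper.
std::string emitByteArray(const std::string& symbol,
                          const std::vector<unsigned char>& bytes)
{
    // A zero-length array is not valid C++, so require at least one byte.
    assert(!bytes.empty());

    std::ostringstream out;
    out << "const unsigned char " << symbol << "[] = {";
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i != 0)
            out << ", ";
        // Two hex digits per byte, e.g. 0x0a.
        out << "0x" << std::hex << std::setw(2) << std::setfill('0')
            << static_cast<unsigned>(bytes[i]);
    }
    out << std::dec << "};\n";
    out << "const unsigned int " << symbol << "_size = " << bytes.size()
        << ";\n";
    return out.str();
}
```

The generated text would be written to a .cpp file by the build script and compiled into the plugin, so no image files need to be located or loaded from disk at runtime.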

If you've got an abstraction layer, Metal can be added when Apple turns off OpenGL support, so you're effectively future-proof... until quantum GPUs at least :-)

Post

Have a look @ NanoVG.
It has ports for OpenGL & Metal & DX11
