u-he on Apple silicon (Updated)

Official support for: u-he.com
KVRAF
8285 posts since 16 Aug, 2006

Post Mon Nov 16, 2020 12:24 pm

david.beholder wrote:
Mon Nov 16, 2020 12:21 pm
Depends on the definition of "moderate". If you want to run Ableton and several instances of Diva, don't even look at low-performance laptops. YMMV of course.
I've got a beefy desktop for serious work, but I'd like to be able to run a few VSTi's without my laptop glitching and stuttering (which is my current state). My current laptop really was made for web browsing and productivity, and is several years old. Would just like something that could handle basic audio work without major glitches.

KVRist
104 posts since 6 Apr, 2020

Post Thu Nov 19, 2020 2:28 am

There was a benchmark done with Logic and Diva:

Mac Mini M1 vs. iMac i9 10-core:

Mac Mini could handle 24 instances of Diva and the iMac could handle 68 instances.

So in terms of Diva, the M1 provides about 35% of the performance of an i9 10-core. But it costs just 26% of the price of the i9 ($699 vs $2699). Yes, the iMac has a display. So YMMV.

But I am eager to see if u-he can optimize the performance. Right now it looks like it is holding up well. But I guess this is all using Rosetta 2 with Diva.

LINK:
https://www.youtube.com/watch?v=SBRSjA5zB8Q

Urs
u-he

Topic Starter

25500 posts since 8 Aug, 2002 from Berlin

Post Thu Nov 19, 2020 2:49 am

Not sure it's a 100% valid comparison.

Still waiting for our M1 Macs to arrive... probably next week.

KVRist
104 posts since 6 Apr, 2020

Post Thu Nov 19, 2020 5:34 am

What would be a valid comparison? i7 8-Core?

Urs
u-he

Topic Starter

25500 posts since 8 Aug, 2002 from Berlin

Post Thu Nov 19, 2020 5:39 am

tslays wrote:
Thu Nov 19, 2020 5:34 am
What would be a valid comparison? i7 8-Core?
One with software that was compiled for the system it's tested on, i.e. not running in an emulation system.

Still, for that, the performance is truly amazing.

KVRist
104 posts since 6 Apr, 2020

Post Thu Nov 19, 2020 5:45 am

In other benchmarks, after optimization for Apple Silicon, there was performance improvement of about 30-40%.
How hard will it be for you to optimize for Apple Silicon? Will that take a considerable amount of development resources?

Urs
u-he

Topic Starter

25500 posts since 8 Aug, 2002 from Berlin

Post Thu Nov 19, 2020 5:57 am

tslays wrote:
Thu Nov 19, 2020 5:45 am
In other benchmarks, after optimization for Apple Silicon, there was performance improvement of about 30-40%.
How hard will it be for you to optimize for Apple Silicon? Will that take a considerable amount of development resources?
It's already happening, we're testing our internal builds for Apple Silicon on the Apple Transition Kit, which we've had for a few months, and - from next week - on M1 Macs.

KVRian
599 posts since 14 Mar, 2002

Post Thu Nov 19, 2020 5:58 am

😬
adapt or die
to me Apple, despite its “size”, is so immensely quick in adapting compared to Intel, AMD or Nvidia that it’s a miracle.
to me it’s a bold move, and the right one, too.

I remember my younger self teaching myself ARM RISC assembly on an Acorn Archimedes, within a week or so. It’s so efficient and easy, and it was light years ahead of Intel Pentium performance. I wonder why it took so long for someone to bring it to the desktop masses.

KVRist
165 posts since 20 May, 2014

Post Thu Nov 19, 2020 9:39 am

tslays wrote:
Thu Nov 19, 2020 5:45 am
In other benchmarks, after optimization for Apple Silicon, there was performance improvement of about 30-40%.
How hard will it be for you to optimize for Apple Silicon? Will that take a considerable amount of development resources?
It's not even about optimisation. It is far more basic than that. It is about compiling a native binary for Apple's new chip. The example you've posted is a Diva version built for Intel, emulated on a RISC platform (using Apple's Rosetta translation layer). And in spite of this the numbers are actually pretty impressive. The aforementioned comparison is, I think, good news, but it is far from what native performance will be, I suspect.

KVRAF
21411 posts since 7 Jan, 2009 from Croatia

Post Thu Nov 19, 2020 10:16 am

Rosetta 2 is actually not emulating AFAIK; it converts the Intel instruction set to matching ARM instructions at install time and creates a new binary in the process. It seems they also do some clever optimizations while at it (probably courtesy of Clang or LLVM?).

KVRer
3 posts since 15 Jan, 2020

Post Thu Nov 19, 2020 4:04 pm

EvilDragon wrote:
Thu Nov 19, 2020 10:16 am
Rosetta 2 is actually not emulating AFAIK, it is converting Intel instruction set to matching ARM instructions at install time and creates a new binary during that process. It seems they also do some clever optimizations while at it (probably courtesy of clang or LLVM?).
LLVM optimizations are definitely doing a bunch of heavy lifting here, but there are plenty of additional optimizations that can only be deemed safe when coming from the original source code (and that's where Clang would play its role.) Intel architectures have different semantics in a number of areas that mean matching their behavior can be inherently inefficient when you're on a processor built around a different set of assumptions. Given that you can't mix and match Rosetta-generated code and natively compiled code in the same process I assume they're also sticking to the Intel calling conventions in order to make strict translation possible, and this likely means they're missing out on additional opportunities around register allocation that native compilation will bring. It may also suggest that there's a slim adaptation layer required between system API and translated code.

I'm very much looking forward to seeing my U-he products updated to Universal versions!

Urs
u-he

Topic Starter

25500 posts since 8 Aug, 2002 from Berlin

Post Fri Nov 20, 2020 4:15 am

The machine code will still use the limited number of registers that Intel machines provide, whereas RISC machines usually have a lot of them. Many Intel instructions will have to be broken down into multiple instructions for an ARM port, which only a compiler working from source will be able to optimize.

OTOH maybe they are decompiling with something like IDA and then recompiling. But that would take a very long time, and I don't see that happening.

KVRAF
21411 posts since 7 Jan, 2009 from Croatia

Post Fri Nov 20, 2020 3:12 pm

Ummm, there is a buuuuuuunch of instructions if you count all 4 versions of SSE and all the AVXs... I don't think NEON even covers them all.


EDIT: Oh right. Registers vs instructions! Made a mixup there.

KVRist
62 posts since 14 Nov, 2014

Post Thu Nov 26, 2020 9:42 pm

Any update on m1 benchmark yet? Did you receive your arm macs Urs? :)

Urs
u-he

Topic Starter

25500 posts since 8 Aug, 2002 from Berlin

Post Thu Nov 26, 2020 11:09 pm

We got the Macs, but they're still nicely wrapped and/or on their way to people working remotely (Covid 'n stuff).

I guess next week we'll have an idea.

On a related note, while we can compile natively for Arm, we are currently relying on the compiler to do some crucial optimisations ("auto vectorisation"). Apparently, these are not as good as we had hoped. Hence, we'll most certainly have to spend 2-3 weeks adding NEON support to our vector library (one large text file that unifies SIMD instruction sets such as SSE and AltiVec for us).

Return to “u-he”