Apple will switch to ARM processors: what does it mean for plugin developers?

Post

Urs wrote: Tue Jul 14, 2020 9:42 am
syntonica wrote: Mon Jul 13, 2020 8:06 pm (Off to read about Neon...)
I'll have a go at it in 2-3 weeks, but from a glance at the instruction set I think NEON has everything that SSE has, and much more. Kind of a paradox that the RISC platform has a more complex and thorough implementation of this concept than the CISC one.
The intrinsics look quite useful and easy to use. The downside is that, of course, they are different from SSE, causing another code fork. I hate platform-specific code.
I think I'll be able to get away with just the autovectorization. I revisited it and learned quite a bit about where and why it works. I found 7 not very critical loops I could force to vectorize with a #pragma. The others all failed due to variable loop sizes, method calls, inline conditionals, or, mostly, because the LLVM cost model judged them not worth vectorizing. Maybe if I get bored, I'll test those out and see, but for now, I'm trusting my compiler.
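A forced loop of that kind might look roughly like this under Clang. It is only a sketch: `apply_gain` and its parameters are hypothetical names chosen for illustration, and the pragma is guarded because it is Clang-specific, so other compilers just see a plain loop.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example kernel (name and signature are assumptions).
// Trip count known on entry, no calls, no branches in the body:
// the kind of loop a vectorizer can usually handle.
void apply_gain(float* __restrict out, const float* __restrict in,
                float gain, std::size_t n)
{
    assert(out != nullptr && in != nullptr);
#if defined(__clang__)
    // Ask Clang to vectorize even if its cost model would decline.
#pragma clang loop vectorize(enable)
#endif
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * gain;
}
```

Building with `-Rpass=loop-vectorize -Rpass-missed=loop-vectorize` makes Clang report which loops it did and did not vectorize, which is one way to find the candidates in the first place.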
I started on Logic 5 with a PowerBook G4 550 MHz. I now have a MacBook Air M1 and it's ~165x faster! So, why is my music not proportionally better? :(

Post

syntonica wrote: Wed Jul 15, 2020 7:48 am causing another code fork
We have a single header file defining an abstraction for AltiVec, SSE and no SIMD at all, to which we will simply add NEON and probably Clang's built-in vector format. There are about a hundred statements like this:

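Code:

// and-not int32x4: (~A) & B per lane, matching SSE's _mm_andnot_si128
// (despite the name, not a true NAND, which would be ~(A & B))
static inline __vi _vi_nand ( __vi A, __vi B )
{
#if __SSE__
	return _mm_andnot_si128( A, B );
#elif __NOSIMD__
	__vi r;
	for(int i=0;i<4;i++)
		r.i[i] = (~ A.i[i])&B.i[i];
	return r;
#else
	// AltiVec: vec_andc( x, y ) computes x & ~y, so the operands are swapped
	return vec_andc( B, A );
#endif
}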
We basically reproduce the functionality of SSE in scalar and AltiVec (PowerPC) instructions.

There's no copy/paste of code then and of course no code forks.
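To make the "simply add NEON" step concrete, here is a minimal sketch of what one such wrapper could look like with a NEON branch added. The type `vi4` and the `vi4_*` / `andnot4` names are hypothetical and not taken from the header described above; the sketch assumes the wrapper should keep the SSE semantics, (~a) & b per lane.

```cpp
#include <cassert>
#include <cstdint>

// vi4 and the vi4_* helpers are hypothetical names for illustration only.
#if defined(__SSE2__)
  #include <emmintrin.h>
  typedef __m128i vi4;
  static inline vi4  vi4_load(const std::int32_t* p)   { return _mm_loadu_si128(reinterpret_cast<const __m128i*>(p)); }
  static inline void vi4_store(std::int32_t* p, vi4 v) { _mm_storeu_si128(reinterpret_cast<__m128i*>(p), v); }
  static inline vi4  vi4_andnot(vi4 a, vi4 b)          { return _mm_andnot_si128(a, b); } // (~a) & b
#elif defined(__ARM_NEON)
  #include <arm_neon.h>
  typedef int32x4_t vi4;
  static inline vi4  vi4_load(const std::int32_t* p)   { return vld1q_s32(p); }
  static inline void vi4_store(std::int32_t* p, vi4 v) { vst1q_s32(p, v); }
  // vbicq_s32(x, y) computes x & ~y, so the operands are swapped.
  static inline vi4  vi4_andnot(vi4 a, vi4 b)          { return vbicq_s32(b, a); }
#else
  // No SIMD available: four scalar lanes.
  struct vi4 { std::int32_t i[4]; };
  static inline vi4 vi4_load(const std::int32_t* p)
  { vi4 v; for (int k = 0; k < 4; ++k) v.i[k] = p[k]; return v; }
  static inline void vi4_store(std::int32_t* p, vi4 v)
  { for (int k = 0; k < 4; ++k) p[k] = v.i[k]; }
  static inline vi4 vi4_andnot(vi4 a, vi4 b)
  { vi4 r; for (int k = 0; k < 4; ++k) r.i[k] = ~a.i[k] & b.i[k]; return r; }
#endif

// Calling code looks the same on every platform.
void andnot4(const std::int32_t* a, const std::int32_t* b, std::int32_t* out)
{
    assert(a && b && out);
    vi4_store(out, vi4_andnot(vi4_load(a), vi4_load(b)));
}
```

The operand swap in the NEON branch is the kind of detail the wrapper hides: the bit-clear instruction complements its second argument, whereas the SSE intrinsic complements its first.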

Post

(or maybe we'll template this out to make it easy to write unit tests, running vector code against scalar code)
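A rough sketch of how that templating could work, assuming a "backend" struct per implementation. All names here (`ScalarOps`, `VectorOps`, `andnot_block`, `backends_agree`) are hypothetical. The vector backend uses the GCC/Clang built-in vector extension as a stand-in; on other compilers it falls back to the scalar backend, which makes the comparison trivially true.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reference backend: plain scalar loops.
struct ScalarOps {
    struct vec { std::int32_t i[4]; };
    static vec  load(const std::int32_t* p)   { vec v; std::memcpy(v.i, p, sizeof v.i); return v; }
    static void store(std::int32_t* p, vec v) { std::memcpy(p, v.i, sizeof v.i); }
    static vec  andnot(vec a, vec b) {
        vec r;
        for (int k = 0; k < 4; ++k) r.i[k] = ~a.i[k] & b.i[k];
        return r;
    }
};

#if defined(__GNUC__)  // also defined by Clang
// Vector backend built on the compiler's generic 128-bit vector type.
typedef std::int32_t v4i __attribute__((vector_size(16)));
struct VectorOps {
    typedef v4i vec;
    static vec  load(const std::int32_t* p)   { vec v; std::memcpy(&v, p, sizeof v); return v; }
    static void store(std::int32_t* p, vec v) { std::memcpy(p, &v, sizeof v); }
    static vec  andnot(vec a, vec b)          { return ~a & b; }
};
#else
typedef ScalarOps VectorOps;  // no vector extension available
#endif

// The kernel is written once and instantiated per backend.
template <class Ops>
void andnot_block(const std::int32_t* a, const std::int32_t* b, std::int32_t* out)
{
    Ops::store(out, Ops::andnot(Ops::load(a), Ops::load(b)));
}

// Unit-test helper: run both backends on the same input and compare bytes.
bool backends_agree(const std::int32_t* a, const std::int32_t* b)
{
    assert(a && b);
    std::int32_t s[4], v[4];
    andnot_block<ScalarOps>(a, b, s);
    andnot_block<VectorOps>(a, b, v);
    return std::memcmp(s, v, sizeof s) == 0;
}
```

A test suite would then feed `backends_agree` edge cases and random inputs, treating the scalar code as the reference.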

Post

You can use SIMDe (https://github.com/simd-everywhere/simde) and write your SIMDe instructions and it automatically gets converted to ARM NEON instructions.

Post

Urs wrote: Wed Jul 15, 2020 9:17 am
syntonica wrote: Wed Jul 15, 2020 7:48 am causing another code fork
We have a single header file defining an abstraction for AltiVec, SSE and no SIMD at all [...] There's no copy/paste of code then and of course no code forks.
Ew. :lol:

The bane of my existence. At least once that tedium is complete, though, one can focus on the real programming chores.

Post

syntonica wrote: Wed Jul 15, 2020 4:03 pm The bane of my existence. At least once that tedium is complete, though, one can focus on the real programming chores.
LOL, yeah... the fun thing is, we abstracted this even further with vector objects that have a complete set of overloaded operators with various types, e.g. float_vector * float. So now we program C++ with vectors or floats as template parameters and compile the same code to scalar and vector types. That's maybe 200 lines extra and it's enabled developers to write SIMD code when they had next to no experience with SSE.
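As an illustration of the idea only, a vector type with a few overloaded operators and a kernel templated on the sample type might look like the sketch below. `float4` and `one_pole` are hypothetical names; a real version would presumably wrap the platform's SIMD register type instead of a plain array.

```cpp
#include <cassert>

// Hypothetical 4-lane float type; a plain array stands in for a SIMD register.
struct float4 { float v[4]; };

inline float4 operator+(float4 a, float4 b)
{ float4 r; for (int k = 0; k < 4; ++k) r.v[k] = a.v[k] + b.v[k]; return r; }

inline float4 operator-(float4 a, float4 b)
{ float4 r; for (int k = 0; k < 4; ++k) r.v[k] = a.v[k] - b.v[k]; return r; }

// Mixed vector * scalar, as in "float_vector * float".
inline float4 operator*(float4 a, float s)
{ float4 r; for (int k = 0; k < 4; ++k) r.v[k] = a.v[k] * s; return r; }

// One-pole lowpass step, written once. T can be float or float4.
template <class T>
T one_pole(T x, T& state, float coeff)
{
    assert(coeff >= 0.0f && coeff <= 1.0f);
    state = state + (x - state) * coeff;
    return state;
}
```

Calling `one_pole` with `float` arguments processes one sample; calling it with `float4` arguments processes four channels with the same source code, so the DSP author never touches an intrinsic.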

Post

FigBug wrote: Wed Jul 15, 2020 12:26 pm You can use SIMDe (https://github.com/simd-everywhere/simde) and write your SIMDe instructions and it automatically gets converted to ARM NEON instructions.
Hah! I knew someone would do this.

I've been eyeing boost.SIMD, but it seems like it never really took off.

Anyhow, I've designed our library to be big endian so I can use the same indices for someVector[ x ] as for someSampleBuffer[ x ]. I presume pretty much all SSE-centric libraries are little endian and that always confuses the hell out of my head.

Post

ARM (by Apple) is not endian-specific, is it? I read that it is possible to change endianness on ARM processors.

Post

As far as I can tell, big or little-endian data can be loaded. Since QuickTime has always been little endian, there's little use for this feature in audio. (PPCs could switch endianness, so QuickTime was designed to work on either PPC or Intel. Once the G5 dropped the ability, Apple dropped the G5...)

I hear it's still of use in networking, but that's not my bag, man...
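For the networking case, byte order matters when a value leaves the machine. A minimal sketch, with made-up helper names, of detecting the host byte order at run time and writing a 32-bit value in big-endian (network) order either way:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True if the least significant byte is stored at the lowest address.
bool host_is_little_endian()
{
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Reverse the four bytes of x.
std::uint32_t byte_swap32(std::uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

// Write x to dst most significant byte first, regardless of host order.
void store_be32(unsigned char* dst, std::uint32_t x)
{
    assert(dst != nullptr);
    const std::uint32_t be = host_is_little_endian() ? byte_swap32(x) : x;
    std::memcpy(dst, &be, sizeof be);
}
```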

Post

It's 15 years since I wrote said header file. Back then I was sure that in a union of float[ 4 ] and whatever little endian vec_f4, when accessing the vector element by its index, myFloat[ 0 ] becomes myVec_f4[ 3 ]. Meaning, the order of elements is reversed, and that seemed to me like an invitation to a bug fest. I was used to AltiVec/PowerPC back then, which was big endian for all vector elements.

Post

nvmd
