KVR Audio

syntonica · Post by **syntonica** » Wed Jul 15, 2020 7:48 am

Urs wrote: Tue Jul 14, 2020 9:42 am
syntonica wrote: Mon Jul 13, 2020 8:06 pm (Off to read about Neon...)
I'll have a go at it in 2-3 weeks, but from a glance at the instruction set I think NEON has everything that SSE has, and much more. Kind of a paradox that the RISC platform has a more complex and thorough implementation of this concept than the CISC one.

The intrinsics look quite useful and easy to use. The downside is that, of course, they are different from SSE, causing another code fork. I hate platform-specific code.
I think I'll be able to get away with just the autovectorization. I revisited it and learned quite a bit about where and why it works. I found 7 not very critical loops I could force to vectorize with a #pragma. The others all failed due to variable loop sizes, method calls, inline conditionals, or, mostly, just not being worth vectorizing according to the LLVM cost model. Maybe if I get bored, I'll test those out and see, but for now, I'm trusting my compiler.
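For anyone wondering what forcing a loop looks like: a minimal sketch, assuming Clang's `#pragma clang loop` hint (the post does not say which pragma was used). The function name `apply_gain` and its parameters are made up for illustration.

```cpp
#include <cstddef>

// Multiply a buffer by a constant gain.
// __restrict promises the compiler that 'out' and 'in' do not overlap,
// which removes one common reason for the vectorizer to give up.
inline void apply_gain(float* __restrict out, const float* __restrict in,
                       float gain, std::size_t n)
{
#if defined(__clang__)
    // Ask Clang to vectorize even if its cost model would decline.
    // Guarded so other compilers do not warn about an unknown pragma.
#pragma clang loop vectorize(enable) interleave(enable)
#endif
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * gain;
}
```

Compiling with Clang's `-Rpass=loop-vectorize` and `-Rpass-missed=loop-vectorize` reports which loops were vectorized and which were skipped.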

Urs · Post by **Urs** » Wed Jul 15, 2020 9:17 am

syntonica wrote: Wed Jul 15, 2020 7:48 am causing another code fork

We have a single header file defining an abstraction for AltiVec, SSE and no SIMD at all, to which we will simply add NEON and probably Clang's built-in vector format. There are about a hundred statements like this:

Code:

// and-not int32x4: (~A) & B in each lane
static inline __vi _vi_nand ( __vi A, __vi B )
{
#if __SSE__
	return _mm_andnot_si128( A, B );	// (~A) & B
#elif __NOSIMD__
	__vi r;
	for(int i=0;i<4;i++)
		r.i[i] = (~ A.i[i])&B.i[i];
	return r;
#else
	return vec_andc( B, A );	// AltiVec: B & ~A, same result as the SSE branch
#endif
}

We basically reproduce the functionality of SSE in scalar and AltiVec (PowerPC) instructions.

There's no copy/paste of code then and of course no code forks.
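A rough sketch of how a NEON branch could slot into a header like that, assuming the wrapper should compute `(~A) & B` as the SSE branch does. The names `vi4`, `vi4_load`, `vi4_store` and `vi4_andnot` are hypothetical and not taken from the actual header.

```cpp
#include <cstdint>

#if defined(__SSE2__)
  #include <emmintrin.h>
  typedef __m128i vi4;
#elif defined(__ARM_NEON)
  #include <arm_neon.h>
  typedef int32x4_t vi4;
#else
  struct vi4 { std::int32_t i[4]; };   // scalar fallback, no SIMD
#endif

// Load four 32-bit integers from (possibly unaligned) memory.
static inline vi4 vi4_load(const std::int32_t* p)
{
#if defined(__SSE2__)
    return _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
#elif defined(__ARM_NEON)
    return vld1q_s32(p);
#else
    vi4 r;
    for (int k = 0; k < 4; ++k) r.i[k] = p[k];
    return r;
#endif
}

// Store four 32-bit integers to (possibly unaligned) memory.
static inline void vi4_store(std::int32_t* p, vi4 v)
{
#if defined(__SSE2__)
    _mm_storeu_si128(reinterpret_cast<__m128i*>(p), v);
#elif defined(__ARM_NEON)
    vst1q_s32(p, v);
#else
    for (int k = 0; k < 4; ++k) p[k] = v.i[k];
#endif
}

// (~a) & b in every lane, whichever backend is compiled in.
static inline vi4 vi4_andnot(vi4 a, vi4 b)
{
#if defined(__SSE2__)
    return _mm_andnot_si128(a, b);   // (~a) & b
#elif defined(__ARM_NEON)
    return vbicq_s32(b, a);          // bit clear: b & ~a
#else
    vi4 r;
    for (int k = 0; k < 4; ++k) r.i[k] = (~a.i[k]) & b.i[k];
    return r;
#endif
}
```

Note the swapped argument order on the NEON side: `vbicq_s32(x, y)` clears the bits of `y` from `x`, whereas `_mm_andnot_si128(x, y)` inverts its first argument.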

Urs · Post by **Urs** » Wed Jul 15, 2020 9:20 am

(or maybe we'll template this out to make it easy to write unit tests, running vector code against scalar code)

FigBug · Post by **FigBug** » Wed Jul 15, 2020 12:26 pm

You can use SIMDe (https://github.com/simd-everywhere/simde): you write your SSE intrinsics and they automatically get translated to ARM NEON instructions.

syntonica · Post by **syntonica** » Wed Jul 15, 2020 4:03 pm

Urs wrote: Wed Jul 15, 2020 9:17 am
syntonica wrote: Wed Jul 15, 2020 7:48 am causing another code fork
We have a single header file defining an abstraction for AltiVec, SSE and no SIMD at all, to which we will simply add NEON and probably Clang's built-in vector format. There are about a hundred statements like this:
Code:
// and-not int32x4: (~A) & B in each lane
static inline __vi _vi_nand ( __vi A, __vi B )
{
#if __SSE__
	return _mm_andnot_si128( A, B );	// (~A) & B
#elif __NOSIMD__
	__vi r;
	for(int i=0;i<4;i++)
		r.i[i] = (~ A.i[i])&B.i[i];
	return r;
#else
	return vec_andc( B, A );	// AltiVec: B & ~A, same result as the SSE branch
#endif
}
We basically reproduce the functionality of SSE in scalar and AltiVec (PowerPC) instructions.

There's no copy/paste of code then and of course no code forks.

Ew.

The bane of my existence. At least once that tedium is complete, though, one can focus on the real programming chores.

Urs · Post by **Urs** » Wed Jul 15, 2020 4:21 pm

syntonica wrote: Wed Jul 15, 2020 4:03 pm The bane of my existence. At least once that tedium is complete, though, one can focus on the real programming chores.

LOL, yeah... the fun thing is, we abstracted this even further with vector objects that have a complete set of overloaded operators with various types, e.g. float_vector * float. So now we program C++ with vectors or floats as template parameters and compile the same code to scalar and vector types. That's maybe 200 lines extra and it's enabled developers to write SIMD code when they had next to no experience with SSE.
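To make the idea concrete, here is a minimal sketch of "the same code for floats and vectors". The `float4` struct below is a hypothetical stand-in written with plain loops so it compiles anywhere; in a real library its operators would presumably wrap SSE, NEON or AltiVec calls. `smooth_step` is an invented example function.

```cpp
// Hypothetical stand-in for a SIMD-backed vector of four floats.
struct float4 {
    float v[4];
};

inline float4 operator+(float4 a, float4 b)
{
    float4 r;
    for (int k = 0; k < 4; ++k) r.v[k] = a.v[k] + b.v[k];
    return r;
}

inline float4 operator-(float4 a, float4 b)
{
    float4 r;
    for (int k = 0; k < 4; ++k) r.v[k] = a.v[k] - b.v[k];
    return r;
}

// Mixed operator: vector times scalar, as in "float_vector * float".
inline float4 operator*(float4 a, float s)
{
    float4 r;
    for (int k = 0; k < 4; ++k) r.v[k] = a.v[k] * s;
    return r;
}

// One source, two instantiations: T can be float or float4.
// Moves 'state' a fraction 'coeff' of the way towards 'target'.
template <typename T>
inline T smooth_step(T state, T target, float coeff)
{
    return state + (target - state) * coeff;
}
```

Instantiating `smooth_step<float>` and `smooth_step<float4>` on the same inputs and comparing lane by lane is also a cheap way to run vector code against scalar code in a unit test.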

Urs · Post by **Urs** » Wed Jul 15, 2020 4:25 pm

FigBug wrote: Wed Jul 15, 2020 12:26 pm You can use SIMDe (https://github.com/simd-everywhere/simde): you write your SSE intrinsics and they automatically get translated to ARM NEON instructions.

Hah! I knew someone would do this.

I've been eyeing Boost.SIMD, but it seems like it never really took off.

Anyhow, I've designed our library to be big endian so I can use the same indices for someVector[ x ] as for someSampleBuffer[ x ]. I presume pretty much all SSE-centric libraries are little endian and that always confuses the hell out of my head.

camsr · Post by **camsr** » Wed Jul 15, 2020 11:31 pm

ARM (by Apple) is not endian-specific, is it? I read that it's possible to change endianness on ARM processors.

syntonica · Post by **syntonica** » Thu Jul 16, 2020 12:41 am

As far as I can tell, big or little-endian data can be loaded. Since QuickTime has always been little endian, there's little use for this feature in audio. (PPCs could switch endianness, so QuickTime was designed to work on either PPC or Intel. Once the G5 dropped the ability, Apple dropped the G5...)

I hear it's still of use in networking, but that's not my bag, man...

Urs · Post by **Urs** » Thu Jul 16, 2020 8:36 am

It's 15 years since I wrote said header file. Back then I was sure that in a union of float[ 4 ] and whatever little endian vec_f4, when accessing the vector element by its index, myFloat[ 0 ] becomes myVec_f4[ 3 ]. Meaning, the order of elements is reversed and that kind of seemed to me like an invitation to a bug fest. I was used to AltiVec/PowerPC back then which was big endian for all vector elements.
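Lane order is easy to check on a given compiler rather than remember. A small sketch, assuming the GCC/Clang vector extension (`vector_size`) rather than any particular SSE or AltiVec type; the names `f4`, `lanes_match_array_order` and `is_little_endian` are made up for illustration.

```cpp
#include <cstdint>
#include <cstring>

// GCC/Clang vector extension: four floats in one 16-byte value.
typedef float f4 __attribute__((vector_size(16)));

// Copies an array into a vector byte for byte (what a union or an
// unaligned load would do) and checks that lane k equals element k.
inline bool lanes_match_array_order()
{
    const float buf[4] = {10.0f, 11.0f, 12.0f, 13.0f};
    f4 v;
    std::memcpy(&v, buf, sizeof v);
    for (int k = 0; k < 4; ++k)
        if (v[k] != buf[k]) return false;
    return true;
}

// Byte order is a separate question: it concerns the bytes inside
// each element, not the order of the elements themselves.
inline bool is_little_endian()
{
    const std::uint32_t one = 1;
    unsigned char first = 0;
    std::memcpy(&first, &one, 1);
    return first == 1;
}
```

One place where a reversed order does show up with SSE is `_mm_set_ps`, which takes its arguments from the highest lane down to the lowest; `_mm_setr_ps` takes them in memory order.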

chk071 · Post by **chk071** » Sat Nov 14, 2020 11:03 am

Apple will switch to ARM processors: what does it mean for plugin developers?