KVR Audio

Borogove · Post by **Borogove** » Tue Sep 06, 2011 9:18 pm

valhallasound wrote:The linear detuning will result in constant beat rates between the various detuned oscillators. The beating between 0 and 0.1 is equal to that between 0.1 and 0.2, which is identical to that between 0.2 and 0.3. Sum these together, and you get a really strong single beat rate, with strange phase relationships, instead of the complex multiple beat relationships heard in SuperSaw.

I wrote a quick script to dump all the beat frequencies for the supersaw tuning. There are some duplicates - the 0v1 beat is the same as the 5v6, 0v5=1v6, 2v3=3v4, so they weren't selected for maximum dispersion either.
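Borogove's script itself isn't posted, so here is a minimal sketch of the same kind of dump. It assumes the seven relative detune values quoted elsewhere in this thread (.893, .939, .980, 1.0, 1.020, 1.064, 1.110); the function names and the 440 Hz base note are made up for illustration. Differences are kept in thousandths of the base frequency so duplicates can be compared exactly.

```python
from collections import defaultdict
from itertools import combinations

# Relative detune values quoted in this thread, in thousandths of the base
# frequency so that pairwise differences stay exact integers.
RATIOS_MILLI = [893, 939, 980, 1000, 1020, 1064, 1110]


def beat_table(ratios_milli):
    """Map each pairwise difference to the oscillator pairs that produce it."""
    table = defaultdict(list)
    for (i, a), (j, b) in combinations(enumerate(ratios_milli), 2):
        table[b - a].append((i, j))
    return dict(table)


def duplicate_beats(ratios_milli):
    """Keep only the differences shared by more than one oscillator pair."""
    return {d: p for d, p in beat_table(ratios_milli).items() if len(p) > 1}


if __name__ == "__main__":
    base_hz = 440.0  # assumed note frequency, for illustration only
    for diff, pairs in sorted(beat_table(RATIOS_MILLI).items()):
        print(f"{diff * base_hz / 1000:8.2f} Hz  {pairs}")
    print("duplicates:", duplicate_beats(RATIOS_MILLI))
```

With those values the only repeated differences are the three pairs named above (0v1/5v6, 0v5/1v6, 2v3/3v4).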

xoxos · Post by **xoxos** » Tue Sep 06, 2011 9:36 pm

contrasting linear unison with examples of nonlinear

xoxos · Post by **xoxos** » Tue Sep 06, 2011 9:37 pm

valhallasound wrote:you would see lots of convoluted mathematical explanations for what should be fairly simple functions.

Sean Costello

dsp standup :p

jupiter8 · Post by **jupiter8** » Tue Sep 06, 2011 9:50 pm

Borogove wrote:an 11th-order polynomial fit to what's obviously a two-segment piecewise linear function...

The odd thing is he actually notices that but ignores it for some reason.

The curve only has a slight increase until the detune value reaches 0.5, where it turns steeper, and has a drastic rise after 0.9.

mystran · Post by **mystran** » Wed Sep 07, 2011 6:42 am

Oden wrote: 1) The detune algo is almost ALWAYS linear. And I don't mean the detune knob (though that is non-linear too), but the detune itself. In JP8000 the detune values are far from linear, as you just pointed out. The relative values are: .893, .939, .980, 1.0, 1.020, 1.064, 1.110. Instead of the commonly used -0.3, -0.2, -0.1, 0, 0.1, 0.2, 0.3.

When you make the detuned frequencies multiples of the base note frequency, you maintain constant pitch range in terms of cents (or semitones). With linear detuning you get constant beat rates, but the higher notes will sound a lot less detuned. With exponential detuning (ie freq multiples) the beat rate varies, but the detuning in terms of pitch range (in terms of cents or semitones) stays constant whatever the frequency, so you can have half-semitone spread from the lowest bass frequencies all the way to Nyquist.
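To put numbers on that trade-off, here is a small sketch comparing the two schemes at a low and a high note. The offsets (±3 Hz, ±25 cents) and all function names are arbitrary choices for illustration, not values from any synth.

```python
import math

OFFSETS_HZ = [-3.0, 0.0, 3.0]        # hypothetical linear detune offsets
OFFSETS_CENTS = [-25.0, 0.0, 25.0]   # hypothetical exponential detune offsets


def linear_detune(base_hz, offsets_hz):
    """Add fixed Hz offsets to the base frequency."""
    return [base_hz + o for o in offsets_hz]


def exponential_detune(base_hz, offsets_cents):
    """Multiply the base frequency by fixed ratios (offsets given in cents)."""
    return [base_hz * 2.0 ** (c / 1200.0) for c in offsets_cents]


def spread_cents(freqs):
    """Pitch distance between the outermost voices, in cents."""
    return 1200.0 * math.log2(max(freqs) / min(freqs))


def beat_hz(freqs):
    """Beat rate between the outermost voices, in Hz."""
    return max(freqs) - min(freqs)


if __name__ == "__main__":
    for base in (55.0, 3520.0):
        lin = linear_detune(base, OFFSETS_HZ)
        exp = exponential_detune(base, OFFSETS_CENTS)
        print(f"{base:7.1f} Hz  linear: {beat_hz(lin):6.2f} Hz beat, "
              f"{spread_cents(lin):6.2f} cents | "
              f"expo: {beat_hz(exp):6.2f} Hz beat, {spread_cents(exp):6.2f} cents")
```

Linear keeps the beat rate fixed while the spread in cents collapses six octaves up; exponential keeps the spread fixed while the beat rate scales with the note.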

I've always wondered why everyone insists on linear detunes when the expo version sounds so much better to my ears unless one limits everything to a very small pitch range. Exponential is closer to how choirs, ensembles, even analog detuning tends to work. Things like chorus also give you similar spreads.

As far as the exact ratios, it's not a huge deal what you use. A distribution where the "middle" frequencies are closer to each other and the "edge" frequencies spread more tends to maintain pitch perception better as there is more energy concentrated close to the nominal pitch. You might also want to sanity check that the beating ratios are sufficiently "random" that you don't get strong periodicity. I bet Roland's engineers just tried some arbitrary numbers and tweaked until it sounded/behaved nicely. Significantly different distributions DO give significantly different character, but ignoring the "badly behaving" coefficient sets, slight variation isn't very noticeable most of the time.

Swiss Frank · Post by **Swiss Frank** » Wed Sep 07, 2011 10:45 am

Key for me is that we don't want all, or actually even 3 or more, ever going back to zero phase (aka flyback, aka bumping) simultaneously. (Let's call this "anti-flanging phase" property or AFP.) I wonder if we can mechanically find start phases that assure AFP. For instance if you have a first sideband at 1.1x, then it will flange with the carrier every 22pi radians. Another sideband at .9x would flange every 18pi radians, and all three flange together every 396pi. But if the .9 sideband starts at an offset by pi, then is there any amount of time that all three will flange together? And can we add the next sideband at pi/2, 3pi/2, pi/4, 3pi/4, 5pi/4 etc. etc.?

We don't want the same phase at the start of each note. But once we set up this delicate lattice of AFP, DON'T throw it away by adding a random phase to each subwave. Instead add a random amount of time, for instance 0-20 minutes' worth of time. AFP would be preserved.

At that point, who cares if the sound repeats every month, day, or even 5 minutes? When it's devoid of marked flange effects it'd be hard for even a trained ear to spot a repeat even if the note was played in silence, much less in a composition. And in fact if the ratios DO have common multiples, and yet have AFP, then it'd be much easier to guarantee that the flange effect won't creep in even after rounding errors, long notes, etc.

logicalhippo · Post by **logicalhippo** » Thu Sep 08, 2011 2:15 am

Urs wrote: (if someone could explain how they did that sawtooth in the alpha Junos... I'm all ears... there are some half-sineish plots in the service manual but d'oh... it doesn't change shape with frequency thus it's not the static highpass - which btw. is behind the VCAs)

The alpha juno is of course a DCO analog machine, which is why the sawtooth looks so unrelated to common digital saw waves. Although the actual wave generation is done inside a custom IC, we can make some pretty strong guesses about how it works based on the previous Juno models and Roland's history of reusing discrete circuit ideas in ASIC form. The idea behind the older Juno oscillators is that the CPU outputs a square wave at the desired frequency, and a CV of the frequency via a D/A converter. Then, in analog-land it integrates the CV and uses a differentiated version of the square as a reset for the saw. I think this is a brilliant way of cheaply constructing a saw wave. This is a popular topic for analog synth bloggers, check out http://www.electricdruid.net/index.php? ... o.junodcos for a more detailed explanation.
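As a rough numerical illustration of that description (not a model of the actual circuit), the sketch below integrates a "CV" proportional to frequency and resets the integrator on each clock edge. The function name, the 48 kHz simulation rate, and the unit scaling are all assumptions made for the example.

```python
def dco_saw(freq_hz, cv_hz=None, sample_rate=48000.0, n_samples=4800):
    """Ramp whose slope follows the CV and which resets on each clock edge.

    cv_hz is the frequency the CV corresponds to; by default it tracks
    freq_hz, as described above. Passing a fixed cv_hz shows what happens
    when the CV does not follow the note.
    """
    cv = freq_hz if cv_hz is None else cv_hz
    clock_phase = 0.0   # stands in for the CPU-generated square wave
    level = 0.0         # integrator output
    out = []
    for _ in range(n_samples):
        out.append(level)
        level += cv / sample_rate            # integrate the CV
        clock_phase += freq_hz / sample_rate
        if clock_phase >= 1.0:               # clock edge: reset the ramp
            clock_phase -= 1.0
            level = 0.0
    return out


if __name__ == "__main__":
    print("tracking CV, 110 Hz peak:", max(dco_saw(110.0)))
    print("tracking CV, 880 Hz peak:", max(dco_saw(880.0)))
    print("fixed CV,    880 Hz peak:", max(dco_saw(880.0, cv_hz=110.0)))
```

Because the slope scales with the note, the ramp reaches about the same height before each reset at any pitch; with a fixed CV the amplitude would fall as the note rises.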

tony tony chopper · Post by **tony tony chopper** » Thu Sep 08, 2011 2:49 am

I did quite a bit of research on supersaw/unison/whatever you call it for my synth. I think the best way to view/understand it is to see it as a per-harmonic amp modulation.
Then the kind of supersaw you're after is only a matter of how the amp modulation is & how it relates to other harmonics.
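For the simplest case of two detuned saws, the amp-modulation view is just the sum-to-product identity applied to each harmonic. The sketch below checks it numerically; the function names are hypothetical and harmonic amplitudes are taken as 1/k (the usual saw series, ignoring the overall scale factor).

```python
import math


def two_saw_harmonic(k, f1, f2, t):
    """k-th harmonic of two detuned saws, summed directly."""
    return (math.sin(2 * math.pi * k * f1 * t)
            + math.sin(2 * math.pi * k * f2 * t)) / k


def as_modulated_partial(k, f1, f2, t):
    """Same signal as one partial at the mean frequency times a slow envelope."""
    carrier = math.sin(2 * math.pi * k * 0.5 * (f1 + f2) * t)
    envelope = 2.0 * math.cos(math.pi * k * (f2 - f1) * t)
    return envelope * carrier / k


def harmonic_beat_hz(k, f1, f2):
    """Beat rate of harmonic k: it beats k times faster than the fundamental."""
    return k * abs(f2 - f1)


if __name__ == "__main__":
    for k in (1, 2, 8):
        print(k, harmonic_beat_hz(k, 220.0, 221.5), "Hz")
```

With more than two saws the envelope of each harmonic becomes a sum of several such terms, which is where the choice of gaps and start phases comes in.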

Here are 4 "supersaws"
1. with uniform detuning (same gap (in octave, not Hz) between each saw). So you hear the repetition/beating

2. with uneven (can be random gaps or primes) gaps between saws, there's less beating & still some phasing. That's the usual supersaw to me, and I haven't found anything that sounds "better". How random those gaps are matters but not THAT much, what matters a lot is the initial phases of each saw (or how much you advanced in time)

3. same as above but with randomized partial phases. With this you get zero period, it's all smooth. But, because for low notes the harmonics are so close to each other, when you randomize their phases you get noise (which isn't that bad, especially when you filter it). I tried a lot of approaches here, like grouping the dephasing of partials to reduce the noise, I expected better results, but not really.

4. detuning in Hz between partials, for the fun of it. So it becomes a per-partial unison, each "superpartial" still taking the same place. Sounds like shit, but it becomes ok when you limit this unison to the upper harmonics (that are less important for pitch perception).

http://flstudio.image-line.com/help/pub ... unison.mp3

xoxos · Post by **xoxos** » Thu Sep 08, 2011 3:07 am

i don't know if you ever implemented #3, or if you've had much success shifting it with the audio dsp purchasing/using public. it's different though, thanks for sharing

tony tony chopper · Post by **tony tony chopper** » Thu Sep 08, 2011 3:50 am

#3 is in Harmor (when you set the unison phase to the max). It's doable in a classic time-domain synth, providing you precompute a couple of oscillators with their harmonic phases randomized, at note-trigger time, but that depends how the oscillators are generated.
Maybe a bunch of allpasses would work as well to mess up the phases.
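A minimal sketch of the precompute idea, assuming a simple additive single-cycle table (this is not how Harmor does it; the table size, harmonic count, and function names are illustrative). Each unison voice would get its own table built with a fresh random generator at note-on.

```python
import math
import random


def saw_table(size=256, n_harmonics=32, rng=None):
    """One cycle of a band-limited saw; pass rng to randomize harmonic phases."""
    phases = [rng.uniform(0.0, 2.0 * math.pi) if rng is not None else 0.0
              for _ in range(n_harmonics)]
    table = []
    for n in range(size):
        x = 2.0 * math.pi * n / size
        table.append(sum(math.sin(k * x + phases[k - 1]) / k
                         for k in range(1, n_harmonics + 1)))
    return table


def harmonic_magnitude(table, k):
    """Magnitude of harmonic k, measured with a single-bin DFT."""
    size = len(table)
    re = sum(v * math.cos(2.0 * math.pi * k * n / size) for n, v in enumerate(table))
    im = sum(v * math.sin(2.0 * math.pi * k * n / size) for n, v in enumerate(table))
    return 2.0 * math.hypot(re, im) / size


if __name__ == "__main__":
    plain = saw_table()
    scrambled = saw_table(rng=random.Random())
    print("peak, plain:    ", max(abs(v) for v in plain))
    print("peak, scrambled:", max(abs(v) for v in scrambled))
```

The scrambled table keeps the 1/k magnitude of every harmonic but no longer looks like a saw, so voices built from different tables never line up into a common reset.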

Urs · Post by **Urs** » Thu Sep 08, 2011 9:38 am

logicalhippo wrote: The alpha juno is of course a DCO analog machine, which is why the sawtooth looks so unrelated to common digital saw waves. Although the actual wave generation is done inside a custom IC, we can make some pretty strong guesses about how it works based on the previous Juno models and Roland's history of reusing discrete circuit ideas in ASIC form.

Well, the sawtooths of the old Junos look pretty ordinary. It's the highpass filter that bends them. Before highpass/dc blocker they are quite perfect.

The sawtooth of the Alphas otoh neither looks nor sounds like a sawtooth - even without highpass. Thus I presume that they changed the method to quite a degree (which is also immediately clear by looking at the other waveform types).

Swiss Frank · Post by **Swiss Frank** » Sun Sep 11, 2011 9:58 am

Borogove, I'm comparing your int16 naive supersaw and my double bumpup.

10 billion times looping? How'd you do that with an int count? (Just a second loop 1 to 10?)

Since you're saving into a huge huge huge array, whichever algo runs first would have to have the MMU/OS create a bunch of pages for the first algo that ran. When I try that running the two algos A B A B in one run, both B's are the same while the A is much faster the second time.

Just to check: I calculate your incr[] value for middle C to be 777.59, rounding to 777, which I think is 1.31 cents or .4Hz off, right? In the vicinity of that frequency, the 16-bit incr has a precision of about 2.23 cents? In the specific case of a fat supersaw, with so many frequencies, that sounds acceptable, but just curious, is that the standard to which a lot of sound software works?
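Those figures can be reproduced with a few lines, under two assumptions that aren't stated in the post: a 44.1 kHz sample rate, and a note of 523.25 Hz (which is what gives 777.59 for a 16-bit phasor). The names below are made up for the example.

```python
import math

SAMPLE_RATE = 44100.0    # assumed; not stated for the original benchmark code
NOTE_HZ = 523.2511306    # assumed; this is the frequency that yields 777.59


def phase_increment(freq_hz, sample_rate=SAMPLE_RATE, bits=16):
    """Ideal (fractional) per-sample increment for a phasor of the given width."""
    return freq_hz * (1 << bits) / sample_rate


def cents_between(a, b):
    """Interval from b up to a, in cents."""
    return 1200.0 * math.log2(a / b)


exact = phase_increment(NOTE_HZ)                 # about 777.59
stored = int(exact)                              # truncated to 777
error_cents = cents_between(exact, stored)       # pitch error of the stored value
error_hz = (exact - stored) * SAMPLE_RATE / (1 << 16)
step_cents = cents_between(stored + 1, stored)   # size of one increment step

# The same note with a 32-bit phasor, for comparison.
stored32 = int(phase_increment(NOTE_HZ, bits=32))
step_cents_32 = cents_between(stored32 + 1, stored32)

if __name__ == "__main__":
    print(f"exact {exact:.2f}, stored {stored}")
    print(f"error {error_cents:.2f} cents, {error_hz:.2f} Hz")
    print(f"16-bit step {step_cents:.2f} cents, 32-bit step {step_cents_32:.6f} cents")
```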

----------------------------------------------------------

Tests on a 2.6GHz i5 laptop running SuSE Linux.

> g++ supersaw.cxx -O0 -o supersaw
> ./supersaw
bumpup_double: 2.90 sec 104 MHz <-- practice run to create memory pages
naiveSaws_int16: 3.31 sec 91 MHz
bumpup_double: 2.00 sec 150 MHz
naiveSaws_int16: 3.31 sec 91 MHz

> g++ supersaw.cxx -O9 -o supersaw
> ./supersaw
bumpup_double: 1.79 sec 168 MHz <-- practice run to create memory pages
naiveSaws_int16: 1.12 sec 269 MHz
bumpup_double: 0.91 sec 328 MHz
naiveSaws_int16: 1.15 sec 262 MHz

> g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-suse-linux/4.5/lto-wrapper
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.5 --enable-ssp --disable-libssp --disable-plugin --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --program-suffix=-4.5 --enable-linux-futex --without-system-libunwind --enable-gold --with-plugin-ld=/usr/bin/gold --with-arch-32=i586 --with-tune=generic --build=x86_64-suse-linux
Thread model: posix
gcc version 4.5.1 20101208 [gcc-4_5-branch revision 167585] (SUSE Linux)

> uname -a
Linux slim.site 2.6.37.1-1.2-desktop #1 SMP PREEMPT 2011-02-21 10:34:10 +0100 x86_64 x86_64 x86_64 GNU/Linux

> lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
CPU socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 42
Stepping: 7
CPU MHz: 800.000
BogoMIPS: 4984.13
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3072K
NUMA node0 CPU(s): 0-3

Swiss Frank · Post by **Swiss Frank** » Sun Sep 11, 2011 1:30 pm

Borogove,

0) I'm positive BumpUp Double is not 3x faster than Naive Int16 as you reported. I see 10% improvement at C4.

1) I'm not sure why your first attempt to fix the steady climbing you saw in my #2 algo didn't work, but I've definitely fixed it by making the bumpup simply by adding dDecrement * dDelay.

2) I've tried converting the dMilestone/dDelay to ints. It's a speedup of 15% but means the minimum frequency diff is 17 cents at A440. (E.g. granularity of pitch is a step from 100-sample wavelength to 99-sample.) Seems like too much.

3) Generally, the naive int16 implementation (under full optimization, at least) is spending most of its time just converting the output to doubles. I haven't tried writing a "naive double" algo because it would lack the int16's "magic wraparound" so I think it'd be much slower. Generally, the bumpup method isn't conducive to ints as it needs those decimal places to avoid systematic issues. Conversely, converting BumpUp to int makes it super slow, not super saw.

4) BumpUp really does well in bass notes, where it's 33% faster than Naive int16. This is because mostly it's just doing one increment (sample number), one decrement (output level), and one compare (is this the sample that I bump up next?). For really high frequencies (e.g. C8) it is 6x SLOWER than Naive int16.

5) I've tried with narrow detune ranges. Performance of BumpUp is the same.

In summary:

-- BumpUp CPU usage is too dependent on note frequency to be necessarily practical

-- since conversions to/from floating point take so long, if you want int output you should use an int algo, which rules out BumpUp. If you want double output, and can stand the variable CPU load, BumpUp should be much faster at bass frequencies though may use equal or greater CPU than a naive float or double version by C5 or C6.
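The BumpUp source isn't in the thread, so the following is only one reading of the description above: the summed output falls by a constant decrement each sample, and gets bumped up by one full saw height whenever the sample counter passes the next wrap "milestone". Saw range (+1 down to -1), the period values, and all names are assumptions for the sketch; it is checked against a direct per-saw evaluation.

```python
def naive_saws(periods, n_samples):
    """Reference: evaluate each falling saw directly and sum them."""
    return [sum(1.0 - 2.0 * ((n / p) % 1.0) for p in periods)
            for n in range(n_samples)]


def bumpup_saws(periods, n_samples):
    """Event-driven sum: steady decrement, plus a bump whenever a saw wraps."""
    decrement = sum(2.0 / p for p in periods)   # combined downward slope
    milestones = list(periods)                  # next wrap time of each saw
    voices = range(len(periods))
    nxt = min(voices, key=milestones.__getitem__)
    level = float(len(periods))                 # every saw starts at +1
    bumps = 0
    out = []
    for n in range(n_samples):
        while n >= milestones[nxt]:             # the one compare per sample
            level += 2.0                        # slope * period = (2/p) * p
            milestones[nxt] += periods[nxt]
            bumps += 1
            nxt = min(voices, key=milestones.__getitem__)
        out.append(level)
        level -= decrement
    return out, bumps


if __name__ == "__main__":
    periods = [100.37, 99.11, 101.73]           # hypothetical periods in samples
    _, low_bumps = bumpup_saws(periods, 2000)
    _, high_bumps = bumpup_saws([p / 2 for p in periods], 2000)
    print("bumps:", low_bumps, "vs one octave up:", high_bumps)
```

The bump count roughly doubles per octave, which is consistent with the frequency-dependent CPU load reported above: the per-sample work is constant, but the wrap handling scales with pitch.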

Borogove · Post by **Borogove** » Mon Sep 12, 2011 9:36 pm

Swiss Frank wrote:Borogove, I'm comparing your int16 naive supersaw and my double bumpup.

10 billion times looping? How'd you do that with an int count? (Just a second loop 1 to 10?)

Since you're saving into a huge huge huge array, whichever algo runs first would have to have the MMU/OS create a bunch of pages for the first algo that ran.

I ran 10,000 trials of 1,000,000 samples each. I alloc and free a buffer of 1 million floats = 4MB on each trial, so yeah, there's a small amount of OS mem management overhead on the very first trial, but I expect that to be swamped by the computation. I also didn't do much to ensure the rest of the system was quiet during the tests. But yeah, I should have pulled the memory management out of the tests.

(Done; short/float naive is 79 seconds, 312M samples/sec, float bump is 21 seconds, 476M samples/sec. Try changing yours to 10K trials of 1M samples each. That's -O2, gcc 4.2.1, OSX 10.6; I don't see any difference between -O2 and -O9.)

Swiss Frank wrote:Just to check: I calculate your incr[] value for middle C to be 777.59, rounding to 777, which I think is 1.31 cents or .4Hz off, right? In the vicinity of that frequency, the 16-bit incr has a precision of about 2.23 cents? In the specific case of a fat supersaw, with so many frequencies, that sounds acceptable, but just curious, is that the standard to which a lot of sound software works?

Correct, my incrs are 769,772,776,777,779,782,786. 2-cent resolution is... let's say barely acceptable back when the JP-8000 was new? I wouldn't tolerate it in modern software, but on a modern CPU it would be as fast [or faster!*] to use a 32-bit int as the per-wave phasor and a 64-bit int as the accumulator.
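For readers who haven't seen the trick, here is a sketch of a naive saw bank driven by 16-bit wraparound, using the seven increments listed above. It is not the benchmark code (which isn't posted); Python ints don't overflow, so the wrap is emulated with a mask, and the function names are made up.

```python
INCRS = [769, 772, 776, 777, 779, 782, 786]   # increments quoted in the post


def to_int16(x):
    """Wrap an integer into signed 16-bit range, the way a short overflows."""
    return ((x + 0x8000) & 0xFFFF) - 0x8000


def naive_supersaw(incrs, n_samples):
    """Sum of rising saws, each one a 16-bit phasor that wraps on overflow."""
    phases = [0] * len(incrs)
    out = []
    for _ in range(n_samples):
        phases = [to_int16(p + inc) for p, inc in zip(phases, incrs)]
        # The sum needs a wider accumulator: 7 * 32767 does not fit in 16 bits.
        out.append(sum(phases))
    return out


if __name__ == "__main__":
    print(naive_supersaw(INCRS, 5))
```

Swapping the mask and offset for their 32-bit equivalents gives the finer-resolution variant suggested above, with the accumulator widened accordingly.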

Swiss Frank wrote:0) I'm positive BumpUp Double is not 3x faster than Naive Int16 as you reported. I see 10% improvement at C4.

I never saw or reported a 3x difference. Naive short double was 98-99 seconds, bumpup double was 64-69 seconds, about 20 seconds of that was just writing doubles to memory.

(Also, and I'm just being a nitpicky little bitch here, middle C = C4 = ~262Hz; my tests were done at C above, C5, 523Hz.)

Swiss Frank wrote:3) Generally, the naive int16 implementation (under full optimization, at least) is spending most of its time just converting the output to doubles.

Converting them, or storing them to memory? I had to store to keep the optimizer from just degenerating all the computation to one big no-op, and once I was doing that it wasn't fair to store 16 bits per sample for the naive saw and 32 or 64 for the bump-up. I don't see a "fair" way to resolve that.

The point of benchmarking it, for me, was to establish (a) whether naive saw via integer wraparound was a credible candidate for the JP-8000 implementation, and (b) whether bump-up saw was a drastically faster way to emulate it. Answers (a) yes, modern hardware can do it 2000+ times faster than real time, so cheap 1997 hardware should have been able to handle it, and (b) faster but not drastically so.

The frequency dependence of the bump algorithm is interesting, too.

* In 32-bit x86 native code, 16-bit operations are distinguished with an opcode prefix, so all 16-bit ALU operations are one byte longer than their 32-bit counterparts, so code is bigger, code cache pressure higher, etc.

Swiss Frank · Post by **Swiss Frank** » Mon Sep 12, 2011 11:03 pm

Middle C--I stand corrected.

BTW I would say the bump algo CPU is actually more proportional to note frequency than sampling frequency. It just occurred to me that given a fixed sampling rate of 44.1kHz, BumpUp is a relative pig at e.g. C6. But, if instead you ask, given a polyphony and notes to play, what the highest sampling rate you can get with a given CPU is, the Bump method could let you go much higher.

Or in other words, by 768kHz, BumpUp double can play C7 faster than Naive Int16.

Interesting comments about 16-bit vs. 32-bit. I haven't really done assembly since 6502 on my Apple II when I was in 7th-8th grade. I've just ported my main work-related library (an in-memory database) to its first 64-bit OS and will start looking at the tradeoff between 32- and 64-bit ints.

By "-O9" I meant to convey maximum optimization and save you a trip to the manual

Most compilers with the -Ox notation accept 9 as an alias for their top optimization.

Roland Supersaw - any idea how the original was done?