Roland Supersaw - any idea how the original was done?

DSP, Plugin and Host development discussion.

Post

Swiss Frank wrote:A question about the killer article at: http://www.ghostfact.com/jp-8000-supersaw/

"This makes sense when you think about it - efficiency was king, and it's hard to beat the efficiency of two extra adds per sample - things are a lot easier when you embrace aliasing rather than reject it, huh?"

I don't get it - two extra adds to do what? Or do you mean the 7 sawtooths really are just 7 naive, non-BWL sawtooths (which I guess may only take a few additions each)?! That's the secret?! And if so, why only for non-BWL sawtooths?
That must be two adds per wave per sample. One add to increment the phase of the naive saw, one add to sum the waves together.

The spectra shown in that article strongly suggest naive aliased saws with a highpass, but I'd like to see the plots for higher pitched notes where the aliasing would be more dominant.
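To make that concrete, something like the loop below is what I have in mind - purely a sketch of "two adds per wave per sample", not a claim about the actual JP-8000 code (the 16-bit accumulator width is just a placeholder):

Code: Select all

#include <stdint.h>

/* Sketch only: each wave is a 16-bit phase accumulator that wraps around,
   and the accumulator itself IS the naive saw.  Per wave per sample:
   one add to advance the phase, one add to sum it into the mix. */
static uint16_t phase[7];   /* current phase of each saw              */
static uint16_t incr[7];    /* per-sample increment = 65536 * f / fs  */

int32_t supersaw_sample(void)
{
    int32_t mix = 0;
    for (int i = 0; i < 7; i++) {
        phase[i] += incr[i];        /* add #1: advance phase, wraps mod 65536     */
        mix += (int16_t)phase[i];   /* add #2: reinterpret as signed, accumulate  */
    }
    return mix;                     /* caller scales this back down */
}
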
Don't do it my way.

Post

Borogove wrote:
Swiss Frank wrote:A question about the killer article at: http://www.ghostfact.com/jp-8000-supersaw/

"This makes sense when you think about it - efficiency was king, and it's hard to beat the efficiency of two extra adds per sample - things are a lot easier when you embrace aliasing rather than reject it, huh?"

I don't get it - two extra adds to do what? Or do you mean the 7 sawtooths really are just 7 naive, non-BWL sawtooths (which I guess may only take a few additions each)?! That's the secret?! And if so, why only for non-BWL sawtooths?
That must be two adds per wave per sample. One add to increment the phase of the naive saw, one add to sum the waves together.

The spectra shown in that article strongly suggest naive aliased saws with a highpass, but I'd like to see the plots for higher pitched notes where the aliasing would be more dominant.
My guess is that there was a multiply or two involved in there, as odds are this was in fixed point, where you would need to scale before summing in order to avoid clipping. If multiplications were dear, a simple bit shift might have been used to scale the volume down.
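Purely speculative arithmetic, but the scaling step might look something like this - a reciprocal-of-7 fixed-point multiply versus the cheap shift:

Code: Select all

#include <stdint.h>

/* Sketch of the scaling step only: acc is the sum of seven full-scale
   16-bit saws, so it spans roughly +/- 7 * 32767 and has to be brought
   back down before it becomes a 16-bit output. */
int16_t descale_multiply(int32_t acc)
{
    return (int16_t)((acc * 4681) >> 15);  /* 4681/32768 ~= 1/7: one multiply   */
}

int16_t descale_shift(int32_t acc)
{
    return (int16_t)(acc >> 3);            /* 1/8: a touch quieter, no multiply */
}
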

Nowadays, using the ramp->parabolic shaping->differentiate trick looks like a good way of reducing aliasing for supersaw generation on the cheap. But odds are good that people will be nostalgic for the "classic" aliased supersaw sound.

Sean Costello
Last edited by valhallasound on Fri Sep 02, 2011 10:52 pm, edited 1 time in total.

Post

valhallasound wrote:
Borogove wrote: That must be two adds per wave per sample. One add to increment the phase of the naive saw, one add to sum the waves together.
My guess is that there was a multiply or two involved in there, as odds are this was in fixed point, where you would need to scale before summing in order to avoid clipping. If multiplications were dear, a simple bit shift might have been used to scale the volume down.
I don't have any experience with fixed-point DSP hardware. Do their instruction sets generally give you multiple options for word width, and is addition wrap-around or saturated? On x86, for example, you could do each of the seven saws as a 16-bit 2s-complement signed add, summing the result into a 32-bit accumulator, and bit shift only once when you were done.
Don't do it my way.

Post

Borogove wrote:
valhallasound wrote:
Borogove wrote: That must be two adds per wave per sample. One add to increment the phase of the naive saw, one add to sum the waves together.
My guess is that there was a multiply or two involved in there, as odds are this was in fixed point, where you would need to scale before summing in order to avoid clipping. If multiplications were dear, a simple bit shift might have been used to scale the volume down.
I don't have any experience with fixed-point DSP hardware. Do their instruction sets generally give you multiple options for word width, and is addition wrap-around or saturated? On x86, for example, you could do each of the seven saws as a 16-bit 2s-complement signed add, summing the result into a 32-bit accumulator, and bit shift only once when you were done.
Saturation is common, and is sometimes an option you can choose. It is nice to be able to leave saturation off - say, for allowing a sawtooth to wrap around.

My guess is that Roland used some ASICs in the JP8000, so the accumulator width is probably tweaked to exactly what they want to work with versus the cost of the chip. Although they might be using the DREAM chips (from Atmel, or whoever makes them now). I know that Motorola DSPs had a 56-bit accumulator, and a 24-bit word width, so there was plenty of headroom there for most operations. The Blackfin had less room, but the details have blissfully left my mind at the moment.
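In code terms, the difference is something like this - an illustration only, not any particular DSP's instruction set:

Code: Select all

#include <stdint.h>

/* A wrapping add gives a saw's phase accumulator its flyback "for free";
   a saturating add would pin it at full scale instead. */
static inline int16_t add_wrap(int16_t a, int16_t b)
{
    return (int16_t)((uint16_t)a + (uint16_t)b);  /* overflow wraps around */
}

static inline int16_t add_sat(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + b;
    if (s >  32767) return  32767;                /* clamp at full scale   */
    if (s < -32768) return -32768;
    return (int16_t)s;
}

/* phase = add_wrap(phase, incr);   naive saw - the flyback is the wrap
   mix   = add_sat(mix, phase);     summing bus that clips rather than wraps */
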

Sean Costello

Post

I'm pretty sure it's the parabolic trick. I couldn't get naive or even bandlimited saws to sound like the JP8080 we have around, no matter what highpass filter was used. Just squaring the phase was instant gratification with a simple 2-pole hp.

OTOH one can't just open the box and tap into the circuitry. More factors might be involved.

Urs

Post

Urs wrote:I'm pretty sure it's the parabolic trick. I couldn't get naive or even bandlimited saws to sound like the JP8080 we have around, no matter what highpass filter was used. Just squaring the phase was instant gratification with a simple 2-pole hp.
Interesting. I presume that the HP takes care of the differentiation?

EDIT: So do you think that the Roland folks had figured out the parabolic trick back in the late 1990's? I wouldn't be surprised, as this sort of convergent evolution happens a lot. I know a lot of things that got published as new, such as the saturation modeling in the Moog filter, were commonly in use for a long time before being published. I had never heard of the parabolic trick before those papers were published, though - it seems like a great trick.

Sean Costello

Post

You guys obviously know what you're talking about, but if someone's trying to answer my questions, the answers are so far over my head I don't even recognise them as such 8-)

1. Is the conclusion of that article, which is based on this thread, that there was no special trick behind the supersaw (other than the HPF(s))? That the multiple-sawtooth output is most likely produced by simply summing multiple sawtooths?

2. And if so, why stop at multiple sawtooths? Why not multiple squares, sines, non-square pulses? Why not lookups in waveform ROMs or BWL saws? And why always 7? Why do they let you adjust the volume and spread of the sidebands but not the count?

I'm not an audio or math expert, but the fact that even the Fantom G and V-Synth GT still only support "super-" for "-saw" tells me that in step one there is indeed some kind of trick, involving counters or loops...



BTW,

3. anyone have a good reference for this "parabolic trick"?

Post

Swiss Frank wrote: 1. Is the conclusion of that article, which is based on this thread, that there was no special trick behind the supersaw (other than the HPF(s))? That the multiple-sawtooth output is most likely produced by simply summing multiple sawtooths?
Yes. Urs thinks it's DPW saw ("parabolic trick") but the spectral plots in that article suggest naive saw.
2. And if so, why stop at multiple sawtooths? Why not multiple squares, sines, non-square pulses? Why not lookups in waveform ROMs or BWL saws? And why always 7? Why do they let you adjust the volume and spread of the sidebands but not the count?
Saw is uniquely simple in that the phasor is the output*. No conditionals, no lookups. Nothing particularly special about 7, that was probably just the amount they comfortably had CPU power for.
3. anyone have a good reference for this "parabolic trick"?
Google 'differentiated parabolic waveform' - you generate a naive saw, square it, then differentiate and amplify. The aliasing is applied to the squared wave, which has a sharper spectral falloff than the naive saw, so the alias frequency components are reduced in amplitude.

* shut up modulo scale and offset nerrrrrrrd depending on how you manage your phasor shut up
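For the record, a minimal per-sample version of that trick looks something like this (one common DPW formulation - not saying this is what Roland actually shipped; the pitch is just the demo value used elsewhere in this thread):

Code: Select all

#include <stdio.h>

/* Differentiated parabolic waveform saw, single voice:
   naive saw -> square it -> first difference -> rescale. */
int main(void)
{
    const double fs = 44100.0, f0 = 523.251;  /* sample rate, pitch        */
    const double scale = fs / (4.0 * f0);     /* restores the amplitude    */
    double phase = 0.0;                       /* naive saw in [-1, 1)      */
    double prev  = phase * phase;             /* previous squared value    */

    for (int n = 0; n < 44100; n++) {         /* ~1 second of output       */
        phase += 2.0 * f0 / fs;               /* advance the naive saw     */
        if (phase >= 1.0) phase -= 2.0;       /* flyback                   */
        double sq  = phase * phase;           /* parabolic shaping         */
        double out = (sq - prev) * scale;     /* differentiate and amplify */
        prev = sq;
        printf("%f\n", out);
    }
    return 0;
}
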
Don't do it my way.

Post

Swiss Frank wrote:Why not multiple squares, sines, non-square pulses?
Well, sawtooths have odd and even harmonics. Thus there's more going on in the spectrum with detuned saws than with detuned squares or other classical waveforms.

You get a lot of movement with narrow pulses. But these have pronounced harmonics and thus might cause more audible aliasing...

@Russell: I'll try naive sawtooths again. In A/B comparisons the JP output just looks more "spikey" on the oscilloscope, which led me to believe that they simply square and highpass filter. Seeing the tricks they pulled off on earlier sawtooths (D50, aJuno), I do think that Roland has a great history of making sawtooth oscillators from virtually *anything*.

(if someone could explain how they did that sawtooth in the alpha Junos... I'm all ears... there are some half-sineish plots in the service manual but d'oh... the shape doesn't change with frequency, so it can't just be the static highpass - which, btw, is behind the VCAs)

Post

OK, I think I've got this one.



> Saw is uniquely simple in that the phasor is the output.

To be specific, a naive saw.

Actually, think of it this way. The output of a naive saw is simply a constantly-decrementing voltage that is periodically "bumped up." Whether it's one sawtooth or 100 simultaneous saws at different frequencies (normalized for volume), all you do is decrement the output each sample and count how long until the next bump-up.

That is NOT true of BWL saws, or sines, waveROMs and so on.

To produce a single sawtooth, you bump it up every "wavelength" samples.

To produce what LOOKS like two sawtooths, you merely have to bump it up in a syncopated pattern.

Without loss of generality, let's take A4 = 441 Hz tuning and 44.1 kHz sampling. Each waveform is then exactly 100 samples.

Single saw? Reset by the height of a waveform every 100 samples. Duh.

Two saws separated by 1% (4.41 Hz)? Bump up half a waveform height after a delay of 99 samples, then again after a delay of 1 sample. Then in 98, then in 2. Then in 97, then 3.

Three saws separated by 1% and 2% (4.41 Hz and 8.82 Hz)? Bump up by one third of a waveform height after 97 samples, then 1 sample, then 2. Next cycle is 94, 2, 4.

These delays could be pregenerated into a delay lookup table: 97 1 2 94 2 4 etc. Since the delays follow a nice pattern, I suspect there might be a way to reproduce the pattern with a couple of variables being incremented and moduloed too. Now, for 7 saws detuned by mutually prime numbers, it should theoretically be a very long time before the phasing repeats, which would call for a long lookup table. OTOH, has anyone said Roland actually has a long time before repeat? And with this sound SOOOO fat, what if you just jumped to a random place in the delay lookup table now and then? There are probably ways to do that that aren't audibly discontinuous. And finally, if they're all a bit out of phase you can engineer it such that they never all cancel each other out, so a short repeat wouldn't be audible. If the table "should" be long but you just use a short one over and over, it would introduce some super-low frequencies. No problem if you have a handy HPF though!

-------------------------

Here's a C program (untested) that would output such a wave as floating-point numbers in [-1..1] (plus some DC?) to stdout:

Code: Select all

#include <stdio.h>

int main() {

  int    iWaves = 3, iDelayPtr = 0;  // iWaves is coordinated with adDelay
  double dSampFreq = 44100.0, dNoteFreq = 523.251; // CD-quality middle C
  double adDelay[] = { .97, .01, .02, .94, .02, .04 }; // fractions of one waveform; short demo table
  double dOutput = (iWaves-1.0)/iWaves, dCounter = 0; // counter is in samples
  
  while (1) {

      dOutput  -= 2.0 * dNoteFreq / dSampFreq;
      dCounter -= 1;

      while ( dCounter < 0 ) {
           dOutput   += 2.0 / iWaves;
           dCounter  += adDelay[ iDelayPtr++ ] * dSampFreq / dNoteFreq;
           iDelayPtr %= sizeof( adDelay ) / sizeof( double );
      }

      printf( "%f\n", dOutput );
  }
}
It seems to me you could reproduce an arbitrary number of summed naive sawtooths like this, with maybe only a little more CPU than one plain old naive sawtooth. Yes, some individual samples will hit the inner loop, but in our 441 Hz example, with a 100-sample waveform, we only execute the inner while three times per 100 samples. Maybe 5% overhead over a single sawtooth.

In fact, the sound being so thick and detuned means it's not that important to get the numbers exactly right. It's a license to cut corners. And if you cut a lot of corners, you could end up with a lot of DC. For instance, say you initialize the pointer into the lookup table randomly so each note doesn't start with the same phase. You may find yourself bumping up after 97% of the waveform goes by... or 1%. So if you always initialize the output value to the same 0 or 1, your output would have random DC in it. Rounding errors from the repeated addition over time would also introduce DC. Hence a need to HPF (sound familiar?).
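And the HPF needn't be anything fancy: even a plain one-pole DC blocker would soak up that kind of slow drift. (Generic sketch - I'm not claiming this is Roland's actual filter.)

Code: Select all

/* Generic DC blocker: y[n] = x[n] - x[n-1] + R * y[n-1], with R just below 1. */
typedef struct { double x1, y1, R; } DCBlock;

static double dcblock(DCBlock *s, double x)
{
    double y = x - s->x1 + s->R * s->y1;
    s->x1 = x;
    s->y1 = y;
    return y;
}

/* usage:  DCBlock dc = { 0.0, 0.0, 0.995 };  out = dcblock(&dc, in); */
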

There'd be similar algos that would simulate N squares or tris too. Squares: instead of auto-decrementing with occasional bump-ups, you'd hold the output, with occasional bumps up OR down (an extra flag in the lookup table). Triangles: instead of always decrementing by a fixed value, you'd decrement or increment by a delta; instead of bumping the output up/down, you'd bump the delta up/down. OTOH, super-tri? Super-square? Doesn't sound that appealing.

--------------

No idea if this is what they're doing, but it would explain 1) why they only do it with naive saws, 2) why there's a mandatory and secret HPF, and 3) why they went right to 7 waveforms instead of, say, 2-5 with the first model and more later (because the CPU load may only be slightly affected by the number of saws).

Post

Supersaw Implementation #2

Compared to the above implementation, this one has the benefit of using only a small, fixed amount of memory per saw no matter how long the repeat is, and it will have an extremely long repeat (days? years?) if you select sidebands whose periods have a high enough least common multiple. The cost is that a very few samples require significantly more time to calculate, and the total overhead may be twice the first implementation's.

Instead of pre-calculating delays between sawtooths' rising edges (which I call "bump ups"), this #2 algo: 1) keeps track of how many samples the note has sounded; 2) precalculates, for each of the n sawtooths, how many samples long its waveform is (e.g. about 169 for middle C); 3) keeps a sorted, circular buffer of the sample-number "milestone" at which each sawtooth will next bump up the output.

Don't be freaked out by the hairy "while ( dSampleCount >= ... dMilestone )" loop. We only go in there when a sawtooth is resetting to phase 0. With the detunes in this demo that only happens (.893 + .939 + .980 + 1.0 + 1.020 + 1.064 + 1.110) = 7.006 times per waveform, or about 4.1% of the samples for middle C. Furthermore, you can read that while as an "if", except for the rare chance that two of those 7.006 occurrences land on the same sample. (To spread the pain out, you could even make a rule that you only bump one saw per sample. This code will work fine if you replace that while with an if.)

Furthermore, the inner sorting loop over j only has to swap even once about 43% of the time we bump the highest saw (with these example detunes), and much less often for the other saws.

In short, while this means a few samples out of a million will take A LOT longer to calculate, any block of, say, 128 samples will take a pretty reliable time - and, more specifically, only a very little longer than a single naive saw.




Code: Select all

#include <stdio.h>

typedef struct {

  double dMilestone; // after how many samples do we bump up?
  double dDelay;     // and how much to increase dMilestone when we do.

} BumpUp_T;



int main() { 

  int    iWaves = 7, iBase = 0; 
  double dSampFreq = 44100.0, dNoteFreq = 523.251; // CD-quality middle C
  double dOutput = 1.0, dSampleCount = 0;  // 1.0 OK if all saws start at phase 0
  BumpUp_T abu[ iWaves ];
 
  abu[ 0 ].dDelay = dSampFreq / dNoteFreq * 0.893;
  abu[ 1 ].dDelay = dSampFreq / dNoteFreq * 0.939;
  abu[ 2 ].dDelay = dSampFreq / dNoteFreq * 0.980;
  abu[ 3 ].dDelay = dSampFreq / dNoteFreq * 1.000;
  abu[ 4 ].dDelay = dSampFreq / dNoteFreq * 1.020;
  abu[ 5 ].dDelay = dSampFreq / dNoteFreq * 1.064;
  abu[ 6 ].dDelay = dSampFreq / dNoteFreq * 1.110;

  // This could either be randomized, or specifically selected so no three
  // or more sawtooths all reset at the same time.  For this demo, we
  // have all saws start at phase 0 at note-down.

  for ( int i = 0; i < iWaves; i++ )
      abu[ i ].dMilestone = abu[ i ].dDelay;

  // Since this example has created the queue items sorted by dMilestone,
  // no initial sort is required.



  // Now play the note forever.
  
  while (1) { 

      dOutput      -= 2.0 * dNoteFreq / dSampFreq; 
      dSampleCount += 1; 

      // Although this loop looks really hairy, it only acts once (or
      // extremely rarely, for the saws higher than the main freq, twice)
      // per waveform.  So for Middle C (169 samples or so) and 7 saws,
      // we only go into this loop 4-5% of the time.  And the only reason
      // it's a loop is to handle the rare case of >=2 saws both wrapping to phase 0
      // simultaneously, which only happens 3% of the time that we go in.
 
      while ( dSampleCount >= abu[ iBase ].dMilestone ) {

           // Bump up the output waveform.
           dOutput   += 2.0 / iWaves;

           // Find the next time we have to bump up for this sawtooth.
           abu[ iBase ].dMilestone += abu[ iBase ].dDelay;

           // Lets now say the front of the queue is the next item.  This
           // has the simultaneous effect of "pulling" the milestone we just
           // passed off the front, and "pushing" the same milestone thats
           // been incremented by a full waveperiod onto the end of the queue.

           iBase = ( iBase + 1 ) % iWaves;



           // Since the queue is circular, the first element is now the last;
           // since we just added a full delay to it, it's also PROBABLY in
           // the right place.  However if we've just recently bumped 1 or
           // more lower-frequency saws, and put them on the back of the
           // queue, very rarely their longer period means the saw we've just
           // bumped will need bumping again, BEFORE the longer-period saw(s).
           // In the VERY worst case, if we've very recently bumped ALL other
           // saws, and are now bumping the highest-frequency/shortest period
           // saw, we may have to do this bubble iWaves-1 times.

           // Don't worry, even with big detunes we'll only have to swap
           // once about 10% of the time, twice about 1%, three times .01%.
           // And considering we only even get here 7 samples out of 169
           // in the first place (for Middle C, 7 saws) it's a very rare
           // operation.

           for ( int j = iWaves - 2; j >= 0; j-- ) {
               int iHigh = ( iBase + j + 1 ) % iWaves;
               int iLow  = ( iBase + j     ) % iWaves;
               if ( abu[ iLow ].dMilestone > abu[ iHigh ].dMilestone ) {
                   BumpUp_T buTemp = abu[ iLow ];
                   abu[ iLow ] = abu[ iHigh ];
                   abu[ iHigh ] = buTemp;
               } else
                   break;
           }
      } 



      // OK, whether we bumped our saw up or not, output our output value.

      printf( "%f\n", dOutput ); 
  } 
}

Post

Swiss Frank wrote:Supersaw Implementation #2
I corrected a couple of compile errors in that, limited the loop, hoisted the print out of the loop, and benchmarked it against an equivalent naive saw summation (computing each of the 7 saws in 16-bit integer and summing to 32-bit integer before conversion to double).

The good news for your implementation: it's faster - to produce 10 billion samples, your method takes 64 seconds, mine 99.

The bad news - there seems to be a bug in the result; my output stays within the range +/-0.879 (if all the waves hit flyback at the same time, I'd expect +/- 1.0), but yours ranges from a reasonable -0.780 to +89.779. I don't know if that means you're occasionally doing an extra bump-up, or if there's some other bug, or if that's an expected result that you want the HPF to deal with. The first three samples out of your code look fine - they're the same as mine except I ramp up from 0 and you ramp down from 1.

Code: Select all

double* naiveSaws( int count )
{
    double* result = new double[count];

    const int iWaves = 7;
    double dSampFreq = 44100.0, dNoteFreq = 523.251; // CD-quality middle C 

    const double FULL_SWING = 65536.0;
    const double DESCALE = 1.0 / (iWaves*32767.0);

    // Manage the per-wave levels and per-sample increments
    // in 16-bit integers. This takes advantage of integer 
    // math wraparound, but might add a little less-than-CD-
    // quality grit. On a more modern machine you could scale
    // everything up by another 2^16, compute each wave in a 
    // 32-bit int and accumulate into 64-bit. 

    short incr[ iWaves ];
    short phasor[ iWaves ];

    for (int i = 0; i < iWaves; i++)
    {
        phasor[i]= 0;
    }

    incr[0] = short( FULL_SWING * dNoteFreq * 0.893 / dSampFreq );
    incr[1] = short( FULL_SWING * dNoteFreq * 0.939 / dSampFreq );
    incr[2] = short( FULL_SWING * dNoteFreq * 0.980 / dSampFreq );
    incr[3] = short( FULL_SWING * dNoteFreq * 1.000 / dSampFreq );
    incr[4] = short( FULL_SWING * dNoteFreq * 1.020 / dSampFreq );
    incr[5] = short( FULL_SWING * dNoteFreq * 1.064 / dSampFreq );
    incr[6] = short( FULL_SWING * dNoteFreq * 1.110 / dSampFreq );

    // Now play the note forever. 

    for (int x = 0; x < count; x++)
    {
        int iOutput;
        // unrolled for performance                     
        phasor[0] += incr[0];
        phasor[1] += incr[1];
        phasor[2] += incr[2];
        phasor[3] += incr[3];
        phasor[4] += incr[4];
        phasor[5] += incr[5];
        phasor[6] += incr[6];

        iOutput = phasor[0];
        iOutput += phasor[1];
        iOutput += phasor[2];
        iOutput += phasor[3];
        iOutput += phasor[4];
        iOutput += phasor[5];
        iOutput += phasor[6];

        // iOutput is now in the range +/- (7*32K)
        double dOutput = double(iOutput)*DESCALE;
        result[x] = dOutput;  
    } 
      
    return result;
} 
Don't do it my way.

Post

I am guessing that the DC is coming from the fact that the sum of the detune factors is a little more than 7, and the per-sample sloping isn't accounting for that, so you're getting more bump-ups than you want.

Edit: Tried correcting for that - it didn't help.
Don't do it my way.

Post

Wow, thx Borogove.

That was supposed to be pseudocode - surprised it worked at all! Very impressive determination on your part.

Sounds like you've already diagnosed one problem. Since the detunes I happened to use (from the earlier discussion) add up to 7.006 total waveforms per waveperiod of the carrier (because the higher-frequency sidebands bump up more than once per carrier waveperiod), dividing the bump-ups' magnitude by 7 (the number of waves) guaranteed upward creep.
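Back-of-envelope, assuming the drift comes entirely from that mismatch (and using the same demo frequency as the code above):

Code: Select all

#include <stdio.h>

/* How fast would the output creep upward, if the only problem were the
   detune ratios summing to 7.006 instead of 7?  (Assumption: the drift
   has no other source.) */
int main(void)
{
    const double f0 = 523.251;   /* the demo note frequency */
    const double ratioSum = 0.893 + 0.939 + 0.980
                          + 1.000 + 1.020 + 1.064 + 1.110;   /* = 7.006 */

    /* extra bump-ups per second = (ratioSum - 7) * f0, each 2/7 high */
    double creepPerSec = (ratioSum - 7.0) * f0 * (2.0 / 7.0);
    printf("creep: %f per second of audio\n", creepPerSec);  /* about 0.9 */
    return 0;
}

So it creeps slowly but without bound - exactly the kind of thing a raw min/max check catches and an HPF would hide.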



OK, a few more ideas:

1. the "Swiss Supersaw Implementation #2" uses doubles throughout, but is already a bit faster than one using 16/32-bit ints. What about converting the SSI#2 to floats, or even 16 or 32 bit ints too? I feel that fat fat sound should disguise any number of rounding errors.

2. what about comparing SSI#2 to the baseline summation you wrote, but for 1/10 the amount of detune? I think SSI#2 would pick up hugely while the baseline wouldn't benefit.

3. what about trying with say 3 or 21 sawtooths?

4. I wonder how long the delay table for SSI#1 would need to be for 7 sawtooths. Regardless, the CPU usage shouldn't vary too much on sawtooth count, so it could be benchmarked with the given 6-delay minitable.

5. I think one might be able to take that sort step completely out of SSI#2 and still end up with something that sounds about right. The impact would be that, in the cases where a higher-frequency saw "should" bump up twice before a lower-frequency one that just bumped, the higher-frequency waveform's second bump-up would instead happen at the same time. This wouldn't introduce (more) DC, because the right number of bumps would still happen; some would merely be a little delayed.

6. cache abu[ iBase ].dMilestone in an unsigned integer variable called, say, iNextMilestone, and make the sample counter an unsigned integer too? (Wraps every 27 hours at CD frequency.) Could speed things up a tiny amount?

7. A better way to swap items in the queue, such as making it a circular queue of indexes into the real array? Then it would be 1 byte to swap, not 16, but I don't know if that makes up for the overhead of the extra level of indirection.

Post

Hold on: are the detunes of .893, .939, .980, 1.0, 1.020, 1.064, 1.110 (from http://www.ghostfact.com/jp-8000-supersaw/) actually reasonable? That means the top/bottom sidebands are TWO SEMITONES away from the carrier. When I first read this I assumed this meant more like 11 CENTS.
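Quick sanity check on my own reading, converting those ratios to cents:

Code: Select all

#include <stdio.h>
#include <math.h>

/* cents = 1200 * log2(ratio) */
int main(void)
{
    const double ratios[] = { 0.893, 0.939, 0.980, 1.000, 1.020, 1.064, 1.110 };
    for (int i = 0; i < 7; i++)
        printf("%.3f -> %+.0f cents\n", ratios[i], 1200.0 * log2(ratios[i]));
    /* 0.893 comes out near -196 cents and 1.110 near +181 cents,
       i.e. roughly two semitones either side of the carrier. */
    return 0;
}
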

The "Swiss Supersaw Implementation #2" overhead over a single naive saw is probably very very proportional to detune, so getting this number accurate/realistic is critical in seeing whether SSI#2 has good performance.

Even if that setting is an actual Roland setting, is that commonly used, or do people use the lower (and much narrower) settings?
