Swiss Frank wrote: Since the detunes I happened to use (from the earlier discussions) add up to 7.006 total waveforms per waveperiod of the carrier (due to the fact that the higher-frequency sidebands bump up more than once per carrier waveperiod), dividing the magnitude of the "bump ups" by 7 (the number of waves) guaranteed creep up.
Correcting for that didn't fix the creep-up; I'm not sure what the root of the problem actually is.
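To make the quoted creep-up arithmetic concrete, here is a minimal sketch. It assumes (my reading of the quote, not something confirmed in the thread) that the running output falls by a total of 1.0 over each carrier period and that each sawtooth reset bumps it up by a fixed amount. The function name `net_drift_per_period` and its parameters are hypothetical.

```python
def net_drift_per_period(bumps_per_period, bump_magnitude, ramp_drop=1.0):
    """Net change in the running level over one carrier period.

    Assumed toy model: the level ramps down by `ramp_drop` in total and is
    bumped up `bumps_per_period` times by `bump_magnitude` each time.
    """
    return bumps_per_period * bump_magnitude - ramp_drop


# 7.006 bumps per carrier period (figure from the quote), each scaled by 1/7.
drift_div_by_7 = net_drift_per_period(7.006, 1.0 / 7.0)

# Scaling by the true bump count instead cancels the drift in this toy model.
drift_div_by_total = net_drift_per_period(7.006, 1.0 / 7.006)

print(f"divide by 7:     {drift_div_by_7:+.6f} per period")
print(f"divide by 7.006: {drift_div_by_total:+.6f} per period")
```

In this model a small positive residue (0.006/7 per period) accumulates when dividing by 7, which matches the quoted explanation. Since correcting for it did not remove the creep in the real code, something outside this simple model must also contribute.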
1. the "Swiss Supersaw Implementation #2" uses doubles throughout, but is already a bit faster than one using 16/32-bit ints. What about converting the SSI#2 to floats, or even 16- or 32-bit ints too? I feel that fat fat sound should disguise any number of rounding errors.
naive short/double: 98 seconds
naive short/float: 75 seconds
bump double: 69 seconds
bump float: 35 seconds
For the naive saw, the float/double difference (98 vs. 75 seconds) has got to be almost all memory bandwidth, because very little of the work is done in floating point. So ~23 seconds of the float/double difference in the bump version is also memory bandwidth. I'm not up to converting the bump version to int. Theoretically a Core 2 can do a lot more integer math per clock than float, but I don't know if that holds up in practice.
2. what about comparing SSI#2 to the baseline summation you wrote, but for 1/10 the amount of detune? I think SSI#2 would pick up hugely while the baseline wouldn't benefit.
Reducing the detune didn't make a noticeable difference; the normal code path dominates the runtime, not the shuffle.
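The observation that the normal code path dominates the runtime can be put in numbers with Amdahl's law. This is only a sketch: the 5% share for the shuffle is a hypothetical figure, not something measured in this thread, and `amdahl_speedup` is an illustrative helper name.

```python
def amdahl_speedup(fraction, local_speedup):
    """Overall speedup when `fraction` of the runtime becomes
    `local_speedup` times faster and the rest is unchanged."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Hypothetical: the shuffle accounts for 5% of total runtime.
shuffle_fraction = 0.05

for local in (2.0, 10.0, float("inf")):
    overall = amdahl_speedup(shuffle_fraction, local)
    print(f"shuffle {local}x faster -> whole run {overall:.3f}x faster")
```

Even if the shuffle cost dropped to zero, a part that takes 5% of the time can speed the whole run up by at most 1/0.95, a little over 5%. Shuffling less often (smaller detune) can only buy back whatever share the shuffle had in the first place.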
3. what about trying with say 3 or 21 sawtooths?
I would guess that bump style improves its lead as the number of sawtooths increases.
3 saws naive short/float: 33 sec
3 saws bump float: 24 sec
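As a quick check on the guess that the bump style gains as saws are added, the reported float timings can be turned into ratios. This sketch assumes the earlier set of timings was taken with 7 sawtooths, which the thread implies but does not state outright; `bump_lead` is a hypothetical helper.

```python
# Runtimes in seconds as reported in this thread (short/float variants).
# Assumption: the first benchmark used 7 sawtooths.
timings = {
    7: {"naive": 75.0, "bump": 35.0},
    3: {"naive": 33.0, "bump": 24.0},
}


def bump_lead(naive_seconds, bump_seconds):
    """How many times faster the bump version is than the naive one."""
    return naive_seconds / bump_seconds


for saws in sorted(timings):
    t = timings[saws]
    lead = bump_lead(t["naive"], t["bump"])
    print(f"{saws} saws: bump is {lead:.2f}x faster; "
          f"naive {t['naive'] / saws:.1f} s/saw, bump {t['bump'] / saws:.1f} s/saw")
```

Under that assumption the lead grows from roughly 1.4x at 3 saws to roughly 2.1x at 7. The naive cost per saw stays close to constant while the bump cost per saw drops, which is consistent with the guess. Two data points are not much to go on, though.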
5. I think one might be able to take that sort step completely out of SSI#2 and still end up with something that sounds about right.
Probably unpleasantly noisy, but might be worth a try.
7. better way to swap items in the queue, such as making it a circular queue of indexes into the real array? Then it could be 1 byte to swap not 16, but don't know if that makes up for the overhead of the extra level of indirection.
By Amdahl's law it won't make a significant difference.
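For reference, the index-queue idea from item 7 might look roughly like the sketch below. Everything here is hypothetical: the `IndexQueue` class, the per-saw record fields, and the values are made up for illustration and are not taken from SSI#2.

```python
class IndexQueue:
    """Circular queue of one-byte indexes into a separate record array."""

    def __init__(self, n):
        if not 0 < n <= 256:
            raise ValueError("one-byte indexes need 1..256 entries")
        self.order = bytearray(range(n))  # order[slot] -> record index
        self.head = 0                     # slot of logical position 0

    def _slot(self, k):
        # Map logical queue position k to a physical slot, wrapping around.
        return (self.head + k) % len(self.order)

    def index_at(self, k):
        """Record index stored at logical queue position k."""
        return self.order[self._slot(k)]

    def swap(self, i, j):
        """Swap two queue positions by exchanging indexes, not records."""
        a, b = self._slot(i), self._slot(j)
        self.order[a], self.order[b] = self.order[b], self.order[a]

    def rotate(self):
        """Advance the head by one slot without moving any data."""
        self.head = (self.head + 1) % len(self.order)


# Hypothetical per-saw records; these stay where they are.
saws = [
    {"phase": 0.10, "step": 0.0101},
    {"phase": 0.55, "step": 0.0099},
    {"phase": 0.30, "step": 0.0100},
]

queue = IndexQueue(len(saws))
queue.swap(0, 2)                   # only two bytes change places
front = saws[queue.index_at(0)]    # every access pays one extra lookup
```

The swap itself gets cheaper, but each read of a saw now goes through `order` first. That is the indirection overhead the question worries about, and it is paid on the common path rather than only when a swap happens.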


