KVR Audio

Post by **xipix** » Mon Feb 12, 2024 9:13 am

Recently I built some software to time-stretch audio. As of this weekend it's online, and I'm looking for opinions on how it sounds. I made two versions: a simple, open-source algorithm, and a more sophisticated, optimised "Pro" version.

Version that runs in real time in your browser: https://bungee.parabolaresearch.com/bungee-web-demo

Comparison with other time stretchers: https://bungee.parabolaresearch.com/compare

This is the first time I've shared this publicly, so please go easy... but I'd love to hear constructive criticism and advice! Thank you.

Post by **gambero** » Mon Feb 12, 2024 11:01 am

Oh wow this is great. On vocals it's really impressive.

Post by **VariKusBrainZ** » Mon Feb 12, 2024 2:26 pm

Good name for it

Post by **quikquak** » Tue Feb 13, 2024 1:54 pm

Very nice. All I can say is that the pitched-down bongos sometimes miss transients, and pitching up can get quite wobbly, but that all depends on the source material. Overall, very good indeed.

Can I ask: is it written with WebGPU, using its compute capabilities, or is it all JS, or something else?

Post by **xipix** » Tue Feb 13, 2024 3:02 pm

Thanks for the feedback. Slowing down and pitching up are better tuned than pitching down and speeding up. Is the wobbliness at extreme pitch changes?
It uses WebAssembly with SIMD128, which your browser compiles to native SIMD instructions (e.g. SSE or Neon) on your device.

Post by **RunBeerRun** » Tue Feb 13, 2024 4:14 pm

Pretty trippy. I like slowing down and stretching audio too!

Post by **signalsmith** » Tue Feb 13, 2024 6:26 pm

Nice! Sounds good, and I love the negative-speed support.

Is the code for the comparison (with the excellent analysis plots) somewhere I could run, or PR? I'd love to add my own to the list.

Post by **quikquak** » Tue Feb 13, 2024 8:15 pm

xipix wrote: ↑Tue Feb 13, 2024 3:02 pm Thanks for the feedback. slowing down / up-pitching is better tuned than down-pitching / speeding up. Is the wobbliness at extreme pitch changes?
It uses Web Assembler with SIMD128. Which your browser will compile to native SIMD instructions (e.g. SSE or Neon) on your device.

I'm guessing that lowering the pitch means time-stretching faster, then playing back slower. I remember that in my own work, transient detection didn't use a trigger level that was flat across all frequencies...
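The arithmetic behind quikquak's guess can be checked with a small sketch. It assumes a generic two-stage pitch shifter (time-stretch with the pitch unchanged, then play back at a different rate); the stage order and the function names are hypothetical, not taken from Bungee. If the playback stage runs at `p` times the original rate, the stretch stage must run at `speed / p`, so an octave down at normal speed means stretching at 2× and playing back at half rate.

```python
import math


def pitch_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of `semitones` in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


def stretch_speed_for(speed: float, semitones: float) -> float:
    """Speed the (hypothetical) time-stretch stage must run at so that, after
    playback at pitch_ratio(semitones) times the rate, the overall speed is `speed`."""
    return speed / pitch_ratio(semitones)


def output_duration(input_seconds: float, speed: float, semitones: float) -> float:
    """Duration after both stages of the assumed two-stage pipeline."""
    stretched = input_seconds / stretch_speed_for(speed, semitones)  # stage 1: stretch, pitch unchanged
    return stretched / pitch_ratio(semitones)  # stage 2: playback rate scales duration by 1/p


print(stretch_speed_for(1.0, -12))  # 2.0: an octave down needs a 2x stretch stage
```

For example, `output_duration(10.0, 1.0, -12)` compresses 10 s to 5 s in the stretch stage and then plays it back at half rate, giving 10 s again at half the pitch.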

https://www.quikquak.com/prod_Copula.html

I've never considered WebAssembly before. It looks like it has good browser support. Thanks.

Post by **xipix** » Wed Feb 14, 2024 9:25 am

signalsmith wrote: ↑Tue Feb 13, 2024 6:26 pm Is the code for the comparison (with the excellent analysis plots) somewhere I could run, or PR? I'd love to add my own to the list.

Nice work. Love the presentation. It should be easy to add your code on my side: your code is published, right? There are already too many algorithms, though, so I need to think of a better way to present the comparison... possibly transposing each table or giving users checkboxes to show/hide algorithms.

Post by **mystran** » Wed Feb 14, 2024 9:51 am

It sounds pretty good, but I think for vocals it could benefit from some formant shifting (or preservation in the case you're trying to correct pitch).

Post by **CinningBao** » Wed Feb 14, 2024 10:18 am

Yeah, this is pretty damn good! It manages to maintain transients right down to a playback speed of 0.08 (with no pitch shifting). Below that, everything gets smeared, in a nice way!

The glockenspiel demo does reveal a bit of aliasing which I guess can be oversampled-out, maybe?

It also doesn't sound like it's being deconstructed and rebuilt with <complex maths>, so is it working in the time domain or the frequency domain? It isn't a lovely PSOLA algo, is it? That would be very cool, and it might explain why there is no formant shifting/correction. But I could be wrong.

Post by **xipix** » Wed Feb 14, 2024 11:25 am

mystran wrote: ↑Wed Feb 14, 2024 9:51 am It sounds pretty good, but I think for vocals it could benefit from some formant shifting (or preservation in the case you're trying to correct pitch).

Thanks. Yes, there's no formant correction yet; it should be easy to add.

Post by **xipix** » Wed Feb 14, 2024 11:33 am

CinningBao wrote: ↑Wed Feb 14, 2024 10:18 am Yeah, this is pretty damn good! It manages to maintain transients right down to a playback speed of 0.08. (with no pitch shifting) Beneath that everything gets smeared, in a nice way!

The glockenspiel demo does reveal a bit of aliasing which I guess can be oversampled-out, maybe?

It also doesn't sound like it's being deconstructed and rebuilt with <complex maths>, so is it working in the time domain or frequency domain? It isn't a lovely PSOLA algo is it? that would be very cool. This might explain why there is no formant shifting/correction. but I could be wrong.

Thank you!

Time vs frequency domain: a bit of both, and more. I'd like to say it's more "sophisticated" than complex.

At which settings do you hear the aliasing, and what does it sound like? The web demo uses cheap and ugly resampling by default, but there's a near-perfect resampler built in. I'll look at making that the default.
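xipix doesn't say which resamplers the demo uses. As a generic illustration of why a "cheap" resampler can alias, the sketch below contrasts linear interpolation (no anti-aliasing filter) with a Hann-windowed sinc interpolator whose cutoff is lowered when downsampling. Both functions, and the `ratio` convention (output rate divided by input rate), are assumptions for illustration, not Bungee's API.

```python
import math


def resample_linear(x, ratio):
    """Cheap resampler: output sample n reads input position n / ratio by linear
    interpolation. There is no anti-aliasing filter, so when ratio < 1 any content
    above the new Nyquist frequency folds back (aliases)."""
    out = []
    for n in range(int(len(x) * ratio)):
        pos = n / ratio
        i = int(pos)
        frac = pos - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        out.append(x[i] + frac * (nxt - x[i]))
    return out


def resample_sinc(x, ratio, half_width=16):
    """Band-limited resampler: Hann-windowed sinc interpolation. The cutoff drops
    to the new Nyquist frequency when downsampling, filtering out content that
    would otherwise alias."""
    cutoff = min(1.0, ratio)  # fraction of the input Nyquist frequency to keep
    radius = half_width / cutoff  # kernel half-length in input samples
    out = []
    for n in range(int(len(x) * ratio)):
        pos = n / ratio
        lo = max(0, math.floor(pos - radius) + 1)
        hi = min(len(x) - 1, math.floor(pos + radius))
        acc = 0.0
        for k in range(lo, hi + 1):
            t = k - pos
            arg = cutoff * t
            sinc = 1.0 if arg == 0 else math.sin(math.pi * arg) / (math.pi * arg)
            window = 0.5 + 0.5 * math.cos(math.pi * t / radius)  # Hann, zero at +/-radius
            acc += x[k] * cutoff * sinc * window
        out.append(acc)
    return out


def rms(v):
    return math.sqrt(sum(s * s for s in v) / len(v))


# A tone at 0.45 cycles/sample, halved in sample rate (new Nyquist: 0.25).
tone = [math.sin(2 * math.pi * 0.45 * k) for k in range(400)]
aliased = resample_linear(tone, 0.5)
filtered = resample_sinc(tone, 0.5)
print(rms(aliased[50:150]), rms(filtered[50:150]))
```

With `ratio=0.5`, `resample_linear` folds the 0.45 cycles/sample tone down to an alias at 0.1 cycles per output sample at nearly full level, while `resample_sinc` attenuates it heavily. The price is roughly `2 * half_width / cutoff` multiply-adds per output sample instead of one.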

Post by **quikquak** » Sun Feb 18, 2024 5:14 pm

xipix wrote: ↑Wed Feb 14, 2024 11:33 am
CinningBao wrote: ↑Wed Feb 14, 2024 10:18 am Yeah, this is pretty damn good! It manages to maintain transients right down to a playback speed of 0.08. (with no pitch shifting) Beneath that everything gets smeared, in a nice way!

The glockenspiel demo does reveal a bit of aliasing which I guess can be oversampled-out, maybe?

It also doesn't sound like it's being deconstructed and rebuilt with <complex maths>, so is it working in the time domain or frequency domain? It isn't a lovely PSOLA algo is it? that would be very cool. This might explain why there is no formant shifting/correction. but I could be wrong.
Thank you

Time vs frequency domain: bit of both and more. I'd like to say it's more "sophisticated" than complex.

At which settings do you hear the aliasing and what does it sound like? The web demo uses cheap and ugly resampling by default but there's a near-perfect resampler built in. I'll look at making this default.

I can't blame you for being evasive in your answers, but you make it sound as if you don't know what 'complex maths' means, which can't be true, since you wrote this yourself, right?

Post by **signalsmith** » Mon Feb 19, 2024 9:29 am

CinningBao wrote: ↑Wed Feb 14, 2024 10:18 am ...is it working in the time domain or frequency domain?

The open-source version looks like it resamples each grain/block first (for the pitch change), and then groups the spectrum by local peaks. The peak of each group is phase-vocoder processed (to compensate for the resampling, and also for time-stretch), while the rest is phase-locked to those peaks.
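The structure signalsmith describes (peaks processed by a phase vocoder, all other bins locked to their peak) resembles the technique usually called identity phase locking. Below is a minimal single-frame sketch of that technique in plain Python. Several assumptions here are not confirmed by the thread: region boundaries sit at the lowest-magnitude bin between two peaks, only the time-stretch hop ratio is handled (not the resampling compensation), the FFT itself is omitted, and all names are hypothetical. It is not Bungee's code.

```python
import math


def princarg(phase):
    """Wrap a phase to [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi


def find_peaks(mag):
    """Indices of local maxima in a magnitude spectrum (end bins excluded)."""
    return [k for k in range(1, len(mag) - 1)
            if mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]]


def assign_regions(mag, peaks):
    """Map every bin to the peak whose region contains it. Assumption: the
    boundary between two neighbouring peaks is the lowest bin between them."""
    owner = [peaks[0]] * len(mag)  # bins below the first peak belong to it
    for left, right in zip(peaks, peaks[1:]):
        valley = min(range(left, right + 1), key=lambda k: mag[k])
        for k in range(left, right + 1):
            owner[k] = left if k < valley else right
    for k in range(peaks[-1], len(mag)):
        owner[k] = peaks[-1]
    return owner


def locked_phases(mag, phase, prev_phase, prev_out_phase, n_fft, hop_a, hop_s):
    """One frame of a peak-locked phase vocoder. Peaks get a standard
    phase-vocoder update (instantaneous frequency times synthesis hop); every
    other bin keeps its analysis phase offset relative to its peak."""
    peaks = find_peaks(mag) or [max(range(len(mag)), key=lambda k: mag[k])]
    owner = assign_regions(mag, peaks)
    out = [0.0] * len(mag)
    for p in peaks:
        expected = 2 * math.pi * p * hop_a / n_fft  # advance of an exactly bin-centred tone
        deviation = princarg(phase[p] - prev_phase[p] - expected)
        omega = 2 * math.pi * p / n_fft + deviation / hop_a  # radians per sample
        out[p] = prev_out_phase[p] + omega * hop_s
    for k, p in enumerate(owner):
        if k != p:
            out[k] = out[p] + (phase[k] - phase[p])  # phase locking to the peak
    return out
```

With equal analysis and synthesis hops and `prev_out_phase == prev_phase`, every output phase equals the input phase modulo 2π, which makes a quick sanity check; a synthesis hop larger than the analysis hop slows playback down.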

If so, that's a pretty clean freq-domain algorithm with good results, although (since the peak is rounded to an integer index) I wouldn't be surprised by a bit of pitch-wobbliness in some cases.
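signalsmith's point about integer peak indices can be illustrated with a small, dependency-free sketch. It assumes a Hann-windowed DFT and uses quadratic (parabolic) interpolation on log magnitudes, a common refinement; nothing in the thread says whether Bungee uses it, and the function names are hypothetical. A test tone at 10.3 bins is reported as bin 10 by a plain argmax, while the parabola lands much closer to 10.3.

```python
import cmath
import math


def hann_spectrum_mag(x):
    """Magnitudes of the DFT of x after a periodic Hann window (bins 0..N/2).
    A plain O(N^2) DFT keeps the sketch dependency-free."""
    n = len(x)
    w = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    return [abs(sum(w[i] * x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))
            for k in range(n // 2 + 1)]


def integer_peak(mag):
    """Bin index of the largest magnitude, i.e. the peak rounded to an integer."""
    return max(range(len(mag)), key=lambda k: mag[k])


def parabolic_peak(mag, k):
    """Refine peak bin k by fitting a parabola through the log magnitudes of
    bins k-1, k, k+1; returns a fractional bin index."""
    a, b, c = math.log(mag[k - 1]), math.log(mag[k]), math.log(mag[k + 1])
    return k + 0.5 * (a - c) / (a - 2 * b + c)


N = 64
tone = [math.sin(2 * math.pi * 10.3 * i / N) for i in range(N)]  # 10.3 bins
mag = hann_spectrum_mag(tone)
k = integer_peak(mag)        # 10: off by 0.3 bin
f = parabolic_peak(mag, k)   # fractional estimate close to 10.3
print(k, f)
```

Whether this rounding causes the wobble quikquak heard is signalsmith's conjecture; the sketch only shows the size of the error rounding can introduce, which is up to half a bin.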

There might be fancier time-domain stuff in the Pro version.

New "bungee" audio time stretcher - what do you think?