New "bungee" audio time stretcher - what do you think?

DSP, Plugin and Host development discussion.

Post

signalsmith wrote: Mon Feb 19, 2024 9:29 am
CinningBao wrote: Wed Feb 14, 2024 10:18 am is it working in the time domain or frequency domain?
The open-source version looks like it resamples each grain/block first (for the pitch change), and then groups the spectrum by local peaks. The peak of each group is phase-vocoder processed (to compensate for the resampling, and also for time-stretch), while the rest is phase-locked to those peaks.

If so, that's a pretty clean freq-domain algorithm with good results, although (since the peak is rounded to an integer index) I wouldn't be surprised by a bit of pitch-wobbliness in some cases.

There might be fancier time-domain stuff in the Pro version. 🤷
Yeah it does sound pretty good. Thanks for that deconstruction, I hope you haven't revealed any industry secrets! ;)

I wonder how the OP is able to monetise this. Or how it might be usable in a plugin. Is there much of a buffer? Other than selling the code to a DAW manufacturer, how do people go about taking advantage of their newly developed pitch/time manipulation code?

Post

xipix wrote: Wed Feb 14, 2024 9:25 am Nice work. Love the presentation. It should be easy to add your code on my side: your code is published, right?
Yeah - I linked the main project page above, but there's a GitHub mirror as well. Let me know (here/email/whatever) if you have any trouble getting it working.

I've just added an example-cmd branch which (on Mac at least!) works with 16-bit WAVs from the command-line, in case that's useful.
Last edited by signalsmith on Mon Feb 19, 2024 11:03 am, edited 2 times in total.

Post

CinningBao wrote: Mon Feb 19, 2024 10:14 am Yeah it does sound pretty good. Thanks for that deconstruction, I hope you haven't revealed any industry secrets! ;)
The basic building-blocks aren't exactly top-secret (plus there are loads of options, although I packed as many as I could into a 45-minute presentation).

(Specifically in this case: 9:39 for resampling before block-based processing when adapting a time-stretch for pitch-shifting, 16:09 for segmenting based on spectral peaks and phase-rotating those segments together, 20:05 for using that segmentation to make a time-stretcher even if you're not moving segments up/down the spectrum.)

But I wouldn't underestimate the importance of combining them well, or the implementation/tuning details which really make a difference.
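
To make the "segment by spectral peaks, then phase-rotate each segment together" idea concrete, here is a minimal sketch. It is one plausible reading of the description in this thread, not Bungee's or Signalsmith's actual code: the function names (`find_peaks`, `assign_to_peaks`, `phase_lock`), the rule of splitting groups at the quietest bin between two peaks, and the `peak_rotation` dictionary (which a real phase vocoder would compute from its frequency estimates) are all assumptions made for illustration.

```python
import cmath
import math

def find_peaks(mags):
    """Indices of local maxima in a list of bin magnitudes."""
    return [k for k in range(1, len(mags) - 1)
            if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]

def assign_to_peaks(mags, peaks):
    """Give every bin an owning peak, splitting groups at the quietest
    bin between neighbouring peaks (an assumed rule)."""
    owner = [None] * len(mags)
    bounds = [0]
    for a, b in zip(peaks, peaks[1:]):
        bounds.append(min(range(a + 1, b), key=lambda k: mags[k]))
    bounds.append(len(mags))
    for peak, lo, hi in zip(peaks, bounds, bounds[1:]):
        for k in range(lo, hi):
            owner[k] = peak
    return owner

def phase_lock(spectrum, peak_rotation):
    """Rotate every bin by the rotation chosen for its owning peak, so the
    phase relationships inside each group are preserved."""
    mags = [abs(c) for c in spectrum]
    owner = assign_to_peaks(mags, find_peaks(mags))
    return [c if p is None else c * cmath.exp(1j * peak_rotation[p])
            for c, p in zip(spectrum, owner)]

# Toy spectrum with two peaks (bins 2 and 6); rotations are made-up values.
spectrum = [complex(m) for m in [0, 1, 3, 1, 0.5, 2, 5, 2, 0]]
mags = [abs(c) for c in spectrum]
owner = assign_to_peaks(mags, find_peaks(mags))
locked = phase_lock(spectrum, {2: math.pi / 2, 6: math.pi})
print(owner)
print([round(cmath.phase(c), 3) for c in locked[1:4]])
```

Only the peak bins need a full phase-vocoder update in this scheme; every other bin simply inherits the rotation of the peak it belongs to.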

Post

signalsmith wrote: Mon Feb 19, 2024 9:29 am The open-source version looks like it resamples each grain/block first (for the pitch change), and then groups the spectrum by local peaks. The peak of each group is phase-vocoder processed (to compensate for the resampling, and also for time-stretch), while the rest is phase-locked to those peaks.

If so, that's a pretty clean freq-domain algorithm with good results, although (since the peak is rounded to an integer index) I wouldn't be surprised by a bit of pitch-wobbliness in some cases.
The open source version is a pretty decent phase vocoder implementation. Yes, peaks are detected at integer bin indices but the algorithm extrapolates phase temporally so pitch should be spot on.
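
As a rough illustration of why an integer peak index need not limit pitch accuracy, the sketch below estimates a tone's frequency from the phase advance of a single bin between two overlapping frames. This is the textbook phase-vocoder estimate, not code taken from Bungee; the frame size, hop, sample rate and test tone are arbitrary assumptions, and `estimate_frequency` is a name made up for this example.

```python
import cmath
import math

def hann(n_fft):
    """Periodic Hann window of length n_fft."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]

def dft_bin(frame, window, k):
    """Bin k of the DFT of a windowed frame."""
    n_fft = len(frame)
    return sum(x * w * cmath.exp(-2j * math.pi * k * n / n_fft)
               for n, (x, w) in enumerate(zip(frame, window)))

def wrap_phase(phi):
    """Wrap an angle to the range [-pi, pi)."""
    return (phi + math.pi) % (2 * math.pi) - math.pi

def estimate_frequency(signal, k, n_fft, hop, sample_rate):
    """Frequency in Hz from the phase advance of bin k over one hop."""
    window = hann(n_fft)
    first = dft_bin(signal[:n_fft], window, k)
    second = dft_bin(signal[hop:hop + n_fft], window, k)
    # Phase advance a tone exactly at the bin centre would show over one hop.
    expected = 2 * math.pi * k * hop / n_fft
    # The wrapped difference from that is the offset from the bin centre.
    deviation = wrap_phase(cmath.phase(second) - cmath.phase(first) - expected)
    return (expected + deviation) * sample_rate / (2 * math.pi * hop)

sample_rate, n_fft, hop = 48000, 1024, 256
tone_hz = 1000.0
signal = [math.cos(2 * math.pi * tone_hz * n / sample_rate)
          for n in range(n_fft + hop)]
peak_bin = round(tone_hz * n_fft / sample_rate)  # integer index, centre 984.375 Hz
estimate = estimate_frequency(signal, peak_bin, n_fft, hop, sample_rate)
print(peak_bin, estimate)
```

The bin centre is more than 15 Hz away from the tone, yet the estimate comes from the measured phase change over time rather than from the bin index, which is the point being made in the post above.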

Post

signalsmith wrote: Mon Feb 19, 2024 10:48 am
CinningBao wrote: Mon Feb 19, 2024 10:14 am Yeah it does sound pretty good. Thanks for that deconstruction, I hope you haven't revealed any industry secrets! ;)
The basic building-blocks aren't exactly top-secret (plus there are loads of options, although I packed as many as I could into a 45-minute presentation).

(Specifically in this case: 9:39 for resampling before block-based processing when adapting a time-stretch for pitch-shifting, 16:09 for segmenting based on spectral peaks and phase-rotating those segments together, 20:05 for using that segmentation to make a time-stretcher even if you're not moving segments up/down the spectrum.)

But I wouldn't underestimate the importance of combining them well, or the implementation/tuning details which really make a difference.
Well you, sir, seem to know what the hell you're talking about :) I just watched that whole vid, at 9:30 am, that's a way to start the day..

I now understand a little more (to a degree) about the tools available to a dev and the different ways to approach the problems encountered when manipulating time & pitch. And nice dancing, the music deserved it :)

Thanks for the breakdown, it was very interesting. I'll probably end up going through some more.. I'm sure they'll help me on my incredibly slow path to releasing some audio manipulation gubbins. :)

Post

xipix wrote: Mon Feb 19, 2024 11:45 pm Yes, peaks are detected at integer bin indices but the algorithm extrapolates phase temporally so pitch should be spot on.
This is true for simpler sounds.

However, in the case where there's a lot going on, the phase measurements for a particular harmonic will be affected by other nearby stuff. This isn't a big issue, but it means the frequency estimates will be slightly different depending which bin you use.

If it then gets into a situation where the selected peak (for a pure tone) flutters between two bin indices, then that means the frequency will also flutter - and we're more sensitive to pitch changes than absolute pitch.

That's a guess, though, and it would only happen in specific situations even if that's the case.

Post

signalsmith wrote: Tue Feb 20, 2024 12:23 pm This is true for simpler sounds.

However, in the case where there's a lot going on, the phase measurements for a particular harmonic will be affected by other nearby stuff. This isn't a big issue, but it means the frequency estimates will be slightly different depending which bin you use.
The amount of vertical interference between bins depends a lot on the analysis window you use. The Bungee open source project uses a Hann window which, for its simplicity, is very well behaved as an analysis window in a phase vocoder. Nice analysis window means tidy spectrum. Then, if something does manage to mess with the phase of a tone or harmonic, the chances are that the tone, and its frequency, aren't subjectively important.
signalsmith wrote: Tue Feb 20, 2024 12:23 pm If it then gets into a situation where the selected peak (for a pure tone) flutters between two bin indices, then that means the frequency will also flutter - and we're more sensitive to pitch changes than absolute pitch.
Bin hopping of the peak and phase measurement point doesn't seem to matter much. Frequency measurement and synthesis are based mainly upon measured horizontal phase change. The actual peak bin index used for phase measurement (plus or minus a few) doesn't change this much, certainly not for tones where you'd notice vibrato artefacts. Again, having a nicely behaved window helps.
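
Both sides of this exchange can be tried on a toy signal. The sketch below is an illustration under assumed parameters (Hann window, 1024-point frames, hop of 256, 48 kHz), not a measurement of Bungee itself. It compares the phase-advance frequency estimate taken at three adjacent bins, first for an isolated 1000 Hz tone and then with a deliberately harsh equal-level neighbour at 1100 Hz, roughly two bins away. The helper names are hypothetical.

```python
import cmath
import math

def estimate_frequency(signal, k, n_fft, hop, sample_rate):
    """Phase-advance frequency estimate (Hz) measured at bin k, Hann window."""
    def bin_value(start):
        return sum(signal[start + n]
                   * (0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft))
                   * cmath.exp(-2j * math.pi * k * n / n_fft)
                   for n in range(n_fft))
    expected = 2 * math.pi * k * hop / n_fft
    measured = cmath.phase(bin_value(hop)) - cmath.phase(bin_value(0))
    deviation = (measured - expected + math.pi) % (2 * math.pi) - math.pi
    return (expected + deviation) * sample_rate / (2 * math.pi * hop)

def spread_across_bins(signal, bins, n_fft, hop, sample_rate):
    """Largest disagreement (Hz) between estimates taken at the given bins."""
    estimates = [estimate_frequency(signal, k, n_fft, hop, sample_rate)
                 for k in bins]
    return max(estimates) - min(estimates), estimates

sample_rate, n_fft, hop = 48000, 1024, 256
length = n_fft + hop
pure = [math.cos(2 * math.pi * 1000.0 * n / sample_rate) for n in range(length)]
crowded = [x + math.cos(2 * math.pi * 1100.0 * n / sample_rate)
           for n, x in enumerate(pure)]

bins = [20, 21, 22]  # the 1000 Hz tone sits between bins 21 and 22
pure_spread, pure_estimates = spread_across_bins(pure, bins, n_fft, hop, sample_rate)
crowded_spread, crowded_estimates = spread_across_bins(crowded, bins, n_fft, hop, sample_rate)
print("isolated tone:", pure_estimates, pure_spread)
print("with neighbour:", crowded_estimates, crowded_spread)
```

For the isolated tone the three bins should agree almost exactly, so hopping between them changes little. With the close neighbour the estimates become bin-dependent, which is the situation where a fluttering peak index could in principle turn into a fluttering frequency. How often real material gets that crowded, and whether the affected partials are audible, is the open question in the posts above.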

Post

signalsmith wrote: Mon Feb 19, 2024 10:27 am I've just added an example-cmd branch which (on Mac at least!) works with 16-bit WAVs from the command-line, in case that's useful.
Thanks, had to tweak a couple of things for it to work with my 16-bit WAV and I dropped the results into a special build of my comparison that is here for now: https://bungee.parabolaresearch.com/ss/compare

This page is getting cluttered now so most likely what I'll do is add checkboxes to show only the subset of algorithms that the user wants to compare.

Post

xipix wrote: Tue Feb 20, 2024 5:46 pm Thanks, had to tweak a couple of things for it to work with my 16-bit WAV and I dropped the results into a special build of my comparison that is here for now: https://bungee.parabolaresearch.com/ss/compare
That's awesome - thanks so much for putting that in and doing the analysis. :D

Geraint

Post

xipix wrote: Tue Feb 20, 2024 5:46 pm
Thanks, had to tweak a couple of things for it to work with my 16-bit WAV and I dropped the results into a special build of my comparison that is here for now: https://bungee.parabolaresearch.com/ss/compare

This page is getting cluttered now so most likely what I'll do is add checkboxes to show only the subset of algorithms that the user wants to compare.
Have you changed the link? I'm interested in hearing the comparisons between Bungee and Signalsmith Stretch but that link doesn't work anymore (and the normal page doesn't include Signalsmith).

Post

Binyamin wrote: Mon Feb 26, 2024 9:10 pm Have you changed the link? I'm interested in hearing the comparisons between Bungee and Signalsmith Stretch but that link doesn't work anymore (and the normal page doesn't include Signalsmith).
This is fixed now, you can see a comparison of all stretchers, including Signalsmith's library, here: https://bungee.parabolaresearch.com/compare

Post

signalsmith wrote: Mon Feb 19, 2024 9:29 am The open-source version looks like it resamples each grain/block first (for the pitch change), and then groups the spectrum by local peaks.
Where did you find the open source version of Bungee to make this comment? There is no link to it anywhere on the Bungee webpage nor in GitHub that I can find.
xipix wrote: Mon Feb 12, 2024 9:13 am I made two versions, one simple, open-source algorithm ...
Did you remove the link to the open source version? Would like to try it out!

Post

Fender19 wrote: Tue Mar 05, 2024 5:10 pm Where did you find the open source version of Bungee to make this comment? There is no link to it anywhere on the Bungee webpage nor in GitHub that I can find.
There's a GitHub icon in the top-right of the site. The resampling setup is here, and slightly below it the code switches between resampling the input and resampling the output.
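
For readers following the resampling part of this discussion, here is a minimal sketch of how resampling a grain changes its pitch and how much time-stretch is then needed to compensate. It is not taken from the Bungee source: it uses plain linear interpolation with no anti-aliasing filter (which a real resampler would need), and the names `resample_linear` and `stretch_after_resampling`, as well as the convention that `time_ratio` means output duration divided by input duration, are assumptions for this example.

```python
import math

def resample_linear(grain, pitch_ratio):
    """Read `grain` at `pitch_ratio` input samples per output sample,
    using linear interpolation (no anti-aliasing filter)."""
    n_out = int((len(grain) - 1) / pitch_ratio) + 1
    out = []
    for n in range(n_out):
        pos = n * pitch_ratio
        i = int(pos)
        frac = pos - i
        nxt = grain[min(i + 1, len(grain) - 1)]
        out.append(grain[i] * (1.0 - frac) + nxt * frac)
    return out

def stretch_after_resampling(pitch_ratio, time_ratio):
    """Resampling scales the duration by 1 / pitch_ratio, so the
    time-stretcher has to supply pitch_ratio * time_ratio."""
    return pitch_ratio * time_ratio

# Eight cycles of a sine in a 1024-sample grain, shifted up one octave.
grain = [math.sin(2 * math.pi * 8 * n / 1024) for n in range(1024)]
octave_up = resample_linear(grain, 2.0)
print(len(grain), len(octave_up))
print(stretch_after_resampling(2.0, 1.0))
```

The resampled grain holds the same eight cycles in half as many samples, so it sounds an octave higher at the same playback rate, and the block-based stage then has to stretch by 2x to restore the original duration.
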
xipix wrote: Sun Mar 03, 2024 9:16 am This is fixed now, you can see a comparison of all stretchers, including Signalsmith's library, here: https://bungee.parabolaresearch.com/compare
Thanks again for this. 😄

I was wondering why you tested some quite strong time-stretch ratios (0.25-1.5x) but only modest pitch-shifts (±1 semitone, which is like 1.06x). You need ±6 semitones to retune something to an arbitrary key.
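
For reference, semitone shifts convert to frequency ratios as 2^(n/12) in equal temperament. The short sketch below (the function name is made up for this example) checks the figures quoted above. With 12 semitones per octave, any target key is at most 6 semitones away in one direction or the other.

```python
def semitones_to_ratio(semitones):
    """Equal-tempered frequency ratio for a shift of `semitones`."""
    return 2.0 ** (semitones / 12.0)

for shift in (-6, -1, 1, 6):
    print(shift, round(semitones_to_ratio(shift), 4))
```

So ±1 semitone is a change of about 6% in frequency, while ±6 semitones is a factor of about 1.41 up or 0.71 down.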

I'm definitely biased here (because Signalsmith Stretch does its pitch- and time-stretch separately, it's not done with resampling), and I absolutely appreciate that you've put loads of work into the comparisons, as well as the library itself! Plus, people can always listen to the web-based version. So I was mostly curious, and thought I'd ask if there's some reasoning behind it.

Post

signalsmith wrote: Wed Mar 06, 2024 12:12 am I was wondering why you tested some quite strong time-stretch ratios (0.25-1.5x) but only modest pitch-shifts (±1 semitone, which is like 1.06x). You need ±6 semitones to retune something to an arbitrary key.
No particular reason, nothing to hide. :) Bungee handles pitch shifts well, up to a couple of octaves and probably beyond, though I never tested that. That said, however you do it, most content sounds quite unnatural when shifted beyond a couple of semitones and since this is a listening test, I picked +1 and -1. For transposition to an arbitrary key, yes you may need a shift of up to 6 semitones, but at those extremes, users don't expect miracles.

It's more likely that someone wants to bring some music into an easier key for a particular instrument or ensemble, or to nudge an accompaniment to match a vocalist's range. In these cases, we're more likely to be shifting a couple of semitones at most.

Yes, the link to the open source implementation is this: https://github.com/kupix/bungee
