## Need Help with Time-Stretching WSOLA Algorithm

DSP, Plug-in and Host development discussion.
Tekker2
KVRer
3 posts since 14 Sep, 2019
Hi all, I’m having problems implementing time-stretching with the Waveform Similarity Overlap-Add (WSOLA) algorithm. I got the basic OLA algorithm working first and then added waveform alignment using cross-correlation, but the alignment part doesn’t appear to be doing anything at all: the result is identical to plain OLA.

I recorded the audio from my application; here are the waveforms. The first is the original sine wave, the second is the OLA version, and the last is the WSOLA version. I did a null test on the OLA and WSOLA versions and they null completely, so they are producing identical results.
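As an aside, a null test like this can also be checked numerically rather than visually. A tiny helper along these lines (the class and method names are mine) returns the peak residual, which is exactly 0 when the two buffers null completely:

```java
public class NullTest {
    /** Peak absolute sample difference between two recordings;
     *  0 means the two buffers null completely. */
    public static float peakResidual(float[] a, float[] b) {
        final int n = Math.min(a.length, b.length);
        float peak = 0f;
        for (int i = 0; i < n; i++) {
            final float d = Math.abs(a[i] - b[i]);
            if (d > peak) peak = d;
        }
        return peak;
    }
}
```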
Waveforms2.png

I’m working from the pseudocode in Jonathan Driedger’s thesis, Time-Scale Modification Algorithms for Music Audio Signals.
Algorithm.png

The full explanation is in Chapter 4.
https://www.audiolabs-erlangen.de/content/05-fau/professor/00-mueller/01-students/2011_DriedgerJonathan_TSM_MasterThesis.pdf

I’ve gone over the algorithm trying to find any mistakes I may have made, and I’ve tried a few tweaks to my code, but nothing has worked. I’m new to DSP programming, so I’m at a loss as to how to fix it. Any help getting this working correctly would be greatly appreciated. Thanks.

Here is my code (Java):


```java
import java.util.Arrays;

public class TimePitch {
    // CONSTANTS                                                    // EQUATION SYMBOLS
    private final int           windowSizeMS = 37;
    private final int           windowSizeSamples;                  // N
    private final float[]       window;                             // w
    private final float         overlapFactor = 0.5f;               // o
    private final int           windowOffset;                       // ηow
    private final int           offsetMax;                          // ∆max
    private final float         offsetMaxFactor = 0.5f;
    // VARIABLES
    private int                 outputBufferSize;                   // length(y)
    private int                 numberOfWindows;                    // length(γ) AND length(σ)
    private int[]               outputWindowPositions;              // γ
    private int[]               inputWindowPositions;               // σ

    // TIME STRETCH VALUE
    private float               timeStretch = 2.73f;                // τ

    public TimePitch() {
        windowSizeSamples = (int) (windowSizeMS * 44.1);    // 44,100 / 1,000
        windowOffset = (int) ((1 - overlapFactor) * windowSizeSamples);
        offsetMax = (int) (windowSizeSamples * offsetMaxFactor);

        // Fill Hanning window
        window = new float[windowSizeSamples];
        for (int i = 0, n = window.length; i < n; i++) {
            window[i] = (float) (0.5 * (1.0 - Math.cos(2.0 * Math.PI * i / n)));
        }
    }

    public float[] processBuffer(final float[] bufferIn) {
        // IF no time stretch THEN return original input
        if (timeStretch == 1) return bufferIn;

        outputBufferSize = (int) (bufferIn.length * timeStretch);
        final float[] bufferOut = new float[outputBufferSize];

        numberOfWindows = outputBufferSize / windowOffset;
        outputWindowPositions = new int[numberOfWindows];
        inputWindowPositions = new int[numberOfWindows];

        // COMPUTE array of output window positions (γ)
        outputWindowPositions[0] = 0;
        for (int i = 1, n = numberOfWindows; i < n; i++) {
            outputWindowPositions[i] = outputWindowPositions[i - 1] + windowOffset;
        }
        // COMPUTE array of input window positions (σ)
        for (int i = 0, n = numberOfWindows; i < n; i++) {
            inputWindowPositions[i] = (int) (outputWindowPositions[i] / timeStretch);
        }

        //************************************************
        // (OLA) Overlap and Add
        //************************************************
//        for (int i = 0, n = numberOfWindows; i < n; i++) {
//            // windowed audio segment from input
//            final float[] windowedInput = new float[windowSizeSamples];
//            for (int j = 0, m = windowSizeSamples; j < m; j++) {
//                final int index = inputWindowPositions[i] + j;
//                if (index < bufferIn.length) windowedInput[j] = bufferIn[index] * window[j];
//            }
//            // output
//            for (int j = 0, m = windowSizeSamples; j < m; j++) {
//                final int index = outputWindowPositions[i] + j;
//                if (index < bufferOut.length) bufferOut[index] = bufferOut[index] + windowedInput[j];
//            }
//        }
        //************************************************
        // (WSOLA) Cross-Correlation AND Overlap and Add
        //************************************************
        final double[] frameAdj = new double[numberOfWindows];
        final double[] frameNext = new double[numberOfWindows + offsetMax * 2];
        final int[] offsets = new int[numberOfWindows];
        Arrays.fill(offsets, 0);
        for (int i = 1, n = numberOfWindows; i < n; i++) {
            // OVERLAP AND ADD
            // windowed audio segment from input
            final float[] windowedInput = new float[windowSizeSamples];
            for (int j = 0, m = windowSizeSamples; j < m; j++) {
                final int index = inputWindowPositions[i - 1] + offsets[i - 1] + j;
                if (0 < index && index < bufferIn.length) windowedInput[j] = bufferIn[index] * window[j];
            }
            // output
            for (int j = 0, m = windowSizeSamples; j < m; j++) {
                final int index = outputWindowPositions[i - 1] + j;
                if (0 < index && index < bufferOut.length) bufferOut[index] = bufferOut[index] + windowedInput[j];
            }

            // FIND NEXT OFFSET
            // clear input frames
            Arrays.fill(frameNext, 0);
            // fill adjacent frame for cross-correlation
            for (int j = 0, m = windowSizeSamples; j < m; j++) {
                final int index = inputWindowPositions[i - 1] + offsets[i - 1] + windowOffset + j;
                if (0 < index && index < frameAdj.length) frameAdj[j] = inputWindowPositions[index];
            }
            // fill next frame for cross-correlation
            for (int j = 0, m = windowSizeSamples + offsetMax * 2; j < m; j++) {
                final int index = inputWindowPositions[i] - offsetMax + j;
                if (0 < index && index < inputWindowPositions.length) frameNext[j] = inputWindowPositions[index];
            }
            offsets[i] = (int) xcorrMax(frameAdj, frameNext, offsetMax);
        }
        //************************************************

        return bufferOut;
    }

    // CROSS-CORRELATION
    // maximum from cross-correlation
    private double xcorrMax(double[] a, double[] b, int maxlag) {
        final double[] xcorr = xcorr(a, b, maxlag);
        double max = 0;
        for (int i = 0, n = xcorr.length; i < n; i++) {
            max = (xcorr[i] > max) ? xcorr[i] : max;
        }
        return max;
    }

    // cross-correlation
    private double[] xcorr(double[] a, double[] b, int maxlag) {
        final double[] y = new double[2 * maxlag + 1];
        Arrays.fill(y, 0);

        for (int lag = b.length - 1, idx = maxlag - b.length + 1; lag > -a.length; lag--, idx++) {
            if (idx < 0) continue;
            if (idx >= y.length) break;

            // where do the two signals overlap?
            int start = 0;
            // we can't start past the left end of b
            if (lag < 0) start = -lag;

            int end = a.length - 1;
            // we can't go past the right end of b
            if (end > b.length - lag - 1) end = b.length - lag - 1;

            for (int n = start; n <= end; n++) {
                y[idx] += a[n] * b[lag + n];
            }
        }

        return y;
    }
}
```
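Two things in the listing above look worth double-checking against Driedger's pseudocode. First, `xcorrMax(...)` returns the peak correlation *value*, whereas the offset ∆ should be the *lag* at which the peak occurs. Second, `frameAdj` and `frameNext` are filled from `inputWindowPositions` (an array of positions) rather than from the audio in `bufferIn`, and are sized from `numberOfWindows` rather than `windowSizeSamples`, so they end up mostly zero; every offset then comes out 0, which would make WSOLA collapse to plain OLA exactly as observed. A standalone sketch of the lag search (class name, method name, and layout are mine, not from the thesis):

```java
public class OffsetSearch {
    /**
     * Returns the lag in [-maxLag, maxLag] at which 'frame' best matches
     * 'region'. 'region' is assumed to start maxLag samples before the
     * nominal window position, so region[maxLag + j] lines up with frame[j]
     * at lag 0. Note: the return value is the lag of the peak, not the peak
     * correlation value itself.
     */
    public static int bestLag(double[] frame, double[] region, int maxLag) {
        int best = 0;
        double bestCorr = Double.NEGATIVE_INFINITY;
        for (int lag = -maxLag; lag <= maxLag; lag++) {
            double corr = 0;
            for (int j = 0; j < frame.length; j++) {
                final int idx = maxLag + lag + j;
                if (idx >= 0 && idx < region.length) corr += frame[j] * region[idx];
            }
            if (corr > bestCorr) {
                bestCorr = corr;
                best = lag;  // remember WHERE the peak is, not how big it is
            }
        }
        return best;
    }
}
```

Here `region` would be the stretch of input samples starting `maxLag` samples before the nominal window position, so the returned lag can be added directly to the input window position before the overlap-add.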

juha_p
KVRian
554 posts since 21 Feb, 2006 from FI
If you have Matlab at hand, there's this implementation you could examine.

Tekker2
KVRer
3 posts since 14 Sep, 2019
Hi juha_p, thanks for the reply. Unfortunately, I don't have Matlab and I've never used it. My program is written in Java, and I'm not experienced enough in DSP yet to reverse-engineer their Matlab code and port it to my Java app. I don't know how commonly Java is used for DSP, but if you know of a C/C++ implementation, that would also work; C/C++ is similar enough to Java that I should be able to adapt it for my program. Thanks again.

juha_p
KVRian
554 posts since 21 Feb, 2006 from FI
You could also examine this implementation in the TarsosDSP library. Direct JAR download link.

Google might turn up a few resources as well.

Tekker2
KVRer
3 posts since 14 Sep, 2019
Thanks again for the reply juha_p. I tested their program with a pure sine wave and it extended it perfectly. I also tried a single piano note but that sounded fairly glitchy so I’ll have to play with their parameters a bit more to see if I can clean it up.

Their time-stretch code in the WaveformSimilarityBasedOverlapAdd file isn't self-contained within that one file; it's interconnected with other parts of their program. I tried stripping their code down to see if I could make it work in my program but ran into one major problem (so far): from what I understand, their variable "tempo" (which sets the amount of time stretch) manipulates their audio player instead of processing the raw data.

In my program I need to be able to process individual notes separately (it's kind of like a music sequencer), so if possible I would like their processFull(...) method to take the input data as a float[], process it, and return the processed data as a float[] without interacting with the rest of the application (aside from GUI controls). But I'm not sure how to untangle their WSOLA code from their other files to make the processing self-contained within one file.

It looks like all of the relevant Google results lead back to that same TarsosDSP program. I also searched for C++ examples but did not find anything helpful.
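For reference, the self-contained float[]-in/float[]-out shape described above can be sketched without any player or dispatcher. This is not TarsosDSP code; the class name, window size, hop, and search range are arbitrary choices of mine, following the same OLA-plus-alignment structure discussed earlier in the thread:

```java
/**
 * Minimal self-contained WSOLA time-stretch sketch: takes a float[] of
 * samples and a stretch factor, returns a new float[]. Window size, hop,
 * and search range are illustrative values, not tuned recommendations.
 */
public class SimpleWsola {

    public static float[] stretch(float[] in, float factor) {
        if (factor == 1f) return in.clone();
        final int N = 1024;          // analysis/synthesis window length
        final int hop = N / 2;       // synthesis hop (50% overlap)
        final int maxLag = 256;      // alignment search range (∆max)

        // periodic Hann window: overlapping copies at hop N/2 sum to 1
        final float[] w = new float[N];
        for (int i = 0; i < N; i++) {
            w[i] = 0.5f * (1f - (float) Math.cos(2.0 * Math.PI * i / N));
        }

        final float[] out = new float[(int) (in.length * factor)];
        final int frames = out.length / hop;
        int prevPos = 0, prevLag = 0;   // previous analysis position and chosen lag
        for (int k = 0; k < frames; k++) {
            final int outPos = k * hop;
            final int nominal = (int) (outPos / factor);  // nominal analysis position
            int lag = 0;
            if (k > 0) {
                // align against the natural continuation of the previous frame
                lag = align(in, prevPos + prevLag + hop, nominal, N, maxLag);
            }
            // overlap-add the aligned, windowed frame
            for (int j = 0; j < N; j++) {
                final int src = nominal + lag + j;
                final int dst = outPos + j;
                if (src >= 0 && src < in.length && dst < out.length) {
                    out[dst] += in[src] * w[j];
                }
            }
            prevPos = nominal;
            prevLag = lag;
        }
        return out;
    }

    /** Lag in [-maxLag, maxLag] that best aligns in[nominal+lag ..] with
     *  the natural continuation starting at contPos. */
    private static int align(float[] in, int contPos, int nominal, int N, int maxLag) {
        int best = 0;
        double bestCorr = Double.NEGATIVE_INFINITY;
        for (int lag = -maxLag; lag <= maxLag; lag++) {
            double corr = 0;
            for (int j = 0; j < N; j++) {
                final int a = contPos + j;
                final int b = nominal + lag + j;
                if (a >= 0 && a < in.length && b >= 0 && b < in.length) {
                    corr += (double) in[a] * in[b];
                }
            }
            if (corr > bestCorr) {
                bestCorr = corr;
                best = lag;
            }
        }
        return best;
    }
}
```

A call like `SimpleWsola.stretch(noteSamples, 2f)` returns a buffer roughly twice as long, with no dependence on the rest of an application, so each note of a sequencer could be processed independently.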