Need Help with Time-Stretching WSOLA Algorithm

DSP, Plug-in and Host development discussion.
Tekker2
KVRer
3 posts since 14 Sep, 2019
Hi all, I’m having problems implementing time-stretching using the Waveform Similarity Overlap-Add (WSOLA) algorithm. I got the basic OLA algorithm working first and then added wave alignment using cross-correlation, but the alignment part doesn’t appear to be working at all, and the result is identical to OLA.
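Just to check my understanding of what the alignment step is supposed to do: instead of always reading the input at exactly the nominal positions σ(i), WSOLA may shift each read position by up to ±∆max so the new frame lines up with the signal that would naturally continue the previous output frame. Here's a toy sketch of that idea (all names here are mine, not from the thesis):

```java
public class WsolaIdea {
    // Toy illustration: find the shift d in [-maxLag, +maxLag] that maximizes
    // the cross-correlation between 'ref' (the natural continuation of the
    // previous frame) and the input read starting at 'pos + d'.
    static int bestShift(float[] input, int pos, float[] ref, int maxLag) {
        int best = 0;
        double bestCorr = Double.NEGATIVE_INFINITY;
        for (int d = -maxLag; d <= maxLag; d++) {
            double corr = 0;
            for (int j = 0; j < ref.length; j++) {
                final int idx = pos + d + j;
                if (idx >= 0 && idx < input.length) corr += ref[j] * input[idx];
            }
            if (corr > bestCorr) { bestCorr = corr; best = d; }
        }
        return best;
    }

    public static void main(String[] args) {
        // A sine with a 64-sample period; deliberately read 7 samples too
        // late and check that the search recovers the alignment.
        float[] x = new float[512];
        for (int i = 0; i < x.length; i++) x[i] = (float) Math.sin(2 * Math.PI * i / 64.0);
        float[] ref = new float[64];
        System.arraycopy(x, 100, ref, 0, 64);   // the segment we want to line up with
        int shift = bestShift(x, 107, ref, 16); // reading 7 samples too late
        System.out.println(shift);              // should print -7
    }
}
```

If the offsets all come out as 0, the result degenerates to plain OLA, which is what I seem to be getting.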

I recorded the audio from my application and here are the waveforms. The first is the original sine wave, the second is the OLA version, and the last is the WSOLA version. I did a null test on the OLA and WSOLA versions and they null completely so they are producing identical results.
Waveforms2.png

I’m working from the pseudocode in Jonathan Driedger’s master’s thesis, Time-Scale Modification Algorithms for Music Audio Signals.
Algorithm.png

The full explanation is in Chapter 4.
https://www.audiolabs-erlangen.de/content/05-fau/professor/00-mueller/01-students/2011_DriedgerJonathan_TSM_MasterThesis.pdf

I’ve gone over the algorithm trying to find any mistakes I may have made, and I’ve tried a few tweaks to my code, but nothing has worked. I’m new to DSP programming, so I’m at a loss as to how to fix it. Any help getting this working correctly would be greatly appreciated. Thanks.

Here is my code (Java):

``````
import java.util.Arrays;

public class TimePitch {
    // CONSTANTS                                         // EQUATION SYMBOLS
    private final int     windowSizeMS = 37;
    private final int     windowSizeSamples;             // N
    private final float[] window;                        // w
    private final float   overlapFactor = 0.5f;          // o
    private final int     windowOffset;                  // ηow
    private final int     offsetMax;                     // ∆max
    private final float   offsetMaxFactor = 0.5f;
    // VARIABLES
    private int   outputBufferSize;                      // length(y)
    private int   numberOfWindows;                       // length(γ) AND length(σ)
    private int[] outputWindowPositions;                 // γ
    private int[] inputWindowPositions;                  // σ

    // TIME STRETCH VALUE
    private float timeStretch = 2.73f;                   // τ

    public TimePitch(){
        windowSizeSamples = (int)(windowSizeMS * 44.1);  // 44,100 / 1,000
        windowOffset = (int)((1 - overlapFactor) * windowSizeSamples);
        offsetMax = (int)(windowSizeSamples * offsetMaxFactor);

        // Fill Hanning window
        window = new float[windowSizeSamples];
        for (int i = 0, n = window.length; i < n; i++){
            window[i] = (float)(0.5 * (1.0 - Math.cos(2.0 * Math.PI * i / n)));
        }
    }

    public float[] processBuffer(final float[] bufferIn){
        // IF no time stretch THEN return original input
        if (timeStretch == 1) return bufferIn;

        outputBufferSize = (int)(bufferIn.length * timeStretch);
        final float[] bufferOut = new float[outputBufferSize];

        numberOfWindows = outputBufferSize / windowOffset;
        outputWindowPositions = new int[numberOfWindows];
        inputWindowPositions = new int[numberOfWindows];

        // COMPUTE array of output window positions (γ)
        outputWindowPositions[0] = 0;
        for (int i = 1, n = numberOfWindows; i < n; i++){
            outputWindowPositions[i] = outputWindowPositions[i-1] + windowOffset;
        }
        // COMPUTE array of input window positions (σ)
        for (int i = 0, n = numberOfWindows; i < n; i++){
            inputWindowPositions[i] = (int)(outputWindowPositions[i] / timeStretch);
        }

        //************************************************
        // (OLA) Overlap and Add -- kept for reference
        //************************************************
//        for (int i = 0, n = numberOfWindows; i < n; i++) {
//            // windowed audio segment from input
//            final float[] windowedInput = new float[windowSizeSamples];
//            for (int j = 0, m = windowSizeSamples; j < m; j++) {
//                final int index = inputWindowPositions[i] + j;
//                if (index < bufferIn.length) windowedInput[j] = bufferIn[index] * window[j];
//            }
//            // output
//            for (int j = 0, m = windowSizeSamples; j < m; j++) {
//                final int index = outputWindowPositions[i] + j;
//                if (index < bufferOut.length) bufferOut[index] = bufferOut[index] + windowedInput[j];
//            }
//        }
        //************************************************
        // (WSOLA) Cross-Correlation AND Overlap and Add
        //************************************************
        final double[] frameAdj = new double[numberOfWindows];
        final double[] frameNext = new double[numberOfWindows + offsetMax * 2];
        final int[] offsets = new int[numberOfWindows];
        Arrays.fill(offsets, 0);
        for (int i = 1, n = numberOfWindows; i < n; i++) {
            // windowed audio segment from input
            final float[] windowedInput = new float[windowSizeSamples];
            for (int j = 0, m = windowSizeSamples; j < m; j++) {
                final int index = inputWindowPositions[i-1] + offsets[i-1] + j;
                if (0 < index && index < bufferIn.length) windowedInput[j] = bufferIn[index] * window[j];
            }
            // output
            for (int j = 0, m = windowSizeSamples; j < m; j++) {
                final int index = outputWindowPositions[i-1] + j;
                if (0 < index && index < bufferOut.length) bufferOut[index] = bufferOut[index] + windowedInput[j];
            }

            // FIND NEXT OFFSET
            // clear input frames
            Arrays.fill(frameNext, 0);
            // fill adjacent frame for cross-correlation
            for (int j = 0, m = windowSizeSamples; j < m; j++) {
                final int index = inputWindowPositions[i-1] + offsets[i-1] + windowOffset + j;
            }
            // fill next frame for cross-correlation
            for (int j = 0, m = windowSizeSamples + offsetMax * 2; j < m; j++) {
                final int index = inputWindowPositions[i] - offsetMax + j;
                if (0 < index && index < inputWindowPositions.length) frameNext[j] = inputWindowPositions[index];
            }
            offsets[i] = (int) xcorrMax(frameAdj, frameNext, offsetMax);
        }
        //************************************************

        return bufferOut;
    }

    // CROSS-CORRELATION
    // maximum from cross-correlation
    private double xcorrMax(double[] a, double[] b, int maxlag){
        final double[] xcorr = xcorr(a, b, maxlag);
        double max = 0;
        for (int i = 0, n = xcorr.length; i < n; i++){
            max = (xcorr[i] > max) ? xcorr[i] : max;
        }
        return max;
    }
    // cross-correlation
    private double[] xcorr(double[] a, double[] b, int maxlag) {
        final double[] y = new double[2*maxlag + 1];
        Arrays.fill(y, 0);

        for (int lag = b.length-1, idx = maxlag-b.length+1; lag > -a.length; lag--, idx++) {
            if (idx < 0) continue;
            if (idx >= y.length) break;

            // where do the two signals overlap?
            int start = 0;
            // we can't start past the left end of b
            if (lag < 0) start = -lag;

            int end = a.length-1;
            // we can't go past the right end of b
            if (end > b.length-lag-1) end = b.length-lag-1;

            for (int n = start; n <= end; n++) {
                y[idx] += a[n]*b[lag+n];
            }
        }

        return y;
    }
}
``````
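While debugging I also wrote a minimal standalone version of what I expect the cross-correlation step to return: the lag of the correlation peak (an offset in samples), not the peak value itself. This is a from-scratch sketch with my own throwaway names, so it doesn't share code with the class above:

```java
public class XcorrCheck {
    // Cross-correlate 'a' against 'b' (where b is 2*maxLag samples longer
    // than a, i.e. padded by maxLag on each side) and return the lag in
    // -maxLag..+maxLag at which the dot product peaks.
    static int peakLag(double[] a, double[] b, int maxLag) {
        int bestLag = 0;
        double best = Double.NEGATIVE_INFINITY;
        for (int lag = -maxLag; lag <= maxLag; lag++) {
            double sum = 0;
            for (int n = 0; n < a.length; n++) {
                sum += a[n] * b[n + lag + maxLag]; // shift within the padded buffer
            }
            if (sum > best) { best = sum; bestLag = lag; }
        }
        return bestLag;
    }

    public static void main(String[] args) {
        // b contains a copy of a starting 3 samples past the centre,
        // so the peak should land at lag +3.
        double[] a = {1, -2, 3, -1, 2};
        double[] b = new double[a.length + 8]; // maxLag = 4 padding each side
        for (int n = 0; n < a.length; n++) b[n + 4 + 3] = a[n];
        System.out.println(peakLag(a, b, 4)); // prints 3
    }
}
```

In the real algorithm that returned lag would then be added to the nominal read position of the next frame.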

juha_p
KVRian
554 posts since 21 Feb, 2006 from FI
If you have Matlab at hand, there's this implementation you could examine?

Tekker2
KVRer
3 posts since 14 Sep, 2019
Hi juha_p, thanks for the reply. Unfortunately, I don't have Matlab and I've never used it. My program is written in Java, and I'm not experienced enough in DSP yet to reverse-engineer their Matlab code and make it work in my Java app. I don't know how commonly Java is used in DSP, but if you know of a C/C++ implementation, that would also work, as it's similar enough to Java that I should be able to adapt it for my program. Thanks again.

juha_p
KVRian
554 posts since 21 Feb, 2006 from FI