VST Plug-In: How to implement a “lookahead” buffer?

DSP, Plugin and Host development discussion.

Post

Hello, I hope this is the right place to ask this :phones:

My goal is to write a VST plug-in that should work in programs like Audition and Audacity, so I'm planning to go with VST v2.x. I'm new to VST development, but I have studied the examples here, and so far most stuff looks pretty straightforward. The main "magic" seems to happen in the process() or processReplacing() function. Not quite sure what the benefit/drawback of these two functions is, though.

Now the "problem" is that my filter is going to need a "Lookahead" buffer of a few seconds (maybe longer, depends on setup). This means that, at the beginning of the process, I will need to fill my internal buffer. And, at the end of the process, I will need to flush the pending samples from that buffer.

I have coded filters for SoX (Sound eXchange) before, and its API is pretty similar to VST at first glance. What is called process() in VST is called flow() in the SoX API. But there's one major difference: the flow() function in the SoX API receives, as parameters, the number of samples available in the input buffer as well as the number of samples that fit into the output buffer. It then returns the number of samples it has actually taken from the input buffer as well as the number of samples it has written into the output buffer. This means I don't have to process all available input samples in every call, and a single call can return fewer samples than it has consumed. Consequently, at the beginning of the process, I can consume all input samples but return no output samples at all! This way I can fill my "lookahead" buffer at the beginning. Finally, the SoX API has a drain() function that the host application calls at the end of the process in order to flush the pending samples from the filter's internal buffer.
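For reference, the relevant SoX effect-handler callbacks look roughly like this (quoted from memory of sox.h, so treat the exact signatures as approximate):

```cpp
#include <sox.h>  // libSoX public header

// *isamp / *osamp are in/out parameters: on entry they hold the buffer
// capacities, on return the number of samples actually consumed/produced.
int flow (sox_effect_t* effp, const sox_sample_t* ibuf, sox_sample_t* obuf,
          size_t* isamp, size_t* osamp);

// Called repeatedly at end-of-stream so the effect can flush pending samples.
int drain (sox_effect_t* effp, sox_sample_t* obuf, size_t* osamp);
```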

From what I understand about VST, the process() function only has a single parameter to indicate the number of input and output samples, and it has no way to limit the number of output samples. Apparently, process() assumes a simple "N samples in, N samples out" behavior. Is that right?

If so, what is the recommended way to fill my internal lookahead buffer in VST? I know that I could, of course, fill my internal buffer by returning only "silence" in the first several process() calls, at the beginning of the process. But that would delay/shift the entire audio file, which is not wanted! Also, it does not solve the problem of how to flush the internal buffer at the end of the process :roll:

Lately, I tried working with the setInitialDelay() function. Calling this from inside my resume() function with the proper size of my "lookahead" buffer seems to compensate for the delay/shift in some applications. It works very nicely in WaveLab, in Audition and in GoldWave. But it does not work at all in Audacity, Wavosaur or Ocenaudio: they still shift the audio and cut off the end. What am I doing wrong here?
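For reference, this is roughly what I'm doing now (a minimal sketch against the VST 2.x SDK; kLookaheadSamples is a hypothetical constant holding the lookahead length in samples):

```cpp
// Minimal sketch; kLookaheadSamples is a hypothetical constant.
void MyPlugin::resume ()
{
    // Writes the value into AEffect::initialDelay, which the host is
    // supposed to read for delay compensation.
    setInitialDelay (kLookaheadSamples);
    AudioEffectX::resume ();
}
```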

WaveLab Before/After (good):
* http://i.imgur.com/C8PKd2s.png
* http://i.imgur.com/VgESvUE.png
* http://i.imgur.com/AJavRME.png

GoldWave Before/After (good):
* http://i.imgur.com/q0UmVMa.png
* http://i.imgur.com/IPCROuf.png

Wavosaur Before/After (shifted+truncated):
* http://i.imgur.com/TPWbCBT.png
* http://i.imgur.com/Z8qoceA.png

Thank you for any advice :tu:

Post

You should've posted it in the DSP section.

Post

Burillo wrote: You should've posted it in the DSP section.
:dog:

Can you please move it? Thanks!

Post

VST effects always return the same number of samples they receive. VST plugins are designed to work in real-time, so there's no alternative.

If you need to look ahead N samples, you need a buffer at least N samples long, and you will introduce at least N samples of latency. When the effect first kicks in you have to return silence in those first N samples.

Not all VST hosts support latency compensation, and there's nothing you can really do about that.

Are you sure you need to look ahead? Things are simpler if you don't. IIR filters look backward by a few samples, and FIR filters can operate on a single sample at a time. Or are you using "filter" to mean something else?

Post

Thank you very much for the reply!
foosnark wrote:VST effects always return the same number of samples they receive. VST plugins are designed to work in real-time, so there's no alternative.

If you need to look ahead N samples, you need a buffer at least N samples long, and you will introduce at least N samples of latency. When the effect first kicks in you have to return silence in those first N samples.

Not all VST hosts support latency compensation, and there's nothing you can really do about that.
Actually, that's what I'm doing right now. The problem is that, if I introduce N samples of latency, the audio editor will insert N samples of silence at the beginning of the file (which still might be acceptable) and will cut off the last N samples of the file (which clearly is not acceptable). To my understanding, the setInitialDelay() function is exactly for that purpose! Unfortunately, the VST documentation is rather vague on this, especially on when exactly setInitialDelay() should be called and how it plays together with ioChanged(). Some sources suggest I need to call setInitialDelay() after ioChanged(), some suggest the other way around. In my tests, it didn't make a difference, even when I did not call ioChanged() at all. Currently I'm calling setInitialDelay() inside of my resume() function, and the fact that this leads to perfect results in WaveLab, Audition and GoldWave makes me confident I'm not doing it entirely wrong. But why doesn't it work in other editors like Audacity, Wavosaur or Ocenaudio then? :?:

I understand that compensating latency is trickier in "live" mode. But even in "live" mode the application unavoidably has to deal with the latencies of the input/output devices as well as the latencies that result from the pure computation time of each filter: even if a filter does no internal buffering at all, it cannot run in zero CPU time and thus introduces some delay. Anyway, an audio editor that is not running in "live" mode clearly has no excuse for ignoring the setInitialDelay() value! So either Audacity, Wavosaur and Ocenaudio all have serious bugs in their VST support, which is hard for me to believe (setInitialDelay has been part of VST since v2.0, from 1998), or I'm still doing something wrong. I guess the latter is more likely...

So, any suggestions on where to go from here would be much appreciated. Especially an accurate specification of how the setInitialDelay() function is supposed to be used would be very helpful :pray:

Last but not least: how does the getGetTailSize() function fit into the whole thing? Again, the documentation is very vague here :roll:
foosnark wrote:Are you sure you need to look ahead? Things are simpler if you don't. IIR filters look backward by a few samples, and FIR filters can operate on a single sample at a time. Or are you using "filter" to mean something else?
Yes, I do, unfortunately. I'm using a "sliding window" approach, something that isn't uncommon in the field of audio processing. Any "sliding window" algorithm will necessarily need to fill its initial window at the beginning of the process. At that point, when the "window" buffer is still empty, I unavoidably need to consume input samples without returning any output samples (yet). And, at the end, I need to flush the internal buffer. I don't think there's a way around that for any "sliding window" algorithm. The SoX API does account for that, by allowing me to return the exact number of input/output samples that have actually been read/written in each flow() call. It also has a dedicated drain() function to flush the pending samples at the end of the process. If there's no equivalent in VST, that seems like a shortcoming of the VST API...
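To illustrate the constraint, a pure-delay skeleton of the idea, squeezed into VST's fixed N-in/N-out model, might look like this (mono case; mDelayLine, mWritePos and kLookahead are hypothetical members, with mDelayLine zeroed in resume()):

```cpp
// Sketch only: a plain kLookahead-sample delay inside the fixed N-in/N-out
// processReplacing() model. The host is expected to discard the first
// kLookahead output samples based on setInitialDelay().
void MyPlugin::processReplacing (float** inputs, float** outputs,
                                 VstInt32 sampleFrames)
{
    float* in  = inputs[0];
    float* out = outputs[0];
    for (VstInt32 i = 0; i < sampleFrames; ++i)
    {
        out[i] = mDelayLine[mWritePos];   // delayed sample (zero at first)
        mDelayLine[mWritePos] = in[i];    // store fresh input in its place
        mWritePos = (mWritePos + 1) % kLookahead;
        // The actual sliding-window analysis would run here, over the
        // kLookahead samples currently held in mDelayLine.
    }
}
```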

Post

You just need to set your plugin delay. I've no idea what the actual function is called, but the VST SDK does have one, and the host should then know to compensate.

Post

KarLoff, it sounds as if you are already doing it as best you can, and there is nothing else you can fix from the plugin side. If you need the lookahead, and you have already added code which some hosts work properly with, then it is probably as good as it will get.

Some hosts do not support latency compensation. If a user insists on using your plugin in a host which doesn't support it, then he can either use your plugin in a different host, or work around it in his current host. For instance, if he knows your plugin introduces exactly 2 seconds of latency, then he could pad his audio by 2 seconds at the tail, run your plugin non-realtime, then trim 2 seconds from the head afterwards.

If you want to make more money, supporting more users on more platforms, the only choice is to figure out another way to skin the cat which doesn't need vast amounts of lookahead. Small lookahead buffers are easier for a user to work around, even in hosts which do not support latency compensation.

Sometimes supporting latency compensation would be easy for a host. Sometimes it would be very hard. It wouldn't be right to slam a host programmer if he doesn't compensate for latency, because there is no way to judge how many resources he can throw at the problem. There is no way to judge whether what his program is trying to do already has him up to his ash in alligators so deep that adding latency compensation might send him to the loony bin, keeping up with all the interacting sync difficulties.

The hosts that can do heavy multitrack, with heavy per-track host-side manipulation, and also support latency compensation, are written by smart dudes who deserve their praise.

I'm semi-retired, but one program I worked on tries to support multiple audio driver models, both VST and DX, does on-the-fly randomization of playback events, and handles multiple tracks of musically-aware pitch and tempo stretching, pseudo-randomly splicing the tracks together from disk audio snips to follow the chord progression and tempo map. If they wanted me to add latency compensation, I'd probably go fishing instead, because I am probably not smart enough or patient enough to squeeze in "that one final feature" to make it "finally finished". :)

Post

The VST spec provides no facility for correcting (getting rid of) the initial delay; that has to be implemented by the host. I suppose it could use a flag telling the host, from the plugin, that it's okay to discard the initial zeros, but I don't think it has one. FWIW, linear-phase filters would have audio information in the pre-gap.

Post

Regarding process() vs. processReplacing(): the difference is that process() is supposed to add the plugin output to the output buffers, whereas processReplacing() is supposed to overwrite whatever data might have been there. You must always implement processReplacing(), whereas process() is just legacy compatibility for old hosts, as far as VST 2.4 goes.
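As a sketch of the relationship (mScratch is a hypothetical member, pre-allocated to the maximum block size; mono case for brevity):

```cpp
// Legacy process() accumulates into the output buffers; it can be built on
// top of processReplacing() with a scratch buffer. mScratch is hypothetical.
void MyPlugin::process (float** inputs, float** outputs, VstInt32 sampleFrames)
{
    float* scratch[1] = { mScratch };
    processReplacing (inputs, scratch, sampleFrames);
    for (VstInt32 i = 0; i < sampleFrames; ++i)
        outputs[0][i] += mScratch[i];   // add, rather than replace
}
```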

As far as setInitialDelay() goes, it's pretty much just a hint to the host about the relative alignment of tracks. Many hosts (but not all of them) will then correct the relative alignment by adding a matching latency to other tracks (such that they all have the same final latency, and hence align as intended). If you're only processing a single track at a time, then setInitialDelay() is meaningless to you. [someone link the other thread where we just discussed where to call setInitialDelay() .. but generally calling it in resume() works in most hosts at least; if you call ioChanged(), that will trigger a suspend/resume cycle, which might or might not allow you to change the latency on the fly, depending on the host]

Just because you have latency does not necessarily mean that you are going to produce actual zeroes; it simply means that you are going to delay the signal by N samples. As camsr points out, linear-phase filters are a common example of a situation where you want to report latency (because the filter delays the signal, so it would be out of alignment with regard to other tracks unless compensated), but you don't usually want to discard the leading samples (because they contain the filter pre-ringing, which was the whole point of adding the delay in the first place).
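To make the linear-phase example concrete (numbers are illustrative only):

```cpp
// A symmetric (linear-phase) FIR with kNumTaps taps has a group delay of
// (kNumTaps - 1) / 2 samples; a plugin built around it would report that
// value via setInitialDelay(), while its leading output samples carry the
// pre-ringing rather than plain zeros.
static const VstInt32 kNumTaps    = 127;                 // hypothetical length
static const VstInt32 kGroupDelay = (kNumTaps - 1) / 2;  // = 63 samples
```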

As far as getTailSize() goes .. it apparently exists for the purpose of giving the host another hint: how long it takes for the plugin to go silent after the input becomes all zeroes. I don't think it's widely supported by either plugins or hosts, but I could be wrong on this one.

Post

Your only option is to have hosts like Wavosaur implement support for this data. Unfortunately, it is difficult to get even one host to support something like this; forget about having it supported everywhere.

This is due to the poor documentation and standardization of VST from the very beginning.

Ideally, yes, a tool like Wavosaur should implement both AEffect::initialDelay and the effGetTailSize opcode. The destination buffer should be allocated with [original length + latency + tail], and the user should be responsible for adjusting the buffer length after processing.

Unfortunately, this is obviously not intuitive and not all authors would agree with me.

With any luck in the future we can standardize a new VST2-like format to replace what we have now while ensuring that it is properly planned and documented.

All we really need to do is start coming up with solutions and motivating authors to implement them. We could most likely standardize on this sort of functionality exactly as-is without significant modification of the current VST2x implementations.

Correct me if I'm wrong:

1) Plugins must fill AEffect::initialDelay in the VSTPluginMain entry point / constructor with the maximum latency the plugin will have.

If the latency is due to a variable delay, an additional delay must be added such that the total latency is always equal to the value specified when the plugin was instantiated.

Suggest renaming this parameter to "latency".

Suggest adding some "can-dos", such as "latency compensation", for both the plugin and the host. The plugin could then automatically disable its fixed-latency padding (delay(max_latency - latency)) entirely if the host does not respond to "latency compensation". This would need to include a manual configuration override switch to allow backward compatibility.

2) Plugins must report their anticipated tail size by updating their internal state before the next process() call, but after the initial setParameter() and effProcessEvents or other calls. Plugins may anticipate the maximum tail size during future blocks by handling events and making a prediction/estimate, or alternatively may report the maximum estimated tail size to the best of their ability.

Suggest extending this functionality to allow the host to request the maximum tail size, given specific state input, for the following specified number of samples. This would likely be compatible with the existing effGetTailSize opcode by passing the block size for the next process call in the index parameter.

Unfortunately any practical implementation will have its limitations regarding identifying the absolute maximum tail size with variable state/input.
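On the host side, the existing query would look something like this (a hedged sketch; the proposed extension would simply reuse the index argument):

```cpp
#include "aeffectx.h"  // VST 2.x SDK

// Query a plugin's tail size via the dispatcher. Under the extension
// proposed above, the next block size would be passed in the 'index'
// argument (left at 0 here, i.e. a plain query).
VstIntPtr queryTailSize (AEffect* effect)
{
    return effect->dispatcher (effect, effGetTailSize, 0, 0, 0, 0.0f);
}
```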

Post

Those are good ideas, aciddose.

Some hosts support initialDelay very well. If I ever write a host that is simple in other features, it will certainly support it. I truly admire the complex hosts which also support the feature.

A few of the "simpler" issues from the host side--

If you have multiple tracks, multiple inserts, multiple buss inserts, and the main output inserts, each of which allows multiple plugins in series-- let's not even think about the implications with side-chains--

The user could insert an arbitrary number of plugins, all with different delay requirements. So the host would conceivably have to determine the total latency of each chain of plugins, and feed all the chains with different timing offsets, because there could be no delay on some tracks, or different delays on different tracks, or whatever.

Plugins with delay inserted on the back end of synthesizers mean that those tracks of softsynths have to receive their MIDI offset in time.

And tempo maps-- If you are reading audio or MIDI data at many different offsets for different tracks, you are reading some data at a different tempo than other data. Arrrgh!

And real-time play-thru-- If somebody inserts a long-latency plugin on a play-thru synth track or audio overdub track, I just don't see any way to fix that, except for the user disabling the plugin as long as he is trying to record/play in real time.

But some hosts manage remarkably well. I'm just complaining. Somebody call the waaambulance! :)

Post

It is definitely insanely difficult to manage all possibilities, which is most likely the reason many hosts don't include full support for these features.

For something like an audio clip editor, handling the extra time (initialDelay + tail) when processing should be trivial, memory concerns aside, and I suspect that it is possible to handle complex signal paths trivially as long as there are no feedback paths.

I believe it should work if each audio signal has an associated latency accumulator. When mixing two signals, the lower-latency signal must be delayed so that it matches the latency of the higher-latency signal. (Multiple inputs should be treated the same as mixing, even if those signals are not eventually mixed to the outputs.)

As long as all signals are feed-forward, this shouldn't be too complicated.
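Something along these lines, perhaps (a standalone sketch; the Signal type and mix() function are made up for illustration):

```cpp
#include <algorithm>
#include <vector>

// Each signal carries its accumulated latency; before mixing, the
// lower-latency signal is padded with leading zeros so the two line up.
struct Signal
{
    std::vector<float> samples;
    int latency;  // accumulated plugin latency, in samples
};

Signal mix (const Signal& a, const Signal& b)
{
    const Signal& late  = (a.latency >= b.latency) ? a : b;
    const Signal& early = (a.latency >= b.latency) ? b : a;
    const size_t pad = static_cast<size_t> (late.latency - early.latency);

    Signal out;
    out.latency = late.latency;
    out.samples.assign (std::max (late.samples.size (),
                                  early.samples.size () + pad), 0.0f);
    for (size_t i = 0; i < late.samples.size (); ++i)
        out.samples[i] += late.samples[i];
    for (size_t i = 0; i < early.samples.size (); ++i)
        out.samples[i + pad] += early.samples[i];  // delayed by 'pad'
    return out;
}
```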

You're right that MIDI latency makes it significantly more complex, but I suspect in most cases the latency issues are going to be related to effect chains rather than instruments and sequence data.

Post

Sorry for the late reply. I was away for a few days, and when I came back, my main development machine wouldn't work anymore. Turned out the PSU had died. The problem has been fixed now...

Thanks for all the useful input! Looks like this topic is more controversial than I had assumed :wink:

To summarize, please correct me if I'm wrong:
  • The only way to implement a "lookahead" buffer in VST is to delay the samples, i.e. there is no way to get more input samples from the host application without returning some (dummy) samples first.
  • The best I can do to compensate for the unavoidable delay is to have my plug-in call setInitialDelay() with the proper value. The correct place to call setInitialDelay() is when the host calls my resume() function. After that, a "well-behaved" audio editor will (a) discard the plug-in's first N output samples and (b) feed the plug-in N additional "dummy" samples at the end in order to flush the pending output samples. At least that is what most audio editors (e.g. WaveLab, Acoustica, Audition, GoldWave and REAPER) seem to do.
  • Calling ioChanged() will trigger a suspend() and resume() cycle. I assume that I would call ioChanged() from my processReplacing() function, when required. Furthermore, I guess this would only be required if my delay changes, but not if the delay is fixed and the correct delay has already been set in the first resume() call. Is that right? (See the sketch after this list.)
  • I'm still not quite sure how getTailSize() fits in. With those audio editors that correctly handle the delay set via setInitialDelay(), implementing getTailSize() or not doesn't seem to make any difference! The pending outputs are properly "flushed" at the end, even when getTailSize() is not implemented at all. On the other hand, with the audio editors that do not respect the delay set via setInitialDelay(), implementing getTailSize() doesn't help to prevent truncation either. So that function seems to be pretty much a NOP :scared:
  • If my plug-in already considers all of the above, but a specific audio editor still doesn't compensate the delay, i.e. the audio file ends up shifted and truncated, then this is a bug/limitation in that specific audio editor, and there is really nothing I can do on my side.
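Regarding the third point, the variable-delay case I have in mind would look roughly like this (a sketch only; updateLookahead() and mLookahead are hypothetical):

```cpp
// Report a new latency and ask the host to re-scan the plugin; many hosts
// answer ioChanged() with a suspend()/resume() cycle. Hypothetical names.
void MyPlugin::updateLookahead (VstInt32 newLookaheadSamples)
{
    mLookahead = newLookaheadSamples;
    setInitialDelay (newLookaheadSamples);  // update AEffect::initialDelay
    ioChanged ();                           // request the host to re-read it
}
```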
Best Regards!

Post

Tail size is the signal you generate after the input is zero.

For example in an envelope generator, tail size would be the release time.

This is of major consequence in hosts that disable processing during silence. In these hosts, processing will not stop until the input and output have been zero for at least (latency + tail size) samples.

For example, take a delay effect. Both the input and output might be zero, but there may still be a signal stored in the delay buffer, and until the total delay time has passed you can't be certain it won't make its way to the output.

Each time a signal is detected at either input or output the silence counter is reset.
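In host terms, the bookkeeping could be as simple as this (a hedged sketch; all names are hypothetical, and silentSamples persists across blocks):

```cpp
// Suspend a plugin once its input and output have both been silent for
// (initialDelay + tailSize) samples; any detected signal resets the counter.
bool shouldSuspend (bool blockIsSilent, int blockSize,
                    int initialDelay, int tailSize, int& silentSamples)
{
    if (!blockIsSilent)
        silentSamples = 0;
    else
        silentSamples += blockSize;
    return silentSamples >= initialDelay + tailSize;
}
```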

Post

VSTs are really designed for real-time processing, not for processing fixed audio, so if I were you I'd look at other plugin formats. For instance, Audacity supports LADSPA plugins and Nyquist plugins (using some sort of LISP), and tbh I have no idea if these are better for non-realtime audio, but it might be a good idea to check. I think VST itself also has a specific mode for non-realtime audio, but I don't think it's well supported at all.

I think getTailSize() is really just there to tell the host the approximate length of a reverb or delay time, so I'd be really surprised if it gave any sort of exact results. After all, real-time audio never really "ends", so in the normal context where you'd use a VST, your output is never really truncated; it just goes on "forever".
