## Let's talk DAW/Sequencer design and architecture

DSP, Plug-in and Host development discussion.
JCJR
KVRAF
2342 posts since 17 Apr, 2005 from S.E. TN
Miles1981 wrote:I would definitely use an int to track time, and not a float. Floats are not precise across their whole range, so it is not possible to accurately track time with them. Hence int.
For numerical data like sound waves, or other types of signal, this requirement for accuracy is not as stringent, so floats are better than int/fixed-point arithmetic.
Thanks Miles

The following approximation reasoning could be faulty. If there's something wrong with it, please explain--

A common sequencer Ticks Per Quarter Note value is 480 PPQN. Because it is tempo-based, it offers better time resolution at faster tempos and poorer time resolution at slower tempos. This would also apply to a float64 timestamp, of course: faster tempos would have better time resolution even using float64 tempo-based Ticks.

It so happens that 480 PPQN at 125 Beats Per Minute tempo offers 1 millisecond resolution.

So how long could a float64 timestamp at 1 Tick Per Quarter Note run before it fails to deliver at least 1 ms resolution at 125 BPM?

At 4/4 time signature, if we tell the first bar to start at a float64 timestamp of 0.0, then second bar would start at timestamp of 4.0, the third bar at timestamp of 8.0, etc.

125 Beats Per Minute / 60 seconds = 2.0833... beats per second.
125 Beats Per Minute / 60000 milliseconds = 0.0020833... beats per millisecond

https://en.wikipedia.org/wiki/Double-pr ... int_format
The IEEE 754 standard specifies a binary64 as having:
Sign bit: 1 bit
Exponent: 11 bits
Significand precision: 53 bits (52 explicitly stored)
This gives 15–17 significant decimal digits precision. If a decimal string with at most 15 significant digits is converted to IEEE 754 double precision representation and then converted back to a string with the same number of significant digits, then the final string should match the original. If an IEEE 754 double precision is converted to a decimal string with at least 17 significant digits and then converted back to double, then the final number must match the original.
OK, if we can accurately represent up to 15 decimal digits, then the biggest time stamps with "about 1 ms resolution" at 125 BPM might be on the order of 99999999.0000000, 99999999.0020833, 99999999.0041667, etc. About 8 decimal significant digits for the rounded Quarter Note count.

Of course smaller Quarter Note Tick counts would free up more significant digits for better time resolution. At 99999999 Beats, maybe there is a little "fuzz" trying to EXACTLY describe millisecond boundaries. OTOH there should be about 20833 representable counts per millisecond even if we can't EXACTLY describe each millisecond boundary! At 99999999 Beats we ought to have about 48 nanosecond resolution! Even at 99999999 Beats, at a samplerate of 44.1 k-- [ 44.1 samples per millisecond / 20833 counts per millisecond = 0.0021168 ] There would be float64 Tick timing resolution of about 0.002117 sample. That seems "pretty good".

So 99999999 quarter notes / 125 Beats Per Minute = about 799999 minutes, or 13333 hours, or 555.56 days of "at least 1 ms resolution".

Sure, after a month or two of continuous playback maybe there could be a little "fuzz" between individual millisecond boundaries. But it ought to be generally much better time resolution than integer 480 PPQN for ordinary-length songs. Or is there some serious flaw in the reasoning?

Miles1981
KVRian
1355 posts since 26 Apr, 2004 from UK
Well, what you explain, more or less, is that you have 52 significant bits, and this is what gives you your 555.56 days of precision. The question then is why you would go for a representation that could give you issues.
So what you are actually talking about is changing the 480 ticks per quarter note to something that is adapted to the tempo. And you can do that with integers already.

syntonica
KVRist
424 posts since 25 Sep, 2014 from Specific Northwest
Chris-S wrote:
syntonica wrote:That's what I was talking about with fudge factors. For my conversions from floats->ints, I have to add .001 to each before conversion.
In some cases it's better to use round() instead of trunc().
I use floor out of habit since they're all positive reals. Same result.

Kraku
KVRian
1399 posts since 13 Oct, 2003 from Prague, Czech Republic
syntonica wrote:
Chris-S wrote:
syntonica wrote:That's what I was talking about with fudge factors. For my conversions from floats->ints, I have to add .001 to each before conversion.
In some cases it's better to use round() instead of trunc().
I use floor out of habit since they're all positive reals. Same result.
round() would fix the issue where x.9999 should have been the next integer value: trunc/floor drops the value to the previous integer, while round() picks the nearest one.

You can also switch the processor's FPU to a different rounding mode. That way you don't need extra trunc/round/floor calls, as long as you keep the new rounding mode in mind and write your code accordingly.

For example, if you change PurpleSunray's example from:

```cpp
printf("%f => %d\n", f * PPQN, int(f * PPQN));
```

to:

```cpp
printf("%f => %d\n", f * PPQN, int(round(f * PPQN)));
```

you'll get the correct value. The added round() does proper rounding to the nearest integer, which an appropriately set FPU rounding mode can also do for you.

JCJR
KVRAF
2342 posts since 17 Apr, 2005 from S.E. TN
Miles1981 wrote:Well, what you explain, more or less, is that you have 52 significant bits, and this is what gives you your 555.56 days of precision. The question then is why you would go for a representation that could give you issues.
So what you are actually talking about is changing the 480 ticks per quarter note to something that is adapted to the tempo. And you can do that with integers already.
Thanks Miles

Was just responding to your earlier comment that floats are not accurate enough in some time ranges because of the "variable resolution" of floats--

(Assuming my reasoning didn't drop several decimal places somewhere) I was just wondering what issues float64 would have for musical Tick timestamps, if it apparently has better resolution than conventional int 480 PPQN even after 555 days of playback? Ought to be no problemo for normal-duration sequences? After 10 minutes, or even a day of playback, float64 at one tick per quarter note would have incredibly better than millisecond resolution at 125 BPM.

On early crude computers, it was possible for a sequencer program to "own" a hardware timer chip, and the OS/hardware was slim enough that the interrupt handler would get called very soon after the timer chip fired. So it was possible to hog a timer chip, set it to fire at 1 ms intervals, and simply use fixed-point math to add a tempo increment every millisecond, getting a good tight ms-resolution timebase over the practical duration of a song. Or, rather than use fixed-point math to derive the tempo Tick count, if the timer chip had deep enough registers, just set the timer period so it fired at whatever tempo you wanted at 480 PPQN (or whatever you like).

However, with OS improvements even by the time of Windows 95, or MacOS after the Time Manager and non-preemptive multitasking got added, you couldn't "own" a hardware timer, and the machine would not call a software timer routine reliably enough to count tempo time by simply incrementing the Tempo Tick Count every time your timer routine got called.

On a modern machine you can make a timer callback or timer thread that gets called "frequently enough" for decent playback timing; however, there is so much slop that the timing will drift too much if you simply add to the Tick Count on every timer loop.

So maybe there are slicker ways, but what I did was run a software timer handler or timer thread that tries to loop at least 1000 times per second. Each time the timer fires, it reads a steady time reference from the computer and calculates what the Tempo Tick Count ought to be based on that "steady timebase". If playing audio and MIDI, there can be three timebases to consider: sample time, "steady computer time", and Tempo Tick time.

Done that way, there is no accumulation issue from fixed-point adding maybe 1.0275, or whatever increment is needed, to the Tempo Tick Count every millisecond. Repeatedly adding a tiny float increment to a float Tick Counter 1000 times per second could develop some slop even with float64.

Every time you need to know the Tempo Tick Count, you calculate it something like (CurrentComputerSteadyTime - SequenceStartSteadyTime) * TickTempoRatio.

So long as the Timer routine loops, for instance, 1000 times per second, even though there is slop in when each Timer loop happens, doing it thataway might put a millisecond of slop on any particular note, depending on how sloppily the Timer gets called, but the overall playback stays tight to 1 or 2 milliseconds regardless of song tempo or song length (within reason).

Last time I was doing Mac sequencing, the "steady time" reference based on mach_absolute_time() or UpTime() seemed to work purt good. Maybe something else is better nowadays. Convert the high-res machine time to NanosecondsSinceSongStart, then calculate the Tempo Tick accordingly. On PC it was fuzzier last time I looked, because the hardware instruction counter wasn't always reliable on all machines. Maybe there is something more solid but still universal on PC nowadays than the old MMSYSTEM millisecond timeGetTime(). But of course if you are always running audio along with MIDI, you can also interpolate sample playback locations between audio callbacks.

So anyway, it seemed rather idiotic to derive steady time location purt close to the nanosecond, only to round that to a Tempo Tick Count with "about 1 ms resolution" depending on the tempo.

It gets trickier if the sequencer app does tempo maps. I suppose there are numerous ways to do it, but if you have a tempo track, you might keep the hi-rez steady time of song start, then figure the hi-rez steady time offset at the first tempo, to find the steady time offset of the second tempo event. Then take the second tempo and calculate the hi-rez steady time offset for the beginning of the third tempo event, etc.

For instance if you have a 120 bar song with 1 tempo change per bar, by the time you get to the end of the song you have done the math-chasing thru 120 Tick and Time offsets. If the song is in 4/4 and you have one tempo change event per beat, then you would be chasing thru 480 Tick and Time offsets. That chasing could get sloppy on long songs if there is much rounding that might happen on each chase calculation.

It can be done with int64 (or wider) fixed-point math. I just suspect that if everything is float64, chasing tempo maps might be about as accurate as one can easily get, with simpler programming. But I could easily be mistaken. For instance: Steady Time counted in float64 seconds (regardless of which timebase you use to derive Steady Time), and Tempo Tick Time in float64 at one tick per quarter note.

Widowsky
KVRer
24 posts since 20 Aug, 2008
Hi,

Maybe the following info could give you clues about MIDI sequencers' internal workings (although I'm far from technical enough to be sure).

I think I remember Steinberg introducing 15,360 pulses per quarter note internal resolution in Cubase VST24 4.0 Macintosh in 1998. This is consistent with what wikipedia says: “Cubase VST32 5.0 - Sep 2000 - Large update to the Windows product bringing it in sync with the Macintosh product which had included more features such as: 15,360 ppqn internal resolution” (https://en.wikipedia.org/wiki/Steinberg_Cubase#Versions).

Also I found this in the manual for Cubase VST/32 5.x Mac (September 2000):
“MROS Resolution:
This allows you to set the MIDI playback resolution of the program.
[…] If you for example feel the graphics aren't updated as quickly as you like, you could try lowering the resolution to 384 (or less) ticks per quarter note. On the other hand, if you need extremely high playback resolution, you should use the highest possible playback setting, 1920 ticks per quarter note (often called pulses per quarter note and hence abbreviated PPQN). No matter what this setting is, audio is always recorded and played back at 15360 PPQN.”

In today's versions, MIDI Display Resolution can be set to any value between 240 and 4000 ppqn.

HTH

Kraku
KVRian
1399 posts since 13 Oct, 2003 from Prague, Czech Republic
I don't want to be a thread necromancer, but this thread is the perfect place to ask this question:

Let's say I had 100% control over the hardware that runs my sequencer and my audio buffers would be about 0.3 ms long.

Would anyone in this case actually notice if all the triggered notes, events, envelope start/end points, etc. would always be aligned at the start border of the playing audio buffer frame? This would obviously be a problem when your audio buffer would be for example 6 ms long, but if it's only a tiny 0.3 ms, would there be audible jitter anymore? This approach would simplify the internal working of the audio rendering engine, since you wouldn't have to care where inside the buffer the note should start playing etc.

JCJR
KVRAF
2342 posts since 17 Apr, 2005 from S.E. TN
Hi Kraku

In my experience, if you can accurately place notes on a quantized 1 ms grid, very few if any people will notice any jitter in playback. So at 44.1 k samplerate, maybe 44- or 45-sample audio buffers.

The same should probably apply to rendering to disk, but because you have the luxury of not worrying about deadlines when rendering to disk, it would probably be better, if possible, to put each note as close as possible to the target sample rather than the target millisecond. Not that many would necessarily hear the improvement, but just in case people might.

Using softsynths, if your render-to-disk path strongly resembles the code that plays in realtime (merely because it would be a chore to write the whole mess twice rather than once), then if you decided to quantize playback to 1 ms, it would probably happen in disk render as well, unless you write two versions of at least part of the code.

Just sayin, if you figure out how to place notes "exact to the sample" with VST instruments (which isn't difficult) for render to disk, you might as well do the same for playback. In which case 8 ms or 24 ms buffers would have just as good "sample accurate" playback timing as the render to disk. And some softsynths become very inefficient if you feed them numerous tiny buffers rather than fewer large ones. That's just the internal processes of each softsynth; I wouldn't venture to guess what details inside a softsynth make it more or less efficient with tiny buffers.

But in my experience if you can get it accurate to 1 ms then most folks wouldn't notice additional accuracy. Which could be all wrong of course. Your mileage may vary.

PurpleSunray
KVRian
808 posts since 13 Mar, 2012
If you are looking for a tick frequency that can handle it all: we use 70 MHz on uint64 in our media player.

Yes, for audio-only that's overkill, but with 70 MHz resolution you are also able to tackle any kind of video-related timing such as DTS/PTS/ATS/NTP with enough accuracy.
(70 MHz actually came out of looking at all kinds of different tick resolutions across codecs/formats and trying to find a least common multiple, or something close to it.)

Kraku
KVRian
1399 posts since 13 Oct, 2003 from Prague, Czech Republic
JCJR:

Thanks! This information might become handy in a project I'm currently working on.

JCJR
KVRAF
2342 posts since 17 Apr, 2005 from S.E. TN
Kraku wrote:JCJR:

Thanks! This information might become handy in a project I'm currently working on.
I got too old and stupid to code for a living a few years back, but last time I looked, with both VSTi synths and Mac AU synths, if you are sending them notes as MIDI, you can timestamp each note with a buffer offset in samples, relative to the beginning of the current buffer you are preparing to tell the synth to render.

There may be other ways to tell synths to play notes rather than timestamped midi lists, and I never used them and do not know anything about it.

I didn't consider it safe to ask a synth to render any sample offset beyond the bounds of the current buffer. In other words, if the buffer is 512 samples long, maybe some synths would be smart enough to queue up a note for the next buffer if you pass a buffer offset of 600 samples, but I never even tried that. Similarly, maybe some synths would be smart enough to respond intelligently to out-of-time-order MIDI, but I never tried that either. IOW, if the offsets for three notes are 100, 200, and 400 samples, I always sorted them smallest to biggest rather than hoping the synth would do the right thing if I sent them scrambled like 400, then 100, then 200 or whatever.

So with VSTi or AU you can get sample-accurate note-starting at any buffer size unless you find a synth with bugs.

The sticky point with too-big buffers is play-thru when the guy is playing live into softsynths or recording a track while monitoring softsynth audio. In that case too-big delays can get very messy. For instance if buffers are 20 ms long and you already sent a rendered 20 ms buffer to the soundcard and then "at the last nanosecond" receive a live midi note that by all rights ought to have been time-sorted into that buffer you just finished rendering and can't call back. It doesn't break anything except the nerves of the user trying to play/record under such conditions. That's where tiny buffers can make it easier on the user.

Kraku
KVRian
1399 posts since 13 Oct, 2003 from Prague, Czech Republic
JCJR:
What I'm working on is a HW project with fully custom software/firmware/DSP/etc., so no VST/AU/Windows/OSX is involved

JCJR
KVRAF
2342 posts since 17 Apr, 2005 from S.E. TN
Kraku wrote:What I'm working on is a HW project with fully custom software/firmware/DSP/etc., so no VST/AU/Windows/OSX is involved
Ooooops, sorry about that! Yeah I think 1 ms or tighter ought to sound solid to most ears. Good luck!

Kraku
KVRian
1399 posts since 13 Oct, 2003 from Prague, Czech Republic
Thanks!