KVR Audio

learjeff · Post by **learjeff** » Sun Oct 02, 2005 1:40 am

I'm seriously considering starting a programming project to create a GUI-based program to assemble soundfonts (or, more likely, sfz-format sample sets) automatically from a large collection of samples of the same instrument for a single patch, including lots of notes and lots of velocities.

I have a number of ideas that aren't incorporated in any existing products. The idea is to minimize the hand labor required to put together a multisample, multivelocity soundfont for an acoustic instrument like piano. The goal is to make it as easy as possible, so that we can have more high quality free soundfonts. Of course, the process still won't be trivial, and will still require some dedication and care on the part of the person recording the samples.

The key ideas involve automatically building the keyboard map, and feedback to help identify holes in the sample sets, as well as getting rid of clunkers. Resulting keyboard maps would not look like a rectangular grid, as they do for most sample sets. Most sample sets look this way because it's a compromise between the ideal keyboard map and manageability for the sound designer. I'm convinced that it's possible to minimize that manageability issue. Another feature would allow relatively simple creation of smaller sized soundfonts by eliminating some percentage of the samples, more or less automatically. The tool would focus on unlooped samples, because other tools already cover that pretty well. (Unless some programmer is willing to add that feature, of course!)

It's not rocket science. I've already prototyped tools that categorize samples by RMS level and MIDI note, and that build keyboard maps. But these tools are coded in a script language (Python) that's not suitable for distribution, and the code is rather embarrassing to a pro programmer IYKWIM.
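Jeff's prototype code isn't shown in the thread. As a rough illustration of what "categorize samples by RMS level and MIDI note" can boil down to, here is a minimal Python sketch. It assumes the samples are already decoded to floats (full scale = 1.0) and the pitch is already known in Hz. The function names are hypothetical, and RMS is taken over the whole buffer, which is only one possible choice of analysis window.

```python
import math

def rms_dbfs(samples):
    """RMS level of float samples (full scale = 1.0), in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    # 20*log10(rms) == 10*log10(mean square)
    return 10.0 * math.log10(mean_square)

def freq_to_midi(freq_hz, a4_hz=440.0):
    """Nearest MIDI note for a frequency, taking A4 (MIDI 69) = a4_hz."""
    return round(69 + 12 * math.log2(freq_hz / a4_hz))

def categorize(samples, freq_hz):
    """(MIDI note, RMS dBFS) pair used to place a sample on the note/level grid."""
    return freq_to_midi(freq_hz), rms_dbfs(samples)
```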

I'm interested in input from the community, as well as offers to help. Ideally, this would be an open source project, and would utilize existing open source subsystems & tools as much as possible. I admit I'm ignorant about most of that, but I'll be investigating the excellent information in the sticky post above. Also ideally, it would be platform-portable. But if it only runs on a Win PC, that's fine with me.

I haven't done any GUI coding since the late 80's (!), so I would definitely appreciate help in that regard, especially someone roughing together a working model GUI that I could tweak and plug in the functional code.

Also, while I've spent a good deal of time studying OO and I have a deep understanding of the concepts, I don't use OO languages professionally, so I'd appreciate any design assistance as well. I'm educated enough to realize that experience matters in this regard!

I won't be ready to dig into this seriously until early next year (2006). But I thought I'd raise the topic here and see what bounces off the fan, so to speak. If anyone has any good pointers to info that's not in the sticky topic above, please post it.

This is for fun and learning, not profit. I'd enjoy using the tool and one of my first projects using it would be to improve my jRhodes soundfonts. Next I might tackle acoustic guitar -- but I confess I haven't tried what's out there like I did for Rhodes before doing my own. And someday I might get the chance to sample a good grand piano.

Interested? Please let me know, or post any ideas you may have. It's brainstorming time, so no suggestions are "wrong"!

Thanks,
Jeff

learjeff · Post by **learjeff** » Sun Oct 02, 2005 2:03 am

Here are some areas where I'd especially appreciate contributors.

1) I've never participated in an open-source project. Anyone who knows how to create and manage one effectively would be appreciated.

2) Identifying potential existing modules to use.

3) DSP coders. For most purposes, DSP coding isn't required, but I'm sure my pitch-detection algorithm isn't robust enough. No doubt there will be other areas where audio experts would be helpful. I would prefer to be able to use existing soundfont player code (or programs), but if that isn't feasible, I'll need serious help here!

4) OO designers, as mentioned above.

5) GUI coders, as mentioned above.

6) Makefile gurus! Etc.

7) Idea people. I have a lot of ideas, but I'm sure there are more! For example, how could this be architected so that the same framework could be used but with different sample assignment algorithms, to help assemble drum kit formats?

No doubt I will be adding to this list!

PS: I haven't selected a language, but I suspect C++ is the default choice. Please feel free to point out alternatives. I'd really rather not have to deal with memory allocation issues (I get enough of that on my day job), but I worry that Java may not be the best vehicle for simple distribution, or fast enough for a few performance-critical operations.

pljones · Post by **pljones** » Sun Oct 02, 2005 7:59 am

Hi Jeff - nice project! All my free time's currently soaked up on non-music open source coding (for my wife, /sigh/) -- and the bits that aren't are filled with trying to get NS Kit 7 (full) into sfz.

Ideas:

1) Go for SF2:
1.1) 16 bit at 44k1 is good enough, really.
1.2) sfz supports DFD for SF2 files so you can have huge samplesets

2) ...but support reading of sfz-format mappings as a starting point

2a) ... and other SF2 files (see 4). And maybe the files TiMidity++ uses.

3) With melodic mappings, you need to support velocity and pitch crossfades or stretching. With drum maps, you only need to worry about velocity. (Automatic, that is.)

4) Don't limit the program to a single patch. Allow multiple instruments and banks.

5) Drag'n'drop for as much as possible on the UI - avoid spreadsheet-like presentation if possible.

6) Context menus to access settings with every known parameter available.

7) Your file access/formatting routines shouldn't affect the UI design and should have limited effect on the overall program architecture. Allow for some way of communicating the "unsupported in this format" settings the user has chosen but don't prevent the file being written.

8) You mention pitch detection but not loudness detection. I've used Voxengo's Leveler with some useful results, so a tool to do velocity mapping would be good.

TiMidity++ mentioned above is open source C. It's also very old and heavily hacked. (Alas, my only bit of code in there was superseded when two branches were merged... still in CVS though.)

Project management of this kind of thing is a nightmare. The "other" project I'm working on is a plugin for an editor for Sims2 objects. The main program author has a site on SourceForge. I've started a separate SF project as I found it easier to work completely independently. The release process doesn't really help keep things controlled, either... But take something like the Linux Kernel as a counter example. Huge organisational structure, multiple trees but with a defined lifecycle. It probably is worth getting some ideas on this sorted out up front.

Java does mean you get a free IDE (Eclipse - with a free platform-native widget set) and cross-platform compatibility. I'd definitely recommend looking into Eclipse as the development environment, even if not using Java.

Muon Software Ltd · Post by **Muon Software Ltd** » Sun Oct 02, 2005 10:47 am

Could I make an architectural suggestion? It would be cool if it had a plugin design for file import/export. Then, for example, someone could write an SFZ import/export plugin, an SF2 import/export plugin, etc., so the capabilities of your app could be extended by other programmers.

If you concentrated your efforts on the mapping side of things and made it highly generic, other sampler formats could be added as and when needed.

HTH
Dave
Muon Software Ltd
www.muon-software.com
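Dave's plugin idea for file import/export can be sketched in Python as an abstract writer class plus a registry keyed by file extension. This is only one way to do it, under the assumption that the core hands each plugin a list of region dictionaries. `KeymapWriter`, `register` and `writer_for` are hypothetical names, and the SFZ writer is deliberately bare.

```python
from abc import ABC, abstractmethod

class KeymapWriter(ABC):
    """Hypothetical plugin interface: one subclass per output format."""
    extension = ""

    @abstractmethod
    def write(self, regions):
        """Return the file contents for a list of region dicts."""

_WRITERS = {}

def register(writer_cls):
    """Class decorator that makes a writer available under its file extension."""
    _WRITERS[writer_cls.extension] = writer_cls
    return writer_cls

def writer_for(extension):
    """Look up and instantiate the writer plugin for an extension such as '.sfz'."""
    return _WRITERS[extension]()

@register
class SfzWriter(KeymapWriter):
    extension = ".sfz"

    def write(self, regions):
        # One <region> line per dict, opcodes written as key=value in dict order.
        lines = ["<region> " + " ".join(f"{k}={v}" for k, v in r.items())
                 for r in regions]
        return "\n".join(lines) + "\n"
```

The mapping core only ever calls `writer_for(ext).write(...)`, so adding a format means adding one decorated class without touching the rest of the program.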

learjeff · Post by **learjeff** » Sun Oct 02, 2005 3:53 pm

Yes, Dave -- plugin for format would be best. I only plan to support one format, because there are plenty of good format converters out there and I see no need to reinvent that particular wheel. But if the core architecture is sufficiently robust, it could serve as the basis for any number of related uses. However, my main goal is assembling a single patch, which is the really time consuming part (after recording, that is) for building a good multisample soundfont.

Pljones, thanks for the suggestions, all good ones.

Most likely, I'd support sfz format first for simplicity, but as mentioned above a plugin interface for format would be ideal.

I hadn't planned to support velocity crossfades, since in my limited experience they sound terrible (phasing problems). But there's no reason why it can't be supported.

For melodic mappings, the mapping can be automatic using pitch detection. For drum kits, the mapping has to be manual -- unless there's some good way to recognize different kit instruments. I think that's way beyond my skill level!
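Jeff's own pitch-detection algorithm isn't described, so here is a stand-in: a naive autocorrelation estimator in Python. It is a sketch under simple assumptions (mono float samples, a single dominant pitch, integer-lag resolution only), not a robust detector. Real piano or guitar samples, with strong overtones and inharmonicity, would need something sturdier.

```python
def autocorr(samples, lag):
    """Unnormalised autocorrelation of `samples` at one lag."""
    return sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))

def detect_pitch(samples, sample_rate, fmin=27.5):
    """Estimate the fundamental (Hz) from the highest autocorrelation peak
    after the central lobe; returns None if no positive peak is found."""
    max_lag = min(len(samples) - 1, int(sample_rate / fmin))
    # Skip the lobe around lag 0 (which always correlates well) by waiting
    # until the autocorrelation first drops to zero or below.
    start = 1
    while start <= max_lag and autocorr(samples, start) > 0:
        start += 1
    best_lag, best_score = None, 0.0
    for lag in range(start, max_lag + 1):
        score = autocorr(samples, lag)
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None
```

Because the answer is `sample_rate / lag` for a whole-number lag, high notes come out coarsely quantised; refining the peak position between lags would be the obvious next step.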

By single-patch, I really meant single sample set for a single instrument, but multiple patches for different mappings. There are already many programs that can manage patches in soundfonts and bundle multiple different instruments for convenience. Again, with a good architecture, facilities like that should be able to be added by anyone interested in doing the work.

I plan to use RMS levels to map to velocities, along with a user-configured curve to calibrate velocity against RMS level across the keyboard. For example, RMS levels go down considerably as you go up the keyboard on acoustic instruments, and I'd provide a means for the user to adjust this graphically.
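One simple reading of the RMS-to-velocity calibration described above, as a Python sketch: the user supplies the softest and loudest RMS levels (in dB) at a few reference keys, these are interpolated linearly across the keyboard, and a sample's level is then mapped linearly between them onto a velocity range. The linear-in-dB curve and the names `interp` and `rms_to_velocity` are assumptions for illustration; a graphical curve editor would replace the straight line.

```python
def interp(x, xs, ys):
    """Piecewise-linear interpolation, clamped at the ends (xs sorted ascending)."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def rms_to_velocity(note, db, calibration, vel_min=1, vel_max=127):
    """calibration: list of (midi_note, softest_db, loudest_db), sorted by note.
    Maps `db` linearly between the interpolated softest/loudest levels at `note`."""
    notes = [c[0] for c in calibration]
    soft = interp(note, notes, [c[1] for c in calibration])
    loud = interp(note, notes, [c[2] for c in calibration])
    t = (db - soft) / (loud - soft)   # assumes loud != soft
    t = min(1.0, max(0.0, t))         # clamp levels outside the calibrated range
    return round(vel_min + t * (vel_max - vel_min))
```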

Thanks for mentioning Eclipse. I'll check it out.

I'd like to use Java but I wonder if it's fast enough for some of the numeric algorithms. For reasons I won't go into yet, performance will be important because there might be frequent rescanning of certain wave files containing lots of samples.

My only beef with Java is I prefer multiple inheritance (though I do understand why they left it out of Java). But I don't know whether that will even be much of an issue here. Certainly, with Java, the likelihood is a more robust program, especially since this would be hobby code and not rigorously bench tested.

pljones · Post by **pljones** » Sun Oct 02, 2005 4:34 pm

There really isn't a problem with no direct multiple inheritance: design your solution using interfaces, then package those interfaces up into base classes that implement multiple ones, then specialise from that base class. It's just a different approach.

I have to admit to having no idea how fast numerical routines are handled in Java/C# vs. C/C++.

Cross-fades: I wasn't expecting automation, just I wouldn't want the UI to prevent it. René reckons sfz does a good job on velocity layer crossfades.

Echoing Dave and as I said: keep the file format handling as separate as possible from the patch management. Thinking of it as a plugable interface is a good way of making as few assumptions as possible. A good, clean component-based design will make the whole project more manageable, too, even if you never actually have plugins.

learjeff · Post by **learjeff** » Sun Oct 02, 2005 4:59 pm

Here's my vision, along with a few implementation ideas that are of course subject to improvement.

BTW, I would prefer to do much of the work bottom-up: create useful individual (perhaps prototype) tools to provide key functions, and later integrate them into an easy-to-use turnkey application. This vision includes some of that bottom-up approach, using existing tools where possible to avoid having to implement everything at once.

First, set up my favorite DAW program to record in 24-bit format, and to save wave files in a given folder. The reason for 24-bit recording is covered below, but it isn't a requirement.

I run the soundfont assembler ("assembler") and direct it to that same folder. The assembler watches that folder for new wave files or changes to wave files. The assembler is also told (by the user) what format to assume for incomplete wave files so it can read files as they're being recorded, before the wave file header is appended or re-written. (Different DAW programs will have different conventions to understand.)
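Watching the folder and reading half-written files could start as simply as this Python sketch. Polling with `os.stat` is a swapped-in technique (each OS also has its own change-notification API), and the fixed 44-byte header is only the canonical minimal PCM WAV layout: DAWs may write extra chunks, which is exactly why the header size is a user setting here. Mono, little-endian signed PCM is assumed; `scan_folder` and `read_raw_pcm` are hypothetical names.

```python
import os

def scan_folder(folder, seen):
    """Poll `folder` for .wav files that are new or have grown/changed since
    the last call. `seen` maps path -> (size, mtime_ns) and is updated in place."""
    changed = []
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".wav"):
            continue
        path = os.path.join(folder, name)
        st = os.stat(path)
        signature = (st.st_size, st.st_mtime_ns)
        if seen.get(path) != signature:
            seen[path] = signature
            changed.append(path)
    return changed

def read_raw_pcm(path, header_bytes=44, sample_width=3):
    """Read a possibly still-recording file as mono little-endian signed PCM,
    skipping a user-configured header size instead of trusting the header's
    length fields (which the DAW may not have written yet). sample_width is
    in bytes: 2 for 16-bit, 3 for 24-bit. Returns floats in [-1.0, 1.0)."""
    with open(path, "rb") as f:
        f.seek(header_bytes)
        data = f.read()
    usable = len(data) - len(data) % sample_width  # drop a partly written frame
    full_scale = float(1 << (8 * sample_width - 1))
    return [int.from_bytes(data[i:i + sample_width], "little", signed=True) / full_scale
            for i in range(0, usable, sample_width)]
```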

I adjust recording controls so that the very loudest notes I will sample peak out near 0dBFS, for optimal recording. I probably also verify that the quietest notes I'll play record well without too much background noise -- although I will probably be de-noising later. All of this I do the same way I would for any recording project. However, unlike my previous soundfont recordings, I will use the same record levels for all samples -- to level the playing field for the software.

OK, now the fun starts. I turn on the "record mode" of the assembler, which is a GUI that shows a graphic synopsis of what I have recorded -- so far, nothing except a few test notes, but let's ignore those. I start the DAW recording and leave it running for the rest of this session (or stop and restart whenever I want).

The first step is to establish the dynamic range equivalence across the keyboard (roughly, this would be adjustable afterwards). To do this, I play the loudest and softest notes at the bottom, middle, and top of the keyboard. I get to do this as many times as I like.

As each note is detected by the assembler (a moment after it's played and recorded by the DAW), it is displayed graphically in a relatively simple format: low notes at the left, high notes at the right, loud notes at the top, soft notes at the bottom. At this point, this graph is auto-ranging so that all notes fit in the graph, and the extremes are at the extremes. The axes are labeled by RMS level and musical note (or MIDI key, or whatever). Each note is rendered in a different color, and with a color assignment that ensures that no two samples that are adjacent on the graph get the same or similar colors. Of course, the color assignment algorithm takes into account common modes of color blindness -- this isn't difficult.
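The color rule above (no two adjacent samples share a color) is a small graph-coloring problem. A greedy Python sketch, assuming each sample sits on an integer grid cell (note, row) and "adjacent" means any of the 8 surrounding cells: scanning cells in sorted order means at most 4 neighbours are already colored, so a 7-color palette always suffices. The palette is the Okabe-Ito color-blind-safe set with black left out for the axes; the function name is hypothetical, and duplicate cells are simply merged, sidestepping the "same spot" problem mentioned next.

```python
# Okabe-Ito colour-blind-safe palette (black omitted, reserved for axes)
PALETTE = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
           "#0072B2", "#D55E00", "#CC79A7"]

def assign_colors(cells):
    """cells: iterable of (note, row) grid positions, one per sample.
    Greedy colouring in (note, row) order: each cell takes the first palette
    entry not used by an already-coloured cell among its 8 neighbours.
    In this order at most 4 neighbours are already coloured, so 7 colours suffice."""
    colors = {}
    for note, row in sorted(set(cells)):
        used = {colors.get((note + dn, row + dr))
                for dn in (-1, 0, 1) for dr in (-1, 0, 1)}
        colors[(note, row)] = next(c for c in PALETTE if c not in used)
    return colors
```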

Yes, there is a difficulty here when two samples would occupy the same spot in the grid. I have some ideas, but let's not get bogged down with that just yet.

The user can click on any occupied spot on the grid to play the corresponding sample, or right-click and omit that sample from subsequent display in the grid. (Yes, there's a way to get it back.)

After I've played the loudest and softest high, low, and middle notes, I hit a button to auto-calibrate the velocity/RMS curves. (Yes, there should be a choice of different best-fit methods: linear, exponential, etc., and with user choices like highest and lowest MIDI velocity to map to.) After that, the grid changes from displaying the vertical axis by RMS and instead does it by MIDI velocity.
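For the "linear" best-fit option, the auto-calibration step could be an ordinary least-squares line through the calibration hits, one line for the loudest notes and one for the softest. A Python sketch, assuming each hit is a (MIDI note, RMS dB) pair and at least two distinct notes were played; exponential or other fits would swap out `fit_line`. The names are hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares fit y = a + b*x over (x, y) pairs; returns (a, b).
    Needs at least two distinct x values."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx
    return my - b * mx, b

def calibrate(loud_hits, soft_hits):
    """loud_hits / soft_hits: (midi_note, rms_db) pairs from the calibration pass.
    Returns functions giving the expected loudest and softest dB at any note."""
    la, lb = fit_line(loud_hits)
    sa, sb = fit_line(soft_hits)
    return (lambda note: la + lb * note), (lambda note: sa + sb * note)
```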

Next comes the boring part: playing each key I plan to sample at each velocity I plan to sample. I play single notes, at any velocity and in any order, holding each as long as I see fit. For any snark, I can right-click and omit the sample from further consideration. Watching the display, I can see how uniformly I've covered the grid. I can choose to record 16 or more velocities in regions of the keyboard where it matters most, and only one or two in regions where it matters least. I can record closely spaced velocities where I know (a priori) that the timbre changes dramatically with velocity, and widely spaced where it doesn't.

There is no need for me to record carefully even "velocity layers". One note may have 4 velocities, and the next sampled note may have only 3, and they needn't correspond to velocities sampled for the other note.

Note that so far, the assembler has only served as a window to show me what notes I've recorded. No changes are made to wave files yet. Also, during this process I can stop the DAW at any time, grab a meal or some shut-eye, and return and start recording again (using the same recording gain, etc.)

Eventually, I decide I have enough samples and want to start assembling the soundfont. The next step is to actually build the keyboard map. While this may seem as though it's implicit from above, note that that graph only shows a spot for each sample. Building the keyboard map essentially means assigning a nearby sample to each of the blank spots in the grid. This is done using a relatively straightforward distance method, where the user gets to control the tradeoff in vertical distance (velocity) versus horizontal distance (pitch). Oh yes, and the amount of cross-fade desired (in general, or by keyboard note segments, or perhaps as a graphic function).
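The distance method for filling the blank spots can be sketched in Python as a nearest-neighbour search over the recorded samples. The squared-Euclidean metric and the `vel_weight` knob (how much one velocity step counts relative to one semitone) are assumptions standing in for the user-controlled tradeoff; crossfades are not handled, and ties go to the first sample in dictionary order.

```python
def build_keymap(samples, notes, velocities, vel_weight=0.25):
    """samples: dict sample_name -> (midi_note, velocity).
    For every (note, velocity) cell, pick the sample minimising
    (note distance)^2 + (vel_weight * velocity distance)^2.
    vel_weight trades vertical (velocity) against horizontal (pitch) distance."""
    keymap = {}
    for note in notes:
        for vel in velocities:
            keymap[(note, vel)] = min(
                samples,
                key=lambda name: (samples[name][0] - note) ** 2
                + (vel_weight * (samples[name][1] - vel)) ** 2,
            )
    return keymap
```

With `vel_weight=0.25`, a sample four velocity steps away counts the same as one a semitone away; raising it makes the map prefer transposing a sample over stretching it across velocities.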

Then I pull the plunger and an sfz file is created, which I can play using my favorite soundfont player (sfz, of course). Now, ideally, this would be bundled into the assembler for convenience and immediate handling of any parameter changes. But for now, let's stick to the simpler bottom-up approach.
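Turning a filled keyboard map into an .sfz file mostly means collapsing runs of identical cells into regions. A minimal Python sketch, assuming the map is a dict from (note, velocity) to sample name and that each sample is stored as `<name>.wav` (both assumptions). It uses the standard sfz opcodes `sample`, `lokey`, `hikey`, `lovel`, `hivel` and `pitch_keycenter`; merging neighbouring keys into wider key ranges and crossfade opcodes are left out.

```python
def keymap_to_sfz(keymap, samples):
    """keymap: (note, velocity) -> sample name, e.g. from a nearest-sample fill.
    samples: sample name -> (recorded_note, velocity).
    Emits one <region> per run of consecutive velocities that share a sample
    on a single key. Assumes each sample is stored as '<name>.wav'."""
    lines = []
    for note in sorted({n for n, _ in keymap}):
        vels = sorted(v for n, v in keymap if n == note)
        start = vels[0]
        for i, vel in enumerate(vels):
            name = keymap[(note, vel)]
            last = i + 1 == len(vels)
            run_ends = last or vels[i + 1] != vel + 1 or keymap[(note, vels[i + 1])] != name
            if run_ends:
                lines.append(
                    f"<region> sample={name}.wav lokey={note} hikey={note} "
                    f"lovel={start} hivel={vel} pitch_keycenter={samples[name][0]}"
                )
                if not last:
                    start = vels[i + 1]
    return "\n".join(lines) + "\n"
```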

I exercise the soundfont, playing a few tunes and checking velocity switch points. Or I play a MIDI file that plays a number of check exercises to help find problem spots (e.g., that plays each note from softest to loudest, listening for bad velocity switches). Based on these results, I go back into record mode and delete a few samples or add a few, or adjust calibrations & mapping parameters, and try again.

When I'm happy with the results, I pull a "finish" plunger that performs postprocessing on all samples, as desired:

De-noising
Normalizing each sample
Converting to final format (e.g., 16b/44.1kHz)
Dithering

Ideally we'd allow standard audio plugins for this stage so that any desired FX can be applied. This stage also does, automatically, all final sample chopping and trimming. (A nice feature to add later would be to show the start & stop points for any sample and allow the user to override them.)

Of course, we'd want the option to use the same sample set but with different configurations of keymap parameters. For example, to allow the same soundfont to have different touch curves.

Another option would be to automatically prune the sample set by some percentage of file space to allow smaller soundfont files. The first step would be to create simply another keymap omitting various samples, so I could compare the full version with the shrunken one, and later export the shrunken one as a smaller soundfont with the unused samples omitted. (Basically, in sfz format this would just mean copying files to a new folder along with creating the simplified .sfz file.)
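One possible pruning heuristic, sketched in Python: repeatedly drop the sample whose nearest remaining neighbour (on the same note/velocity distance used for mapping) is closest, on the theory that it is the most redundant. This is a hypothetical approach, not something proposed in the thread, and it prunes by sample count rather than by file space; weighting by each file's size would be the next refinement.

```python
def prune(samples, fraction, vel_weight=0.25):
    """Drop roughly `fraction` of the samples, each time removing the one
    whose nearest remaining neighbour is closest (the most redundant one).
    samples: name -> (midi_note, velocity). Returns the set of kept names."""
    def dist(a, b):
        (n1, v1), (n2, v2) = samples[a], samples[b]
        return (n1 - n2) ** 2 + (vel_weight * (v1 - v2)) ** 2

    kept = set(samples)
    for _ in range(int(len(samples) * fraction)):
        if len(kept) < 2:
            break
        # sorted() makes tie-breaking deterministic
        victim = min(sorted(kept),
                     key=lambda s: min(dist(s, o) for o in kept if o != s))
        kept.remove(victim)
    return kept
```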

Notice I've said nothing about normal parameter editing or looping. Auto looping would be great, if there's a good DSP coder with the smarts to do that! But good looping tools exist already.

Parameter editing, like for filters, etc., is another matter. While there are good parameter editors, most of them assume a layered format for velocities, and would be very tedious to apply to a soundfont where the keyboard map looks more like an aerial view of a jungle than that of a city. If you've really understood the implications of what I said above, you'll understand why this is so. So, clearly, the program will need a means to configure filter settings from a global or regional perspective, rather than specific settings applied to specific layers or samples. I'll need to give that considerably more thought.

learjeff · Post by **learjeff** » Sun Oct 02, 2005 5:29 pm

BTW, I posted a link to this thread in the soundfont forum, oriented more towards user input than developer input. Occasionally I'll update this thread with suggestions from there that I find particularly interesting. Here's the first, as I reinterpret it.

Pitch adjustment. For samples that aren't in pitch according to some user-selected pitch table, program the pitch fine tuning in the soundfont. Ideally, pitch tables would be user-editable/creatable, and we'd want standard as well as stretched pitch tables (in addition to any kind of pitch table anyone wants).
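The pitch-adjustment idea reduces to measuring each sample's deviation in cents from its target pitch and writing the opposite correction into sfz's `tune` opcode (fine tuning in cents). A Python sketch, assuming equal temperament from A4 plus an optional per-note stretch offset in cents; the stretch-table format (a dict of MIDI note to cents) and the function names are hypothetical.

```python
import math

def cents_off(measured_hz, midi_note, a4_hz=440.0, stretch_cents=None):
    """Deviation of a measured pitch from the target in cents (positive = sharp).
    The target is equal temperament from A4 = a4_hz, plus an optional per-note
    offset from a stretch table (dict midi_note -> cents)."""
    target = a4_hz * 2 ** ((midi_note - 69) / 12)
    if stretch_cents:
        target *= 2 ** (stretch_cents.get(midi_note, 0.0) / 1200)
    return 1200 * math.log2(measured_hz / target)

def tune_opcode(measured_hz, midi_note, **kw):
    """sfz `tune` value (whole cents) that pulls the sample back onto the target."""
    return f"tune={-round(cents_off(measured_hz, midi_note, **kw))}"
```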

NeoVXR · Post by **NeoVXR** » Mon Oct 03, 2005 3:53 am

creating a host for professional plugins is very interesting; the plugins could then also be programmed with automation and settings files. you might support particular parameter sets and mappings for denoisers, compressors, VCF-style plugs, etc.

another idea,
the whole thing is extremely data-related, I would recommend to build it entirely on a SQL database. after the first step it will be very fast and easy to implement more functions, and especially the management routines (undo, version control, import/export, renaming, sort and search).
a broad and useful area is parameter meta tables.
for another file format or another plugin you add a row in the parameter table. if the GUI is not ready, enter the values and go.
for (allegedly) fixed parameter sets you can have a flat table, but a granular table might be optimal. something like:
[catalog vectors (sort and store names)[..]] [parameter name intern] [parameter name extern] [data type/format (SQL "domain")] [math mapping factors] [description (!!!)]
this would be a structural definition. then you have the value sets, with foreign keys [catalog vectors] and values [parameter name intern][value]
you can think of it like a linearized matrix, and also create meta matrix operators, that transform one file or parameter format into another.
same with soundfont settings, like the myriad filter and velocity coefficients.
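A much-simplified Python/SQLite sketch of the parameter meta table described above: one table of parameter definitions (structure) and one of values keyed by parameter set. SQLite via Python's built-in `sqlite3` module is a swapped-in choice, the "catalog vectors" are collapsed into a single text column, and the `scale` column only reserves a place for the "math mapping factors" (it is not applied here). Table, column and function names are hypothetical; `cutoff` and `ampeg_release` are real sfz opcodes used as sample data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE parameter_def (
    id          INTEGER PRIMARY KEY,
    catalog     TEXT NOT NULL,   -- which format or plugin the parameter belongs to
    name_intern TEXT NOT NULL,   -- name used inside the assembler
    name_extern TEXT NOT NULL,   -- name written to the file or plugin
    data_type   TEXT NOT NULL,   -- stands in for an SQL domain
    scale       REAL NOT NULL DEFAULT 1.0,  -- mapping factor, unused in this sketch
    description TEXT,
    UNIQUE (catalog, name_intern)
);
CREATE TABLE parameter_value (
    set_id INTEGER NOT NULL,
    param  INTEGER NOT NULL REFERENCES parameter_def(id),
    value  TEXT,
    PRIMARY KEY (set_id, param)
);
"""

def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db

def define(db, catalog, intern, extern, data_type, scale=1.0, description=""):
    """Supporting a new format or plugin parameter is one INSERT."""
    db.execute(
        "INSERT INTO parameter_def (catalog, name_intern, name_extern, data_type, scale, description)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (catalog, intern, extern, data_type, scale, description),
    )

def set_value(db, set_id, catalog, intern, value):
    """Store (or overwrite) one parameter value in parameter set `set_id`."""
    (param_id,) = db.execute(
        "SELECT id FROM parameter_def WHERE catalog = ? AND name_intern = ?",
        (catalog, intern),
    ).fetchone()
    db.execute("INSERT OR REPLACE INTO parameter_value VALUES (?, ?, ?)",
               (set_id, param_id, str(value)))

def export_values(db, set_id, catalog):
    """External name -> value for one parameter set, ready to write out."""
    return dict(db.execute(
        "SELECT d.name_extern, v.value FROM parameter_value v"
        " JOIN parameter_def d ON d.id = v.param"
        " WHERE v.set_id = ? AND d.catalog = ?",
        (set_id, catalog),
    ))
```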

this would make all other file handling obsolete, except for the input and output WAV files and the soundfont imports and results.
it would make for a really orderly project, once the data model and identification have been thought through.

best wishes for success!!

learjeff · Post by **learjeff** » Mon Oct 03, 2005 4:10 am

Good idea. I'm pretty ignorant about databases other than the most basic facts (most of which I've forgotten anyway). I'll keep this in mind and if there's someone who knows how to set it up when the time comes that'd be great.

learjeff · Post by **learjeff** » Mon Oct 03, 2005 4:14 am

I should add that I'm a big fan of using ascii files for saving all information, a practice that has served me well over the decades. This way, it's very easy to adjust the settings without a GUI for the purpose. Another virtue of this method is that, when done properly, it's very forward/backward compatible between versions. (You have to use a grammar where you can ignore a parameter and all its arguments if the parameter is unknown.)
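The "ignore unknown parameters" grammar rule can be shown with a tiny line-based reader in Python. The format (one `keyword arg...` per line, `#` comments) and the keyword table are hypothetical; the point is only that an unknown keyword is skipped together with all its arguments, so files written by a newer version still load in an older one.

```python
KNOWN = {
    # keyword -> converters for its positional arguments (hypothetical settings)
    "vel_weight": [float],
    "vel_range": [int, int],
    "sample_dir": [str],
}

def parse_settings(text, known=KNOWN):
    """One 'keyword arg...' per line; '#' starts a comment.
    Unknown keywords are skipped whole and reported back to the caller."""
    settings, skipped = {}, []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        keyword, *args = line.split()
        if keyword not in known:
            skipped.append(keyword)
            continue
        settings[keyword] = [conv(a) for conv, a in zip(known[keyword], args)]
    return settings, skipped
```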

NeoVXR · Post by **NeoVXR** » Mon Oct 03, 2005 4:52 am

ascii would be just another import/export format...

the reader/writer should of course not be the generic system tool but do what you are proposing - be very error-safe and tolerant.

duncanparsons · Post by **duncanparsons** » Mon Oct 03, 2005 11:22 am

I'd love to help, but since I'm Delphi-based, that might pose a portability problem.

Setting up a sourceforge site might be a good idea for tracking.

Consider XML rather than a plain ASCII file, since it allows for more flexibility and readability in project files. Note, I'm not a big XML fan (ie: not a bandwagon freeloader), and I give this advice as I see it might help wider acceptance/adaptability of plugins.

Other than that, this looks exceptionally cool. I've been interested in your pitch detection algos, but never asked.

DSP

learjeff · Post by **learjeff** » Mon Oct 03, 2005 1:20 pm

Yup, xml has all the properties my home-grown ascii formats had (and many more), plus the benefits of standardization.
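The same forward/backward-compatibility property carries over to an XML project file if the reader only looks for the elements and attributes it knows. A Python sketch using the standard library's `xml.etree.ElementTree`; the element and attribute names (`project`, `settings`, `param`, `samples`, `sample`) are a hypothetical layout, not an agreed format.

```python
import xml.etree.ElementTree as ET

def save_project(samples, settings):
    """samples: name -> (midi_note, velocity); settings: name -> value."""
    root = ET.Element("project", version="1")
    params = ET.SubElement(root, "settings")
    for key, value in settings.items():
        ET.SubElement(params, "param", name=key).text = str(value)
    sample_list = ET.SubElement(root, "samples")
    for name, (note, vel) in samples.items():
        ET.SubElement(sample_list, "sample", name=name, note=str(note), velocity=str(vel))
    return ET.tostring(root, encoding="unicode")

def load_samples(xml_text):
    """Pick out only the <sample> elements and attributes this version knows;
    unknown elements and attributes are simply never looked at."""
    root = ET.fromstring(xml_text)
    return {e.get("name"): (int(e.get("note")), int(e.get("velocity")))
            for e in root.iter("sample")}
```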

duncanparsons · Post by **duncanparsons** » Tue Oct 04, 2005 8:50 am

this appears to be quite a good lightweight parser/serialiser.

DSP

new project: soundfont assembler