Pay someone to build app to split voice samples into pitch and noise components


Post

If you can run Soundhack on your system, you might want to look at the Spectral Extraction system. This was designed to do exactly what you are trying to do (i.e., split samples into pitch and noise components).

As SmashedTransistors points out, many real-world sounds contain elements that don't really fit into either the "noise" or the "pitch" category. Most voice sounds have a significant number of "noisy pitch" elements. The Spectral Extraction function allows you to set a threshold, but odds are good that you will still have some perceived pitch in the noise waveforms, OR you will have some noisiness in the "pitched" part of the waveform. Still, Spectral Extraction is a useful tool for modifying sounds. I used it in the past to extract the periodic parts of engine sounds from the noise part, back when I worked on game sounds.

Sean Costello

Post

To be more clear, the sms-tools package doesn't necessarily require programming. It has existing GUIs for the models (navigate to sms-tools-master/software/models_interface/, then execute "python models_GUI.py"). You'll probably want the HPR (harmonic-plus-residual) model. It helps to go through some of the course, at least the videos in week 7, to have an idea of the best window types and the best FFT and window sizes, but you can wing it if you have a basic understanding. You can set the tracking range and minimum track lengths to optimize which frequencies are considered pitch and which are not (the fact that this is a harmonic model already gives you a head start there, but if you have a sound other than voice that has a lot of inharmonic content, then the sinusoidal model would be better).

The residual is then that captured harmonic component subtracted from the original. You can review the tracked frequency components graphically (zoomable, which is very helpful), and save the results to files.
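To make the "harmonic component subtracted from the original" idea concrete, here is a toy sketch in plain Python. It is not the sms-tools HPR algorithm (which tracks harmonics frame by frame); it assumes the fundamental is already known and constant, and that the buffer spans a whole number of periods, so each harmonic can be estimated with a simple projection onto a sine and a cosine. The function name and the test signal are made up for illustration.

```python
import math
import random


def harmonic_plus_residual(x, fs, f0, num_harmonics):
    """Split x into (harmonic, residual) by projecting onto sinusoids at k*f0.

    Assumption: len(x) covers a whole number of periods of f0, so the sine
    and cosine basis functions are orthogonal and an inner product suffices.
    """
    n = len(x)
    harmonic = [0.0] * n
    for k in range(1, num_harmonics + 1):
        w = 2.0 * math.pi * k * f0 / fs
        cos_k = [math.cos(w * i) for i in range(n)]
        sin_k = [math.sin(w * i) for i in range(n)]
        # Amplitudes of the cosine and sine parts of harmonic k.
        a = 2.0 / n * sum(xi * c for xi, c in zip(x, cos_k))
        b = 2.0 / n * sum(xi * s for xi, s in zip(x, sin_k))
        for i in range(n):
            harmonic[i] += a * cos_k[i] + b * sin_k[i]
    # The residual is simply the original minus the captured harmonic part.
    residual = [xi - hi for xi, hi in zip(x, harmonic)]
    return harmonic, residual


def rms(v):
    return math.sqrt(sum(s * s for s in v) / len(v))


# Hypothetical test signal: five harmonics of 100 Hz plus a little noise.
FS = 44100
F0 = 100.0
N = 4410  # exactly 10 periods of F0 at FS
rng = random.Random(0)
tone = [
    sum(math.sin(2.0 * math.pi * k * F0 * i / FS) / k for k in range(1, 6))
    for i in range(N)
]
noise = [rng.gauss(0.0, 0.05) for _ in range(N)]
mix = [t + z for t, z in zip(tone, noise)]

harmonic, residual = harmonic_plus_residual(mix, FS, F0, 5)
print("noise RMS:   ", rms(noise))
print("residual RMS:", rms(residual))
```

In this idealized case the residual comes out almost identical to the added noise. Real voice samples drift in pitch and have "noisy pitch" content, which is why the actual tools work on short overlapping frames and expose tracking parameters.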

Soundhack's capabilities look pretty crude by comparison, and more awkward. Maybe Sonic Visualiser has plug-ins to do this sort of thing as well, and is more modern. But personally I would go with the sms-tools, because I know exactly what it's doing, and the audible results are very good.
My audio DSP blog: earlevel.com

Post

Yeah, the HPR (harmonic plus residual) model seems to be the ticket for me. That’s what he used when he did the organ example, and that example was *exactly* what I want.

Very addicting course, by the way. I keep watching “just one more” video. :D

This virtual machine stuff is new to me. Doesn’t look too complicated, although do you recommend any particular one? VirtualBox? VMWare? I imagine any would work, but considering how much heavy lifting I plan to do with this (I’ll be processing thousands and thousands of samples), I’d like to start with the preferred platform.

Thank you very much for pointing me in this direction, Nigel. This is a huge help.

Sean, depending on my time, I might also try out the Soundhack Spectral Extraction app. It does seem less full-featured than sms-tools, but the methodology for determining residual noise is different, and in fact, seems aimed pretty specifically at straight notes with harmonic content (exactly my situation), so I wonder if it might be quicker.

Post

Mike Greene wrote:This virtual machine stuff is new to me. Doesn’t look too complicated, although do you recommend any particular one? VirtualBox? VMWare? I imagine any would work, but considering how much heavy lifting I plan to do with this (I’ll be processing thousands and thousands of samples), I’d like to start with the preferred platform.
You're welcome, Mike. Glad to have a little experience that seemed to match what you were looking for. I really started the course as "I'd like to try a MOOC, and this one looks interesting", and it ended up being a good experience.

I have a lot of history with Parallels, going way back, but just for running Windows. But I work for EMC now, and some of the virtual machines I work with there (Isilon clusters) are configured for VMWare (which is owned by EMC; that means VMWare Fusion is free for me, full disclosure), so I started using it a couple of years ago. I still run my Parallels WinXP VM from time to time, but I've also opened that VM in VMWare Fusion and it runs fine there too. Getting Ubuntu up on VMWare was a snap, even though I'd never done it before.

I've pored over head-to-head reviews of Parallels and VMWare in the past, and again when major upgrades come along. They are pretty close; one might have small advantages over the other in certain areas (for gamers, for instance, Parallels always seems to have the edge), but both seem to be amazingly good and total bargains. If I were doing it for the first time, knowing what I know, I guess I'd have to go with VMWare. Part of that decision is knowing that VMWare dominates the virtualization market in the big picture.

I don't want to sound like a commercial—either will do the job easily, both are cheap. Here's a comparison.
My audio DSP blog: earlevel.com

Post

Mike, are you mac or pc based?

Ditto on VMWare Fusion being easy and stable on Mac. Have used it for years on Mac. The only time I really notice obviously slow performance is with real-time audio. I wouldn't expect anything much more ambitious than stereo record or playback in a Mac Windows VM. Have done a bit of file-based audio work in Fusion over the years, converting gigabytes of audio files and such. It works surprisingly fast on file-based processing.

It is falling-off-a-log easy to download and run Ubuntu VMs. But I don't know Linux and rarely use it. One of these days.

Would assume that vmware runs just as good and easy on windows pc, but never tried that. Nowadays I'm windows-centric, so spend most of my time running winders even on macs, never had a reason to run vmware on a pc.

Post

I’m a Mac boy. I installed VMWare Fusion and it went pretty easily. Ubuntu is now running and I can do the basics now, run Firefox, etc.

Installing sms-tools is proving to be a challenge, but admittedly, I’m not a guy who normally spends a lot of time on Terminal, so it may take some time. I got Python installed, using Terminal, but now I’m stuck on this step:
. . . after downloading the whole package, you need to compile some C functions. For that you should go to the directory software/models/utilFunctions_C and type:

$ python compileModule.py build_ext --inplace
When they say, “go to the directory . . . “ do they mean navigate to the folder in Finder (or whatever Ubuntu calls their version of Finder)? If so, then where do I type that $ python compileModule.py build_ext --inplace command? I typed it into Terminal, but it keeps telling me:
python: can’t open file compileModule.py (ERRNO 2) No such file or directory
I’ve tried dragging the folder from Finder onto the Terminal (which puts the address there,) then typing the $ python compileModule.py build_ext --inplace command, but that gives me errors, too. Is there some obvious mistake I’m making?

Post

Hi Mike

I'm very ignorant of unix, but in case it MIGHT be the right answer and be a "quick answer", try a "change directory" ( cd ) command to that directory. It is nice that dragging file names to the terminal copies the text. The cd command is explained here:

http://en.wikipedia.org/wiki/Cd_(command)

Post

Mike, a few tips:

Assuming you put sms-tools in your user directory,

cd ~/sms-tools-master/software/models/utilFunctions_C
python compileModule.py build_ext --inplace
Note that in the terminal, I don't think you can use Control-c/v etc, but you can right-click for copy/paste. You can also go to Files (file browser), navigate where you want, then right-click the folder tabs above, or the files/folders themselves, and copy the path to paste into terminal—there is a "Paste Filenames" so that you don't need to edit the string.

Handy commands:

pwd # print working directory (see the full path of where you are—you might want to copy it)
ls -l # and other options, you probably know this already
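The "No such file or directory" error comes from the fact that a bare file name like compileModule.py is looked up relative to the current working directory, which is what cd changes. Here is a small Python sketch of that behavior; the folder layout is a throwaway stand-in created in a temp directory, not the real sms-tools tree.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical stand-in for the folder that holds compileModule.py.
    target_dir = os.path.join(root, "utilFunctions_C")
    os.makedirs(target_dir)
    with open(os.path.join(target_dir, "compileModule.py"), "w") as f:
        f.write("# placeholder file\n")

    start_dir = os.getcwd()           # same idea as pwd

    os.chdir(root)                    # wrong directory
    found_before = os.path.exists("compileModule.py")

    os.chdir(target_dir)              # same idea as cd .../utilFunctions_C
    found_after = os.path.exists("compileModule.py")

    os.chdir(start_dir)               # step back out before cleanup

print("found from the wrong directory:", found_before)
print("found after changing directory:", found_after)
```

The file is only found once the working directory is the folder that contains it, which is exactly why the python command has to be run after the cd.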
My audio DSP blog: earlevel.com

Post

Nope, I didn’t know about “cd,” and I didn’t know about “ls” either. But I do now! :D And no, sms-tools was not in the user directory. (Learning Ubuntu, Python and Terminal will take some time. But I’m getting there!)

Made the corrections, used these newfangled commands (and a bunch of others), and I am now in business! I went ahead and watched all the videos from weeks one and two (like I said before, it’s addicting) and in one of the segments, he actually shows the installation process. I should have watched from the beginning in the first place. Not just for the concepts, but also because watching someone use Terminal and Python as he explains is really educational. Best way to learn IMO.

I haven’t gotten too deep yet, but I’ve played enough with this app collection to see that this is indeed what I was looking for. (Seriously, thank you, Nigel.) I can separate noise from tone (technically, a harmonic component and a residual component) so now I can proceed and see if my original phase-locking idea will pay off or not. (My theory is that you don’t want to lock phase unless you’ve stripped the noise/residual first. Otherwise there are too many ugly artifacts. Phase locking is ugly business.)

This is a ton of fun to play with. Finicky about what samples it will accept, though. Apparently only 16 bit 44.1k. Okay for now, but at some point I’ll need to tweak the code. I’m not sure how hard that will be.

These videos are great, by the way. I’m going to watch the whole course. Here’s the link again for anyone interested in breaking down and reconstructing sound. Click the “Preview Lectures” button at the top to access the videos:
https://www.coursera.org/course/audio

Post

Mike Greene wrote: Click the “Preview Lectures” button at the top to access the videos:
https://www.coursera.org/course/audio
Wellll... there goes my 'weekend'. 8)

:tu:
I'm not a musician, but I've designed sounds that others use to make music. http://soundcloud.com/obsidiananvil

Post

Mike Greene wrote:Finicky about what samples it will accept, though. Apparently only 16 bit 44.1k. Okay for now, but at some point I’ll need to tweak the code. I’m not sure how hard that will be.
Oh, I forgot about that, for the GUI.

As far as unix commands go, the GUI stuff in Ubuntu cuts down quite a bit on what you need to know, so you probably know most of what you'll need now. For instance, if you need to rename a file and are new to unix, you'd probably have to do a search online to find out that you use the mv (move) command. But having a file browser bypasses most of that sort of silliness.

Helpful bonus commands:

cd .. #back up one directory level
cd ../../models # back up two directory levels, and move forward into the models directory
cd / # go to the root directory
cd ~ # you've probably figured out this is *your* home directory, the same as /home/<you>

The course was an opportunity for me to learn Python, which I had been curious about, but had no motivation to learn. Nice to type single function calls to iterate through arrays, a huge win when executing code interactively in ipython.
My audio DSP blog: earlevel.com

Post

We implemented an algo to divide between pitched overtones and noise in Chipspeech and it works well for our needs (voice).

Post

earlevel wrote:The course was an opportunity for me to learn Python, which I had been curious about, but had no motivation to learn.
I hear you. I'm now through "Week 4," and already feel like I could do some things with Python. Very fun, although I'm not sure if learning another language is the best use of my time. (I really should be finishing up the Realivox Men. Let's see, which would I rather do . . . edit thousands of tedious samples, or learn cool stuff with a new computer language . . . :D )

It's an interesting conundrum. (Or paradox, or whatever the word is.) I could forget the Python education and just pay someone to tweak the sms-tools app to do what I need. (Accept 24 bit files, for starters.) But until I learn a little bit about it myself, I can't even explain exactly what I need done. Or be sure what's easy and what's not. Or even what's possible.

The danger, of course, being that I may spend a bunch of time learning Python, but not get quite good enough to do the tweaks myself. And then find out I could have hired someone to do the tweaks and find out it would have only taken him an hour. :D

Post

Yes, good point—I haven't looked at it to see, but I'd hope that switching to 24-bit samples is just a matter of changing a single variable somewhere in the GUI model code. At worst, it's just the wav input/output part of the code, at least for bit depth, since the work is in floating point.

You mentioned the "men" product you're working on—Checked out "Ladies"...wow, very nice job—great product video too!
My audio DSP blog: earlevel.com

Post

As far as reading in 24-bit files, it looks like the problem is with the wavfile.read function in scipy. It looks like it intends to read in whatever format the file is in, as data of that type (int16, etc.), but it fails for 24-bit. The sms-tools function wavread would accept int16/32/64 and float32/64 if the scipy function didn't fail. The sms-tools function wavwrite forces a 16-bit write, but that could be changed to 24-bit by modifying the two lines with "16"...doh, I think I just guessed the problem with scipy. It's a scientific package, and not necessarily an audio package. Without looking, I'll bet (no money) that it only understands sample formats with matching computer data types. In other words, 16-bit int, 32-bit int...oops, no such thing as an "int24" in Python, so the code doesn't consider its existence in wav files. It should read 24-bit data into int32. Well, it wouldn't be hard to write a proper one, if some code isn't already sitting around some place (or use a C function).

The scipy code has no problem with other sample rates, but the sms-tools utilFunctions code, wavread, checks and only accepts 44.1 kHz, because the code is configured for it.
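As a rough idea of what "a proper one" could look like, here is a minimal sketch that reads 16- or 24-bit PCM with Python's standard wave module, unpacking each 3-byte sample by hand and scaling to floats. This is not the sms-tools wavread and not a patch to scipy; read_pcm_wav is a made-up name, and it assumes plain little-endian PCM files. It also does no sample rate check, so the 44.1 kHz restriction would still need handling on the sms-tools side.

```python
import os
import tempfile
import wave


def read_pcm_wav(path):
    """Return (sample_rate, channels, samples) for a 16- or 24-bit PCM file.

    samples is a flat, channel-interleaved list of floats in [-1.0, 1.0).
    """
    with wave.open(path, "rb") as wf:
        width = wf.getsampwidth()          # bytes per sample: 2 or 3 here
        if width not in (2, 3):
            raise ValueError("unsupported sample width: %d bytes" % width)
        rate = wf.getframerate()
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    scale = float(1 << (8 * width - 1))    # 32768 for 16-bit, 8388608 for 24-bit
    samples = [
        int.from_bytes(raw[i:i + width], "little", signed=True) / scale
        for i in range(0, len(raw), width)
    ]
    return rate, channels, samples


# Round trip: write a tiny 24-bit, 48 kHz mono file, then read it back.
values = [0, 1, -1, 8388607, -8388608, 4194304]
path = os.path.join(tempfile.mkdtemp(), "test24.wav")
with wave.open(path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(3)
    wf.setframerate(48000)
    wf.writeframes(b"".join(v.to_bytes(3, "little", signed=True) for v in values))

rate, channels, samples = read_pcm_wav(path)
print(rate, channels, samples)
```

Since the processing is done in floating point anyway, converting to floats at the file boundary sidesteps the missing "int24" type entirely.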
My audio DSP blog: earlevel.com
