Photosounder

pshute · Post by **pshute** » Tue Jan 01, 2013 2:30 am

I have loaded a wav file into PhotoSounder 1.8.3, and it displays a spectrogram, but I'd like to view the spectrogram with a linear frequency scale.

I turn the knob from the default 2 down to 1, and it now says "Linear". The frequency scale on the vertical axis changes to a linear one, but the spectrogram still looks the same. Only the waveform at the top changes.

Am I misunderstanding how this should work?

A_SN · Post by **A_SN** » Thu Jan 24, 2013 10:08 pm

That knob only changes the way the spectrogram is synthesised, not the way it is analysed.

There is one way to do what you want and it's not very pretty, but it works. You can make a text file that only contains this line:

Analysis frequency scale 1.0

Save it as, let's say, "linear_analysis.pha" (the important part is that it has a .pha extension, not .txt). Then open Photosounder, open linear_analysis.pha, and only then load the sound file; that should do it (I just tested it here and it works). You'd have to load this script file every time before you load a sound, if that's the way you want it. Remember, however, that Photosounder's editing tools aren't really designed to work well with anything but the default logarithmic scale.
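
For anyone who would rather generate the file than fiddle with a text editor, here is a minimal sketch in Python. The file name and the parameter line come from the post above; the helper `write_pha` is hypothetical, and the choice of LF line endings is an assumption, since the thread does not say which ending Photosounder expects.

```python
from pathlib import Path

def write_pha(path, lines, newline="\n"):
    """Write a plain-text script file (hypothetical helper, not part of Photosounder).

    newline is an assumption: pass "\r\n" if your version expects Windows endings.
    """
    text = newline.join(lines) + newline
    # newline="" stops Python from translating the line endings we chose
    with open(path, "w", encoding="ascii", newline="") as f:
        f.write(text)
    return Path(path)

if __name__ == "__main__":
    # The single parameter line quoted in the post above
    write_pha("linear_analysis.pha", ["Analysis frequency scale 1.0"])
```

The point of writing the bytes directly is that the result is guaranteed to be plain text with a .pha extension, which is the only requirement stated above.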

If you want to change more parameters than just this one that are not in the config.txt file, have a look at this scripting reference: http://photosounder.com/PHA_spec_v1.1.pdf

pshute · Post by **pshute** » Thu Jan 24, 2013 10:44 pm

Thanks, that's what I suspected. It's becoming obvious that this program is heavily oriented towards synthesis, whereas I only want to look at the spectrograms of existing recordings.

I became interested in this program because I read somewhere that you calculate the spectrogram a special way. I can't find the reference to it now, but I got the impression that you might be varying the FFT window size with frequency to make the image sharper for a greater frequency range than normal.

Is that the case, or have I got it all wrong?

A_SN · Post by **A_SN** » Thu Jan 24, 2013 10:51 pm

Yes, that sounds about right, although I have doubts about the very last thing you said: it's still a tradeoff, it doesn't make everything sharper overall. By default the frequency resolution increases in the lower frequencies to the detriment of time resolution. However, I don't see the point of this if you use a linear frequency scale for analysis. The lower frequencies are already quite hard to see in linear frequency, so I don't see how having more resolution there would help. That's the whole advantage of a logarithmic scale: you get more space for the lower frequencies, and with the varying window size you get more resolution to fill that extra space.
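
To put rough numbers on that tradeoff, here is a small sketch in Python. It is not Photosounder's actual analysis: the rule "window spans a fixed number of cycles" and the value `cycles=16` are assumptions chosen only to show how a longer window buys frequency resolution at the cost of time resolution.

```python
def window_tradeoff(sample_rate, window_len):
    """Return (frequency resolution in Hz, time span in seconds) of one analysis window."""
    return sample_rate / window_len, window_len / sample_rate

def constant_q_window(sample_rate, freq_hz, cycles=16):
    """Window length in samples spanning `cycles` periods of freq_hz (assumed rule)."""
    return max(1, round(cycles * sample_rate / freq_hz))

if __name__ == "__main__":
    fs = 44100
    for f in (55, 440, 3520):
        n = constant_q_window(fs, f)
        df, dt = window_tradeoff(fs, n)
        print(f"{f:5d} Hz: window {n:6d} samples, "
              f"~{df:8.2f} Hz resolution, ~{dt * 1000:8.2f} ms long")
```

Whatever window length is picked, the product of the two figures stays at 1, which is why it remains a tradeoff rather than making everything sharper.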

Jon_Nissenbaum · Post by **Jon_Nissenbaum** » Fri Feb 13, 2015 9:37 pm

I would like to change the spectrogram display to linear frequency scale without distorting the analysis. Using a linear frequency scale is not common for music, but it is the norm for speech analysis.

Specifically, I would like to do the following:
(1) import a speech file, and **display** only the frequencies from 0 Hz to 6 kHz using a linear frequency scale;
(2) create a new layer where I trace the vowel formants (and maybe some other features) in the spectrogram;
(3) synthesize the new layer.

Is there a way to do this?
Thanks,
Jon

A_SN · Post by **A_SN** » Fri Feb 13, 2015 10:37 pm

Yes, see the first reply in this thread, it only takes creating a file with that one line and loading it before loading your sound.

However you have to wonder why you want to do this. The way it looks to me, analysing speech and animal calls through spectrograms is so old (much older than using digital computers for the task) that weird choices have become habits, and they're quite questionable as they fail to properly represent all characteristics of the sound. The linear frequency scale squeezes everything on the low end together and misrepresents the low end as being much stronger than it is. The chosen greyscale mapping often makes it difficult to see anything above a certain threshold of intensity, as everything turns pitch black. The frequency resolution is way too low, so a pure sinusoid might be shown as spreading over hundreds of Hertz (you might say it's desirable to fuse overtones together so as to better show formants, but it's still overdone and removes details, plus it overly emphasises the beating caused by the harmonics, which doesn't really add much of value). And because of the unwieldiness of the linear frequency scale, not only can't you see anything in the lower 500 Hz, but you often have to crop above a certain frequency (in your case 6 kHz) because otherwise the rest of the spectrum takes too much space, when it might still contain relevant data (for instance I just said 'ssssssssss' into a microphone and most of the sibilant was between 4.6 kHz and 13 kHz, with the bulk centred around 6.8 kHz). Finally, just as the intensity of the low end is overemphasised, the high end is made to look quieter than it really is.
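
The squeezing of the low end can be checked with a little arithmetic. The sketch below, in Python, compares how much of the vertical axis a band occupies on a linear versus a logarithmic scale. The 20 Hz to 20 kHz display range is an assumption picked for illustration; the 500 Hz and 4.6 to 13 kHz figures are the ones mentioned above.

```python
import math

def linear_fraction(lo, hi, f_min, f_max):
    """Share of a linear frequency axis [f_min, f_max] taken by the band [lo, hi]."""
    return (hi - lo) / (f_max - f_min)

def log_fraction(lo, hi, f_min, f_max):
    """Share of a logarithmic frequency axis [f_min, f_max] taken by the band [lo, hi]."""
    return math.log(hi / lo) / math.log(f_max / f_min)

if __name__ == "__main__":
    f_min, f_max = 20.0, 20000.0  # assumed display range
    for name, lo, hi in (("below 500 Hz", 20.0, 500.0),
                         ("sibilant band", 4600.0, 13000.0)):
        lin = linear_fraction(lo, hi, f_min, f_max)
        log = log_fraction(lo, hi, f_min, f_max)
        print(f"{name:14s} linear {lin:6.1%}   log {log:6.1%}")
```

Under these assumptions the band below 500 Hz gets only a sliver of a linear axis but close to half of a logarithmic one, while the sibilant band shrinks correspondingly.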

So that's why I think you should consider whether or not you really want to do that. I think Photosounder with its default settings gives a representation of speech that is much more accurate and gives you a very clear view of the pitch in voiced speech, and as for fusing overtones together you can use a script like this one http://photosounder.com/scripts/Formant ... ration.pha which separates the original sound into a de-formanted layer for overtones and a layer with the isolated formants (it does not separate noisy sounds like fricatives and sibilants from voiced sounds though). Doing that would give you a better basis to start tracing the formants on a new layer.

Jon_Nissenbaum · Post by **Jon_Nissenbaum** » Sat Feb 14, 2015 12:14 am

Thanks so much for your extremely quick and detailed reply. Much appreciated.
It may well be that, as you say, the tendency to analyze speech using linear scale spectrograms reflects a failure to break old habits rather than a sensible decision. Nevertheless, for better or worse this is the way it is done in the technical literature in linguistic phonetics. And for certain purposes I think it makes sense to disregard the higher frequencies, despite the fact that they often contain a lot of information. What I am interested in doing is creating "Sine Wave speech", which is like a caricature of natural speech. It is not intended to sound natural, and is used (among other things) for psycholinguistic experimentation. Interestingly, when sinewave speech was first developed at Haskins Laboratories, they used a technique that does something very similar to your software; they drew the formants by hand and synthesized the sound using sinusoids to play back the formants. The resulting sound is more like a set of overlapping whistles than speech, yet hearers are able to perceive the whistles as an abstract speech token.
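
To make the idea concrete, here is a rough sketch in Python of summing one sinusoid per formant track. It is neither the Haskins procedure nor what Photosounder does internally: the function `synth_tracks`, the straight-line glides, and the three example tracks are all invented for illustration.

```python
import math

def synth_tracks(tracks, duration_s, sample_rate=16000):
    """Sum one gliding sinusoid per formant track.

    tracks: list of (start_hz, end_hz, amplitude); the linear glide is an
    illustrative stand-in for a hand-drawn formant contour.
    """
    n = int(duration_s * sample_rate)
    out = [0.0] * n
    for start_hz, end_hz, amp in tracks:
        phase = 0.0
        for i in range(n):
            # instantaneous frequency, interpolated linearly over the clip
            freq = start_hz + (end_hz - start_hz) * i / max(1, n - 1)
            out[i] += amp * math.sin(phase)
            # accumulate phase so the glide stays continuous (no clicks)
            phase += 2.0 * math.pi * freq / sample_rate
    return out

if __name__ == "__main__":
    # Three made-up formant glides, not measured from any real utterance
    samples = synth_tracks([(300, 700, 0.5), (2200, 1200, 0.3), (3000, 2600, 0.2)], 0.5)
    print(len(samples), "samples, peak", max(abs(s) for s in samples))
```
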
In any case, I want to try using your suggestion. But I have to confess that I'm ignorant about the first step, namely how do I use the script? I tried saving both the simple "linear_analysis" and also your formant-separator scripts, and it did nothing. I saved them as text files and replaced the .rtf extension with .pha -- but nothing happened when I attempted to open either of those files first before opening my sound file. Sorry if this is an idiotic question, but I'd be grateful if you could let me know what I am supposed to do.

Best,
Jon

A_SN · Post by **A_SN** » Sat Feb 14, 2015 12:40 am

Alright, I've put the script with the others here: http://photosounder.com/scripts.php (it's the second-to-last script). All you have to do is open it like you normally open a file, then open your sound. You have to open it every time before you load a sound if you want the linear scale.

The way you did it might not have worked because the .rtf format isn't the same as plain text: you get a bunch of markup like "{\rtf1\ansi\deff3\adeflang1025", which is nonsense to Photosounder.
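
A quick way to tell whether a renamed file is still RTF inside is to look at its first bytes. This is a minimal sketch in Python; the helper names are hypothetical, and it only checks for the `{\rtf` signature shown above, nothing more.

```python
def looks_like_rtf(data: bytes) -> bool:
    """True if the bytes start with the RTF signature '{\\rtf'."""
    return data.lstrip().startswith(b"{\\rtf")

def check_script(path):
    """Report whether a file is RTF in disguise (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(64)
    return "RTF markup" if looks_like_rtf(head) else "no RTF signature"

if __name__ == "__main__":
    print(looks_like_rtf(b"{\\rtf1\\ansi\\deff3\\adeflang1025"))
    print(looks_like_rtf(b"Analysis frequency scale 1.0\n"))
```

Renaming the extension does not change the contents, so a file saved as .rtf and renamed to .pha will still be flagged.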

Jon_Nissenbaum · Post by **Jon_Nissenbaum** » Sat Feb 14, 2015 1:12 am

Thanks again for the extremely fast reply.
I'm afraid to report that it is still not working. First, the linear analysis script (which I have saved first as a .txt file and then I replaced the file extension with .pha) does not load when I try to open it as a regular file. Nothing happens.

But something different happens when I try to load the formant_separation.pha script: Photosounder crashes. I'm using the demo version -- could that be the cause? Platform is MacOS X (Yosemite).
Any thoughts?

Thanks again,
Jon

A_SN · Post by **A_SN** » Sat Feb 14, 2015 1:47 am

Try right-clicking on the link and do "Save Link As...", that should preserve the extension. When you load the script it doesn't visibly do anything at all, but it changes a parameter that is preserved when you load a sound file after it.

Did you try to load the Formant Separation script without a sound loaded? It will crash if you have no sound loaded. But wait, I just realised the script doesn't even work on Mac because of Windows line endings.

Okay I've just updated all the scripts, now they should work on Windows and Mac. Can't believe no one in 4 years told me the scripts didn't work on Mac...
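
For older copies of the scripts, the line endings can be inspected and normalised by hand. The sketch below is in Python; converting to LF is an assumption, since the thread does not say which ending the updated scripts use, and the function names are made up.

```python
def count_endings(data: bytes) -> dict:
    """Count CRLF (Windows), bare LF (Unix/macOS) and bare CR (classic Mac) endings."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,
        "CR": data.count(b"\r") - crlf,
    }

def to_lf(data: bytes) -> bytes:
    """Convert CRLF and bare CR endings to LF (assumed target format)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

if __name__ == "__main__":
    windows_text = b"Analysis frequency scale 1.0\r\n"
    print(count_endings(windows_text))
    print(count_endings(to_lf(windows_text)))
```

Working on bytes rather than text avoids Python silently translating the endings while reading the file.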

Jon_Nissenbaum · Post by **Jon_Nissenbaum** » Sat Feb 14, 2015 2:54 am

Thank you so much for doing this!

Now, each of the scripts individually works like a charm. For some reason, however, the application crashes when I load formant_separation.pha *after* first loading linear_analysis.pha and then the sound file.

If you have an idea of what is going on and if there is a quick fix, that would obviously be great... But on the other hand I intend to have some fun experimenting with this even as is.

Your help, and your amazing application, are very much appreciated!!!

Thanks again,
Jon

A_SN · Post by **A_SN** » Sat Feb 14, 2015 3:29 am

Ah yeah I know what's going on, I think. The formant separation script specifies blurring height in semitones. Semitones don't mean anything as a consistent unit of height with a linear scale. Maybe editing that value "19 st" to something like "100 Hz" might do the trick.
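
Why semitones stop being a consistent height on a linear scale can be shown with a short calculation. This Python sketch converts an interval in semitones to its width in Hz at different starting frequencies; the starting frequencies are arbitrary examples.

```python
def semitone_span_hz(base_hz, semitones):
    """Width in Hz of an interval of `semitones` starting at base_hz."""
    return base_hz * (2.0 ** (semitones / 12.0) - 1.0)

if __name__ == "__main__":
    for base in (100.0, 1000.0, 5000.0):
        width = semitone_span_hz(base, 19)
        print(f"19 st above {base:6.0f} Hz spans about {width:8.1f} Hz")
```

On a logarithmic axis those three intervals have the same height, while on a linear axis their heights differ in proportion to the starting frequency, so a fixed value in Hz is the only unit that stays constant there.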

That's why I'd like to drop non-logarithmic scales in Photosounder 2.0, it messes everything up.

Jon_Nissenbaum · Post by **Jon_Nissenbaum** » Sat Feb 14, 2015 4:14 am

That did the trick, the formant separation script now loads. I need to play with some of the other settings because it's not really extracting the formants properly, but I am comfortable proceeding from here on my own. In my opinion you might be too quick to dismiss the linear scale; I think it can be useful for some purposes. But anyway... I am so glad to have found the program.

Thanks a million!

-Jon

A_SN · Post by **A_SN** » Sat Feb 14, 2015 8:44 pm

I think the blurring parameter (the aforementioned 100 Hz) should be a function of your typical pitch, although I forget whether it should be that same value or half as much. So if you have a voice around 300 Hz, you might want to increase the blurring to 300 Hz, I think.

The problem for me with the linear scale is that it takes a lot of work to make everything work on both types of scale. Basically they're different enough that what works for one won't work for the other, and for the reasons I mentioned earlier there's a shortage of good reasons to bother. So I'll see what I'll do. The main reasons for me to support a linear scale would be so that you can put images in the spectrum with equal resolution from the top to the bottom of the image, and analyse those types of things back correctly. Also things like taking a scan of an old spectrogram (from a book or something) and turning it back into a sound. But in a music-focused program those things aren't a great priority.

So I'd probably leave an option to do that in there, but as it is now it would probably be best used with caution, as not everything would be made with the linear scale in mind.
