source: audiofilter/doc/columbia.edu/A Phase Vocoder in Matlab.html

Last change on this file was 2, checked in by wouter, 3 years ago

AGAIN FORCE COMMIT FIRST RECOVERY

File size: 9.4 KB
Line 
1<!DOCTYPE html PUBLIC "-//w3c//dtd html 4.0 transitional//en">
2<html><head>
3 <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
4 <meta name="GENERATOR" content="Mozilla/4.75C-CCK-MCD {C-UDP; EBM-APPLE} (Macintosh; U; PPC) [Netscape]">
5 <meta name="Author" content="Dan Ellis &lt;dpwe@ee.columbia.edu&gt;">
6 <meta name="Description" content="Describes and links to an implementation of the phase vocoder algorithm for time-scale modification of audio in the Matlab language.">
7 <meta name="KeyWords" content="matlab, audio, time-scale modification, phase vocoder, pvoc">
8 <title>A Phase Vocoder in Matlab</title>
9</head>
10<body alink="#0000FF" bgcolor="#FFFFFF" link="#0000FF" text="#000000" vlink="#551A8B">
11<a href="http://www.ee.columbia.edu/%7Edpwe/">Dan
12Ellis</a> : <a href="http://www.ee.columbia.edu/%7Edpwe/resources/">Resources</a>
13: <a href="http://www.ee.columbia.edu/%7Edpwe/resources/matlab/">Matlab</a>
14:
15<h1>
16A Phase Vocoder in Matlab
17<hr width="100%"></h1>
18
19<h3>
20Introduction</h3>
21<p>
22The Phase Vocoder [FlanG66, Dols86, LaroD99]
23is an algorithm for timescale modification
24of audio.&nbsp; One way of understanding it is to think of it as stretching
25or compressing the time-base of a spectrogram to change the temporal characteristics
26of a sound while retaining its short-time spectral characteristics; if
27the spectrogram is narrowband (analysis window longer than a pitch cycle,
28so the individual harmonics are resolved), then preserving the spectral
29characteristics implies preserving the pitch, and avoiding the 'slowing
30down the tape' pitch drop.&nbsp; The only complication to the algorithm
31is that the phases associated with each bin in the modified spectrogram
32image have to be 'fixed up' to maintain the dphase/dtime of the original,
33thereby ensuring the correct alignment of successive windows in the overlap-add
34reconstruction.
35</p>
36
37<p>I first wrote a phase vocoder in 1990 which eventually became the 'pvoc'
38unit generator in Csound.&nbsp; This implementation is a lot smaller and
39took much less time to debug!&nbsp; It first calculates the short-time
40Fourier transform of the signal using 'stft';
41'pvsample' then builds a modified spectrogram array by sampling the original
42array at a sequence of fractional time values, interpolating the magnitudes
43and fixing-up the phases as it goes along.&nbsp; The resulting time-frequency
44array can be inverted back into a sound with 'istft'.&nbsp; The 'pvoc'
45script is a wrapper to perform all three of these steps for a fixed time-scaling
46factor (larger than one for speeding up; smaller than one to slow down).&nbsp;
47But the underlying pvsample routine would also support arbitrary timebase
48variation (freezing, reversal, modulation) if one wished to write a suitable
49interface to specify the time path.</p>
50
51<h3>Code</h3>
52<p>These were developed on Matlab 5.0, but should work on any version.</p>
53<ul>
54<li>
55<a href="http://www.ee.columbia.edu/%7Edpwe/resources/matlab/pvoc/pvoc.m">pvoc.m</a> - the top-level routine</li>
56
57<li>
58<a href="http://www.ee.columbia.edu/%7Edpwe/resources/matlab/pvoc/stft.m">stft.m</a> - calculate the STFT time-frequency representation</li>
59
60<li>
61<a href="http://www.ee.columbia.edu/%7Edpwe/resources/matlab/pvoc/pvsample.m">pvsample.m</a> - interpolate/reconstuct the new STFT on the modified timebase</li>
62
63<li>
64<a href="http://www.ee.columbia.edu/%7Edpwe/resources/matlab/pvoc/istft.m">istft.m</a> - overlap-add the modified STFT back into a waveform</li>
65
66</ul>
67
68<p>
69Here's an example of how to use pvoc to slow down a soundfile of voice (sampled at 16 kHz) to 3/4 speed:</p>
70<p><tt>»[d,sr]=wavread('sf1_cln.wav');</tt>
71<br><tt>»sr</tt>
72<br><tt>sr =</tt>
73<br><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 16000</tt>
74<br><tt>»% 1024 samples is about 60 ms at 16kHz, a good window</tt>
75<br><tt>»y=pvoc(d,.75,1024);</tt>
76<br><tt>»% Compare original and resynthesis</tt>
77<br><tt>»sound(d,16000)</tt>
78<br><tt>»sound(y,16000)</tt>
79</p>
80
81<p>
82Here's how to use phase vocoder time-scale modification followed by
83resampling to effect a pitch shift. In this case, we shift the pitch up
84 by a major third (by extending duration with the phase vocoder, then
85resampling to the original length), then add it back to the initial
86sound to give harmonization:</p>
87<p><tt>»[d,sr]=wavread('<a href="http://www.ee.columbia.edu/%7Edpwe/resources/matlab/sinemodel/clar.wav">clar.wav</a>');</tt>
88<br><tt>»e = pvoc(d, 0.8);</tt>
89<br><tt>»f = resample(e,4,5); % NB: 0.8 = 4/5</tt>
90<br><tt>»soundsc(d(1:length(f))+f,sr)</tt>
91</p>
92<p>(Thanks to Martín Rocamora for fixing the bug here!)</p>
93
94
95
96<h3>FAQs</h3>
97
98<p><b>Q. In pvsample.m, I see you first subtract dphi from the phase
99difference, then add it back in before cumulating the phase. Why
100bother?</b></p>
101
102<p>A. dphi is set up as:</p>
103<pre><tt>
104 dphi(2:(1 + N/2)) = (2*pi*hop)./(N./(1:(N/2)));
105</tt></pre>
106<p>
107It's the phase advance you'd expect to see from a sinusoid at the
108center frequency of bin n of an N point FFT if you shifted the window
109by hop points. We only worry about the lowest N/2+1 bins, since the
110remainder are conjugate-symmetric.
111</p>
112<p><tt>N./(1:(N/2))</tt> is the cycle length of the sinusoids at the center of bins
1131:N/2 of the FFT (counting from 0) -- i.e. FFT bin 1 corresponds to a
114sinusoid that completes 1 cycle in N samples (period N/1), and the
115highest bin (bin N/2) corresponds to a sinusoid that completes 1 cycle
116every 2 samples i.e. period N/(N/2). So <tt>hop/(N./(1:N/2))</tt> is the
117proportion of a cycle represented by hop samples, and <tt>2*pi*...</tt> is that
118cycle proportion in radians.
119</p>
120<p>
121We're interested in estimating the frequency of a sinusoid that would
122give the phase difference we observe, but the phase difference is
123modulo 2pi (i.e. we only know it to within +/- r.2pi), so we have to
124guess a range. That's the function of dphi: our 'starting point' is
125to assume the sinusoid exciting bin n is exactly at the center
126frequency of that bin, in which case it would give an expected phase
127difference of dphi(n). So the final value of dp in each column is
128actually the deviation from the expected phase advance in each bin;
129we can convert these into our best guess of the frequency in each bin
130as <tt>freq(n) = 2*pi*n/N + dp(n)/hop</tt> (in radians per sample).
131</p>
132<p>
133When we come to reconstruct the output spectrogram, for each column we
134cumulate a phase advance consistent with the current sampling point in
135the original STFT - which is just the original phase difference,
136assuming the output and input hop sizes match. But if the output
137hopsize was different, we'd need to know the actual effective
138frequency of the bin center, so we could scale it by a different ohop
139before collapsing down to -pi:pi. That's when separating into dphi
140and dp would be important.
141</p>
142<p>
143But, you're correct, in the current code it does nothing!
144</p>
145
146
147
148<h3>References</h3>
149
150<dl>
151<p>
152</p><dt><b>[FlanG66]</b></dt>
153<dd>J. L. Flanagan, R. M. Golden, "Phase Vocoder,"
154Bell System Technical Journal, November 1966, 1493-1509.
155<br>
156<a href="http://www.ee.columbia.edu/%7Edpwe/e6820/papers/FlanG66.pdf">
157http://www.ee.columbia.edu/~dpwe/e6820/papers/FlanG66.pdf</a>
158</dd>
159<p></p>
160
161<p>
162</p><dt><b>[Port76]</b></dt>
163<dd>M. R. Portnoff, "Implementation of the Digital Phase Vocoder Using the Fast Fourier Transform,"
164IEEE Trans. Acous., Speech, Sig. Proc., 24(3), June 1976, 243-248.
165<a href="http://www.ee.columbia.edu/%7Edpwe/papers/Portnoff76-pvoc.pdf">
166http://www.ee.columbia.edu/~dpwe/papers/Portnoff76-pvoc.pdf</a>
167</dd>
168<p></p>
169
170<p>
171</p><dt><b>[Dols86]</b></dt>
172<dd>
173Mark Dolson, "The phase vocoder: A tutorial," Computer Music Journal,
174vol. 10, no. 4, pp. 14 -- 27, 1986.
175<br>
176<a href="http://www.panix.com/%7Ejens/pvoc-dolson.par">
177http://www.panix.com/~jens/pvoc-dolson.par</a>
178</dd>
179<p></p>
180
181<p>
182</p><dt><b>[LaroD99]</b></dt>
183<dd>
184Jean Laroche and Mark Dolson "New Phase Vocoder Technique for
185Pitch-Shifting, Harmonizing and Other Exotic Effects".
186IEEE Workshop on Applications of Signal Processing to Audio and
187Acoustics. Mohonk, New Paltz, NY. 1999.
188<br>
189<a href="http://www.ee.columbia.edu/%7Edpwe/papers/LaroD99-pvoc.pdf">
190http://www.ee.columbia.edu/~dpwe/papers/LaroD99-pvoc.pdf</a>
191</dd>
192<p></p>
193</dl>
194
195<p>
196There are also recommended tutorials at
197<a href="http://www.dspdimension.com/admin/time-pitch-overview/">
198Stephan Bernsee's DSP dimension</a> and by
199<a href="http://eceserv0.ece.wisc.edu/%7Esethares/vocoders/phasevocoder.html">Bill Sethares at Wisconsin</a>.
200</p>
201
202<h3>Referencing this work</h3>
203<p>
204I do not have any publication describing this code, since it is
205basically a straightforward implementation of the Flanagan &amp; Golden /
206Dolson phase vocoder. However, if you use this code and would like
207to acknowledge it with a reference, you could consider something like
208this:
209</p>
210<pre><tt>@misc{Ellis02-pvoc
211 author = {D. P. W. Ellis},
212 year = {2002},
213 title = {A Phase Vocoder in {M}atlab},
214 note = {Web resource},
215 url = {http://www.ee.columbia.edu/~dpwe/resources/matlab/pvoc/},
216}
217</tt></pre>
218
219<h3>History</h3>
220<p><b>2003-03-06</b> Added pitch shifting/harmonization example</p>
221<p><b>2002-02-13</b> Revised version uses stft/istft for perfect reconstruction when r = 1. More stuff on page.</p>
222<p><b>2000-12-11</b> First version of this page, after demo'ing in E4810.</p>
223
224
225<hr align="LEFT">
226<address>
227Last updated: $Date: 2010/10/26 18:55:10 $</address>
228
229<br><a href="http://www.ee.columbia.edu/%7Edpwe/">Dan Ellis</a> &lt;<a href="mailto:dpwe@ee.columbia.edu">dpwe@ee.columbia.edu</a>&gt;
230<br>&nbsp;
231
232
233</body></html>
Note: See TracBrowser for help on using the repository browser.