1 | <!DOCTYPE html PUBLIC "-//w3c//dtd html 4.0 transitional//en">
|
---|
2 | <html><head>
|
---|
3 | <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
|
---|
4 | <meta name="GENERATOR" content="Mozilla/4.75C-CCK-MCD {C-UDP; EBM-APPLE} (Macintosh; U; PPC) [Netscape]">
|
---|
5 | <meta name="Author" content="Dan Ellis <dpwe@ee.columbia.edu>">
|
---|
6 | <meta name="Description" content="Describes and links to an implementation of the phase vocoder algorithm for time-scale modification of audio in the Matlab language.">
|
---|
7 | <meta name="KeyWords" content="matlab, audio, time-scale modification, phase vocoder, pvoc">
|
---|
8 | <title>A Phase Vocoder in Matlab</title>
|
---|
9 | </head>
|
---|
10 | <body alink="#0000FF" bgcolor="#FFFFFF" link="#0000FF" text="#000000" vlink="#551A8B">
|
---|
11 | <a href="http://www.ee.columbia.edu/%7Edpwe/">Dan
|
---|
12 | Ellis</a> : <a href="http://www.ee.columbia.edu/%7Edpwe/resources/">Resources</a>
|
---|
13 | : <a href="http://www.ee.columbia.edu/%7Edpwe/resources/matlab/">Matlab</a>
|
---|
14 | :
|
---|
15 | <h1>
|
---|
16 | A Phase Vocoder in Matlab
|
---|
17 | <hr width="100%"></h1>
|
---|
18 |
|
---|
19 | <h3>
|
---|
20 | Introduction</h3>
|
---|
21 | <p>
|
---|
22 | The Phase Vocoder [FlanG66, Dols86, LaroD99]
|
---|
23 | is an algorithm for timescale modification
|
---|
24 | of audio. One way of understanding it is to think of it as stretching
|
---|
25 | or compressing the time-base of a spectrogram to change the temporal characteristics
|
---|
26 | of a sound while retaining its short-time spectral characteristics; if
|
---|
27 | the spectrogram is narrowband (analysis window longer than a pitch cycle,
|
---|
28 | so the individual harmonics are resolved), then preserving the spectral
|
---|
29 | characteristics implies preserving the pitch, and avoiding the 'slowing
|
---|
30 | down the tape' pitch drop. The only complication to the algorithm
|
---|
31 | is that the phases associated with each bin in the modified spectrogram
|
---|
32 | image have to be 'fixed up' to maintain the dphase/dtime of the original,
|
---|
33 | thereby ensuring the correct alignment of successive windows in the overlap-add
|
---|
34 | reconstruction.
|
---|
35 | </p>
|
---|
36 |
|
---|
37 | <p>I first wrote a phase vocoder in 1990 which eventually became the 'pvoc'
|
---|
38 | unit generator in Csound. This implementation is a lot smaller and
|
---|
39 | took much less time to debug! It first calculates the short-time
|
---|
40 | Fourier transform of the signal using 'stft';
|
---|
41 | 'pvsample' then builds a modified spectrogram array by sampling the original
|
---|
42 | array at a sequence of fractional time values, interpolating the magnitudes
|
---|
43 | and fixing-up the phases as it goes along. The resulting time-frequency
|
---|
44 | array can be inverted back into a sound with 'istft'. The 'pvoc'
|
---|
45 | script is a wrapper to perform all three of these steps for a fixed time-scaling
|
---|
46 | factor (larger than one for speeding up; smaller than one to slow down).
|
---|
47 | But the underlying pvsample routine would also support arbitrary timebase
|
---|
48 | variation (freezing, reversal, modulation) if one wished to write a suitable
|
---|
49 | interface to specify the time path.</p>
|
---|
50 |
|
---|
51 | <h3>Code</h3>
|
---|
52 | <p>These were developed on Matlab 5.0, but should work on any version.</p>
|
---|
53 | <ul>
|
---|
54 | <li>
|
---|
55 | <a href="http://www.ee.columbia.edu/%7Edpwe/resources/matlab/pvoc/pvoc.m">pvoc.m</a> - the top-level routine</li>
|
---|
56 |
|
---|
57 | <li>
|
---|
58 | <a href="http://www.ee.columbia.edu/%7Edpwe/resources/matlab/pvoc/stft.m">stft.m</a> - calculate the STFT time-frequency representation</li>
|
---|
59 |
|
---|
60 | <li>
|
---|
61 | <a href="http://www.ee.columbia.edu/%7Edpwe/resources/matlab/pvoc/pvsample.m">pvsample.m</a> - interpolate/reconstuct the new STFT on the modified timebase</li>
|
---|
62 |
|
---|
63 | <li>
|
---|
64 | <a href="http://www.ee.columbia.edu/%7Edpwe/resources/matlab/pvoc/istft.m">istft.m</a> - overlap-add the modified STFT back into a waveform</li>
|
---|
65 |
|
---|
66 | </ul>
|
---|
67 |
|
---|
68 | <p>
|
---|
69 | Here's an example of how to use pvoc to slow down a soundfile of voice (sampled at 16 kHz) to 3/4 speed:</p>
|
---|
70 | <p><tt>»[d,sr]=wavread('sf1_cln.wav');</tt>
|
---|
71 | <br><tt>»sr</tt>
|
---|
72 | <br><tt>sr =</tt>
|
---|
73 | <br><tt> 16000</tt>
|
---|
74 | <br><tt>»% 1024 samples is about 60 ms at 16kHz, a good window</tt>
|
---|
75 | <br><tt>»y=pvoc(d,.75,1024);</tt>
|
---|
76 | <br><tt>»% Compare original and resynthesis</tt>
|
---|
77 | <br><tt>»sound(d,16000)</tt>
|
---|
78 | <br><tt>»sound(y,16000)</tt>
|
---|
79 | </p>
|
---|
80 |
|
---|
81 | <p>
|
---|
82 | Here's how to use phase vocoder time-scale modification followed by
|
---|
83 | resampling to effect a pitch shift. In this case, we shift the pitch up
|
---|
84 | by a major third (by extending duration with the phase vocoder, then
|
---|
85 | resampling to the original length), then add it back to the initial
|
---|
86 | sound to give harmonization:</p>
|
---|
87 | <p><tt>»[d,sr]=wavread('<a href="http://www.ee.columbia.edu/%7Edpwe/resources/matlab/sinemodel/clar.wav">clar.wav</a>');</tt>
|
---|
88 | <br><tt>»e = pvoc(d, 0.8);</tt>
|
---|
89 | <br><tt>»f = resample(e,4,5); % NB: 0.8 = 4/5</tt>
|
---|
90 | <br><tt>»soundsc(d(1:length(f))+f,sr)</tt>
|
---|
91 | </p>
|
---|
92 | <p>(Thanks to Martín Rocamora for fixing the bug here!)</p>
|
---|
93 |
|
---|
94 |
|
---|
95 |
|
---|
96 | <h3>FAQs</h3>
|
---|
97 |
|
---|
98 | <p><b>Q. In pvsample.m, I see you first subtract dphi from the phase
|
---|
99 | difference, then add it back in before cumulating the phase. Why
|
---|
100 | bother?</b></p>
|
---|
101 |
|
---|
102 | <p>A. dphi is set up as:</p>
|
---|
103 | <pre><tt>
|
---|
104 | dphi(2:(1 + N/2)) = (2*pi*hop)./(N./(1:(N/2)));
|
---|
105 | </tt></pre>
|
---|
106 | <p>
|
---|
107 | It's the phase advance you'd expect to see from a sinusoid at the
|
---|
108 | center frequency of bin n of an N point FFT if you shifted the window
|
---|
109 | by hop points. We only worry about the lowest N/2+1 bins, since the
|
---|
110 | remainder are conjugate-symmetric.
|
---|
111 | </p>
|
---|
112 | <p><tt>N./(1:(N/2))</tt> is the cycle length of the sinusoids at the center of bins
|
---|
113 | 1:N/2 of the FFT (counting from 0) -- i.e. FFT bin 1 corresponds to a
|
---|
114 | sinusoid that completes 1 cycle in N samples (period N/1), and the
|
---|
115 | highest bin (bin N/2) corresponds to a sinusoid that completes 1 cycle
|
---|
116 | every 2 samples i.e. period N/(N/2). So <tt>hop/(N./(1:N/2))</tt> is the
|
---|
117 | proportion of a cycle represented by hop samples, and <tt>2*pi*...</tt> is that
|
---|
118 | cycle proportion in radians.
|
---|
119 | </p>
|
---|
120 | <p>
|
---|
121 | We're interested in estimating the frequency of a sinusoid that would
|
---|
122 | give the phase difference we observe, but the phase difference is
|
---|
123 | modulo 2pi (i.e. we only know it to within +/- r.2pi), so we have to
|
---|
124 | guess a range. That's the function of dphi: our 'starting point' is
|
---|
125 | to assume the sinusoid exciting bin n is exactly at the center
|
---|
126 | frequency of that bin, in which case it would give an expected phase
|
---|
127 | difference of dphi(n). So the final value of dp in each column is
|
---|
128 | actually the deviation from the expected phase advance in each bin;
|
---|
129 | we can convert these into our best guess of the frequency in each bin
|
---|
130 | as <tt>freq(n) = 2*pi*n/N + dp(n)/hop</tt> (in radians per sample).
|
---|
131 | </p>
|
---|
132 | <p>
|
---|
133 | When we come to reconstruct the output spectrogram, for each column we
|
---|
134 | cumulate a phase advance consistent with the current sampling point in
|
---|
135 | the original STFT - which is just the original phase difference,
|
---|
136 | assuming the output and input hop sizes match. But if the output
|
---|
137 | hopsize was different, we'd need to know the actual effective
|
---|
138 | frequency of the bin center, so we could scale it by a different ohop
|
---|
139 | before collapsing down to -pi:pi. That's when separating into dphi
|
---|
140 | and dp would be important.
|
---|
141 | </p>
|
---|
142 | <p>
|
---|
143 | But, you're correct, in the current code it does nothing!
|
---|
144 | </p>
|
---|
145 |
|
---|
146 |
|
---|
147 |
|
---|
148 | <h3>References</h3>
|
---|
149 |
|
---|
150 | <dl>
|
---|
151 | <p>
|
---|
152 | </p><dt><b>[FlanG66]</b></dt>
|
---|
153 | <dd>J. L. Flanagan, R. M. Golden, "Phase Vocoder,"
|
---|
154 | Bell System Technical Journal, November 1966, 1493-1509.
|
---|
155 | <br>
|
---|
156 | <a href="http://www.ee.columbia.edu/%7Edpwe/e6820/papers/FlanG66.pdf">
|
---|
157 | http://www.ee.columbia.edu/~dpwe/e6820/papers/FlanG66.pdf</a>
|
---|
158 | </dd>
|
---|
159 | <p></p>
|
---|
160 |
|
---|
161 | <p>
|
---|
162 | </p><dt><b>[Port76]</b></dt>
|
---|
163 | <dd>M. R. Portnoff, "Implementation of the Digital Phase Vocoder Using the Fast Fourier Transform,"
|
---|
164 | IEEE Trans. Acous., Speech, Sig. Proc., 24(3), June 1976, 243-248.
|
---|
165 | <a href="http://www.ee.columbia.edu/%7Edpwe/papers/Portnoff76-pvoc.pdf">
|
---|
166 | http://www.ee.columbia.edu/~dpwe/papers/Portnoff76-pvoc.pdf</a>
|
---|
167 | </dd>
|
---|
168 | <p></p>
|
---|
169 |
|
---|
170 | <p>
|
---|
171 | </p><dt><b>[Dols86]</b></dt>
|
---|
172 | <dd>
|
---|
173 | Mark Dolson, "The phase vocoder: A tutorial," Computer Music Journal,
|
---|
174 | vol. 10, no. 4, pp. 14 -- 27, 1986.
|
---|
175 | <br>
|
---|
176 | <a href="http://www.panix.com/%7Ejens/pvoc-dolson.par">
|
---|
177 | http://www.panix.com/~jens/pvoc-dolson.par</a>
|
---|
178 | </dd>
|
---|
179 | <p></p>
|
---|
180 |
|
---|
181 | <p>
|
---|
182 | </p><dt><b>[LaroD99]</b></dt>
|
---|
183 | <dd>
|
---|
184 | Jean Laroche and Mark Dolson "New Phase Vocoder Technique for
|
---|
185 | Pitch-Shifting, Harmonizing and Other Exotic Effects".
|
---|
186 | IEEE Workshop on Applications of Signal Processing to Audio and
|
---|
187 | Acoustics. Mohonk, New Paltz, NY. 1999.
|
---|
188 | <br>
|
---|
189 | <a href="http://www.ee.columbia.edu/%7Edpwe/papers/LaroD99-pvoc.pdf">
|
---|
190 | http://www.ee.columbia.edu/~dpwe/papers/LaroD99-pvoc.pdf</a>
|
---|
191 | </dd>
|
---|
192 | <p></p>
|
---|
193 | </dl>
|
---|
194 |
|
---|
195 | <p>
|
---|
196 | There are also recommended tutorials at
|
---|
197 | <a href="http://www.dspdimension.com/admin/time-pitch-overview/">
|
---|
198 | Stephan Bernsee's DSP dimension</a> and by
|
---|
199 | <a href="http://eceserv0.ece.wisc.edu/%7Esethares/vocoders/phasevocoder.html">Bill Sethares at Wisconsin</a>.
|
---|
200 | </p>
|
---|
201 |
|
---|
202 | <h3>Referencing this work</h3>
|
---|
203 | <p>
|
---|
204 | I do not have any publication describing this code, since it is
|
---|
205 | basically a straightforward implementation of the Flanagan & Golden /
|
---|
206 | Dolson phase vocoder. However, if you use this code and would like
|
---|
207 | to acknowledge it with a reference, you could consider something like
|
---|
208 | this:
|
---|
209 | </p>
|
---|
210 | <pre><tt>@misc{Ellis02-pvoc
|
---|
211 | author = {D. P. W. Ellis},
|
---|
212 | year = {2002},
|
---|
213 | title = {A Phase Vocoder in {M}atlab},
|
---|
214 | note = {Web resource},
|
---|
215 | url = {http://www.ee.columbia.edu/~dpwe/resources/matlab/pvoc/},
|
---|
216 | }
|
---|
217 | </tt></pre>
|
---|
218 |
|
---|
219 | <h3>History</h3>
|
---|
220 | <p><b>2003-03-06</b> Added pitch shifting/harmonization example</p>
|
---|
221 | <p><b>2002-02-13</b> Revised version uses stft/istft for perfect reconstruction when r = 1. More stuff on page.</p>
|
---|
222 | <p><b>2000-12-11</b> First version of this page, after demo'ing in E4810.</p>
|
---|
223 |
|
---|
224 |
|
---|
225 | <hr align="LEFT">
|
---|
226 | <address>
|
---|
227 | Last updated: $Date: 2010/10/26 18:55:10 $</address>
|
---|
228 |
|
---|
229 | <br><a href="http://www.ee.columbia.edu/%7Edpwe/">Dan Ellis</a> <<a href="mailto:dpwe@ee.columbia.edu">dpwe@ee.columbia.edu</a>>
|
---|
230 | <br>
|
---|
231 |
|
---|
232 |
|
---|
233 | </body></html> |
---|