<!DOCTYPE html PUBLIC "-//w3c//dtd html 4.0 transitional//en">
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<meta name="GENERATOR" content="Mozilla/4.75C-CCK-MCD {C-UDP; EBM-APPLE} (Macintosh; U; PPC) [Netscape]">
<meta name="Author" content="Dan Ellis &lt;dpwe@ee.columbia.edu&gt;">
<meta name="Description" content="Describes and links to an implementation of the phase vocoder algorithm for time-scale modification of audio in the Matlab language.">
<meta name="KeyWords" content="matlab, audio, time-scale modification, phase vocoder, pvoc">
<title>A Phase Vocoder in Matlab</title>
</head>
<body alink="#0000FF" bgcolor="#FFFFFF" link="#0000FF" text="#000000" vlink="#551A8B">
<a href="http://www.ee.columbia.edu/%7Edpwe/">Dan
Ellis</a> : <a href="http://www.ee.columbia.edu/%7Edpwe/resources/">Resources</a>
: <a href="http://www.ee.columbia.edu/%7Edpwe/resources/matlab/">Matlab</a>
:
<h1>
A Phase Vocoder in Matlab
<hr width="100%"></h1>

<h3>
Introduction</h3>
<p>
The Phase Vocoder [FlanG66, Dols86, LaroD99]
is an algorithm for time-scale modification
of audio.  One way to understand it is as stretching
or compressing the time-base of a spectrogram, changing the temporal characteristics
of a sound while retaining its short-time spectral characteristics.  If
the spectrogram is narrowband (analysis window longer than a pitch cycle,
so the individual harmonics are resolved), then preserving the spectral
characteristics also preserves the pitch, avoiding the 'slowing
down the tape' pitch drop.  The only complication in the algorithm
is that the phases associated with each bin in the modified spectrogram
image have to be 'fixed up' to maintain the dphase/dtime of the original,
which ensures that successive windows line up correctly in the overlap-add
reconstruction.
</p>

<p>I first wrote a phase vocoder in 1990, and it eventually became the 'pvoc'
unit generator in Csound.  This implementation is a lot smaller and
took much less time to debug!  It first calculates the short-time
Fourier transform of the signal using 'stft';
'pvsample' then builds a modified spectrogram array by sampling the original
array at a sequence of fractional time values, interpolating the magnitudes
and fixing up the phases as it goes along.  The resulting time-frequency
array can be inverted back into a sound with 'istft'.  The 'pvoc'
script is a wrapper that performs all three of these steps for a fixed time-scaling
factor (larger than one to speed up; smaller than one to slow down).
The underlying pvsample routine would also support arbitrary timebase
variation (freezing, reversal, modulation) if one wrote a suitable
interface to specify the time path.</p>
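<p>The resampling step can be sketched as follows. This is a minimal illustration, not the actual pvsample.m: the variable names, the random stand-in STFT <tt>B</tt> (rows are bins 0..N/2, columns are frames) and the example sizes are all assumptions. Each output column takes magnitudes interpolated linearly between the two analysis frames that bracket the fractional time in <tt>t</tt>, and a phase accumulated from the measured per-bin phase advance, assuming the output hop equals the analysis hop.</p>
<pre><tt>
% Minimal sketch of sampling an STFT along a fractional time path.
% Hypothetical code for illustration; it is not the actual pvsample.m.
N    = 16;  hop = 4;                    % FFT size and hop (example values)
rows = N/2 + 1;  cols = 6;              % bins 0..N/2, number of analysis frames
B    = complex(randn(rows, cols), randn(rows, cols));   % stand-in STFT
r    = 0.75;                            % time-scale factor (below 1 slows down)
t    = 0:r:(cols - 2);                  % fractional frame times to read
dphi = 2*pi*hop*(0:N/2)'/N;             % expected advance at each bin centre
Bp   = [B, zeros(rows, 1)];             % padding column for floor(t)+2
C    = zeros(rows, length(t));          % output STFT, one column per t
ph   = angle(B(:, 1));                  % running output phase
for oc = 1:length(t)
  tt   = t(oc);
  pair = Bp(:, floor(tt) + [1 2]);      % the two frames bracketing tt
  tf   = tt - floor(tt);                % fractional position between them
  mag  = (1 - tf)*abs(pair(:, 1)) + tf*abs(pair(:, 2));
  C(:, oc) = mag .* exp(1i*ph);
  % measured advance minus dphi, wrapped to [-pi, pi]
  dp = angle(pair(:, 2)) - angle(pair(:, 1)) - dphi;
  dp = dp - 2*pi*round(dp/(2*pi));
  ph = ph + dphi + dp;                  % assumes output hop == input hop
end
</tt></pre>
<p>Any other time path could be passed in <tt>t</tt>. For instance, a path that repeats a single value keeps reusing the same pair of frames, so the magnitudes stay fixed while the phases keep advancing, which is one way to obtain the 'freezing' effect.</p>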

<h3>Code</h3>
<p>These were developed on Matlab 5.0, but should work on any version.</p>
<ul>
<li>
<a href="http://www.ee.columbia.edu/%7Edpwe/resources/matlab/pvoc/pvoc.m">pvoc.m</a> - the top-level routine</li>

<li>
<a href="http://www.ee.columbia.edu/%7Edpwe/resources/matlab/pvoc/stft.m">stft.m</a> - calculate the STFT time-frequency representation</li>

<li>
<a href="http://www.ee.columbia.edu/%7Edpwe/resources/matlab/pvoc/pvsample.m">pvsample.m</a> - interpolate/reconstruct the new STFT on the modified timebase</li>

<li>
<a href="http://www.ee.columbia.edu/%7Edpwe/resources/matlab/pvoc/istft.m">istft.m</a> - overlap-add the modified STFT back into a waveform</li>

</ul>
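<p>As an illustration of the analysis/resynthesis pair, here is a sketch of a weighted overlap-add STFT round trip. It is not the code in stft.m or istft.m: the window length, the periodic Hann window (used for both analysis and synthesis), the hop of N/4 and the 2/3 gain are choices made for this example. With these choices the squared windows of overlapping frames sum to a constant 1.5, so away from the first and last few frames the output matches the input to within rounding error. This is the behaviour the phase vocoder reduces to when the time path is unmodified.</p>
<pre><tt>
% Sketch of an STFT / inverse-STFT round trip by weighted overlap-add.
% Hypothetical code for illustration; it is not the actual stft.m/istft.m.
N   = 64;  hop = N/4;                     % window length and 75% overlap
win = 0.5*(1 - cos(2*pi*(0:N-1)'/N));     % periodic Hann window
x   = randn(1024, 1);                     % test signal
nfr = floor((length(x) - N)/hop) + 1;     % number of complete frames
X   = zeros(N/2 + 1, nfr);
for c = 1:nfr                             % analysis: windowed FFTs
  F = fft(x((c-1)*hop + (1:N)) .* win);
  X(:, c) = F(1:N/2 + 1);                 % keep bins 0..N/2 only
end
y = zeros(size(x));
for c = 1:nfr                             % synthesis: windowed inverse FFTs
  F   = [X(:, c); conj(X(N/2:-1:2, c))];  % restore the conjugate-symmetric half
  seg = real(ifft(F)) .* win;
  idx = (c-1)*hop + (1:N);
  y(idx) = y(idx) + seg*2/3;              % squared Hann windows sum to 1.5 here
end
</tt></pre>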

<p>
Here's an example of how to use pvoc to slow down a soundfile of voice (sampled at 16 kHz) to 3/4 speed:</p>
<p><tt>»[d,sr]=wavread('sf1_cln.wav');</tt>
<br><tt>»sr</tt>
<br><tt>sr =</tt>
<br><tt>           16000</tt>
<br><tt>»% 1024 samples is about 60 ms at 16kHz, a good window</tt>
<br><tt>»y=pvoc(d,.75,1024);</tt>
<br><tt>»% Compare original and resynthesis</tt>
<br><tt>»sound(d,16000)</tt>
<br><tt>»sound(y,16000)</tt>
</p>
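<p>The numbers in this session can be checked with a little arithmetic (illustrative only; the variable names are made up). A 1024-point window at 16 kHz lasts 64 ms and gives FFT bins 15.625 Hz apart, which is narrow compared with the harmonic spacing of a typical speaking voice (roughly 100 Hz or more), so this is the narrowband case described in the introduction. A factor of 0.75 makes the output roughly 4/3 as long as the input.</p>
<pre><tt>
% Worked arithmetic for the session above (variable names are illustrative)
sr      = 16000;            % sample rate reported for sf1_cln.wav
N       = 1024;             % window length passed to pvoc
win_ms  = 1000*N/sr;        % window duration: 64 ms
bin_hz  = sr/N;             % spacing of FFT bin centres: 15.625 Hz
r       = 0.75;             % time-scale factor
stretch = 1/r;              % output is about 4/3 as long as the input
fprintf('%.0f ms window, %.3f Hz bins, duration x %.3f\n', win_ms, bin_hz, stretch);
</tt></pre>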

<p>
Here's how to use phase vocoder time-scale modification followed by
resampling to effect a pitch shift. In this case, we shift the pitch up
by a major third (by extending duration with the phase vocoder, then
resampling to the original length), then add it back to the initial
sound to give harmonization:</p>
<p><tt>»[d,sr]=wavread('<a href="http://www.ee.columbia.edu/%7Edpwe/resources/matlab/sinemodel/clar.wav">clar.wav</a>');</tt>
<br><tt>»e = pvoc(d, 0.8);</tt>
<br><tt>»f = resample(e,4,5); % NB: 0.8 = 4/5</tt>
<br><tt>»soundsc(d(1:length(f))+f,sr)</tt>
</p>
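<p>The arithmetic behind this example, sketched with made-up variable names: pvoc with a factor of 0.8 stretches the sound by 1/0.8 = 1.25, and resample(e,4,5) produces 4 output samples for every 5 input samples, so the length returns to the original while every frequency is scaled by 5/4. That ratio is about 3.86 semitones, a just-intoned major third, slightly narrower than the 4-semitone equal-tempered one.</p>
<pre><tt>
% Arithmetic behind the harmonization example (variable names are illustrative)
r           = 0.8;                 % pvoc factor: duration grows by 1/r = 1.25
p = 4;  q = 5;                     % resample(e,4,5): length scaled by p/q
len_ratio   = (1/r)*(p/q);         % net length change, 1.25*0.8 = 1
pitch_ratio = q/p;                 % frequencies rise by 5/4 at the same sr
semitones   = 12*log2(pitch_ratio);            % about 3.86 semitones
et_third    = 2^(4/12);                        % equal-tempered major third, about 1.26
fprintf('length x %.2f, pitch x %.3f (%.2f semitones); ET third x %.3f\n', ...
        len_ratio, pitch_ratio, semitones, et_third);
</tt></pre>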
<p>(Thanks to Martín Rocamora for fixing the bug here!)</p>

<h3>FAQs</h3>

<p><b>Q. In pvsample.m, I see you first subtract dphi from the phase
difference, then add it back in before cumulating the phase. Why
bother?</b></p>

<p>A. dphi is set up as:</p>
<pre><tt>
dphi = zeros(1, N/2 + 1);   % bin 0 (DC) keeps an expected advance of zero
dphi(2:(1 + N/2)) = (2*pi*hop)./(N./(1:(N/2)));
</tt></pre>
<p>
It's the phase advance you'd expect to see from a sinusoid at the
center frequency of bin n of an N-point FFT if you shifted the window
by hop points. We only worry about the lowest N/2+1 bins, since the
remainder are conjugate-symmetric.
</p>
<p><tt>N./(1:(N/2))</tt> is the cycle length of the sinusoids at the centers of bins
1:N/2 of the FFT (counting from 0), i.e. FFT bin 1 corresponds to a
sinusoid that completes 1 cycle in N samples (period N/1), and the
highest bin (bin N/2) corresponds to a sinusoid that completes 1 cycle
every 2 samples, i.e. period N/(N/2). So <tt>hop./(N./(1:N/2))</tt> is the
proportion of a cycle represented by hop samples, and <tt>2*pi*...</tt> is that
cycle proportion in radians.
</p>
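<p>A quick numerical check of the description above, with N and hop chosen arbitrarily for the example: the vectorized expression for dphi gives the same values as computing 2*pi*hop*n/N directly for bins n = 0..N/2, with dphi(1), the DC bin, left at zero.</p>
<pre><tt>
% Numerical check of the dphi expression (N and hop are example values)
N = 1024;  hop = 256;
dphi = zeros(1, N/2 + 1);                          % bin 0 (DC) stays at 0
dphi(2:(1 + N/2)) = (2*pi*hop)./(N./(1:(N/2)));    % expression from the FAQ
n = 0:(N/2);                                       % bin numbers counted from 0
direct = 2*pi*hop*n/N;                             % hop/period, in radians
fprintf('max difference %g; bin 1 advances %.4f rad per hop\n', ...
        max(abs(dphi - direct)), dphi(2));
</tt></pre>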
<p>
We're interested in estimating the frequency of a sinusoid that would
give the phase difference we observe, but the phase difference is only
known modulo 2*pi (i.e. to within an unknown multiple r*2*pi), so we have to
guess a range: we assume the true advance lies within +/- pi of some expected
value. That's the function of dphi: our 'starting point' is
to assume the sinusoid exciting bin n is exactly at the center
frequency of that bin, in which case it would give an expected phase
difference of dphi(n). So the final value of dp in each column is
actually the deviation from the expected phase advance in each bin;
we can convert these into our best guess of the frequency in each bin
as <tt>freq(n) = 2*pi*n/N + dp(n)/hop</tt> (in radians per sample).
</p>
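<p>The frequency estimate can be tried on a synthetic sinusoid. This sketch is not taken from pvsample.m: the signal, the periodic Hann window, the sizes and the variable names are assumptions. It measures the phase advance of the bin nearest the sinusoid between two frames one hop apart, subtracts the expected advance for that bin, wraps the result to -pi:pi to get dp, and then applies freq(n) = 2*pi*n/N + dp/hop with n counted from 0 (so the Matlab index is n+1).</p>
<pre><tt>
% Sketch: estimate a sinusoid's frequency from its phase advance over one hop
% (hypothetical example; signal, window and sizes are assumptions)
N = 512;  hop = 128;
w_true = 2*pi*40.37/N;                       % true frequency, rad/sample
x   = cos(w_true*(0:N+hop-1)');              % test signal, one hop longer than N
win = 0.5*(1 - cos(2*pi*(0:N-1)'/N));        % periodic Hann window
X1  = fft(win .* x(1:N));                    % frame starting at sample 1
X2  = fft(win .* x(hop + (1:N)));            % frame one hop later
n   = 40;                                    % nearest bin, counted from 0
k   = n + 1;                                 % Matlab index of that bin
dphi_n = 2*pi*hop*n/N;                       % expected advance at the bin centre
dp  = angle(X2(k)) - angle(X1(k)) - dphi_n;  % deviation from the expectation
dp  = dp - 2*pi*round(dp/(2*pi));            % wrap to -pi:pi
w_est = 2*pi*n/N + dp/hop;                   % freq(n) = 2*pi*n/N + dp(n)/hop
fprintf('true %.6f, estimated %.6f rad/sample\n', w_true, w_est);
</tt></pre>
<p>The wrap is only unambiguous if the true frequency lies within N/(2*hop) bins of the bin center (2 bins with hop = N/4 as here), which is one reason a smaller hop relative to N makes the estimate more robust.</p>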
<p>
When we come to reconstruct the output spectrogram, for each column we
cumulate a phase advance consistent with the current sampling point in
the original STFT, which is just the original phase difference,
assuming the output and input hop sizes match. But if the output
hop size were different, we'd need to know the actual effective
frequency in each bin, so we could scale it by a different ohop
before collapsing down to -pi:pi. That's when separating into dphi
and dp would be important.
</p>
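<p>The reason is that a phase difference that has already been wrapped to -pi:pi cannot simply be rescaled. In this hypothetical example (the values and the name ohop are assumptions), scaling the wrapped observation by ohop/hop gives a different output phase advance from scaling the unwrapped estimate dphi + dp, and only the latter equals the estimated frequency multiplied by ohop.</p>
<pre><tt>
% Hypothetical illustration: rescaling the phase advance for a different
% output hop (ohop and all values here are assumptions)
N = 512;  hop = 128;  ohop = 160;
n = 40;   dp = 0.58;                        % example bin and wrapped deviation
wrap = @(p) p - 2*pi*round(p/(2*pi));       % collapse to -pi:pi
dphi_n   = 2*pi*hop*n/N;                    % expected advance at the bin centre
observed = wrap(dphi_n + dp);               % what an angle() difference reveals
naive    = wrap(observed*ohop/hop);         % rescaling the wrapped value: wrong
correct  = wrap((dphi_n + dp)*ohop/hop);    % rescaling dphi + dp: equals freq*ohop
fprintf('naive %.4f rad, correct %.4f rad\n', naive, correct);
</tt></pre>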
<p>
But you're correct: in the current code it does nothing!
</p>

<h3>References</h3>

<dl>
<dt><b>[FlanG66]</b></dt>
<dd>J. L. Flanagan and R. M. Golden, "Phase Vocoder,"
Bell System Technical Journal, pp. 1493-1509, November 1966.
<br>
<a href="http://www.ee.columbia.edu/%7Edpwe/e6820/papers/FlanG66.pdf">
http://www.ee.columbia.edu/~dpwe/e6820/papers/FlanG66.pdf</a>
</dd>

<dt><b>[Port76]</b></dt>
<dd>M. R. Portnoff, "Implementation of the Digital Phase Vocoder Using the Fast Fourier Transform,"
IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 24, no. 3, pp. 243-248, June 1976.
<br>
<a href="http://www.ee.columbia.edu/%7Edpwe/papers/Portnoff76-pvoc.pdf">
http://www.ee.columbia.edu/~dpwe/papers/Portnoff76-pvoc.pdf</a>
</dd>

<dt><b>[Dols86]</b></dt>
<dd>M. Dolson, "The Phase Vocoder: A Tutorial," Computer Music Journal,
vol. 10, no. 4, pp. 14-27, 1986.
<br>
<a href="http://www.panix.com/%7Ejens/pvoc-dolson.par">
http://www.panix.com/~jens/pvoc-dolson.par</a>
</dd>

<dt><b>[LaroD99]</b></dt>
<dd>J. Laroche and M. Dolson, "New Phase-Vocoder Techniques for
Pitch-Shifting, Harmonizing and Other Exotic Effects,"
IEEE Workshop on Applications of Signal Processing to Audio and
Acoustics, Mohonk, New Paltz, NY, 1999.
<br>
<a href="http://www.ee.columbia.edu/%7Edpwe/papers/LaroD99-pvoc.pdf">
http://www.ee.columbia.edu/~dpwe/papers/LaroD99-pvoc.pdf</a>
</dd>
</dl>

<p>
There are also recommended tutorials at
<a href="http://www.dspdimension.com/admin/time-pitch-overview/">
Stephan Bernsee's DSP Dimension</a> and by
<a href="http://eceserv0.ece.wisc.edu/%7Esethares/vocoders/phasevocoder.html">Bill Sethares at Wisconsin</a>.
</p>

<h3>Referencing this work</h3>
<p>
I do not have any publication describing this code, since it is
basically a straightforward implementation of the Flanagan &amp; Golden /
Dolson phase vocoder. However, if you use this code and would like
to acknowledge it with a reference, you could consider something like
this:
</p>
<pre><tt>@misc{Ellis02-pvoc,
author = {D. P. W. Ellis},
year = {2002},
title = {A Phase Vocoder in {M}atlab},
note = {Web resource},
url = {http://www.ee.columbia.edu/~dpwe/resources/matlab/pvoc/},
}
</tt></pre>

<h3>History</h3>
<p><b>2003-03-06</b> Added pitch shifting/harmonization example.</p>
<p><b>2002-02-13</b> Revised version uses stft/istft for perfect reconstruction when r = 1. More stuff on page.</p>
<p><b>2000-12-11</b> First version of this page, after demo'ing in E4810.</p>

<hr align="LEFT">
<address>
Last updated: $Date: 2010/10/26 18:55:10 $</address>

<br><a href="http://www.ee.columbia.edu/%7Edpwe/">Dan Ellis</a> &lt;<a href="mailto:dpwe@ee.columbia.edu">dpwe@ee.columbia.edu</a>&gt;
<br>

</body></html>

Note: See TracBrowser for help on using the repository browser.

Context Navigation

source: audiofilter/doc/columbia.edu/A Phase Vocoder in Matlab.html

Download in other formats: