Re: Audio to pitch is not a solved problem. Don't expect it to work for general cases anytime soon!

From: James Rogers (
Date: Sun May 28 2000 - 09:26:13 MDT

On Sat, 27 May 2000, phil osborn wrote:
> At the first CyberArts conference in L.A. in 1990 - the BEST digital arts
> conference ever held! - a man who created the systems used by the big record
> houses to clean up recordings demonstrated a live system into which he read
> a poem. He then stated that this system could take his input and output it
> as ANY sound, retaining legibility. Then he demonstrated it, having the
> system speak back his vocalization, using the sound of waves crashing on a
> beach. It was waves crashing on a beach, reading poetry, retaining perfect
> legibility and retaining his style. Clearly, such a system could take my
> rather lousy singing voice and map it to Caruso. So, where can I buy it?

There are a few different types of algorithms which can do this type of
effect, with varying degrees of precision and differing qualities to the
sound. Some of the simplest ones, such as vocoding, take very little DSP
power even when a high number of bands are used. Vocoding, by its nature,
has a "filtered" and often identifiable sound to it, but it preserves a
lot of the character of the source when a large number of bands (say 30)
is used and the filtering is done with good (read: expensive) filters.
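
To make the band-by-band idea concrete, here is a minimal channel vocoder
sketch in plain Python. It is only an illustration: the function names, the
band count, the band edges, the filter Q and the envelope smoothing are all
assumptions chosen for the example, and the simple second-order filters here
are exactly the cheap kind that give the "filtered" sound described above.

```python
import math
import random

def bandpass(signal, center_hz, q, sample_rate):
    """Second-order band-pass filter (standard biquad form, unity peak gain)."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    b0, b2 = alpha, -alpha                      # b1 is zero for this filter
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def envelope(signal, sample_rate, smooth_hz=50.0):
    """Rectify, then smooth with a one-pole low-pass to follow band loudness."""
    coeff = math.exp(-2.0 * math.pi * smooth_hz / sample_rate)
    level = 0.0
    out = []
    for x in signal:
        level = coeff * level + (1.0 - coeff) * abs(x)
        out.append(level)
    return out

def vocode(modulator, carrier, sample_rate, bands=8,
           low_hz=200.0, high_hz=3000.0, q=6.0):
    """Impose the per-band loudness of `modulator` onto `carrier`.

    Assumes bands >= 2 and high_hz well below sample_rate / 2.
    """
    n = min(len(modulator), len(carrier))
    modulator, carrier = modulator[:n], carrier[:n]
    out = [0.0] * n
    for i in range(bands):
        # Log-spaced band centres between low_hz and high_hz (an assumption).
        center = low_hz * (high_hz / low_hz) ** (i / (bands - 1))
        env = envelope(bandpass(modulator, center, q, sample_rate), sample_rate)
        car = bandpass(carrier, center, q, sample_rate)
        for k in range(n):
            out[k] += env[k] * car[k]
    return out

if __name__ == "__main__":
    sr = 8000
    voice = [math.sin(2.0 * math.pi * 440.0 * k / sr) for k in range(sr // 10)]
    noise = [random.uniform(-1.0, 1.0) for _ in range(sr // 10)]
    shaped = vocode(voice, noise, sr)
    print(len(shaped), "samples of noise shaped by the voice")
```

The cost grows only linearly with the number of bands and the number of
samples, which is why this family of effects is so cheap compared with the
convolving approach discussed next.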

What is being described above at the CyberArts conference is probably an
audio convolving algorithm. These types of matrix algorithms are pretty
expensive DSP-wise to do in real-time on Red Book audio (16-bit @ 44.1kHz),
but they can produce very clean results, without the processing artifacts
that the computationally cheaper classes of algorithms leave in the output
signal. In 1990 it would have required a high-end DSP engine to do in
real-time. It *is* interesting that the choice for the second signal was
waves crashing on a beach -- a classic "pink noise" source. Most cheap
algorithms do not preserve articulation as well with noisy carriers (in
this case almost pure noise), which lends support to the guess that a
convolving algorithm was used.
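
For anyone who has not met the operation, convolving two sounds can be
sketched in a few lines of plain Python. This is the textbook direct
(time-domain) form, shown only to illustrate the idea; it is not known to be
what the CyberArts system or any commercial product actually runs, and the
`convolve` and `normalize` names are made up for the example.

```python
def convolve(a, b):
    """Direct (time-domain) convolution of two lists of samples.

    Every sample of `a` triggers a scaled copy of `b`, and the copies
    are summed, so the result is len(a) + len(b) - 1 samples long.
    """
    if not a or not b:
        return []
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def normalize(samples, peak=1.0):
    """Scale so the loudest sample has magnitude `peak` (silence is left alone)."""
    loudest = max((abs(s) for s in samples), default=0.0)
    if loudest == 0.0:
        return list(samples)
    return [s * peak / loudest for s in samples]

# Tiny stand-ins for two recordings.
voice = [1.0, 2.0, 3.0]
texture = [0.0, 1.0, 0.5]
print(convolve(voice, texture))  # [0.0, 1.0, 2.5, 4.0, 1.5]
```

Convolving in the time domain is the same as multiplying the two spectra, so
the output only has energy at frequencies where both inputs do. The nested
loop also shows where the cost comes from: the work is the product of the
two lengths.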

For a rough idea of the computational requirements for convolving
algorithms, I have a middle-aged audio sampler from E-mu Systems that
included a convolver utility as part of the operating system (EOS v2.5 at
the time). Although the processor is admittedly pretty weak (I believe it
is a Motorola 68k at around 25MHz), it takes the system around 10 minutes
of processing time per second of Red Book audio to convolve two samples.
Of course, a proper DSP would do these types of ops much faster at the
same clock speed.
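
As a back-of-the-envelope check on those numbers, the sketch below counts
the multiply-accumulate operations (MACs) a direct convolution would need
and works out what the quoted timing would imply per operation. The clock
speed, sample rate and 10 minute figure come from the paragraph above;
everything else is an assumption made for the estimate, namely that both
samples are one second long and that the utility uses the direct
time-domain form. Neither is stated anywhere, so treat the result as an
order-of-magnitude figure only.

```python
SAMPLE_RATE = 44_100        # Red Book sample rate
CLOCK_HZ = 25_000_000       # "around 25MHz"
SECONDS_OF_WORK = 10 * 60   # "around 10 minutes"

def direct_convolution_macs(len_a, len_b):
    """Direct convolution does one multiply-accumulate per pair of samples."""
    return len_a * len_b

def implied_cycles_per_mac(clip_seconds=1.0):
    """Clock cycles available per MAC, ASSUMING two clips of equal length
    and a direct time-domain algorithm (both are guesses)."""
    n = int(SAMPLE_RATE * clip_seconds)
    return CLOCK_HZ * SECONDS_OF_WORK / direct_convolution_macs(n, n)

print(direct_convolution_macs(SAMPLE_RATE, SAMPLE_RATE))  # 1944810000
print(round(implied_cycles_per_mac(), 1))                 # 7.7
```

Under those assumptions, two one-second samples need roughly two billion
MACs. FFT-based convolution scales as N log N rather than as the product of
the two lengths, which is one reason the same job gets so much cheaper on
hardware and software built for it.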

Subjectively, convolvers are wicked cool audio processing utilities and
I've gotten great results with them on occasion when looking for
interesting sounds/textures. I actually have more use for convolvers than
for the ever-popular vocoders.

-James Rogers
