Data reconstruction (was Dithering)

From: James Rogers (jamesr@best.com)
Date: Mon Oct 15 2001 - 01:22:26 MDT


On 10/14/01 10:37 PM, "Amara Graps" <Amara.Graps@mpi-hd.mpg.de> wrote:
>
> (It's not published yet.) The idea is to use the statistical
> characteristics of the data to fill in gaps for modelling
> purposes. Note: you are NOT creating *real* data (i.e. the folks who
> need to make predictions based on real data should not use this). But
> I think that this method has a valid use for those folks who can't
> apply a data analysis on their data because of small gaps in the
> time-series, or are trying to show long term dynamics of a system.
> For example, the 'standard' wavelet transforms which determine
> frequencies require as input time series data on an evenly-spaced time
> grid.

Just out of curiosity, why couldn't full spectral re-synthesis be used to
interpolate the data rather than a statistical model? It isn't trivial, but
I'm pretty sure it would work, though I haven't thought it through in detail.
Arguably the accuracy of a high-resolution spectral re-synthesis is going to
be extremely good and properly general if you are willing to throw a little
silicon at it. I tend to shy away from statistical models because many of
them, though not all, seem to be sensitive to anomalies in the data; even
many of the adaptive ones are built around the "expected case" and can do
ugly things when they come across something unusual. (Also, in my experience
most statistical models are specific cases of a general model waiting to be
discovered.)
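
To make that concrete, here is a minimal sketch of the kind of spectral
re-synthesis gap-fill I have in mind, written in Python with numpy purely for
illustration; the function name, component count, and iteration count are
mine, not anything standard. The idea: keep the strongest spectral
components, re-synthesize the series from them, re-impose the measured
samples, and iterate.

import numpy as np

def spectral_gap_fill(y, known, n_components=20, n_iter=200):
    """Fill the samples where known == False by iterative spectral re-synthesis.

    y     : 1-D array on an evenly spaced grid; gap samples can hold anything
    known : boolean mask, True where the measured values are trusted
    """
    x = np.where(known, y, y[known].mean())   # crude initial fill for the gaps
    for _ in range(n_iter):
        spec = np.fft.rfft(x)
        keep = np.argsort(np.abs(spec))[-n_components:]   # strongest components
        thinned = np.zeros_like(spec)
        thinned[keep] = spec[keep]                        # the "re-synthesis"
        x = np.fft.irfft(thinned, n=len(x))
        x[known] = y[known]            # never touch the real measurements
    return x

# Toy usage: a two-tone signal with roughly 20% of its samples knocked out.
t = np.linspace(0, 1, 512, endpoint=False)
clean = np.sin(2 * np.pi * 7 * t) + 0.5 * np.sin(2 * np.pi * 23 * t)
known = np.random.default_rng(0).random(512) > 0.2
filled = spectral_gap_fill(np.where(known, clean, 0.0), known)
print("RMS error on the gaps:",
      np.sqrt(np.mean((filled[~known] - clean[~known]) ** 2)))

The re-imposition step is what keeps the reconstruction honest at the samples
you actually measured; everything inside the gaps is pure re-synthesis.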

Unrelated to this discussion, it shocks me a bit to see some of the archaic
methods for analyzing and working with time series data still used in many
parts of industry, and even in science and engineering. The engineering
discipline of signal processing has very mature and extremely general
mathematics for handling just about any aspect of generic time series data
you could want, yet many of those algorithms are rarely applied outside that
discipline. I've applied these techniques in an "unorthodox" manner on some
past projects and wowed people whom I really thought should have known
better. I guess being a polymath has its advantages.

 
> The problem with kriging, or any classical interpolation method, is
> that the method does not preserve the observed variability of the
> data. In addition, those methods cannot usually be calibrated to the
> analyzed observations. With this (which is what I think a much smarter
> approach), the temporal auto-correlation is honored and the observed
> variability of the data is conserved. A time series missing 20% of its
> values can be reconstructed without gaps preserving its temporal
> behavior in a statistical sense.

It should be possible to accurately re-synthesize the expected variability in
a way that preserves most of the hidden context, if there is any, and to add
statistically correct noise if there isn't. Obviously the reconstructed data
isn't real, but you could in theory generate essentially correct data
behavior by convolving the "noise" fingerprint back into the reconstructed
data, particularly since that information can be extracted cheaply during the
primary re-synthesis stage of the reconstruction. This assumes you are doing
a genuine re-synthesis rather than a computationally much cheaper form of
interpolation, but it is a minor problem even if you aren't. A few existing
niche applications already do something like this, so the idea isn't
completely novel.
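
Continuing the same illustrative sketch as above (same caveats: numpy
assumed, the names and numbers are mine), the "noise fingerprint" idea could
look roughly like this: measure the residual the spectral model cannot
represent at the trusted samples, generate noise with the same amplitude
spectrum and variance (equivalent, second-order-wise, to convolving white
noise with the residual's fingerprint), and add it only to the synthetic
samples.

def add_matched_noise(filled, y, known, n_components=20, rng=None):
    """Pour spectrally matched noise back into the gap samples of a filled series."""
    rng = np.random.default_rng() if rng is None else rng
    # Re-project onto the strongest spectral components to recover the smooth
    # model; what it misses at the measured samples is the noise fingerprint.
    spec = np.fft.rfft(filled)
    keep = np.argsort(np.abs(spec))[-n_components:]
    thinned = np.zeros_like(spec)
    thinned[keep] = spec[keep]
    model = np.fft.irfft(thinned, n=len(filled))
    resid = y[known] - model[known]
    # Same amplitude spectrum, random phases => noise with the same
    # second-order statistics as the measured residual.
    amp = np.abs(np.fft.rfft(resid - resid.mean()))
    phases = rng.uniform(0.0, 2.0 * np.pi, len(amp))
    fake = np.fft.irfft(amp * np.exp(1j * phases), n=len(resid))
    fake *= resid.std() / (fake.std() + 1e-12)   # pin the variance as well
    out = filled.copy()
    # Assumes fewer gap samples than measured samples (true for ~20% loss).
    gaps = np.flatnonzero(~known)
    out[gaps] += fake[:len(gaps)]
    return out

noisy = add_matched_noise(filled, np.where(known, clean, 0.0), known)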

Now, this would be computationally very expensive if your time series is
large; doing this kind of correction in real time on simple audio data today
takes a dozen or two high-end DSPs working in parallel, and I can only
imagine how long millions of data points would take on a standard computer.
However, the reason people spend the money to do this is that the
reconstruction is absolutely superb and the algorithm is very general; it
doesn't generate any noticeable anomalies in the reconstructed data even with
relatively high loss. Ironically, most people don't use these algorithms to
add noise to data (quite the opposite), but they certainly could, and
apparently there is an application for it. :^)

I wrote a program that did a very limited, low-resolution version of
something similar to this about five years ago for a specific analytical
purpose. I doubt I still have the code, but it might be an interesting
mini-project to program a generalized, highly accurate, time-series
re-synthesis engine with a bunch of configurable controls. I'm not sure
such a general tool exists, but it might be nice to have.

Cheers,

-James Rogers
 jamesr@best.com


