Dept. of Computer Science and Engineering
Oregon Health & Science University
In this thesis, we present a latent data framework that facilitates formalizing observations of data behavior into statistical models. Using this framework, we derive two related models for a broad category of real-world data that includes images, speech data, and other measurements from natural processes. These models take the form of constrained Gaussian mixture models. Our statistical models lead to new algorithms for adaptive transform coding, a common method of signal compression, and for adaptive principal component analysis, a technique for data modeling and analysis.

Adaptive transform coding is a computationally attractive method for compressing non-stationary multivariate data. A classic transform coder converts signal vectors to a new coordinate basis and then codes the transform coefficient values independently with scalar quantizers. An adaptive transform coder partitions the data into regions and compresses the vectors in each region with a custom transform coder. Prior art treats the development of transform coders heuristically, chaining sub-optimal operations together. Instead of this ad hoc approach, we start from a statistical model of the data. Using this model, we derive, in closed form, a new optimal linear transform for coding. We incorporate this transform into a new transform coding algorithm that provides an optimal solution for non-stationary signal compression. We evaluate our adaptive transform coder on the task of image compression. Our results show that a single adaptive transform coder can compress database images with quality comparable to or better than a set of current state-of-the-art coders customized to each image in the database.

Adaptive principal component analysis (PCA) is an effective modeling tool for high-dimensional data. Classic PCA models high-dimensional data by finding the low-dimensional hyperplane closest to the data. Adaptive or local PCA partitions the data into regions and performs PCA on the data within each region.
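The classic transform coding pipeline described above — transform to a new basis, then quantize each coefficient independently — can be sketched as follows. This is a minimal illustration assuming a PCA-derived (Karhunen-Loève) basis and a uniform scalar quantizer; the function names and step size are illustrative, not the optimal transform derived in the thesis.

```python
import numpy as np

def fit_klt_basis(X):
    """Estimate an orthogonal transform basis from training data
    (Karhunen-Loeve / PCA basis: eigenvectors of the sample covariance)."""
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)
    _, vecs = np.linalg.eigh(cov)          # eigh returns ascending order
    return mu, vecs[:, ::-1]               # columns by decreasing variance

def transform_code(X, mu, basis, step=0.5):
    """Transform vectors to the new basis, then code each coefficient
    independently with a uniform scalar quantizer of width `step`."""
    coeffs = (X - mu) @ basis
    return np.round(coeffs / step).astype(int)   # quantizer indices

def transform_decode(indices, mu, basis, step=0.5):
    """Reconstruct: dequantize the coefficients and invert the transform."""
    return (indices * step) @ basis.T + mu

# Demo on synthetic correlated data (illustrative, not thesis data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 8))
mu, basis = fit_klt_basis(X)
idx = transform_code(X, mu, basis)
X_hat = transform_decode(idx, mu, basis)
mse = np.mean((X - X_hat) ** 2)   # distortion from quantization only
```

Because the basis is orthogonal, the reconstruction error equals the coefficient quantization error, so the distortion is controlled directly by the quantizer step size.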
Prior art underestimates the potential of this method by requiring a single global target dimension for the model hyperplanes. We develop a statistical model of the data that allows the target dimension to adjust to the data structure. This formulation leads to a new algorithm for adaptive PCA, which minimizes dimension-reduction error subject to an entropy constraint. The entropy constraint, which derives naturally from the probability model, effectively controls model complexity when training data is sparse. We evaluate our adaptive PCA models on two tasks: exploratory data analysis of salinity and temperature measurements from the Columbia River estuary, and texture image segmentation. Our results show that entropy-constrained adaptive PCA conforms to the natural cluster structure of data better than state-of-the-art modeling methods.
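The local PCA idea above — partition the data into regions, fit a separate hyperplane per region, and let each region's dimension adapt — can be sketched as follows. This is a simplified illustration: the partition uses plain k-means and the per-region dimension is chosen by a captured-variance threshold, a crude stand-in for the entropy constraint derived in the thesis; all names and parameters are assumptions for the sketch.

```python
import numpy as np

def local_pca(X, n_regions=2, var_target=0.95, seed=0):
    """Illustrative local PCA: partition data with a few k-means
    iterations, then fit a PCA hyperplane to each region. The local
    dimension k adapts to each region's structure."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_regions, replace=False)].copy()
    for _ in range(20):  # plain k-means partitioning
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for r in range(n_regions):
            pts = X[labels == r]
            if len(pts):
                centers[r] = pts.mean(axis=0)
    models = []
    for r in range(n_regions):
        Xr = X[labels == r] - centers[r]
        cov = Xr.T @ Xr / max(len(Xr), 1)        # local covariance
        vals, vecs = np.linalg.eigh(cov)
        vals, vecs = vals[::-1], vecs[:, ::-1]   # descending variance
        total = vals.sum()
        # smallest k capturing var_target of the local variance
        k = 1 if total <= 0 else 1 + int(
            np.searchsorted(np.cumsum(vals) / total, var_target))
        models.append((centers[r], vecs[:, :k]))  # local mean + basis
    return labels, models

# Demo on two synthetic clusters (illustrative, not thesis data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(100, 3)),
               rng.normal(size=(100, 3)) + 10.0])
labels, models = local_pca(X)
```

Replacing the variance threshold with an entropy penalty on the partition, as the thesis does, is what keeps model complexity in check when training data is sparse.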
OGI School of Science and Engineering
Archer, Cynthia, "A framework for representing non-stationary data with mixtures of linear models" (2002). Scholar Archive. 176.