Date

October 2002

Document Type

Dissertation

Degree Name

Ph.D.

Department

Dept. of Computer Science and Engineering

Institution

Oregon Health & Science University

Abstract

In this thesis, we present a latent data framework that facilitates formalizing observations of data behavior into statistical models. Using this framework, we derive two related models for a broad category of real-world data that includes images, speech data, and other measurements from natural processes. These models take the form of constrained Gaussian mixture models. Our statistical models lead to new algorithms for adaptive transform coding, a common method of signal compression, and adaptive principal component analysis, a technique for data modeling and analysis.

Adaptive transform coding is a computationally attractive method for compressing non-stationary multivariate data. A classic transform coder converts signal vectors to a new coordinate basis and then codes the transform coefficient values independently with scalar quantizers. An adaptive transform coder partitions the data into regions and compresses the vectors in each region with a custom transform coder. Prior art treats the development of transform coders heuristically, chaining sub-optimal operations together. Instead of this ad hoc approach, we start from a statistical model of the data. Using this model, we derive, in closed form, a new optimal linear transform for coding. We incorporate this transform into a new transform coding algorithm that provides an optimal solution for non-stationary signal compression. We evaluate our adaptive transform coder on the task of image compression. Our results show that a single adaptive transform coder can compress database images with quality comparable to or better than a set of current state-of-the-art coders customized to each image in the database.

Adaptive principal component analysis (PCA) is an effective modeling tool for high-dimensional data. Classic PCA models high-dimensional data by finding the closest low-dimensional hyperplane to the data. Adaptive or local PCA partitions data into regions and performs PCA on the data within each region. Prior art underestimates the potential of this method by requiring a single global target dimension for the model hyperplanes. We develop a statistical model of the data that allows the target dimension to adjust to the data structure. This formulation leads to a new algorithm for adaptive PCA, which minimizes dimension reduction error subject to an entropy constraint. The entropy constraint, which derives naturally from the probability model, effectively controls model complexity when training data is sparse. We evaluate our adaptive PCA models on two tasks: exploratory data analysis of salinity and temperature measurements from the Columbia River estuary, and texture image segmentation. Our results show that entropy-constrained adaptive PCA conforms to the natural cluster structure of data better than state-of-the-art modeling methods.
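
As an illustration of the classic transform coder described in the abstract (a change of coordinate basis followed by independent scalar quantization of each coefficient), the following is a minimal sketch and not the thesis's method. It assumes a PCA (Karhunen-Loeve) basis as the transform and a single uniform quantizer step size for all coefficients; the function names `fit_transform_coder`, `encode`, and `decode`, the synthetic data, and the step size are all hypothetical choices made for this example. The optimal transform derived in the thesis is not reproduced here.

```python
import numpy as np


def fit_transform_coder(X):
    """Estimate the mean and an orthonormal PCA basis from the rows of X.

    Assumption: PCA stands in for the coding transform; this is the
    textbook choice, not the optimal transform derived in the thesis.
    """
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    # eigh returns eigenvalues in ascending order; reverse the columns so
    # that the highest-variance directions come first.
    _, vecs = np.linalg.eigh(cov)
    return mean, vecs[:, ::-1]


def encode(X, mean, basis, step):
    """Project onto the basis, then quantize each coefficient independently."""
    coeffs = (X - mean) @ basis
    return np.round(coeffs / step).astype(int)


def decode(indices, mean, basis, step):
    """Map quantizer indices back to coefficient values and invert the transform."""
    return (indices * step) @ basis.T + mean


# Synthetic correlated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))

mean, basis = fit_transform_coder(X)
step = 0.1  # uniform step size, an assumption of this sketch
X_hat = decode(encode(X, mean, basis, step), mean, basis, step)

# Largest per-coefficient reconstruction error, measured in the transform domain.
max_err = np.abs((X - X_hat) @ basis).max()
```

Because the basis is orthonormal, each transform coefficient is reconstructed to within half a quantizer step, so `max_err` should not exceed `step / 2` up to floating-point rounding. An adaptive coder in the sense of the abstract would partition the data into regions and fit a separate coder of this kind to each region.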

Identifier

doi:10.6083/M47P8WB5

School

OGI School of Science and Engineering
