Author

Umut Ozertem

Date

September 2008

Document Type

Dissertation

Degree Name

Ph.D.

Department

Dept. of Science & Engineering

Institution

Oregon Health & Science University

Abstract

Defined as self-consistent smooth curves passing through the middle of the data, principal curves are used in many machine learning applications as a tool for generalization, dimensionality reduction, and feature extraction. Neither the amount of smoothness nor the middle of the data is well defined, and this ill-posed definition leads to practical difficulties in designing principal curve fitting algorithms. The main causes are the desire to use global statistics, such as conditional expectations, to build a self-consistent definition, and the failure to decouple the definition of the principal curve from the data samples. We take a novel approach by redefining principal curves, surfaces, and manifolds of a particular intrinsic dimensionality, which we characterize in terms of the gradient and the Hessian of the probability density estimate. The theory provides a geometric understanding of principal curves and surfaces, and a unifying view of clustering, principal curve fitting, and manifold learning by regarding them as principal manifolds of different intrinsic dimensionalities. Given the probability density of the data, the principal manifold of any intrinsic dimensionality, and the projection of any point in the feature space onto it, are uniquely defined. In practice, however, probability densities are never known and must be estimated from the data samples. Our definition of principal curves and surfaces does not impose any particular density estimation method; we provide results for density estimates based on kernel density estimation and Gaussian mixture models. We emphasize natural connections between challenges in principal curve fitting and known results in kernel density estimation, and develop several practical algorithms to find principal curves and surfaces from data.
To present the practical aspects of our theoretical contribution, we apply our principal curve algorithms to a diverse set of problems, including image segmentation, time warping, piecewise smooth signal denoising, manifold unwrapping, optical character skeletonization, sharpening of time-frequency distributions, multiple-input multiple-output channel equalization, and neighborhood graph construction. Overall, this dissertation presents a theoretical contribution that brings a novel understanding of principal curves and surfaces, practical algorithms as general-purpose machine learning tools, and applications of these algorithms to practical problems in several research areas.
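
As a rough illustration of the quantities named in the abstract, the sketch below evaluates the gradient and Hessian of an isotropic Gaussian kernel density estimate at a query point. The function names, the fixed bandwidth, and the toy data are hypothetical. The orthogonality test in `principal_curve_residual` (the gradient should have no component along the Hessian eigenvectors other than the one with the largest eigenvalue) is an assumed reading of "characterized in terms of the gradient and the Hessian" for a one-dimensional principal curve. It is not taken from this page and should not be read as the dissertation's algorithm.

```python
import numpy as np


def kde_grad_hess(x, data, bandwidth):
    """Value, gradient, and Hessian of an isotropic Gaussian KDE at point x.

    x: shape (d,), data: shape (n, d), bandwidth: positive scalar.
    """
    n, d = data.shape
    h2 = bandwidth ** 2
    diff = x - data                                   # (n, d), x - x_i
    norm = (2.0 * np.pi * h2) ** (-d / 2.0)
    w = norm * np.exp(-0.5 * np.sum(diff ** 2, axis=1) / h2)  # kernel values
    p = w.mean()
    # Gradient of each kernel is -K_i * (x - x_i) / h^2.
    grad = -(w[:, None] * diff).mean(axis=0) / h2
    # Hessian of each kernel is K_i * ((x - x_i)(x - x_i)^T / h^4 - I / h^2).
    outer = np.einsum("i,ij,ik->jk", w, diff, diff) / n
    hess = outer / h2 ** 2 - p * np.eye(d) / h2
    return p, grad, hess


def principal_curve_residual(x, data, bandwidth):
    """Assumed criterion: size of the gradient component along the d - 1
    Hessian eigenvectors with the smallest eigenvalues. A value near zero
    suggests x lies on a one-dimensional ridge of the density estimate."""
    _, grad, hess = kde_grad_hess(x, data, bandwidth)
    _, eigvecs = np.linalg.eigh(hess)                 # ascending eigenvalues
    minor_dirs = eigvecs[:, :-1]                      # drop the largest one
    return np.linalg.norm(minor_dirs.T @ grad)


if __name__ == "__main__":
    # Hypothetical toy data: five samples lying on the horizontal axis.
    data = np.array([[-2.0, 0.0], [-1.0, 0.0], [0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
    on_axis = principal_curve_residual(np.array([0.5, 0.0]), data, 1.0)
    off_axis = principal_curve_residual(np.array([0.5, 0.3]), data, 1.0)
    print("residual on the axis: ", on_axis)
    print("residual off the axis:", off_axis)
```

With this toy data the residual vanishes for points on the horizontal axis and is clearly nonzero off it, which matches the intuition of a curve passing through the middle of the data. A Gaussian mixture density could be substituted for the kernel estimate without changing the structure of the check.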

Identifier

doi:10.6083/M4ZG6Q60

Division

Div. of Biomedical Computer Science

School

School of Medicine
