Dept. of Science & Engineering
Oregon Health & Science University
Defined as self-consistent smooth curves passing through the middle of the data, principal curves are used in many machine learning applications as a nonlinear generalization of principal component analysis and as a tool for dimensionality reduction and feature extraction. However, neither the amount of smoothness nor the middle of the data is well defined, and this ill-posed definition leads to practical difficulties in designing principal curve fitting algorithms. The main causes are the desire to use global statistics, such as conditional expectations, to build a self-consistent definition, and the failure to decouple the definition of the principal curve from the data samples. We take a novel approach and redefine principal curves, surfaces, and manifolds of a given intrinsic dimensionality, characterizing them in terms of the gradient and the Hessian of the probability density estimate. The theory provides a geometric understanding of principal curves and surfaces, as well as a unifying view of clustering, principal curve fitting, and manifold learning, which are regarded as principal manifolds of different intrinsic dimensionalities. Given the probability density of the data, the principal manifold of any intrinsic dimensionality, and the projection of any point in the feature space onto it, are uniquely defined. In practice, however, probability densities are never known and must be estimated from the data samples. Our definition of principal curves and surfaces does not impose any particular density estimation method; we provide results for density estimates based on kernel density estimation and Gaussian mixture models. We emphasize natural connections between the challenges of principal curve fitting and known results in kernel density estimation, and we develop several practical algorithms to find principal curves and surfaces from data.
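The local, gradient-and-Hessian characterization described above can be illustrated for a two-dimensional Gaussian kernel density estimate. This is a minimal sketch under stated assumptions, not the dissertation's implementation:

- It assumes an isotropic Gaussian kernel with bandwidth h.
- It treats a point as lying on the principal curve (intrinsic dimensionality 1) when two conditions hold. The density gradient must be orthogonal to the Hessian eigenvector with the smallest eigenvalue, and that eigenvalue must be negative. Choosing this particular eigenvector is an assumption.
- It projects points with a subspace-constrained mean-shift style iteration that only moves along that eigenvector.

The helper names (`kde_grad_hess`, `smallest_eigpair`, `on_principal_curve`, `project_scms`) and the toy data are hypothetical.

```python
import math


def kde_grad_hess(x, data, h):
    """Value, gradient and Hessian of a 2-D Gaussian KDE with isotropic bandwidth h."""
    norm = 1.0 / (2.0 * math.pi * h * h * len(data))
    p = 0.0
    g = [0.0, 0.0]
    H = [[0.0, 0.0], [0.0, 0.0]]
    for xi, yi in data:
        u = (x[0] - xi, x[1] - yi)
        c = norm * math.exp(-(u[0] ** 2 + u[1] ** 2) / (2.0 * h * h))
        p += c
        for r in range(2):
            # d/dx_r of the kernel: -(u_r / h^2) * K
            g[r] -= c * u[r] / h ** 2
            for s in range(2):
                # d^2/dx_r dx_s of the kernel: (u_r u_s / h^4 - delta_rs / h^2) * K
                delta = 1.0 if r == s else 0.0
                H[r][s] += c * (u[r] * u[s] / h ** 4 - delta / h ** 2)
    return p, g, H


def smallest_eigpair(H):
    """Smallest eigenvalue of a symmetric 2x2 matrix and a unit eigenvector for it."""
    a, b, c = H[0][0], H[0][1], H[1][1]
    lam = 0.5 * (a + c) - math.sqrt(0.25 * (a - c) ** 2 + b * b)
    # Both vectors solve (H - lam I) v = 0; keep the longer one for numerical stability.
    v1 = (lam - c, b)
    v2 = (b, lam - a)
    v = v1 if math.hypot(*v1) >= math.hypot(*v2) else v2
    length = math.hypot(*v)
    if length == 0.0:  # H is a multiple of I: any direction is an eigenvector
        return lam, (1.0, 0.0)
    return lam, (v[0] / length, v[1] / length)


def on_principal_curve(x, data, h, tol=1e-6):
    """Local test: gradient orthogonal to the min-eigenvalue direction, which must curve down."""
    p, g, H = kde_grad_hess(x, data, h)
    lam, v = smallest_eigpair(H)
    return abs(v[0] * g[0] + v[1] * g[1]) <= tol * p and lam < 0.0


def project_scms(x, data, h, tol=1e-8, max_iter=500):
    """Move x onto the principal curve using mean-shift steps restricted to one direction."""
    x = [float(x[0]), float(x[1])]
    for _ in range(max_iter):
        p, g, H = kde_grad_hess(x, data, h)
        if p == 0.0:  # density underflowed; no meaningful direction to follow
            break
        _, v = smallest_eigpair(H)
        # For a Gaussian kernel the mean-shift vector equals h^2 * grad p / p.
        m = (h * h * g[0] / p, h * h * g[1] / p)
        step = v[0] * m[0] + v[1] * m[1]  # component of m along v
        x[0] += step * v[0]
        x[1] += step * v[1]
        if abs(step) < tol:
            break
    return tuple(x)


# Hypothetical toy data: three parallel rows of points at y = -0.3, 0.0 and 0.3.
data = [(0.2 * i, y) for i in range(-15, 16) for y in (-0.3, 0.0, 0.3)]
x_proj = project_scms((0.1, 0.5), data, h=0.5)
print(x_proj)  # the y-coordinate should end up very close to 0
```

The iteration applies only the component of the mean-shift vector along the constrained direction. As a result, the point climbs the density across the curve but does not drift along it. An unconstrained mean-shift step would instead converge to a mode of the density, which roughly corresponds to the zero-dimensional (clustering) case in the unifying view above.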
To demonstrate the practical value of our theoretical contribution, we apply our principal curve algorithms to a diverse set of problems, including image segmentation, time warping, piecewise smooth signal denoising, manifold unwrapping, optical character skeletonization, sharpening of time-frequency distributions, multiple-input multiple-output (MIMO) channel equalization, and neighborhood graph construction. Overall, this dissertation presents three contributions. The first is a theoretical contribution that brings a new understanding of principal curves and surfaces. The second is a set of practical algorithms that serve as general-purpose machine learning tools. The third is the application of these algorithms to problems in several research areas.
Div. of Biomedical Computer Science
School of Medicine
Ozertem, Umut, "Locally defined principal curves and surfaces" (2008). Scholar Archive. 333.