Oregon Health & Science University
My thesis consists of two independent parts: (1) constrained clustering and (2) cognitive decline detection.
Constrained Clustering: While clustering is usually carried out completely unsupervised, there are circumstances in which we have a prior belief (with varying degrees of certainty) that pairs of samples should (or should not) be assigned to the same cluster. These pairwise constraints are less informative than direct labeling of the samples, but are often considerably easier to obtain. We proposed two probabilistic clustering algorithms that make use of this kind of pairwise constraint. Our first algorithm, called Penalized Probabilistic Clustering (PPC), is based on Gaussian mixture models (GMMs), where our belief in the pairwise constraints is expressed as a prior probability on the assignments of data points to clusters. Unlike previous efforts in this direction, this clustering model naturally accommodates both hard constraints and soft preferences within a single framework. Although PPC and its follow-up models are successful in many applications, they suffer from limited modeling capability and from inefficiency in using the pairwise constraints. Our second clustering algorithm is specifically designed to address these two limitations. Instead of adapting a traditional clustering model, we started from Gaussian process classifiers (GPCs), a type of discriminative model carefully chosen for our specific constrained clustering requirements, and treated the pairwise relations as a special form of observation. The prior probability of the latent process is controlled by a kernel designed using the graph Laplacian of all the available data, so that we can also make use of the samples that are not involved in any pairwise relation.
Cognitive Decline Detection: We studied approaches to detecting decline in people's cognitive ability based on longitudinal clinical observations. The ultimate goal is to evaluate a subject's risk of becoming cognitively impaired at different ages, given his or her past clinical observations, including motor ability and neuro-psychological test scores. Our work consists of two strongly related parts. In the first part, we studied modeling a population of similar time series with mixed-effect models. The mixed-effect model not only captures the group characteristics of different populations, but also provides a means to learn an effective prior for modeling individual time series. The second part of our project is a cross-sectional study, in which we try to predict whether a cognitively healthy subject will later develop cognitive impairment. Towards this end, we first constructed a probabilistic classifier based on mixed-effect models trained separately on healthy and impaired populations, and demonstrated the gain in discriminative power obtained by modeling the individual-specific random effects. To circumvent the shortcomings of the generative model-based classifier, we also considered discriminative approaches. We adopted the support vector machine (SVM) with kernels specially developed for longitudinal time series. We extended the design of the Fisher kernel to take mixed-effect models as the generative model, based on their hierarchical structure. In addition, we proposed a non-parametric distance measure for time series based on Gaussian processes (GPs) and reproducing kernel Hilbert spaces (RKHS). A Gaussian kernel based on this distance measure was also used in the SVM. Experiments show that the discriminative approaches yield improved classification accuracy over generative models on four motor ability observations, while on neuro-psychological test scores the two schools of methods have comparably good classification performance.
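As an illustration of the last discriminative approach mentioned above, the following is a minimal sketch of how a Gaussian kernel can be built from a distance measure between time series. It is not the method of the thesis: the abstract does not specify the GP/RKHS-based distance, so a plain squared Euclidean distance between equal-length series is used here as a stand-in. The function names `squared_distance` and `gaussian_kernel_matrix`, the bandwidth parameter `sigma`, and the toy data are all assumptions introduced for this example.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length series.

    Placeholder only: the thesis uses a GP/RKHS-based distance that the
    abstract does not define.
    """
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian_kernel_matrix(series, sigma=1.0):
    """Gram matrix with K[i][j] = exp(-d(s_i, s_j)^2 / (2 * sigma^2))."""
    n = len(series)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            d2 = squared_distance(series[i], series[j])
            value = math.exp(-d2 / (2.0 * sigma ** 2))
            gram[i][j] = gram[j][i] = value  # the kernel is symmetric
    return gram


# Toy data (hypothetical): three short series of equal length.
series = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.5], [4.0, 1.0, 0.0]]
K = gaussian_kernel_matrix(series, sigma=1.0)
```

A Gram matrix of this form could then be handed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Whether the resulting kernel is positive semi-definite depends on the distance used; it holds for the Euclidean stand-in above.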
OGI School of Science and Engineering
Lu, Zhengdong, "Constrained clustering and cognitive decline detection" (2008). Scholar Archive. 330.