Conference on Nonparametric Statistics and Statistical Learning
Blackwell Inn, The Ohio State University, Columbus, OH
May 19 - 22, 2010
May 20 (Thursday)
|| Plenary Talk
|| Ronald Randles (University of Florida)
Robustness of Location Estimators to Distortion
Robustness of affine equivariant location estimators is described relative to a model which creates infinitesimal distortion of a symmetric distribution. The model for distortion is based upon a class of distributions introduced by Fechner (1897). Properties of estimators are compared and contrasted in terms of their sensitivity to this type of distortion. This work is coauthored with Demetris Athienitis.
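Fechner's (1897) class is usually presented as a two-piece (split) family: a symmetric density whose halves on either side of the mode are given different scales. The following Python sketch assumes a normal base density and uses hypothetical parameter names (`sigma_left`, `sigma_right`). It only illustrates what a small distortion of a symmetric density looks like. It does not reproduce the infinitesimal-distortion model or the robustness comparisons of the talk.

```python
import math

def fechner_normal_pdf(x, mu=0.0, sigma_left=1.0, sigma_right=1.0):
    """Two-piece (Fechner-type) normal density: a normal kernel with scale
    sigma_left below the mode mu and sigma_right above it.
    Equal scales recover the symmetric N(mu, sigma^2) density."""
    sigma = sigma_left if x < mu else sigma_right
    # A common normalizing constant makes the two halves meet continuously at mu
    # and makes the total mass equal to one.
    const = 2.0 / (math.sqrt(2.0 * math.pi) * (sigma_left + sigma_right))
    return const * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def riemann_total_mass(pdf, lo=-40.0, hi=40.0, n=100_000):
    """Midpoint-rule check that a density integrates to (about) one."""
    h = (hi - lo) / n
    return h * sum(pdf(lo + (i + 0.5) * h) for i in range(n))

# A small distortion: the right half is slightly more spread out than the left.
print(round(riemann_total_mass(lambda x: fechner_normal_pdf(x, 0.0, 1.0, 1.1)), 6))
```

Setting the two scales equal recovers the symmetric normal. Letting their ratio approach one is one way to picture a small departure from symmetry.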
|| Parallel Sessions
||Data Mining (Invited)
Variable selection for decision making
I will address the topic of variable selection for decision making, with a focus on decisions about when to provide treatment and which treatment to provide. Current variable selection techniques were developed for supervised learning, where the goal is prediction of the response. These techniques often downplay the importance of interaction variables that have little predictive ability but are critical when the ultimate goal is decision making rather than prediction. Two new techniques will be proposed that are designed specifically to find variables that aid in decision making. Simulation results are given, along with an application of the methods to data from a randomized controlled trial for the treatment of depression.
Kevin Killourhy (Carnegie Mellon University)
Cherry-picking for complex data: robust structure discovery
Complex data often arise as a superposition of data generated from several simpler models. The traditional strategy for such cases is to use mixture modeling, but it can be problematic, especially in higher dimensions. In this talk, I consider an alternative approach, emphasizing data exploration and robustness to model misspecification. I will focus on a problem in cluster analysis with promising implications for computer security, and I will also consider applications of this strategy to problems in regression and multidimensional scaling. The talk is based on work done in collaboration with David L. Banks and Leanna House.
Yichao Wu (North Carolina State University)
Non-crossing large-margin probability estimation and its application to robust SVM via preconditioning
Many large-margin classifiers, such as the Support Vector Machine (SVM), sidestep estimating conditional class probabilities and target the discovery of classification boundaries directly. However, estimates of conditional class probabilities can be useful in many applications. Wang, Shen and Liu [J. Wang, X. Shen, Y. Liu, Probability estimation for large margin classifiers, Biometrika 95 (2008) 149-167] bridged the gap by providing an interval estimator of the conditional class probability via bracketing. The interval estimator was obtained by applying different weights to the positive and negative classes and training the corresponding weighted large-margin classifiers. They proposed estimating the weighted large-margin classifiers individually. Empirically, however, the individually estimated classification boundaries may cross each other even though, theoretically, they should not. In this work, we propose a technique that ensures the estimated classification boundaries do not cross. Furthermore, we use the estimated conditional class probabilities to precondition the training data. The standard SVM is then applied to the preconditioned training data to achieve robustness. Simulations and real data are used to illustrate the finite-sample performance of the proposed methods.
||Rank Based Methods (Invited)
Somnath Datta (University of Louisville)
Rank Tests for Clustered Paired Data When the Cluster Size is Potentially Informative
Rank-based tests are alternatives to likelihood-based tests, valued for their relative robustness and their elegant underlying mathematical theory. There has been a surge of research activity in this area in recent years, as a number of researchers have worked to extend rank-based procedures to clustered dependent data. This includes situations with known correlation structures (e.g., as in mixed effects models) as well as more general forms of dependence. In this talk, we consider the problem of testing the symmetry of the marginal distribution of paired differences under clustered data. Unlike most other work in the area, however, we allow the cluster size to be a random variable that is statistically dependent on the variable of interest within a cluster. This situation typically arises when the clusters are defined in a natural way (e.g., not determined by the experimenter) and the size of a cluster may carry information about the distribution of data values within it.
Denis Larocque (HEC Montreal)
Nonparametric Methods and Trees for Multiple Mixed Outcomes
Over the years, many nonparametric methods for multivariate outcomes have been proposed. These include coordinate-wise and spatial rank and sign methods and others based on different notions of depth. But these methods are aimed at the situation where all outcomes are continuous. In this talk, we will present new methods specifically designed for mixed outcomes, where some outcomes are categorical and others are continuous. In particular, we will discuss the two-sample testing problem with mixed outcomes and also present a recursive partitioning method for mixed outcomes. Parts of this talk are based on joint work with François Bellavance (HEC Montreal), Abdessamad Dine (HEC Montreal), Jaakko Nevalainen (University of Turku) and Hannu Oja (University of Tampere).
Davy Paindaveine (Université Libre de Bruxelles)
Rank tests for principal component analysis
We construct parametric and rank-based optimal tests for eigenvectors and eigenvalues of covariance or scatter matrices in elliptical families. The parametric tests extend the Gaussian likelihood ratio tests of Anderson (1963) and their pseudo-Gaussian robustifications by Davis (1977) and Tyler (1981, 1983). The Gaussian versions of our tests are shown to coincide asymptotically with these procedures, under Gaussian or finite fourth-order moment assumptions, respectively. Such assumptions, however, restrict the scope to covariance-based principal component analysis. The rank-based tests we propose remain valid without such assumptions. Hence, they address a much broader class of problems, where covariance matrices need not exist and principal components are associated with more general scatter matrices. Asymptotic relative efficiencies moreover show that these rank-based tests are quite powerful; when based on van der Waerden or normal scores, they even uniformly dominate the pseudo-Gaussian versions of Anderson's procedures. The proposed tests thus outperform the procedures used in daily practice in terms of both validity and efficiency. The main methodological tool throughout is Le Cam's theory of locally asymptotically normal experiments, in the nonstandard context, however, of a "curved" parametrization. This is joint work with Marc Hallin (Université Libre de Bruxelles) and Thomas Verdebout (Université Lille III).
||Variable Selection (Contributed)
Chris Hans (The Ohio State University)
Penalized Regression via Orthant Normal Priors
Motivated by penalized optimization approaches to variable selection and parameter estimation, this paper introduces a new class of prior distributions -- the orthant normal distribution -- for the regression coefficients in a Bayesian regression model. Parameter estimates based on penalized optimization are often interpreted as the mode of a Bayesian posterior distribution. We show that the orthant normal distribution is the prior that gives rise to the elastic net estimate and, in a limiting case, the lasso. By providing a complete characterization of this prior, we allow for model-based inference that moves beyond exclusive use of the posterior mode, including coherent Bayesian prediction and formal Bayesian model comparison. In contrast to penalized optimization procedures (where the penalty parameter is often selected via a potentially unstable cross validation), the Bayesian approach allows for uncertainty about these parameters to be included in the model, or, alternatively, allows the parameters to be selected via the method of maximum marginal likelihood. We show that the orthant normal distribution has a scale-mixture of normals representation, providing additional insight into the particular form of shrinkage employed by the elastic net. Posterior inference is achieved via MCMC. This model-based approach to elastic net regression has the advantage that the basic model can be extended to accommodate more complex regression settings. Models can be built that include random effects to capture various covariance structures while at the same time inducing elastic-net-like shrinkage on the regression coefficients. We discuss approaches for incorporating prior information about dependence structure in the covariates that resemble Zellner's g-prior but that allow for lasso-like shrinkage.
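In one dimension, the connection between the elastic net and a prior can be checked directly. The elastic net penalty lam1*|beta| + lam2*beta^2 is, up to an additive constant, the negative log of a density proportional to exp(-lam1*|beta| - lam2*beta^2). The sketch below is a simplification. It assumes a single coefficient, fixed penalty weights and no dependence on the error variance, so it is not the full orthant normal construction. It computes this density's normalizing constant in closed form, sqrt(pi/lam2) * exp(lam1^2/(4*lam2)) * erfc(lam1/(2*sqrt(lam2))), and checks it numerically. The function names are hypothetical.

```python
import math

def elastic_net_penalty(beta, lam1, lam2):
    """Elastic net penalty for a single (scalar) coefficient."""
    return lam1 * abs(beta) + lam2 * beta ** 2

def prior_normalizer(lam1, lam2):
    """Closed-form integral of exp(-lam1*|b| - lam2*b^2) over the real line
    (obtained by completing the square on each half-line)."""
    a, c = lam2, lam1
    return (math.sqrt(math.pi / a) * math.exp(c * c / (4 * a))
            * math.erfc(c / (2 * math.sqrt(a))))

def prior_pdf(beta, lam1, lam2):
    """Density whose negative log equals the penalty plus a constant."""
    return math.exp(-elastic_net_penalty(beta, lam1, lam2)) / prior_normalizer(lam1, lam2)

def numeric_mass(lam1, lam2, lo=-50.0, hi=50.0, n=100_000):
    """Midpoint-rule integral of prior_pdf, as a sanity check on the normalizer."""
    h = (hi - lo) / n
    return h * sum(prior_pdf(lo + (i + 0.5) * h, lam1, lam2) for i in range(n))

print(round(numeric_mass(1.0, 0.5), 6))
```

With lam1 = 0 the density is normal (ridge). As lam2 shrinks toward 0 it approaches a Laplace density (lasso). For very small lam2, however, the factor exp(lam1^2/(4*lam2)) overflows, so a scaled complementary error function such as `scipy.special.erfcx` would be needed there.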
Woncheol Jang (University of Georgia)
Hexagonal Operator for Regression with Shrinkage and Equality Selection: HORSES
We propose a new method called HORSES (Hexagonal Operator for Regression with Shrinkage and Equality Selection) which performs variable selection for regression with positively correlated predictors. Like other penalized approaches, the HORSES estimator can be computed via a constrained least-squares problem. Our penalty strikes a compromise between an L1 penalty on the coefficients and an L1 penalty on pairwise differences of coefficients. This is joint work with Johan Lim.
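The description above can be read literally with a mixing weight alpha in [0, 1] and a budget t (both names are assumptions made here for illustration). Under that reading, the constraint is alpha * sum_j |beta_j| + (1 - alpha) * sum_{j<k} |beta_j - beta_k| <= t, imposed while minimizing the residual sum of squares. For two coefficients this feasible region is a hexagon. The sketch only evaluates the penalty and checks feasibility. It does not solve the constrained least-squares problem.

```python
from itertools import combinations

def horses_penalty(beta, alpha):
    """alpha * sum_j |b_j| + (1 - alpha) * sum_{j<k} |b_j - b_k| (assumed form)."""
    l1 = sum(abs(b) for b in beta)
    # The fusion term sums absolute differences over all pairs of coefficients.
    fusion = sum(abs(bj - bk) for bj, bk in combinations(beta, 2))
    return alpha * l1 + (1 - alpha) * fusion

def is_feasible(beta, alpha, t):
    """True if beta lies inside the (hypothetical) budget t."""
    return horses_penalty(beta, alpha) <= t

# Same L1 norm (3.0), but equal coefficients pay a smaller penalty.
print(horses_penalty([1.0, 1.0, 1.0], 0.5), horses_penalty([1.5, 1.5, 0.0], 0.5))  # 1.5 3.0
```

The two vectors have the same L1 norm, but the one with equal entries pays less. This is how the fusion term favors equal coefficients for positively correlated predictors.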
Xingye Qiao (University of North Carolina at Chapel Hill)
Pairwise Variable Selection for Classification
While traditional marginal variable selection methods have the merits of convenient implementation and good interpretability, they do not take the joint effects among variables into account. In some situations, variables which have strong joint effects can be passed over by marginal methods because of their small marginal effects. In the context of binary classification in supervised learning, we develop a novel method of pairwise variable selection, based on a within-class permutation test to evaluate the statistical significance of joint effects. Moreover, we introduce a new notion of variable selection quality, bivariate False Discovery Rate (biFDR), and provide an estimation procedure for biFDR. A simulated example and a real data application are analyzed to demonstrate the usefulness of the proposed approach. This is joint work with Yufeng Liu and J. S. Marron.
Kukatharmini Tharmaratnam (Katholieke Universiteit Leuven)
Robust version of the AIC based on M, S and MM estimators for variable selection in semiparametric mixed models
Variable selection in the presence of outliers may be performed using a robust version of Akaike's information criterion (AIC). In the first part, explicit expressions are obtained for such criteria when S and MM estimators are used. In addition, a version of AIC based on robust quasi-likelihood M-estimation is included. The performance of these criteria is compared to the existing AIC based on M estimators and the classical non-robust AIC. In the second part, we consider semiparametric models fitted by robust penalized regression splines using a mixed model representation. We develop a robust AIC to select both parametric and non-parametric components in such semiparametric mixed models and compare it with a non-robust AIC. This work is co-authored by K. Tharmaratnam and G. Claeskens.
|| Plenary Talk
|| David Madigan (Columbia University)
Statistical Methods in Drug Safety
The pharmaceutical industry and regulatory agencies rely on various data sources to ensure the safety of licensed drugs. Recent high profile drug withdrawals have led to increased scrutiny of this activity. Many statistical challenges arise in this context. This talk will describe some of these data sources and the challenges they present, focusing especially on newer large-scale data analyses.
|| Parallel Sessions
||Federal Statistics (Invited)
John Eltinge (U.S. Bureau of Labor Statistics)
Three Classes of Open Questions in the Application of Nonparametric Regression and Machine Learning Methods to Sample Surveys
This paper reviews standard approaches to the use of auxiliary data in survey sampling and then outlines three areas for potential extensions based on nonparametric regression and machine learning methods: (1) diagnostics for sample design and weighting; (2) integration of sample survey data with large amounts of administrative-record data; and (3) disclosure limitation. We consider these issues in the context of both standard inference for univariate estimands and more realistic settings that involve a large number of estimands and a large number of stakeholders. These issues lead to some extensions of large-sample and small-deviation approximation methods for complex survey data. We explore these topics in the context of the U.S. Consumer Expenditure Survey.
Leming Shi (U.S. Food and Drug Administration)
Personalized Medicine: Genomics, Bioinformatics, and the FDA-led MAQC Project
Personalized medicine depends on reliable tools in genomics and bioinformatics. The MicroArray Quality Control (MAQC) project was originally launched by the US Food and Drug Administration (FDA) in 2005 to address concerns about the reliability of microarray technologies as well as bioinformatic data analysis issues (http://edkb.fda.gov/MAQC/). The first phase of MAQC (MAQC-I) evaluated technical performance of various microarray gene expression platforms and assessed advantages and limitations of competing data analysis methods for identifying differentially expressed genes or potential biomarkers (http://www.nature.com/nbt/focus/maqc/). MAQC-II aimed to reach consensus on "best practices" for developing and validating microarray-based predictive models for preclinical and clinical applications such as the prediction of outcomes of patients with breast cancer, multiple myeloma, or neuroblastoma. MAQC-III (SEQC) is evaluating technical performance and addressing bioinformatic challenges of next-generation sequencing in transcriptome and exome analyses. The MAQC project is expected to enhance our ability to understand, predict, and prevent serious adverse drug reactions via patient-specific genomic information (MAQC-IV or PADRE), helping FDA fulfill its mission of protecting consumers and promoting public health. Disclaimer: Views expressed in this presentation are those of the presenter and not necessarily those of the US FDA.
William Winkler (Census Bureau)
Machine Learning for Record Linkage, Text Categorization, and Edit/Imputation
Machine learning methods have been applied in statistical agencies. The initial application used the EM algorithm for naïve Bayes and general Bayesian networks to obtain 'optimal' record linkage parameters without training data. The methods were used for production software during three Decennial Censuses. Optimal parameters vary significantly across approximately 500 regions for the U.S. and reduce clerical review by two-thirds in comparison with crude but knowledgeable guesses of parameters. A minor modification of the record linkage model can be used for semi-supervised learning for text categorization and extended to a generalization of boosting in which better models (general Bayes networks) involving increasing amounts of interaction between terms are learned. Finally, similar theory and the same computational algorithms (which are as much as 100 times as fast as algorithms in commercial software) can be adapted for learning edit/imputation models that account for edits (i.e., structural zeros, such as the rule that a child under 16 cannot be married) and preserve joint distributions in a principled manner. Because the models form a complete probability structure, imputation and estimation of imputation variance are straightforward using variants of the modeling algorithms.
||Statistical Learning (Invited)
Chunming Zhang (University of Wisconsin-Madison)
High-Dimensional Regression and Classification Under A Class of Convex Loss Functions
We investigate applications of the adaptive Lasso to high-dimensional models for regression and classification under a wide class of loss functions. We show that for the dimension growing nearly exponentially with the sample size, the resulting adaptive Lasso estimator possesses the oracle property for suitable weights. Moreover, we propose two methods, called CR and PCR, for estimating weights. Theoretical advantages of PCR over CR are analyzed. In addition, the adaptive Lasso classifier is shown to be consistent. Simulation studies demonstrate the advantage of PCR over CR in both regression and classification. The effectiveness of the proposed method is illustrated using real data sets.
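The adaptive Lasso replaces the common penalty lam * sum_j |beta_j| with a weighted one, lam * sum_j w_j |beta_j|. Large weights drive coefficients with small initial estimates to exactly zero. The sketch below uses squared-error loss, cyclic coordinate descent, and weights w_j = 1/|pilot_j|^gamma from an ordinary least-squares pilot fit. That pilot fit needs n > p, so it stands in for the CR and PCR weighting schemes of the talk and does not implement them. The function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def weighted_lasso_cd(X, y, lam, weights, n_sweeps=200):
    """Minimize (1/(2n))||y - X b||^2 + lam * sum_j w_j |b_j| by coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ b
    for _ in range(n_sweeps):
        for j in range(p):
            # Partial residual: add back coordinate j's current contribution.
            resid += X[:, j] * b[j]
            rho = X[:, j] @ resid / n
            b[j] = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            resid -= X[:, j] * b[j]
    return b

def adaptive_lasso(X, y, lam, gamma=1.0):
    """Adaptive Lasso with (assumed) OLS pilot weights w_j = 1/|pilot_j|^gamma."""
    beta_pilot = np.linalg.lstsq(X, y, rcond=None)[0]
    weights = 1.0 / np.abs(beta_pilot) ** gamma
    return weighted_lasso_cd(X, y, lam, weights)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
beta_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(200)
print(np.round(adaptive_lasso(X, y, lam=0.1), 3))
```

The pilot estimates of the truly zero coefficients are small, so their weights are large, and the soft-thresholding step sets those coefficients exactly to zero.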
Yongdai Kim (Seoul National University)
On model selection criteria for high dimensional models
I will talk about model selection criteria for high-dimensional regression models where the number of covariates is much larger than the sample size. I will give a class of model selection criteria that are consistent. I will also discuss the minimax optimality of various model selection criteria in high dimensions.
Yufeng Liu (University of North Carolina at Chapel Hill)
Linear classifiers are very popular but can suffer from serious limitations when the classes have distinct sub-populations. General nonlinear classifiers can give improved classification error rates but do not give a clear interpretation of the results. In this talk, we propose the Bi-Directional Discrimination (BDD) classification method, which generalizes the classifier from one hyperplane to two hyperplanes. This gives much of the flexibility of a general nonlinear classifier while maintaining the interpretability of linear classifiers. The performance and usefulness of the proposed method are assessed using asymptotics and demonstrated through analysis of simulated and real data. This talk is based on joint work with Hanwen Huang and J. S. Marron.
||Nonparametric Tests (Contributed)
Stephen Bamattre (The Ohio State University)
Temporal stability of association between two variables within an enduring subpopulation
The tau-path is a technique to detect monotone associations between a pair of variables in an unspecified subpopulation. Using the Mallows model for rankings, the method is extended to estimate the temporal stability of observed subpopulations. This procedure is applied to a marketing data set from Nationwide Insurance to discover pairs of variables for which over time there is a stable association within an enduring subpopulation. Examples include the screening of predictor variables for use in geographically targeted models, without fixing the regions of interest beforehand.
Dean Barron (Twobluecats.com)
A two sample test based on rotationally superimposable permutations
Weaknesses of the Kolmogorov-Smirnov (KS) test have been described; data that exploit them may lead to incorrect statistical conclusions. In the two-sample case, when all observations from the first population appear consecutively and are ranked lowest, KS is statistically significant (D = 1, n >= 8). However, when they are ranked in the middle, KS is not statistically significant (D ≈ 0.5, n <= 32). Intuitively, though, both situations reflect different underlying populations. An approach that addresses this, pawprint (PP), is introduced; it groups together rotationally superimposable permutations. The maximum KS value found within each group replaces the individual KS values of its members, forming a new table with different critical values. Samples drawn from a dataset are used to contrast the two approaches and illustrate the method.
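The D values quoted above can be reproduced from the ordering of the pooled sample alone. The sketch below computes the two-sample KS statistic from a sequence of sample labels. The function `max_ks_over_rotations` is a hypothetical reading of the grouping step that treats "rotationally superimposable" as cyclic shifts of the label sequence. The abstract does not define the pawprint groups precisely, and no critical values are computed here.

```python
def ks_from_labels(labels):
    """Two-sample KS statistic D from sample labels (0 = first sample,
    1 = second sample) listed in increasing order of the pooled data.
    Assumes no ties."""
    n1 = labels.count(0)
    n2 = labels.count(1)
    c1 = c2 = 0
    d = 0.0
    for lab in labels:
        if lab == 0:
            c1 += 1
        else:
            c2 += 1
        # Largest gap between the two empirical distribution functions.
        d = max(d, abs(c1 / n1 - c2 / n2))
    return d

def max_ks_over_rotations(labels):
    """Hypothetical grouping step: treat the label sequence as circular and
    return the largest D among all of its cyclic rotations."""
    return max(ks_from_labels(labels[k:] + labels[:k]) for k in range(len(labels)))

lowest = [0] * 8 + [1] * 8            # first sample occupies the lowest ranks
middle = [1] * 4 + [0] * 8 + [1] * 4  # first sample occupies the middle ranks
print(ks_from_labels(lowest), ks_from_labels(middle))  # 1.0 0.5
print(max_ks_over_rotations(middle))                    # 1.0
```

Under this reading, rotating the "middle" arrangement produces the "lowest" one, so both arrangements receive the same maximal D.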
Yanling Hu (University of Kentucky)
Censored Empirical Likelihood with Over-determined Hazard-type Constraints
Qin and Lawless (1994) studied the empirical likelihood method with estimating equations. They obtained attractive asymptotic properties, especially when the number of estimating equations is larger than the number of parameters (the over-determined case). We study a setup parallel to that of Qin and Lawless but use a hazard-type empirical likelihood function and hazard-type estimating equations. The advantage of working with the hazard is that censored data can be handled easily. We obtain similar asymptotic results for the maximum empirical likelihood estimator and the empirical likelihood ratio test, also in the over-determined case. Two examples are provided to demonstrate potential applications of the result.
Mohamed Mahmoud (Al-Azhar University)
Non-parametric Testing for Exponentiality Against NBRUE Class of Life Distributions Based on Laplace Transform
The main aim of this paper is to propose a new test for exponentiality against the new better than renewal used in expectation (NBRUE) class, based on the Laplace transform. The asymptotic properties of the test are studied. Its Pitman asymptotic efficiencies for three alternatives are calculated and compared with those of other tests for exponentiality. The critical values of the test are calculated and tabulated for sample sizes n = 5(1)50. Its power for some alternatives used in reliability is estimated in a simulation study. Finally, the test is applied to some real data. This work is co-authored by M. A. W. Mahmoud and M. H. S. Al-Loqmani.
|| Plenary Talk
||Michael Jordan (University of California, Berkeley)
Completely Random Measures, Hierarchies, and Nesting in Bayesian Nonparametrics
Bayesian nonparametric modeling and inference are based on using general stochastic processes as prior distributions. Despite the great generality of this definition, the vast majority of the work in Bayesian nonparametrics is based on only two stochastic processes: the Gaussian process and the Dirichlet process. Motivated by the needs of applications, I present a broader approach to Bayesian nonparametrics in which priors are obtained from a class of stochastic processes known as "completely random measures" (Kingman, 1967). In particular I will present models based on the beta process, the Bernoulli process, the gamma process and the Dirichlet process, and on hierarchical and nesting constructions that use these basic stochastic processes as building blocks. I will discuss applications of these models to several problem domains, including protein structural modeling, computational vision, natural language processing and statistical genetics.
May 21 (Friday)
|| Parallel Sessions
||Nonparametric Bayes Methods (Invited)
Wesley Johnson (UC Irvine)
Bayesian Semi-parametric Methods in Biostatistics: Selective Update
We review some recent developments in the application of Bayesian nonparametric methodology to semi-parametric problems in receiver operating characteristic curve estimation, survival analysis, modeling longitudinal data, and jointly modeling longitudinal and survival data. We begin with a brief review of Mixtures of Polya Trees and Dirichlet Process Mixtures, followed by illustrations based on real data. Emphasis is given to selecting among classes of semi-parametric models. For example, in survival analysis with time-dependent covariates, we may wish to choose among proportional hazards, proportional odds, and Cox and Oakes accelerated failure time models.
Luis Nieto-Barajas (ITAM)
A Markov gamma random field for modeling respiratory infections in Mexico
In this talk we present a Markov gamma random field prior for modeling relative risks in disease mapping data. This prior process allows for a different dependence effect with different neighbors. We describe the properties of the prior process and derive posterior distributions. The model is extended to cope with covariates and a data set of respiratory infections of children in Mexico is used as an illustration.
Marina Vannucci (Rice University)
Spiked Priors for High-Dimensional Data
This talk will address parametric and nonparametric prior models for variable selection in high-dimensional settings. Linear models and generalized settings that allow for nonlinear interactions will be considered. Inferential strategies will be discussed. Applications will be to simulated data and real data with a large number of variables.
||Statistical Learning (Invited)
Robert Krafty (University of Pittsburgh)
Canonical Correlation Analysis of Spectral and Multivariate Cross-Sectional Data
In many studies, stationary time series data and cross-sectional outcomes are collected from several independent units. Often the primary goal of the study is to quantify the association between the cross-sectional outcomes and the second-order spectral properties of the time series. This article addresses this question by introducing a data-driven procedure for performing a canonical correlation analysis (CCA) between the log-spectra and cross-sectional outcomes. The isometry between the Hilbert space of linear combinations of a second-order stochastic process and the reproducing kernel Hilbert space generated by its covariance kernel allows for a formulation of CCA whose canonical correlates and weight functions can be estimated via estimates of two kernels. These are the covariance kernel of the log-spectra and the cross-covariance kernel between the log-spectra and cross-sectional data. A penalized Whittle-likelihood-based procedure is offered for obtaining method-of-moments type estimates of the mean log-spectra, the covariance kernel of the log-spectra, and the cross-covariance kernel. A new criterion for the selection of smoothing parameters to optimally estimate the linear relationship between the log-spectra and cross-sectional outcomes is introduced. This criterion minimizes the conditional Kullback-Leibler distance between the unit-specific log-spectra and their best linear unbiased predictors, computed from the cross-sectional outcomes and log-periodograms under the estimated covariance structure. The proposed CCA procedure is used to analyze the association between the heart rate variability power spectrum during sleep and multiple measures of sleep.
Hernando Ombao (Brown University)
Functional Connectivity as a Potential Biomarker for Classification
In this talk, we will discuss models that use functional connectivity as a potential biomarker for classification. This work is motivated by the HAND experiment, in which participants moved a joystick either to the left or to the right upon instruction. The goal is to determine the time-frequency network in the multi-channel EEG signals that could discriminate between left and right movements. The network should also predict or classify future movements based on a single-trial multi-channel EEG. We first enumerate some potential measures of connectivity in a brain network, namely, partial coherence and mutual information. Next, we discuss methods for estimating the network. One of the key statistical challenges is that partial coherence estimates are typically obtained by inverting the spectral density matrix. This matrix may be near-singular, especially when the time series in the network exhibit a high degree of cross-correlation. To avoid numerical instability, we estimate the spectral density matrix via a shrinkage procedure, which takes a weighted average of an initial periodogram estimator and a simple parametric estimator (e.g., based on the vector AR model). The shrinkage estimator is more computationally stable than the classical smoothed periodogram. It also gives a lower mean-squared error than the multi-taper method and kernel-smoothing approaches. The method will be applied to EEGs recorded during a visuo-motor experiment.
Raquel Prado (UC Santa Cruz)
Models and algorithms for on-line detection of cognitive fatigue
This work is motivated by the analysis of multiple brain signals recorded during an experiment that aimed to characterize mental fatigue in real time. The recorded brain signals can be modeled via mixtures of autoregressive (AR) processes and state-space autoregressions with structured priors on the AR coefficients. Such prior structure allows researchers to incorporate scientifically meaningful information related to various states of mental alertness. We focus on the implementation of sequential Monte Carlo methods for on-line parameter learning and filtering. We illustrate how the AR-based models can be used to describe electroencephalographic signals recorded from a subject who performed basic arithmetic calculations continuously for a period of three hours.
||Density Estimation (Contributed)
José E. Chacón (Departamento de Matemáticas, Universidad de Extremadura)
Unconstrained bandwidth matrices for multivariate kernel estimation of the density and density derivatives
Multivariate kernel estimation is an important technique in exploratory data analysis. The crucial factor which determines the performance of kernel estimation is the bandwidth matrix. Research in finding optimal bandwidth matrices began with restricted parametrizations of the bandwidth matrix which mimic univariate selectors. Progressively these restrictions were relaxed to develop more flexible procedures. A major obstacle for progress has been the intractability of the matrix analysis when treating higher order multivariate derivatives. With an alternative vectorization of these higher order derivatives, these mathematical intractabilities can be surmounted in an elegant and unified framework. In this paper we present some recent advances on the use of unconstrained bandwidth matrices for multivariate kernel estimation of the density and density derivatives.
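With a Gaussian kernel (an assumption made here), the multivariate kernel density estimator with bandwidth matrix H is f_hat(x) = n^{-1} * sum_i phi_H(x - X_i), where phi_H is the N(0, H) density. An unconstrained H may have nonzero off-diagonal entries, so the kernels can be oriented along the data. The sketch evaluates the estimator for a hand-picked H. It does not implement any of the bandwidth selectors discussed in the talk, and the function name is hypothetical.

```python
import numpy as np

def kde_full_bandwidth(x, data, H):
    """Gaussian kernel density estimate at point x (shape (d,)) from data
    (shape (n, d)) with a full, unconstrained bandwidth matrix H (d x d, SPD):
        f_hat(x) = (1/n) * sum_i phi_H(x - X_i),
    where phi_H is the N(0, H) density."""
    data = np.atleast_2d(data)
    n, d = data.shape
    diffs = x - data                                  # (n, d)
    H_inv = np.linalg.inv(H)
    # Quadratic forms (x - X_i)^T H^{-1} (x - X_i) for every observation i.
    q = np.einsum("ij,jk,ik->i", diffs, H_inv, diffs)
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(H) ** (-0.5)
    return norm * np.exp(-0.5 * q).mean()

rng = np.random.default_rng(1)
sample = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=500)
H_full = 0.1 * np.array([[1.0, 0.8], [0.8, 1.0]])  # oriented along the correlation
H_diag = 0.1 * np.eye(2)                            # axis-aligned (constrained)
point = np.array([1.0, 1.0])
print(kde_full_bandwidth(point, sample, H_full), kde_full_bandwidth(point, sample, H_diag))
```

The diagonal choice restricts each kernel's axes to the coordinate directions. The full matrix lets the kernel follow the correlation in the sample, which is the flexibility that unconstrained selectors aim to exploit.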
Catherine Forbes (Monash University)
Non-Parametric Estimation of Forecast Distributions in Non-Gaussian State Space Models
This paper provides a methodology for the production of non-parametric estimates of forecast distributions, in a general non-Gaussian, non-linear state space setting. The transition densities that define the evolution of the dynamic state process are represented in closed parametric form, with the conditional distribution of the measurement error variable estimated non-parametrically. The requisite recursive filtering and prediction distributions are computed as functions of the unknown conditional error. The method is illustrated in the context of several financial models with a particular focus on the production of sequential, real time forecast distributions for volatility. This work is co-authored by Jason Ng, Catherine S. Forbes, Gael M. Martin and Brendan P.M. McCabe.
Alexandre Leblanc (University of Manitoba)
On the Boundary Effects of Bernstein Polynomial Estimators of Density and Distribution Functions
For density and distribution functions supported on [0,1], Bernstein polynomial estimators are known to have optimal Mean Integrated Squared Error (MISE) properties under the usual smoothness conditions on the function to be estimated. These estimators are also known to be well-behaved in terms of bias, as they exhibit no boundary bias. In this talk, we will discuss the fact that these estimators nevertheless do experience boundary effects. However, these boundary effects are of a different nature than what is seen, for example, with usual kernel estimators.
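The textbook Bernstein polynomial estimators for data on [0, 1] smooth the empirical distribution function F_n. The distribution estimator is F_hat(x) = sum_{k=0}^{m} F_n(k/m) b_k(m, x). Its derivative is the density estimator f_hat(x) = m * sum_{k=0}^{m-1} [F_n((k+1)/m) - F_n(k/m)] b_k(m-1, x), where b_k(m, x) = C(m, k) x^k (1 - x)^(m-k). The sketch below assumes these standard forms and a hand-picked degree m. It illustrates the estimators, not the boundary analysis of the talk, and the function names are hypothetical.

```python
import numpy as np
from math import comb

def ecdf(sample):
    """Empirical distribution function F_n as a callable."""
    xs = np.sort(np.asarray(sample, dtype=float))
    return lambda t: np.searchsorted(xs, t, side="right") / len(xs)

def bernstein_basis(k, m, x):
    """b_k(m, x) = C(m, k) x^k (1 - x)^(m - k)."""
    return comb(m, k) * x ** k * (1 - x) ** (m - k)

def bernstein_cdf(x, sample, m):
    """F_hat(x) = sum_{k=0}^{m} F_n(k/m) b_k(m, x) for x in [0, 1]."""
    Fn = ecdf(sample)
    return sum(Fn(k / m) * bernstein_basis(k, m, x) for k in range(m + 1))

def bernstein_pdf(x, sample, m):
    """f_hat(x) = m * sum_{k=0}^{m-1} [F_n((k+1)/m) - F_n(k/m)] b_k(m-1, x)."""
    Fn = ecdf(sample)
    return m * sum((Fn((k + 1) / m) - Fn(k / m)) * bernstein_basis(k, m - 1, x)
                   for k in range(m))

rng = np.random.default_rng(3)
sample = rng.beta(2.0, 5.0, size=200)
grid = np.linspace(0.0, 1.0, 5)
print(np.round(bernstein_pdf(grid, sample, m=20), 3))
```

Because b_0(m, 0) = b_m(m, 1) = 1, F_hat agrees with F_n exactly at both endpoints of [0, 1].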
Leming Qu (Boise State University)
Copula density estimation by wavelet domain penalized likelihood with linear equality constraints
A copula density is the joint probability density function (PDF) of a random vector with uniform marginals. An approach to bivariate copula density estimation is introduced that is based on a maximum penalized likelihood estimation (MPLE) with penalty term being the L1 norm of the density's wavelet coefficients. The marginal unity and symmetry constraints for copula density are enforced by linear equality constraints. The L1-MPLE subject to linear equality constraints is solved by an iterative algorithm. A data-driven selection of the regularization parameter is discussed. Simulation and real data application show the effectiveness of the proposed approach.
|| Plenary Talk
|| Peter Muller (M.D. Anderson Cancer Center)
Bayesian Clustering with Regression
We propose a model for covariate-dependent clustering, i.e., we develop a probability model for random partitions that is indexed by covariates. The motivating application is inference for a clinical trial. As part of the desired inference we wish to define clusters of patients. A prior probability model for cluster memberships should include a regression on patient baseline covariates. We build on product partition models (PPM) and define an extension of the PPM that includes the desired regression. This is achieved by including in the cohesion function a new factor that increases the probability that experimental units with similar covariates are included in the same cluster.
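To make the idea of a covariate-dependent factor concrete, here is a minimal sketch of an unnormalized log prior for a partition. It assumes the Dirichlet-process-style cohesion c(S) = M (|S| - 1)! and a hypothetical similarity factor g that penalizes the within-cluster spread of a scalar covariate; the talk's actual similarity function is not specified here, so `log_similarity` and its parameter `lam` are illustrative assumptions.

```python
import math

def log_cohesion(size, M=1.0):
    """Dirichlet-process-style cohesion c(S) = M * (|S| - 1)!, on the log scale."""
    return math.log(M) + math.lgamma(size)  # lgamma(n) = log((n - 1)!)

def log_similarity(xs, lam=1.0):
    """Hypothetical similarity factor: larger when covariates in a cluster are close."""
    m = sum(xs) / len(xs)
    return -lam * sum((x - m) ** 2 for x in xs)

def log_ppmx_prior(partition, covariates, M=1.0, lam=1.0):
    """Unnormalized log prior: sum over clusters of log cohesion + log similarity."""
    total = 0.0
    for cluster in partition:
        xs = [covariates[i] for i in cluster]
        total += log_cohesion(len(cluster), M) + log_similarity(xs, lam)
    return total

covariates = [0.0, 0.1, 5.0, 5.1]
grouped = [[0, 1], [2, 3]]  # units with similar covariates clustered together
mixed = [[0, 2], [1, 3]]    # same cluster sizes, dissimilar covariates together
print(log_ppmx_prior(grouped, covariates), log_ppmx_prior(mixed, covariates))
```

Because both partitions have the same cluster sizes, the cohesion terms cancel and the similarity factor alone favors the partition that groups units with similar covariates.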
|| Parallel Sessions
||Ranked Set Sampling (Invited)
Johan Lim (Seoul National University)
A kernel density estimator for the ranked set samples
In this paper, we study a kernel density estimator for ranked set samples. We derive the asymptotic bias and variance of the estimator and find the optimal bandwidth that minimizes the integrated mean squared error (IMSE). We propose a leave-one-out cross-validation procedure to choose the bandwidth in practice. We numerically investigate the performance of the proposed kernel estimator. We further extend the proposed methodology to estimate a symmetric density. Finally, our method is applied to estimating the density of tree data published in the previous literature. This work is co-authored by Johan Lim, Min Chen and Sangun Park.
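For readers new to ranked set sampling, the sketch below draws a balanced ranked set sample and evaluates a Gaussian kernel density estimate on it. It assumes perfect ranking and a fixed bandwidth (the talk's IMSE-optimal and cross-validated bandwidths are not reproduced), and the function names are illustrative.

```python
import math
import random

def ranked_set_sample(draw, set_size, cycles, rng):
    """Balanced ranked set sample under perfect ranking (an assumption).

    For each cycle and each rank r = 1..set_size, draw a fresh set of
    set_size units, rank them, and measure only the r-th smallest.
    """
    sample = []
    for _ in range(cycles):
        for r in range(set_size):
            units = sorted(draw(rng) for _ in range(set_size))
            sample.append(units[r])
    return sample

def gaussian_kde(sample, x, bandwidth):
    """Gaussian kernel density estimate at x from the pooled measured values."""
    n = len(sample)
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)
    return total / (n * bandwidth * math.sqrt(2 * math.pi))

rng = random.Random(0)
rss = ranked_set_sample(lambda r: r.gauss(0.0, 1.0), set_size=3, cycles=10, rng=rng)
print(len(rss), gaussian_kde(rss, 0.0, bandwidth=0.5))
```

Every rank stratum contributes the same number of measurements, which is what makes the pooled kernel estimate a natural starting point for RSS data.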
Kaushik Ghosh (University of Nevada - Las Vegas)
A unified approach to variations of ranked set sampling
In this talk, we develop a general theory of inference using data collected from different variations of ranked set sampling. Such variations include balanced and unbalanced ranked set sampling, balanced and unbalanced k-tuple ranked set sampling, nomination sampling, simple random sampling, as well as a combination of them. We provide methods of estimating the underlying distribution function as well as its functionals and establish the asymptotic properties of the resulting estimators. The results so obtained can be used to develop nonparametric procedures for one- and two-sample problems. We also investigate small-sample properties of these estimators and conclude with an application to a real-life example.
Xinlei Wang (Southern Methodist University)
Isotonized Estimators for Judgment Post-stratification Samples
Judgment post-stratification (JP-S) is a data collection method introduced by MacEachern, Stasny and Wolfe (2004), based on ideas similar to those in ranked set sampling. In this research, for JP-S data, we propose isotonized estimators of the mean and cumulative distribution function (CDF) of a population of interest, which exploit the fact that the distributions of the judgment post-strata are often stochastically ordered. Further, for JP-S data with small sample sizes, we deal with the problem of empty cells and propose modified isotonized estimators of the CDF. All these new estimators are examined in simulation studies and illustrated with data examples.
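One standard way to impose a stochastic ordering on stratum-wise CDF estimates is weighted isotonic regression via the pool-adjacent-violators algorithm (PAVA). The sketch below assumes that, at a fixed point t, the stratum CDF values should be nonincreasing in judgment rank and that stratum sizes serve as weights. This is an illustration of the general technique, not necessarily the exact estimator proposed in the talk.

```python
def pava_increasing(values, weights):
    """Weighted pool-adjacent-violators: least-squares nondecreasing fit."""
    blocks = []  # each block: [weighted mean, total weight, number of entries]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # merge the last two blocks while they violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(v1 * w1 + v2 * w2) / wt, wt, c1 + c2])
    fitted = []
    for v, _, c in blocks:
        fitted.extend([v] * c)
    return fitted

def isotonize_cdfs(cdf_values, counts):
    """Force stratum CDF values at a fixed t to be nonincreasing in judgment rank.

    Assumption: higher judgment ranks are stochastically larger, so their CDFs
    are smaller; counts (stratum sizes) are used as weights.
    """
    fitted = pava_increasing([-v for v in cdf_values], counts)
    return [-v for v in fitted]

print(isotonize_cdfs([0.9, 0.5, 0.6, 0.2], [1, 1, 1, 1]))
```

The violating pair (0.5, 0.6) is pooled into its weighted average, while the already ordered values are left unchanged.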
||Ranking Procedures (Invited)
Carlos Guestrin (Carnegie Mellon University)
Riffled Independence for Ranked Data
Representing distributions over permutations can be a daunting task because the number of permutations of n objects scales factorially in n. One recent approach to reducing storage complexity exploits probabilistic independence, but, as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. We identify a novel class of independence structures, called riffled independence, encompassing a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. In riffled independence, one draws two permutations independently, then performs the riffle shuffle, common in card games, to combine the two permutations into a single permutation. Within the context of ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. In this talk, we provide a formal introduction to riffled independence and present algorithms for using riffled independence within Fourier-theoretic frameworks which have been explored by a number of recent papers. Additionally, we propose an automated method for discovering sets of items which are riffle independent from a training set of rankings. We show that our clustering-like algorithms can be used to discover meaningful latent coalitions from real preference ranking datasets and to learn the structure of hierarchically decomposable models based on riffled independence. This talk is joint work with Jonathan Huang.
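The generative story of riffled independence (rank two disjoint sets independently, then interleave) can be sketched in a few lines. For simplicity this sketch assumes uniformly random rankings within each set and a uniform interleaving distribution; in the general model both the within-set rankings and the interleaving distribution can be arbitrary.

```python
import random

def riffle(ranking_a, ranking_b, rng):
    """Interleave two rankings while preserving each one's relative order."""
    n = len(ranking_a) + len(ranking_b)
    # Assumption: every interleaving (choice of positions for A's items) is equally likely.
    positions_a = set(rng.sample(range(n), len(ranking_a)))
    ia, ib = iter(ranking_a), iter(ranking_b)
    return [next(ia) if pos in positions_a else next(ib) for pos in range(n)]

def sample_riffle_independent(items_a, items_b, rng):
    """Rank A and B independently (uniformly here, for illustration), then riffle."""
    ranking_a = rng.sample(items_a, len(items_a))
    ranking_b = rng.sample(items_b, len(items_b))
    return riffle(ranking_a, ranking_b, rng)

rng = random.Random(1)
print(riffle(["a", "b", "c"], ["x", "y"], rng))
print(sample_riffle_independent(["a", "b", "c"], ["x", "y"], rng))
```

Restricted to the items of either set, the output always reproduces that set's own ranking, which is exactly the relative-order constraint a riffle shuffle preserves.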
Paul Kidwell (Lawrence Livermore National Laboratory)
A kernel density estimate for the probabilities of rankings with ties and missing items
Ranking data are frequently encountered but are not easily modeled because of ties and missing data. Previous modeling efforts have established non-parametric kernel estimation as an effective tool for modeling rankings. A discrete analogue of the triangular kernel is developed whose combinatorial and statistical properties allow the non-parametric approach to be applied efficiently in the presence of ties and extended to missing data. This approach readily extends to a scheme for visualizing ranking data that is intuitive, easy to use, and computationally efficient.
Guy Lebanon (Georgia Institute of Technology)
Visualizing Similarities between Search Engines using the Weighted Hoeffding Distance on Permutations
We explore the use of multidimensional scaling in visualizing relationships between different search engines, and between different search strategies employed by users. In the talk we will discuss the appropriateness of different metrics for this task and present some experimental results using several well-known search engines.
|| Sparse Estimation (Contributed)
Bin Li (Louisiana State University)
Robust and Sparse Bridge Regression
It is known that when there are heavy-tailed errors or outliers in the response, least squares methods may fail to produce a reliable estimator. In this paper, we propose a generalized Huber criterion which is highly flexible and robust to large errors. We apply the new criterion to the bridge regression family and call the resulting method Robust and Sparse Bridge Regression (RSBR). However, computing the RSBR solution requires solving a nonconvex minimization problem, which is a computational challenge. Building on recent advances in difference of convex (DC) programming, coordinate descent algorithms and local linear approximation, we provide an efficient computational algorithm that attempts to solve this nonconvex problem. Numerical examples show that the proposed RSBR algorithm performs well and is suitable for large-scale problems.
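To show the two ingredients being combined, the sketch below evaluates a robust bridge objective. It uses the standard Huber loss as a stand-in for the talk's generalized Huber criterion, whose exact form is not given here, plus the bridge penalty lam * sum |beta_j|^q. It only evaluates the objective; the DC-programming and coordinate descent solver is not reproduced.

```python
def huber(r, delta):
    """Standard Huber loss (stand-in for the generalized criterion): quadratic
    for |r| <= delta, linear beyond, so large residuals have bounded influence."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)

def bridge_penalty(beta, lam, q):
    """Bridge penalty lam * sum |beta_j|^q; q = 1 is the lasso, q < 1 is nonconvex."""
    return lam * sum(abs(b) ** q for b in beta)

def robust_bridge_objective(X, y, beta, lam, q, delta):
    """Sum of Huber losses of the residuals plus the bridge penalty."""
    residuals = [yi - sum(xij * bj for xij, bj in zip(xi, beta)) for xi, yi in zip(X, y)]
    return sum(huber(r, delta) for r in residuals) + bridge_penalty(beta, lam, q)

X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 5.0]
print(robust_bridge_objective(X, y, beta=[1.0, 1.0], lam=1.0, q=1.0, delta=1.0))
```

With q < 1 the penalty is nonconvex, which is why a specialized algorithm such as the one described in the abstract is needed to minimize this kind of objective.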
Philippe Rigollet (Princeton University)
Optimal rates of sparse estimation and universal aggregation
A new procedure called "Exponential screening" (ES) is developed and proved to satisfy a set of optimal sparsity oracle inequalities for Gaussian regression. These oracle inequalities entail not only adaptation to sparsity but also show that ES simultaneously and optimally solves all the aggregation problems previously studied. Even though the procedure is simple, its implementation is not straightforward, but it can be approximated using the Metropolis algorithm, which results in a stochastic greedy algorithm that performs surprisingly well in a simulated problem of sparse recovery.
Adam Rothman (University of Michigan)
Sparse estimation of a multivariate regression coefficient matrix
We propose a procedure for constructing a sparse estimator of a multivariate regression coefficient matrix that accounts for correlation of the response variables. This method, which we call multivariate regression with covariance estimation (MRCE), involves penalized likelihood with simultaneous estimation of the regression coefficients and the covariance structure. An efficient optimization algorithm and a fast approximation are developed for computing MRCE. Using simulation studies, we show that the proposed method outperforms relevant competitors when the responses are highly correlated. We also apply the new method to a finance example on predicting asset returns.
Jeffrey Simonoff (New York University)
RE-EM Trees: A New Data Mining Approach for Longitudinal Data
Longitudinal data refer to the situation where repeated observations are available for each sampled individual. Methodologies that take this structure into account allow for systematic differences between individuals that are not related to covariates. A standard methodology in the statistics literature for this type of data is the random effects model, where these differences between individuals are represented by so-called "effects" that are estimated from the data. This paper presents a methodology that combines the flexibility of tree-based estimation methods with the structure of random effects models for longitudinal data. We apply the resulting estimation method, called the RE-EM tree, to pricing in online transactions, showing that the RE-EM tree provides improved predictive power compared to linear models with random effects and regression trees without random effects. We also perform extensive simulation experiments to show that the estimator improves predictive performance relative to regression trees without random effects and is comparable or superior to using linear models with random effects in more general situations, particularly for larger sample sizes. This is joint work with Rebecca J. Sela.
|| Plenary Talk
||David Banks (Duke University)
How We Got Here -- The Rise of Data Mining
Modern data mining is the child of statistics and computer science, with database management serving as the midwife. From the statistical side, much of the initial motivation derived from the philosophy of nonparametrics. From the computer science side, much of the impetus came from the interest in artificial intelligence. This talk reviews the interactions between these perspectives, describing the key developments that shaped the course of this emerging field.
|| Parallel Sessions
||Nonparametric Bayes Methods (Invited)
Purushottam Laud (Medical College of Wisconsin)
A Dirichlet Process Mixture Model Allowing for Mode of Inheritance Uncertainty in Genetic Association Studies
A desirable model for use in genetic association studies simultaneously considers the effect of all genetic markers and covariates. This invariably requires considering a large number of genetic markers, most of which are unrelated to the phenotype. Moreover, at each marker, the model should allow a variety of modes of inheritance: namely, additive, dominant, recessive, or over-dominant effects. MacLehose and Dunson (2009) have described a flexible multiple shrinkage approach to high-dimensional model building via Bayesian nonparametric priors. The use of these priors facilitates data-driven shrinkage to a random number of random prior locations. Adapting such techniques, we develop Bayesian semi-parametric shrinkage priors at two levels that allow data-driven shrinkage towards the various inheritance modes and, within each mode, shrinkage towards a random number of random effect sizes. The proposed method offers a natural way of incorporating into the inference the uncertainty in the mode of inheritance at each marker. We illustrate the proposed method on simulated data based on the International HapMap Project.
Steven MacEachern (The Ohio State University)
Regularization and case-specific parameters
Statisticians have long used case-specific parameters as a device to remove outlying and influential cases from an analysis. Decisions on inclusion of the parameters have traditionally been made on the basis of the size of the residual. The rise of regularization methods allows us to approach case-specific analysis in a different fashion. To exploit the power of regularization, we augment the "natural" covariates in a problem with an additional indicator for each case in the data set. We attach a penalty term for these case-specific indicators which is designed to produce a desired effect. For regression methods with squared error loss, an L1 penalty produces a regression which is robust to outliers and high leverage cases; for quantile regression methods, an L2 penalty decreases the variance of the fit enough to overcome an increase in bias. The paradigm thus allows us to robustify procedures which lack robustness and to increase the efficiency of procedures which are robust. We provide a general framework for the inclusion of case-specific parameters in regularization problems, provide new insight into existing techniques (specifically, Huber's robust regression), and illustrate the benefits of the new methodology. This is joint work with Yoonkyung Lee and Yoonsuh Jung.
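The squared-error/L1 case can be illustrated directly. Minimizing 0.5 ||y - X beta - gamma||^2 + lam ||gamma||_1 over the case-specific vector gamma, for fixed beta, soft-thresholds the residuals, and profiling gamma out yields Huber's loss with threshold lam, which is the connection to Huber's robust regression mentioned above. The sketch below uses simple block-coordinate descent; the function names and iteration count are illustrative assumptions.

```python
import numpy as np

def soft_threshold(r, lam):
    """Componentwise soft-thresholding: sign(r) * max(|r| - lam, 0)."""
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

def case_specific_l1_regression(X, y, lam, n_iter=500):
    """Block-coordinate descent for
        0.5 * ||y - X beta - gamma||^2 + lam * ||gamma||_1,
    with one case-specific parameter gamma_i per observation."""
    gamma = np.zeros_like(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # beta-step: least squares on the adjusted response y - gamma
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        # gamma-step: cases with |residual| <= lam get gamma_i = 0
        gamma = soft_threshold(y - X @ beta, lam)
    return beta, gamma

x = np.arange(10.0)
y = 2.0 * x
y[9] = 100.0  # one gross outlier
X = np.column_stack([np.ones(10), x])
beta, gamma = case_specific_l1_regression(X, y, lam=1.0)
print(beta, np.nonzero(gamma)[0])
```

Only the outlying case receives a nonzero case-specific parameter, and the fitted line stays close to y = 2x (the exact Huber solution here is intercept -2/9 and slope 25/12), whereas ordinary least squares would be pulled strongly toward the outlier.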
Fernando Quintana (Pontificia Universidad Católica de Chile)
Bayesian Nonparametric Longitudinal Data Analysis with Embedded Autoregressive Structure: Application to Hormone Data
We develop a novel Dirichlet Process Mixture model for irregular longitudinal data. The model mixes on the two parameters of the traditional Ornstein-Uhlenbeck process with exponential covariance function and thus allows for the possibility of multiple groups with distinct autoregressive covariance structure. We illustrate the use of the model to track hormone curve data through the menopausal transition, and we also test the model on simulated data, both to check its performance in estimating mean functions as well as a variety of covariance structures.
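The Ornstein-Uhlenbeck covariance is what lets the model handle irregularly spaced visits: it depends only on the time gap between observations. The sketch below builds that covariance at arbitrary times and draws one subject's deviations from a mixture over (sigma2, rho). For illustration it assumes a finite mixture with fixed, hypothetical weights and components in place of the Dirichlet process mixture described in the abstract.

```python
import numpy as np

def ou_covariance(times, sigma2, rho):
    """Exponential (Ornstein-Uhlenbeck) covariance sigma2 * exp(-rho * |t_i - t_j|)
    evaluated at arbitrary, possibly irregular, observation times."""
    t = np.asarray(times, dtype=float)
    return sigma2 * np.exp(-rho * np.abs(t[:, None] - t[None, :]))

def draw_subject_curve(times, mean, components, weights, rng):
    """Pick a mixture component (sigma2, rho), then draw the subject's values
    from the corresponding Gaussian process (finite-mixture stand-in)."""
    j = rng.choice(len(components), p=weights)
    sigma2, rho = components[j]
    cov = ou_covariance(times, sigma2, rho)
    return j, rng.multivariate_normal(mean, cov)

rng = np.random.default_rng(0)
times = [0.0, 1.0, 3.0]  # irregular visit times
j, curve = draw_subject_curve(times, np.zeros(3), [(1.0, 0.5), (4.0, 2.0)], [0.5, 0.5], rng)
print(j, curve)
```

At equally spaced times with gap d, the correlation between consecutive observations is exp(-rho * d), which is the autoregressive structure referred to in the title.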
|| Statistical Learning (Invited)
Ejaz Ahmed (University of Windsor)
Absolute Penalty and Shrinkage Estimation in Partially Linear Models
In this talk we address the problem of estimating a vector of regression parameters in a partially linear model. Our main objective is to provide natural adaptive estimators that significantly improve upon the classical procedures in the situation where some of the predictors are nuisance variables that may or may not affect the association between the response and the main predictors. In the context of two competing regression models (full and sub-models), we consider a shrinkage estimation strategy. The shrinkage estimators are shown to have higher efficiency than the classical estimators for a wide class of models. We develop the properties of these estimators using the notion of asymptotic distributional risk. Further, we propose an absolute penalty type estimator (APE) for the regression parameters, which is an extension of the LASSO method for linear models. The relative dominance picture of the estimators is established. Monte Carlo simulation experiments are conducted in which the non-parametric component is estimated by kernel smoothing and B-splines, and the performance of each procedure is evaluated in terms of simulated mean squared error. The comparison reveals that the shrinkage strategy performs better than the APE (LASSO) strategy when, and only when, there are many nuisance variables in the model. We conclude this talk by applying the suggested estimation strategies to a real data set, which illustrates the usefulness of the procedures in practice. This is joint work with K. Doksum and E. Raheem.
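A classical way to combine full-model and sub-model estimates is a positive-part Stein-type rule: shrink the full-model estimate toward the sub-model estimate by an amount driven by a test statistic T for the sub-model restriction. The sketch below assumes this textbook form, with k restricted (nuisance) coefficients and k >= 3. It illustrates the general strategy, not the talk's exact estimator.

```python
def positive_part_shrinkage(beta_full, beta_sub, test_stat, k):
    """Positive-part Stein-type combination of full and sub-model estimates:

        beta_PS = beta_sub + max(0, 1 - (k - 2) / T) * (beta_full - beta_sub),

    where T is a test statistic for the sub-model restriction and k is the
    number of restricted coefficients (assumed form, requires k >= 3)."""
    if k < 3:
        raise ValueError("Stein-type shrinkage needs at least 3 restricted coefficients")
    factor = max(0.0, 1.0 - (k - 2) / test_stat)
    return [bs + factor * (bf - bs) for bf, bs in zip(beta_full, beta_sub)]

# A large T (strong evidence against the sub-model) keeps most of the full-model estimate;
# a small T collapses the estimate onto the sub-model.
print(positive_part_shrinkage([1.0, 2.0], [0.0, 0.0], test_stat=10.0, k=5))
print(positive_part_shrinkage([1.0, 2.0], [0.0, 0.0], test_stat=2.0, k=5))
```

The positive-part truncation prevents the estimator from shrinking past the sub-model estimate, which would otherwise happen when T < k - 2.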
Liza Levina (University of Michigan)
Community extraction and network perturbations
Analysis of networks and in particular discovering communities within networks has been a focus of recent work in several fields, with applications ranging from citation and friendship networks to food webs and gene regulatory networks. Most of the existing community detection methods focus on partitioning the entire network into communities, with the expectation of many ties within communities and few ties between. However, in a real network there are often nodes that do not belong to any of the communities, and forcing every node into a community can distort results. Here we propose a new framework that focuses on community extraction instead of partition, extracting one community at a time. We show that the new criterion performs well on simulated and real networks, and establish asymptotic consistency of our method under the block model assumption. In the second part of the talk, I will briefly describe a method for assessing the quality of community detection by its robustness to random network perturbations. The first part of the talk is joint work with Ji Zhu and Yunpeng Zhao (Statistics, University of Michigan); the second part is joint work with Mark Newman and Brian Karrer (Physics, University of Michigan).
Ming Yuan (Georgia Institute of Technology)
Sparse Regularization for High Dimensional Additive Models
We study the behavior of the l1 type of regularization for high dimensional additive models. Our results suggest remarkable similarities and differences between linear regression and additive models in high dimensional settings. In particular, our analysis indicates that, unlike in linear regression, l1 regularization does not yield optimal estimation for additive models of high dimensionality. This surprising observation prompts us to introduce a new regularization technique that can be shown to be optimal in the minimax sense.
||Robust Statistics (Contributed)
Richard Charnigo (University of Kentucky)
Nonparametric Derivative Estimation and Posterior Probabilities for Nanoparticle Characteristics
The characterization of nanoparticles from surface wave scattering data is of great interest in applied engineering because of its potential to advance nanoparticle-based manufacturing concepts. Meanwhile, a recent development in methodology for the nonparametric estimation of a mean response function and its derivatives has provided a valuable tool for nanoparticle characterization: namely, a mechanism to identify the most plausible configuration for a collection of nanoparticles given the estimated derivatives of surface wave scattering profiles from those nanoparticles. In this talk, after briefly reviewing the preceding work, we propose an extension that additionally furnishes posterior probabilities for the various possible configurations of nanoparticles. An empirical study is included as a demonstration. This is collaborative work with Mathieu Francoeur, Patrick Kenkel, M. Pinar Menguc, Benjamin Hall, and Cidambi Srinivasan.
Juan A. Cuesta-Albertos (Universidad de Cantabria)
Similarity of Distributions and Impartial Trimming
We say that two probabilities are similar at level c if they are contaminated versions (up to a fraction c) of the same common probability. In this talk we show how a data-driven trimming aimed at maximizing the similarity between distributions can be used to decide whether two samples were obtained from two distributions which are similar at level c. The procedure relies on the fact that the empirical distributions present an over- (under-)fitting effect, in the sense that trimming more (less) than the similarity level results in trimmed samples which are much closer to (farther from) each other than expected. We provide illustrative examples and give some asymptotic results to justify the use of this methodology in applications. This is joint work with Profs. P. Alvarez-Esteban, E. del Barrio and C. Matran from Universidad de Valladolid, Spain.
Kiheiji Nishida (University of Tsukuba)
On the variance-stabilizing multivariate nonparametric regression estimation
In linear regression under heteroscedastic variances, the Aitken estimator is employed to counter heteroscedasticity. Employing the same principle, we propose the multivariate Nadaraya-Watson (NW) regression estimator with a variance-stabilizing bandwidth matrix (VS bandwidth matrix) that minimizes the asymptotic MISE while maintaining asymptotic homoscedasticity. Our proposed bandwidth matrix is diagonal, under the assumption that the sphering approach is available, and is defined by global and local parameters. NW regression estimation based on the VS bandwidth matrix does not produce discontinuities unless the density of X is sparse. This is one advantage over the MSE-minimizing bandwidth matrix.
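For reference, the multivariate Nadaraya-Watson estimator with a diagonal bandwidth matrix H = diag(h_1, ..., h_d) can be written in a few lines. The sketch below assumes a Gaussian product kernel. It does not implement the variance-stabilizing bandwidth selection itself; a location-dependent choice could be mimicked by passing a different bandwidth vector for each evaluation point.

```python
import numpy as np

def nadaraya_watson(X, y, x0, bandwidths):
    """Multivariate NW estimate at x0 with diagonal bandwidth matrix
    H = diag(bandwidths) and a Gaussian product kernel (assumed kernel).

    m_hat(x0) = sum_i K_H(X_i - x0) y_i / sum_i K_H(X_i - x0)
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    u = (X - np.asarray(x0, dtype=float)) / np.asarray(bandwidths, dtype=float)
    w = np.exp(-0.5 * np.sum(u**2, axis=1))  # kernel weights (constants cancel)
    return float(np.dot(w, y) / np.sum(w))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
print(nadaraya_watson(X, y, [0.5, 0.5], [0.3, 0.3]))  # equidistant point: plain average
```

Because the estimate is a weighted average of responses, it is continuous in x0 whenever the bandwidths vary continuously and some kernel weight is non-negligible, which relates to the discontinuity issue raised in the abstract when the density of X is sparse.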
Michal Pesta (Charles University in Prague, Czech Republic)
Robustified total least squares and bootstrapping with application in calibration
The solution to the errors-in-variables (EIV) problem computed through total least squares (TLS) or robustified TLS is highly nonlinear. Because of this, many statistical procedures for constructing confidence intervals and testing hypotheses cannot be applied. One possible solution to this dilemma is bootstrapping. Justification for the use of the nonparametric bootstrap technique is given. On the other hand, the classical residual bootstrap could fail. A proper residual bootstrap procedure is provided and its correctness is proved. The results are illustrated through a simulation study. An application of this approach to calibration data is presented.
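The nonlinearity comes from the way the TLS solution is obtained: from the smallest right singular vector of the augmented matrix [X y]. The sketch below computes plain (non-robustified) TLS that way and wraps it in a nonparametric pairs bootstrap, i.e. rows resampled with replacement. The proper residual bootstrap from the talk is not reproduced, and the model is assumed to have no intercept (center the data first otherwise).

```python
import numpy as np

def tls(X, y):
    """Total least squares for y ~ X beta (no intercept) via SVD of [X y]."""
    Z = np.column_stack([X, y])
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    v = Vt[-1]  # right singular vector for the smallest singular value
    if abs(v[-1]) < 1e-12:
        raise ValueError("TLS solution does not exist (last component is zero)")
    return -v[:-1] / v[-1]

def pairs_bootstrap_tls(X, y, n_boot, rng):
    """Nonparametric (pairs) bootstrap replicates of the TLS estimator."""
    n = len(y)
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        reps.append(tls(X[idx], y[idx]))
    return np.array(reps)

x = np.arange(1.0, 6.0).reshape(-1, 1)
y = 3.0 * x.ravel()
print(tls(x, y))
reps = pairs_bootstrap_tls(x, y, n_boot=20, rng=np.random.default_rng(0))
print(reps.shape)
```

On noisy data, percentiles of the bootstrap replicates give a simple confidence interval for the slope without any closed-form variance for the nonlinear TLS estimator.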
May 22 (Saturday)
|| Parallel Sessions
||Machine Learning (Invited)
Sayan Mukherjee (Duke University)
Geometry and Topology in Inference
We use two problems to illustrate the utility of geometry and topology in statistical inference: supervised dimension reduction (SDR), and inference of (hyper) graph models. We start with a "tale of two manifolds." The focus is on the problem of supervised dimension reduction (SDR). We first formulate the problem with respect to the inference of a geometric property of the data, the gradient of the regression function with respect to the manifold that supports the marginal distribution. We provide an estimation algorithm, prove consistency, and explain why the gradient is salient for dimension reduction. We then reformulate SDR in a probabilistic framework and propose a Bayesian model, a mixture of inverse regressions. In this modeling framework the Grassmann manifold plays a prominent role. The second part of the talk develops a parameterization of hypergraphs based on the geometry of points in d dimensions. Informative prior distributions on hypergraphs are induced through this parameterization by priors on point configurations via spatial processes. The approach combines tools from computational geometry and topology with spatial processes and offers greater control on the distribution of graph features than Erdos-Renyi random graphs.
Sijian Wang (University of Wisconsin)
Regularized REML for Estimation and Selection of Fixed and Random Effects in Linear Mixed-Effects Models
The linear mixed effects model (LMM) is widely used in the analysis of clustered or longitudinal data. In the practice of LMM, inference on the structure of the random effects component is of great importance, not only to yield proper interpretation of subject-specific effects but also to draw valid statistical conclusions. This task of inference becomes significantly more challenging when a large number of fixed effects and random effects are involved in the analysis. The difficulty of variable selection arises from the need to simultaneously regularize both the mean model and the covariance structure, with possible parameter constraints between the two. In this paper, we propose a novel method of regularized restricted maximum likelihood to select fixed and random effects simultaneously in the LMM. The Cholesky decomposition is invoked to ensure the positive-definiteness of the selected covariance matrix of random effects, and the selected random effects are invariant with respect to the ordering of predictors appearing in the model. We develop a new algorithm that solves the related optimization problem effectively, with a computational load comparable to that of the Newton-Raphson algorithm for MLE or REML in the LMM. We also investigate large-sample properties of the proposed estimator, including the oracle property. Both simulation studies and data analysis are included for illustration.
Jian Zhang (Purdue University)
Large-Scale Learning by Data Compression
An important challenge in machine learning is how to efficiently learn from massive training data sets, especially with limited storage and computing capability. In this talk we introduce an efficient learning method called "compressed classification", which aims to compress observations into a small number of pseudo-examples before classification. By analyzing the convergence rate of the risk, we show that the classifiers learned from compressed data can closely approximate the non-compressed classifiers by effectively reducing the noise variance. We also present a hierarchical local grouping algorithm to iteratively split observations into local groups, which leads to a faster compression process than the single-layer counterpart. Our experiments with simulated and real datasets show that the proposed local-grouping-based compression method can outperform several other compression methods, and can achieve performance competitive with the non-compressed baseline using much less learning time, for both small-scale and large-scale classification problems.
||Data Depth (Invited)
Xin Dang (University of Mississippi)
Kernelized Spatial Depth on Outlier Detection and Graph Ranking
Statistical depth functions provide a center-outward ordering of points with respect to a distribution or a data set in high dimensions. Of the various depth notions, the spatial depth is appealing because of its computational efficiency. However, it tends to produce circular contours and fails to capture well the underlying probabilistic geometry outside the family of spherically symmetric distributions. We propose a novel depth, the kernelized spatial depth (KSD), which generalizes the spatial depth via positive definite kernels. By choosing a proper kernel, the KSD captures the local structure of the data where the spatial depth fails. Based on KSD, a simple outlier detector is proposed, by which an observation with a depth value less than a threshold is declared an outlier. Upper bounds on the swamping effect (false alarm probability) are derived and used to determine the threshold. The KSD outlier detector demonstrates competitive performance on simulated data and data sets from real applications. We also extend KSD to graph data, where pairwise relationships of objects are given and represented by edges. Several graph kernels, including a newly proposed one, the complement Laplacian kernel, are considered for ranking the "centrality" of graph nodes. An application of graph KSD to gene data will also be briefly discussed.
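The kernel trick makes the generalization concrete. Spatial depth is D(x) = 1 - || (1/n) sum_i (x - x_i) / ||x - x_i|| ||. Replacing each point by its feature map phi and expanding the norm leaves only kernel evaluations k(., .). The sketch below follows that expansion; the kernel choices and function names are illustrative assumptions. With the linear kernel it reduces to ordinary spatial depth, which gives an easy check.

```python
import numpy as np

def linear_kernel(a, b):
    return float(np.dot(np.atleast_1d(a), np.atleast_1d(b)))

def gaussian_kernel(a, b, sigma=1.0):
    d = np.atleast_1d(np.asarray(a, float)) - np.atleast_1d(np.asarray(b, float))
    return float(np.exp(-np.dot(d, d) / (2.0 * sigma**2)))

def kernelized_spatial_depth(x, data, kernel):
    """Spatial depth computed in the feature space of `kernel`:
    1 - (1/n) * || sum_i (phi(x) - phi(x_i)) / ||phi(x) - phi(x_i)|| ||."""
    n = len(data)
    kxx = kernel(x, x)
    kx = np.array([kernel(x, xi) for xi in data])
    kdd = np.array([[kernel(xi, xj) for xj in data] for xi in data])
    delta2 = kxx + np.diag(kdd) - 2.0 * kx  # ||phi(x) - phi(x_i)||^2
    keep = delta2 > 1e-12                   # skip points coinciding with x
    delta = np.sqrt(delta2[keep])
    kx_k = kx[keep]
    # inner products <phi(x) - phi(x_i), phi(x) - phi(x_j)>
    inner = kxx - kx_k[:, None] - kx_k[None, :] + kdd[np.ix_(keep, keep)]
    total = np.sum(inner / np.outer(delta, delta))
    return 1.0 - np.sqrt(max(total, 0.0)) / n

data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(kernelized_spatial_depth(3.0, data, linear_kernel))   # center of the data
print(kernelized_spatial_depth(10.0, data, linear_kernel))  # far outside
```

An outlier rule in the spirit of the abstract would flag points whose depth falls below a threshold; the talk derives that threshold from bounds on the false alarm probability, which this sketch does not attempt.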
Regina Liu (Rutgers University)
DD-Classifier: A new Nonparametric Classification Procedure
Most existing classification algorithms are developed by assuming either certain parametric distributions for the data or certain forms of separating surfaces. Either assumption can greatly limit the applicability of the algorithm. We introduce a novel nonparametric classification algorithm using the so-called DD-plot. This algorithm is completely nonparametric, requiring no prior knowledge of the underlying distributions or of the form of the separating surface. Thus it can be applied to a wide range of classification problems. The algorithm can be easily implemented and its classification outcome can be clearly visualized on a two-dimensional plot regardless of the dimension of the data. The asymptotic properties of the proposed classifier and its misclassification rate are studied. The DD-classifier is shown to be asymptotically equivalent to the Bayes rule under suitable conditions. The performance of the DD-classifier is also examined using simulated and real data sets. Overall, the DD-classifier performs well across a broad range of settings, and compares favorably with most existing nonparametric classifiers. This is joint work with Juan Cuesta-Albertos (Universidad de Cantabria, Spain) and Jun Li (UC Riverside).
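To see how the DD-plot reduces classification to two dimensions, the sketch below maps a point to (depth with respect to class 0, depth with respect to class 1) and applies the simplest rule on that plot, the 45-degree line (pick the class in which the point is deeper). It assumes Mahalanobis depth purely for brevity; the DD-classifier itself can use other depths and fits a more flexible separating curve on the DD-plot.

```python
import numpy as np

def mahalanobis_depth(x, sample):
    """Mahalanobis depth 1 / (1 + (x - mean)^T S^{-1} (x - mean)) w.r.t. a sample."""
    mu = sample.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(sample, rowvar=False))
    d = np.asarray(x, dtype=float) - mu
    return 1.0 / (1.0 + d @ S_inv @ d)

def dd_point(x, sample0, sample1):
    """Coordinates of x on the DD-plot: (depth in class 0, depth in class 1)."""
    return mahalanobis_depth(x, sample0), mahalanobis_depth(x, sample1)

def dd_classify(x, sample0, sample1):
    """Simplest DD-plot rule (the diagonal line): assign x to the deeper class."""
    d0, d1 = dd_point(x, sample0, sample1)
    return 0 if d0 >= d1 else 1

rng = np.random.default_rng(0)
A = rng.normal(0.0, 1.0, size=(100, 2))  # class 0
B = rng.normal(5.0, 1.0, size=(100, 2))  # class 1
print(dd_classify([0.2, -0.1], A, B), dd_classify([5.1, 4.9], A, B))
```

Whatever the dimension of the data, each point becomes a pair of depths, so the decision rule can always be drawn on a two-dimensional plot.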
Robert Serfling (University of Texas at Dallas)
Robust, Affine Invariant, Computationally Easy Nonparametric Multivariate Outlyingness Functions
Identification of possible outliers in multivariate data is of paramount importance. We desire methods which are robust, computationally easy, and affine invariant. Versions based on the Mahalanobis distance meet these criteria but impose ellipsoidal contours. The spatial and projection outlyingness functions avoid this constraint, but the former lacks full affine invariance and sufficient robustness, while the latter is computationally intensive. Can we develop outlyingness functions which retain the favorable properties of Mahalanobis distance without being confined to ellipsoidal contours? We review multivariate outlyingness functions and introduce standardizations of multivariate data which produce affine invariance of outlyingness functions. A new "spatial trimming" method is introduced to robustify the spatial approach. A notion of strong invariant coordinate system functional is introduced to standardize finite projection pursuit vectors. With these methods, we construct new outlyingness functions that are robust, affine invariant, and computationally competitive with robust Mahalanobis distance outlyingness.
John Cartmell (InterDigital LLC)
Methods to Pre-Process Training Data for K-Nearest Neighbors Algorithm
The basic K-nearest neighbor classification algorithm searches through the training samples, computing the distance from each training sample to the sample to be classified. Once the distances are computed, the majority class among the k closest points is assigned as the classification of the test sample. The training phase of the algorithm is extremely efficient, as no pre-processing of the training data is required. However, the classification phase is very expensive, since the entire training set must be traversed for every sample to be classified. In this paper, we explore three methods that reduce the number of training samples to be traversed during the classification process. Each method reduces the number of samples in each class by averaging the training samples in that class, each using a different technique. Therefore, instead of being compared against all of the training samples, a test sample is compared only against the reduced set of training samples. Once these methods are described, they are used, along with other classification algorithms, on real data sets to demonstrate their effectiveness from both fidelity and performance standpoints.
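The general idea of reducing each class by averaging can be sketched as follows. The abstract does not specify the three averaging techniques, so this example uses one hypothetical scheme: each class is randomly split into groups of `group_size` samples, and each group is replaced by its mean. A standard k-NN vote is then taken over these prototypes only.

```python
from collections import Counter

import numpy as np

def reduce_by_averaging(X, labels, group_size, rng):
    """Hypothetical reduction: split each class at random into groups of
    `group_size` samples and replace every group by its mean (a prototype)."""
    protos, proto_labels = [], []
    for c in np.unique(labels):
        Xc = X[labels == c]
        idx = rng.permutation(len(Xc))
        for start in range(0, len(Xc), group_size):
            protos.append(Xc[idx[start:start + group_size]].mean(axis=0))
            proto_labels.append(c)
    return np.array(protos), np.array(proto_labels)

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    d = np.linalg.norm(train_X - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(d)[:k]
    return Counter(train_y[nearest].tolist()).most_common(1)[0][0]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (40, 2)), rng.normal(4.0, 0.5, (40, 2))])
labels = np.array([0] * 40 + [1] * 40)
P, L = reduce_by_averaging(X, labels, group_size=5, rng=rng)
print(P.shape, knn_predict(P, L, [0.0, 0.0], k=3), knn_predict(P, L, [4.0, 4.0], k=3))
```

Here 80 training samples shrink to 16 prototypes, so each classification needs one fifth of the distance computations. Averaging also tends to pull prototypes toward class centers, which is one reason fidelity has to be checked empirically.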
Pang Du (Virginia Tech)
Cure Rate Model with Spline Estimated Components
This study proposes a nonparametric estimation procedure for cure rate data based on the penalized likelihood method. In survival analyses of medical studies, there are often long-term survivors who can be considered permanently cured. The goals in these studies are to estimate the cure probability of the whole population and the hazard rate of the non-cured subpopulation. When covariates are present, as often happens in practice, understanding covariate effects on the cure probability and hazard rate is of equal importance. The existing methods are limited to parametric and semiparametric models. We propose a two-component mixture cure rate model with nonparametric forms for both the cure probability and the hazard rate function. Identifiability of the model is guaranteed by an additive assumption on the hazard rate. Estimation is carried out by an EM algorithm that maximizes a penalized likelihood. For inferential purposes, we apply the Louis formula to obtain point-wise confidence intervals for the cure probability and hazard rate. We then evaluate the proposed method by extensive simulations. An application to a melanoma study demonstrates the method.
Polina Khudyakov (Technion - Israel Institute of Technology)
Frailty model of customer patience in call centers
Call centers collect a huge amount of data, which provides a great opportunity for companies to analyze customer needs, desires, and intentions. This study is dedicated to the analysis of customer patience, defined as the ability to endure waiting for service. This human trait plays an important role in the call center mechanism. Every call can be considered an opportunity to keep or lose a customer; the outcome depends on the customer's satisfaction and affects the customer's future choices. The assessment of customer patience is complicated because in most cases customers receive the required service before they lose their patience. To estimate the distribution of patience, we treat all calls with non-zero service time as censored observations. Several methods for estimating customer patience already exist in the literature. Some of these use either the Weibull distribution (Palm, 1953) or the standard Kaplan-Meier product-limit estimator (Brown et al., 2005, JASA, 36-50). Our work is the first attempt to apply frailty models in customer patience analysis while taking into account, and estimating, the possible dependency between calls of the same customer. In this work we first extended the estimation technique of Gorfine et al. (2006, Biometrika, 735-741) to the case of different unspecified baseline hazard functions for each call, allowing for customer behavior that changes as s/he becomes more experienced with the call center services. Then, we provided a new class of test statistics for testing the equality of the baseline hazard functions. The asymptotic distribution of the test statistics was investigated theoretically under the null and certain local alternatives. We also provided consistent variance estimators.
The finite-sample properties of the test statistics were studied through an extensive simulation study, which verified control of the Type I error and our proposed sample size calculations. The utility of our proposed estimation technique and the new test statistic is illustrated by an analysis of call center data from an Israeli commercial company that processes up to 100,000 calls a day. This is joint work with Prof. M. Gorfine and Prof. P. Feigin.
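The censoring scheme described above can be illustrated with the standard Kaplan-Meier estimator cited in the abstract: a call abandoned after waiting t reveals patience t, while a call that reached an agent after waiting t only shows that patience exceeds t. This sketch, with hypothetical names and toy data, does not include the frailty model or the per-call baseline hazards of the talk.

```python
def kaplan_meier(waits, abandoned):
    """Kaplan-Meier estimate of the patience survival function S(t) = P(patience > t).

    waits[i]     : time call i spent waiting in the queue
    abandoned[i] : True if the caller hung up (patience observed exactly),
                   False if service started first (patience right-censored)
    Returns (t, S(t)) at each distinct abandonment time.
    """
    data = sorted(zip(waits, abandoned))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, events, leaving = data[i][0], 0, 0
        # Group all calls that left the queue at the same time t
        while i < len(data) and data[i][0] == t:
            events += data[i][1]
            leaving += 1
            i += 1
        if events:
            surv *= 1.0 - events / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve

curve = kaplan_meier([2, 3, 3, 5, 8, 8, 12],
                     [True, False, True, True, False, True, False])
print(curve)
```

Served calls never produce a drop in the curve, but they shrink the risk set, which is exactly how censoring enters the product-limit estimate.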
A Method and application to measurement of service quality: A Multidimensional approach
Evaluation of service quality in a regulated industry is necessary for effective policymaking and fair markets. Service quality, while not entirely in the eyes of the beholder, varies in definition depending on the stakeholder: service provider, customer or regulator. We provide a framework for considering service quality of a regulated industry from multiple perspectives and operationalize the concepts by developing a method for incorporating differing stakeholders' interests. We define measures of relative performance, orientation and cohesion and use them to analyze industry-wide trends over time. We use these measures to study the effect of the 1996 Telecom Act.
|| Plenary Talk
||Grace Wahba (University of Wisconsin-Madison)
The LASSO-Patternsearch Algorithm: Multivariate Bernoulli Patterns of Inputs and Outputs
We describe the LASSO-Patternsearch algorithm, a two- or three-step procedure whose core applies a LASSO penalized likelihood to univariate Bernoulli response data Y given a very large attribute vector X from a sparse multivariate Bernoulli distribution. Sparsity here means that the conditional distribution of Y given X is assumed to involve very few terms, though some may be of higher order (patterns). An algorithm that can handle a very large number (two million) of candidate patterns in a global optimization scheme is given, and it is argued that global methods have certain advantages over greedy methods in the variable selection problem. Applications to demographic and genetic data are described. Ongoing work on correlated multivariate Bernoulli outcomes, including tuning, is briefly described.
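To make the core step concrete, here is a small sketch under assumptions not taken from the talk: patterns are products of up to two binary inputs, and the LASSO-penalized Bernoulli likelihood is fitted by plain proximal gradient descent (ISTA) with an unpenalized intercept. It is a toy illustration of L1-penalized logistic regression on pattern features, not the authors' algorithm, which is built to handle millions of candidate patterns; all names are hypothetical.

```python
import itertools
import math

def pattern_features(x, max_order=2):
    """All products of up to max_order binary inputs ("patterns"), keyed by name."""
    feats = {}
    for k in range(1, max_order + 1):
        for idx in itertools.combinations(range(len(x)), k):
            name = "*".join(f"x{j + 1}" for j in idx)
            feats[name] = int(all(x[j] for j in idx))
    return feats

def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z toward zero by t."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def lasso_logistic(rows, y, lam, step=1.0, iters=5000):
    """L1-penalized logistic regression fitted by proximal gradient (ISTA).

    Minimizes -(1/n) * loglik(alpha, beta) + lam * sum_j |beta_j|; the intercept
    alpha is not penalized. step should not exceed 1/L, where L is the Lipschitz
    constant of the gradient of the smooth part.
    """
    names = list(rows[0])
    n = len(y)
    alpha, beta = 0.0, {j: 0.0 for j in names}
    for _ in range(iters):
        # Residuals y_i - p_i at the current parameters
        resid = []
        for row, yi in zip(rows, y):
            eta = alpha + sum(beta[j] * row[j] for j in names)
            resid.append(yi - 1.0 / (1.0 + math.exp(-eta)))
        alpha += step * sum(resid) / n
        for j in names:
            score = sum(r * row[j] for r, row in zip(resid, rows)) / n
            beta[j] = soft_threshold(beta[j] + step * score, step * lam)
    return alpha, beta

# Toy data: all 8 settings of three binary inputs; Y = 1 exactly when x1 = x2 = 1
X = list(itertools.product([0, 1], repeat=3))
y = [int(a and b) for a, b, _ in X]
rows = [pattern_features(x) for x in X]
alpha, beta = lasso_logistic(rows, y, lam=0.05)
print({k: round(v, 3) for k, v in beta.items()})
```

In this toy example Y depends only on the pattern x1*x2; with lam = 0.05 the fit keeps that pattern and shrinks the main effects and the other interactions exactly to zero.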
|| Parallel Sessions
| Pfahl 140
Thomas Bishop (The Ohio State University / NCACI)
Activity at the Nationwide Center for Advanced Customer Insights (NCACI)
The Nationwide Mutual Insurance Company and The Ohio State University have established the Nationwide Center for Advanced Customer Insights. The objective of the center is to conduct applied research to develop customer insights using state-of-the-art predictive modeling, data mining and advanced analytical techniques that improve Nationwide's understanding of customer behavior and consumer purchasing patterns. The center is fully funded by Nationwide and managed by Ohio State. It employs best-in-class OSU faculty, staff and graduate students from across the University, including faculty and students from the Departments of Marketing, Statistics, Psychology, Economics, Computer Science, and Industrial and Systems Engineering. The center manages applied business research projects that apply existing theory and methodologies to specific marketing, business and operational problems. It also manages seminal business research projects requiring state-of-the-art research by OSU faculty and graduate students to develop new analytical methodologies. The center offers OSU faculty and graduate students research opportunities and direct access to Nationwide customer and marketing data. Nationwide has agreed to grant OSU researchers the right to publish research results, subject to coding the data to protect confidential information. Our faculty and students work directly with Nationwide executives and staff to solve marketing and business problems important to Nationwide. This presentation will address the genesis of the Center, the strategy for integrating academic, graduate student and corporate research interests in applied research, and several examples of applied research projects that the Center has completed.
Yiem Sunbhanich (CACI/Nationwide)
Key Elements for Effective Execution of Applied Statistics in a Corporate Environment
The Nationwide Mutual Insurance Company and The Ohio State University have established the Nationwide Center for Advanced Customer Insights. The objective of the center is to conduct seminal and applied business research and develop customer insights using state-of-the-art predictive modeling, data mining and advanced analytical techniques that improve Nationwide's understanding of customer behavior and consumer purchasing patterns. Yiem Sunbhanich is the Executive-in-Residence at the Center. He will share his perspectives and experience on how to effectively transform information into actionable insights in a corporate environment. A business case on proactive contact will be presented, together with the key elements that made the execution of this proactive contact program successful. Examples of these key elements are problem formulation, a scalable insight production process, effective communication, and incentive alignment.
Joseph Verducci (The Ohio State University)
Mining for Natural Experiments
Scientific knowledge has been amassed mostly through scientific experiments. A standard format for these is to create an experimental design, controlling the levels of key variables X to infer a response surface E[Y] = f(X), while keeping the values of all potentially confounding variables Z constant throughout the experiment. The resulting knowledge about f(X) generalizes to all contexts with the same value of Z. In data mining, we typically try to find a relationship E[Y] = g(X,Z) that can be cross-validated or validated on a particular external dataset of interest. This severely limits the generalizability of the findings and leads to underestimation of the applicable false discovery rate. This talk suggests a strategy for finding "nuggets" of (subsample, variable subset) pairs that should generalize better than findings from currently popular methods.
||Model Selection (Invited)
Chong Gu (Purdue University)
Nonparametric regression with cross-classified responses
For the analysis of contingency tables, log-linear models are widely used to explore associations among the marginals. In this talk, we present modeling tools to disaggregate contingency tables along an x-axis and estimate the probabilities of cross-classified y-variables as smooth functions of covariates. Possible correlations among longitudinal or clustered data can be accommodated via random effects. A suite of R functions is available, incorporating a range of techniques including cross-validation, Kullback-Leibler projection, and Bayesian confidence intervals for odds ratios.
Yuhong Yang (University of Minnesota)
Parametric or Nonparametric? An Index for Model Selection
Parametric and nonparametric models are convenient mathematical tools for describing characteristics of data with different degrees of simplification. When a model is to be selected from a number of candidates, not surprisingly, differences arise depending on whether the data-generating process is assumed to be parametric or nonparametric. In this talk, in a regression context, we consider whether and how we can distinguish between parametric and nonparametric situations, and discuss the feasibility of adaptive estimation that handles both parametric and nonparametric scenarios optimally. The presentation is based on joint work with Wei Liu.
Ji Zhu (University of Michigan)
Penalized regression methods for ranking variables by effect size, with applications to genetic mapping studies
Multiple regression can be used to rank predictor variables according to their "unique" association with a response variable - that is, the association that is not explained by other measured predictors. Such a ranking is useful in applications such as genetic mapping studies, where one goal is to clarify the relative importance of several correlated genetic variants with weak effects. The use of classical multiple regression to rank the predictors according to their unique associations with the response is limited by difficulties due to collinearities among the predictors. Here we show that regularized regression can improve the accuracy of this ranking, with the greatest improvement occurring when the pairwise correlations among the predictor variables are strong and heterogeneous. Considering a large number of examples, we found that ridge regression generally outperforms regularization using the L1 norm for variable ranking, regardless of whether the true effects are sparse. In contrast, for predictive performance, L1 regularization performs better for sparse models and ridge regression performs better for non-sparse models. Our findings suggest that the prediction and variable ranking problems both benefit from regularization, but that different regularization approaches tend to perform best in the two settings. This is joint work with Nam-Hee Choi and Kerby Shedden.
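A minimal sketch of ranking predictors by the magnitude of ridge coefficients, assuming the predictors are already on a common scale (in practice one would standardize them first) and an unpenalized intercept handled by centering. The abstract does not give the exact ranking criterion or how the penalty is tuned, so `ridge`, `rank_by_effect` and the penalty value are hypothetical choices; the linear system is solved with plain Gaussian elimination to keep the example dependency-free.

```python
def ridge(X, y, lam):
    """Ridge coefficients: solve (Xc'Xc + lam*I) b = Xc'yc on centered data."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    yc = [v - ybar for v in y]
    # Augmented system [Xc'Xc + lam*I | Xc'yc]
    A = [[sum(Xc[i][j] * Xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] + [sum(Xc[i][j] * yc[i] for i in range(n))]
         for j in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution
    b = [0.0] * p
    for r in range(p - 1, -1, -1):
        b[r] = (A[r][p] - sum(A[r][c] * b[c] for c in range(r + 1, p))) / A[r][r]
    return b

def rank_by_effect(X, y, lam):
    """Predictor indices ordered from largest to smallest |ridge coefficient|."""
    b = ridge(X, y, lam)
    return sorted(range(len(b)), key=lambda j: -abs(b[j]))

# Toy data: y depends strongly on x0, weakly on x1, not at all on x2
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0],
     [0, 1, 1], [1, 0, 1], [1, 1, 1], [2, 1, 0]]
y = [2.0 * a + 0.5 * b for a, b, _ in X]
print(rank_by_effect(X, y, lam=1.0))
```

With lam = 0 this reduces to classical least squares; a positive lam stabilizes the coefficients when predictors are strongly correlated, which is the regime where the abstract reports the largest gains in ranking accuracy.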
Jinsong Chen (University of Virginia)
A generalized semiparametric single-index mixed model
The linear model in generalized linear mixed models may not be flexible enough to capture the underlying relationship between the response and its associated covariates. We use a single-index model to generalize this model so that the linear combination of covariates enters the model through a nonparametric link function. We call this model a generalized semiparametric single-index mixed model. The marginal likelihood is approximated using the Laplace method. A double penalized quasi-likelihood approach is proposed for estimation. Asymptotic properties of the estimators are developed. We estimate variance components using marginal quasi-likelihood. Simulations and a study of the association between daily air pollutants and daily mortality in various counties of North Carolina are used to illustrate the models and the proposed estimation methodology. This is co-authored with Inyoung Kim (Virginia Tech University) and George R. Terrell (Virginia Tech University).
Bo Kai (College of Charleston)
New Estimation and Variable Selection Methods for Semiparametric Regression Models
In this work, we propose new estimation and variable selection procedures for the semiparametric varying-coefficient partially linear model. We first study quantile regression estimates for this model. To improve efficiency, we further develop a semiparametric composite quantile regression (semi-CQR) procedure. We establish the asymptotic normality of both the parametric and nonparametric estimates and show that they achieve the optimal convergence rate. Moreover, we show that the semi-CQR method is much more efficient than the least-squares-based method for many non-normal errors and loses only a little efficiency for normal errors. To achieve sparsity with high-dimensional covariates, we propose adaptive penalization methods for variable selection and prove that the methods possess the oracle property. Extensive Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed procedures. This is joint work with Runze Li and Hui Zou.
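For reference, the composite quantile regression criterion sums check losses over several quantile levels, with a separate intercept for each level and a shared slope. This sketch shows only that loss for a simple linear model, assuming equally spaced levels tau_k = k/(K+1); the semi-CQR procedure of the talk additionally localizes the loss with kernel weights to estimate varying coefficients, which is not shown. Function names are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def cqr_levels(K):
    """Equally spaced quantile levels tau_k = k / (K + 1), k = 1..K."""
    return [k / (K + 1) for k in range(1, K + 1)]

def cqr_objective(x, y, intercepts, slope, taus):
    """Composite quantile loss for y = a_k + slope * x.

    One intercept a_k per quantile level tau_k; the slope is shared across
    levels, which is what lets CQR pool information from all quantiles.
    """
    return sum(check_loss(yi - a - slope * xi, tau)
               for a, tau in zip(intercepts, taus)
               for xi, yi in zip(x, y))

taus = cqr_levels(3)
print(taus, cqr_objective([0, 1, 2], [1, 3, 5], [1.0] * 3, 2.0, taus))
```

Minimizing this criterion over the intercepts and the common slope (for example by linear programming) gives the CQR estimate.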
Ganna Leonenko (Swansea University)
Statistical Learning in Semiparametric Models of Remote Sensing: Empirical Divergence and Information Measures, Robust and Minimum Contrast Methods
Estimation of biophysical parameters from satellite data is one of the most challenging problems in remote sensing. We present statistical learning results for the radiative transfer model FLIGHT, which calculates the bidirectional reflectance distribution function (BRDF) using Monte Carlo simulation of photon transport and represents complex vegetation structures as well as angular geometry. For statistical learning in this semiparametric model, empirical divergence and information measures have been applied. We also investigate a class of robust statistics and minimum contrast estimates. We find that least-squares estimation (LSE) does not work well for non-linear problems of the type investigated, and that estimation of biophysical parameters can be improved by up to 13% in some cases. This talk is based on joint work with S. Los and P. North.
Jelani Wiltshire (Florida State University)
A general class of test statistics to test for the effect of age of species on their extinction rate
Van Valen's Red Queen hypothesis states that within a homogeneous taxonomic group, age is statistically independent of the rate of extinction. The case of the Red Queen hypothesis addressed here is when the homogeneous taxonomic group is a group of similar species. Since Van Valen's work, various statistical approaches have been used to address the relationship between taxon duration (age) and the rate of extinction. Some of the more recent approaches to this problem using Planktonic Foraminifera (Foram) extinction data include Weibull and exponential modeling (Parker and Arnold, 1997) and Cox proportional hazards modeling (Doran et al., 2004, 2006). I propose a general class of test statistics that can be used to test for the effect of age on extinction. These test statistics allow for a varying background rate of extinction and attempt to remove the effects of other covariates when assessing the effect of age on extinction. No model is assumed for the covariate effects; instead, I control for covariate effects by pairing or grouping together similar species. In my presentation I will apply my test statistics to the Foram data and to simulated data sets.
|| Parallel Sessions
||Climatic Applications (Invited)
Lasse Holmstrom (University of Oulu)
Scale space methods in climate research
Statistical scale space analysis aims to find features in the data that appear at different scales, or levels of resolution. Scale-dependent features are revealed by multi-scale smoothing, the idea being that each smooth provides information about the underlying truth at a particular scale. We discuss a Bayesian scale space technique and its application to the study of temperature variation, both past and future. Analysis of past temperatures involves fossil-based reconstructions of post-Ice Age climate in northern Fennoscandia, where features that appear at different time scales are of interest. Future temperatures, on the other hand, are computer climate model predictions, and we seek to establish patterns of warming that appear at different spatial scales.
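The multi-scale smoothing idea can be sketched with a family of Gaussian-kernel (Nadaraya-Watson) smooths, one per bandwidth, where each bandwidth plays the role of a scale. This is a plain frequentist illustration with hypothetical names; the Bayesian machinery of the talk, which assesses how credible a feature is at each scale, is not reproduced here.

```python
import math

def gaussian_smooth(x, y, grid, h):
    """Nadaraya-Watson estimate on `grid` with a Gaussian kernel of bandwidth h.

    Assumes each grid point is close enough to the data that the kernel
    weights do not all underflow to zero.
    """
    out = []
    for g in grid:
        w = [math.exp(-0.5 * ((xi - g) / h) ** 2) for xi in x]
        out.append(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))
    return out

def scale_space(x, y, grid, bandwidths):
    """Family of smooths indexed by bandwidth: one view of the data per scale."""
    return {h: gaussian_smooth(x, y, grid, h) for h in bandwidths}

x = list(range(20))
y = [float(i % 5) for i in x]
ss = scale_space(x, y, [3.0, 10.0], [0.1, 1000.0])
print(ss)
```

A very small bandwidth essentially reproduces the observations, while a very large one flattens everything to the overall mean; features that persist across a range of intermediate bandwidths are the scale-dependent structure of interest.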
Cari Kaufman (UC Berkeley)
Functional ANOVA Models for Comparing Sources of Variability in Climate Model Output
Functional analysis of variance (ANOVA) models partition a functional response according to the main effects and interactions of various factors. Motivated by the question of how to compare the sources of variability in climate models run under various conditions, we develop a general framework for functional ANOVA modeling from a Bayesian viewpoint, assigning Gaussian process prior distributions to each batch of functional effects. We discuss computationally efficient strategies for posterior sampling using Markov chain Monte Carlo algorithms, and we emphasize useful graphical summaries based on the posterior distribution of model-based analogues of the traditional ANOVA decompositions of variance. We present a case study using these methods to analyze data from the Prudence Project, a climate model inter-comparison study providing ensembles of climate projections over Europe.
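As a point of comparison for the model-based analogues mentioned above, the traditional ANOVA decomposition can be applied pointwise to curves observed on a common grid. This sketch assumes a two-factor crossed design (hypothetical factors A and B) with one curve per cell and returns the grand mean curve, main-effect curves, interaction curves and their sums of squares. It is a classical method-of-moments illustration, not the Bayesian Gaussian process model of the talk; names are hypothetical.

```python
def functional_anova(curves):
    """Classical two-way decomposition of curves on a common grid.

    curves[a][b] is the curve (list of values on the grid) for level a of
    factor A and level b of factor B, with one curve per cell.
    """
    A, B, T = len(curves), len(curves[0]), len(curves[0][0])
    mu = [sum(curves[a][b][t] for a in range(A) for b in range(B)) / (A * B)
          for t in range(T)]
    alpha = [[sum(curves[a][b][t] for b in range(B)) / B - mu[t] for t in range(T)]
             for a in range(A)]
    beta = [[sum(curves[a][b][t] for a in range(A)) / A - mu[t] for t in range(T)]
            for b in range(B)]
    inter = [[[curves[a][b][t] - mu[t] - alpha[a][t] - beta[b][t] for t in range(T)]
              for b in range(B)] for a in range(A)]
    # Sums of squares, accumulated over the grid as well as over factor levels
    ss = {
        "A": B * sum(v * v for eff in alpha for v in eff),
        "B": A * sum(v * v for eff in beta for v in eff),
        "AB": sum(v * v for row in inter for eff in row for v in eff),
    }
    return mu, alpha, beta, inter, ss

# Additive toy curves: no interaction between the two factors
curves = [[[t + a + 2 * b for t in range(4)] for b in range(3)] for a in range(2)]
mu, alpha, beta, inter, ss = functional_anova(curves)
print(ss)
```

Comparing the entries of `ss` gives a crude view of which source of variation dominates; the Bayesian approach of the talk attaches posterior uncertainty to such comparisons.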
Tao Shi (The Ohio State University)
Statistical Modeling of AIRS Level 3 Quantization Data
The Atmospheric Infrared Sounder (AIRS) has been collecting temperatures, water vapor mass-mixing ratios, and cloud fractions at various atmospheric pressure levels. Its level-2 data consist of 35-dimensional vectors at each 45 km ground footprint along each satellite path. The level-3 quantization data (L3Q) summarize the valid level-2 data in each 5-degree by 5-degree latitude-longitude grid box over a time period by a set of representative vectors and their associated weights. The distinctive feature of this data set is that the observations are empirical distributions. Most statistical methods were developed for datasets whose observations lie in R^d, and statistical inference for this type of data is an open problem. We start with the commonly used Mallows distance as a measure of distance between two distributions and build a mixture model on empirical distributions, with each component being a Gaussian-type distribution. We then fit the model to AIRS L3Q data using Data Spectroscopic-type methods. Finally, we will address statistical questions such as classification and prediction on AIRS L3Q data. This is joint work with Dunke Zhou (OSU).
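In one dimension, the Mallows (L2 Wasserstein) distance between two weighted discrete distributions, such as a set of representative values with weights, can be computed by matching quantiles. This sketch is restricted to that one-dimensional case for illustration; the AIRS L3Q representative vectors are 35-dimensional, where the Mallows distance requires solving a transportation problem. Names are hypothetical.

```python
import math

def mallows_1d(xs, wx, ys, wy):
    """Mallows (L2-Wasserstein) distance between two weighted 1-D discrete distributions.

    In 1-D the optimal coupling pairs quantiles, so we walk both sorted supports
    together, pairing off probability mass and accumulating squared gaps.
    """
    sx, sy = float(sum(wx)), float(sum(wy))
    px = sorted(zip(xs, [w / sx for w in wx]))
    py = sorted(zip(ys, [w / sy for w in wy]))
    i = j = 0
    rem_x, rem_y = px[0][1], py[0][1]
    total = 0.0
    while i < len(px) and j < len(py):
        mass = min(rem_x, rem_y)
        total += mass * (px[i][0] - py[j][0]) ** 2
        rem_x -= mass
        rem_y -= mass
        if rem_x <= 1e-12:  # tolerance guards against floating-point leftovers
            i += 1
            rem_x = px[i][1] if i < len(px) else 0.0
        if rem_y <= 1e-12:
            j += 1
            rem_y = py[j][1] if j < len(py) else 0.0
    return math.sqrt(total)

print(mallows_1d([0, 1, 2], [1, 2, 1], [3, 4, 5], [1, 2, 1]))
```

Shifting a distribution by c moves it a Mallows distance of exactly |c|, which is one reason the metric is a natural choice for comparing location-varying summaries.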
||Robust Methods (Invited)
Claudio Agostinelli (Ca' Foscari University)
Local Simplicial Depth
Data depth is a distribution-free statistical methodology for graphical/analytical investigation of data sets. The main applications are a center-outward ordering of multivariate observations, location estimators and some graphical presentations (scale curve, DD-plot). By definition, depth functions provide a measure of centralness which is monotonically decreasing along any given ray from the deepest point. This implies that any depth function is unable to account for multimodality and mixture distributions. To overcome this problem we introduce the notion of Local Depth, which generalizes the concept of depth. The Local Depth evaluates the centrality of a point conditional on a bounded neighborhood. For example, the local version of simplicial depth is the ordinary simplicial depth, conditional on random simplices whose volume is not greater than a prescribed threshold. These generalized depth functions are able to record local fluctuations of the density function and are very useful in mode detection, in identifying the components of a mixture model, and in defining a "nonparametric" distance for cluster analysis. We provide theoretical results on the behavior of the Local Simplicial Depth and illustrate its use. Finally, we discuss the computational problems involved in evaluating the Local Simplicial Depth. This is joint work with M. Romanazzi.
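A brute-force sketch of the sample version in two dimensions (simplices are triangles), following the conditional definition given above: among all triangles spanned by triples of sample points with area at most tau, the depth of p is the fraction that contain p. Degenerate (zero-area) triangles are skipped, so for samples in general position tau = infinity gives the ordinary sample simplicial depth. The O(n^3) enumeration is only practical for small samples. Names are hypothetical.

```python
import itertools

def _area(a, b, c):
    """Area of the triangle with vertices a, b, c."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def _contains(tri, p):
    """True if p lies in the closed triangle tri (all orientation signs agree)."""
    a, b, c = tri
    def cross(o, u, v):
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])
    d = (cross(a, b, p), cross(b, c, p), cross(c, a, p))
    return not (any(s < 0 for s in d) and any(s > 0 for s in d))

def local_simplicial_depth(p, sample, tau=float("inf")):
    """Fraction of non-degenerate triangles with area <= tau that contain p."""
    small = [t for t in itertools.combinations(sample, 3) if 0.0 < _area(*t) <= tau]
    if not small:
        return 0.0
    return sum(_contains(t, p) for t in small) / len(small)

S = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 2)]
print(local_simplicial_depth((1, 0.5), S), local_simplicial_depth((1, 0.5), S, 4.0))
```

Lowering tau restricts attention to small triangles, so the depth reflects the local density around p rather than its global centrality.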
Marianthi Markatou (Columbia University)
A closer look at estimators of variance of the generalization error of computer algorithms
We bring together methods from machine learning and statistics to study the problem of estimating the variance of the generalization error of computer algorithms. We study this problem in the simple context of predicting the sample mean as well as in the case of linear and kernel regression. We illustrate the role of the training and test sample size on the performance of the estimators and present a simulation study that exemplifies the characteristics of the derived variance estimators and of those existing in the literature.
Ruben Zamar (University of British Columbia)
Clustering using linear patterns
I will first describe a method called the linear grouping algorithm (LGA), which can be used to detect different linear structures in a data set. LGA combines ideas from principal components, clustering methods and resampling algorithms. I will show that LGA can detect several different linear relations at once but can be affected by the presence of outliers in the data set. I will then present a robustification of LGA based on trimming. Finally, if time allows, I will present a partial likelihood extension of LGA that allows flexible modelling of linear clusters with different scales.
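The combination of ingredients described above can be sketched for the two-dimensional case, where the linear structures are lines. Each start fits lines through randomly sampled pairs of points (the resampling ingredient); points are then assigned to the nearest line by orthogonal distance, and each line is refitted by orthogonal regression, that is, along its first principal component. This hypothetical simplification omits the trimming robustification and the partial likelihood extension; the names and the number of starts are assumptions.

```python
import math
import random

def fit_line(points):
    """Orthogonal-regression line through points, as (centroid, unit normal)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # angle of first principal component
    return (mx, my), (-math.sin(theta), math.cos(theta))

def orth_dist(p, line):
    """Orthogonal distance from point p to a line given as (centroid, unit normal)."""
    (cx, cy), (nx, ny) = line
    return abs((p[0] - cx) * nx + (p[1] - cy) * ny)

def nearest(p, lines):
    return min(range(len(lines)), key=lambda k: orth_dist(p, lines[k]))

def lga(points, n_lines, starts=200, iters=20, seed=0):
    """Random two-point starts, then alternate assign/refit; keep the best start."""
    rng = random.Random(seed)
    best = None
    for _ in range(starts):
        lines = [fit_line(rng.sample(points, 2)) for _ in range(n_lines)]
        for _ in range(iters):
            groups = [[] for _ in lines]
            for p in points:
                groups[nearest(p, lines)].append(p)
            lines = [fit_line(g) if len(g) >= 2 else old for g, old in zip(groups, lines)]
        obj = sum(orth_dist(p, lines[nearest(p, lines)]) ** 2 for p in points)
        if best is None or obj < best[0]:
            best = (obj, lines)
    obj, lines = best
    return obj, lines, [nearest(p, lines) for p in points]

pts = [(float(x), float(x)) for x in range(10)] + [(float(x), 20.0 - x) for x in range(10)]
obj, lines, labels = lga(pts, 2)
print(obj, labels)
```

Because each start can land in a poor local optimum, many random starts are used and the one with the smallest total squared orthogonal distance is kept.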
||Dimension Reduction, Manifold Learning and Graphs (Contributed)
Yuexiao Dong (Temple University)
Nonlinear inverse dimension reduction methods
Many classical dimension reduction methods, especially those based on inverse conditional moments, require the predictors to have elliptical distributions, or at least to satisfy a linearity condition. Such conditions, however, are too strong for some applications. Li and Dong (2009) introduced the notion of the central solution space and used it to modify first-order methods, such as sliced inverse regression, so that they no longer rely on these conditions. In this paper we generalize this idea to second-order methods, such as sliced average variance estimator and directional regression. In doing so we demonstrate that the central solution space is a versatile framework: we can use it to modify essentially all inverse conditional moment based methods to relax the distributional assumption on the predictors. Simulation studies and an application show a substantial improvement of the modified methods over their classical counterparts.
Andrew Smith (University of Bristol)
Nonparametric regression on a graph
The 'Signal plus Noise' model for nonparametric regression can be extended to the case of observations taken at the vertices of a graph. This model includes many familiar regression problems. This talk discusses the use of the edges of a graph to measure roughness in penalized regression. The distance between estimate and observation is measured at every vertex in the L2 norm, and roughness is penalized on every edge in the L1 norm. Thus the ideas of total-variation penalization can be extended to a graph. This presents computational challenges, so we present a new, fast algorithm and demonstrate its use with examples, including image denoising, a graphical approach that gives an improved estimate of the baseline in spectroscopic analysis, and regression of spatial data (UK house prices).
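The penalized criterion described above can be written down directly. This minimal sketch assumes a factor 1/2 on the squared-error term and a single tuning parameter lam (both conventions, not taken from the talk), together with a helper that builds the 4-neighbour graph of an image grid. It only evaluates the objective and does not implement the fast algorithm of the talk. Names are hypothetical.

```python
def graph_tv_objective(f, y, edges, lam):
    """Squared (L2) fidelity at every vertex plus lam * |f_u - f_v| on every edge."""
    fidelity = 0.5 * sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    roughness = sum(abs(f[u] - f[v]) for u, v in edges)
    return fidelity + lam * roughness

def grid_edges(rows, cols):
    """4-neighbour edges of an image grid, vertices numbered row-major."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            v = r * cols + c
            if c + 1 < cols:
                edges.append((v, v + 1))
            if r + 1 < rows:
                edges.append((v, v + cols))
    return edges

# Path graph with a single jump: compare interpolating vs. flat fits
y = [0.0, 0.0, 1.0, 1.0]
path = [(0, 1), (1, 2), (2, 3)]
print(graph_tv_objective(y, y, path, 1.0), graph_tv_objective([0.5] * 4, y, path, 1.0))
```

With lam = 1 the flat fit scores lower than interpolating the data, showing how a strong edge penalty favours piecewise-constant estimates.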
Minh Tang (Indiana University)
On the relationship between Laplacian eigenmaps and diffusion maps
Laplacian eigenmaps and diffusion maps are two popular techniques for manifold learning. Each can be viewed as constructing a Euclidean configuration of points by graph embedding. If the graph is undirected, the diffusion map turns out to be an anisotropic scaling of the Laplacian eigenmap.
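The scaling relation can be seen from the two eigenproblems, under one common set of conventions (an assumption; the talk may normalize differently): both embeddings use the same eigenvectors, normalized the same way, and differ only in per-coordinate factors.

```latex
\begin{aligned}
&\text{Graph: } W = W^\top,\quad D = \operatorname{diag}(d_i),\ d_i = \textstyle\sum_j W_{ij},\quad L = D - W.\\
&\text{Laplacian eigenmap: } L\psi_j = \mu_j D\psi_j,\ 0 = \mu_0 \le \mu_1 \le \cdots,\quad
  x_i \mapsto \bigl(\psi_1(i), \ldots, \psi_k(i)\bigr).\\
&\text{Diffusion map: } P = D^{-1}W,\quad P\psi_j = \lambda_j\psi_j
  \iff W\psi_j = \lambda_j D\psi_j \iff L\psi_j = (1-\lambda_j) D\psi_j.\\
&\text{Hence } \lambda_j = 1 - \mu_j \text{ with the same } \psi_j,\quad
  x_i \mapsto \bigl((1-\mu_1)^t\psi_1(i), \ldots, (1-\mu_k)^t\psi_k(i)\bigr).
\end{aligned}
```

Because coordinate j is multiplied by its own factor (1 - mu_j)^t, the diffusion map stretches the Laplacian eigenmap by a different amount along each axis, which is what makes the scaling anisotropic rather than uniform.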
Johan Van Horebeek (CIMAT)
ANOVA weighted Kernel PCA based on random projections
For datasets with many observations, we show how random projections can be used to perform kernel PCA efficiently while also providing insight into variable importance.
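One widely used way to speed up kernel methods with random projections is random Fourier features (Rahimi and Recht). These map the data to a finite feature space whose inner products approximate a Gaussian (RBF) kernel, so ordinary linear PCA on the centered features approximates kernel PCA. This is a swapped-in illustration, not necessarily the construction used in the talk, and the ANOVA weighting for variable importance is not shown. Names and parameter values are hypothetical.

```python
import math
import random

def random_fourier_features(X, n_features, gamma, seed=0):
    """Map rows of X to z(x) with z(x).z(y) approx exp(-gamma * ||x - y||^2).

    Frequencies w ~ N(0, 2*gamma*I) (the spectral density of this RBF kernel)
    and phases b ~ Uniform[0, 2*pi]; features are sqrt(2/D) * cos(w.x + b).
    """
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.gauss(0.0, math.sqrt(2.0 * gamma)) for _ in range(d)]
         for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)
    return [[scale * math.cos(sum(wj * xj for wj, xj in zip(w, x)) + bk)
             for w, bk in zip(W, b)] for x in X]

X = [[0.0, 0.0], [0.5, 0.5]]
Z = random_fourier_features(X, 4000, gamma=1.0)
approx = sum(a * b for a, b in zip(Z[0], Z[1]))
print(approx, math.exp(-0.5))  # approximate vs. exact kernel value
```

The cost of the subsequent PCA then grows with the number of random features rather than with the square of the number of observations, which is where the savings for large datasets come from.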