
Conference on Nonparametric Statistics and Statistical Learning
Blackwell Inn, The Ohio State University, Columbus, OH
May 19–22, 2010

May 20 (Thursday)
9:00–10:00am 
Plenary Talk 
Ballroom 
Ronald Randles (University of Florida)
Robustness of Location Estimators to Distortion
Robustness of affine equivariant location estimators is described relative to a model which creates infinitesimal distortion of a symmetric distribution. The model for distortion is based upon a class of distributions introduced by Fechner (1897). Properties of estimators are compared and contrasted in terms of their sensitivity to this type of distortion. This work is coauthored with Demetris Athienitis.

10:30am–12:00pm 
Parallel Sessions 
Ballroom 
Data Mining (Invited)
Lacey Gunter
Variable selection for decision making
I will address the topic of variable selection for decision making, with a focus on decisions about when to provide treatment and which treatment to provide. Current variable selection techniques were developed for supervised learning settings where the goal is prediction of the response. These techniques often downplay the importance of interaction variables that have small predictive ability but that are critical when the ultimate goal is decision making rather than prediction. Two new techniques will be proposed that are designed specifically to find variables that aid in decision making. Simulation results are given, along with an application of the methods to data from a randomized controlled trial for the treatment of depression.
Kevin Killourhy (Carnegie Mellon University)
Cherry-picking for complex data: robust structure discovery
Complex data often arise as a superposition of data generated from several simpler models. The traditional strategy for such cases is to use mixture modeling, but it can be problematic, especially in higher dimensions. In this talk, I consider an alternative approach, emphasizing data exploration and robustness to model misspecification. I will focus on a problem in cluster analysis with promising implications for computer security, and I will also consider applications of this strategy to problems in regression and multidimensional scaling. The talk comprises work done in collaboration with David L. Banks and Leanna House.
Yichao Wu (North Carolina State University)
Non-crossing large-margin probability estimation and its application to robust SVM via preconditioning
Many large-margin classifiers, such as the Support Vector Machine (SVM), sidestep estimating conditional class probabilities and target the discovery of classification boundaries directly. However, estimation of conditional class probabilities can be useful in many applications. Wang, Shen and Liu [J. Wang, X. Shen, Y. Liu, Probability estimation for large margin classifiers, Biometrika 95 (2008) 149–167] bridged the gap by providing an interval estimator of the conditional class probability via bracketing. The interval estimator was achieved by applying different weights to the positive and negative classes and training the corresponding weighted large-margin classifiers. They propose to estimate the weighted large-margin classifiers individually. However, the individually estimated classification boundaries may empirically cross each other even though, theoretically, they should not. In this work, we propose a technique to ensure non-crossing of the estimated classification boundaries. Furthermore, we take advantage of the estimated conditional class probabilities to precondition our training data. The standard SVM is then applied to the preconditioned training data to achieve robustness. Simulations and real data are used to illustrate the finite-sample performance of the proposed methods.

Pfahl 140 
Rank-Based Methods (Invited)
Somnath Datta (University of Louisville)
Rank Tests for Clustered Paired Data When the Cluster Size is Potentially Informative
Rank-based tests are alternatives to likelihood-based tests, popular for their relative robustness and the elegant mathematical theory underlying them. There has been a surge in research activity in this area in recent years, as a number of researchers work to develop and extend rank-based procedures to clustered dependent data, which include situations with known correlation structures (e.g., as in mixed effects models) as well as more general forms of dependence. In this talk, we consider the problem of testing the symmetry of a marginal distribution of paired differences under clustered data. However, unlike most other work in the area, we consider the possibility that the cluster size is a random variable that is statistically dependent on the variable of interest within a cluster. This situation typically arises when the clusters are defined in a natural way (e.g., not determined by the experimenter) and the size of a cluster may carry information about the distribution of data values within it.
Denis Larocque (HEC Montreal)
Nonparametric Methods and Trees for Multiple Mixed Outcomes
Over the years, many nonparametric methods for multivariate outcomes have been proposed. These include coordinatewise and spatial rank and sign methods and others based on different notions of depth. But these methods are aimed at the situation where all outcomes are continuous. In this talk, we will present new methods specifically designed for mixed outcomes, where some outcomes are categorical and others are continuous. In particular, we will discuss the two-sample testing problem with mixed outcomes and also present a recursive partitioning method for mixed outcomes. Parts of this talk are based on joint work with François Bellavance (HEC Montreal), Abdessamad Dine (HEC Montreal), Jaakko Nevalainen (University of Turku) and Hannu Oja (University of Tampere).
Davy Paindaveine (Université Libre de Bruxelles)
Rank tests for principal component analysis
We construct parametric and rank-based optimal tests for eigenvectors and eigenvalues of covariance or scatter matrices in elliptical families. The parametric tests extend the Gaussian likelihood ratio tests of Anderson (1963) and their pseudo-Gaussian robustifications by Davis (1977) and Tyler (1981, 1983), with which their Gaussian versions are shown to coincide, asymptotically, under Gaussian or finite fourth-order moment assumptions, respectively. Such assumptions, however, restrict the scope to covariance-based principal component analysis. The rank-based tests we are proposing remain valid without such assumptions. Hence, they address a much broader class of problems, where covariance matrices need not exist and principal components are associated with more general scatter matrices. Asymptotic relative efficiencies moreover show that those rank-based tests are quite powerful; when based on van der Waerden or normal scores, they even uniformly dominate the pseudo-Gaussian versions of Anderson's procedures. The tests we are proposing thus outperform daily practice both in terms of validity and in terms of efficiency. The main methodological tool throughout is Le Cam's theory of locally asymptotically normal experiments, in the nonstandard context, however, of a "curved" parametrization. This is joint work with Marc Hallin (Université Libre de Bruxelles) and Thomas Verdebout (Université Lille III).

Pfahl 202 
Variable Selection (Contributed)
Chris Hans (The Ohio State University)
Penalized Regression via Orthant Normal Priors
Motivated by penalized optimization approaches to variable selection and parameter estimation, this paper introduces a new class of prior distributions, the orthant normal distribution, for the regression coefficients in a Bayesian regression model. Parameter estimates based on penalized optimization are often interpreted as the mode of a Bayesian posterior distribution. We show that the orthant normal distribution is the prior that gives rise to the elastic net estimate and, in a limiting case, the lasso. By providing a complete characterization of this prior, we allow for model-based inference that moves beyond exclusive use of the posterior mode, including coherent Bayesian prediction and formal Bayesian model comparison. In contrast to penalized optimization procedures (where the penalty parameter is often selected via a potentially unstable cross-validation), the Bayesian approach allows uncertainty about these parameters to be included in the model or, alternatively, allows the parameters to be selected via the method of maximum marginal likelihood. We show that the orthant normal distribution has a scale-mixture-of-normals representation, providing additional insight into the particular form of shrinkage employed by the elastic net. Posterior inference is achieved via MCMC. This model-based approach to elastic net regression has the advantage that the basic model can be extended to accommodate more complex regression settings. Models can be built that include random effects to capture various covariance structures while at the same time inducing elastic-net-like shrinkage on the regression coefficients. We discuss approaches for incorporating prior information about dependence structure in the covariates that resemble Zellner's g-prior but allow for lasso-like shrinkage.
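The posterior-mode interpretation described above is easiest to see in the scalar normal-means case, where the elastic net estimate has a closed form. The sketch below is illustrative only (the function name and the scalar setting are ours, not the paper's): soft-thresholding gives the lasso part, and the ridge term adds extra shrinkage.

```python
import math

def enet_scalar(y, lam1, lam2):
    """Penalized estimate argmin_b 0.5*(y - b)**2 + lam1*|b| + 0.5*lam2*b**2,
    i.e. the posterior mode under an elastic-net-type prior in a scalar
    normal-means model: soft-threshold by lam1, then shrink by 1/(1 + lam2)."""
    mag = max(abs(y) - lam1, 0.0)                # soft-thresholding (L1 part)
    return math.copysign(mag, y) / (1.0 + lam2)  # ridge shrinkage (L2 part)

print(enet_scalar(2.0, 0.5, 0.0))  # lam2 = 0 recovers the lasso: 1.5
print(enet_scalar(2.0, 0.5, 1.0))  # elastic net shrinks further: 0.75
print(enet_scalar(0.3, 0.5, 1.0))  # small signals are set exactly to zero: 0.0
```

Setting lam1 = 0 instead recovers pure ridge shrinkage, so the two penalty parameters interpolate between the lasso and ridge limits the abstract refers to.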
Woncheol Jang (University of Georgia)
Hexagonal Operator for Regression with Shrinkage and Equality Selection: HORSES
We propose a new method called HORSES (Hexagonal Operator for Regression with Shrinkage and Equality Selection), which performs variable selection for regression with positively correlated predictors. Like other penalized approaches, the HORSES estimator can be computed via a constrained least-squares problem. Our penalty term compromises between an L1 penalty on the coefficients and another L1 penalty on pairwise differences of coefficients. This is joint work with Johan Lim.
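The penalty just described can be written down directly: an L1 term on the coefficients plus an L1 term on all pairwise coefficient differences, so that equal coefficients pay no pairwise penalty. A hedged sketch of the penalized objective (the function names and the mixing parameter `alpha` are ours, for illustration; the paper's exact parametrization may differ):

```python
def horses_penalty(beta, alpha):
    """Blend of an L1 penalty on the coefficients and an L1 penalty on all
    pairwise differences; the latter encourages equal coefficients for
    positively correlated predictors (a grouping effect)."""
    l1 = sum(abs(b) for b in beta)
    pairs = sum(abs(beta[j] - beta[k])
                for j in range(len(beta)) for k in range(j + 1, len(beta)))
    return alpha * l1 + (1 - alpha) * pairs

def objective(y, X, beta, lam, alpha):
    """Penalized least-squares objective for a HORSES-type fit."""
    rss = sum((yi - sum(xij * bj for xij, bj in zip(xi, beta))) ** 2
              for yi, xi in zip(y, X))
    return rss + lam * horses_penalty(beta, alpha)

# Equal coefficients incur no pairwise penalty, so grouping is encouraged:
print(horses_penalty([1.0, 1.0, 1.0], alpha=0.5))  # 1.5 (only the L1 part)
print(horses_penalty([2.0, 1.0, 0.0], alpha=0.5))  # 0.5*3 + 0.5*4 = 3.5
```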
Xingye Qiao (University of North Carolina at Chapel Hill)
Pairwise Variable Selection for Classification
While traditional marginal variable selection methods have the merits of convenient implementation and good interpretability, they do not take the joint effects among variables into account. In some situations, variables that have strong joint effects can be passed over by marginal methods because of their small marginal effects. In the context of binary classification in supervised learning, we develop a novel method of pairwise variable selection, based on a within-class permutation test to evaluate the statistical significance of joint effects. Moreover, we introduce a new notion of variable selection quality, the bivariate False Discovery Rate (biFDR), and provide an estimation procedure for it. A simulated example and a real data application are analyzed to demonstrate the usefulness of the proposed approach. This is joint work with Yufeng Liu and J. S. Marron.
Kukatharmini Tharmaratnam (Katholieke Universiteit Leuven)
Robust version of the AIC based on M, S and MM estimators for variable selection in semiparametric mixed models
Variable selection in the presence of outliers may be performed using a robust version of Akaike's information criterion (AIC). In the first part, explicit expressions are obtained for such criteria when S and MM estimators are used. In addition, a version of the AIC based on robust quasi-likelihood M-estimation is included. The performance of these criteria is compared to the existing AIC based on M estimators and to the classical non-robust AIC. In the second part, we consider semiparametric models fitted by robust penalized regression splines using a mixed model representation. We develop a robust AIC to select both the parametric and nonparametric components in such semiparametric mixed models and compare it with a non-robust AIC. This work is coauthored by K. Tharmaratnam and G. Claeskens.

1:30–2:30pm 
Plenary Talk 
Ballroom 
David Madigan (Columbia University)
Statistical Methods in Drug Safety
The pharmaceutical industry and regulatory agencies rely on various data sources to ensure the safety of licensed drugs. Recent high-profile drug withdrawals have led to increased scrutiny of this activity. Many statistical challenges arise in this context. This talk will describe some of these data sources and the challenges they present, focusing especially on newer large-scale data analyses.

2:45–4:15pm 
Parallel Sessions 
Ballroom 
Federal Statistics (Invited)
John Eltinge (U.S. Bureau of Labor Statistics)
Three Classes of Open Questions in the Application of Nonparametric Regression and Machine Learning Methods to Sample Surveys
This paper reviews standard approaches to the use of auxiliary data in survey sampling, and then outlines three areas for potential extensions based on nonparametric regression and machine learning methods: (1) diagnostics for sample design and weighting; (2) integration of sample survey data with large amounts of administrative-record data; and (3) disclosure limitation. We consider these issues in the context of both standard inference for univariate estimands and more realistic settings that involve a large number of estimands and a large number of stakeholders. These issues lead to some extensions of large-sample and small-deviation approximation methods for complex survey data. We explore these topics in the context of the U.S. Consumer Expenditure Survey.
Leming Shi (U.S. Food and Drug Administration)
Personalized Medicine: Genomics, Bioinformatics, and the FDA-led MAQC Project
Personalized medicine depends on reliable tools in genomics and bioinformatics. The MicroArray Quality Control (MAQC) project was originally launched by the US Food and Drug Administration (FDA) in 2005 to address concerns about the reliability of microarray technologies as well as bioinformatic data analysis issues (http://edkb.fda.gov/MAQC/). The first phase of MAQC (MAQC-I) evaluated the technical performance of various microarray gene expression platforms and assessed advantages and limitations of competing data analysis methods for identifying differentially expressed genes or potential biomarkers (http://www.nature.com/nbt/focus/maqc/). MAQC-II aimed to reach consensus on "best practices" for developing and validating microarray-based predictive models for preclinical and clinical applications such as the prediction of outcomes of patients with breast cancer, multiple myeloma, or neuroblastoma. MAQC-III (SEQC) is evaluating the technical performance and addressing the bioinformatic challenges of next-generation sequencing in transcriptome and exome analyses. The MAQC project is expected to enhance our capabilities for understanding, predicting, and preventing serious adverse drug reactions via patient-specific genomic information (MAQC-IV or PADRE), helping the FDA fulfill its mission of protecting consumers and promoting public health. Disclaimer: Views expressed in this presentation are those of the presenter and not necessarily those of the US FDA.
William Winkler (Census Bureau)
Machine Learning for Record Linkage, Text Categorization, and Edit/Imputation
Machine learning methods have been applied in statistical agencies. The initial application used the EM algorithm for naïve Bayes and general Bayesian networks to obtain 'optimal' record linkage parameters without training data. The methods were used in production software during three Decennial Censuses. Optimal parameters vary significantly across approximately 500 regions of the U.S. and reduce clerical review by two-thirds in comparison with crude but knowledgeable guesses of parameters. A minor modification of the record linkage model can be used for semi-supervised learning for text categorization and extended to a generalization of boosting in which better models (general Bayes networks) involving increasing amounts of interaction between terms are learned. Finally, similar theory and the same computational algorithms (which are as much as 100 times as fast as algorithms in commercial software) can be adapted for learning edit/imputation models that account for edits (i.e., structural zeros, such as that a child under 16 cannot be married) and preserve joint distributions in a principled manner. Because the models are a complete probability structure, imputation and estimation of imputation variance are straightforward using variants of the modeling algorithms.

Pfahl 140 
Statistical Learning (Invited)
Chunming Zhang (University of WisconsinMadison)
High-Dimensional Regression and Classification Under a Class of Convex Loss Functions
We investigate applications of the adaptive Lasso to high-dimensional models for regression and classification under a wide class of loss functions. We show that when the dimension grows nearly exponentially with the sample size, the resulting adaptive Lasso estimator possesses the oracle property for suitable weights. Moreover, we propose two methods, called CR and PCR, for estimating the weights. Theoretical advantages of PCR over CR are analyzed. In addition, the adaptive Lasso classifier is shown to be consistent. Simulation studies demonstrate the advantage of PCR over CR in both regression and classification. The effectiveness of the proposed method is illustrated using real data sets.
Yongdai Kim (Seoul National University)
On model selection criteria for high dimensional models
I will talk about model selection criteria for high-dimensional regression models where the number of covariates is much larger than the sample size. I will give a class of model selection criteria that are consistent. I will also discuss the minimax optimality of various model selection criteria in high dimensions.
Yufeng Liu (University of North Carolina at Chapel Hill)
Bi-Directional Discrimination
Linear classifiers are very popular but can suffer serious limitations when the classes have distinct subpopulations. General nonlinear classifiers can give improved classification error rates but do not give a clear interpretation of the results. In this talk, we propose the Bi-Directional Discrimination (BDD) classification method, which generalizes the classifier from one hyperplane to two hyperplanes. This gives much of the flexibility of a general nonlinear classifier while maintaining the interpretability of linear classifiers. The performance and usefulness of the proposed method are assessed using asymptotics and demonstrated through analysis of simulated and real data. This talk is based on joint work with Hanwen Huang and J. S. Marron.

Pfahl 202 
Nonparametric Tests (Contributed)
Stephen Bamattre (The Ohio State University)
Temporal stability of association between two variables within an enduring subpopulation
The tau-path is a technique for detecting monotone association between a pair of variables in an unspecified subpopulation. Using the Mallows model for rankings, the method is extended to estimate the temporal stability of observed subpopulations. This procedure is applied to a marketing data set from Nationwide Insurance to discover pairs of variables for which, over time, there is a stable association within an enduring subpopulation. Examples include the screening of predictor variables for use in geographically targeted models, without fixing the regions of interest beforehand.
Dean Barron (Twobluecats.com)
A two sample test based on rotationally superimposable permutations
Weaknesses of the Kolmogorov-Smirnov (KS) test have been described; data that exploit them may lead to incorrect statistical conclusions. For the two-sample case, when all the observations from the first population appear consecutively and are ranked lowest, KS is statistically significant (D = 1, n >= 8). However, when these same observations are ranked in the middle, KS is not statistically significant (D ~ 0.5, n <= 32). Intuitively, though, both situations reflect different underlying populations. An approach that addresses this, pawprint (PP), is introduced, which groups together rotationally superimposable permutations. The maximum KS value found within each group replaces its individual KS values, forming a new table with different critical values. Samples are drawn from a dataset to contrast and illustrate the two approaches.
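The two rank configurations described in this abstract are easy to reproduce numerically. A minimal pure-Python check of the two-sample KS statistic in both cases (the helper below is ours, for illustration; it is not part of the proposed PP method):

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max_x |F_a(x) - F_b(x)|
    over the pooled sample points, using the empirical CDFs."""
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)
    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

n = 8
# Case 1: all of sample A ranked below all of sample B -> D = 1.
low = list(range(1, n + 1))            # ranks 1..8
rest = list(range(n + 1, 2 * n + 1))   # ranks 9..16
print(ks_statistic(low, rest))         # 1.0

# Case 2: sample A occupies the middle ranks -> D = 0.5.
middle = list(range(n // 2 + 1, n // 2 + n + 1))                  # ranks 5..12
outer = list(range(1, n // 2 + 1)) + list(range(n // 2 + n + 1, 2 * n + 1))
print(ks_statistic(middle, outer))     # 0.5
```

Both configurations separate the two samples completely, yet the KS statistic differs by a factor of two, which is the weakness the proposed grouping over rotationally superimposable permutations is meant to repair.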
Yanling Hu (University of Kentucky)
Censored Empirical Likelihood with Over-determined Hazard-type Constraints
Qin and Lawless (1994) studied the empirical likelihood method with estimating equations. They obtained very nice asymptotic properties, especially when the number of estimating equations is larger than the number of parameters (the over-determined case). We study a setup parallel to Qin and Lawless but use a hazard-type empirical likelihood function and hazard-type estimating equations. The advantage of using hazards is that censored data can be handled easily. We obtain similar asymptotic results for the maximum empirical likelihood estimator and the empirical likelihood ratio test, including the over-determined case. Two examples are provided to demonstrate the potential application of the result.
Mohamed Mahmoud (Al Azhar University)
Nonparametric Testing for Exponentiality Against the NBRUE Class of Life Distributions Based on the Laplace Transform
The main theme of this paper is to propose a new test for exponentiality against the new better than renewal used in expectation (NBRUE) class, based on the Laplace transform. The asymptotic properties of this test are studied, and its Pitman asymptotic efficiencies for three alternatives are calculated and compared with those of other tests for exponentiality. The critical values of the test are calculated and tabulated for sample sizes n = 5(1)50, and its power is estimated by a simulation study for some alternatives used in reliability. Finally, the test is applied to some real data. This work is coauthored by M. A. W. Mahmoud and M. H. S. Al-Loqmani.

4:30–5:30pm 
Plenary Talk 
Ballroom 
Michael Jordan (University of California, Berkeley)
Completely Random Measures, Hierarchies, and Nesting in Bayesian Nonparametrics
Bayesian nonparametric modeling and inference are based on using general stochastic processes as prior distributions. Despite the great generality of this definition, the great majority of the work in Bayesian nonparametrics is based on only two stochastic processes: the Gaussian process and the Dirichlet process. Motivated by the needs of applications, I present a broader approach to Bayesian nonparametrics in which priors are obtained from a class of stochastic processes known as "completely random measures" (Kingman, 1967). In particular I will present models based on the beta process, the Bernoulli process, the gamma process and the Dirichlet process, and on hierarchical and nesting constructions that use these basic stochastic processes as building blocks. I will discuss applications of these models to several problem domains, including protein structural modeling, computational vision, natural language processing and statistical genetics.
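One of the building blocks named in this abstract, the Dirichlet process, can be simulated directly via the stick-breaking construction, which may serve as a concrete reference point. A hedged sketch, truncated at n atoms (the truncation and names are ours; the talk's completely-random-measure constructions are more general):

```python
import random

def stick_breaking_weights(alpha, n, rng):
    """First n weights of a Dirichlet process draw via stick breaking:
    v_k ~ Beta(1, alpha); w_k = v_k * prod_{j<k} (1 - v_j).
    Larger alpha spreads mass over more atoms."""
    weights, remaining = [], 1.0
    for _ in range(n):
        v = rng.betavariate(1.0, alpha)  # break off a Beta(1, alpha) fraction
        weights.append(v * remaining)
        remaining *= 1.0 - v             # length of stick still unbroken
    return weights

rng = random.Random(1)
w = stick_breaking_weights(alpha=2.0, n=50, rng=rng)
print(len(w), round(sum(w), 4))  # 50 weights, summing to just under 1
```

Pairing each weight with an atom drawn from a base measure yields a draw from the Dirichlet process; the beta and gamma processes mentioned in the talk arise from analogous completely random measure constructions.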

May 21 (Friday)
8:30–10:00am 
Parallel Sessions 
Ballroom 
Nonparametric Bayes Methods (Invited)
Wesley Johnson (UC Irvine)
Bayesian Semiparametric Methods in Biostatistics: Selective Update
We review some recent developments in the application of Bayesian nonparametric methodology to semiparametric problems in the areas of receiver operating characteristic curve estimation, survival analysis, modeling longitudinal data, and jointly modeling longitudinal and survival data. We begin with a brief review of mixtures of Polya trees and Dirichlet process mixtures, followed by illustrations based on real data. An emphasis is given to selecting among classes of semiparametric models; e.g., in survival analysis with time-dependent covariates, we may wish to choose among proportional hazards, proportional odds, and Cox and Oakes accelerated failure time models.
Luis Nieto-Barajas (ITAM)
A Markov gamma random field for modeling respiratory infections in Mexico
In this talk we present a Markov gamma random field prior for modeling relative risks in disease mapping data. This prior process allows for a different dependence effect with different neighbors. We describe the properties of the prior process and derive posterior distributions. The model is extended to cope with covariates and a data set of respiratory infections of children in Mexico is used as an illustration.
Marina Vannucci (Rice University)
Spiked Priors for HighDimensional Data
This talk will address parametric and nonparametric prior models for variable selection in highdimensional settings. Linear models and generalized settings that allow for nonlinear interactions will be considered. Inferential strategies will be discussed. Applications will be to simulated data and real data with a large number of variables.

Pfahl 140 
Statistical Learning (Invited)
Robert Krafty (University of Pittsburgh)
Canonical Correlation Analysis of Spectral and Multivariate Cross-Sectional Data
In many studies, stationary time series data and cross-sectional outcomes are collected from several independent units. Often the primary goal of the study is to quantify the association between the cross-sectional outcomes and the second-order spectral properties of the time series. This article addresses this question by introducing a data-driven procedure for performing a canonical correlation analysis (CCA) between the log-spectra and cross-sectional outcomes. The isometry between the Hilbert space of linear combinations of a second-order stochastic process and the reproducing kernel Hilbert space generated by its covariance kernel allows for a formulation of CCA whose canonical correlates and weight functions can be estimated via estimates of the covariance kernel of the log-spectra and the cross-covariance kernel between the log-spectra and cross-sectional data. A penalized Whittle-likelihood procedure is offered for obtaining method-of-moments type estimates of the mean log-spectra, the covariance kernel of the log-spectra, and the cross-covariance kernel. A new criterion is introduced for selecting smoothing parameters to optimally estimate the linear relationship between the log-spectra and cross-sectional outcomes. This criterion minimizes the conditional Kullback-Leibler distance between the unit-specific log-spectra and the best linear unbiased predictors of the unit-specific log-spectra from the cross-sectional outcomes and log-periodograms under the estimated covariance structure. The proposed CCA procedure is used to analyze the association between the heart rate variability power spectrum during sleep and multiple measures of sleep.
Hernando Ombao (Brown University)
Functional Connectivity as a Potential Biomarker for Classification
In this talk, we will discuss models that use functional connectivity as a potential biomarker for classification. This work is motivated by the HAND experiment, in which participants moved a joystick either to the left or to the right upon instruction. The goal is to determine the time-frequency network in the multichannel EEG signals that could discriminate between left and right movements and also predict or classify future movements based on a single-trial multichannel EEG. We first enumerate some potential measures of connectivity in a brain network, namely partial coherence and mutual information. Next, we discuss methods for estimating the network. One of the key statistical challenges is that partial coherence estimates are typically obtained by inverting the spectral density matrix, which may be near-singular, especially when the time series in the network exhibit a high degree of cross-correlation. To avoid numerical instability, we estimate the spectral density matrix via a shrinkage procedure that takes a weighted average of an initial periodogram estimator and a simple parametric estimator (e.g., based on the vector AR model). The shrinkage estimator is more computationally stable than the classical smoothed periodogram and gives a lower mean-squared error than the multitaper and kernel-smoothing approaches. The method will be applied to EEGs recorded during a visuomotor experiment.
Raquel Prado (UC Santa Cruz)
Models and algorithms for online detection of cognitive fatigue
This work is motivated by the analysis of multiple brain signals recorded during an experiment that aimed to characterize mental fatigue in real time. The recorded brain signals can be modeled via mixtures of autoregressive (AR) processes and state-space autoregressions with structured priors on the AR coefficients. Such prior structure allows researchers to incorporate scientifically meaningful information related to various states of mental alertness. We focus on the implementation of sequential Monte Carlo methods for online parameter learning and filtering. We illustrate how the AR-based models can be used to describe electroencephalographic signals recorded from a subject who performed basic arithmetic calculations continuously for a period of three hours.

Pfahl 202 
Density Estimation (Contributed)
José E. Chacón (Departamento de Matematicas, Universidad de Extremadura)
Unconstrained bandwidth matrices for multivariate kernel estimation of the density and density derivatives
Multivariate kernel estimation is an important technique in exploratory data analysis. The crucial factor which determines the performance of kernel estimation is the bandwidth matrix. Research in finding optimal bandwidth matrices began with restricted parametrizations of the bandwidth matrix which mimic univariate selectors. Progressively these restrictions were relaxed to develop more flexible procedures. A major obstacle for progress has been the intractability of the matrix analysis when treating higher order multivariate derivatives. With an alternative vectorization of these higher order derivatives, these mathematical intractabilities can be surmounted in an elegant and unified framework. In this paper we present some recent advances on the use of unconstrained bandwidth matrices for multivariate kernel estimation of the density and density derivatives.
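The difference between constrained and unconstrained bandwidth matrices is easiest to see in the bivariate case: a full matrix H lets the kernel tilt along a correlated point cloud, which no diagonal or scalar parametrization can do. A minimal sketch of the estimator with the 2x2 linear algebra done by hand (illustrative only; this is the basic estimator, not the authors' bandwidth selection machinery):

```python
import math

def kde2d(data, x, H):
    """Bivariate Gaussian kernel density estimate at x with a full
    (unconstrained) 2x2 bandwidth matrix H:
        f_hat(x) = (1/n) * sum_i K_H(x - X_i),
        K_H(u)   = (2*pi)^{-1} |H|^{-1/2} exp(-0.5 * u' H^{-1} u)."""
    (a, b), (c, d) = H
    det = a * d - b * c                       # |H|, must be > 0
    inv00, inv01 = d / det, -b / det          # H^{-1}, written out by hand
    inv10, inv11 = -c / det, a / det
    norm = 1.0 / (2.0 * math.pi * math.sqrt(det))
    total = 0.0
    for px, py in data:
        u, v = x[0] - px, x[1] - py
        q = u * (inv00 * u + inv01 * v) + v * (inv10 * u + inv11 * v)
        total += norm * math.exp(-0.5 * q)
    return total / len(data)

# A point cloud lying along the diagonal; the off-diagonal bandwidth term
# orients the kernel along it.
data = [(0.0, 0.1), (0.2, 0.1), (0.4, 0.5), (0.6, 0.5), (0.8, 0.9), (1.0, 0.9)]
H_full = ((0.10, 0.08), (0.08, 0.10))  # tilted kernel; det = 0.0036 > 0
print(kde2d(data, (0.5, 0.5), H_full) > kde2d(data, (0.5, -0.5), H_full))  # True
```

Replacing `H_full` with a diagonal matrix of the same trace gives a kernel that cannot follow the cloud's orientation, which is the restriction the unconstrained parametrization removes.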
Catherine Forbes (Monash University)
Non-Parametric Estimation of Forecast Distributions in Non-Gaussian State Space Models
This paper provides a methodology for the production of nonparametric estimates of forecast distributions, in a general non-Gaussian, nonlinear state space setting. The transition densities that define the evolution of the dynamic state process are represented in closed parametric form, with the conditional distribution of the measurement error variable estimated nonparametrically. The requisite recursive filtering and prediction distributions are computed as functions of the unknown conditional error. The method is illustrated in the context of several financial models with a particular focus on the production of sequential, real-time forecast distributions for volatility. This work is coauthored by Jason Ng, Catherine S. Forbes, Gael M. Martin and Brendan P.M. McCabe.
Alexandre Leblanc (University of Manitoba)
On the Boundary Effects of Bernstein Polynomial Estimators of Density and Distribution Functions
For density and distribution functions supported on [0,1], Bernstein polynomial estimators are known to have optimal Mean Integrated Squared Error (MISE) properties under the usual smoothness conditions on the function to be estimated. These estimators are also known to be well-behaved in terms of bias, as they exhibit no boundary bias. In this talk, we will discuss the fact that these estimators nevertheless do experience boundary effects. However, these boundary effects are of a different nature than what is seen, for example, with usual kernel estimators.
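For readers unfamiliar with these estimators, the Bernstein polynomial estimator of a distribution function on [0,1] simply smooths the empirical CDF against binomial weights. A minimal sketch (the degree m is a smoothing parameter; this is the basic estimator only, not the authors' boundary analysis):

```python
import math

def ecdf(sample, t):
    """Empirical CDF of the sample at t."""
    return sum(x <= t for x in sample) / len(sample)

def bernstein_cdf(sample, x, m):
    """Degree-m Bernstein polynomial estimator of a CDF on [0, 1]:
    F_hat(x) = sum_k F_n(k/m) * C(m, k) * x^k * (1 - x)^(m - k)."""
    return sum(ecdf(sample, k / m) * math.comb(m, k)
               * x ** k * (1 - x) ** (m - k)
               for k in range(m + 1))

sample = [0.1, 0.2, 0.4, 0.5, 0.7, 0.9]
# The estimator interpolates the empirical CDF at the endpoints exactly,
# which is why it has no boundary *bias* there:
print(bernstein_cdf(sample, 0.0, 20))  # 0.0
print(bernstein_cdf(sample, 1.0, 20))  # 1.0
```

The endpoint interpolation shown above is precisely what makes the boundary effects discussed in the talk subtler than the familiar boundary bias of kernel estimators.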
Leming Qu (Boise State University)
Copula density estimation by wavelet domain penalized likelihood with linear equality constraints
A copula density is the joint probability density function (PDF) of a random vector with uniform marginals. An approach to bivariate copula density estimation is introduced that is based on maximum penalized likelihood estimation (MPLE) with the penalty term being the L1 norm of the density's wavelet coefficients. The marginal unity and symmetry constraints for the copula density are enforced by linear equality constraints. The L1-MPLE subject to linear equality constraints is solved by an iterative algorithm. A data-driven selection of the regularization parameter is discussed. Simulation and a real data application show the effectiveness of the proposed approach.

10:30–11:30am 
Plenary Talk 
Ballroom 
Peter Muller (M.D. Anderson Cancer Center)
Bayesian Clustering with Regression
We propose a model for covariate-dependent clustering, i.e., we develop a probability model for random partitions that is indexed by covariates. The motivating application is inference for a clinical trial. As part of the desired inference we wish to define clusters of patients. Defining a prior probability model for cluster memberships should include a regression on patient baseline covariates. We build on product partition models (PPM). We define an extension of the PPM to include the desired regression. This is achieved by including in the cohesion function a new factor that increases the probability that experimental units with similar covariates are included in the same cluster.

1:00–2:30pm 
Parallel Sessions 
Ballroom 
Ranked Set Sampling (Invited)
Johan Lim (Seoul National University)
A kernel density estimator for ranked set samples
In this paper, we study a kernel density estimator for ranked set samples. We derive the asymptotic bias and variance of the estimator and find the optimal bandwidth that minimizes the integrated mean squared error (IMSE). We propose a leave-one-out cross-validation procedure to find the bandwidth in practice. We numerically investigate the performance of the proposed kernel estimator and further extend the proposed methodology to estimate a symmetric density. Finally, our method is applied to estimating the density of tree data published in the previous literature. This work is co-authored by Johan Lim, Min Chen and Sangun Park.
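As a simple illustration of leave-one-out bandwidth selection (shown here for an ordinary random sample; the ranked-set structure studied in the talk is not modeled in this sketch), candidate bandwidths can be scored by the leave-one-out log-likelihood:

```python
import numpy as np

def gauss_kde(x, data, h):
    """Gaussian kernel density estimate at a point x."""
    u = (x - np.asarray(data, float)) / h
    return float(np.mean(np.exp(-0.5 * u * u)) / (h * np.sqrt(2 * np.pi)))

def loo_cv_score(data, h):
    """Leave-one-out log-likelihood of bandwidth h: each point is
    scored under the density fitted to the remaining points."""
    data = np.asarray(data, float)
    return sum(np.log(gauss_kde(data[i], np.delete(data, i), h))
               for i in range(len(data)))

def select_bandwidth(data, candidates):
    """Pick the candidate bandwidth with the highest LOO score."""
    return max(candidates, key=lambda h: loo_cv_score(data, h))
```

A bandwidth that is far too small is penalized by isolated points, and one that is far too large is penalized everywhere, so the score peaks at an intermediate value.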
Kaushik Ghosh (University of Nevada, Las Vegas)
A unified approach to variations of ranked set sampling
In this talk, we develop a general theory of inference using data collected from different variations of ranked set sampling. Such variations include balanced and unbalanced ranked set sampling, balanced and unbalanced k-tuple ranked set sampling, nomination sampling, simple random sampling, as well as a combination of them. We provide methods of estimating the underlying distribution function as well as its functionals and establish the asymptotic properties of the resulting estimators. The results so obtained can be used to develop nonparametric procedures for one- and two-sample problems. We also investigate small-sample properties of these estimators and conclude with an application to a real-life example.
Xinlei Wang (Southern Methodist University)
Isotonized Estimators for Judgment Post-Stratification Samples
Judgment post-stratification (JPS) is a data collection method introduced by MacEachern, Stasny and Wolfe (2004), based on ideas similar to those in ranked set sampling. In this research, for JPS data, we propose isotonized estimators of the mean and cumulative distribution function (CDF) of a population of interest, which exploit the fact that the distributions of the judgment post-strata are often stochastically ordered. Further, for JPS data with small sample sizes, we deal with the problem of empty cells and propose modified isotonized estimators of the CDF. All these new estimators are examined by simulation studies and illustrated with data examples.

Pfahl 140 
Ranking Procedures (Invited)
Carlos Guestrin (Carnegie Mellon University)
Riffled Independence for Ranked Data
Representing distributions over permutations can be a daunting task because the number of permutations of n objects scales factorially in n. One recent approach to reducing storage complexity is to exploit probabilistic independence, but, as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. We identify a novel class of independence structures, called riffled independence, encompassing a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. In riffled independence, one draws two permutations independently, then performs the riffle shuffle, common in card games, to combine the two permutations to form a single permutation. Within the context of ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. In this talk, we provide a formal introduction to riffled independence and present algorithms for using riffled independence within the Fourier-theoretic frameworks which have been explored by a number of recent papers. Additionally, we propose an automated method for discovering sets of items which are riffle independent from a training set of rankings. We show that our clustering-like algorithms can be used to discover meaningful latent coalitions from real preference ranking datasets and to learn the structure of hierarchically decomposable models based on riffled independence. This talk is joint work with Jonathan Huang.
Paul Kidwell (Lawrence Livermore National Laboratory)
A kernel density estimate for the probabilities of rankings with ties and missing items
Ranking data are frequently encountered but are not easily modeled due to the issues of ties and missing data. Previous modeling efforts have established nonparametric kernel estimation as an effective tool for modeling rankings. A discrete analogue of the triangular kernel is developed which, through its combinatoric and statistical properties, allows the nonparametric approach to be applied efficiently in the case of ties and extended to missing data. This approach readily extends to a scheme for visualizing ranking data that is intuitive, easy to use, and computationally efficient.
Guy Lebanon (Georgia Institute of Technology)
Visualizing Similarities between Search Engines using the Weighted Hoeffding Distance on Permutations
We explore the use of multidimensional scaling in visualizing relationships between different search engines, and between different search strategies employed by users. In the talk we will discuss the appropriateness of different metrics for this task and present some experimental results using some well-known search engines.

Pfahl 202 
Sparse Estimation (Contributed)
Bin Li (Louisiana State University)
Robust and Sparse Bridge Regression
It is known that when there are heavy-tailed errors or outliers in the response, least squares methods may fail to produce a reliable estimator. In this paper, we propose a generalized Huber criterion which is highly flexible and robust to large errors. We apply the new criterion to the bridge regression family, yielding Robust and Sparse Bridge Regression (RSBR). Obtaining the RSBR solution, however, requires solving a nonconvex minimization problem, which is a computational challenge. Building on recent advances in difference-of-convex programming, the coordinate descent algorithm and local linear approximation, we provide an efficient computational algorithm that attempts to solve this nonconvex problem. Numerical examples show that the proposed RSBR algorithm performs well and is suitable for large-scale problems.
Philippe Rigollet (Princeton University)
Optimal rates of sparse estimation and universal aggregation
A new procedure called "Exponential Screening" (ES) is developed and proved to satisfy a set of optimal sparsity oracle inequalities for Gaussian regression. These oracle inequalities not only entail adaptation to sparsity but also show that ES solves simultaneously and optimally all the aggregation problems previously studied. Even though the procedure is simple, its implementation is not straightforward; it can, however, be approximated using the Metropolis algorithm, which results in a stochastic greedy algorithm that performs surprisingly well in a simulated sparse recovery problem.
Adam Rothman (University of Michigan)
Sparse estimation of a multivariate regression coefficient matrix
We propose a procedure for constructing a sparse estimator of a multivariate regression coefficient matrix that accounts for correlation of the response variables. This method, which we call multivariate regression with covariance estimation (MRCE), involves penalized likelihood with simultaneous estimation of the regression coefficients and the covariance structure. An efficient optimization algorithm and a fast approximation are developed for computing MRCE. Using simulation studies, we show that the proposed method outperforms relevant competitors when the responses are highly correlated. We also apply the new method to a finance example on predicting asset returns.
Jeffrey Simonoff (New York University)
RE-EM Trees: A New Data Mining Approach for Longitudinal Data
Longitudinal data refer to the situation where repeated observations are available for each sampled individual. Methodologies that take this structure into account allow for systematic differences between individuals that are not related to covariates. A standard methodology in the statistics literature for this type of data is the random effects model, where these differences between individuals are represented by so-called "effects" that are estimated from the data. This paper presents a methodology that combines the flexibility of tree-based estimation methods with the structure of random effects models for longitudinal data. We apply the resulting estimation method, called the RE-EM tree, to pricing in online transactions, showing that the RE-EM tree provides improved predictive power compared to linear models with random effects and regression trees without random effects. We also perform extensive simulation experiments to show that the estimator improves predictive performance relative to regression trees without random effects and is comparable or superior to linear models with random effects in more general situations, particularly for larger sample sizes. This is joint work with Rebecca J. Sela.

2:45–3:45pm 
Plenary Talk 
Ballroom 
David Banks (Duke University)
How We Got Here – The Rise of Data Mining
Modern data mining is the child of statistics and computer science, with database management serving as the midwife. From the statistical side, much of the initial motivation derived from the philosophy of nonparametrics. From the computer science side, much of the impetus came from the interest in artificial intelligence. This talk reviews the interactions between these perspectives, describing the key developments that shaped the course of this emerging field.

4:00–5:30pm 
Parallel Sessions 
Ballroom 
Nonparametric Bayes Methods (Invited)
Purushottam Laud (Medical College of Wisconsin)
A Dirichlet Process Mixture Model Allowing for Mode of Inheritance Uncertainty in Genetic Association Studies
A desirable model for use in genetic association studies simultaneously considers the effect of all genetic markers and covariates. This invariably requires considering a large number of genetic markers, most of which are unrelated to the phenotype. Moreover, at each marker, the model should allow a variety of modes of inheritance: namely, additive, dominant, recessive, or overdominant effects. MacLehose and Dunson (2009) have described a flexible multiple shrinkage approach to high-dimensional model building via Bayesian nonparametric priors. The use of these priors facilitates data-driven shrinkage to a random number of random prior locations. Adapting such techniques, we develop Bayesian semiparametric shrinkage priors at two levels that allow data-driven shrinkage towards the various inheritance modes and, within each mode, shrinkage towards a random number of random effect sizes. The proposed method offers a natural way of incorporating into the inference the uncertainty in the mode of inheritance at each marker. We illustrate the proposed method on simulated data based on the International HapMap Project.
Steven MacEachern (The Ohio State University)
Regularization and casespecific parameters
Statisticians have long used case-specific parameters as a device to remove outlying and influential cases from an analysis. Decisions on inclusion of the parameters have traditionally been made on the basis of the size of the residual. The rise of regularization methods allows us to approach case-specific analysis in a different fashion. To exploit the power of regularization, we augment the "natural" covariates in a problem with an additional indicator for each case in the data set. We attach a penalty term for these case-specific indicators which is designed to produce a desired effect. For regression methods with squared error loss, an L1 penalty produces a regression which is robust to outliers and high leverage cases; for quantile regression methods, an L2 penalty decreases the variance of the fit enough to overcome an increase in bias. The paradigm thus allows us to robustify procedures which lack robustness and to increase the efficiency of procedures which are robust. We provide a general framework for the inclusion of case-specific parameters in regularization problems, provide new insight into existing techniques (specifically, Huber's robust regression), and illustrate the benefits of the new methodology. This is joint work with Yoonkyung Lee and Yoonsuh Jung.
Fernando Quintana (Pontificia Universidad Católica de Chile)
Bayesian Nonparametric Longitudinal Data Analysis with Embedded Autoregressive Structure: Application to Hormone Data
We develop a novel Dirichlet Process Mixture model for irregular longitudinal data. The model mixes on the two parameters of the traditional Ornstein–Uhlenbeck process with exponential covariance function and thus allows for the possibility of multiple groups with distinct autoregressive covariance structure. We illustrate the use of the model to track hormone curve data through the menopausal transition, and we also test the model on simulated data to check its performance in estimating mean functions and a variety of covariance structures.

Pfahl 140 
Statistical Learning (Invited)
Ejaz Ahmed (University of Windsor)
Absolute Penalty and Shrinkage Estimation in Partially Linear Models
In this talk we address the problem of estimating a vector of regression parameters in a partially linear model. Our main objective is to provide natural adaptive estimators that significantly improve upon the classical procedures in the situation where some of the predictors are nuisance variables that may or may not affect the association between the response and the main predictors. In the context of two competing regression models (full and submodels), we consider a shrinkage estimation strategy. The shrinkage estimators are shown to have higher efficiency than the classical estimators for a wide class of models. We develop the properties of these estimators using the notion of asymptotic distributional risk. Further, we propose an absolute penalty-type estimator (APE) for the regression parameters, which is an extension of the LASSO method for linear models. The relative dominance picture of the estimators is established. Monte Carlo simulation experiments are conducted, with the nonparametric component estimated by kernel smoothing and B-splines. Further, the performance of each procedure is evaluated in terms of simulated mean squared error. The comparison reveals that the shrinkage strategy performs better than the APE (LASSO) strategy when, and only when, there are many nuisance variables in the model. We conclude this talk by applying the suggested estimation strategies to a real data set, which illustrates the usefulness of the procedures in practice. This is joint work with K. Doksum and E. Raheem.
Liza Levina (University of Michigan)
Community extraction and network perturbations
Analysis of networks and in particular discovering communities within networks has been a focus of recent work in several fields, with applications ranging from citation and friendship networks to food webs and gene regulatory networks. Most of the existing community detection methods focus on partitioning the entire network into communities, with the expectation of many ties within communities and few ties between them. However, in a real network there are often nodes that do not belong to any of the communities, and forcing every node into a community can distort results. Here we propose a new framework that focuses on community extraction instead of partition, extracting one community at a time. We show that the new criterion performs well on simulated and real networks, and establish asymptotic consistency of our method under the block model assumption. In the second part of the talk, I will briefly describe a method for assessing the quality of community detection by its robustness to random network perturbations. The first part of the talk is joint work with Ji Zhu and Yunpeng Zhao (Statistics, University of Michigan); the second part is joint work with Mark Newman and Brian Karrer (Physics, University of Michigan).
Ming Yuan (Georgia Institute of Technology)
Sparse Regularization for High Dimensional Additive Models
We study the behavior of l1-type regularization for high-dimensional additive models. Our results suggest remarkable similarities and differences between linear regression and additive models in high-dimensional settings. In particular, our analysis indicates that, unlike in linear regression, l1 regularization does not yield optimal estimation for additive models of high dimensionality. This surprising observation prompts us to introduce a new regularization technique that can be shown to be optimal in the minimax sense.

Pfahl 202 
Robust Statistics (Contributed)
Richard Charnigo (University of Kentucky)
Nonparametric Derivative Estimation and Posterior Probabilities for Nanoparticle Characteristics
The characterization of nanoparticles from surface wave scattering data is of great interest in applied engineering because of its potential to advance nanoparticle-based manufacturing concepts. Meanwhile, a recent development in methodology for the nonparametric estimation of a mean response function and its derivatives has provided a valuable tool for nanoparticle characterization: namely, a mechanism to identify the most plausible configuration for a collection of nanoparticles given the estimated derivatives of surface wave scattering profiles from those nanoparticles. In this talk, after briefly reviewing the preceding work, we propose an extension that additionally furnishes posterior probabilities for the various possible configurations of nanoparticles. An empirical study is included as a demonstration. This is collaborative work with Mathieu Francoeur, Patrick Kenkel, M. Pinar Menguc, Benjamin Hall, and Cidambi Srinivasan.
Juan A. CuestaAlbertos (Universidad de Cantabria)
Similarity of Distributions and Impartial Trimming
We say that two probabilities are similar at level c if they are contaminated versions (up to a fraction c) of the same common probability. In this talk we show how a data-driven trimming aimed at maximizing similarity between distributions can be used to decide whether two samples were obtained from two distributions which are similar at level c. The decision is based on the fact that the empirical distributions present an over- (under-)fitting effect, in the sense that trimming more (less) than the similarity level results in trimmed samples which are much closer to (farther from) each other than expected. We provide illustrative examples and give some asymptotic results to justify the use of this methodology in applications. This is joint work with Profs. P. Alvarez-Esteban, E. del Barrio and C. Matran from Universidad de Valladolid, Spain.
Kiheiji Nishida (University of Tsukuba)
On the variance-stabilizing multivariate nonparametric regression estimation
In linear regression under heteroscedastic variances, the Aitken estimator is employed to counter heteroscedasticity. Employing the same principle, we propose a multivariate Nadaraya–Watson (NW) regression estimator with a variance-stabilizing bandwidth matrix (VS bandwidth matrix) that minimizes the asymptotic MISE while maintaining asymptotic homoscedasticity. The proposed bandwidth matrix is diagonal, under the assumption that the sphering approach is available, and is defined by global and local parameters. NW regression estimation based on the VS bandwidth matrix does not produce discontinuity points unless the density of X is sparse. This is one advantage over the MSE-minimizing bandwidth matrix.
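For background, the baseline multivariate NW estimator with a diagonal bandwidth matrix can be sketched as follows; the Gaussian product kernel and the bandwidth values are illustrative assumptions, not the VS choice developed in the talk.

```python
import numpy as np

def nadaraya_watson(x0, X, y, h):
    """Multivariate Nadaraya-Watson estimate at x0 with a Gaussian
    product kernel and diagonal bandwidth matrix diag(h)."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    u = (X - np.asarray(x0, float)) / np.asarray(h, float)  # (n, d) scaled
    w = np.exp(-0.5 * np.sum(u * u, axis=1))  # product Gaussian weights
    return float(np.sum(w * y) / np.sum(w))   # locally weighted average
```

The estimate is a weighted average of the responses, with weights decaying with the bandwidth-scaled distance from x0 in each coordinate.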
Michal Pesta (Charles University in Prague, Czech Republic)
Robustified total least squares and bootstrapping with application in calibration
The solution to the errors-in-variables (EIV) problem computed through total least squares (TLS) or robustified TLS is highly nonlinear. Because of this, many statistical procedures for constructing confidence intervals and testing hypotheses cannot be applied. One possible solution to this dilemma is bootstrapping. A justification for the use of the nonparametric bootstrap technique is given. On the other hand, the classical residual bootstrap can fail. A proper residual bootstrap procedure is provided and its correctness proved. The results are illustrated through a simulation study. An application of this approach to calibration data is presented.

May 22 (Saturday)
8:30–10:00am 
Parallel Sessions 
Ballroom 
Machine Learning (Invited)
Sayan Mukherjee (Duke University)
Geometry and Topology in Inference
We use two problems to illustrate the utility of geometry and topology in statistical inference: supervised dimension reduction (SDR), and inference of (hyper)graph models. We start with a "tale of two manifolds," focusing on the SDR problem. We first formulate the problem with respect to the inference of a geometric property of the data, the gradient of the regression function with respect to the manifold that supports the marginal distribution. We provide an estimation algorithm, prove consistency, and explain why the gradient is salient for dimension reduction. We then reformulate SDR in a probabilistic framework and propose a Bayesian model, a mixture of inverse regressions. In this modeling framework the Grassmann manifold plays a prominent role. The second part of the talk develops a parameterization of hypergraphs based on the geometry of points in d dimensions. Informative prior distributions on hypergraphs are induced through this parameterization by priors on point configurations via spatial processes. The approach combines tools from computational geometry and topology with spatial processes and offers greater control on the distribution of graph features than Erdős–Rényi random graphs.
Sijian Wang (University of Wisconsin)
Regularized REML for Estimation and Selection of Fixed and Random Effects in Linear Mixed-Effects Models
The linear mixed effects model (LMM) is widely used in the analysis of clustered or longitudinal data. In the practice of LMM, inference on the structure of the random effects component is of great importance not only to yield proper interpretation of subject-specific effects but also to draw valid statistical conclusions. This task of inference becomes significantly challenging when a large number of fixed effects and random effects are involved in the analysis. The difficulty of variable selection arises from the need of simultaneously regularizing both mean model and covariance structures, with possible parameter constraints between the two. In this paper, we propose a novel method of regularized restricted maximum likelihood to select fixed and random effects simultaneously in the LMM. The Cholesky decomposition is invoked to ensure the positive-definiteness of the selected covariance matrix of random effects, and selected random effects are invariant with respect to the ordering of predictors appearing in the model. We develop a new algorithm that solves the related optimization problem effectively, in which the computational load turns out to be comparable with that of the Newton–Raphson algorithm for MLE or REML in the LMM. We also investigate large sample properties for the proposed estimation, including the oracle property. Both simulation studies and data analysis are included for illustration.
Jian Zhang (Purdue University)
LargeScale Learning by Data Compression
An important challenge in machine learning is how to efficiently learn from massive training data sets, especially with limited storage and computing capability. In this talk we introduce an efficient learning method called "compressed classification", which aims to compress observations into a small number of pseudo-examples before classification. By analyzing the convergence rate of the risk, we show the classifiers learned from compressed data can closely approximate the non-compressed classifiers by effectively reducing the noise variance. We also present a hierarchical local grouping algorithm to iteratively split observations into local groups, which leads to a faster compression process than the single-layer counterpart. Our experiments with simulated and real datasets show that the proposed local-grouping-based compression method can outperform several other compression methods, and achieve competitive performance with the non-compressed baseline using much less learning time for both small-scale and large-scale classification problems.

Pfahl 140 
Data Depth (Invited)
Xin Dang (University of Mississippi)
Kernelized Spatial Depth on Outlier Detection and Graph Ranking
Statistical depth functions provide center-outward ordering of points with respect to a distribution or a data set in high dimensions. Of the various depth notions, the spatial depth is appealing because of its computational efficiency. However, it tends to provide circular contours and fails to capture well the underlying probabilistic geometry outside of the family of spherically symmetric distributions. We propose a novel depth, the kernelized spatial depth (KSD), which generalizes the spatial depth via positive definite kernels. By choosing a proper kernel, the KSD captures the local structure of data where the spatial depth fails. Based on KSD, a simple outlier detector is proposed, by which an observation with a depth value less than a threshold is declared an outlier. Upper bounds on the swamping effect (false alarm probability) are derived and used to determine the threshold. The KSD outlier detector demonstrates a competitive performance on simulated data and data sets from real applications. We also extend KSD to graph data, where pairwise relationships of objects are given and represented by edges. Several graph kernels, including a newly proposed one, the complement Laplacian kernel, are considered for ranking the "centrality" of graph nodes. An application of graph KSD to gene data will also be briefly discussed.
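For reference, the classical (non-kernelized) spatial depth that KSD generalizes is easy to compute directly; a minimal sketch follows (the kernelized version would replace the Euclidean geometry below with a kernel-induced one):

```python
import numpy as np

def spatial_depth(x, data):
    """Classical spatial depth: one minus the norm of the average unit
    vector pointing from x toward each data point."""
    diffs = np.asarray(data, float) - np.asarray(x, float)
    norms = np.linalg.norm(diffs, axis=1)
    keep = norms > 0                        # drop exact coincidences with x
    units = diffs[keep] / norms[keep][:, None]
    return float(1.0 - np.linalg.norm(units.mean(axis=0)))
```

Deep points see the surrounding unit vectors cancel (depth near 1); an outlier sees them all point the same way (depth near 0), which is what a threshold-based detector exploits.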
Regina Liu (Rutgers University)
DD-Classifier: A New Nonparametric Classification Procedure
Most existing classification algorithms are developed by assuming either certain parametric distributions for the data or certain forms of separating surfaces. Either assumption can greatly limit the applicability of the algorithm. We introduce a novel nonparametric classification algorithm using the so-called DD-plot. This algorithm is completely nonparametric, requiring no prior knowledge of the underlying distributions or of the form of the separating surface. Thus it can be applied to a wide range of classification problems. The algorithm can be easily implemented and its classification outcome can be clearly visualized on a two-dimensional plot regardless of the dimension of the data. The asymptotic properties of the proposed classifier and its misclassification rate are studied. The DD-classifier is shown to be asymptotically equivalent to the Bayes rule under suitable conditions. The performance of the DD-classifier is also examined using simulated and real data sets. Overall, the DD-classifier performs well across a broad range of settings, and compares favorably with most existing nonparametric classifiers. This is joint work with Juan Cuesta-Albertos (Universidad de Cantabria, Spain) and Jun Li (UC Riverside).
Robert Serfling (University of Texas at Dallas)
Robust, Affine Invariant, Computationally Easy Nonparametric Multivariate Outlyingness Functions
Identification of possible outliers in multivariate data is of paramount importance. We desire methods which are robust, computationally easy, and affine invariant. Versions based on the Mahalanobis distance meet these criteria but impose ellipsoidal contours. The spatial and projection outlyingness functions avoid this constraint, but the former lacks full affine invariance and sufficient robustness, while the latter is computationally intensive. Can we develop outlyingness functions which retain the favorable properties of the Mahalanobis distance without being confined to ellipsoidal contours? We review multivariate outlyingness functions and introduce standardizations of multivariate data which produce affine invariance of outlyingness functions. A new "spatial trimming" method is introduced to robustify the spatial approach. A notion of a strong invariant coordinate system functional is introduced to standardize finite projection pursuit vectors. With these methods, we construct new outlyingness functions that are robust, affine invariant, and computationally competitive with robust Mahalanobis distance outlyingness.

Pfahl 202 
Applications (Contributed)
John Cartmell (InterDigital LLC)
Methods to Pre-Process Training Data for the K-Nearest Neighbors Algorithm
The basic K-nearest neighbor classification algorithm searches through the training samples, computing the distance from each training sample to the sample to be classified. Once the distances are computed, the class of the majority of the k closest points is assigned as the classification of the test sample. The training phase of the algorithm is extremely efficient, as no preprocessing of the training data is required. However, the phase where test samples are classified is very costly, since for every sample to be classified the entire training set must be traversed. In this paper, we explore three methods that reduce the number of training samples to be traversed during the classification process. Each method reduces the number of samples in each class by averaging the training samples in that class using a different technique. Therefore, instead of comparing a test sample against all of the training samples, a test sample is only compared against the reduced set of training samples. Once these methods are described, they are used, along with other classification algorithms, on real data sets to demonstrate their effectiveness from both fidelity and performance standpoints.
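The reduction idea can be sketched with the simplest conceivable averaging scheme, one mean prototype per class; this is only an illustration of the general strategy, not a reproduction of the paper's three specific methods:

```python
import numpy as np

def class_mean_prototypes(X, labels):
    """Collapse each class of the training set to the mean of its samples."""
    X = np.asarray(X, float)
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    protos = np.array([X[labels == c].mean(axis=0) for c in classes])
    return protos, classes

def classify(x, protos, classes):
    """Nearest-prototype rule: compare the test sample against the
    reduced set instead of the full training set."""
    d = np.linalg.norm(protos - np.asarray(x, float), axis=1)
    return classes[int(np.argmin(d))]
```

Classification cost drops from one distance per training sample to one distance per class, at the price of fidelity when classes are not well summarized by a single center.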
Pang Du (Virginia Tech)
Cure Rate Model with Spline Estimated Components
This study proposes a nonparametric estimation procedure for cure rate data based on a penalized likelihood method. In some survival analyses in medical studies, there are long-term survivors who can be considered permanently cured. The goals in these studies are to estimate the cure probability of the whole population and the hazard rate of the non-cured subpopulation. When covariates are present, as often happens in practice, understanding covariate effects on the cure probability and hazard rate is of equal importance. The existing methods are limited to parametric and semiparametric models. We propose a two-component mixture cure rate model with nonparametric forms for both the cure probability and the hazard rate function. Identifiability of the model is guaranteed by an additive assumption on the hazard rate. Estimation is carried out by an EM algorithm maximizing a penalized likelihood. For inferential purposes, we apply the Louis formula to obtain pointwise confidence intervals for the cure probability and hazard rate. We then evaluate the proposed method by extensive simulations. An application to a melanoma study demonstrates the method.
Polina Khudyakov (Technion – Israel Institute of Technology)
Frailty model of customer patience in call centers
Call centers collect a huge amount of data, which provides a great opportunity for companies to use this information to analyze customer needs, desires, and intentions. This study is dedicated to the analysis of customer patience, defined as the ability to endure waiting for service. This human trait plays an important role in the call center mechanism. Every call can be considered an opportunity to keep or lose a customer; the outcome depends on the customer's satisfaction and affects the customer's future choices. Assessing customer patience is complicated because in most cases customers receive the required service before they lose their patience. To estimate the distribution of patience, we consider all calls with nonzero service time as censored observations. Different methods for estimating customer patience already exist in the literature; some use the Weibull distribution (Palm, 1953) or the standard Kaplan-Meier product-limit estimator (Brown et al., 2005, JASA, 36-50). Our work is the first attempt to apply frailty models in customer patience analysis, taking into account the possible dependency between calls of the same customer and estimating this dependency. We first extended the estimation technique of Gorfine et al. (2006, Biometrika, 735-741) to the case of a different unspecified baseline hazard function for each call, since customer behavior may change as he or she becomes more experienced with the call center's services. We then provided a new class of test statistics for testing the equality of the baseline hazard functions. The asymptotic distribution of the test statistics was investigated theoretically under the null and certain local alternatives. We also provided consistent variance estimators.
The finite-sample properties of the test statistics were studied in an extensive simulation study, which verified the control of the Type I error and our proposed sample size calculations. The utility of the proposed estimation technique and the new test statistics is illustrated by the analysis of call center data from an Israeli commercial company that processes up to 100,000 calls a day. This is joint work with Prof. M. Gorfine and Prof. P. Feigin.
Padma Sastry
A Method and Application to Measurement of Service Quality: A Multidimensional Approach
Evaluation of service quality in a regulated industry is necessary for effective policymaking and fair markets. Service quality, while not entirely in the eyes of the beholder, varies in definition depending on the stakeholder: service provider, customer or regulator. We provide a framework for considering service quality of a regulated industry from multiple perspectives and operationalize the concepts by developing a method for incorporating differing stakeholders' interests. We define measures of relative performance, orientation and cohesion and use them to analyze industrywide trends over time. We use these measures to study the effect of the 1996 Telecom Act.

10:30-11:30am 
Plenary Talk 
Ballroom 
Grace Wahba (University of Wisconsin-Madison)
The LASSO-Patternsearch Algorithm: Multivariate Bernoulli Patterns of Inputs and Outputs
We describe the LASSO-Patternsearch algorithm, a two- or three-step procedure whose core applies a LASSO-penalized likelihood to univariate Bernoulli response data Y given a very large attribute vector X from a sparse multivariate Bernoulli distribution. Sparsity here means that the conditional distribution of Y given X is assumed to have very few terms, but some may be of higher order (patterns). An algorithm that can handle a very large number (two million) of candidate patterns in a global optimization scheme is given, and it is argued that global methods have a certain advantage over greedy methods in the variable selection problem. Applications to demographic and genetic data are described. Ongoing work on correlated multivariate Bernoulli outcomes, including tuning, is briefly described.
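A minimal sketch of the core step, assuming binary attributes and candidate patterns formed as products of attributes. This stands in for the actual LASSO-Patternsearch implementation (which handles millions of patterns in a global optimization scheme); an off-the-shelf L1-penalized logistic regression is used here as the solver, and the data and pattern set are toy assumptions:

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(300, 6)).astype(float)   # binary attribute vectors
# True model: Y depends on the main effect X0 and the second-order pattern X2*X3
logit = -1.0 + 2.0 * X[:, 0] + 2.5 * X[:, 2] * X[:, 3]
y = (rng.random(300) < 1 / (1 + np.exp(-logit))).astype(int)

# Candidate patterns: all main effects plus all pairwise products
cols = list(range(6))
patterns = [(j,) for j in cols] + list(combinations(cols, 2))
Z = np.column_stack([np.prod(X[:, list(p)], axis=1) for p in patterns])

# L1-penalized logistic likelihood selects a sparse subset of patterns
fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(Z, y)
selected = [patterns[j] for j in np.flatnonzero(np.abs(fit.coef_[0]) > 1e-6)]
print(selected)
```

With strong effects and moderate penalty, the fit typically recovers a short list containing the true main effect and the true second-order pattern.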

1:00-2:30pm 
Parallel Sessions 
Pfahl 140 
Applications (Invited)
Thomas Bishop (The Ohio State University / NCACI)
Activity at the Nationwide Center for Advanced Customer Insights (NCACI)
The Nationwide Mutual Insurance Company and The Ohio State University have established the Nationwide Center for Advanced Customer Insights. The objective of the center is to conduct applied research to develop customer insights using state-of-the-art predictive modeling, data mining and advanced analytical techniques that improve Nationwide's understanding of customer behavior and consumer purchasing patterns. The center is fully funded by Nationwide and managed by Ohio State. It employs best-in-class OSU faculty, staff and graduate students from across the University, including faculty and students from the Departments of Marketing, Statistics, Psychology, Economics, Computer Science, and Industrial and Systems Engineering. The center manages applied business research projects involving the application of existing theory and methodologies to solve specific marketing, business and operational problems. It also manages seminal business research projects requiring state-of-the-art research by OSU faculty and graduate students to develop new analytical methodologies. The center offers OSU faculty and graduate students research opportunities and direct access to Nationwide customer and marketing data. Nationwide has agreed to grant OSU researchers the right to publish the research results, subject to coding the data to protect confidential information. Our faculty and students work directly with Nationwide executives and staff to solve marketing and business problems important to Nationwide. This presentation will address the genesis of the Center, the strategy for integrating academic, graduate student and corporate research interests in applied research, and several examples of applied research projects that have been completed by the Center.
Yiem Sunbhanich (CACI/Nationwide)
Key Elements for Effective Execution of Applied Statistics in a Corporate Environment
The Nationwide Mutual Insurance Company and The Ohio State University have established the Nationwide Center for Advanced Customer Insights. The objective of the center is to conduct seminal and applied business research and develop customer insights using state-of-the-art predictive modeling, data mining and advanced analytical techniques that improve Nationwide's understanding of customer behavior and consumer purchasing patterns. Yiem Sunbhanich is the Executive-in-Residence at the Center. He will share his perspectives and experience on how to effectively transform information into actionable insights in a corporate environment. A business case on proactive contact will be presented, together with the key elements in making the execution of this proactive contact program successful. Examples of these key elements are problem formulation, a scalable insight production process, effective communication, and incentive alignment.
Joseph Verducci (The Ohio State University)
Mining for Natural Experiments
Scientific knowledge has been amassed mostly through scientific experiments. A standard format for these is to create an experimental design, controlling the levels of key variables X to infer a response surface E[Y] = f(X), keeping the values of all potentially confounding variables Z constant throughout the experiment. The resulting knowledge about f(X) generalizes to all contexts with the same value of Z. In data mining, we typically try to find a relationship E[Y] = g(X,Z) that can be cross-validated or validated on a particular external dataset of interest. This severely limits the generalizability of the findings and leads to underestimation of the applicable false discovery rate. This talk suggests a strategy for finding "nuggets" of (subsample, variable subset) pairs that should have greater generalizability than the currently popular methods.

Pfahl 202 
Model Selection (Invited)
Chong Gu (Purdue University)
Nonparametric regression with cross-classified responses
For the analysis of contingency tables, log-linear models are widely used to explore associations among the marginals. In this talk, we present modeling tools to disaggregate contingency tables along an x-axis and estimate the probabilities of cross-classified y-variables as smooth functions of covariates. Possible correlations among longitudinal or clustered data can be entertained via random effects. A suite of R functions is made available, incorporating a cohort of techniques including cross-validation, Kullback-Leibler projection, and Bayesian confidence intervals for odds ratios.
Yuhong Yang (University of Minnesota)
Parametric or Nonparametric? An Index for Model Selection
Parametric and nonparametric models are convenient mathematical tools to describe characteristics of data with different degrees of simplification. When a model is to be selected from a number of candidates, not surprisingly, differences occur when the data generating process is assumed to be parametric or nonparametric. In this talk, in a regression context, we will consider the question of whether and how we can distinguish between parametric and nonparametric situations, and discuss the feasibility of adaptive estimation to handle both parametric and nonparametric scenarios optimally. The presentation is based on joint work with Wei Liu.
Ji Zhu (University of Michigan)
Penalized regression methods for ranking variables by effect size, with applications to genetic mapping studies
Multiple regression can be used to rank predictor variables according to their "unique" association with a response variable, that is, the association that is not explained by other measured predictors. Such a ranking is useful in applications such as genetic mapping studies, where one goal is to clarify the relative importance of several correlated genetic variants with weak effects. The use of classical multiple regression to rank the predictors according to their unique associations with the response is limited by difficulties due to collinearities among the predictors. Here we show that regularized regression can improve the accuracy of this ranking, with the greatest improvement occurring when the pairwise correlations among the predictor variables are strong and heterogeneous. Considering a large number of examples, we found that ridge regression generally outperforms regularization using the L1 norm for variable ranking, regardless of whether the true effects are sparse. In contrast, for predictive performance, L1 regularization performs better for sparse models and ridge regression performs better for non-sparse models. Our findings suggest that the prediction and variable ranking problems both benefit from regularization, but that different regularization approaches tend to perform best in the two settings. This is joint work with Nam-Hee Choi and Kerby Shedden.

Pfahl 302 
Semiparametrics (Contributed)
Jinsong Chen (University of Virginia)
A generalized semiparametric singleindex mixed model
In generalized linear mixed models, the linear predictor may not be complex enough to capture the underlying relationship between the response and its associated covariates. We use a single-index model to generalize this model, letting the linear combination of covariates enter the model via a nonparametric link function. We call this model a generalized semiparametric single-index mixed model. The marginal likelihood is approximated using the Laplace method. A double-penalized quasi-likelihood approach is proposed for estimation. Asymptotic properties of the estimators are developed. We estimate variance components using marginal quasi-likelihood. A simulation and a study of the association between daily air pollutants and daily mortality in various counties of North Carolina are used to illustrate the models and the proposed estimation methodology. This is coauthored with Inyoung Kim (Virginia Tech) and George R. Terrell (Virginia Tech).
Bo Kai (College of Charleston)
New Estimation and Variable Selection Methods for Semiparametric Regression Models
In this work, we propose new estimation and variable selection procedures for the semiparametric varying-coefficient partially linear model. We first study quantile regression estimates for this model. To achieve good efficiency properties, we further develop a semiparametric composite quantile regression (semi-CQR) procedure. We establish the asymptotic normality of both the parametric and nonparametric estimates and show that they achieve the best convergence rates. Moreover, we show that the semi-CQR method is much more efficient than least-squares-based methods for many non-normal errors and only loses a little efficiency for normal errors. To achieve sparsity with high-dimensional covariates, we propose adaptive penalization methods for variable selection and prove that the methods possess the oracle property. Extensive Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed procedures. This is joint work with Runze Li and Hui Zou.
Ganna Leonenko (Swansea University)
Statistical Learning in Semiparametric Models of Remote Sensing: Empirical Divergence and Information Measures, Robust and Minimum Contrast Methods
Estimation of biophysical parameters from satellite data is one of the most challenging problems in remote sensing. We present statistical learning results for the radiative transfer model FLIGHT, which calculates the bidirectional reflectance distribution function (BRDF) using Monte Carlo simulation of photon transport and represents complex vegetation structures as well as angular geometry. For statistical learning in the semiparametric model, empirical divergence and information measures have been applied. We also investigate a class of robust statistics and minimum contrast estimates. We find that least-squares estimation does not work very well for nonlinear problems of the type investigated, and that estimation of biophysical parameters can be improved in some cases by up to 13%. This talk is based on joint work with S. Los and P. North.
Jelani Wiltshire (Florida State University)
A general class of test statistics to test for the effect of age of species on their extinction rate
Van Valen's Red Queen hypothesis states that, within a homogeneous taxonomic group, age is statistically independent of the rate of extinction. The case of the Red Queen hypothesis addressed here is when the homogeneous taxonomic group is a group of similar species. Since Van Valen's work, various statistical approaches have been used to address the relationship between taxon duration (age) and the rate of extinction. Some of the more recent approaches to this problem using Planktonic Foraminifera (Foram) extinction data include Weibull and exponential modeling (Parker and Arnold, 1997) and Cox proportional hazards modeling (Doran et al., 2004, 2006). I propose a general class of test statistics that can be used to test for the effect of age on extinction. These test statistics allow for a varying background rate of extinction and attempt to remove the effects of other covariates when assessing the effect of age on extinction. No model is assumed for the covariate effects; instead, I control for covariate effects by pairing or grouping together similar species. In my presentation I will apply the test statistics to the Foram data and to simulated data sets.

2:45-4:15pm 
Parallel Sessions 
Pfahl 140 
Climatic Applications (Invited)
Lasse Holmstrom (University of Oulu)
Scale space methods in climate research
Statistical scale space analysis aims to find features in the data that appear at different scales, or levels of resolution. Scale-dependent features are revealed by multiscale smoothing, the idea being that each smooth provides information about the underlying truth at a particular scale. We discuss a Bayesian scale space technique and its application to the study of temperature variation, both past and future. The analysis of past temperatures involves fossil-based reconstructions of the post-Ice Age climate in northern Fennoscandia, where features that appear at different time scales are of interest. Future temperatures, on the other hand, are computer climate model predictions, and we seek to establish patterns of warming that appear at different spatial scales.
Cari Kaufman (UC Berkeley)
Functional ANOVA Models for Comparing Sources of Variability in Climate Model Output
Functional analysis of variance (ANOVA) models partition a functional response according to the main effects and interactions of various factors. Motivated by the question of how to compare the sources of variability in climate models run under various conditions, we develop a general framework for functional ANOVA modeling from a Bayesian viewpoint, assigning Gaussian process prior distributions to each batch of functional effects. We discuss computationally efficient strategies for posterior sampling using Markov chain Monte Carlo algorithms, and we emphasize useful graphical summaries based on the posterior distribution of model-based analogues of the traditional ANOVA decompositions of variance. We present a case study using these methods to analyze data from the Prudence Project, a climate model intercomparison study providing ensembles of climate projections over Europe.
Tao Shi (The Ohio State University)
Statistical Modeling of AIRS Level 3 Quantization Data
The Atmospheric Infrared Sounder (AIRS) has been collecting temperatures, water vapor mass-mixing ratios, and cloud fractions at various atmospheric pressure levels. It generates 35-dimensional vectors at each 45-km ground footprint along each satellite path in its level-2 data. The level-3 quantization data (L3Q) summarize valid level-2 data in each 5-degree-by-5-degree latitude-longitude grid box during a time period by a set of representative vectors and their associated weights. The distinctive feature of this data set is that the observations are empirical distributions, whereas most statistical methods are developed for data sets whose observations lie in R^d; statistical inference for this type of data is an open problem. We start with the commonly used Mallows distance as a measure of distance between two distributions and build a mixture model on empirical distributions with each component being a Gaussian-type distribution. We further fit the model using Data Spectroscopy-type methods for the AIRS L3Q data. Finally, we will address statistical questions such as classification and prediction for the AIRS L3Q data. This is joint work with Dunke Zhou (OSU).
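For equal-size one-dimensional samples, the Mallows (p-Wasserstein) distance between two empirical distributions reduces to matching order statistics; a minimal sketch of that special case follows (the AIRS L3Q setting involves weighted multivariate representative vectors, which instead requires solving a transportation problem):

```python
import numpy as np

def mallows_distance(x, y, p=2):
    """Mallows (p-Wasserstein) distance between two one-dimensional empirical
    distributions with the same number of support points: the optimal coupling
    simply matches order statistics."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    assert x.shape == y.shape
    return (np.mean(np.abs(x - y) ** p)) ** (1 / p)

a = [0.0, 1.0, 2.0]
b = [1.0, 2.0, 3.0]            # same shape, shifted by 1
print(mallows_distance(a, b))  # a pure location shift gives distance 1.0
```

Because the distance is a metric on distributions rather than on points in R^d, a "Gaussian-type" mixture component in this setting is centered at a distribution, not at a vector.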

Pfahl 202 
Robust Methods (Invited)
Claudio Agostinelli (Ca' Foscari University)
Local Simplicial Depth
Data depth is a distribution-free statistical methodology for graphical and analytical investigation of data sets. The main applications are a center-outward ordering of multivariate observations, location estimators, and some graphical presentations (scale curve, DD-plot). By definition, depth functions provide a measure of centralness that is monotonically decreasing along any given ray from the deepest point. This implies that any depth function is unable to account for multimodality and mixture distributions. To overcome this problem we introduce the notion of local depth, which generalizes the concept of depth. Local depth evaluates the centrality of a point conditional on a bounded neighborhood. For example, the local version of simplicial depth is the ordinary simplicial depth conditional on random simplices whose volume is not greater than a prescribed threshold. These generalized depth functions are able to record local fluctuations of the density function and are very useful in mode detection, identification of the components of a mixture model, and the definition of a "nonparametric" distance for performing cluster analysis. We provide theoretical results on the behavior of local simplicial depth and illustrate the method with examples. Finally, we discuss the computational problems involved in the evaluation of local simplicial depth. This is joint work with M. Romanazzi.
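A small sketch in two dimensions, where simplices are triangles and an area threshold plays the role of the volume bound: ordinary simplicial depth counts the fraction of sample triangles containing a point, and the local version restricts the count to triangles whose area does not exceed a threshold tau (an illustrative exhaustive implementation, not the authors'):

```python
import numpy as np
from itertools import combinations

def tri_contains(p, a, b, c):
    """Sign test: is point p inside (or on the boundary of) triangle abc?"""
    def s(u, v, w):
        return (u[0]-w[0])*(v[1]-w[1]) - (v[0]-w[0])*(u[1]-w[1])
    d1, d2, d3 = s(p, a, b), s(p, b, c), s(p, c, a)
    neg = (d1 < 0) or (d2 < 0) or (d3 < 0)
    pos = (d1 > 0) or (d2 > 0) or (d3 > 0)
    return not (neg and pos)          # mixed signs => outside

def tri_area(a, b, c):
    return 0.5 * abs((b[0]-a[0])*(c[1]-a[1]) - (c[0]-a[0])*(b[1]-a[1]))

def local_simplicial_depth(p, X, tau=np.inf):
    """Fraction of sample triangles containing p, restricted to area <= tau."""
    hits = total = 0
    for i, j, k in combinations(range(len(X)), 3):
        if tri_area(X[i], X[j], X[k]) <= tau:
            total += 1
            hits += tri_contains(p, X[i], X[j], X[k])
    return hits / total if total else 0.0

# Unit-square corners: the center lies on the boundary of every sample triangle
X = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(local_simplicial_depth((0.5, 0.5), X))        # every triangle counts
print(local_simplicial_depth((0.5, 0.5), X, 0.1))   # no triangle is small enough
```

Conditioning on small simplices is what lets the local version react to modes of the density rather than only to the global center.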
Marianthi Markatou (Columbia University)
A closer look at estimators of variance of the generalization error of computer algorithms
We bring together methods from machine learning and statistics to study the problem of estimating the variance of the generalization error of computer algorithms. We study this problem in the simple context of predicting the sample mean as well as in the case of linear and kernel regression. We illustrate the role of the training and test sample size on the performance of the estimators and present a simulation study that exemplifies the characteristics of the derived variance estimators and of those existing in the literature.
Ruben Zamar (University of British Columbia)
Clustering using linear patterns
I will first describe a method called the linear grouping algorithm (LGA), which can be used to detect different linear structures in a data set. LGA combines ideas from principal components, clustering methods and resampling algorithms. I will show that LGA can detect several different linear relations at once, but can be affected by the presence of outliers in the data set. I will then present a robustification of LGA based on trimming. Finally, if time allows, I will present a partial likelihood extension of LGA that allows for flexible modelling of linear clusters with different scales.

Pfahl 302 
Dimension Reduction, Manifold Learning and Graphs (Contributed)
Yuexiao Dong (Temple University)
Nonlinear inverse dimension reduction methods
Many classical dimension reduction methods, especially those based on inverse conditional moments, require the predictors to have elliptical distributions, or at least to satisfy a linearity condition. Such conditions, however, are too strong for some applications. Li and Dong (2009) introduced the notion of the central solution space and used it to modify firstorder methods, such as sliced inverse regression, so that they no longer rely on these conditions. In this paper we generalize this idea to secondorder methods, such as sliced average variance estimator and directional regression. In doing so we demonstrate that the central solution space is a versatile framework: we can use it to modify essentially all inverse conditional moment based methods to relax the distributional assumption on the predictors. Simulation studies and an application show a substantial improvement of the modified methods over their classical counterparts.
Andrew Smith (University of Bristol)
Nonparametric regression on a graph
The 'Signal plus Noise' model for nonparametric regression can be extended to the case of observations taken at the vertices of a graph. This model includes many familiar regression problems. This talk discusses the use of the edges of a graph to measure roughness in penalized regression. Distance between estimate and observation is measured at every vertex in the L2 norm, and roughness is penalized on every edge in the L1 norm. Thus the ideas of totalvariation penalization can be extended to a graph. This presents computational challenges, so we present a new, fast algorithm and demonstrate its use with examples, including denoising of noisy images, a graphical approach that gives an improved estimate of the baseline in spectroscopic analysis, and regression of spatial data (UK house prices).
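The objective described above, a squared-error fit at every vertex plus an L1 roughness penalty on every edge, can be sketched as follows. The fast algorithm from the talk is not reproduced here; instead the nondifferentiable penalty is handled with a simple iteratively reweighted least-squares approximation (|t| is smoothed to t^2 / sqrt(t^2 + eps)), which is enough to show the piecewise-constant behavior of total-variation penalization on a graph:

```python
import numpy as np

def graph_tv_denoise(y, edges, lam=1.0, eps=1e-6, iters=50):
    """Approximately minimize sum_v (y_v - x_v)^2 + lam * sum_(u,v) |x_u - x_v|
    over a graph, via iteratively reweighted least squares."""
    n = len(y)
    y = np.asarray(y, float)
    x = y.copy()
    for _ in range(iters):
        # Weighted graph Laplacian with edge weights 1 / sqrt(diff^2 + eps)
        L = np.zeros((n, n))
        for u, v in edges:
            w = 1.0 / np.sqrt((x[u] - x[v]) ** 2 + eps)
            L[u, u] += w; L[v, v] += w
            L[u, v] -= w; L[v, u] -= w
        # Stationarity condition of the smoothed objective: (2I + lam*L) x = 2y
        x = np.linalg.solve(2 * np.eye(n) + lam * L, 2 * y)
    return x

# Chain graph (the classic 1-D case): piecewise-constant signal plus noise
rng = np.random.default_rng(3)
signal = np.array([0.0] * 10 + [4.0] * 10)
y = signal + 0.3 * rng.normal(size=20)
edges = [(i, i + 1) for i in range(19)]
xhat = graph_tv_denoise(y, edges, lam=1.0)
```

On a chain graph this recovers the usual fused-lasso behavior: flat within segments, with the jump preserved; the same code runs unchanged on the edge set of any graph, e.g. a pixel grid for image denoising.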
Minh Tang (Indiana University)
On the relationship between Laplacian eigenmaps and diffusion maps
Laplacian eigenmaps and diffusion maps are two popular techniques for manifold learning. Each of these techniques can be conceived as a technique that constructs a Euclidean configuration of points by graph embedding. If the graph is undirected, then the diffusion map turns out to be an anisotropic scaling of the Laplacian eigenmap.
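The claimed relationship can be checked numerically: for an undirected graph with weight matrix W and degree matrix D, the generalized eigenvectors of (L, D) used by Laplacian eigenmaps are exactly the right eigenvectors of the random-walk matrix P = D^{-1}W used by diffusion maps, so the time-t diffusion-map coordinates are the eigenmap coordinates rescaled anisotropically by powers of 1 - lambda (a small numerical sketch):

```python
import numpy as np
from scipy.linalg import eigh

# Small undirected weighted graph
rng = np.random.default_rng(4)
A = rng.random((8, 8))
W = (A + A.T) / 2                   # symmetrize -> undirected weights
np.fill_diagonal(W, 0)
D = np.diag(W.sum(axis=1))
L = D - W                           # unnormalized graph Laplacian

# Laplacian eigenmap: generalized eigenproblem L f = lambda D f
lam, F = eigh(L, D)

# Diffusion map: right eigenvectors of P = D^{-1} W, eigenvalues mu = 1 - lambda
mu = 1 - lam
t = 3
# Diffusion-map coordinates at time t are mu_i^t * f_i, i.e. the eigenmap
# coordinates f_i rescaled column by column
diff_coords = F * (mu ** t)
```

Since L f = lambda D f implies P f = (1 - lambda) f, both embeddings share the same coordinate axes and differ only in how each axis is stretched.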
Johan Van Horebeek (CIMAT)
ANOVA weighted Kernel PCA based on random projections
For data sets with many observations, we show how random projections can be used to perform kernel PCA efficiently while also providing insight into variable importance.


