Recently, generalized inference has become an efficient and useful tool that gives more accurate intervals for a variety of intractable complex problems, such as the Behrens-Fisher problem. In this talk, we present a generalized inference solution to the Behrens-Fisher problem in general location-scale families. The proposed solution is based on minimum risk equivariant estimators; the underlying approach thus extends the methods based on maximum likelihood estimators and conditional inference, which have so far been applied only to some specific distributions. Finally, we present some simulation results as well as analyses of two real data sets.
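For reference, the sketch below shows the standard generalized pivotal quantity construction for the normal special case of the Behrens-Fisher problem; it is not the talk's minimum risk equivariant construction for general location-scale families, only an assumed illustration of how a generalized confidence interval is obtained by Monte Carlo.

```python
import numpy as np

def behrens_fisher_gci(x, y, level=0.95, n_draws=100_000, seed=0):
    """Generalized confidence interval for mu_x - mu_y under normality with
    unequal variances, via the textbook generalized pivotal quantity (GPQ)."""
    rng = np.random.default_rng(seed)
    nx, ny = len(x), len(y)
    xbar, ybar = np.mean(x), np.mean(y)
    sx2, sy2 = np.var(x, ddof=1), np.var(y, ddof=1)

    # GPQ for each variance: (n-1) s^2 / V with V ~ chi-square(n-1);
    # GPQ for each mean: xbar - Z * sqrt(GPQ_var / n) with Z ~ N(0,1).
    gx = xbar - rng.standard_normal(n_draws) * np.sqrt(
        (nx - 1) * sx2 / rng.chisquare(nx - 1, n_draws) / nx)
    gy = ybar - rng.standard_normal(n_draws) * np.sqrt(
        (ny - 1) * sy2 / rng.chisquare(ny - 1, n_draws) / ny)

    diff = gx - gy                      # generalized pivot for mu_x - mu_y
    alpha = 1 - level
    return np.quantile(diff, [alpha / 2, 1 - alpha / 2])

# Example: two small samples with clearly unequal variances.
rng = np.random.default_rng(1)
print(behrens_fisher_gci(rng.normal(0, 1, 10), rng.normal(1, 3, 12)))
```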
Empirical likelihood is a popular nonparametric or semi-parametric statistical method with many nice statistical properties. Yet when the sample size is small, or the dimension of the accompanying estimating function is high, the application of the empirical likelihood method can be hindered by the low precision of the chi-square approximation and by the non-existence of solutions to the estimating equations. In this paper, we show that the adjusted empirical likelihood is effective at addressing both problems. With a specific level of adjustment, the adjusted empirical likelihood achieves the high-order precision of the Bartlett correction, in addition to the advantage of a guaranteed solution to the estimating equations. Simulation results indicate that the confidence regions constructed by the adjusted empirical likelihood have coverage probabilities comparable to, or substantially more accurate than, those of the original empirical likelihood enhanced by the Bartlett correction.
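For concreteness, here is a minimal sketch of the adjusted empirical likelihood for a scalar mean, using the pseudo-observation of Chen, Variyath and Abraham (2008); the adjustment level `a` (their default max(1, log(n)/2) is assumed here) would be chosen differently to attain the Bartlett-order precision discussed in the paper.

```python
import numpy as np
from scipy.optimize import brentq

def ael_logratio(x, mu, a=None):
    """Adjusted empirical likelihood statistic -2 log R(mu) for a scalar mean."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if a is None:
        a = max(1.0, 0.5 * np.log(n))      # assumed default adjustment level
    g = x - mu                             # estimating function g(x; mu) = x - mu
    g_adj = np.append(g, -a * g.mean())    # pseudo-observation guarantees a solution

    # Solve sum g_i / (1 + lam * g_i) = 0 on the interval where all
    # implied weights 1 + lam * g_i stay positive.
    lo = -(1.0 / g_adj.max()) * (1 - 1e-10)
    hi = -(1.0 / g_adj.min()) * (1 - 1e-10)
    lam = brentq(lambda t: np.sum(g_adj / (1.0 + t * g_adj)), lo, hi)

    # Compared with a chi-square(1) quantile to form a confidence interval.
    return 2.0 * np.sum(np.log1p(lam * g_adj))
```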
We study estimation and variable selection problems in mixture-of-experts (MOE) models. A new modified maximum likelihood estimation (MMLE) method is proposed. It is shown that the MMLE is root-n consistent, and simulations indicate its better finite-sample behavior compared with the ordinary MLE. For variable selection, we apply two penalty functions to the modified likelihood. The method is computationally efficient, and it is shown theoretically to be consistent in variable selection. Two Bayesian information criteria are suggested for data-adaptive choice of the tuning parameters. A modified EM-Newton-Raphson algorithm is developed for the numerical computations. The performance of the method is also studied through simulations, and a real data analysis is presented.
There is a large amount of publicly available financial information on publicly traded corporations, usually reported on a quarterly basis. These same corporations also undergo bankruptcy or acquisition through merger. Given the nature of the data, it is natural to model these events in a discrete-time framework. We consider a bivariate discrete-time hazard model. The framework is similar to that of classical biostatistics modeling, where one treats the two forms of exit from the system, namely bankruptcy and merger/acquisition, but here with additional information on the type of exit; in biostatistics the cause of exit (usually death) is not known explicitly.
Such models are constructed and fit to a database of some 12,000 publicly traded US corporations. With a large number of covariates, some data reduction is needed. Both in-sample and out-of-sample prediction are considered. A constant baseline hazard model does not fit well, so a smooth baseline hazard model is considered. This latter model seems to give a reasonable fit in terms of prediction and has a nice robustness property. Some tools for model assessment are developed; one useful tool is a limit theorem on rare multinomials originally due to McDonald (1980).
This is joint work with Dr Taehan Bae.
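A minimal sketch of the kind of discrete-time competing-risks fit described above: a multinomial logit on firm-period records with two exit types. The data, covariates and coefficient values below are simulated and purely hypothetical, and the constant term plays the role of a constant baseline hazard; the smooth-baseline variant would replace it with smooth functions of time.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical firm-quarter data: one row per firm per quarter at risk,
# outcome coded 0 = still active, 1 = bankruptcy, 2 = merger/acquisition.
rng = np.random.default_rng(0)
n = 5000
X = sm.add_constant(rng.normal(size=(n, 2)))        # constant -> constant baseline hazard
eta_b = X @ np.array([-4.0, 0.8, 0.0])              # linear predictor, bankruptcy exit
eta_m = X @ np.array([-3.5, 0.0, 0.5])              # linear predictor, merger exit
den = 1 + np.exp(eta_b) + np.exp(eta_m)
p = np.column_stack([1 / den, np.exp(eta_b) / den, np.exp(eta_m) / den])
y = np.array([rng.choice(3, p=pi) for pi in p])

# A multinomial logit on the firm-period data is a discrete-time
# competing-risks hazard model with two exit types.
fit = sm.MNLogit(y, X).fit(disp=False)
print(fit.params)   # coefficient columns correspond to the two exit types
```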
In Bayesian analysis of multi-way contingency tables, the selection of a prior distribution for either the loglinear parameters or the cell probabilities is a major challenge. In this talk we define a flexible family of conjugate priors for the wide class of discrete hierarchical loglinear models, which includes the class of graphical models. These priors are defined as the Diaconis-Ylvisaker conjugate priors on the loglinear parameters subject to "baseline constraints" under multinomial sampling. We also derive the induced prior on the cell probabilities and show that it is a generalization of the hyper Dirichlet prior. We show that this prior has several desirable properties and illustrate its usefulness by identifying the most probable decomposable, graphical and hierarchical loglinear models for a six-way contingency table.
This work has been done in cooperation with Jinnan Liu and Adrian Dobra.
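Schematically (suppressing the baseline constraints and the precise hierarchical parametrization), the Diaconis-Ylvisaker conjugate prior on the loglinear parameters has the usual exponential-family form, with the cell counts entering through a sufficient statistic t(y) and log-partition function k(θ):

```latex
p(y \mid \theta) \;\propto\; \exp\{\langle t(y), \theta\rangle - N\,k(\theta)\},
\qquad
\pi(\theta \mid s, \alpha) \;\propto\; \exp\{\langle s, \theta\rangle - \alpha\,k(\theta)\},
```

so that the posterior is of the same form with (s, α) updated to (s + t(y), α + N), and marginalizing to the cell probabilities yields the generalization of the hyper Dirichlet mentioned above.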
It is often reasonable to assume that the dependence structure of a bivariate continuous distribution belongs to the class of extreme-value copulas. The latter are characterized by their Pickands dependence function. The talk is concerned with a procedure for testing whether this function belongs to a given parametric family. The test is based on a Cramér-von Mises statistic measuring the distance between an estimate of the parametric Pickands dependence function and either one of two nonparametric estimators thereof studied by Genest and Segers (2009). As the limiting distribution of the test statistic depends on unknown parameters, it must be estimated via a parametric bootstrap procedure, whose validity is established. Monte Carlo simulations are used to assess the power of the test, and an extension to dependence structures that are left-tail decreasing in both variables is considered.
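To fix ideas, a minimal sketch of the ingredients is given below: a rank-based Pickands estimator, a Gumbel-Hougaard parametric family fitted by inversion of Kendall's tau, and the Cramér-von Mises distance approximated on a grid. The endpoint corrections of Genest and Segers (2009) and the parametric bootstrap used for p-values are omitted, and positive dependence is assumed.

```python
import numpy as np
from scipy.stats import kendalltau, rankdata

def pickands_np(u, v, t):
    """Rank-based Pickands estimator of A(t) (no endpoint correction)."""
    xi = np.minimum(-np.log(u)[:, None] / (1 - t), -np.log(v)[:, None] / t)
    return 1.0 / xi.mean(axis=0)

def gumbel_A(t, theta):
    """Pickands dependence function of the Gumbel-Hougaard family."""
    return ((1 - t) ** theta + t ** theta) ** (1 / theta)

def cvm_statistic(x, y, grid=np.linspace(0.01, 0.99, 99)):
    """Cramer-von Mises distance between the nonparametric and the fitted
    parametric Pickands functions, approximated on a grid in (0, 1)."""
    n = len(x)
    u, v = rankdata(x) / (n + 1), rankdata(y) / (n + 1)   # pseudo-observations
    theta_hat = 1.0 / (1.0 - kendalltau(x, y)[0])          # Gumbel: tau = 1 - 1/theta
    diff = pickands_np(u, v, grid) - gumbel_A(grid, theta_hat)
    return n * np.mean(diff ** 2)
```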
In this talk, we study the detection of multiple change points in the parameters of the generalized lambda distribution (GLD). The advantage of using the GLD is that the family is broad and flexible compared with other distributions, so fewer restrictions are imposed on the distribution when fitting the data. We combine the binary segmentation procedure with the Schwarz information criterion (SIC) to search for all possible change points in the data. The method is applied to publicly available fibroblast cancer cell line data, and the change points are successfully located.
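A minimal sketch of the binary segmentation plus SIC search follows; for brevity it uses a Gaussian working likelihood on each segment in place of a full GLD maximum likelihood fit, which is an assumption of this illustration rather than the talk's procedure.

```python
import numpy as np

def norm_loglik(x):
    """Maximized i.i.d. Gaussian log-likelihood of a segment."""
    n, s2 = x.size, x.var()
    return -0.5 * n * (np.log(2 * np.pi * s2) + 1)

def binary_segmentation(x, min_seg=5):
    """Recursive binary segmentation with SIC: split at k if the best
    one-change SIC beats the no-change SIC, then recurse on both halves."""
    n = x.size
    if n <= 2 * min_seg:
        return []
    sic0 = -2 * norm_loglik(x) + 2 * np.log(n)            # mean + variance
    ks = list(range(min_seg, n - min_seg))
    sic1 = np.array([-2 * (norm_loglik(x[:k]) + norm_loglik(x[k:]))
                     + 5 * np.log(n) for k in ks])         # 2 means + 2 variances + location
    k_best = ks[int(np.argmin(sic1))]
    if sic1.min() >= sic0:
        return []
    return (binary_segmentation(x[:k_best], min_seg)
            + [k_best]
            + [k_best + cp for cp in binary_segmentation(x[k_best:], min_seg)])

# Example: two shifts in the mean, change points expected near 100 and 180.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 80), rng.normal(-1, 1, 120)])
print(binary_segmentation(x))
```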
Repeated measurements are collected in a variety of situations and are generally characterized by a mixed model in which the within-subject correlation is specified through the random effects. In such a mixed model, we propose a multiple comparison procedure based on a variant of the Schwarz information criterion (SIC; Schwarz, 1978). The derivation of SIC indicates that it serves as an asymptotic approximation to a transformation of the Bayesian posterior probability of a candidate model, so an approximate posterior probability for a candidate model can be calculated from SIC. We suggest a variant of SIC that retains terms that are asymptotically negligible in its derivation; the variant improves upon the performance of SIC in small and moderate sample-size applications. Based on the proposed variant, the corresponding posterior probability is calculated for each candidate model. A hypothesis in a multiple comparison involves one or more models in the candidate class, so its posterior probability is evaluated as the sum of the posterior probabilities of the models associated with the hypothesis. The approximate posterior probability based on the variant accommodates the effect of the prior on each model in the candidate class, and therefore provides a more accurate approximation than the one based on SIC for conducting multiple comparisons. We derive the computational formula for this approximate posterior probability in the mixed model. The applications demonstrate that the proposed procedure based on the SIC variant performs effectively in multiple comparisons.
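In schematic form, writing SIC_k for the criterion (or its variant) evaluated at candidate model M_k, the approximated posterior model probabilities and the resulting probability of a hypothesis H identified with a subset of models are

```latex
P(M_k \mid \text{data}) \;\approx\; \frac{\exp(-\mathrm{SIC}_k/2)}{\sum_{j}\exp(-\mathrm{SIC}_j/2)},
\qquad
P(H \mid \text{data}) \;\approx\; \sum_{k \,:\, M_k \in H} P(M_k \mid \text{data}).
```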
In recent decades stochastic graphs have been used in many fields to explain the evolution of a set of random objects (vertices), along with a relationship structure (edges). We consider statistical inference in a dynamic random graph in the absence of edge information. It is shown that the dynamic behavior of the graph, together with the vertex information, is useful in making inference about the edges. The problem is motivated by the foot-and-mouth disease (FMD) outbreak in the UK in 2001. A stochastic Euclidean graph model with the Markov property is introduced to model this epidemic. In addition, it is shown that the available information is sufficient to draw inference about the model and hence about the missing edges.
In this talk I will discuss Bayesian hypothesis testing in the two-sample problem. I will introduce some procedures based on a Bayesian nonparametric formulation, and examine their performance in comparison to classical nonparametric procedures.
This is joint work with Chris Holmes (Oxford), François Caron (Bordeaux) and Jim Griffin (Kent).
Estimation of the incidence rate of a disease generally entails the follow-up of a disease-free cohort until a sufficient number of incident cases of the disease have been observed. Sometimes it is possible, however, to avoid the time and cost of carrying out an incidence study by following prevalent cases with the disease forward for a relatively short time period. That is, we may identify prevalent cases through a cross-sectional survey and follow them forward as part of what is known as a prevalent cohort study with follow-up. In this presentation we show how one may find the maximum likelihood estimator of the age-specific constant incidence rate from a prevalent cohort study with follow-up. Our key expression is related to the well-known epidemiological relationship between incidence, prevalence and disease duration. We apply our results to estimate the incidence rate of dementia in Canada.
Joint work with Victor Addona (Macalester College, St. Paul, MN) and Masoud Asgharian (McGill University).
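For orientation, the well-known epidemiological relationship referred to above is, in a stationary population with prevalence proportion P, constant incidence rate I and mean disease duration D̄,

```latex
\frac{P}{1 - P} \;=\; I\,\bar{D}
\quad\Longrightarrow\quad
I \;=\; \frac{P}{(1 - P)\,\bar{D}}.
```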
Suppose that Y = (Y_i) is a normal random vector with mean Xb and covariance σ²I_n, where b is a p-dimensional vector (b_j) and X = (X_ij) is an n×p matrix with X_ij ∈ {-1, 1}; this corresponds to a factorial design with -1, 1 representing the low or high level respectively, or to a weighing design with -1, 1 representing an object j with weight b_j placed on the left or right pan of a chemical balance respectively. E-optimal designs Z are chosen that are robust in the sense that they remain E-optimal when the covariance of Y_i and Y_i′ is ρ > 0 for i ≠ i′. Within a smaller class of designs, similar results are obtained with respect to a general class of optimality criteria which includes the A- and D-criteria.
The talk is based on my three joint papers with Joe Masaro published in 2008 in JSPI and LAA.
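As a small numerical illustration (not the papers' constructions), the sketch below evaluates the E-, A- and D-criteria of a ±1 design matrix under the equicorrelated covariance σ²[(1-ρ)I + ρJ], which is one way to compare designs as ρ varies; the example design is taken from a Hadamard matrix and is purely illustrative.

```python
import numpy as np

def design_criteria(Z, rho=0.0, sigma2=1.0):
    """E-, A-, D-criteria of the information matrix Z' Sigma^{-1} Z for a
    +/-1 design Z with Sigma = sigma^2 [(1 - rho) I + rho J]."""
    n = Z.shape[0]
    Sigma = sigma2 * ((1 - rho) * np.eye(n) + rho * np.ones((n, n)))
    M = Z.T @ np.linalg.solve(Sigma, Z)          # information matrix
    eig = np.linalg.eigvalsh(M)
    return {"E": eig.min(),                      # maximize the smallest eigenvalue
            "A": np.trace(np.linalg.inv(M)),     # minimize the average variance
            "D": np.linalg.det(M)}               # maximize the generalized information

# Example: a 4-run, 3-factor design with orthogonal +/-1 columns.
Z = np.array([[ 1,  1,  1],
              [ 1, -1, -1],
              [-1,  1, -1],
              [-1, -1,  1]])
for rho in (0.0, 0.3):
    print(rho, design_criteria(Z, rho))
```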
In this talk, the mode estimator based on the Parzen-Rosenblatt kernel density estimator (Parzen, 1962) is considered. In light of Shi et al. (2009), under mild conditions, we establish the relationship between the convergence rate of the mode estimator and the window width. In this way, we obtain an improved convergence rate for the mode estimator.
This is joint work with X. Shi and B. Miao.
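For reference, a minimal sketch of the kernel mode estimator itself (the talk's results concern how its convergence rate depends on the window width h as n grows):

```python
import numpy as np

def kernel_mode(x, h, grid_size=1000):
    """Mode estimator: the maximizer of the Parzen-Rosenblatt kernel density
    estimate with a Gaussian kernel and window width (bandwidth) h."""
    x = np.asarray(x, dtype=float)
    grid = np.linspace(x.min(), x.max(), grid_size)
    # f_hat(t) = (1 / (n h)) * sum_i K((t - x_i) / h), K = standard normal density
    z = (grid[:, None] - x[None, :]) / h
    f_hat = np.exp(-0.5 * z ** 2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))
    return grid[np.argmax(f_hat)]

# Example: the estimate approaches the true mode (0 here) as n grows and h shrinks.
rng = np.random.default_rng(0)
for n, h in [(200, 0.5), (2000, 0.25), (5000, 0.2)]:
    print(n, h, round(kernel_mode(rng.standard_normal(n), h), 3))
```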
The composite likelihood method has been proposed and systematically discussed by Besag (1974), Lindsay (1988), and Cox and Reid (2004). The approach based on using the composite likelihood, especially the pairwise likelihood, has received increasing attention in recent years due to the simplicity in defining the objective function and computational advantages when dealing with data with complex structures. In this talk, I will discuss some modeling issues concerning the composite likelihood formulation.
This is joint work with Nancy Reid.
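For reference, the pairwise likelihood for independent vectors y_1, ..., y_n with components y_{ir} is built from bivariate margins only,

```latex
\mathrm{CL}(\theta) \;=\; \prod_{i=1}^{n} \prod_{r < s} f(y_{ir}, y_{is}; \theta),
\qquad
c\ell(\theta) \;=\; \sum_{i=1}^{n} \sum_{r < s} \log f(y_{ir}, y_{is}; \theta),
```

with the maximum composite likelihood estimator maximizing cℓ(θ) and its asymptotic variance given by the Godambe (sandwich) information.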
The geometric down-weighting method can be applied to enlarge an existing discrete distribution family. The enlarged family has one more parameter, which regulates the decay rate of the probability mass function and thus yields new moment features. Applying the geometric down-weighting method to a family with infinite mean, we obtain an enlarged family that can have both finite and infinite means. Such an enlarged family can accommodate heavy-tailed count data, because it allows various degrees of tail heaviness. In particular, applying this method to the two-parameter discrete stable family, which has infinite mean, yields a three-parameter discrete distribution family called the generalized Poisson-inverse Gaussian (GPIG). Apart from the extremely heavy-tailed discrete stable, the GPIG family extends the over-dispersed Poisson-inverse Gaussian (PIG) and also includes the equally-dispersed Poisson. The GPIG family is therefore flexible in handling count data ranging from mildly to heavily over-dispersed. We illustrate the GPIG family with an application to the citation counts of articles published in 1990 in JASA and in JSPI, respectively.
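As a toy illustration of the general idea only (this simple geometric tilting is our assumption and not necessarily the paper's exact construction of the GPIG), multiplying a power-law pmf by a geometric factor ρ^k and renormalizing produces finite moments for ρ < 1 while approaching the original heavy tail as ρ → 1:

```python
import numpy as np

def geometric_downweight(p, rho):
    """Down-weight a pmf p_k by a geometric factor rho**k and renormalize.
    Toy illustration only; the GPIG construction itself is not reproduced."""
    k = np.arange(len(p))
    q = p * rho ** k
    return q / q.sum()

# Truncated zeta-like pmf p_k ~ k^(-2) (k >= 1): its mean diverges as the
# truncation point grows, so the untilted family has infinite mean in the limit.
k = np.arange(1, 200_000)
p = k ** -2.0
p = p / p.sum()
for rho in (0.999, 0.99, 0.9):
    q = geometric_downweight(p, rho)
    print(rho, (k * q).sum())   # down-weighted mean is finite and shrinks with rho
```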