2004 CMS Winter Meeting

Mathematical Methods in Statistics
Org: Russell Steele, Alain Vandal and David Wolfson (McGill)

JEAN-FRANCOIS ANGERS, Dép. de math/stat, Université de Montréal, C. P. 6128 Succ. Centre-ville, Montréal, QC H3C 3J7
Mixture of Zero Inflated Densities

In several real-life examples one encounters count data in which the number of zeros is such that the usual discrete probability density functions do not fit the data. Quite often the number of zeros is large, and hence the data are zero-inflated. Furthermore, the histogram is often multimodal, indicating that the data may come from different sub-populations. In such a situation, a zero-inflated model combined with a mixture of discrete probability density functions can be considered, and a Bayesian analysis can be carried out. Using the EM algorithm, Bayesian estimates and credibility intervals for the different parameters are obtained. The proposed technique is illustrated on a real-life data set.
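A minimal sketch of the EM iterations for a zero-inflated mixture, assuming Poisson components and plain maximum likelihood (the priors of the Bayesian analysis described above are not modelled). The names `fit_zip_mixture` and `poisson_logpmf`, the starting zero mass and the fixed iteration count are illustrative choices, not the author's.

```python
import math

def poisson_logpmf(y, lam):
    # log P(Y = y) for Y ~ Poisson(lam), lam > 0
    return y * math.log(lam) - lam - math.lgamma(y + 1)

def fit_zip_mixture(data, init_lambdas, n_iter=200, init_zero_mass=0.1):
    """EM (maximum likelihood) for pi0 * 1{y = 0} + sum_k w_k * Poisson(y; lambda_k)."""
    k = len(init_lambdas)
    lambdas = list(init_lambdas)
    pi0 = init_zero_mass
    weights = [(1.0 - pi0) / k] * k
    n = len(data)
    history = []  # log-likelihood of the parameters at the start of each iteration
    for _ in range(n_iter):
        # E-step: posterior probability that y_i came from the zero mass or from component j
        resp, loglik = [], 0.0
        for y in data:
            terms = [pi0 if y == 0 else 0.0]
            terms += [w * math.exp(poisson_logpmf(y, lam)) for w, lam in zip(weights, lambdas)]
            total = sum(terms)
            loglik += math.log(total)
            resp.append([t / total for t in terms])
        history.append(loglik)
        # M-step: closed-form updates of the mixing proportions and Poisson means
        pi0 = sum(r[0] for r in resp) / n
        for j in range(k):
            mass = sum(r[j + 1] for r in resp)
            weights[j] = mass / n
            if mass > 0:
                lambdas[j] = max(sum(r[j + 1] * y for r, y in zip(resp, data)) / mass, 1e-8)
    return pi0, weights, lambdas, history
```

Each E-step splits an observed zero between the extra point mass and the Poisson components; a non-decreasing log-likelihood history is a useful sanity check on any EM implementation.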

MASOUD ASGHARIAN, McGill University
On the Singularities of the Information Matrix

The information matrix plays a central role in establishing asymptotic normality of parameter estimates in problems of statistical inference. One recurring condition for asymptotic normality is that the information matrix be positive definite. For many problems, however, this condition seems virtually impossible to verify. An important class of models where this is the case is the class of mixture models, for which the form of the information matrix prevents verification of this crucial condition. Using the Subimmersion Theorem, we show that under identifiability and Le Cam's smoothness conditions, the set of singularities of the information matrix is nowhere dense. Under further conditions, we demonstrate that this set is also of measure zero, provided that the score function, when considered as a function on a complex domain conformable with the parameter space, is bounded by a statistic whose second moment exists. We also study the measure of this set when parameter orthogonality is possible. In particular, it is shown that, provided parameter orthogonality is possible, one can find a reparameterization under which the set of singularities of the information matrix is nowhere dense and of measure zero. Our results therefore suggest that in problems where the tangible conditions of identifiability and smoothness may be assumed, positive definiteness of the information matrix "rarely" fails to hold.
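To make the notion of a singular information matrix concrete, the sketch below computes, by numerical quadrature, the Fisher information for (p, mu1, mu2) in a two-component normal mixture with unit variances. This is a hypothetical example chosen for illustration, not the Subimmersion argument of the talk; the function name and grid settings are assumptions. When mu1 = mu2 the score for p vanishes identically and the matrix is singular, while moving the means apart restores positive definiteness, consistent with singularities forming a thin set.

```python
import numpy as np

def mixture_information(p, mu1, mu2, lo=-15.0, hi=15.0, n=30001):
    """Fisher information for (p, mu1, mu2) in p*N(mu1, 1) + (1 - p)*N(mu2, 1)."""
    x = np.linspace(lo, hi, n)
    dx = x[1] - x[0]
    phi1 = np.exp(-0.5 * (x - mu1) ** 2) / np.sqrt(2 * np.pi)
    phi2 = np.exp(-0.5 * (x - mu2) ** 2) / np.sqrt(2 * np.pi)
    f = p * phi1 + (1 - p) * phi2
    # score vector: partial derivatives of log f with respect to p, mu1, mu2
    scores = np.vstack([
        (phi1 - phi2) / f,
        p * phi1 * (x - mu1) / f,
        (1 - p) * phi2 * (x - mu2) / f,
    ])
    # trapezoidal weights for I_jk = integral of s_j(x) s_k(x) f(x) dx
    w = np.full(n, dx)
    w[0] = w[-1] = dx / 2
    return (scores * (f * w)) @ scores.T
```

Here `np.linalg.eigvalsh(mixture_information(0.3, 0.0, 0.0)).min()` is zero up to rounding, while the same quantity for means -1.5 and 1.5 is strictly positive.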

CHRISTIAN GENEST, Université Laval, Québec, Canada G1K 7P4
Testing independence revisited

Testing independence is a time-honored problem in statistics. This issue is revisited here in the light of the theory of copulas. It will be argued that, outside the normal paradigm, effective tests of independence should be rank-based, and that nonparametric tests of independence are generally more powerful and more robust than those based on the likelihood. The small- and large-sample properties of locally most powerful rank procedures will be compared to those of standard tests, notably through the notion of Pitman's asymptotic relative efficiency. Tests based on Cramér-von Mises and Kolmogorov-Smirnov functionals of Deheuvels' empirical copula process will also be considered. This presentation is based on joint work with J.-F. Quessy, B. Rémillard and F. Verret.
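A rank-based test in this spirit can be sketched as follows, assuming continuous data without ties. It computes a Cramér-von Mises-type statistic, the sum over observations of (C_n(U_i, V_i) - U_i V_i)^2, from the empirical copula C_n of the pseudo-observations, and calibrates it by permutation rather than with the asymptotic null distribution studied in the talk. The function names and the permutation scheme are illustrative assumptions.

```python
import random

def ranks(xs):
    # ranks 1..n (assumes no ties, as for continuous data)
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def cvm_statistic(u, v):
    """Sum over i of (C_n(U_i, V_i) - U_i * V_i)^2, C_n being the empirical copula."""
    n = len(u)
    total = 0.0
    for i in range(n):
        c = sum(1 for j in range(n) if u[j] <= u[i] and v[j] <= v[i]) / n
        total += (c - u[i] * v[i]) ** 2
    return total

def independence_test(x, y, n_perm=999, seed=0):
    n = len(x)
    u = [r / (n + 1) for r in ranks(x)]  # pseudo-observations
    v = [r / (n + 1) for r in ranks(y)]
    observed = cvm_statistic(u, v)
    rng = random.Random(seed)
    count = 0
    for _ in range(n_perm):
        v_perm = v[:]
        rng.shuffle(v_perm)  # permuting one margin mimics independence
        if cvm_statistic(u, v_perm) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)
```

Because only ranks enter the statistic, it is unchanged by strictly increasing transformations of either margin, which is one practical appeal of rank-based procedures.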

PIERRE LEGENDRE, Université de Montréal, C.P. 6128, Succ. Centre-ville, Montréal, Québec H3C 3J7
What are the important spatial scales in an ecosystem?

Spatial heterogeneity of ecological structures comes either from the physical forcing of environmental variables or from community processes. In both cases, spatial structuring plays a functional role in ecosystems. Ecological models should explicitly take into account the spatial organization of ecosystems.
A canonical (regression-type) modeling method has been developed that decomposes the variance of a multivariate data table (e.g., species abundances) into four components:

(a) a non-spatially-structured component explained by the environmental variables in the model,
(b) a spatially-structured component of environmental variation,
(c) a spatially-structured fraction which is not explained by the environmental variables and possibly results from community dynamics, and
(d) a residual fraction.
The first three components can be mapped separately, providing new insights into community dynamics.
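The four fractions can be obtained from three regressions, along the lines of the variation-partitioning approach of Borcard, Legendre and Drapeau: regress the response table on the environmental variables, on the spatial variables, and on both, then combine the resulting R^2 values. The sketch below assumes ordinary least squares with unadjusted R^2 on a multivariate response; the function names are hypothetical, and in practice adjusted R^2 and permutation tests would normally accompany it.

```python
import numpy as np

def r_squared(y, x):
    """Multivariate R^2: share of the total centred variance of Y fitted by OLS on X."""
    yc = y - y.mean(axis=0)
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef - y.mean(axis=0)  # centred fitted values
    return (fitted ** 2).sum() / (yc ** 2).sum()

def partition_variation(y, env, space):
    r_env = r_squared(y, env)
    r_space = r_squared(y, space)
    r_both = r_squared(y, np.column_stack([env, space]))
    return {
        "a": r_both - r_space,          # environment, not spatially structured
        "b": r_env + r_space - r_both,  # spatially structured environment
        "c": r_both - r_env,            # spatial, not explained by environment
        "d": 1.0 - r_both,              # residual
    }
```

Fraction b is obtained by subtraction and can be negative when the two sets of predictors act in opposite directions.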
In previous work, we used a polynomial function of the geographic coordinates of the sampling sites to represent broad-scale spatial variation. We have since found a way of representing spatial structures at all scales in these analyses, obtained by eigenvalue decomposition of a truncated distance matrix among the sampling sites. The behavior of this method has been investigated using numerical simulations and real data sets. When sampling is carried out along a transect or on a regular grid, this modeling method allows estimation of the variance associated with each spatial scale in the observation window. A graph of the resulting F-statistic against scale is called a scalogram; it indicates the significant spatial scales present in the multivariate data under study (for example, a community composition data table).
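The truncated-distance construction can be sketched as follows, under our reading of principal coordinates of neighbour matrices (PCNM): distances larger than a chosen threshold are replaced by four times the threshold, the matrix is analysed by principal coordinate analysis, and eigenvectors with positive eigenvalues serve as spatial predictors. The factor of four and the choice of threshold should be checked against the original description; the function name `pcnm` is illustrative.

```python
import numpy as np

def pcnm(coords, threshold):
    coords = np.asarray(coords, dtype=float).reshape(len(coords), -1)
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # truncation: pairs farther apart than the threshold get 4 * threshold
    d = np.where(d > threshold, 4.0 * threshold, d)
    n = d.shape[0]
    a = -0.5 * d ** 2
    centre = np.eye(n) - np.ones((n, n)) / n
    g = centre @ a @ centre  # Gower-centred matrix of principal coordinate analysis
    vals, vecs = np.linalg.eigh(g)
    keep = vals > 1e-8 * vals.max()  # positive eigenvalues only
    order = np.argsort(vals[keep])[::-1]
    return vals[keep][order], vecs[:, keep][:, order]
```

For a regular transect, eigenvectors with large eigenvalues vary smoothly (broad scales) and those with small eigenvalues oscillate rapidly (fine scales), which is roughly what allows variance to be attributed to individual scales.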

References

[1] D. Borcard, P. Legendre and P. Drapeau, Partialling out the spatial component of ecological variation. Ecology 73(1992), 1045-1055.

[2] D. Borcard and P. Legendre, Environmental control and spatial structure in ecological communities: an example using Oribatid mites (Acari, Oribatei). Environmental and Ecological Statistics 1(1994), 37-61.

[3] D. Borcard and P. Legendre, All-scale spatial analysis of ecological data by means of principal coordinates of neighbour matrices. Ecological Modelling 153(2002), 51-68.

[4] D. Borcard, P. Legendre, C. Avois-Jacquet and H. Tuomisto, Dissecting the spatial structure of ecological data at multiple scales. Ecology (in press), 2004.

[5] P. Legendre and D. Borcard, Quelles sont les échelles spatiales importantes dans un écosystème ? In: J.-J. Droesbeke, M. Lejeune and G. Saporta (eds.), Analyse statistique de données spatiales. Editions TECHNIP, Paris (in press), 2004.

[6] P. Legendre, J. A. Rusak and D. Borcard, Temporal scales of zooplankton variation and resistance in a whole-lake acidification experiment. Limnology & Oceanography, submitted.

[7] P. Legendre and L. Legendre, Numerical ecology. 2nd English edition, Elsevier Science, 1998.

BRENDA MacGIBBON, Département de mathématiques, Université du Québec à Montréal
Exact inference for categorical data

Exact methods of inference for parameter significance and goodness of fit with categorical data have recently attracted renewed statistical interest, because many applications produce contingency tables in which some cells have counts small enough to cast doubt on the validity of multivariate normal approximations. At the same time, other cells in these tables have counts large enough to make enumeration difficult. Exact computational methods fall into two groups: complete enumeration and Monte Carlo methods. These methods will be illustrated on a data set of congenital heart malformation categories for sibling pairs and on a case-control data set. This is joint work with Yuguo Chen and Ian Dinwoodie.
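The two families of exact methods can be illustrated on small tables. The sketch below enumerates all 2 × 2 tables with the observed margins to obtain Fisher's two-sided exact p-value and, for a general r × c table, estimates the conditional p-value of Pearson's chi-square statistic by Monte Carlo, shuffling column labels so that every simulated table keeps the observed margins. This simple permutation sampler is a stand-in chosen for illustration, not the sampling scheme developed by the authors; the function names are hypothetical.

```python
import math
import random

def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value by complete enumeration of tables with fixed margins."""
    (a, b), (c, d) = table
    r1, c1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return math.comb(c1, x) * math.comb(n - c1, r1 - x) / math.comb(n, r1)

    p_obs = prob(a)
    lo, hi = max(0, r1 + c1 - n), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

def chi_square(table):
    rows = [sum(r) for r in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)
    return sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))

def monte_carlo_pvalue(table, n_sim=2000, seed=1):
    """Monte Carlo conditional p-value; margins must be positive."""
    rng = random.Random(seed)
    # one (row, column) label pair per observation
    row_labels = [i for i, r in enumerate(table) for j, x in enumerate(r) for _ in range(x)]
    col_labels = [j for i, r in enumerate(table) for j, x in enumerate(r) for _ in range(x)]
    observed = chi_square(table)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(col_labels)  # keeps both margins fixed
        sim = [[0] * len(table[0]) for _ in table]
        for i, j in zip(row_labels, col_labels):
            sim[i][j] += 1
        if chi_square(sim) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_sim + 1)
```

For the table [[3, 0], [0, 3]], enumeration gives exactly 0.1, and the Monte Carlo estimate should land close to that value.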

NEAL MADRAS, York University, 4700 Keele Street, Toronto, Ontario M3J 1P3
A Model for Tracking the History of the AIDS Epidemic

The epidemiology of AIDS poses many challenging statistical and mathematical problems. In particular, tracking and forecasting the AIDS epidemic is complicated by very long incubation times. Individuals with HIV infections are often diagnosed before developing AIDS, and the ensuing treatment makes it difficult to model incubation times.
I shall describe a new model that accounts for early detection without introducing bias. We use a Gibbs sampler Monte Carlo simulation to estimate the probabilities of diagnoses and the total number of new HIV infections each year among homosexual men in Ontario.

PAUL MARRIOTT, University of Waterloo, 200 University Ave. W., Waterloo, Ontario N2L 3G1
Mixture Models and Geometry

The class of statistical models known as mixtures has wide applicability in applied problems because of its flexibility, naturalness and interpretability. However, despite their apparent simplicity, the inference problem associated with mixtures remains hard, from both a theoretical and a practical standpoint. This talk gives an overview of some methods that use geometric techniques to understand the problem of inference under mixture models. The recently introduced class of local mixtures is shown to have many applications, retaining a great deal of flexibility and interpretability while having excellent inference properties. Also discussed will be some interesting issues that arise when ideas are transferred from one mathematical area (differential geometry) to another (statistical inference).
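For a normal location family, a local mixture can be written as the base density plus a weighted sum of its derivatives with respect to the location parameter, g(x) = phi(x - mu) + sum over j >= 2 of lambda_j d^j/dmu^j phi(x - mu), where each derivative equals He_j(x - mu) phi(x - mu) with He_j the probabilists' Hermite polynomial. The sketch below assumes this form, as our reading of the construction rather than code from the talk; the lambda_j must be restricted so that g stays non-negative (for example 0 <= lambda_2 <= 1 when only lambda_2 is used).

```python
import numpy as np

def hermite_e(j, z):
    # probabilists' Hermite polynomials: He_{k+1}(z) = z He_k(z) - k He_{k-1}(z)
    h_prev, h = np.ones_like(z), z
    if j == 0:
        return h_prev
    for k in range(1, j):
        h_prev, h = h, z * h - k * h_prev
    return h

def local_mixture_density(x, mu, lambdas):
    """phi(x - mu) + sum_j lambda_j * d^j/dmu^j phi(x - mu), for j = 2, 3, ..."""
    z = x - mu
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    correction = sum(lam * hermite_e(j, z) for j, lam in enumerate(lambdas, start=2))
    return phi * (1 + correction)
```

With only lambda_2, the mean stays at mu and the variance becomes 1 + 2 lambda_2, so the extra parameter captures over-dispersion of the kind a mixture of nearby normal components would produce.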

BRUNO REMILLARD, HEC Montréal
Bootstrapping methods for empirical processes

In this talk I will show that some bootstrapping methods work for empirical processes, while others do not. Examples will include the parametric bootstrap for goodness-of-fit tests of copula families and multiplier methods for empirical processes of pseudo-observations.
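A parametric bootstrap goodness-of-fit test for a copula family can be sketched as follows, assuming the Clayton family, estimation by inversion of Kendall's tau (theta = 2 tau / (1 - tau)), and a Cramér-von Mises distance between the empirical copula and the fitted copula at the pseudo-observations. The key point is that each bootstrap sample is drawn from the fitted copula and the parameter is re-estimated on it, so the estimation step is reflected in the null distribution. The family, estimator, clamping of tau and function names are illustrative assumptions.

```python
import random

def pseudo_obs(xs):
    # ranks divided by n + 1 (assumes no ties)
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u

def kendall_tau(x, y):
    n, s = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))

def clayton_cdf(u, v, theta):
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def clayton_sample(n, theta, rng):
    # Marshall-Olkin: V ~ Gamma(1/theta, 1), U_k = (1 + E_k / V)^(-1/theta), E_k ~ Exp(1)
    out = []
    for _ in range(n):
        v = rng.gammavariate(1.0 / theta, 1.0)
        out.append(tuple((1.0 + rng.expovariate(1.0) / v) ** (-1.0 / theta) for _ in range(2)))
    return out

def fit_theta(u, v):
    tau = min(max(kendall_tau(u, v), 0.01), 0.95)  # keep theta > 0 (positive dependence)
    return 2.0 * tau / (1.0 - tau)

def cvm_distance(u, v, theta):
    n, total = len(u), 0.0
    for i in range(n):
        emp = sum(1 for j in range(n) if u[j] <= u[i] and v[j] <= v[i]) / n
        total += (emp - clayton_cdf(u[i], v[i], theta)) ** 2
    return total

def gof_clayton(x, y, n_boot=100, seed=0):
    rng = random.Random(seed)
    u, v = pseudo_obs(x), pseudo_obs(y)
    theta = fit_theta(u, v)
    observed = cvm_distance(u, v, theta)
    exceed = 0
    for _ in range(n_boot):
        sample = clayton_sample(len(x), theta, rng)  # simulate under the fitted model
        us = pseudo_obs([s[0] for s in sample])
        vs = pseudo_obs([s[1] for s in sample])
        if cvm_distance(us, vs, fit_theta(us, vs)) >= observed:  # re-estimate each time
            exceed += 1
    return theta, (exceed + 1) / (n_boot + 1)
```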

LOUIS-PAUL RIVEST, Laval
Using quaternions for the statistical modelling of human movement

In biomechanics, human movement is measured with camera systems that record the positions of markers attached to the subject's limbs. The marker coordinates are converted into 3 × 3 rotation matrices giving the orientations of the limbs under study.
The motion of a joint is then computed as the time series of rotation matrices giving the relative orientation of the joint's two segments, one with respect to the other. This sequence of rotations is represented as three time series of Euler angles, associated respectively with flexion-extension, abduction-adduction and external-internal rotation movements. This talk presents a statistical model that makes it possible to identify measurement errors for certain types of movement. The model postulates that a judicious change in the orientation of the coordinate axes of the joint's two segments allows the movement to be represented by a single sequence of Euler angles, so that the movement would be, for example, a pure flexion-extension. The talk shows that writing the rotation matrices in quaternion form makes it relatively simple to fit this model.
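Quaternions make the relative orientation of two segments easy to handle: if R1 and R2 are the orientation matrices of the two segments, the joint rotation is R1^T R2, and a rotation by angle theta about a unit axis a corresponds to the unit quaternion (cos(theta/2), sin(theta/2) a). The sketch below converts rotation matrices to quaternions with the standard trace-based formulas; for a movement that is a pure rotation about one fixed axis, every quaternion in the series has its vector part along that axis, which is the kind of structure such a model can exploit. The helper names are hypothetical and this is not the estimation method of the talk.

```python
import numpy as np

def axis_angle_matrix(axis, angle):
    # Rodrigues' formula: R = I + sin(angle) K + (1 - cos(angle)) K^2
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    k = np.array([[0.0, -a[2], a[1]], [a[2], 0.0, -a[0]], [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(angle) * k + (1 - np.cos(angle)) * k @ k

def matrix_to_quaternion(r):
    """Unit quaternion (w, x, y, z), sign chosen so that w >= 0, for a 3x3 rotation matrix."""
    t = np.trace(r)
    if t > 0:
        s = 2.0 * np.sqrt(1.0 + t)  # s = 4w
        q = [0.25 * s, (r[2, 1] - r[1, 2]) / s, (r[0, 2] - r[2, 0]) / s, (r[1, 0] - r[0, 1]) / s]
    elif r[0, 0] > r[1, 1] and r[0, 0] > r[2, 2]:
        s = 2.0 * np.sqrt(1.0 + r[0, 0] - r[1, 1] - r[2, 2])  # s = 4x
        q = [(r[2, 1] - r[1, 2]) / s, 0.25 * s, (r[0, 1] + r[1, 0]) / s, (r[0, 2] + r[2, 0]) / s]
    elif r[1, 1] > r[2, 2]:
        s = 2.0 * np.sqrt(1.0 + r[1, 1] - r[0, 0] - r[2, 2])  # s = 4y
        q = [(r[0, 2] - r[2, 0]) / s, (r[0, 1] + r[1, 0]) / s, 0.25 * s, (r[1, 2] + r[2, 1]) / s]
    else:
        s = 2.0 * np.sqrt(1.0 + r[2, 2] - r[0, 0] - r[1, 1])  # s = 4z
        q = [(r[1, 0] - r[0, 1]) / s, (r[0, 2] + r[2, 0]) / s, (r[1, 2] + r[2, 1]) / s, 0.25 * s]
    q = np.array(q)
    return q if q[0] >= 0 else -q
```

For example, if the second segment is the first one rotated by 0.7 rad about its x-axis, `matrix_to_quaternion(r1.T @ r2)` returns (cos 0.35, sin 0.35, 0, 0) up to rounding, whatever the orientation `r1` of the first segment.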

ALAIN VANDAL, McGill University, 805 rue Sherbrooke ouest, Montréal, Québec H3A 2K6
Weak order partitioning of interval orders, with application to survival analysis

We propose a partition of the set of linear extensions of any interval order, in which each subset is itself the set of linear extensions of a weak order. This partitioning technique lends itself immediately to the definition of a Markov chain on the partitioning weak orders. Because the number of linear extensions of a weak order is easily computed, this chain can be used in Monte Carlo fashion to draw a linear extension uniformly from the set of linear extensions of the interval order. For the statistical investigation of interval orders, this technique is more attractive than that of Bubley and Dyer, as the correlation between statistics computed on consecutive linear extensions in the chain is very small. Using an alternative, computationally simple technique to draw linear extensions in non-uniform fashion, we show how the number of linear extensions of an interval order can be estimated with this Markov chain Monte Carlo method. We also show how the Markov chain can be used to produce rank statistics for interval-censored failure times of subjects in a control or treatment group, under the null hypothesis of equivalence between control and treatment.
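The remark that the number of linear extensions of a weak order is easily computed can be made concrete: a weak order arranges elements in ordered levels of mutually incomparable elements, so its count is the product of the factorials of the level sizes, and a uniform draw simply shuffles each level. The sketch below, with hypothetical function names, encodes an interval order by intervals (a precedes b when a ends before b starts), checks extensions by brute force on tiny examples, and draws from a weak order whose extensions are also extensions of the interval order. It does not reproduce the partitioning or the Markov chain of the talk.

```python
import itertools
import math
import random

def precedes(a, b):
    # interval a = (left, right) strictly precedes b when a ends before b starts
    return a[1] < b[0]

def is_linear_extension(order, intervals):
    pos = {e: k for k, e in enumerate(order)}
    n = len(intervals)
    return all(pos[i] < pos[j] for i in range(n) for j in range(n)
               if precedes(intervals[i], intervals[j]))

def count_linear_extensions_brute(intervals):
    # exponential in the number of elements: only for checking tiny examples
    perms = itertools.permutations(range(len(intervals)))
    return sum(is_linear_extension(p, intervals) for p in perms)

def count_weak_order_extensions(levels):
    # elements within a level are incomparable, so each level can be ordered freely
    return math.prod(math.factorial(len(level)) for level in levels)

def draw_from_weak_order(levels, rng):
    # uniform draw: keep the levels in order and shuffle inside each level
    order = []
    for level in levels:
        block = list(level)
        rng.shuffle(block)
        order.extend(block)
    return order
```

With intervals (0, 1), (2, 3) and (0.5, 2.5), only the first precedes the second, so there are 3 linear extensions; the weak order with levels {0, 2} then {1} accounts for 2 of them.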