2025 CMS Winter Meeting

Toronto, Dec 5 - 8, 2025

Abstracts

Mathematics of Machine Learning
Org: Ben Adcock (Simon Fraser University), Ricardo Baptista (University of Toronto) and Giang Tran (University of Waterloo)
[PDF]

HUNG-HSU CHOU, University of Pittsburgh
More is Less: Understanding Compressibility of Neural Networks via Implicit Regularization and Neural Collapse  [PDF]

Despite their recent successes, most modern machine learning algorithms lack theoretical guarantees, which are crucial for their further development towards delicate tasks. One mysterious phenomenon is that, among the infinitely many possible ways to fit the data, these algorithms often find the "good" ones, even when the designers never specify what "good" means. In this talk I will approach this question from both the microscopic and the macroscopic view, presenting empirical and theoretical studies of the connection between good solutions in neural networks and sparse solutions in compressed sensing. The key concepts are implicit bias/regularization in machine learning models and the neural collapse phenomenon induced by the block structure of the neural tangent kernel, which can be used for out-of-distribution detection.
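
As a toy illustration of implicit bias (a minimal numpy sketch, not taken from the talk; all data and parameters are invented), plain gradient descent on an underdetermined least-squares problem, started from zero, converges to the minimum-norm interpolant among the infinitely many solutions that fit the data:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 20, 100                      # underdetermined: many more parameters than samples
    A = rng.standard_normal((n, d))
    y = rng.standard_normal(n)

    # Gradient descent on 0.5*||A w - y||^2, started from zero.
    w = np.zeros(d)
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(50000):
        w -= step * A.T @ (A @ w - y)

    # Minimum-norm interpolant via the pseudoinverse.
    w_min_norm = np.linalg.pinv(A) @ y

    # Close to zero: gradient descent implicitly selects the min-norm solution.
    print(np.linalg.norm(w - w_min_norm))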

ISAAC GIBBS, University of California, Berkeley

AVI GUPTA, Simon Fraser University

MOHAMED HIBAT-ALLAH, University of Waterloo

SPENCER HILL, Queen’s University

SHIKHAR JAISWAL, University of Toronto

ANASTASIS KRATSIOS, McMaster University

SOPHIE MORIN, Polytechnique Montréal
Equivariant machine learning for collision detection of ellipses and related shapes  [PDF]

Computing the distance between two objects in space, in particular determining whether they have collided or are very close to doing so, is essential in a wide range of computational applications. It is trivial when the objects are spheres, line segments, or other very simple shapes, and good algorithms are known for polytopes. However, something as apparently simple as the distance between two ellipses in the plane remains surprisingly difficult if one wants both speed and accuracy. In this talk, I will discuss an equivariant machine learning framework for this problem and present some results from ongoing work.
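
For context, a slow numerical baseline for the ellipse-ellipse distance (a minimal sketch, not the equivariant method of the talk; the shapes and parameters below are invented) is to parameterize both boundaries by an angle and optimize over the two parameters, here by a coarse grid search followed by local refinement with scipy, assuming the ellipses are disjoint:

    import numpy as np
    from scipy.optimize import minimize

    def ellipse_point(t, center, axes, angle):
        """Point on an ellipse with given center, semi-axes (a, b), and rotation angle."""
        c, s = np.cos(angle), np.sin(angle)
        x, y = axes[0] * np.cos(t), axes[1] * np.sin(t)
        return center + np.array([c * x - s * y, s * x + c * y])

    def ellipse_distance(e1, e2):
        """Approximate distance between two ellipse boundaries."""
        def objective(params):
            p = ellipse_point(params[0], *e1)
            q = ellipse_point(params[1], *e2)
            return np.sum((p - q) ** 2)
        # Coarse grid search to avoid poor local minima, then local refinement.
        ts = np.linspace(0.0, 2 * np.pi, 60)
        _, t1, t2 = min((objective([t1, t2]), t1, t2) for t1 in ts for t2 in ts)
        res = minimize(objective, x0=[t1, t2])
        return np.sqrt(res.fun)

    e1 = (np.array([0.0, 0.0]), (2.0, 1.0), 0.3)    # (center, semi-axes, rotation)
    e2 = (np.array([5.0, 1.0]), (1.5, 0.5), -0.8)
    print(ellipse_distance(e1, e2))

A learned, equivariant surrogate aims to avoid this per-query optimization while respecting the symmetries of the problem, which is what motivates the framework discussed in the talk.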

RACHEL MORRIS, Concordia University
Regularity guarantees for adversarially robust learning  [PDF]

While neural network image classification enjoys high success rates in most settings, recent work has shown that well-targeted adversarial attacks can transform a correctly classified image into one that is visually indistinguishable from the original yet completely fools the classification algorithm. This has sparked many new approaches to classification that include an adversary in the training process: such an adversary can improve robustness and generalization, at the cost of decreased accuracy and increased training time. By considering a “worst-case” adversary, the resulting mathematical model for adversarial training can be understood as an energy minimization problem with a regularizing nonlocal perimeter term. In this presentation, I will discuss my current work studying regularity guarantees for the decision boundary of an adversarially robust minimizer. In particular, for a continuous and bounded underlying density, the decision boundary is $C^2$ smooth. I will discuss how explicit geometric perturbations and a second variation analysis show that singular points (e.g. corners and cusps) are suboptimal. At the smoother points, I will demonstrate how leveraging the necessary optimality conditions allows one to upgrade $C^1$ regularity to $C^2$ regularity.
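
To fix ideas, here is a toy sketch of training against a "worst-case" adversary (an invented example, not the nonlocal-perimeter model analyzed in the talk): for a linear logistic classifier, the loss-maximizing $\ell^\infty$ perturbation of radius $\varepsilon$ has a closed form, and training alternates between perturbing the inputs and descending the loss at the perturbed points.

    import numpy as np

    rng = np.random.default_rng(1)
    n, d, eps = 200, 2, 0.2
    X = rng.standard_normal((n, d))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)     # labels from a linear rule

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    w = np.zeros(d)
    for _ in range(2000):
        # Inner step: worst-case l_inf perturbation of each point. For a linear model
        # the loss-maximizing direction is sign(w), flipped according to the label.
        delta = eps * np.sign(w) * (1 - 2 * y)[:, None]
        X_adv = X + delta
        # Outer step: gradient descent on the logistic loss at the perturbed points.
        p = sigmoid(X_adv @ w)
        w -= 0.5 * X_adv.T @ (p - y) / n

    print("robust weights:", w)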

CAMERON MUSCO, University of Massachusetts Amherst
Structured Matrix Approximation via Matrix-Vector Products  [PDF]

In this talk, I will give an overview of recent progress on the problem of structured matrix approximation from matrix-vector products. Given a target matrix A that can only be accessed through a limited number of (possibly adaptively chosen) matrix-vector products, we seek to find a near-optimal approximation to A from some structured matrix class – e.g., a low-rank approximation, a hierarchical low-rank approximation, a sparse or diagonal approximation, etc. This general problem arises across the computational sciences and data science, both in algorithmic applications and, more recently, in scientific machine learning, where it is closely related to the problem of linear operator learning from input/output samples.

I will survey recent work in which we give: 1) optimal algorithms for approximating A by a matrix with a fixed sparsity pattern (e.g., a diagonal or banded matrix); 2) the first algorithms with strong relative error bounds for hierarchical low-rank approximation; and 3) the first bounds for generic structured families, with sample complexity depending on the parametric complexity of the family. I will also highlight several open questions on structured matrix approximation and its applications to operator learning.
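
As a simple concrete instance of this access model (a classical estimator used here for illustration, not the optimal algorithms from the talk; the test matrix and query budget are invented), the diagonal of A can be estimated from matrix-vector products alone: for random sign vectors z, the expectation of z * (A z) is exactly the diagonal of A.

    import numpy as np

    rng = np.random.default_rng(2)
    n, num_queries = 500, 100

    # A is accessed only through matrix-vector products; here it is a hidden test matrix.
    A_hidden = rng.standard_normal((n, n))
    matvec = lambda v: A_hidden @ v

    # Hutchinson-style diagonal estimator with i.i.d. +-1 query vectors.
    est = np.zeros(n)
    for _ in range(num_queries):
        z = rng.choice([-1.0, 1.0], size=n)
        est += z * matvec(z)
    est /= num_queries

    err = np.linalg.norm(est - np.diag(A_hidden)) / np.linalg.norm(np.diag(A_hidden))
    print(f"relative error of diagonal estimate: {err:.3f}")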

ESHA SAHA, University of Alberta

MATTHEW THORPE, University of Warwick
How Many Labels Do You Need in Semi-Supervised Learning?  [PDF]

Semi-supervised learning (SSL) is the problem of finding the missing labels in a partially labelled data set. The heuristic one uses is that “similar feature vectors should have similar labels”. The notion of similarity between feature vectors explored in this talk comes from a graph-based geometry, where an edge is placed between feature vectors that are closer than some connectivity radius. A natural variational approach to SSL is to minimise a Dirichlet energy built from the graph topology, and a natural question is what happens as the number of feature vectors goes to infinity. In this talk I will give results on the asymptotics of graph-based SSL using an optimal transport topology. The results include a lower bound on the number of labels needed for consistency.
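
A minimal sketch of this graph-based variational approach (illustrative only; the clusters, connectivity radius, and label budget are invented): build a proximity graph, then minimise the graph Dirichlet energy subject to the observed labels, which reduces to solving a linear system in the graph Laplacian restricted to the unlabelled nodes.

    import numpy as np

    rng = np.random.default_rng(3)
    n, radius = 300, 0.35

    # Feature vectors from two clusters; only a few points from each cluster carry labels.
    X = np.vstack([rng.normal(-1, 0.4, (n // 2, 2)), rng.normal(1, 0.4, (n // 2, 2))])
    y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])
    labelled = np.concatenate([rng.choice(n // 2, 3, replace=False),
                               n // 2 + rng.choice(n // 2, 3, replace=False)])
    unlabelled = np.setdiff1d(np.arange(n), labelled)

    # Proximity graph: edge between feature vectors closer than the connectivity radius.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = (D < radius).astype(float)
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W   # graph Laplacian

    # Minimise the Dirichlet energy u^T L u subject to the labels (harmonic extension).
    u = np.zeros(n)
    u[labelled] = y[labelled]
    L_uu = L[np.ix_(unlabelled, unlabelled)] + 1e-9 * np.eye(unlabelled.size)  # tiny ridge for safety
    u[unlabelled] = np.linalg.solve(L_uu, -L[np.ix_(unlabelled, labelled)] @ u[labelled])

    accuracy = np.mean((u[unlabelled] > 0.5) == y[unlabelled])
    print(f"accuracy on unlabelled points: {accuracy:.2f}")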

ALEX TOWNSEND, Cornell University

YUNAN YANG, Cornell University
Training Distribution Optimization in the Space of Probability Measures  [PDF]

A central question in data-driven modeling is: from which probability distribution should training samples be drawn to most effectively approximate a target function or operator? This work addresses this question in the setting where “effectiveness” is measured by out-of-distribution (OOD) generalization accuracy across a family of downstream tasks. We formulate the problem as minimizing the expected OOD generalization error, or an upper bound thereof, over the space of probability measures. The optimal sampling distribution depends jointly on the model class (e.g., kernel regressors, neural networks), the evaluation metric, and the target map itself. Building on this characterization, we propose two adaptive, target-dependent data selection algorithms based on bilevel and alternating optimization. The resulting surrogate models exhibit significantly improved robustness to distributional shifts and consistently outperform models trained with conventional, non-adaptive, or target-independent sampling across benchmark problems in function approximation, operator learning, and inverse modeling.
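
A loose, greedy toy analogue of target-dependent data selection (a sketch only, with an invented target map, candidate pool, and downstream distribution; it is not the bilevel or alternating algorithms of the abstract): alternate between fitting a simple regression model on the current training points and moving the sampling budget toward the candidate inputs where the current fit errs most against the target map.

    import numpy as np

    rng = np.random.default_rng(4)
    target = lambda x: np.sin(4 * x) + 0.3 * x      # target map to be learned
    pool = np.linspace(-1, 1, 400)                   # candidate training inputs
    test_x = rng.uniform(0.2, 1.0, 200)              # shifted downstream evaluation distribution

    def fit(train_x):
        """Degree-9 polynomial least-squares fit to the target at the chosen training points."""
        V = np.vander(train_x, 10)
        coef, *_ = np.linalg.lstsq(V, target(train_x), rcond=None)
        return lambda x: np.vander(x, 10) @ coef

    # Alternate between fitting the model and adding the candidate point
    # where the current model disagrees most with the target.
    train_x = rng.choice(pool, size=15, replace=False)
    for _ in range(10):
        model = fit(train_x)
        worst = pool[np.argmax(np.abs(model(pool) - target(pool)))]
        train_x = np.append(train_x, worst)

    final = fit(train_x)
    print("test RMSE:", np.sqrt(np.mean((final(test_x) - target(test_x)) ** 2)))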


© Canadian Mathematical Society : http://www.cms.math.ca/