Optimization and Data Science
Org:
Michael Friedlander (University of British Columbia),
Abraham P Punnen (Simon Fraser University) and
Mohamed Tawhid (Thompson Rivers University)
[
PDF]
 MONTAZ ALI, Witwatersrand University
Convex Formulation for Planted QuasiClique Recovery [PDF]

In this talk, we consider the planted quasiclique or $\gamma$clique problem. This problem is an extension of the well known planted clique problem which is NPhard. The maximum quasiclique problem is applicable in community detection, information retrieval and biology. We propose a convex formulation using nuclear norm minimization for planted quasiclique recovery. We carry out numerical experiments using our convex formulation and the existing mixed integer programming formulations. Results show that the convex formulation performs better than the mixed integer formulations when $\gamma$ is greater than a particular threshold.
 ALEKSANDR ARAVKIN, University of Washington (Institute for Health Metrics and Evaluation)
A Robust Risk Score for Evaluating Evidence in Global Health [PDF]

How strong is the relationship between (red meat, alcohol) and heart disease? (Sugar sweetened beverages, BMI) and diabetes? (Smoking, air pollution) and lung cancer? Each of these riskoutcome pairs has been the subject of numerous large studies. With results in hand, can we rate the evidence, and compare these risks to guide policy, and initiate further studies? \newline
We present a new methodology to answer these questions. The methodology comprises modeling betweenstudy disagreement, capturing nonlinear doseresponse relationships, detecting outliers, and accounting for nonlinear observation mechanisms inherent in how studies report results. It is now used to analyze more than 470 riskoutcome pairs in the Global Burden of Disease study, conducted by IHME and collaborators. We present the main model, highlight the role of optimization, and include recent results for select riskoutcome pairs of interest.
 YANKAI CAO, University of British Columbia
A Global Optimization Algorithm for Clustering Problems [PDF]

We present a reducedspace spatial branch and bound strategy for twostage stochastic nonlinear programs. At each node, a lower bound is constructed by relaxing the nonanticipativity constraints, and an upper bound is constructed by fixing the firststage variables. Both lower and upper bounds can be computed by solving individual scenario subproblems. Another key property is that we only need to perform branching on the firststage variables to guarantee convergence.
We also extend this algorithm to address clustering problems (a class of unsupervised learning). Preliminary numerical results have demonstrated that our algorithm is able to solve problems with hundreds of samples to global optimality within a reasonable amount of time, while stateoftheart global solvers can only deal with tens of samples. Moreover, global optimization can significantly improve performance on several datasets compared with local optimal algorithms, such as kmeans.
 MONICA GABRIELA COJOCARU, University of Guelph
On Mintyvariational inequalities and evolutionary stable states of generalized stable games [PDF]

In this work we emphasize the role played by Minty variational inequalities in evolutionary game theory for studying neutrally and evolutionary stable states of nonlinear population games. This connection allows deriving new results on the sets of neutrally stable and evolutionary stable states for generalized stable games as well as stability results for the replicator dynamics.
 PAULA FERMÍN CUETO, University of Edinburgh
Machine learning and statistical methods for characterising and predicting capacity degradation of Liion cells [PDF]

In automotive applications, Liion cells are typically considered to have reached their endoflife when they are down to 80\% of their initial capacity. However, the degradation of these cells typically displays a "knee" pattern: the capacity degrades at a slow rate up to a socalled "kneepoint", after which it degrades very rapidly until its endoflife. This kneepoint therefore gives a more advanced warning of the cell's degradation than the endoflife. Nevertheless, the industry does not have a standard definition or identification method for this crucial metric.
In this talk, we present robust statistical methods to identify two different kneepoints in capacity degradation data of Liion cells. Following this identification step, we show how machine learning algorithms can be employed to successfully predict the occurrence of these kneepoints from the first few discharge cycles of a cell's life. We rely on feature engineering to overcome the challenge of working with a very small, yet highdimensional data set and we quantify the uncertainty of the predictions to build trust in our models.
 GONÇALO DOS REIS, University of Edinburgh
State of Health for the capacity and internal resistance of Liion cells: A machine learning approach with knees and elbows [PDF]

Degradation of lithiumion cells with respect to increases of internal resistance (IR) has negative implications for rapid charging times and thermal management of cell in electric vehicles and energy storage applications. Despite this, IR and associated IR State of Health have received much less attention than the State of Health with respect to capacity degradation in Liion research. We address this by building on recent developments on ``knee" identification for capacity degradation curves. We propose the concepts of ``elbowpoint" and ``elbowonset" for IR degradation curves, and create an identification algorithm for these variables.
We use machine learning Neural Network techniques to build independent capacity and IR predictor models achieving a MAPE of 0.4\% and 1.6\%, respectively. We then use the IR model to synthesize internal resistance data to complete the dataset from Attia et al 2020 for which no IR data was logged.
 LUKASZ GOLAB, University of Waterloo
Explanation Tables [PDF]

I will present a solution to the following data summarization problem: given a dataset with multiple categorical features and a binary outcome, construct a summary that offers an interpretable explanation of the factors affecting the outcome in terms of the feature value combinations. We refer to such a summary as an explanation table, which is a disjunction of overlapping patterns over the features, where each pattern specifies a conjunction of attribute=value conditions. We give an efficient and effective greedy algorithm to construct explanation tables that uses sampling to prune the space of candidate patterns. This is joint work with Parag Agrawal, Kareem El Gebaly, Guoyao Feng, Flip Korn, Divesh Srivastava and Michael Vollmer.
 WARREN HARE, University of British Columbia
Imaginary Derivative Free Optimization [PDF]

Consider the problem of minimizing an objective function that is provided by a blackbox. Suppose that, while the optimization problem seeks a realvalued solution, the blackbox is capable of accepting complexvalued input and returning complexvalued output. We explore using complexvariables in a modelbased derivativefree optimization method. We begin by discussion how to construct such model and then present the results of some numerical experiments. Results suggest that the quality of a modelbased DFO algorithm is (i) highly impacted by the number of function evaluations required to create the models and (ii) also impacted by the accuracy of the created models, but to a lesser degree.
 THOMAS HUMPHRIES, University of Washington Bothell
Unrolled iterative algorithm for CT image reconstruction with learned penalty term [PDF]

In the last three to four years there has been an explosion of interest in using deep learning techniques to address challenging problems in computed tomography (CT) image reconstruction, such as lowdose, sparseview, and limited angle imaging. A wide variety of approaches have been proposed, including using deep neural networks (DNN) as pre or postprocessing steps, using neural networks to encode prior information within existing iterative reconstruction algorithms, or learning to solve the inverse problem altogether.
We present a CT reconstruction approach which unrolls a standard iterative algorithm and trains it endtoend as a DNN. The DNN consists of fixed layers, corresponding to the basic iterative algorithm, as well as trainable layers, which have the effect of perturbing the solution between iterations. The trainable layers can be viewed as replacing the negative gradient of an unknown penalty function or regularizer, which can vary with the iteration number. In numerical experiments, we test the approach on sparseview and limitedangle CT problems, and study the effect of network architecture on the effectiveness of the algorithm. The proposed method provides significant improvement over the basic iterative algorithm, as well as total variation minimization approach. Joint work with Yiran Jia, Noah McMichael, Pedro Mokarzel, and Dong Si (UW Bothell).
 ABDELMONEM IBRHAIM, Mathematics Department, Faculty of Science, AlAzhar University, Egypt
Binary whale optimization algorithm for feature selection [PDF]

The principle of any approach for solving feature selection problem is to find a subset of the original features. Since finding a minimal subset of the features is an NPhard problem, it is necessary to develop and propose practical and efficient heuristic algorithms. The whale optimization algorithm is a recently developed natureinspired metaheuristic optimization algorithm that imitates the hunting behavior of humpback whales to solve continuous optimization problems. In this paper, we propose a novel binary whale optimization algorithm (BWOA) to solve the feature selection problem. BWOA is especially desirable and appealing for feature selection problem whenever there is no heuristic information that can lead the search to the optimal minimal subset. Nonetheless, whales can find the best features as they hunt the prey. Rough set theory (RST) is one of the effective algorithms for feature selection. We use RST with BWOA as the first experiment, and in the second experiment, we use a wrapper approach with three different classifiers for feature selection. Also, we verify the performance and the effectiveness of the proposed algorithm by performing our experiments using 32 datasets from the UCI machine learning repository and comparing the proposed algorithm with some powerful existing algorithms in the literature. The results show that the proposed algorithm can provide an efficient tool to find a minimal subset of the original features.
 ZHAOSONG LU, University of Minnesota
FirstOrder Augmented Lagrangian Methods for Convex Conic Programming [PDF]

In this talk, we propose some firstorder augmented Lagrangian (AL) methods for solving a class of convex conic programming with adaptive update on penalty parameters and inexactness associated with the AL subproblems. We establish their firstorder oracle complexity for finding an approximate Karush–Kuhn–Tucker (KKT) point. To our best knowledge, our complexity is the lowest one among all existing firstorder AL methods for finding an approximate KKT point.
 IBRAHIM NUMANAGIĆ, University of Victoria
Optimization in Pharmacogenomics [PDF]

Highthroughput sequencing provides the means to determine the allelic decomposition— the exact sequence content of all gene copies present in the sample— for any gene of interest. When applied to pharmaceutical genes, such decomposition can be used to inform the drug treatment and dosage decisions. However, many clinically and functionally important genes are highly polymorphic and have undergone structural alterations, and as such present a significant challenge for the existing genotyping methods.
Here we present a combinatorial optimization framework based on integer linear programming that is able to efficiently solve this problem for various pharmacogenes, including those with structural alterations. We also show how to adapt these linear programs for the emerging longrange barcoded sequencing datasets.
 COURTNEY PAQUETTE, McGill University
Halting Time is Predictable for Large Models: A Universality Property and Averagecase Analysis [PDF]

Averagecase analysis computes the complexity of an algorithm averaged over all possible inputs. Compared to worstcase analysis, it is more representative of the typical behavior of an algorithm, but remains largely unexplored in optimization. One difficulty is that the analysis can depend on the probability distribution of the inputs to the model. However, we show that almost all instances of highdimensional data are indistinguishable to firstorder algorithms. Particularly for a class of largescale problems, which includes random least squares and onehidden neural networks with random weights, the halting time is independent of the probability distribution. With this barrier for averagecase analysis removed, we provide the first explicit averagecase convergence rates showing a tighter complexity not captured by traditional worstcase analysis. Finally, numerical simulations suggest this universality property holds in greater generality. Joint work with Elliot Paquette, Fabian Pedregosa, and Bart van Merriënboer
 MARK SCHMIDT, University of British Columbia
Faster Algorithms for Deep Learning? [PDF]

The last 10 years have seen a revolution in stochastic gradient methods, with variancereduced methods like SAG/SVRG provably achieving faster convergence rates than all previous methods. These methods give dramatic speedups in a variety of applications, but have had virtually no impact to the practice of training deep models. We hypothesize that this is due to the overparameterized nature of modern deep learning models, where the models are so powerful that they could fit every training example with zero error (at least theoretically). Such overparameterization nullifies the benefits of variancereduced methods, because in some sense it leads to "easier" optimization problems. In this work, we present algorithms specifically designed for overparameterized models. This leads to methods that provably achieve Nesterov acceleration, methods that automatically tune the stepsize as they learn, and methods that achieve superlinear convergence with secondorder information.
 XIAOPING SHI, Thompson Rivers University
Graphbased changepoint test [PDF]

Modeling highdimensional time series is necessary in many fields such as neuroscience, signal processing, network evolution, text analysis, and image analysis. Such a time series may contain unknown multiple changepoints. For example, the time of cell divisions can be accessed using an automatic embryo monitoring system by a timelapse observation. When a cell divides at some time point, the distribution of pixel values in the corresponding frame will change, and hence the detection of cell divisions can be formulated as a multiple changepoint problem. In this talk, we introduce a powerful changepoint test in terms of the shortest Hamiltonian path (SHP).
 TAMON STEPHEN, Simon Fraser University
Minimal Cuts Set and Computing with Monotone Boolean Functions [PDF]

We consider methods of generating Minimal Cut Sets in computational biology. This leads us to monotone Boolean functions, which can be used model phenomena in a variety of large data applications. Computing with monotone Boolean functions is an interesting and worthwhile mathematical challenge. We briefly introduce some of the techniques and issues involved.
 JABED TOMAL AND JAN CIBOROWSKI, Thompson Rivers University
Detection of environmental thresholds by assessing discontinuities in slopes and variances via a Bayesian regression model [PDF]

Coauthor: Jan J.H. Ciborowski, Professor, Biological Sciences, University of Calgary
Abstract: An ecological threshold occurs at a point along an environmental stress gradient at which there is a discontinuous change in the conditional distribution of a biological response. Traditionally, ecological thresholds are estimated from the discontinuities in the central tendency (e.g., slope) using a piecewise linear regression model (PLRM). However, thresholds can also be manifested as changes in the range of natural variation (e.g., conditional variance) for a given level of the environmental stress. In this paper, we defined a Bayesian PLRM by incorporating experts’ knowledge about the relationships between the biological response relative to environmental stress represented via prior distributions. The posterior distributions of the thresholds are obtained by combining the information in the data with experts’ prior knowledge, and optimized via Gibbs sampling and the Metropolis algorithm. We applied our method to two datasets relating an index of the health of marshnesting bird communities to habitat alteration (areal extent of land development adjacent to wetlands) within the Detroit River Area of Concern. Our preliminary analysis identified two potential thresholds  one manifested via the change in slope and the other observed from increased variance across the environmental stress gradient.
 XUEKUI ZHANG, University of Victoria
The Optimal Design of Clinical Trials with Potential Biomarker Effects, A Novel Computational Approach [PDF]

As a future trend of healthcare, personalized medicine tailors medical treatments to individual patients. It requires to identify a subset of patients with the best response to treatment. The subset can be defined by a biomarker (e.g. expression of a gene) and its cutoff value. Topics on subset identification have received massive attention. There are over 2 million hits by keyword searches on Google Scholar. However, designing clinical trials that utilize the discovered uncertain subsets/biomarkers is not trivial and rarely discussed in the literature. This leads to a gap between research results and realworld drug development.
To fill in this gap, we formulate the problem of clinical trial design into an optimization problem involving highdimensional integration, and propose a novel computational solution based on MonteCarlo and smoothing methods. Our method utilizes the modern techniques of GeneralPurpose computing on Graphics Processing Units for largescale parallel computing. Compared to a published method in threedimensional problems, our approach is more accurate and 133 times faster. This advantage increases when dimensionality increases. Our method is scalable to higherdimensional problems since the precision bound of our estimated study power is a finite number not affected by dimensionality.