## SEMINAR

*Our seminar series covers a broad set of topics related to artificial intelligence (AI), machine learning (ML), and statistics. The talks range in scope from applications of AI/ML to tackle hard problems in science and engineering, to ML theory and novel ML techniques, to high-performance computing and new software packages. We aim to bring together AI/ML researchers and domain experts to discuss exciting topics at the intersection of AI/ML and science/engineering.*

## Upcoming Talks:

## Phase Space Reconstruction from Accelerator Beam Measurements Using Neural Networks and Differentiable Simulations

**Date: September 30, 2022 1:00 pm Pacific**

Speaker: Ryan Roussel (SLAC)

Characterizing the phase space distribution of particle beams in accelerators is a central part of accelerator understanding and performance optimization. However, conventional reconstruction-based techniques either use simplifying assumptions or require specialized diagnostics to infer high-dimensional (> 2D) beam properties. In this work, we introduce a general-purpose algorithm that combines neural networks with differentiable particle tracking to efficiently reconstruct high-dimensional phase space distributions without using specialized beam diagnostics or beam manipulations. We demonstrate that our algorithm reconstructs detailed 4D phase space distributions with corresponding confidence intervals in both simulation and experiment using a single focusing quadrupole and diagnostic screen. This technique allows for the measurement of multiple correlated phase spaces simultaneously, enabling simplified 6D phase space reconstruction diagnostics in the future.

## Making EdgeML Smaller, Faster and Smarter

**Date: October 7, 2022 1:00 pm Pacific**

Speaker: Audrey Corbeil Therrien (U. Sherbrooke)

## Past Talks:

## A Method for Quantifying Position Reconstruction Uncertainty in Astroparticle Physics using Bayesian Networks

**Date: September 9, 2022 1:00 pm Pacific**

Speaker: Christina Peters (U. Delaware)

In this presentation, I will demonstrate a method for position reconstruction in astroparticle physics using a Bayesian network which enables per-interaction uncertainty quantification. Robust position reconstruction is paramount for enabling rare-event discoveries by dark matter detection experiments, as it allows for focus on interactions occurring only within the central volume of a detector where there are fewer backgrounds. As a proof of concept, I will demonstrate the utility of the method using simulated data based on the XENONnT detector, which is a dual-phase xenon time-projection chamber. In the talk I will introduce Bayesian networks, describe how they are well suited to the problem of position reconstruction, and give an example of how the per-interaction uncertainties can be utilized for anomaly detection as well as within experimental analyses.

## Tokamak Operation Design and Control with Deep Reinforcement Learning in KSTAR

**Date: August 29, 2022 3:00 pm Pacific**

Speaker: Jaemin Seo (Princeton)

## Interdisciplinary Point Cloud Methods for Particle Physics

**Date: July 29, 2022 1:00 pm Pacific**

Speaker: Mariel Pettee (LBL)

Many breakthroughs in machine learning (ML) have emerged from the challenge of how to represent and analyze interesting datasets in their most natural forms. Even highly-specialized data like particle deposits in the ATLAS detector, however, share common characteristics with other data from diverse disciplines. In this talk, I’ll discuss some of my work on ATLAS dealing with various representations of hadronic final states such as taus and pions as sequences, images, and 3D point clouds for ML applications. I’ll also share some of my independent work leading teams of researchers at the forefront of AI-generated choreography as an example of how our field can benefit from looking broadly for inspiration beyond our own specialized disciplines.

## Uncertainty aware learning and a cautionary tale on machine learning theory uncertainties

**Date: June 24, 2022 1:00 pm Pacific**

Speaker: Aishik Ghosh (UC Irvine)

Machine learning models trained on simulated data pick up subtle patterns in high-dimensional feature spaces, some of which are not well-modelled by the simulators. These give rise to systematic uncertainties in physics measurements. The popular solution often discussed for this is to use debiasing techniques to make the model invariant to the source of uncertainty (nuisance parameters). We propose the opposite approach: train a model that is fully aware of uncertainties and their corresponding nuisance parameters, which allows it to adapt to their correct values from data at the time of inference. We show that this strategy actually enhances the sensitivity of the final physics measurement. In a second study, we investigate the dangers of using ML to try to mitigate theory uncertainties. Theory uncertainties may arise from our inability to properly simulate certain physics processes (like hadronization) or compute higher order quantum field theory terms. We show that in these cases, debiasing techniques only serve to hide the true bias / uncertainty from the physicist rather than actually reducing them.

## Variable importance and explainable AI

**Date: June 10, 2022 1:00 pm Pacific**

Speaker: Art Owen (Stanford)

In order to explain what a black box algorithm does we can start by studying which variables are important for its decisions. Variable importance is studied by making hypothetical changes to predictor variables. Changing parameters one at a time can produce input combinations that are outliers or very unlikely. They can be physically impossible, or even logically impossible. It is problematic to base an explanation on outputs corresponding to impossible inputs. We introduced the cohort Shapley (CS) measure to avoid this problem, based on Shapley value from cooperative game theory. There are many tradeoffs in picking a variable importance measure, so CS is not the unique reasonable choice. One interesting property of CS is that it can detect 'redlining', meaning the impact of a protected variable on an algorithm's output when that algorithm was trained without the protected variable.

This talk is based on recent joint work with Masayoshi Mase and Ben Seilert. The opinions expressed are my own, and not those of Stanford, the National Science Foundation, or Hitachi, Ltd.
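The cohort Shapley measure mentioned above builds on the Shapley value from cooperative game theory. As a minimal sketch of that underlying quantity only (not of cohort Shapley itself), the following computes exact Shapley values for a three-player game by averaging each player's marginal contribution over all orderings. The coalition payoffs in `toy_value` are hypothetical numbers chosen for illustration.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {player: 0.0 for player in players}
    for ordering in permutations(players):
        coalition = frozenset()
        for player in ordering:
            with_player = coalition | {player}
            totals[player] += value[with_player] - value[coalition]
            coalition = with_player
    return {p: t / factorial(len(players)) for p, t in totals.items()}


# Hypothetical payoffs for every coalition of three variables A, B, C.
toy_value = {
    frozenset(): 0,
    frozenset("A"): 10, frozenset("B"): 20, frozenset("C"): 30,
    frozenset("AB"): 40, frozenset("AC"): 50, frozenset("BC"): 60,
    frozenset("ABC"): 90,
}

importance = shapley_values("ABC", toy_value)
print(importance)  # {'A': 20.0, 'B': 30.0, 'C': 40.0}
```

The values sum to the payoff of the full coalition (90), which is the efficiency property that makes Shapley values attractive for attributing an output to its inputs.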

## CryoAI: A Generative Approach for Inverse Problem Solving in Single Particle Cryo-EM

**Date: May 13, 2022 1:00 pm Pacific**

Speaker: Axel Levy (Stanford/SLAC)

Cryo-electron microscopy (cryo-EM) has become a tool of fundamental importance in structural biology, helping us understand the basic building blocks of life. The algorithmic challenge of cryo-EM is to jointly estimate the unknown 3D poses and the 3D electron scattering potential of a biomolecule from millions of extremely noisy 2D images. Existing reconstruction algorithms, however, cannot easily keep pace with the rapidly growing size of cryo-EM datasets due to their high computational and memory cost. I will present cryoAI, an ab initio reconstruction algorithm for homogeneous conformations that uses direct gradient-based optimization of particle poses and the electron scattering potential from single-particle cryo-EM data. CryoAI combines a learned encoder that predicts the poses of each particle image with a physics-based decoder to aggregate each particle image into a neural representation of the scattering potential volume.

## The Rubin Observatory's LSST Camera Data Acquisition System

**Date: April 29, 2022 1:00 pm Pacific**

Speaker: J. Gregg Thayer (SLAC)

The Rubin Observatory's LSST Camera will read out its 3.2 Gigapixels every 15 seconds every night for 10 years. The Data Acquisition (DAQ) system provides the path these pixel data take between the sensors and the consumers of those images. It is a highly parallel system including firmware in each of the 71 front end electronics boards, a 14-slot ATCA shelf containing SLAC-built electronics 150m away, and a fully redundant fiber link between them. The ATCA shelf contains an array of over 100 Systems on Chip. These are connected to each other in a modified full mesh 10Gb Ethernet network which client machines connect to via 10GbE SFP+. The DAQ also contains 196TB of SSD storage for caching more than 2 days of image data. This allows continued observations in the event of a failure of network connectivity down the mountain.

## Language models for mathematics

**Date: April 15, 2022 1:00 pm Pacific**

Speaker: Francois Charton (Meta AI)

Transformers, a deep neural network architecture, were originally designed for natural language translation. They can be used for mathematics by considering that solving a problem amounts to 'translating' it into its solution. I present three cases: symbolic (integration), numeric (linear algebra), and hybrid (symbolic regression). I then describe key lessons and outstanding problems, such as data generation and out-of-domain generalization. Finally, I discuss potential applications to theoretical physics.

## Towards Differentiable Programming in High Energy Physics with MadJax differentiable Matrix Elements

**Date: April 8, 2022 1:00 pm Pacific**

Speaker: Michael Kagan (SLAC)

MadJax is a tool for generating and evaluating differentiable matrix elements of high energy particle scattering processes. As such, it is a step towards a differentiable programming paradigm in high energy physics that facilitates the incorporation of high energy physics domain knowledge, encoded in simulation software, into gradient-based learning and optimization pipelines. In this talk, we will discuss the MadJax implementation and show example applications of simulation based inference and normalizing flow based matrix element modeling, with capabilities enabled uniquely with differentiable matrix elements.

## Introduction to Hierarchical Temporal Memory and its application to Anomaly Detection

**Date: March 18, 2022 1:00 pm Pacific**

Speaker: Surya Pathak (Red Hat)

Hierarchical Temporal Memory (HTM) is a biologically inspired machine learning technology that aims to capture the structural and algorithmic properties of the neocortex. It is a continuous learning algorithm derived from neuroscience that models spatial and temporal streaming data. In this talk, I will introduce the concept of HTM along with its application to anomaly detection on real world data.

## Lessons from debugging and building trust in AIs in deployment

**Date: March 4, 2022 1:00 pm Pacific**

Speaker: James Zou (Stanford)

Translating AI from research to deployment is a major challenge and opportunity. I will share some of our experiences in evaluating and debugging AI models in practice. I will also discuss some new approaches that we developed for building more trustworthy AI.

## Second-order physics constrained learning

**Date: February 18, 2022 10:00 am Pacific**

Speaker: Eric Darve (Stanford)

Physics-informed machine learning and inverse modeling require the solution of ill-conditioned non-convex optimization problems. First-order methods, such as SGD and ADAM, and quasi-Newton methods, such as BFGS and L-BFGS, have been applied with some success to optimization problems involving deep neural networks in computational engineering inverse problems. However, empirical evidence shows that convergence and accuracy for these methods remain a challenge. Our study unveiled at least two intrinsic defects of these methods when applied to coupled systems of partial differential equations (PDEs) and deep neural networks (DNNs): (1) convergence is often slow with long plateaus that make it difficult to determine whether the method has converged or not; (2) quasi-Newton methods do not provide a sufficiently accurate approximation of the Hessian matrix; this typically leads to early termination (one of the stopping criteria of the optimizer is satisfied although the achieved error is far from minimal). Based on these observations, we propose to use trust region methods for optimizing coupled systems of PDEs and DNNs. Specifically, we developed an algorithm for second-order physics constrained learning, an efficient technique to calculate Hessian matrices based on computational graphs. We show that trust region methods overcome many of the defects and exhibit remarkably fast convergence and superior accuracy compared to ADAM, BFGS, and L-BFGS.
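To make the idea of a trust region method concrete, here is a minimal sketch of a generic textbook dogleg trust-region iteration on the two-variable Rosenbrock test function. It is not the second-order physics constrained learning algorithm from the talk: the test function, the hand-written Hessian (the talk derives Hessians from computational graphs), and all parameter values are assumptions made for illustration.

```python
import math


def rosenbrock(p):
    x, y = p
    return (1 - x) ** 2 + 100 * (y - x * x) ** 2


def gradient(p):
    x, y = p
    return [-2 * (1 - x) - 400 * x * (y - x * x), 200 * (y - x * x)]


def hessian(p):
    x, y = p
    return [[2 - 400 * (y - 3 * x * x), -400 * x], [-400 * x, 200]]


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def matvec(H, v):
    return [H[0][0] * v[0] + H[0][1] * v[1], H[1][0] * v[0] + H[1][1] * v[1]]


def norm(v):
    return math.sqrt(dot(v, v))


def dogleg_step(g, H, radius):
    """Approximately minimize the quadratic model inside the trust region."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    positive_definite = H[0][0] > 0 and det > 0
    newton = None
    if positive_definite:
        # Newton step solves H p = -g (closed form for a 2x2 matrix).
        newton = [-(H[1][1] * g[0] - H[0][1] * g[1]) / det,
                  -(-H[1][0] * g[0] + H[0][0] * g[1]) / det]
        if norm(newton) <= radius:
            return newton
    curvature = dot(g, matvec(H, g))
    if curvature <= 0:
        # Non-positive curvature along -g: go to the region boundary.
        return [-radius * gi / norm(g) for gi in g]
    cauchy = [-(dot(g, g) / curvature) * gi for gi in g]
    if norm(cauchy) >= radius or not positive_definite:
        scale = min(1.0, radius / norm(cauchy))
        return [scale * c for c in cauchy]
    # Walk from the Cauchy point toward the Newton point up to the boundary.
    d = [newton[0] - cauchy[0], newton[1] - cauchy[1]]
    a, b, c = dot(d, d), 2 * dot(cauchy, d), dot(cauchy, cauchy) - radius ** 2
    t = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
    return [cauchy[0] + t * d[0], cauchy[1] + t * d[1]]


def trust_region_minimize(f, grad, hess, start, radius=1.0, max_radius=10.0,
                          tol=1e-8, max_iter=500):
    p = list(start)
    iterations = 0
    for iterations in range(max_iter):
        g = grad(p)
        if norm(g) < tol:
            break
        H = hess(p)
        step = dogleg_step(g, H, radius)
        predicted = -(dot(g, step) + 0.5 * dot(step, matvec(H, step)))
        candidate = [p[0] + step[0], p[1] + step[1]]
        actual = f(p) - f(candidate)
        ratio = actual / predicted if predicted > 0 else 0.0
        if ratio < 0.25:
            radius *= 0.25          # model was poor: shrink the region
        elif ratio > 0.75 and norm(step) >= 0.99 * radius:
            radius = min(2 * radius, max_radius)  # model was good: expand
        if ratio > 0.1:
            p = candidate           # accept the step
    return p, iterations


solution, iterations_used = trust_region_minimize(
    rosenbrock, gradient, hessian, start=[-1.2, 1.0])
print(solution, iterations_used)
```

The key design choice is that each step is accepted or rejected by comparing the actual decrease with the decrease predicted by the quadratic model, and the region size adapts accordingly.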

## Self-supervised Scene Representation Learning

**Date: February 4, 2022 1:00 pm Pacific**

Speaker: Vincent Sitzmann (MIT)

Given only a single picture, people are capable of inferring a mental representation that encodes rich information about the underlying 3D scene. We acquire this skill not through massive labeled datasets of 3D scenes, but through self-supervised observation and interaction. Building machines that can infer similarly rich neural scene representations is critical if they are to one day parallel people’s ability to understand, navigate, and interact with their surroundings. This poses a unique set of challenges that sets neural scene representations apart from conventional representations of 3D scenes: Rendering and processing operations need to be differentiable, and the type of information they encode is unknown a priori, requiring them to be extraordinarily flexible. At the same time, training them without ground-truth 3D supervision is a highly underdetermined problem, highlighting the need for structure and inductive biases without which models converge to spurious explanations.

## The Learnt Geometry of Collider Events

**Date: January 28, 2022 1:00 pm Pacific**

Speaker: Jack Collins (SLAC)

Particle collider events, when imbued with a metric which characterizes the 'distance' between two events (such as an Earth Movers Distance), can be thought of as populating a data manifold in a metric space. The geometric properties of this manifold reflect the physics encoded in the distance metric. I will show how the geometry of collider events can be probed at varying scales of interest using a class of machine learning architectures called Variational Autoencoders. I will introduce notions of scaling dimensionality of representations learnt by the VAE that I believe are novel, and which reflect and quantify the underlying complexity of the training dataset. If there is time, I will also describe two potentially novel approaches to unsupervised classification that are inspired by these notions of dimensionality.
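As a rough illustration of what an Earth Mover's Distance measures, the sketch below computes it for two equally sized one-dimensional samples with equal weights, where it reduces to the mean absolute difference between sorted values. This is a toy simplification: the distance between collider events would compare weighted particles in a higher-dimensional space, and the sample values here are made up.

```python
def earth_movers_distance_1d(sample_a, sample_b):
    """EMD between two equal-size, equal-weight 1D samples."""
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch assumes equally sized samples")
    # In 1D the optimal transport plan matches points in sorted order.
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(a - b) for a, b in pairs) / len(sample_a)


shifted = earth_movers_distance_1d([0.0, 1.0, 3.0], [5.0, 6.0, 8.0])
identical = earth_movers_distance_1d([0.0, 1.0, 3.0], [3.0, 0.0, 1.0])
print(shifted, identical)  # 5.0 0.0
```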

## Fast AI at the edge for particle physics

**Date: January 21, 2022 1:00 pm Pacific**

Speaker: Jennifer Ngadiuba (FNAL)

The Large Hadron Collider at CERN provides up to 200 proton-proton interactions every 25 ns leading to the production of thousands of charged and neutral particles per second passing through the detector volume. As the detectors consist of hundreds of millions of sensors to record the passage of each particle, the experiments at the LHC have to deal with extreme data rates of hundreds of TB per second. To bring down these rates to manageable levels for offline processing and storage, the experiments implement a trigger system that analyzes and accepts collision events in real time. There is a fundamental challenge in doing so due to the very strict limits on latency and on the resources available to perform such analysis. Therefore, in order to preserve most of the interesting physics, basic algorithms are executed on Field-Programmable Gate Arrays (FPGAs). Most recently we have started exploring and developing new AI techniques to replace these basic algorithms with an advanced analysis that could offer enhanced accuracy while meeting such strict system constraints. In this talk, I will discuss the recent developments in the field of fast AI to achieve this goal, focusing on the application to LHC experiments where the "big data" environment is among the most challenging in HEP.
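The rates quoted above follow from simple arithmetic. As a quick check using only the two numbers stated in the abstract (up to 200 interactions per crossing, one crossing every 25 ns), the constant names being illustrative:

```python
BUNCH_SPACING_NS = 25                 # one bunch crossing every 25 ns
INTERACTIONS_PER_CROSSING = 200       # upper value quoted in the abstract
NANOSECONDS_PER_SECOND = 1_000_000_000

crossings_per_second = NANOSECONDS_PER_SECOND // BUNCH_SPACING_NS
interactions_per_second = crossings_per_second * INTERACTIONS_PER_CROSSING

print(crossings_per_second)     # 40000000
print(interactions_per_second)  # 8000000000
```

That is a 40 MHz crossing rate and up to 8 billion interactions per second, which is why the trigger decision has to be made within a very tight latency budget.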

## Reconstructing the Subhalo Mass Function from Strong Gravitational Lensing using Simulation-Based Inference

**Date: November 19, 2021 1:00 pm Pacific**

Speaker: Sebastian Wagner-Carena (Stanford)

Constraining the distribution of small-scale structure in our universe will allow us to probe alternatives to the cold dark matter (CDM) paradigm. Strong gravitational lensing offers a unique window into small dark matter halos because these halos impart a gravitational lensing signal even if they do not host luminous galaxies. However, the millions of free parameters in gravitational lensing by a substructure population make directly evaluating the likelihood intractable. In this talk, I will present our group’s work using simulation-based inference techniques to return posterior estimates of the distribution of subhalos inside galaxy-mass host halos. We combine a hierarchical inference approach with some of the tools used in sequential neural posterior estimation to reliably infer the subhalo mass function across a variety of configurations. We find that our technique scales efficiently to large lens populations; with 10 strong gravitational lenses we forecast a constraining power competitive with current flux ratio statistics, and with 100 lenses we find that our technique returns sensitivities comparable with current Milky Way satellite constraints. In the 1000 lens regime accessible by future surveys, we demonstrate an unprecedented constraining power on the subhalo mass function. Our work reveals the potential of strong lensing imaging to probe dark matter at small scales.

## “All the Lenses”: Toward Large-Scale Hierarchical Inference of the Hubble Constant Using Bayesian Deep Learning

**Date: October 29, 2021 1:00 pm Pacific**

Speaker: Ji Won Park (Stanford)

Precise constraints on the Hubble constant (H0) can shed light on the nature of dark matter and dark energy, arguably the biggest mysteries of modern cosmology. An astrophysical phenomenon known as strong gravitational lensing enables direct measurements of H0. Seven strong gravitational lenses have been “hand-analyzed” over the last ten years, but next-generation telescope surveys will increase the sample size to tens of thousands of lenses, creating a demand for novel methods that can model large volumes of noisy data. I demonstrate the use of Bayesian neural networks (BNNs) in rapidly extracting cosmological information from the image, catalog, and time series data associated with these lenses. Quantifying various sources of uncertainty is key to minimizing systematic bias on H0. Being both accurate and efficient, the BNN pipeline is a promising tool that can combine information from all the lenses -- with varying types and signal-to-noise ratios -- into a large-scale hierarchical Bayesian model.

## Black-box optimisation with Local Generative Surrogates and its application in the SHiP experiment

**Date: October 22, 2021 10:00 am Pacific**

Speaker: Sergey Shirobokov (Twitter)

We propose a novel method for gradient-based optimisation of black-box simulators using local surrogate models (https://arxiv.org/abs/2002.04632). In domains such as HEP, many processes are modeled with non-differentiable simulators (such as GEANT4). However, often one wants to optimise some parameters of the detector or other apparatus relying on the knowledge from the simulator. To address such cases, we utilise deep generative models to approximate a simulator in the local neighbourhood and perform optimisation. In cases when the optimised parameter space is constrained to a low dimension sub-space, we observe that our method outperforms Bayesian optimisation, numerical optimisation, and REINFORCE-based approaches.

## Vector Symbolic Architectures for Autonomous Science

**Date: October 8, 2021 1:00 pm Pacific**

Speaker: Michael Furlong (University of Waterloo)

Abstract: Automating exploration often involves information theoretic cost functions which can be expensive to compute. Planetary missions are constrained by size, weight, and power concerns, as well as environmental conditions, that limit the type and amount of computing that can be deployed on these missions.

Neuromorphic computing promises to reduce power requirements needed for deploying high-performance computing, enabling constrained systems to be more capable, but they can be challenging to program. Vector Symbolic Architectures, originally developed in the context of cognitive modelling, have proven useful as a paradigm for programming these computers.

In this talk we will be discussing how a particular Vector Symbolic Architecture can be used to efficiently execute two tasks commonly found in autonomous science applications: anomaly detection and Bayesian optimization. We will show how these algorithms can be computed with time and memory complexity that is constant in the number of observations collected, making them favourable algorithms for long-term operations in resource constrained computing environments.
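The abstract does not name the particular Vector Symbolic Architecture used. As an assumption for illustration, the sketch below uses Holographic Reduced Representations, where key/value pairs are bound by circular convolution and superimposed into one fixed-size vector. It shows why memory stays constant as items are added; it is not the anomaly detection or Bayesian optimization algorithm from the talk, and the dimension and seed are arbitrary.

```python
import math
import random


def random_vector(dim, rng):
    # Elements drawn with variance 1/dim so vectors have roughly unit length.
    return [rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]


def bind(a, b):
    # Circular convolution (direct O(dim^2) form; an FFT would be faster).
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


def unbind(memory, key):
    # Approximate inverse: convolve with the index-reversed key.
    n = len(key)
    return bind(memory, [key[(-i) % n] for i in range(n)])


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


rng = random.Random(0)
dim = 512
key_1, value_1, key_2, value_2 = (random_vector(dim, rng) for _ in range(4))

# The memory stays a single dim-sized vector however many pairs are added.
memory = [p + q for p, q in zip(bind(key_1, value_1), bind(key_2, value_2))]

recovered = unbind(memory, key_1)
match = cosine(recovered, value_1)      # noisy copy of value_1
mismatch = cosine(recovered, value_2)   # close to zero
print(round(match, 2), round(mismatch, 2))
```

Retrieval is approximate: the recovered vector is a noisy version of the stored value, so in practice it is compared against a set of known candidates.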

## Bayesian Techniques for Accelerator Characterization and Control

**Date: October 1, 2021 1:00 pm Pacific**

Speaker: Ryan Roussel (SLAC National Accelerator Laboratory)

Abstract: Accelerators and other large experimental facilities are complex, noisy systems that are difficult to characterize and control efficiently. Bayesian statistical modeling techniques are well suited to this task, as they minimize the number of experimental measurements needed to create robust models, by incorporating prior, but not necessarily exact, information about the target system. Furthermore, these models inherently consider noisy and/or uncertain measurements and can react to time-varying systems. Here we will describe several advanced methods for using these models in accelerator characterization and optimization. First, we describe a method for rapid, turn-key exploration of input parameter spaces using little-to-no prior information about the target system. Second, we highlight how these models can take hysteresis effects into account and create in-situ models of individual magnetic elements.

## Computational Imaging: Reconciling Physical and Learned Models

**Date: July 2, 2021 1:00 pm**

Speaker: Ulugbek Kamilov (Washington University in St. Louis)

Abstract: Computational imaging is a rapidly growing area that seeks to enhance the capabilities of imaging instruments by viewing imaging as an inverse problem. There are currently two distinct approaches for designing computational imaging methods: model-based and learning-based. Model-based methods leverage analytical signal properties and often come with theoretical guarantees and insights. Learning-based methods leverage data-driven representations for best empirical performance through training on large datasets. This talk presents Regularization by Artifact Removal (RARE), as a framework for reconciling both viewpoints by providing a learning-based extension to the classical theory. RARE relies on pre-trained “artifact-removing deep neural nets” for infusing learned prior knowledge into an inverse problem, while maintaining a clear separation between the prior and physics-based acquisition model. Our results indicate that RARE can achieve state-of-the-art performance in different computational imaging tasks, while also being amenable to rigorous theoretical analysis. We will focus on the applications of RARE in biomedical imaging, including magnetic resonance and tomographic imaging.

This talk will be based on the following references:

- J. Liu, Y. Sun, C. Eldeniz, W. Gan, H. An, and U. S. Kamilov, “RARE: Image Reconstruction using Deep Priors Learned without Ground Truth,” IEEE J. Sel. Topics Signal Process., vol. 14, no. 6, pp. 1088-1099, October 2020.
- Z. Wu, Y. Sun, A. Matlock, J. Liu, L. Tian, and U. S. Kamilov, “SIMBA: Scalable Inversion in Optical Tomography using Deep Denoising Priors,” IEEE J. Sel. Topics Signal Process., vol. 14, no. 6, pp. 1163-1175, October 2020.
- J. Liu, Y. Sun, W. Gan, X. Xu, B. Wohlberg, and U. S. Kamilov, “SGD-Net: Efficient Model-Based Deep Learning with Theoretical Guarantees,” IEEE Trans. Comput. Imag., vol. 7, pp. 598-610, June 2021.

## Deep Learning for Anomaly Detection

**Date: June 25, 2021 1:00 pm**

Speaker: Ziyi Yang (Stanford)

Abstract: Anomaly Detection (AD) refers to the process of identifying abnormal observations that deviate from what is defined as normal. With applications in many real-world scenarios, anomaly detection has become an important research field in ML and AI. However, detecting anomalies in high-dimensional space is challenging. In some high-dimensional cases, previous AD algorithms fail to correctly model the normal data distribution. In addition, our understanding of the detection mechanism of AD models remains limited. To address these challenges and questions, in this talk, first I will present the Regularized Cycle-consistent GAN (RCGAN) that introduces a penalty distribution in the modeling of normal data distribution. We theoretically show that the penalty distribution regularizes the discriminator and generator towards the normal data manifold. Second, we explore anomaly detection with domain adaptation where the normal data distribution is non-static. We propose to extract the common features of source and target domain data and train an anomaly detector using the extracted features.

## Machine-Learning for Modeling Complex Materials and Media

**Date: June 18, 2021 1:00 pm**

Speaker: Serveh Kamrava (USC)

Abstract: In recent years, machine learning (ML) approaches have made it possible to extract and explore intricate patterns from big data. One of the fields that can benefit from the computational advantages that ML offers is materials characterization where we have complex heterogeneous morphology. The morphology of complex systems is one of the determinant elements that control a variety of their properties, such as flow, transport, and mechanical behaviors. Such properties are often estimated using experimental and computational methods, which can be very costly and time-consuming. As such, faster and more automatic methods are required. Machine learning provides an alternative solution for this problem. In this presentation, I will present a deep learning method that can take the 3D morphology of complex materials and estimate their transport properties. Then, I will talk about a novel method for quantifying the accuracy of augmentation methods that add more data for ML, and for identifying the method that provides the best set of data by minimizing the discrepancy and expanding the variability. For the next topic, I will discuss the application of deep learning to time-varying data for a transport problem on a complex membrane system. I close this particular topic by describing how the governing equations can be used in ML for filling the gap in data and reducing the amount of data needed for ML. These results will be compared with a fully data-driven ML method.

## Autonomous analysis of synchrotron X-ray experiments with applications to metal nanoparticle synthesis

**Date: May 7, 2021 1:00 pm**

Speaker: Sathya Chitturi (Stanford)

Abstract: A critical step in developing autonomous pipelines for materials synthesis experiments is automatic interpretation of characterization experiments. In this talk, we present an example of a closed-loop Bayesian optimization pipeline for metal nanoparticle synthesis using real-time information from Small-angle X-ray Scattering (SAXS) experiments. This approach has previously successfully created libraries of monodisperse Pd nanoparticles with user-specified sizes. In addition, we describe a CNN-based method used to interpret complementary X-ray diffraction data. Here CNN regression models are trained for each crystal class to predict lattice parameters for the corresponding unit-cell. A key component of this work involves data augmentation schemes which capture sources of experimental noise in order to improve model generalizability. The lattice parameter estimates are subsequently refined using an automatic whole-pattern fitting algorithm.

## Going Beyond Global Optima with Bayesian Algorithm Execution

**Date: April 30, 2021 1:00 pm**

Speaker: Willie Neiswanger

In many real world problems, we want to infer some property of an expensive black-box function f, given a budget of T function evaluations. One example is budget constrained global optimization of f, for which Bayesian optimization is a popular method. Other properties of interest include local optima, level sets, integrals, or graph-structured information induced by f. Often, we can find an algorithm A to compute the desired property, but it may require far more than T queries to execute. Given such an A, and a prior distribution over f, we refer to the problem of inferring the output of A using T evaluations as Bayesian Algorithm Execution (BAX). In this talk, we present a procedure for this task, InfoBAX, that sequentially chooses queries that maximize mutual information with respect to the algorithm's output. Applying this to Dijkstra's algorithm, for instance, we infer shortest paths in synthetic and real-world graphs with black-box edge costs. Using evolution strategies, we obtain variants of Bayesian optimization that target local, rather than global, optima. We discuss InfoBAX, and give background on other information-based methods for Bayesian optimization as well as on the probabilistic uncertainty models which underlie these methods.

## Signal Decomposition via Distributed Optimization

**Date: April 23, 2021 1:00 pm**

Speaker: Bennet Meyers (Stanford/SLAC)

We consider the well-studied problem of decomposing a time series signal into some components, each with different characteristics. We propose a simple and general framework for decomposition of a signal into a number of signal classes, each defined by a loss function and possibly constraints, via optimization. We describe a number of useful signal classes, and give a distributed optimization method for computing the decomposition, that scales well and is extensible. The method finds the optimal decomposition when the signal class constraints and loss functions are convex, and appears to be a good heuristic when they are not.
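As a minimal sketch of decomposition by optimization, the example below splits a signal into just two classes: a smooth component, penalized by its squared first differences, and a residual, penalized by its sum of squares. With these two convex quadratic losses the optimum solves a tridiagonal linear system. This is a simple special case solved directly, not the distributed method from the talk; the function names, the weight, and the sample data are assumptions.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are unused and must be 0."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def decompose(signal, smoothness_weight):
    """Minimize sum((y - s)^2) + weight * sum((s[i+1] - s[i])^2) over s."""
    n = len(signal)
    w = smoothness_weight
    # Normal equations: (I + w * D^T D) s = y, with D the difference operator.
    diag = [1 + 2 * w] * n
    diag[0] = diag[-1] = 1 + w
    lower = [0.0] + [-w] * (n - 1)
    upper = [-w] * (n - 1) + [0.0]
    smooth = solve_tridiagonal(lower, diag, upper, signal)
    residual = [y - s for y, s in zip(signal, smooth)]
    return smooth, residual


signal = [0.0, 1.2, 0.8, 2.1, 1.9, 3.2, 2.8, 4.1]
smooth, residual = decompose(signal, smoothness_weight=5.0)
print([round(s, 3) for s in smooth])
```

Adding more signal classes, or non-quadratic losses and constraints, is where a general optimization method like the one described in the talk becomes necessary.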

## Equitable Valuation of Data

**Date: April 16, 2021 1:00 pm**

Speaker: Amirata Ghorbani

As data becomes the fuel driving technological and economic growth, a fundamental challenge is how to quantify the value of data in algorithmic predictions and decisions. For example, in healthcare and consumer markets, it has been suggested that individuals should be compensated for the data that they generate, but it is not clear what is an equitable valuation for individual data. In this talk, we discuss a principled framework to address data valuation in the context of supervised machine learning. Given a learning algorithm trained on a number of data points to produce a predictor, we propose data Shapley as a metric to quantify the value of each training datum to the predictor performance. Data Shapley value uniquely satisfies several natural properties of equitable data valuation. We introduce Monte Carlo and gradient-based methods to efficiently estimate data Shapley values in practical settings where complex learning algorithms, including neural networks, are trained on large datasets. We then briefly discuss the notion of distributional Shapley, where the value of a point is defined in the context of an underlying data distribution.
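A Monte Carlo estimate of data Shapley values can be sketched by sampling random orderings of the training points and averaging each point's marginal contribution to a performance score. The example below is a toy version under stated assumptions: the utility is the validation accuracy of a 1-nearest-neighbour classifier on made-up one-dimensional data, and the function names are illustrative rather than taken from the speaker's implementation.

```python
import random


def utility(subset, train, validation):
    """Validation accuracy of a 1-nearest-neighbour classifier (toy utility)."""
    if not subset:
        return 0.0
    correct = 0
    for x, label in validation:
        nearest = min(subset, key=lambda i: abs(train[i][0] - x))
        correct += train[nearest][1] == label
    return correct / len(validation)


def monte_carlo_data_shapley(train, validation, num_permutations, seed=0):
    rng = random.Random(seed)
    n = len(train)
    values = [0.0] * n
    for _ in range(num_permutations):
        order = list(range(n))
        rng.shuffle(order)
        included = []
        previous = utility(included, train, validation)
        for index in order:
            # Marginal contribution of this point given the points before it.
            included.append(index)
            current = utility(included, train, validation)
            values[index] += current - previous
            previous = current
    return [v / num_permutations for v in values]


# Made-up (feature, label) pairs; the last training point is mislabeled.
train = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 1), (1.4, 1)]
validation = [(0.1, 0), (0.8, 0), (1.3, 0), (2.1, 1), (2.9, 1)]

values = monte_carlo_data_shapley(train, validation, num_permutations=200)
print([round(v, 3) for v in values])
```

Because the marginal contributions telescope within each ordering, the estimated values always sum to the utility of the full training set minus that of the empty set. One would expect the mislabeled point to receive a comparatively low value, since it tends to reduce accuracy when added.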

## LassoNet: A Neural Network with Feature Sparsity

**Date: April 2, 2021 1:00 pm**

Speaker: Ismael Lemhadri (Stanford)

Much work has been done recently to make neural networks more interpretable, and one approach is to arrange for the network to use only a subset of the available features. In linear models, Lasso (or L1-regularized) regression assigns zero weights to the most irrelevant or redundant features, and is widely used in data science. However, the Lasso only applies to linear models. Here we introduce LassoNet, a neural network framework with global feature selection. Our approach enforces a hierarchy: specifically a feature can participate in a hidden unit only if its linear representative is active. Unlike other approaches to feature selection for neural nets, our method uses a modified objective function with constraints, and so integrates feature selection with the parameter learning directly. As a result, it delivers an entire regularization path of solutions with a range of feature sparsity. In systematic experiments, LassoNet significantly outperforms state-of-the-art methods for feature selection and regression. The LassoNet method uses projected proximal gradient descent, and generalizes directly to deep networks. It can be implemented by adding just a few lines of code to a standard neural network.
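To show how an L1 penalty produces exactly zero weights, here is a minimal sketch of proximal gradient descent (ISTA) for the ordinary linear Lasso. It covers only the classical linear case mentioned above, not LassoNet's hierarchical constraint or its projected proximal step; the objective scaling, step size, and toy data are assumptions.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_ista(X, y, penalty, step, iterations=200):
    """Minimize 0.5 * ||Xw - y||^2 + penalty * ||w||_1 by proximal gradient."""
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(iterations):
        residual = [sum(xi * wi for xi, wi in zip(row, w)) - yi
                    for row, yi in zip(X, y)]
        grad = [sum(row[j] * r for row, r in zip(X, residual))
                for j in range(n_features)]
        # Gradient step on the smooth part, then the L1 proximal step.
        w = [soft_threshold(wj - step * gj, step * penalty)
             for wj, gj in zip(w, grad)]
    return w


# Toy problem with an identity design, so the answer is known in closed form:
# each weight is the soft-thresholded target.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.2]
weights = lasso_ista(X, y, penalty=0.5, step=0.5)
print(weights)
```

The second feature's target (0.2) is smaller than the penalty (0.5), so its weight is set to exactly zero, while the first weight is shrunk from 3.0 toward 2.5.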

## Machine Learning for Big Data Cosmology and High Energy Physics

**Date: February 23, 2021 1:00 pm**

Speaker: Agnes Ferte

In the context of future galaxy surveys such as the Legacy Survey of Space and Time (LSST), I proposed an application of unsupervised learning algorithms such as Self-Organizing Maps to efficiently explore the theory space of cosmological models. In the first part of my talk, I will explain the challenges motivating this research and present our first results aiming at categorizing theories of gravity probed by weak gravitational lensing, one of the main cosmological observables that will be measured by LSST. Many experiments of the FPD at SLAC present computational challenges such as data reduction on the fly or physics simulations that require similar machine learning applications and developments. In the second part of my talk, I will present how I will expand the use of unsupervised learning algorithms to other areas at the FPD and contribute to the application of machine learning to LSST, other cosmology experiments and high energy physics experiments.

## Beyond Deep Learning in Fundamental Physics

**Date: February 16, 2021 1:00 pm**

Speaker: Lukas Heinrich

The experiments at the Large Hadron Collider (LHC) are testament to the success of the reductionist approach to science: the analytical modelling of the 100 million data channels of HEP is patently hard, but through a deep, hierarchical stack of simulation across many length and energy scales and a physics-driven, expert-designed dimensionality reduction procedure, inference on the fundamental parameters of quantum field theory is achievable. In recent years, advancements in Machine Learning techniques have provided physicists promising new tools to analyze the LHC data. To exploit them, fundamental questions need to be addressed: How do we formulate ML optimization goals to align with our science goals? How can we translate known constraints in the data into appropriate inductive biases of the trained algorithms? Can we express and incorporate uncertainties and maintain interpretability to achieve safe inference? In light of these challenges, I will discuss in this talk recent progress in end-to-end gradient-based optimization, Active Learning, and simulator-assisted probabilistic programming.

## Machine Learning for Dark Matter

**Date: February 12, 2021 1:00 pm**

Speaker: Bryan Ostdiek (Harvard)

There is five times more dark matter than ordinary matter in the universe, but we have almost no idea what it is. To learn about the possible interactions of dark matter, physicists use complementary data from cosmological probes, astroparticle observations, and particle colliders. There is an increasing need for advanced analytics and machine learning to process these vastly growing datasets. This talk details examples using machine learning in each of the three realms. First, I demonstrate using image recognition techniques on images of strongly lensed galaxies to constrain dark matter properties. Second, I use machine learning to uncover the phase space distribution of dark matter near the Earth, which directly impacts the interpretation of direct detection experiments. Finally, I examine how unsupervised learning methods can aid collider searches for dark matter. The talk concludes with comments on the intersection of machine learning and physics.

## Searching for dark matter in the sky with machine learning

**Date: February 9, 2021 1:00 pm**

Speaker: Siddharth Mishra Sharma (NYU)

The next decade will see a deluge of new cosmological data that will enable us to accurately map out the distribution of matter in the local Universe, image billions of stars and galaxies to unprecedented precision, and create high-resolution maps of the Milky Way. Signatures of new physics may be hiding in these observations, offering significant discovery potential for uncovering physics beyond the Standard Model, in particular the nature of dark matter. At the same time, the complexity of astrophysical data provides significant challenges to carrying out these searches using conventional methods. I will describe how overcoming these issues will require a qualitative shift in how we approach modeling and inference in cosmology, connecting particle physics properties to cosmological observables and bringing together several recent advances in machine learning and simulation-based inference. I will present several applications of these methods. I will show how they can be used to combine information from tens of thousands of strong gravitational lensing systems in order to infer structural properties of our Universe that can be directly linked to the microphysical properties of dark matter. Finally, I will present an application to the long-standing problem of understanding the nature of the Galactic Center gamma-ray excess, highlighting challenges associated with analyzing real data and discussing ways to overcome them.

Slides are available for those who have a Stanford account. Video is available with a password upon request (contact Kazuhiro Terao).

## Quantum Kernel Methods for the Classification of High-dimensional Data on a Superconducting Processor

**Date: December 11, 2020 1:00 pm**

Speaker: Evan Peters (Fermilab, University of Waterloo IQC)

We present a quantum kernel method for high-dimensional data analysis using the Google Sycamore superconducting quantum computer architecture. Our experiment utilizes the largest number of qubits to date compared to prior quantum kernel method experiments. We study an application in the domain of cosmology - a benchmark supernova type classification problem using 67 features with no dimensionality reduction and without vanishing kernel elements. While most experimental work to date has considered synthetic datasets of low dimension, and disregarded the importance of shot statistics and mean kernel element size, we show that the analysis of real, high dimensional datasets requires careful attention to these features when constructing a circuit ansatz.

## Online Bayesian Optimization for the SECAR Recoil Mass Separator

**Date: December 11, 2020 11:00 am**

Speaker: Sara Miskovich (Michigan State University)

The SEparator for CApture Reactions (SECAR) is a next-generation recoil separator system under commissioning at the National Superconducting Cyclotron Laboratory (NSCL) and Facility for Rare Isotope Beams (FRIB) at Michigan State University. SECAR is optimized for the direct measurement of capture reactions on unstable nuclei that drive some stars to explode and synthesize crucial nuclei that make up our universe. Once SECAR is operational, these precise measurements will improve our understanding of astrophysical processes such as X-ray bursts, novae and supernovae. To maximize the performance of the device, ion optical optimizations and careful beam alignment need to be achieved, which can be time consuming and difficult to achieve through manual tuning. This talk will focus on the first development of an online Bayesian optimization that utilizes a Gaussian process model to tune the beam through the complex system and improve its ion optical properties by optimizing magnet settings. The method is shown to improve recoil separator performance and save operational time for future scientific experiments.

## Machine Learning with Quantum Computers

**Date: December 4, 2020 10:00 am**

Speaker: Maria Schuld (Xanadu, University of KwaZulu-Natal)

A growing number of papers are searching for intersections between High Energy Physics and the emerging field of Quantum Machine Learning. This talk gives an introduction to the latter, while critically discussing potential connections to HEP. A focus lies on the most popular approach to machine learning with quantum computers, which interprets quantum circuits as machine learning models that load input data and produce predictions. By optimizing the quantum circuit, the "quantum model" can be trained like a neural network. To offer a glimpse of the opportunities and challenges of this approach, I will discuss different aspects of such "variational quantum machine learning algorithms", including their close links to kernel methods and integration into modern machine learning pipelines.

## Reservoir computing using digital logic gate networks

**Date: November 20, 2020 11:00 am**

Speaker: Heidi Komkov (The Institute for Research in Electronics and Applied Physics, University of Maryland)

As Moore's law is coming to an end, new types of computing architectures must be explored to continue the pace of advancement in computing power. At the same time, applications of machine learning are exploding. Reservoir computing is a brain-inspired machine learning method which has shown promise for very rapid time series prediction. The reservoir functions as a recurrent neural network, and substituting a physical system for a computer-based simulation has the potential to allow computation at high speed and very low power. We use an autonomous Boolean network as a reservoir, which uses individual CMOS digital logic gates to implement the nonlinear elements used in machine learning architectures. In this talk I'll show results from a field-programmable gate array (FPGA) reservoir and my designs of a 180 nm application-specific integrated circuit (ASIC) that has been fabricated this year.

## Power efficient hardware accelerators for machine learning, combinatorial optimization, and pattern matching applications

**Date: November 13, 2020 11:00 am**

Speaker: Cat Graves (Hewlett Packard Labs)

The dramatic rise of data-intensive workloads has revived special-purpose hardware and architectures for continuing improvements in computational speed and energy efficiency. While traditional CMOS ASICs deliver some performance gains, typically by limiting data movement or implementing “in-memory computation”, such approaches still suffer from low power efficiency. New proposals leveraging emerging non-volatile resistive RAM (ReRAM) devices for in-memory computation are highly attractive in a variety of application domains. While originally developed as digital (binary) high-density non-volatile memories, ReRAM devices have demonstrated a wide range of behaviors and properties – such as a wide range of tunable analog resistance and non-linear dynamics – which motivate their use in novel functions and new computational models. Many recent in-memory compute studies have focused on crossbar circuit architectures, demonstrating their application for neural networks, scientific computing and signal processing. However, other circuit primitives – such as content addressable memories (CAMs) and combined systems such as crossbar arrays and non-linear elements – have shown further promise for mapping a diverse range of complementary computational models such as finite state machines, pattern matching, hashing algorithms and Hopfield neural networks for tackling optimization problems. In this talk, I will review the exciting opportunities for in-memory computational primitives leveraging non-volatile ReRAM devices and their circuits and architectures for enabling low-power, high-throughput computation in a variety of application domains. Recent lab demonstrations of various applications mapped to these in-memory computational circuit primitives based on memristor devices will be shown and I will also give an outlook on performance.

## Generative Models and Symmetries

**Date: November 5, 2020 10:00 am**

Speaker: Danilo Rezende (Google DeepMind)

The study of symmetries in Physics has revolutionized our understanding of the world. Inspired by this, I will focus on our recent work on incorporating Gauge symmetries into normalizing flow generative models and its potential applications in the sciences and ML.

## Multi-Objective Bayesian Optimization for Accelerator Tuning

**Date: October 30, 2020 1:00 pm**

Speaker: Ryan Roussel (University of Chicago)

Particle accelerators require constant tuning during operation to meet beam quality, total charge and particle energy requirements for use in a wide variety of physics, chemistry and biology experiments. Maximizing the performance of an accelerator facility often necessitates multi-objective optimization, where operators must balance trade-offs between multiple objectives simultaneously, often using limited, temporally expensive beam observations. Usually, accelerator optimization problems are solved offline, prior to actual operation, with advanced beamline simulations and parallelized optimization methods (NSGA-II, Swarm Optimization). Unfortunately, it is not feasible to use these methods for online multi-objective optimization, since beam measurements can only be done in a serial fashion, and these optimization methods require a large number of measurements to converge to a useful solution. Here, we introduce a multi-objective Bayesian optimization scheme, which finds the full Pareto front of an accelerator optimization problem efficiently in a serialized manner and is thus a critical step towards practical online multi-objective optimization in accelerators. This method uses a set of Gaussian process surrogate models, along with a multi-objective acquisition function, which reduces the number of observations needed to converge by at least an order of magnitude over current methods. We demonstrate how this method can be modified to specifically solve optimization challenges posed by the tuning of accelerators. This includes the addition of optimization constraints, objective preferences and costs related to changing accelerator parameters.

## Machine Learning Techniques for Optics Measurements and Corrections

**Date: October 28, 2020 8:00 am**

Speaker: Elena Fol (CERN)

Recently, the application of ML has grown in accelerator physics, in particular in the domain of diagnostics and control. One of the first applications of ML at the LHC is focused on optics measurements and corrections. Unsupervised Learning has been applied to automatic detection of beam position monitor faults to improve optics analysis, demonstrating successful results in operation. A novel ML-based approach for the estimation of magnet errors has been developed, using supervised regression models trained on a large set of LHC optics simulations. Also, autoencoder neural networks have found their application in denoising of measurement data and reconstruction of missing data points. The results and future plans for these studies will be discussed following a brief introduction to relevant ML concepts.

## Superconducting Radio-Frequency Cavity Fault Classification Using Machine Learning at Jefferson Laboratory

**Date: October 23, 2020 1:00 pm**

Speaker: Christopher Tennant (Jefferson Laboratory)

We report on the development of machine learning models for classifying C100 superconducting radio-frequency (SRF) cavity faults in the Continuous Electron Beam Accelerator Facility (CEBAF) at Jefferson Lab. CEBAF is a continuous-wave recirculating linac utilizing 418 SRF cavities to accelerate electrons up to 12 GeV through 5 passes. Of these, 96 cavities (12 cryomodules) are designed with a digital low-level RF system configured such that a cavity fault triggers waveform recordings of 17 RF signals for each of the 8 cavities in the cryomodule. Subject matter experts (SMEs) are able to analyze the collected time-series data, identify which of the eight cavities faulted first, and classify the type of fault. This information is used to find trends and strategically deploy mitigations to problematic cryomodules. However, manually labeling the data is laborious and time-consuming. By leveraging machine learning, near real-time – rather than post-mortem – identification of the offending cavity and classification of the fault type has been implemented. We discuss the development and performance of the ML models as well as valuable lessons learned in bringing an ML system to deployment.

## Analytical and Parametric Model Fitting for Inverse Problems, Data Reduction, and Pattern Recognition

**Date: October 21, 2020 8:00 am**

Speaker: Youssef Nashed (ANL, Stats Perform)

Many scientific and engineering challenges can be formulated as fitting a model to existing data. Whether it is comparing a scientific simulation to known experimental observations, finding a continuous representation of sparse/discrete data points, or the values of model parameters which generalize to unforeseen data examples given historical data; all these tasks share a common underlying principle of model fitting, but with different choices made in the model formulation (parametric or analytical) and the assumptions made about the data (acquisition scheme, noise to signal ratio, continuity, or information locality). In this talk I will highlight a few use cases under this framework. Specifically, I will address research conducted at Argonne National Laboratory for X-ray image reconstruction problems, data reduction for scientific simulations, and deep learning approaches for replacing expensive iterative optimization. Additionally, I will present more recent work for sports computer vision applications that enable real time player detection, tracking, and activity prediction from broadcast video.

## Deep Learning and Quantum Gravity

**Date: October 15, 2020 4:00 pm**

Speaker: Koji Hashimoto (Osaka University)

Formulating quantum gravity is one of the final goals of fundamental physics. Recent progress in string theory has brought a concrete formulation called the AdS/CFT correspondence, in which a gravitational spacetime emerges from lower-dimensional non-gravitational quantum systems, but we still lack an understanding of how the correspondence works. I discuss similarities between quantum gravity and deep learning architectures, by regarding the neural network as a discretized spacetime. In particular, questions such as when, why and how a neural network can be a space or a spacetime may lead to a novel way to look at machine learning. I concretely implement the AdS/CFT framework in a deep learning architecture, and show the emergence of a curved spacetime as a neural network from given training data of quantum systems.

## Bayesian Optimization and Machine Learning for Accelerating Scientific Discovery

**Date: October 9, 2020 1:00 pm**

Speaker: Stefano Ermon (Stanford)

Applications of AI in the physical sciences require new advances in representing, reasoning about, and acquiring knowledge from data and domain expertise. Motivated by these challenges, I will present new approaches for calibrating ML systems so that predicted probabilities are more reflective of real-world uncertainty, i.e., better capture what is or isn't known by the system. I will discuss approaches to automatically acquire data to reduce uncertainty through maximally informative experiments, focusing on the design of charging protocols for electric batteries and other challenging problems in science and engineering. Finally, I will discuss opportunities for incorporating domain knowledge to further accelerate the process.

video

## Physics-informed machine learning for accelerated modeling and optimization of complex systems

**Date: October 2, 2020 1:00 pm**

Speaker: Paris Perdikaris (University of Pennsylvania)

The towering empirical success of machine learning is promising a pathway for transforming observations to actionable knowledge. Specific to modeling and optimizing complex physical and engineering systems, there is a need for methods that can seamlessly synthesize data of variable fidelity, leverage prior domain knowledge, respect the laws of physics, and provide robust predictions with quantified uncertainty. In this talk I will provide an overview of data-driven techniques that aim to address these needs, and highlight their advantages and limitations through the lens of different application studies. Specifically, we will discuss the effectiveness of Gaussian processes in integrating multi-fidelity data to accelerate the prediction of large scale computational models, as well as the potential of physics-informed deep learning models in tackling a diverse range of forward and inverse problems in computational physics. Finally, I will also discuss the role of predictive uncertainty in closing the observations-to-predictions loop as a proxy for judicious data acquisition and experimental design.

video

*More past talks can be accessed here.*
