The LINAC Coherent Light Source (LCLS) is the world's first hard x-ray laser. A $1bn instrument, LCLS produces light that is bright enough (~10¹² photons per pulse), fast enough (~20 fs pulses), and of the right wavelength (~1 Å) to image chemistry, biomolecules, plasmas, and materials. Check out the LCLS YouTube channel for highlights of the science done at LCLS.

Typical experiments at LCLS are conducted over the course of 5 days by outside user groups. Users arrive and work together in intense 12-hour shifts to make their one-off experiments work. Two new experiments every week are typical. During a single experiment, on the order of 100 TB of raw data will be collected in the form of images, vectors, and scalars. Making sense of all that data is a major challenge. The SLAC ML initiative aims to deploy the latest ML technology to help maximize the science potential of LCLS.

Compounding this challenge, SLAC is currently building a $1bn+ upgrade to LCLS, known as LCLS-II. The current LCLS operates at 120 Hz. LCLS-II will operate at up to 1 MHz, producing TB of data per second at the maximum anticipated rates. It's not possible to save all that data to disk, so SLAC needs to develop data analysis methods for this new paradigm: to reduce the data to a manageable quantity, automatically when possible, and in real time so we can know if our experiments are actually working.

To answer these challenges, the ML initiative is pursuing a number of ongoing research and development efforts:

Dealing with a Data Firehose: Data Reduction

At LCLS-II data rates, transferring the raw data is unreasonable if not impossible. We are developing ultra-low-latency, high-throughput Edge Machine Learning (EdgeML) for the LCLS-II Data Reduction Pipeline (DRP), deploying ML models to FPGAs that sit near or, ideally, directly inside the detector electronics. These models will analyze the continuous stream of data, discarding useless data and categorizing the rest for downstream processing. We are using two internally driven R&D projects as exemplars of the EdgeML paradigm: the 2d-TimeTool project, which targets a spectrogram-based measurement of the relative x-ray/optical delay with sub-fs precision, and the CookieBox angle-resolved electron detector project, which targets attosecond-scale recovery of x-ray pulse shapes as well as angle-resolved photoelectron and Auger electron spectroscopy. Both use cases target data ingestion rates in the 50 GB/s - 1 TB/s range.
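The keep/discard logic at the heart of the DRP can be illustrated with a deliberately simple stand-in: a threshold on a frame's integrated signal in place of a trained EdgeML model. The frame size, photon rate, and threshold below are all invented for illustration; they do not describe the real pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def veto(frame, photon_thresh=10):
    """Toy stand-in for an EdgeML veto model: keep a frame only if its
    integrated signal exceeds a threshold (hypothetical criterion)."""
    return frame.sum() >= photon_thresh

# Simulate a stream of mostly-empty detector frames (hypothetical rates)
frames = [rng.poisson(0.001, size=(32, 32)) for _ in range(1000)]
kept = [f for f in frames if veto(f)]
print(f"kept {len(kept)} of {len(frames)} frames")
```

In the real DRP this decision has to happen in hardware at wire speed, which is why the models are compiled to FPGAs rather than run in software as sketched here.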

Anomaly Detection: Finding the Interesting Needle in the Haystack

In the high-data-velocity environment of LCLS-II, it's only possible to save a small subset of the data. Data are chosen based on how we expect the data to look -- but what if those assumptions are wrong? The most interesting scientific results are often completely unexpected -- we can't just look for what we expect from an experiment. Therefore, the ML group, in partnership with LCLS and outside collaborators, is developing various anomaly detection systems that will run online. Anomalous data will be saved to disk during running experiments for immediate inspection. Whether they capture interesting science or something that went wrong, experimenters at LCLS will be able to take immediate action.
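One common recipe -- not necessarily the one deployed at LCLS -- is to learn a model of "normal" shots and flag anything the model reconstructs poorly. A minimal sketch using PCA on synthetic data; the dimensions, noise level, and threshold are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "normal" shots: points near a 2D subspace of a 10D space
normal = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10)) \
    + 0.05 * rng.normal(size=(500, 10))

# Fit a rank-2 PCA model of normal behavior
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
basis = vt[:2]  # top-2 principal directions

def recon_error(x):
    """Distance from the learned 'normal' subspace."""
    centered = x - mean
    return np.linalg.norm(centered - (centered @ basis.T) @ basis)

# Calibrate a threshold on the training data, then score a new shot
threshold = max(recon_error(x) for x in normal)
outlier = 5 * rng.normal(size=10)  # a shot unlike anything seen before
print(recon_error(outlier) > threshold)
```

The same idea scales up by swapping the PCA model for a deep autoencoder and streaming shots through it online.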

CSPAD/protein image courtesy of Dr. Chun Yoon

Automated Crystallography: Learning from Past Data

Protein crystallography reveals the atomic scale of biology. At LCLS, we study not only how proteins -- biology's nano-scale machines -- are built, but also how they move. To do this, researchers obtain diffraction images with many peaks of various shapes and sizes. Led by Chuck Yoon at LCLS, researchers are training neural networks on previously measured protein diffraction images so new ones can be automatically recognized and analyzed.

Photon Finding: Is that One Photon or Two?

Atom-scale dynamics in condensed matter underlie many phenomena of interest: superconductivity, phase transitions, defect formation. Powerful techniques for measuring these dynamics across many decades (from fs to seconds) are therefore in great demand. One such technique is x-ray photon correlation spectroscopy (XPCS). To perform XPCS measurements at the fastest timescales and on the smallest samples, we need to count every photon scattered by the sample. SLAC researchers are currently using ML methods to extract every photon from x-ray diffraction images to push the boundaries of what is possible in XPCS and other photon-hungry experiments.
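At its core, XPCS computes the normalized intensity autocorrelation g2(tau) = <I(t) I(t+tau)> / (<I(t)> <I(t+tau)>) of a speckle's intensity trace; the decay of g2 with lag time reveals the sample's dynamics. A toy sketch on a synthetic speckle signal -- the Gaussian field model and its correlation time are invented for illustration, and a real analysis would correlate per-pixel, photon-counted detector data:

```python
import numpy as np

rng = np.random.default_rng(2)

def g2(intensity, max_lag):
    """g2(tau) = <I(t) I(t+tau)> / (<I(t)> <I(t+tau)>) for tau = 1..max_lag."""
    i = np.asarray(intensity, dtype=float)
    return np.array([
        np.mean(i[:-lag] * i[lag:]) / (i[:-lag].mean() * i[lag:].mean())
        for lag in range(1, max_lag + 1)
    ])

# Synthetic speckle: intensity = |field|^2 for a Gaussian field that
# decorrelates exponentially (correlation time ~10 frames, hypothetical)
n = 20000
field = np.empty(n)
field[0] = rng.normal()
for t in range(1, n):
    field[t] = 0.9 * field[t - 1] + rng.normal()
intensity = field ** 2

curve = g2(intensity, 50)
# For a Gaussian field, theory gives g2(tau) = 1 + 2 * 0.9**(2*tau):
# high contrast at short lags, decaying toward 1 at long lags
print(curve[0], curve[-1])
```

Counting every photon matters because at low count rates the shot noise in I(t) swamps the small contrast in g2; missed or double-counted photons wash out exactly the signal being measured.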

XFEL Ghost Imaging

In conventional imaging, light falling on an object produces a two-dimensional image on a detector -- whether the back of your eye, the megapixel sensor in your cell phone, or an advanced X-ray detector. Ghost imaging, on the other hand, constructs an image by analyzing how random patterns of light shining onto the object affect the total amount of light coming off the object. Using ghost imaging, we leverage the idea that measurement is easier than control to reach sub-femtosecond pump-probe times, ultrahigh spectral resolution, or simpler data-taking modes.
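The reconstruction behind ghost imaging is, at heart, a correlation: average the random illumination patterns weighted by the fluctuations of the total ("bucket") signal, and the object emerges. A minimal sketch on a synthetic object -- the object shape, pattern statistics, and shot count are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical 16x16 transmissive object: a bright square on a dark background
obj = np.zeros((16, 16))
obj[5:11, 5:11] = 1.0

n_shots = 20000
patterns = rng.random((n_shots, 16, 16))  # random illumination patterns
bucket = np.tensordot(patterns, obj)      # one number per shot: total transmitted light

# Ghost image: correlate bucket fluctuations with the (known) patterns
recon = np.tensordot(bucket - bucket.mean(),
                     patterns - patterns.mean(axis=0), axes=1) / n_shots

# Pixels inside the object correlate with the bucket; pixels outside do not
inside, outside = recon[5:11, 5:11].mean(), recon[0:5, 0:5].mean()
print(inside, outside)
```

The appeal for XFELs is that the shot-to-shot structure of the beam need only be measured, not controlled: nature supplies the random patterns for free.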