AI/ML+Physics Part 3: Designing an Architecture

Prof. Steve Brunton

Preamble

Physics informed machine learning involves building models from data that either have a physical basis, are used to discover new physics, or incorporate physics into the machine learning process. Today, we will focus on stage three: designing an architecture.

Designing an architecture is a crucial part of the machine learning pipeline. There are various architectures that can be used to discover physics and to embed physics into machine learning. One popular reference point in physics informed machine learning is the neural network zoo, a figure reproduced in the book Data-Driven Science and Engineering with Nathan Kutz.

The neural network zoo provides an overview of different types of neural network architectures that can be used for specific tasks in machine learning. Some examples include autoencoder networks, GANs, deep recurrent networks, and many more. These architectures are created by combining different neural network building blocks.

Architectures

Today, we will discuss architecture. We will explore the definition of architecture, the various types of architectures, and how they can be more or less physical. Our choice of architecture may also contain implicit assumptions. The figure presented is from five years ago and represents only a small portion of the many architectures being explored and developed today.

The discussion of architectures is inspired by the architectures found in the brain and nervous system. Animals, including mammals such as humans and rodents, as well as fish and insects, have nervous systems and brains that interact with and process data from the real world to make decisions and move their bodies. There are rich architectures in our brains and nervous systems. A figure obtained from Bing Brunton shows a hand sketch by Cajal of neuronal architectures observed in microscope imaging, specifically a section of the hippocampus. This architecture is multi-scale, with connections across different regions performing various computations.

Our understanding of these architectures in neuroscience has inspired neural network and machine learning architectures in the modern era. Convolutional neural networks for image processing, for example, were inspired by observations of the visual cortex. The fields of neuroscience and machine learning are evolving together, with increasing data on neuronal architectures, and there will be videos on neural-inspired computing and architectures in the future. In this class on physics and machine learning, we will cover a broad range of different architectures.

References
  • 1.
    https://cir.nii.ac.jp/crid/1370004237605141637
  • 2.
    https://books.google.com/books/about/Anatomy_of_the_Human_Body.html?id=uaQMAAAAYAAJ&redir_esc=y

Researchers are using neural networks and machine learning to study physical systems and improve their learning algorithms by incorporating physics. Some important methods include ResNets (residual networks), which are deep networks with jump or skip connections, and the U-Net architecture, which assumes the world being modeled is multiscale in space and time. Operator networks, like the Fourier neural operator, are popular for analyzing physical systems such as partial differential equations. SINDy, the sparse identification of nonlinear dynamics, is a generalized linear regression method used to learn a differential equation from data. PINNs, physics-informed neural networks, are a significant area of research in modern physics and machine learning. Other operator methods and architectures for PDEs and ODEs are also important topics in this field.

References
  • 1.
    https://cir.nii.ac.jp/crid/1370004237605141637
  • 2.
    https://commons.wikimedia.org/wiki/File:Connectome.jpg

We will cover each topic in depth with code, examples, and case studies, dedicating at least half an hour or an hour to each. There is a wealth of material available, with approximately five hours focused solely on SINDy. This allows for a deep dive into equation discovery for those interested. Today's discussion centers on architectures that are specifically beneficial for physics, aiding in the development of models that are more physical and require less data, thanks to implicit biases that add structure and physics to machine learning architectures.

Physics plays a crucial role in machine learning, but the term itself needs clarification. While the Wikipedia definition involves matter, energy, and change, I prefer to define physics in terms of the capabilities we want our machine learning models to possess. Historically, physics has been characterized by simple and interpretable principles such as F = ma and E = mc². These fundamental laws are easy to understand and generalize, making them valuable in the development of machine learning models.

What is Physics?

Generalizability is a key feature of physics: the same law describes both an apple falling and a rocket launching to the moon. Physics isn't just about matter and energy; it also encompasses how the brain and other complex systems work. There are rules that govern complex systems, which we can learn through machine learning and apply to our models. For me, interpretability and generalizability are essential parts of what makes physics simple.

These are related to another perspective that promotes physicality in models. A famous quote attributed to Einstein states that everything should be made as simple as possible, but not simpler. In the era of machine learning, we seek models that are as simple as possible to describe the data, and no simpler. This principle of simplicity, or parsimony, has been the gold standard in physics for two thousand years. From Aristotle to Einstein, models that are more beautiful, parsimonious, and as simple as possible typically encapsulate the core bits of physics. These models are more interpretable and tend to generalize well without overfitting.

In the history of science, from astrology to astronomy, and from alchemy to chemistry, every major leap forward in our understanding of physics has resulted in simpler and more universal descriptions. This is a crucial point to consider. Another area where essential physics can be captured and discovered through machine learning is in the concepts of symmetries, invariances, and conservation laws. Most of our partial differential equations, such as mass conservation, momentum conservation, and energy conservation, typically arise from the conservation of some quantity.

Mass, momentum, and energy are conserved in our universe, leading to fundamental invariants that create symmetries in data. These symmetries, invariances, and conservation laws are core principles in physics that can be incorporated into machine learning algorithms. For example, the laws of physics remain unchanged when objects are translated or rotated.

In considering architecture choices, it is essential to enforce or promote these physical principles and discover new symmetries. This concept of physics is crucial not only for architecture design but also for defining loss functions and optimization algorithms used in training machine learning models.

The takeaway is that we want our machine learning models to be interpretable, generalizable, simple, and parsimonious, while enforcing the known symmetries, invariances, and conservation laws of the physical world. We should incorporate thousands of years of human experience learning physics into our models.

For example, let's consider a pendulum in a lab as a physical system. The data representation is a high-dimensional vector of a time series of pixels from a video. Although the data is high-dimensional, the system has low-dimensional meaning, such as the angle and angular velocity of the pendulum.

As humans, we can extract key features and patterns from high-dimensional data to identify important variables like angle and angular velocity. We may choose a machine learning architecture, like an autoencoder network, to compress the data and find the best representation of the variables.
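As a concrete illustration, here is a minimal autoencoder sketch (my own toy example, not code from the lecture), assuming 64x64 grayscale frames flattened into vectors; the two-dimensional latent space encodes the assumption that the pendulum's state is essentially the angle and angular velocity.

```python
import torch
import torch.nn as nn

n_pixels = 64 * 64   # assumed frame size (flattened grayscale video frame)
latent_dim = 2       # assumption: the physics lives in roughly two coordinates

encoder = nn.Sequential(nn.Linear(n_pixels, 128), nn.ReLU(),
                        nn.Linear(128, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                        nn.Linear(128, n_pixels))

frames = torch.rand(32, n_pixels)     # a batch of flattened frames (stand-in data)
z = encoder(frames)                   # low-dimensional latent state
reconstruction = decoder(z)
loss = nn.functional.mse_loss(reconstruction, frames)   # reconstruction loss
```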

We can also use the architecture to learn differential equations governing the evolution of the variables, such as the dynamics of the pendulum. By selecting a machine learning architecture that is adept at learning differential equations, like the sparse identification of nonlinear dynamics, we can achieve this goal.

Case Study: Pendulum

I use optimization to find the fewest library elements that describe the dynamics. This is an architecture: a space of functions used to describe the observed data. There is also a loss function and an optimization algorithm to find the best function in the search space parameterized by the architecture. Two architecture choices here relate to physics: compression, which assumes the physics is low-dimensional, and the SINDy library procedure, which yields a differential equation. This is an example of architectures promoting physics, as outlined in a paper by Kathleen Champion, Nathan Kutz, and myself, which uses a deep neural network autoencoder to learn a low-dimensional coordinate system for the physics.
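A minimal SINDy-style sketch (an illustration under my own simplifying assumptions, not the lecture's code): build a library of candidate terms for the pendulum state x = [theta, theta_dot], then use sequentially thresholded least squares to find a sparse model x_dot ≈ Theta(x) ξ.

```python
import numpy as np

def library(x):
    """Candidate terms Theta(x) for a pendulum state x = [theta, theta_dot]."""
    theta, theta_dot = x[:, 0:1], x[:, 1:2]
    return np.hstack([np.ones_like(theta), theta, theta_dot,
                      np.sin(theta), np.cos(theta), theta * theta_dot])

def sindy(x, x_dot, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: x_dot ~= Theta(x) @ xi, with xi sparse."""
    Theta = library(x)
    xi = np.linalg.lstsq(Theta, x_dot, rcond=None)[0]
    for _ in range(n_iter):
        xi[np.abs(xi) < threshold] = 0.0                 # prune small coefficients
        for k in range(x_dot.shape[1]):                  # refit the surviving terms
            big = np.abs(xi[:, k]) >= threshold
            if big.any():
                xi[big, k] = np.linalg.lstsq(Theta[:, big], x_dot[:, k], rcond=None)[0]
    return xi

# Synthetic data from the true pendulum, theta_ddot = -sin(theta), via forward Euler.
dt, n = 0.01, 5000
x = np.zeros((n, 2)); x[0] = [2.0, 0.0]
for k in range(n - 1):
    theta, theta_dot = x[k]
    x[k + 1] = [theta + dt * theta_dot, theta_dot - dt * np.sin(theta)]
x_dot = np.column_stack([x[:, 1], -np.sin(x[:, 0])])     # exact derivatives, for simplicity
print(sindy(x, x_dot))   # ideally recovers x1_dot = x2 and x2_dot = -sin(x1)
```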

SINDy + Autoencoder

A SINDy model is then used to show how the dynamics evolve in that low-dimensional coordinate system. This approach highlights the simplicity and sparsity of physical laws. Custom loss functions are often necessary to effectively train architectures in this parameterized space of functions; these loss functions are essential for optimizing models within the chosen architecture.

References
  • 1.
    K. Champion et al. (2019) Data-driven discovery of coordinates and governing equations. Proceedings of the National Academy of Sciences

Defining a Function Space

Architecture refers to the various types of structures used in machine learning models. These can include neural networks, support vector machines, regression models, and more. The goal of a machine learning model is to take input data X and predict an output Y using a function F that is learned through adjusting parameters like weights.

For example, in a neural network, X is the input, Y is the output, and θ represents the parameters to adjust. In a SINDy model, the goal is to predict the time derivative of the system state (X dot) using a library of candidate terms, such as polynomials, with weights θ. The choice of architecture helps to constrain the possible functions that can describe the input-output relationship.
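Schematically (my own notation, summarizing the two cases above): the feed-forward network and the SINDy model are both parameterized families of functions, and training searches that family for the best fit.

```latex
% General supervised architecture: a family of functions parameterized by theta
y \approx f(x;\, \theta)

% SINDy: the state derivative as a sparse combination of candidate library terms
\dot{x} \approx \Theta(x)\,\xi,
\qquad
\Theta(x) = \begin{bmatrix} 1 & x & x^{2} & \cdots & \sin x & \cdots \end{bmatrix}
```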

Architectures like feed-forward neural networks, autoencoders, and SINDy models all use different parameterizations to optimize the function to fit observed data. These architectures allow for the enforcement of symmetries, conservation laws, or simplicity in the model. Ultimately, the goal is to find the best function by tuning the free parameters using optimization algorithms and loss functions.

Turbulence Modelling: Galilean Invariance

Here are a few examples of interesting architectures. One of my favorites is from 2016, when Julia Ling and her collaborators built a deep neural network to predict Reynolds stresses for fluid flow simulations. This is important for modeling turbulence in industrial applications. The custom architecture they used in panel B includes a tensor input layer that enforces Galilean invariance, meaning the physics remains consistent in different reference frames.
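A simplified sketch of that idea (my own toy version, not the authors' implementation): a small network predicts scalar coefficients from invariant inputs, and the output is a linear combination of Galilean-invariant tensor basis elements, so the prediction inherits the invariance of the basis. The names, sizes, and the way the basis is supplied are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class TensorBasisNN(nn.Module):
    """Toy tensor-basis network: b = sum_n g_n(invariants) * T_n."""
    def __init__(self, n_invariants=5, n_basis=10, hidden=64):
        super().__init__()
        self.coeff_net = nn.Sequential(
            nn.Linear(n_invariants, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_basis),
        )

    def forward(self, invariants, tensor_basis):
        # invariants: (batch, n_invariants), scalar invariants of the mean flow
        # tensor_basis: (batch, n_basis, 3, 3), precomputed invariant basis tensors
        g = self.coeff_net(invariants)                          # (batch, n_basis)
        return torch.einsum("bn,bnij->bij", g, tensor_basis)    # predicted anisotropy tensor
```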

Another powerful architecture is the residual network, introduced in 2015. This type of deep architecture includes skip connections and is designed to behave like a numerical integrator. It has been widely cited and is commonly used in modern machine learning.
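A minimal residual block sketch (my own illustration): the skip connection means the block computes x + f(x), which has the same form as one forward-Euler step x_{k+1} = x_k + Δt f(x_k) of a dynamical system, with the step size absorbed into f.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """x -> x + f(x): the identity skip path plus a learned update."""
    def __init__(self, dim=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.f(x)

# Stacking blocks looks like integrating a dynamical system forward in time.
model = nn.Sequential(*[ResidualBlock(64) for _ in range(10)])
```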

References
  • 1.
    J. Ling et al. (2016) Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. Journal of Fluid Mechanics

ResNets

This architecture promotes the idea of stepping forward in time, making it ideal for time series data and dynamical systems. The U-Net architecture, discussed next, is highly effective for super-resolution and image segmentation.

References
  • 1.
    https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf

UNets

The U-Net, which forms the basis of many diffusion models, carries an implicit inductive bias: its structure reflects the multiscale nature of observations in the real world, both in space and time. When looking at a picture of the real world, this multiscale structure is evident. The architecture is adept at parameterizing natural images, such as scenes of traffic, cities, and similar objects.
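A minimal two-level sketch of that multiscale structure (my own toy example, far smaller than the Ronneberger et al. network): an encoder downsamples to capture coarse scales, a decoder upsamples back, and a skip connection reinjects fine-scale detail.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(nn.MaxPool2d(2),
                                  nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(ch, ch, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(ch, 1, 1))

    def forward(self, x):                       # x: (batch, 1, H, W), H and W even
        fine = self.enc(x)                      # fine-scale features
        coarse = self.down(fine)                # coarse-scale features at half resolution
        upsampled = self.up(coarse)             # back to full resolution
        return self.dec(torch.cat([fine, upsampled], dim=1))   # skip connection
```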

References
  • 1.
    O. Ronneberger et al. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation.

Physics Informed Neural Networks

Physics informed neural networks are an important topic that will be explored in depth. The lecture series will focus on this concept within the context of crafting a loss function in the fourth stage of machine learning. These architectures and loss functions are often intertwined, with custom loss functions relying on specific architectures and vice versa.

Physics informed neural networks are particularly useful for estimating complex quantities such as fluid velocity fields or other spatially varying fields. By utilizing automatic differentiation in neural network environments like PyTorch and TensorFlow, partial derivatives of these quantities can be computed without manual coding. These derivatives can then be incorporated into a loss function to enforce the satisfaction of physical laws, such as partial differential equations.

In essence, physics informed neural networks combine architecture and loss function elements to effectively utilize neural networks for obtaining the necessary quantities to satisfy the physics-based constraints within the loss function.
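A minimal sketch of that pattern (an illustration in the spirit of the PINN papers, not their code), assuming Burgers' equation u_t + u·u_x = ν·u_xx as the governing PDE: automatic differentiation supplies the derivatives, and the physics loss is the mean-squared PDE residual at collocation points.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))   # u(x, t) represented by a neural network
nu = 0.01                               # assumed viscosity

def pde_residual(x, t):
    x, t = x.requires_grad_(True), t.requires_grad_(True)
    u = net(torch.cat([x, t], dim=-1))
    grad = lambda out, inp: torch.autograd.grad(out, inp, torch.ones_like(out),
                                                create_graph=True)[0]
    u_t, u_x = grad(u, t), grad(u, x)
    u_xx = grad(u_x, x)
    return u_t + u * u_x - nu * u_xx    # the loss drives this residual toward zero

x = torch.rand(1000, 1) * 2 - 1         # collocation points in space
t = torch.rand(1000, 1)                 # collocation points in time
physics_loss = pde_residual(x, t).pow(2).mean()
```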

References
  • 1.
    M. Raissi et al. (2018) Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics

Lagrangian Neural Networks

Lagrangian neural networks and Hamiltonian neural networks are a good example of the intersection of architecture and loss function. If a system conserves energy or has a mechanical structure, that Lagrangian or Hamiltonian structure can be built into both the architecture and the loss function used to train the neural network. This area of research includes Lagrangian and Hamiltonian neural networks, as well as operator networks such as DeepONets and Fourier neural operators. These custom architectures can accelerate training and require less data because of the physical assumptions built in implicitly. Neural operators are another popular family of architectures in this field.
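A minimal Hamiltonian-network sketch of that idea (my own illustration; the Lagrangian version in the cited paper follows similar logic through the Euler-Lagrange equations): a network parameterizes a scalar energy H(q, p), and the dynamics come from Hamilton's equations via automatic differentiation, so conservation structure is built into the architecture.

```python
import torch
import torch.nn as nn

class HamiltonianNN(nn.Module):
    def __init__(self, dim=1, hidden=64):
        super().__init__()
        self.H = nn.Sequential(nn.Linear(2 * dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, hidden), nn.Tanh(),
                               nn.Linear(hidden, 1))   # scalar energy H(q, p)

    def forward(self, q, p):
        q, p = q.requires_grad_(True), p.requires_grad_(True)
        H = self.H(torch.cat([q, p], dim=-1)).sum()
        dHdq, dHdp = torch.autograd.grad(H, (q, p), create_graph=True)
        return dHdp, -dHdq    # Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq

q, p = torch.randn(32, 1), torch.randn(32, 1)     # a batch of states
dqdt, dpdt = HamiltonianNN()(q, p)                # predicted time derivatives
```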

References
  • 1.
    M. Cranmer et al. (2020) Lagrangian Neural Networks.

Deep Operator Networks

References
  • 1.
    Lu Lu et al. (2019) DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators.

Fourier Neural Operators

The Fourier neural operator is based on the idea that real-world physics is multiscale and efficiently represented in the Fourier domain. By incorporating Fourier layers into the neural operator, it implicitly assumes a multiscale nature of the physics. Graph neural networks are another example of building such assumptions into the architecture.
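A minimal one-dimensional spectral layer sketch (my own simplified version, not the authors' implementation): transform to the Fourier domain, apply learned complex weights to the lowest modes, and transform back, so each layer acts globally and across scales by construction.

```python
import torch
import torch.nn as nn

class FourierLayer1d(nn.Module):
    def __init__(self, channels=8, modes=16):
        super().__init__()
        self.modes = modes
        self.weights = nn.Parameter(
            torch.randn(channels, channels, modes, dtype=torch.cfloat) / channels)

    def forward(self, x):
        # x: (batch, channels, n_grid_points), with n_grid_points >= 2 * modes
        x_hat = torch.fft.rfft(x)                       # to the Fourier domain
        out_hat = torch.zeros_like(x_hat)
        out_hat[:, :, :self.modes] = torch.einsum(      # mix channels on the low modes only
            "bim,iom->bom", x_hat[:, :, :self.modes], self.weights)
        return torch.fft.irfft(out_hat, n=x.shape[-1])  # back to the spatial grid
```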

References
  • 1.
    Zongyi Li et al. (2020) Fourier Neural Operator for Parametric Partial Differential Equations.

Graph Neural Networks

Graph neural networks have produced impressive results in machine learning for physical systems. These networks have been used to discover laws of planetary motion that can be applied to multi-planet systems and to simulate fluid flows. GNNs are designed to incorporate assumptions about the structure of interactions, such as n-body systems, molecular dynamics, or rigid body systems. By integrating physics into these networks, there are numerous opportunities for advancement.

I am eager to learn more about this topic and plan to explore it further, so that we can delve into it together and understand the powerful demonstrations of efficient and accurate machine learning models simulating complex physics. One remarkable paper demonstrates the ability to simulate various fluids, elastic materials, and complicated partial differential equations using simple concepts in graph neural networks to incorporate the physics of the system.
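A minimal message-passing sketch of that idea (my own illustration, not the APIs of the cited papers): the same small MLPs model every pairwise interaction and every node update, encoding the assumption that local physics is shared across all particles.

```python
import torch
import torch.nn as nn

class InteractionLayer(nn.Module):
    def __init__(self, node_dim=4, hidden=64):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(2 * node_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden))
        self.node_mlp = nn.Sequential(nn.Linear(node_dim + hidden, hidden), nn.ReLU(),
                                      nn.Linear(hidden, node_dim))

    def forward(self, x, edges):
        # x: (n_nodes, node_dim) particle states; edges: (n_edges, 2) sender/receiver indices
        senders, receivers = edges[:, 0], edges[:, 1]
        messages = self.edge_mlp(torch.cat([x[senders], x[receivers]], dim=-1))
        aggregated = torch.zeros(x.shape[0], messages.shape[-1], device=x.device)
        aggregated.index_add_(0, receivers, messages)          # sum incoming messages
        return x + self.node_mlp(torch.cat([x, aggregated], dim=-1))   # residual node update
```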

References
  • 1.
    P. W. Battaglia et al. (2018) Relational inductive biases, deep learning, and graph networks.

The physics of one parcel is similar to that of another, so there may be a smaller set of rules governing local physical interactions. This is just an overview of roughly half the topics we will cover in detail in the coming hours. Symmetries and invariances are crucial for machine learning in physical systems. Invariance means the output should stay the same despite transformations like rotation or scaling; equivariance is slightly different, and both are important.

References
  • 1.
    https://www.youtube.com/watch?v=h7h9zF8OO7E
  • 2.
    https://proceedings.mlr.press/v119/sanchez-gonzalez20a.html

Invariance and Equivariance

In this discussion, we focus on classification and image segmentation with neural networks, where the output labels different parts of an image. Equivariance means that when an input is transformed and then processed by a machine learning model, the output reflects the same transformation: the function F of the model and the symmetry operation G commute, so that F(G(x)) = G(F(x)). Group theory helps determine when F and G commute.

Convolutional neural networks are known for promoting translation invariance, but research has shown how more general symmetry groups can be incorporated into other neural network architectures, such as autoencoders. By designing machine learning models with equivariant properties, we can reduce the amount of data needed for training and improve generalization.
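A quick numerical check of that commutation property (my own toy example): with circular padding, a convolutional layer commutes with a circular shift of the input, i.e. F(G(x)) = G(F(x)).

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, padding_mode="circular", bias=False)
x = torch.randn(1, 1, 32, 32)
shift = lambda img: torch.roll(img, shifts=(5, 3), dims=(-2, -1))   # symmetry operation G

print(torch.allclose(shift(conv(x)), conv(shift(x)), atol=1e-5))    # True: F and G commute
```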

Researchers like Max Welling and Tess Smidt have made significant contributions to the field of equivariant machine learning models. In the upcoming sections, we will explore how to build these models, the specific loss functions and architectures they use, and their efficiency and generalization capabilities.

Stay tuned for discussions on loss functions, optimization, and various examples of machine learning architectures. Thank you for your attention.