AI/ML+Physics Part 3: Designing an Architecture

Prof. Steve Brunton

Preamble

Physics informed machine learning involves building models from data that either have a physical basis, are used to discover new physics, or incorporate physics into the machine learning process. Today, we will focus on stage three: designing an architecture.

Designing an architecture is a crucial part of the machine learning pipeline. There are various architectures that can be used to discover physics and to embed physics into machine learning. One popular reference point in physics informed machine learning is the neural network zoo, a figure reproduced in the book Data-Driven Science and Engineering with Nathan Kutz.

The neural network zoo provides an overview of different types of neural network architectures that can be used for specific tasks in machine learning. Some examples include autoencoder networks, GANs, deep recurrent networks, and many more. These architectures are created by combining different neural network building blocks.

Architectures

Today, we will discuss architecture. We will explore the definition of architecture, the various types of architectures, and how they can be more or less physical. Our choice of architecture may also contain implicit assumptions. The figure presented is from five years ago and represents only a small portion of the many architectures being explored and developed today.

The discussion of architectures is inspired by the architectures found in the brain and nervous system. Animals, including mammals such as humans and rodents, as well as fish and insects, have nervous systems and brains that interact with and process data from the real world to make decisions and move their bodies. There are rich architectures in our brains and nervous systems. A figure obtained from Bing Brunton shows a hand sketch by Cajal of neuronal architectures observed in microscope imaging, specifically a section of the hippocampus. This architecture is multi-scale, with connections across different regions performing various computations.

Our understanding of these architectures in neuroscience has inspired neural network and machine learning architectures in the modern era. Convolutional neural networks for image processing, for example, were inspired by observations of the visual cortex. The fields of neuroscience and machine learning are evolving together, with increasing data on neuronal architectures, and there will be videos on neural-inspired computing and architectures in the future. In this class on physics and machine learning, we will cover a broad range of different architectures.

References
  • 1.
    https://cir.nii.ac.jp/crid/1370004237605141637
  • 2.
    https://books.google.com/books/about/Anatomy_of_the_Human_Body.html?id=uaQMAAAAYAAJ&redir_esc=y

Researchers are using neural networks and machine learning to study physical systems and improve their learning algorithms by incorporating physics. Some important methods include ResNets (residual networks), which are deep networks with jump or skip connections, and the U-Net architecture, which assumes the world being modeled is multiscale in space and time. Operator networks, like the Fourier neural operator, are popular for analyzing physical systems such as partial differential equations. SINDy, the sparse identification of nonlinear dynamics, is a generalized linear regression method used to learn a differential equation from data. PINNs, physics-informed neural networks, are a significant area of research in modern physics and machine learning. Other operator methods and architectures for PDEs and ODEs are also important topics in this field.

References
  • 1.
    https://cir.nii.ac.jp/crid/1370004237605141637
  • 2.
    https://commons.wikimedia.org/wiki/File:Connectome.jpg

We will cover each topic in depth with code, examples, and case studies, dedicating at least half an hour or an hour to each. There is a wealth of material available, with approximately five hours focused solely on SINDy. This allows for a deep dive into equation discovery for those interested. Today's discussion centers on architectures that are specifically beneficial for physics, aiding in the development of models that are more physical and require less data, thanks to implicit biases that add structure and physics to machine learning architectures.

Physics plays a crucial role in machine learning, but the term itself needs clarification. While the Wikipedia definition involves matter, energy, and change, I prefer to define physics in terms of the capabilities we want our machine learning models to possess. Historically, physics has been characterized by simple and interpretable principles such as F = ma and E = mc². These fundamental laws are easy to understand and generalize, making them valuable in the development of machine learning models.

What is Physics?

Generalizability is a key feature of physics: the same law describes both an apple falling and a rocket launching to the moon. Physics isn't just about matter and energy; it also encompasses how the brain and other complex systems work. There are rules that govern complex systems, which we can learn through machine learning and apply to our models. For me, interpretability and generalizability are essential parts of what makes physics simple.

These are related to another perspective that promotes physicality in models. A famous quote attributed to Einstein states that everything should be made as simple as possible, but not simpler. In the era of machine learning, we seek models that are as simple as possible to describe the data, and no simpler. This principle of simplicity, or parsimony, has been the gold standard in physics for two thousand years. From Aristotle to Einstein, models that are more beautiful, parsimonious, and as simple as possible typically encapsulate the core bits of physics. These models are more interpretable and tend to generalize well without overfitting.

In the history of science, from astrology to astronomy, and from alchemy to chemistry, every major leap forward in our understanding of physics has resulted in simpler and more universal descriptions. This is a crucial point to consider. Another area where essential physics can be captured and discovered through machine learning is in the concepts of symmetries, invariances, and conservation laws. Most of our partial differential equations, such as mass conservation, momentum conservation, and energy conservation, typically arise from the conservation of some quantity.

Mass, momentum, and energy are conserved in our universe, leading to fundamental invariants that create symmetries in data. These symmetries, invariances, and conservation laws are core principles in physics that can be incorporated into machine learning algorithms. For example, the laws of physics remain unchanged when objects are translated or rotated.

In considering architecture choices, it is essential to enforce or promote these physical principles and discover new symmetries. This concept of physics is crucial not only for architecture design but also for defining loss functions and optimization algorithms used in training machine learning models.

The takeaway is that we want our machine learning models to be interpretable, generalizable, simple, and parsimonious, while enforcing the known symmetries, invariances, and conservation laws of the physical world. We should incorporate thousands of years of human experience learning physics into our models.

For example, let's consider a pendulum in a lab as a physical system. The data representation is a high-dimensional vector of a time series of pixels from a video. Although the data is high-dimensional, the system has low-dimensional meaning, such as the angle and angular velocity of the pendulum.

As humans, we can extract key features and patterns from high-dimensional data to identify important variables like angle and angular velocity. We may choose a machine learning architecture, like an autoencoder network, to compress the data and find the best representation of the variables.
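As a concrete illustration, here is a minimal autoencoder sketch (my own toy example, not code from the lecture), assuming 64x64 grayscale frames flattened into vectors; the two-dimensional latent space encodes the assumption that the pendulum's state is essentially the angle and angular velocity.

```python
import torch
import torch.nn as nn

n_pixels = 64 * 64   # assumed frame size (flattened grayscale video frame)
latent_dim = 2       # assumption: the physics lives in roughly two coordinates

encoder = nn.Sequential(nn.Linear(n_pixels, 128), nn.ReLU(),
                        nn.Linear(128, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                        nn.Linear(128, n_pixels))

frames = torch.rand(32, n_pixels)     # a batch of flattened frames (stand-in data)
z = encoder(frames)                   # low-dimensional latent state
reconstruction = decoder(z)
loss = nn.functional.mse_loss(reconstruction, frames)   # reconstruction loss
```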

We can also use the architecture to learn differential equations governing the evolution of the variables, such as the dynamics of the pendulum. By selecting a machine learning architecture that is adept at learning differential equations, like the sparse identification of nonlinear dynamics, we can achieve this goal.

Case Study: Pendulum

I use optimization to find the fewest library elements that describe the dynamics. This is an architecture: a space of functions used to describe the observed data. There is also a loss function and an optimization algorithm to find the best function in the search space parameterized by the architecture. Two architecture choices here relate to physics: compression, which assumes the physics is low-dimensional, and the SINDy library procedure, which yields a differential equation. This is an example of architectures promoting physics, as outlined in a paper by Kathleen Champion, Nathan Kutz, and myself, which uses a deep neural network autoencoder to learn a low-dimensional coordinate system for the physics.
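A minimal SINDy-style sketch (an illustration under my own simplifying assumptions, not the lecture's code): build a library of candidate terms for the pendulum state x = [theta, theta_dot], then use sequentially thresholded least squares to find a sparse model x_dot ≈ Theta(x) ξ.

```python
import numpy as np

def library(x):
    """Candidate terms Theta(x) for a pendulum state x = [theta, theta_dot]."""
    theta, theta_dot = x[:, 0:1], x[:, 1:2]
    return np.hstack([np.ones_like(theta), theta, theta_dot,
                      np.sin(theta), np.cos(theta), theta * theta_dot])

def sindy(x, x_dot, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: x_dot ~= Theta(x) @ xi, with xi sparse."""
    Theta = library(x)
    xi = np.linalg.lstsq(Theta, x_dot, rcond=None)[0]
    for _ in range(n_iter):
        xi[np.abs(xi) < threshold] = 0.0                 # prune small coefficients
        for k in range(x_dot.shape[1]):                  # refit the surviving terms
            big = np.abs(xi[:, k]) >= threshold
            if big.any():
                xi[big, k] = np.linalg.lstsq(Theta[:, big], x_dot[:, k], rcond=None)[0]
    return xi

# Synthetic data from the true pendulum, theta_ddot = -sin(theta), via forward Euler.
dt, n = 0.01, 5000
x = np.zeros((n, 2)); x[0] = [2.0, 0.0]
for k in range(n - 1):
    theta, theta_dot = x[k]
    x[k + 1] = [theta + dt * theta_dot, theta_dot - dt * np.sin(theta)]
x_dot = np.column_stack([x[:, 1], -np.sin(x[:, 0])])     # exact derivatives, for simplicity
print(sindy(x, x_dot))   # ideally recovers x1_dot = x2 and x2_dot = -sin(x1)
```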

SINDy + Autoencoder

A SINDy model is then used to show how the dynamics evolve in that low-dimensional coordinate system. This approach highlights the simplicity and sparsity of physical laws. Custom loss functions are often necessary to effectively train architectures in this parameterized space of functions; these loss functions are essential for optimizing models within the chosen architecture.

References
  • 1.
    K. Champion et al. (2019) Data-driven discovery of coordinates and governing equations. Proceedings of the National Academy of Sciences

Defining a Function Space

Architecture refers to the various types of structures used in machine learning models. These can include neural networks, support vector machines, regression models, and more. The goal of a machine learning model is to take input data X and predict an output Y using a function F that is learned through adjusting parameters like weights.

For example, in a neural network, X is the input, Y is the output, and θ represents the parameters to adjust. In a SINDy model, the goal is to predict the time derivative of the system state (X dot) using a library of candidate terms, such as polynomials, with weights θ. The choice of architecture helps to constrain the possible functions that can describe the input-output relationship.
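Schematically (my own notation, summarizing the two cases above): the feed-forward network and the SINDy model are both parameterized families of functions, and training searches that family for the best fit.

```latex
% General supervised architecture: a family of functions parameterized by theta
y \approx f(x;\, \theta)

% SINDy: the state derivative as a sparse combination of candidate library terms
\dot{x} \approx \Theta(x)\,\xi,
\qquad
\Theta(x) = \begin{bmatrix} 1 & x & x^{2} & \cdots & \sin x & \cdots \end{bmatrix}
```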

Architectures like feed-forward neural networks, autoencoders, and SINDy models all use different parameterizations to optimize the function to fit observed data. These architectures allow for the enforcement of symmetries, conservation laws, or simplicity in the model. Ultimately, the goal is to find the best function by tuning the free parameters using optimization algorithms and loss functions.

Turbulence Modelling: Galilean Invariance

Here are a few examples of interesting architectures. One of my favorites is from 2016, when Julia Ling and her collaborators built a deep neural network to predict Reynolds stresses for fluid flow simulations. This is important for modeling turbulence in industrial applications. The custom architecture they used in panel B includes a tensor input layer that enforces Galilean invariance, meaning the physics remains consistent in different reference frames.
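A simplified sketch of that idea (my own toy version, not the authors' implementation): a small network predicts scalar coefficients from invariant inputs, and the output is a linear combination of Galilean-invariant tensor basis elements, so the prediction inherits the invariance of the basis. The names, sizes, and the way the basis is supplied are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class TensorBasisNN(nn.Module):
    """Toy tensor-basis network: b = sum_n g_n(invariants) * T_n."""
    def __init__(self, n_invariants=5, n_basis=10, hidden=64):
        super().__init__()
        self.coeff_net = nn.Sequential(
            nn.Linear(n_invariants, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_basis),
        )

    def forward(self, invariants, tensor_basis):
        # invariants: (batch, n_invariants), scalar invariants of the mean flow
        # tensor_basis: (batch, n_basis, 3, 3), precomputed invariant basis tensors
        g = self.coeff_net(invariants)                          # (batch, n_basis)
        return torch.einsum("bn,bnij->bij", g, tensor_basis)    # predicted anisotropy tensor
```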

Another powerful architecture is the residual network, introduced in 2015. This type of deep architecture includes skip connections and is designed to behave like a numerical integrator. It has been widely cited and is commonly used in modern machine learning.
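A minimal residual block sketch (my own illustration): the skip connection means the block computes x + f(x), which has the same form as one forward-Euler step x_{k+1} = x_k + Δt f(x_k) of a dynamical system, with the step size absorbed into f.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """x -> x + f(x): the identity skip path plus a learned update."""
    def __init__(self, dim=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.f(x)

# Stacking blocks looks like integrating a dynamical system forward in time.
model = nn.Sequential(*[ResidualBlock(64) for _ in range(10)])
```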

References
  • 1.
    J. Ling et al. (2016) Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. Journal of Fluid Mechanics

ResNets

This architecture promotes the idea of stepping forward in time, making it ideal for time series data and dynamical systems. The U-Net architecture, discussed next, is highly effective for super-resolution and image segmentation.

References
  • 1.
    https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf

UNets

The U-Net, which forms the basis of many diffusion models, carries an implicit inductive bias: its structure reflects the multiscale nature of observations in the real world, both in space and time. When looking at a picture of the real world, this multiscale structure is evident. The architecture is adept at parameterizing natural images, such as scenes of traffic, cities, and similar objects.
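A minimal two-level sketch of that multiscale structure (my own toy example, far smaller than the Ronneberger et al. network): an encoder downsamples to capture coarse scales, a decoder upsamples back, and a skip connection reinjects fine-scale detail.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(nn.MaxPool2d(2),
                                  nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(ch, ch, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(ch, 1, 1))

    def forward(self, x):                       # x: (batch, 1, H, W), H and W even
        fine = self.enc(x)                      # fine-scale features
        coarse = self.down(fine)                # coarse-scale features at half resolution
        upsampled = self.up(coarse)             # back to full resolution
        return self.dec(torch.cat([fine, upsampled], dim=1))   # skip connection
```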

References
  • 1.
    O. Ronneberger et al. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation.

Physics Informed Neural Networks

Physics informed neural networks are an important topic that will be explored in depth. The lecture series will focus on this concept within the context of crafting a loss function in the fourth stage of machine learning. These architectures and loss functions are often intertwined, with custom loss functions relying on specific architectures and vice versa.

Physics informed neural networks are particularly useful for estimating complex quantities such as fluid velocity fields or other spatially varying fields. By utilizing automatic differentiation in neural network environments like PyTorch and TensorFlow, partial derivatives of these quantities can be computed without manual coding. These derivatives can then be incorporated into a loss function to enforce the satisfaction of physical laws, such as partial differential equations.

In essence, physics informed neural networks combine architecture and loss function elements to effectively utilize neural networks for obtaining the necessary quantities to satisfy the physics-based constraints within the loss function.
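A minimal sketch of that pattern (an illustration in the spirit of the PINN papers, not their code), assuming Burgers' equation u_t + u·u_x = ν·u_xx as the governing PDE: automatic differentiation supplies the derivatives, and the physics loss is the mean-squared PDE residual at collocation points.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))   # u(x, t) represented by a neural network
nu = 0.01                               # assumed viscosity

def pde_residual(x, t):
    x, t = x.requires_grad_(True), t.requires_grad_(True)
    u = net(torch.cat([x, t], dim=-1))
    grad = lambda out, inp: torch.autograd.grad(out, inp, torch.ones_like(out),
                                                create_graph=True)[0]
    u_t, u_x = grad(u, t), grad(u, x)
    u_xx = grad(u_x, x)
    return u_t + u * u_x - nu * u_xx    # the loss drives this residual toward zero

x = torch.rand(1000, 1) * 2 - 1         # collocation points in space
t = torch.rand(1000, 1)                 # collocation points in time
physics_loss = pde_residual(x, t).pow(2).mean()
```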

References
  • 1.
    M. Raissi et al. (2018) Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics

Lagrangian Neural Networks

Lagrangian neural networks and Hamiltonian neural networks are a good example of the intersection of architecture and loss function. If a system conserves energy or has a mechanical structure, that Lagrangian or Hamiltonian structure can be built into both the architecture and the loss function used to train the neural network. This area of research includes Lagrangian and Hamiltonian neural networks, as well as operator networks such as DeepONets and Fourier neural operators. These custom architectures can accelerate training and require less data because of the physical assumptions built in implicitly. Neural operators are another popular family of architectures in this field.
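A minimal Hamiltonian-network sketch of that idea (my own illustration; the Lagrangian version in the cited paper follows similar logic through the Euler-Lagrange equations): a network parameterizes a scalar energy H(q, p), and the dynamics come from Hamilton's equations via automatic differentiation, so conservation structure is built into the architecture.

```python
import torch
import torch.nn as nn

class HamiltonianNN(nn.Module):
    def __init__(self, dim=1, hidden=64):
        super().__init__()
        self.H = nn.Sequential(nn.Linear(2 * dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, hidden), nn.Tanh(),
                               nn.Linear(hidden, 1))   # scalar energy H(q, p)

    def forward(self, q, p):
        q, p = q.requires_grad_(True), p.requires_grad_(True)
        H = self.H(torch.cat([q, p], dim=-1)).sum()
        dHdq, dHdp = torch.autograd.grad(H, (q, p), create_graph=True)
        return dHdp, -dHdq    # Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq

q, p = torch.randn(32, 1), torch.randn(32, 1)     # a batch of states
dqdt, dpdt = HamiltonianNN()(q, p)                # predicted time derivatives
```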

References
  • 1.
    M. Cranmer et al. (2020) Lagrangian Neural Networks.

Deep Operator Networks

References
  • 1.
    Lu Lu et al. (2019) DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators.

Fourier Neural Operators

The Fourier neural operator is based on the idea that real-world physics is multiscale and efficiently represented in the Fourier domain. By incorporating Fourier layers into the neural operator, it implicitly assumes a multiscale nature of the physics. Graph neural networks are another example of building such assumptions into the architecture.
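A minimal one-dimensional spectral layer sketch (my own simplified version, not the authors' implementation): transform to the Fourier domain, apply learned complex weights to the lowest modes, and transform back, so each layer acts globally and across scales by construction.

```python
import torch
import torch.nn as nn

class FourierLayer1d(nn.Module):
    def __init__(self, channels=8, modes=16):
        super().__init__()
        self.modes = modes
        self.weights = nn.Parameter(
            torch.randn(channels, channels, modes, dtype=torch.cfloat) / channels)

    def forward(self, x):
        # x: (batch, channels, n_grid_points), with n_grid_points >= 2 * modes
        x_hat = torch.fft.rfft(x)                       # to the Fourier domain
        out_hat = torch.zeros_like(x_hat)
        out_hat[:, :, :self.modes] = torch.einsum(      # mix channels on the low modes only
            "bim,iom->bom", x_hat[:, :, :self.modes], self.weights)
        return torch.fft.irfft(out_hat, n=x.shape[-1])  # back to the spatial grid
```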

References
  • 1.
    Zongyi Li et al. (2020) Fourier Neural Operator for Parametric Partial Differential Equations.

Graph Neural Networks

Graph neural networks have produced impressive results in machine learning for physical systems. These networks have been used to discover laws of planetary motion that can be applied to multi-planet systems and to simulate fluid flows. GNNs are designed to incorporate assumptions about the structure of interactions, such as n-body systems, molecular dynamics, or rigid body systems. By integrating physics into these networks, there are numerous opportunities for advancement.

I am eager to learn more about this topic and plan to explore it further, so that we can delve into it together and understand the powerful demonstrations of efficient and accurate machine learning models simulating complex physics. One remarkable paper demonstrates the ability to simulate various fluids, elastic materials, and complicated partial differential equations using simple concepts in graph neural networks to incorporate the physics of the system.
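A minimal message-passing sketch of that idea (my own illustration, not the APIs of the cited papers): the same small MLPs model every pairwise interaction and every node update, encoding the assumption that local physics is shared across all particles.

```python
import torch
import torch.nn as nn

class InteractionLayer(nn.Module):
    def __init__(self, node_dim=4, hidden=64):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(2 * node_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden))
        self.node_mlp = nn.Sequential(nn.Linear(node_dim + hidden, hidden), nn.ReLU(),
                                      nn.Linear(hidden, node_dim))

    def forward(self, x, edges):
        # x: (n_nodes, node_dim) particle states; edges: (n_edges, 2) sender/receiver indices
        senders, receivers = edges[:, 0], edges[:, 1]
        messages = self.edge_mlp(torch.cat([x[senders], x[receivers]], dim=-1))
        aggregated = torch.zeros(x.shape[0], messages.shape[-1], device=x.device)
        aggregated.index_add_(0, receivers, messages)          # sum incoming messages
        return x + self.node_mlp(torch.cat([x, aggregated], dim=-1))   # residual node update
```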

References
  • 1.
    P. W. Battaglia et al. (2018) Relational inductive biases, deep learning, and graph networks.

The physics of one parcel is similar to that of another, so there may be a smaller set of rules governing local physical interactions. This is just an overview of roughly half the topics we will cover in detail in the coming hours. Symmetries and invariances are crucial for machine learning in physical systems. Invariance means the output should stay the same despite transformations like rotation or scaling; equivariance is slightly different, and both are important.

References
  • 1.
    https://www.youtube.com/watch?v=h7h9zF8OO7E
  • 2.
    https://proceedings.mlr.press/v119/sanchez-gonzalez20a.html

Invariance and Equivariance

In this discussion, we focus on classification and image segmentation with neural networks, where the output labels different parts of an image. Equivariance means that when an input is transformed and then processed by a machine learning model, the output reflects the same transformation: the function F of the model and the symmetry operation G commute, so that F(G(x)) = G(F(x)). Group theory helps determine when F and G commute.

Convolutional neural networks are known for promoting translation invariance, but research has shown how more general symmetry groups can be incorporated into other neural network architectures, such as autoencoders. By designing machine learning models with equivariant properties, we can reduce the amount of data needed for training and improve generalization.
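A quick numerical check of that commutation property (my own toy example): with circular padding, a convolutional layer commutes with a circular shift of the input, i.e. F(G(x)) = G(F(x)).

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, padding_mode="circular", bias=False)
x = torch.randn(1, 1, 32, 32)
shift = lambda img: torch.roll(img, shifts=(5, 3), dims=(-2, -1))   # symmetry operation G

print(torch.allclose(shift(conv(x)), conv(shift(x)), atol=1e-5))    # True: F and G commute
```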

Researchers like Max Welling and Tess Smidt have made significant contributions to the field of equivariant machine learning models. In the upcoming sections, we will explore how to build these models, the specific loss functions and architectures they use, and their efficiency and generalization capabilities.

Stay tuned for discussions on loss functions, optimization, and various examples of machine learning architectures. Thank you for your attention.