Statistical mechanics lessons for data-driven methods
Dr. Andrei A. Klishin
Associated pre-print
Data-driven methods are rapidly displacing traditional numerical schemes across many applications, from fluid dynamics to biochemical reaction networks to atmospheric chemistry. These methods work well if the parameters of the dataset and the hyperparameters of the algorithm are adjusted "just so" and the minimum of a well-chosen loss function is reliably reached. However, if the data is too noisy or the regularization is chosen incorrectly, the methods often confidently select a nonsense solution without prior warning. In this talk I use techniques from statistical mechanics to analyze the performance and failure of two popular data-driven methods. First, system identification attempts to reconstruct a sparse differential equation from noisy observations of trajectory data, but requires a lot of trial-and-error parameter tuning. By using a Bayesian inference framework with a sparsifying prior, I provide an uncertainty quantification of the identified model and a detailed anatomy of its sparsity- and noise-induced failure. Second, sparse sensing uses a training dataset of images to reconstruct a new image from just a few pixel-sized sensors. I show that the reconstruction quality is highly sensitive to the sensor locations, which can be explained by an effective energy landscape, and becomes highly unstable when the number of sensors matches the model dimension.
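To make the system identification setting concrete, below is a minimal sketch of sparse regression on noisy trajectory data in the SINDy style, using sequentially thresholded least squares rather than the Bayesian framework with a sparsifying prior discussed in the talk. The choice of the Lorenz system, the noise level, and the sparsity threshold are illustrative assumptions, not the settings used in the talk; the finite-difference derivative step is where measurement noise enters and can drive the confident-but-wrong failures described above.

```python
import numpy as np

# Assumed ground-truth dynamics for illustration: the Lorenz system.
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

# Simulate a trajectory with a simple Euler integrator.
dt, n_steps = 0.002, 5000
X = np.empty((n_steps, 3))
X[0] = [1.0, 1.0, 1.0]
for k in range(n_steps - 1):
    X[k + 1] = X[k] + dt * lorenz(X[k])

# Corrupt the observations with measurement noise (illustrative level).
rng = np.random.default_rng(0)
X_noisy = X + 0.01 * rng.standard_normal(X.shape)

# Approximate derivatives by finite differences -- a major noise amplifier.
dXdt = np.gradient(X_noisy, dt, axis=0)

# Build a polynomial feature library up to second order.
x, y, z = X_noisy.T
Theta = np.column_stack([np.ones(n_steps), x, y, z,
                         x * x, x * y, x * z, y * y, y * z, z * z])

# Sequentially thresholded least squares: zero out small coefficients,
# then refit the surviving terms, promoting a sparse model.
def stls(Theta, dXdt, threshold=0.1, n_iter=10):
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for j in range(dXdt.shape[1]):
            keep = ~small[:, j]
            if keep.any():
                Xi[keep, j] = np.linalg.lstsq(Theta[:, keep], dXdt[:, j],
                                              rcond=None)[0]
    return Xi

Xi = stls(Theta, dXdt)
print(np.round(Xi, 2))  # nonzero rows indicate the identified terms
```

With clean data and a well-chosen threshold this recovers the handful of true Lorenz terms; increasing the noise or mis-tuning the threshold produces a dense or wrong model with no warning, which is the failure mode the Bayesian treatment in the talk quantifies.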
- National Science Foundation 2112085