To make our way in the world, our brain must develop an intuitive understanding of the physical world around us, which we use to interpret the sensory information that comes to the brain.
How does the brain develop that intuitive understanding? Many scientists believe that it may use a process similar to what is known as “self-supervised learning.” This type of machine learning, originally developed as a way to create more efficient models for computer vision, allows computational models to learn about visual scenes based solely on the similarities and differences between them, without labels or other information.
A pair of studies from researchers at MIT’s K. Lisa Yang Integrative Computational Neuroscience (ICoN) Center offers new evidence supporting this hypothesis. The researchers found that when they trained models known as neural networks using a particular type of self-supervised learning, the resulting models generated activity patterns very similar to those seen in the brains of animals that were performing the same tasks as the models.
The findings suggest that these models are able to learn representations of the physical world that they can use to make accurate predictions about what will happen in that world, and that the mammalian brain may be using the same strategy, the researchers say.
“The theme of our work is that AI designed to help build better robots can also be a framework to better understand the brain in general,” said Aran Nayebi, a postdoc at the ICoN Center. “We can’t say whether it’s the whole brain, but across scales and different brain areas, our results seem to suggest an organizing principle.”
Nayebi is the lead author of one of the studies, in which he was joined by Rishi Rajalingham, a former MIT postdoc now at Meta Reality Labs, and senior authors Mehrdad Jazayeri, an associate professor of brain and cognitive sciences and a member of the McGovern Institute for Brain Research, and Robert Yang, an assistant professor of brain and cognitive sciences and an associate member of the McGovern Institute. Ila Fiete, director of the ICoN Center, a professor of brain and cognitive sciences, and an associate member of the McGovern Institute, is the senior author of the other study, which was led by Mikail Khona, an MIT graduate student, and Rylan Schaeffer, a former senior research associate at MIT.
Both studies will be presented at the 2023 Conference on Neural Information Processing Systems (NeurIPS) in December.
Modeling the physical world
Early models of computer vision mainly relied on supervised learning. Using this method, models are trained to classify images that are each labeled with a name — cat, car, etc.
To create a more efficient alternative, in recent years researchers have turned to models created through a technique known as contrastive self-supervised learning. This type of learning allows an algorithm to learn to classify objects based on their similarity to each other, without providing external labels.
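For readers who want to see the mechanics, the heart of a contrastive objective fits in a few lines. The sketch below uses an InfoNCE-style loss in PyTorch; the function name, shapes, and temperature value are illustrative stand-ins, not code from either study.

```python
# A minimal sketch of a contrastive (InfoNCE-style) objective, similar in
# spirit to SimCLR. Names and shapes are illustrative, not from the studies.
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two views of the same items.
    Matching rows are pulled together; mismatched pairs are pushed apart."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature        # pairwise cosine similarities
    targets = torch.arange(z1.size(0))      # row i should match column i
    return F.cross_entropy(logits, targets)

# Toy usage: embeddings of 8 items, 32 dimensions each.
z1, z2 = torch.randn(8, 32), torch.randn(8, 32)
print(info_nce_loss(z1, z2))
```

Nothing in this loss refers to labels: the only supervision signal is which pairs of embeddings come from the same underlying item.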
“It’s a very powerful approach because you can use a lot of modern data sets, especially videos, and really unlock their potential,” Nayebi said. “A lot of the modern AI that you see today, especially in the last few years with ChatGPT and GPT-4, is a result of training a self-supervised objective on a large-scale data set to obtain a very flexible representation.”
These types of models, also called neural networks, consist of thousands or millions of processing units connected to each other. Each node has connections of varying strengths to other nodes in the network. As the network analyzes more data, the strengths of those connections change, and the network learns to perform the desired task.
As the model performs a particular task, the activity patterns of different units within the network can be measured. Each unit’s activity can be represented as a firing pattern, similar to the firing patterns of neurons in the brain. Previous work from Nayebi and others has shown that self-supervised models of vision generate activity similar to that seen in the visual processing system of mammalian brains.
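Measuring those unit activations is mechanically simple. Below is a minimal sketch, with a placeholder two-layer network and random inputs rather than the study’s actual models, of how per-unit activity can be recorded with a forward hook:

```python
# Record the activity of each hidden unit across a batch of inputs,
# the model-side analogue of recording firing rates across trials.
# The architecture and inputs are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 4))

activations = {}
def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()   # one value per hidden unit
    return hook

model[1].register_forward_hook(save_activation("hidden"))

x = torch.randn(100, 10)              # 100 "trials" of input
_ = model(x)
print(activations["hidden"].shape)    # (100, 64): trials x units
```

The resulting trials-by-units matrix is what gets compared against recordings of real neurons responding to the same task.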
In the two new NeurIPS studies, the researchers set out to explore whether self-supervised computational models of other cognitive functions might also show similarities to the mammalian brain. In the study led by Nayebi, the researchers trained self-supervised models to predict the future state of their environment across hundreds of thousands of naturalistic videos depicting everyday scenarios.
“For the past decade or so, the dominant approach to building neural network models in cognitive neuroscience has been to train these networks on individual cognitive tasks. But models trained this way rarely generalize to other tasks,” Yang said. “Here we test whether we can build models for some aspects of cognition by first training on naturalistic data using self-supervised learning, then evaluating them in lab settings.”
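In code, a future-prediction objective of this general flavor might look like the sketch below, where the per-frame encoder, recurrent dynamics model, and frame sizes are all toy stand-ins for the study’s actual architecture:

```python
# Encode past frames, then predict the embedding of the next frame.
# All shapes and modules here are toy placeholders.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128))  # per-frame encoder
dynamics = nn.GRU(input_size=128, hidden_size=128, batch_first=True)
head = nn.Linear(128, 128)                                      # next-embedding predictor

frames = torch.randn(8, 10, 64, 64)               # a batch of 10-frame clips
z = encoder(frames.reshape(-1, 64, 64)).reshape(8, 10, 128)
hidden, _ = dynamics(z[:, :-1])                   # summarize frames 0..8
pred = head(hidden[:, -1])                        # predict embedding of frame 9
loss = ((pred - z[:, -1].detach()) ** 2).mean()   # future-prediction error
loss.backward()
```

A model trained this way never sees labels; it is rewarded only for anticipating how a scene will unfold.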
Once the model was trained, the researchers had it generalize to a task they call “Mental-Pong.” This task is similar to the video game Pong, in which a player moves a paddle to hit a ball traveling across the screen. In the Mental-Pong version, the ball disappears shortly before reaching the paddle, so the player must estimate its trajectory in order to hit it.
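A toy version of that logic, purely illustrative, shows why tracking a hidden ball requires an internal model: once the ball disappears, the estimate can only be advanced by simulated dynamics, and any error in the estimated velocity compounds over time.

```python
# Toy "Mental-Pong": the true ball keeps moving after it disappears,
# so the player must advance an internal estimate instead. Values are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
pos = np.array([0.0, 0.5])
vel = np.array([0.12, 0.03])
vel_est = vel + rng.normal(0, 0.01, size=2)   # imperfect velocity estimate
occlude_after = 10                            # ball vanishes at this step

estimate = pos.copy()
for t in range(20):
    pos = pos + vel                           # true dynamics always run
    if t < occlude_after:
        estimate = pos.copy()                 # visible: track it directly
    else:
        estimate = estimate + vel_est         # hidden: mentally simulate
print("true:", pos, "estimate:", estimate,
      "error:", np.linalg.norm(pos - estimate))
```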
The researchers found that the model was able to track the hidden ball’s trajectory with accuracy that closely resembled that of neurons in the mammalian brain, which had been shown in a previous study by Rajalingham and Jazayeri to simulate its trajectory, a cognitive phenomenon known as “mental simulation.” Furthermore, the neural activation patterns seen within the model were similar to those seen in the brains of animals as they played the game, specifically in a part of the brain called the dorsomedial frontal cortex. No other kind of computational model has been able to match the biological data as closely as this one, the researchers said.
“There are many efforts in the machine learning community to create artificial intelligence,” Jazayeri said. “The relevance of these models to neurobiology depends on their ability to further capture the inner workings of the brain. The fact that Aran’s model predicts neural data is really important because it suggests that we may be getting closer to building artificial systems that emulate natural intelligence.”
Navigating the world
The study led by Khona, Schaeffer, and Fiete focused on a type of specialized neurons known as grid cells. These cells, located in the entorhinal cortex, help animals to navigate, working together with place cells in the hippocampus.
While place cells fire whenever an animal is in a specific location, grid cells fire only when the animal is at one of the vertices of a triangular lattice. Groups of grid cells create overlapping lattices of different sizes, which allows them to encode a large number of positions using a relatively small number of cells.
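The classic idealization of a grid cell’s firing map, a textbook model rather than anything specific to this study, is a sum of three plane waves whose directions are 60 degrees apart; the sum peaks at the vertices of a triangular lattice, and changing the period gives the different module scales described above.

```python
# Idealized grid-cell rate map: three cosines at 60-degree offsets produce
# a triangular lattice of firing fields. Periods are illustrative.
import numpy as np

def grid_rate(xy, period):
    angles = np.deg2rad([0, 60, 120])
    dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (3, 2) wave directions
    phases = 2 * np.pi * (xy @ dirs.T) / period
    return np.cos(phases).sum(axis=-1)        # peaks on a triangular lattice

xs = np.linspace(0, 2, 200)
pts = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)    # 2-D positions
small_module = grid_rate(pts, period=0.4)     # fine-scale module
large_module = grid_rate(pts, period=0.7)     # coarse-scale module
# Reading out both modules together pins down position far more precisely
# than either lattice alone, which is the combinatorial coding idea above.
```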
In recent studies, researchers have trained supervised neural networks to mimic grid cell function by predicting an animal’s next location based on its starting point and velocity, a task known as path integration. However, these models depend on access to privileged information about absolute space at all times, information that the animal does not have.
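Path integration itself is easy to state: position is the running sum of velocity. A supervised model gets the true positions as training targets, which is exactly the privileged information at issue. A toy sketch:

```python
# Path integration: integrate velocity to get position. Toy random walk.
import numpy as np

rng = np.random.default_rng(1)
velocities = rng.normal(0, 0.1, size=(50, 2))   # 50 velocity inputs
positions = np.cumsum(velocities, axis=0)       # integrated trajectory

# A supervised path-integration network is trained to map the velocity
# sequence to `positions` directly, presuming ground-truth coordinates
# that the animal never observes.
```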
Inspired by the unique coding properties of the multiperiodic grid-cell code for space, the MIT team trained a contrastive self-supervised model to perform this same path integration task and to represent space efficiently while doing so. For the training data, they used sequences of velocity inputs. The model learned to determine positions based on whether they were similar or different: nearby positions generated similar codes, while positions farther apart generated more different codes.
“It’s like training models on images, where if two images are both cats’ heads, their codes should be similar, but if one is a cat’s head and the other is a truck, then you want their codes to repel,” Khona says. “We took the same idea but applied it to spatial trajectories.”
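A rough sketch of that idea in code, with a placeholder encoder and arbitrary distance thresholds rather than the paper’s actual model: positions sampled from trajectories get codes that are pulled together when the positions are near each other and pushed apart otherwise.

```python
# Contrastive coding of space: nearby positions -> similar codes,
# distant positions -> dissimilar codes. Encoder and thresholds are toy choices.
import torch
import torch.nn.functional as F

encoder = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(),
                              torch.nn.Linear(64, 16))

pos = torch.rand(256, 2)                 # positions sampled along trajectories
codes = F.normalize(encoder(pos), dim=1)

dist = torch.cdist(pos, pos)             # pairwise spatial distances
sim = codes @ codes.T                    # pairwise code similarities
close = (dist < 0.1).float()             # which pairs count as "nearby"

# Pull nearby pairs' codes together, push distant pairs apart.
loss = (close * (1 - sim) + (1 - close) * torch.relu(sim - 0.5)).mean()
loss.backward()
```

Note that nothing here ever references absolute coordinates; the objective is defined entirely by relations between positions.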
Once the model was trained, the researchers found that the activation patterns of the nodes within the model formed multiple lattice patterns with different periods, very similar to those formed by grid cells in the brain.
“What excites me about this work is that it makes connections between mathematical work on the unique information-theoretic properties of grid cell codes and the computation of path integration,” said Fiete. “While the mathematical work is analytic (what properties does the grid cell code have?), the approach of optimizing coding efficiency through self-supervised learning and obtaining grid-like tuning is synthetic: It shows what properties are necessary and sufficient to explain why the brain has grid cells.”
The research was funded by the K. Lisa Yang ICoN Center, the National Institutes of Health, the Simons Foundation, the McKnight Foundation, the McGovern Institute, and the Helen Hay Whitney Foundation.