How BRAIN CoGS Learned to Share: A Case Study

Members of the multi-lab BRAIN CoGS collaboration are using DataJoint to more easily access each other’s data.

When Manuel Schottdorf arrived at Princeton last year to begin his postdoc in David Tank’s lab, he found that a large portion of the lab’s data was stored in immense Google spreadsheets housing millions of entries. Coming from the world of physics, Schottdorf was surprised at the lack of a more formal data infrastructure.

The system worked well enough for individual researchers. But the lab had recently won a large-scale collaborative NIH grant with seven other groups, called BRAIN CoGS, aimed at deciphering how the brain makes decisions based on working memory. These computations span the brain, so the member labs, each with expertise in different brain regions, would need to pool their data. “That data is so large, there is no way the experimenter who collected it can carry out all the analyses,” says Carlos Brody, a neuroscientist at Princeton and an investigator with BRAIN CoGS and the Simons Collaboration on the Global Brain (SCGB). “Sharing the data so that others can work on it will be very valuable.”

BRAIN CoGS researchers use DataJoint to manage the many types of data they need to store, analyze and share. These tools allow researchers to easily relate experimental details, such as the animal’s training history, to a specific imaging experiment. Credit: Manuel Schottdorf

Each component of an experiment, such as the microscope, involves scores of parameters that can change from experiment to experiment. To share and compile data in a useful way, researchers need a system to manage all of them. Credit: Manuel Schottdorf

Each group has its own methods for storing data, and each method comes with its own idiosyncrasies: one lab might group data by animal, another by day or experiment. “If you want to look across a dataset, you have to talk to each person to find out how they organize it, which takes a lot of time and effort,” Brody says. To develop a more robust system for organizing and sharing data, BRAIN CoGS labs are adopting DataJoint, a tool for creating and managing scientific data pipelines. “DataJoint helps you put a lot of that data in a database,” Brody says. “Then if you want to access it by day or by mouse ID or task, DataJoint will figure out how to access it and get that data.”
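To make the idea concrete, here is a minimal sketch of why a database helps: once sessions live in one table, the same records can be retrieved by day, by mouse ID or by task without knowing how any one lab files its data. This is not DataJoint's API or the BRAIN CoGS schema. It uses Python's built-in sqlite3 module as a stand-in, and the table name, column names, helper function and every value in it are hypothetical, invented purely for illustration.

```python
import sqlite3

# Stand-in for a shared lab database. The schema and all rows below are
# hypothetical and made up for illustration; they are not real BRAIN CoGS data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE session (
        mouse_id     TEXT NOT NULL,
        session_date TEXT NOT NULL,
        task         TEXT NOT NULL,
        lab          TEXT NOT NULL,
        PRIMARY KEY (mouse_id, session_date)
    )
    """
)
conn.executemany(
    "INSERT INTO session VALUES (?, ?, ?, ?)",
    [
        ("m01", "2020-01-06", "task_a", "lab_1"),
        ("m01", "2020-01-07", "task_b", "lab_1"),
        ("m02", "2020-01-06", "task_a", "lab_2"),
    ],
)

ALLOWED_COLUMNS = {"mouse_id", "session_date", "task", "lab"}


def find_sessions(**criteria):
    """Return every session row that matches all column=value restrictions."""
    unknown = set(criteria) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    # Column names are checked against a whitelist above; values are bound
    # as parameters so they are never spliced into the SQL text.
    where = " AND ".join(f"{column} = ?" for column in criteria) or "1"
    sql = (
        "SELECT mouse_id, session_date, task, lab FROM session "
        f"WHERE {where} ORDER BY mouse_id, session_date"
    )
    return conn.execute(sql, tuple(criteria.values())).fetchall()


# The same table answers three differently organized questions.
by_day = find_sessions(session_date="2020-01-06")
by_mouse = find_sessions(mouse_id="m01")
by_task = find_sessions(task="task_a")
```

In this toy version, a researcher who thinks in terms of animals and one who thinks in terms of recording days both query the same table, which is the kind of access Brody describes. A real pipeline would add linked tables for things like training history and imaging settings.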

Brigitte Stark