Data Complexity/Scope
The number of modalities and data types, and the dimensions along which these data differ, e.g., data types that require different licenses, carry different levels of identifiability, or fall under different sharing, IRB, and HIPAA controls.
Best Practices:
- Organize data with sufficient metadata (information about interpretation and provenance) so that they can be accessed and queried programmatically (e.g., in a centralized data repository or a data lake) and are also understandable by humans.
- Generate metadata from pipelines.
- Consider using a standardized data format, such as the BIDS data format.
- Consider using file formats that can be organized and queried with SQL and easily parsed, such as CSV.
- Use consistent naming conventions across projects and data types. Your life will be much easier if you adopt the conventions of wherever your data will end up, or of whatever your tools expect.
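As a minimal sketch of the CSV and SQL suggestion above, the following loads a small CSV table into an in-memory SQLite database and queries it. The column names, the `sub-XX` labels, and the `scans` table name are illustrative assumptions, not part of any required schema; in practice the CSV would be a file on disk rather than an inline string.

```python
import csv
import io
import sqlite3

# Hypothetical scan listing; a real project would read this from a CSV file.
CSV_TEXT = """participant_id,age,modality
sub-01,34,T1w
sub-02,29,T1w
sub-02,29,dwi
sub-03,41,bold
"""


def load_csv_into_sqlite(csv_text, table="scans"):
    """Load CSV text into an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn.execute(
        f"CREATE TABLE {table} (participant_id TEXT, age INTEGER, modality TEXT)"
    )
    # Named placeholders are filled from each row dictionary.
    conn.executemany(
        f"INSERT INTO {table} VALUES (:participant_id, :age, :modality)", rows
    )
    return conn


conn = load_csv_into_sqlite(CSV_TEXT)

# Which participants have a T1-weighted scan?
t1_subjects = [
    row[0]
    for row in conn.execute(
        "SELECT participant_id FROM scans "
        "WHERE modality = 'T1w' ORDER BY participant_id"
    )
]
print(t1_subjects)
```

Because every row uses the same participant label format, the same query keeps working as new modalities or sites are added, which is the practical payoff of consistent naming.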
Things to Avoid:
- Thinking that you are going to "organize the data later". Organization is best built into the pipelines from the start.
- Keeping metadata separate from the data, on the assumption that you will be able to integrate them later.
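One way to avoid both pitfalls is to have each pipeline step write its metadata at the same moment it writes its output. The sketch below is a hypothetical helper, not a BIDS-validated implementation: the function name `save_with_sidecar`, the `.dat` extension, and the sidecar field names (`subject`, `pipeline`, `parameters`, `created_utc`) are all assumptions chosen for illustration.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_with_sidecar(out_dir, subject, suffix, data_bytes, params):
    """Write an output file and a JSON sidecar sharing the same file stem."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # One naming rule for both files keeps data and metadata paired.
    stem = f"sub-{subject}_{suffix}"
    data_path = out_dir / f"{stem}.dat"
    meta_path = out_dir / f"{stem}.json"

    data_path.write_bytes(data_bytes)

    # Hypothetical provenance fields, recorded when the data are produced.
    metadata = {
        "subject": subject,
        "pipeline": "example_pipeline",
        "parameters": params,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    meta_path.write_text(json.dumps(metadata, indent=2))
    return data_path, meta_path


with tempfile.TemporaryDirectory() as tmp:
    data_path, meta_path = save_with_sidecar(
        tmp, "01", "desc-smoothed_bold", b"\x00\x01", {"fwhm_mm": 6}
    )
    sidecar = json.loads(meta_path.read_text())
    print(data_path.name, meta_path.name, sidecar["parameters"])
```

Since the sidecar is produced inside the same function call as the data, there is no later integration step to forget.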
Value Set Definitions:
- Low: A limited number of neuroimaging data types
- Medium: Multiple structural and functional neuroimaging types coming from multiple sources and covered by different licenses
- High: Multiple structural and functional neuroimaging types as well as other data types, such as behavioral data and/or sequence data
Value of Use Case Example:
High - Jordan will assess a range of behavioral measures (individual differences in psychopathology, personality and behavior) out of the scanner, as well as T1, T2, resting state and diffusion imaging. They are also considering obtaining DNA samples and the possibility of doing mobile sensing with participants, collecting information about activity, sleep, geolocation, and potentially even scraping information about app usage, frequency of texts and calls, etc.
Discussion of Use Case:
The acquisition or use of only a single data type, especially when the amount of data is not large, may not warrant cloud-based resources unless other considerations apply, such as a lack of resources at the investigator's home institution, the need to coordinate data collection/processing across multiple sites, or the need to share data in a way that the home institution cannot support. In Jordan's case, the high complexity/scope of the data they propose to collect may make this project appropriate for the use of cloud-based resources.
See Also:
- BIDS format for data organization
- The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments
- MEG-BIDS, the brain imaging data structure extended to magnetoencephalography