Data size
This dimension covers the number of participants, the number of files per participant, and the size of those files, as well as the number of copies of each file and whether the data will be downloaded.
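The factors above multiply into a rough storage footprint. The following is a minimal sketch of that arithmetic, not a prescribed method; the function name and the example study figures are hypothetical and chosen only for illustration.

```python
def estimate_storage_bytes(n_participants, files_per_participant,
                           mean_file_bytes, n_copies=1):
    """Return a rough total storage footprint in bytes.

    Assumes every participant has the same number of files and that
    mean_file_bytes is a reasonable average across file types.
    """
    inputs = (n_participants, files_per_participant, mean_file_bytes, n_copies)
    if min(inputs) < 0:
        raise ValueError("all inputs must be non-negative")
    return n_participants * files_per_participant * mean_file_bytes * n_copies


TB = 10**12  # decimal terabyte

# Hypothetical study: 500 participants, 40 files each, 250 MB per file, 2 copies.
total = estimate_storage_bytes(500, 40, 250 * 10**6, n_copies=2)
print(f"{total / TB:.1f} TB")  # prints "10.0 TB"
```

Halving the number of copies halves the estimate, which is why the number of copies is listed alongside the file counts and sizes.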
Best Practices:
- Do not maintain copies of data if not necessary (see number of copies, below).
- Store data so that it can be queried/explored (see number of copies, below).
- Organize data to make it easy to optimize cost of cloud storage by using different storage classes (set up rules so this can be done automatically). AWS recently added intelligent tiering to make this even easier; see https://aws.amazon.com/blogs/aws/s3-intelligent-tiering-adds-archive-access-tiers/
- Separate derived products from archival/raw data (using different folders, for example).
- Use consistent naming conventions across projects (see BIDS format below).
- Raw data should be saved with read-only permissions to avoid accidental changes or deletion.
- Focusing on the short term: Consider how data sizes and ingress/egress may change over the course of the study. If you plan to remove your data from the Cloud at the end of the project, it is a good idea to reserve money in the budget for that in advance.
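As one illustration of the storage-class and folder-separation practices above, the sketch below builds an S3 lifecycle configuration as a plain Python dictionary. The prefixes `raw/` and `derivatives/`, the rule IDs, the day thresholds, and the bucket name in the comment are all assumptions made for this example; appropriate values depend on the project's access patterns and budget.

```python
import json


def build_lifecycle_rules(raw_prefix="raw/", derived_prefix="derivatives/"):
    """Build a lifecycle configuration dict in the shape S3 expects.

    Prefixes, rule IDs, and day counts are illustrative assumptions.
    """
    return {
        "Rules": [
            {
                # Raw data is rarely re-read, so move it to cheaper classes over time.
                "ID": "archive-raw-data",
                "Filter": {"Prefix": raw_prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
            },
            {
                # Derived products have less predictable access, so let
                # Intelligent-Tiering choose the tier.
                "ID": "tier-derived-products",
                "Filter": {"Prefix": derived_prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"},
                ],
            },
        ]
    }


config = build_lifecycle_rules()
print(json.dumps(config, indent=2))

# To apply the rules with boto3 (bucket name is hypothetical):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-study-bucket", LifecycleConfiguration=config)
```

Keeping raw and derived data under separate prefixes is what allows each rule to target one kind of data without affecting the other.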
Things to Avoid:
- Do not group all files into a single archive (e.g., a tarball) per participant.
- Do not duplicate raw data across researchers working on the same project. Consider a shared raw data repository.
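One way to check whether raw data has already been duplicated across researchers is to compare file contents by hash. The sketch below assumes the files are reachable on a local or mounted filesystem under a single root directory; the function names are hypothetical, and this is only one possible approach.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are byte-identical."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]


if __name__ == "__main__":
    # "." is a placeholder; point this at the shared project directory.
    for group in find_duplicates("."):
        print("Identical files:", ", ".join(str(p) for p in group))
```

Each group reported is a candidate for consolidation into a shared raw data repository.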
Value Set Definitions:
- Low: Researcher has basic familiarity with neuroimaging tools and workflows in a local environment, but little or no experience with cloud-based computing. Researchers who would label themselves as low on this dimension should consider carefully whether they have the time and resources needed to develop the necessary expertise in order to use cloud-based resources for their projects. Even though the research group may not have the necessary expertise now, the group will want to think about whether they need to develop this expertise for their future research efforts.
- Medium: Researcher has good computational and data skills but only modest cloud-based computing experience.
- High: Researcher has computational and data skills; has cloud-based computing experience.
Value of Use Case Example:
Yes or no, based on whether the size of the data is sufficient (on the order of terabytes or more) to warrant moving it to the Cloud.
Discussion of Use Case:
A modest amount of data may not justify the use of cloud-based resources unless other considerations favor the Cloud, such as a lack of resources at the investigator's home institution, the need to coordinate data collection/processing across multiple sites, or the need to share data in a way that cannot be supported by the home institution.
See Also:
- BIDS format for data organization
- AWS user guides for Cloud setup
- The Brain Imaging Data Structure, a format for organizing and describing outputs of neuroimaging experiments
- MEG-BIDS, the Brain Imaging Data Structure extended to magnetoencephalography