Costs
Which direct costs for compute, storage, and network use are borne by the researcher? Issues include short-term costs (incurred while conducting the study and analysis) and long-term costs for storage; the cost of curating and organizing data, both the data you generate and the outputs; the cost of complying with and using standards; and the cost of computing.
Best Practices:
- Use available tools to estimate storage and compute costs of the project.
- Plan for ongoing costs.
- Determine whether subsidies are available from the institution, granting agencies, or other sources.
- Remember that commercial cloud costs are “pay as you go”: you are billed for the resources you use, for as long as you use them.
- Estimate on the high side and include a cushion.
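The best practices above can be combined into a rough back-of-the-envelope model. This is a minimal sketch, not a pricing tool: the function name `estimate_project_cost` and every unit price in it are hypothetical placeholders, so substitute figures from your provider's own cost calculator before relying on the result.

```python
# A rough, hypothetical cost model. All unit prices below are placeholders,
# not real quotes; replace them with figures from your provider's calculator.
STORAGE_PER_GB_MONTH = 0.02  # assumed standard object-storage price (USD)
COMPUTE_PER_VM_HOUR = 0.50   # assumed on-demand VM price (USD)
EGRESS_PER_GB = 0.09         # assumed network-egress price (USD)


def estimate_project_cost(stored_gb, months, vm_hours, egress_gb, cushion=0.25):
    """Return (base, padded) estimated USD cost over the whole project.

    `cushion` is the fractional safety margin added on top of the base
    estimate (0.25 = 25%), following the advice to estimate on the high side.
    """
    storage = stored_gb * STORAGE_PER_GB_MONTH * months  # storage is billed every month
    compute = vm_hours * COMPUTE_PER_VM_HOUR             # pay-as-you-go compute hours
    network = egress_gb * EGRESS_PER_GB                  # copying data out of the cloud
    base = storage + compute + network
    return base, base * (1 + cushion)


if __name__ == "__main__":
    base, padded = estimate_project_cost(
        stored_gb=5_000, months=36, vm_hours=4_000, egress_gb=2_000
    )
    print(f"base ${base:,.2f}, with cushion ${padded:,.2f}")
    # prints: base $5,780.00, with cushion $7,225.00
```

Storage is multiplied by the number of months because it accrues for as long as the data is kept, which is why planning for ongoing costs matters even after compute work ends.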
Things to Avoid:
- Paying to store unnecessary additional copies of data
- Not taking advantage of archival ("cold") storage for infrequently accessed data
- Not accounting for network costs associated with copying data, such as egress charges for moving data out of a cloud
- Not accounting for access costs, or not taking advantage of "requester pays" capabilities
- Forgetting to turn off machines you are not using
- Not taking advantage of free-tier compute resources
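Two of the pitfalls above, skipping cold storage and leaving machines running, are easy to quantify. The sketch below is illustrative only: the function names and all prices are hypothetical assumptions, and real archival tiers usually add retrieval fees and minimum-storage-duration charges that this simple model ignores.

```python
# Hypothetical prices (USD) for illustration only; check your provider's current rates.
STANDARD_PER_GB_MONTH = 0.02   # assumed standard ("hot") object-storage price
ARCHIVE_PER_GB_MONTH = 0.004   # assumed archival ("cold") storage price
VM_PER_HOUR = 0.50             # assumed on-demand VM price
HOURS_PER_MONTH = 730          # average hours per month (8,760 / 12)


def archive_savings(gb, months):
    """Storage cost avoided by keeping rarely accessed data in cold storage.

    Ignores retrieval fees and minimum-storage-duration charges, which
    archival tiers often add.
    """
    return gb * (STANDARD_PER_GB_MONTH - ARCHIVE_PER_GB_MONTH) * months


def idle_vm_cost(hours_used_per_month, months):
    """Extra cost of leaving a VM running 24/7 when it is only needed
    for `hours_used_per_month` hours each month."""
    idle_hours = max(HOURS_PER_MONTH - hours_used_per_month, 0)
    return idle_hours * VM_PER_HOUR * months


if __name__ == "__main__":
    print(f"cold-storage savings: ${archive_savings(10_000, 12):,.2f}")
    print(f"idle VM cost: ${idle_vm_cost(160, 12):,.2f}")
    # prints:
    # cold-storage savings: $1,920.00
    # idle VM cost: $3,420.00
```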
Value Set Definitions:
- Low: Relatively low costs ($10,000 or less)
- Medium: Greater than $10,000, but less than $25,000
- High: $25,000 or more
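The value set boundaries can be written as a simple lookup, which makes the edge cases explicit: exactly $10,000 is Low and exactly $25,000 is High. This is a minimal sketch; the function name `cost_value` is hypothetical.

```python
def cost_value(total_usd: float) -> str:
    """Map an estimated total project cost (USD) to the Low/Medium/High value set."""
    if total_usd <= 10_000:   # Low: $10,000 or less
        return "Low"
    if total_usd < 25_000:    # Medium: more than $10,000 but less than $25,000
        return "Medium"
    return "High"             # High: $25,000 or more


if __name__ == "__main__":
    for cost in (10_000, 18_500, 25_000):
        print(cost, cost_value(cost))
    # prints:
    # 10000 Low
    # 18500 Medium
    # 25000 High
```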
Value of Use Case Example:
High: The volume of data and the duration of storage and compute demands are likely to be considerable.
Discussion of Use Case:
Jordan will need either to ensure prospectively that her budget includes sufficient resources for all planned cloud computing and storage costs, or to determine, before embarking on cloud computing and storage, whether her budget will be sufficient for the duration of the project and all of its needs. If her budget does not initially cover these costs, she could pursue additional institutional or federal funds that may be available. It will be particularly important for Jordan to plan for costs that will arise throughout the life of the project, including any longer-term archiving or sharing costs.
See Also:
- How to control cloud costs: Use cases and discussion by Terra
- Cloudbank: Cost estimation
- Cost Calculators
- Life Cycle Decisions for Biomedical Data: National Academies of Sciences, Engineering, and Medicine report (see in particular Chapter 4 and Appendix E)
- NIH STRIDES: The NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative “allows NIH to explore the use of cloud environments to streamline NIH data use by partnering with commercial providers. NIH’s STRIDES Initiative provides cost-effective access to industry-leading partners to help advance biomedical research. These partnerships enable access to rich datasets and advanced computational infrastructure, tools, and services.” Through this program, researchers with an active NIH award may use the STRIDES Initiative for their NIH-funded research projects. The STRIDES Initiative provides:
  - Favorable pricing on computing, storage, and related cloud services
  - Access to training for researchers, data owners, and others to help ensure optimal use of available tools and technologies
  - Access to professional service consultations and technical support from the STRIDES Initiative partners
- Check whether your institution has policies regarding cloud use that can affect costs, such as academic discounts or waivers of indirect costs for cloud computing (e.g., see https://itconnect.uw.edu/research/waiver/).
- Running Neuroimaging Applications on Amazon Web Services: How, When, and at What Cost?
- Amazon S3 documentation on Requester Pays buckets: https://docs.aws.amazon.com/AmazonS3/latest/dev/RequesterPaysBuckets.html