The odds that a published dataset can still be obtained fall by about 17% for every year after publication, according to a 2014 study in Current Biology, and where (and whether) that data gets deposited largely determines whether it survives. Choosing the right research data repository is no longer optional. Funder mandates from the NIH, Wellcome Trust, and European Commission now require or strongly encourage researchers to share data in FAIR-compliant repositories, and journals increasingly reject manuscripts that lack a proper data availability statement. Yet with dozens of generalist and discipline-specific platforms available, most researchers still struggle to figure out which one actually fits their workflow.
This guide compares the best research data repositories in 2026 — including Zenodo, Figshare, Dryad, the Open Science Framework (OSF), Mendeley Data, and Harvard Dataverse — across the criteria that matter most: storage limits, curation quality, metadata standards, funder compliance, and cost. Whether you are a principal investigator managing multi-site datasets, a PhD candidate depositing your first supplementary files, or a lab manager standardizing data sharing across a research group, this comparison will help you make an informed decision.
What is a research data repository and why does it matter?
A research data repository is a platform designed to store and preserve research datasets and to make them discoverable and citable. Unlike generic cloud storage (Google Drive, Dropbox), a dedicated data repository assigns a persistent identifier, typically a Digital Object Identifier (DOI), to every deposited dataset, registers rich metadata in global indexes, and commits to long-term preservation.
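To make the idea of a persistent identifier concrete, here is a minimal Python sketch that turns a DOI, however it was pasted, into its resolver URL. The regular expression is a common heuristic for modern DOIs rather than a complete validator, and the DOI in the usage line is a made-up placeholder, not a real dataset.

```python
import re

# Heuristic shape of a modern DOI: "10.", a 4-9 digit registrant code,
# a slash, then a non-empty suffix. This is not a full validator.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes people often paste along with the DOI itself.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(raw: str) -> str:
    """Return the https://doi.org/ resolver URL for a DOI string."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    return f"https://doi.org/{doi}"


# Hypothetical DOI, used only for illustration.
print(doi_to_url("doi:10.1234/example.5678"))
```

Because the resolver URL stays the same even if the repository moves the landing page, it is the form worth citing in a data availability statement.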
Repositories matter because they are the infrastructure behind reproducible science. When a dataset is stored in a trusted repository, other researchers can locate it, verify findings, and build on previous work. Funders and publishers recognize this: the NIH Data Management and Sharing Policy (effective since January 2023) requires all NIH-funded researchers to submit a Data Management and Sharing Plan that specifies where data will be deposited. Similar mandates exist across the EU (Horizon Europe), the UK (UKRI), and Australia (ARC).
For research teams, the challenge is not just depositing a dataset — it is organizing, documenting, and tracking datasets across multiple projects before they ever reach a repository. This is where a research project and reference management platform like ScholarDock becomes essential: it helps teams structure their data, connect it to the papers and references that produced it, and track which datasets have been deposited and where.
How to choose the right data repository for your research
Before comparing individual platforms, it helps to understand the key criteria researchers should evaluate. The NIH-GREI (Generalist Repository Ecosystem Initiative) consortium and the Generalist Repository Comparison Chart — maintained collaboratively by major repository providers — recommend assessing repositories on the following dimensions:
Accepted data types and formats — Does the repository accept all file types, or is it restricted to specific formats?
Storage limits and costs — How much data can you deposit for free, and what are the fees for larger datasets?
Metadata standards — Does the repository support rich, machine-readable metadata (DataCite, Dublin Core, schema.org)?
Persistent identifiers — Does it assign DOIs automatically? Does it support versioning with linked DOIs?
Curation and quality control — Is there human curation before publication, or is it self-service?
Funder and publisher compliance — Is the repository on your funder's approved list?
Access controls — Can you embargo datasets, restrict access, or share privately during peer review?
Interoperability — Does the repository integrate with other tools, such as GitHub, ORCID, or institutional repositories?
Researchers should also check whether a discipline-specific repository exists for their field first. Tools like re3data.org and FAIRsharing.org catalog thousands of domain repositories. If no suitable domain repository is available, a generalist repository is typically the best choice.
Zenodo: the free, CERN-backed open repository
Zenodo is a free, open-access repository hosted at CERN's Data Centre in Geneva and operated in partnership with OpenAIRE. Launched in 2013, it has become one of the most widely used generalist repositories in the world, with millions of records spanning every academic discipline.
Key features
Storage: Up to 50 GB per record with a maximum of 100 files. A one-time quota increase to 200 GB is available on request.
Cost: Completely free for uploading and downloading.
DOI minting: Every upload receives a DataCite DOI automatically upon publication. DOIs can be reserved in advance.
Versioning: Zenodo supports record versioning with linked DOIs — each version gets its own DOI, and a concept DOI resolves to the latest version.
Access controls: Supports open, embargoed, restricted, and closed access.
GitHub integration: Researchers can connect their GitHub account and automatically archive repository releases in Zenodo with a DOI.
Communities: Users can create thematic communities to curate and organize related deposits.
Metadata: Uses DataCite metadata schema. All metadata is licensed under CC0 and exported via OAI-PMH for harvesting.
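Before uploading, it can help to check a planned deposit against the per-record limits quoted above. The following is a rough sketch, assuming decimal gigabytes (10^9 bytes) and using a hypothetical helper name; it does not call Zenodo and only encodes the 50 GB and 100-file figures from this section.

```python
# Limits as quoted in this article; decimal GB (10**9 bytes) is an assumption.
ZENODO_MAX_BYTES = 50 * 10**9
ZENODO_MAX_FILES = 100


def check_zenodo_record(file_sizes: list[int]) -> list[str]:
    """Return a list of problems; an empty list means the record fits."""
    problems = []
    total = sum(file_sizes)
    if len(file_sizes) > ZENODO_MAX_FILES:
        problems.append(
            f"{len(file_sizes)} files exceeds the {ZENODO_MAX_FILES}-file limit"
        )
    if total > ZENODO_MAX_BYTES:
        problems.append(f"{total / 10**9:.1f} GB exceeds the 50 GB limit")
    return problems


# Example: sixty files of 1 GB each stay under the file count but not the size cap.
print(check_zenodo_record([10**9] * 60))
```

A record that fails either check would need to be split across several records, bundled into archives, or covered by the quota increase mentioned above.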
Best for
Zenodo is ideal for researchers who need a free, no-hassle repository for datasets, software, presentations, posters, and other research outputs. Its GitHub integration makes it particularly popular in computational and data science fields. Because it accepts any file type and has no discipline restrictions, it serves as a reliable fallback when no domain repository exists.
Limitations
Zenodo does not provide human curation — depositors are responsible for metadata quality. The 50 GB per-record limit, while generous for most use cases, may not suffice for large-scale imaging, genomics, or climate datasets. Search and discovery are functional but lack the advanced filtering of more specialized platforms.
Figshare: flexible storage with institutional support
Figshare is a commercial open-data publishing platform (part of Digital Science, which is owned by Holtzbrinck Publishing Group) that allows researchers to share any type of scholarly output, including datasets, figures, media, code, and presentations.
Key features
Storage: 20 GB of free private storage per account, with individual file uploads up to 20 GB. Institutional accounts offer up to 5 TB per file.
Cost: Free for datasets up to 20 GB. Figshare+ offers larger deposits starting at around $395 for 100 GB, scaling to $2,500 for 1 TB (one-time data publishing charge).
DOI minting: DataCite DOIs are assigned to every published item.
File types: Accepts all file types. Includes in-browser previews for common formats.
Versioning: Supports versioning with linked DOIs.
Institutional integration: Many universities and research institutions have Figshare for Institutions licenses, which provide branded repositories with custom metadata schemas and higher storage quotas.
API: Well-documented REST API for programmatic access.
Metrics: Altmetric and usage statistics (views, downloads, shares) are displayed for every item.
Best for
Figshare is a strong choice for researchers at institutions with a Figshare for Institutions license, since storage limits and features are significantly expanded. It is also well-suited for sharing supplementary materials, figures, and multimedia alongside traditional datasets.
Limitations
Free-tier storage (20 GB) can be limiting for data-heavy disciplines. Figshare does not perform human curation on deposits — metadata quality depends entirely on the depositor. The platform is commercial and owned by a major publisher, which may raise concerns for researchers committed to fully open infrastructure.
Dryad: curated, open-data publishing for research datasets
Dryad is a nonprofit, community-governed data repository focused exclusively on publishing research data associated with peer-reviewed publications. It stands out for its expert curation — every submission is reviewed by a team of professional curators before publication.
Key features
Storage: No hard storage cap per deposit, but pricing scales with size. Standard datasets under 50 GB are covered by the base publishing charge.
Cost: $150 per dataset (Data Publishing Charge), which may be waived or covered by the author's institution if it has a Dryad partnership. A $50 fee applies for datasets kept private during peer review.
Curation: Dryad curators check every submission for accessibility, reusability, file integrity, and metadata completeness. This is a significant differentiator.
DOI minting: DataCite DOIs assigned upon publication.
ORCID integration: Authors log in via ORCID, linking deposits to their researcher profile.
Journal integration: Integrated into the submission workflows of many major journals, enabling seamless data deposit alongside manuscript submission.
Licensing: All published data is released under CC0 (public domain), ensuring maximum reusability.
Best for
Dryad is the best option for researchers who want professional curation and quality assurance for their datasets. It is particularly strong for life sciences, ecology, and environmental science. If your institution has a Dryad membership, it can be effectively free for individual researchers.
Limitations
The $150 per-dataset fee is a barrier for unfunded researchers or teams with many small deposits. Dryad focuses on data — it does not accept software, presentations, or other non-data research outputs. The CC0 licensing requirement means depositors waive all rights, which may not suit all data types (e.g., sensitive qualitative data).
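To see how quickly the per-dataset fee adds up for a team with many small deposits, here is a small worked calculation using only the two figures quoted in this section. It is a sketch under stated assumptions: every dataset is under 50 GB, and an institutional partnership covers the publishing charge but not the private-review fee (the article does not say which way that goes).

```python
DATA_PUBLISHING_CHARGE_USD = 150  # per dataset, as quoted in this article
PRIVATE_REVIEW_FEE_USD = 50       # per dataset kept private during peer review


def dryad_fees(datasets: int, private_for_review: int = 0,
               charge_covered: bool = False) -> int:
    """Estimate total Dryad fees in USD for datasets under 50 GB each.

    Assumption: when charge_covered is True, only the publishing charge
    is waived; the private-review fee still applies.
    """
    if not 0 <= private_for_review <= datasets:
        raise ValueError("private_for_review must be between 0 and datasets")
    publishing = 0 if charge_covered else datasets * DATA_PUBLISHING_CHARGE_USD
    return publishing + private_for_review * PRIVATE_REVIEW_FEE_USD


# Eight datasets, two of them private during review:
print(dryad_fees(8, private_for_review=2))                       # 1300
print(dryad_fees(8, private_for_review=2, charge_covered=True))  # 100
```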
Open Science Framework (OSF): project management meets data sharing
The Open Science Framework (OSF), developed by the Center for Open Science, is unique among repositories because it combines data storage with research project management. OSF is at once a preprint server, a project workspace, and a data repository.
Key features
Storage: 5 GB per file for OSF Storage (the default). Higher limits available through connected storage add-ons (Amazon S3, Google Drive, institutional storage).
Cost: Completely free.
DOI and ARK identifiers: OSF assigns both DOIs and ARK identifiers to registrations and files.
Project structure: Researchers can organize files, data, and documentation within nested project components, each with independent access controls and contributor lists.
Preregistration: OSF is a leading platform for study preregistration, supporting templates from the APA, the Replication Recipe, and custom formats.
Add-ons: Connects to GitHub, Dropbox, Google Drive, Amazon S3, Dataverse, Figshare, and institutional storage.
Registrations: Frozen, time-stamped snapshots of a project can be created as permanent registrations, ensuring research transparency.
Best for
OSF is ideal for researchers who want to manage an entire research project — from preregistration to data collection to publication — in one place. It is especially popular in social sciences and psychology, where preregistration and open science practices are strongly emphasized.
Limitations
The 5 GB per-file limit on default storage is restrictive for large datasets. OSF is a broad project management platform, not a dedicated data repository — its metadata and curation capabilities are less developed than those of Zenodo, Figshare, or Dryad. Discovery of deposited data through external search engines can be less reliable.
Mendeley Data: free repository with publisher ecosystem integration
Mendeley Data is a free, general-purpose data repository operated by Elsevier. It allows researchers to deposit, share, and discover datasets across all disciplines.
Key features
Storage: Up to 10 GB per dataset for free.
Cost: Free.
DOI minting: DataCite DOIs assigned to every published dataset.
Discovery: Indexes over 20 million datasets from thousands of external data repositories, making it a discovery layer as well as a deposit platform.
Publisher integration: Tightly integrated with Elsevier's journal submission workflows, enabling researchers to attach datasets directly during manuscript submission.
Licensing: Supports multiple Creative Commons licenses (CC0, CC BY, etc.).
FAIR compliance: Follows FAIR data principles with machine-readable metadata.
Best for
Mendeley Data works well for researchers already embedded in Elsevier's publishing ecosystem (e.g., submitting to Elsevier journals) and for those who need a quick, free deposit for small to medium datasets.
Limitations
The 10 GB storage limit is among the most restrictive of the generalist repositories. Mendeley Data does not offer human curation. Being owned by Elsevier may raise concerns among researchers who prefer open, community-governed infrastructure. Search and metadata capabilities are less developed than those of Dataverse or Zenodo.
Harvard Dataverse: structured metadata for the social and quantitative sciences
Harvard Dataverse is the flagship installation of the open-source Dataverse Project, developed by Harvard's Institute for Quantitative Social Science (IQSS). Any researcher can create a "dataverse" — a customizable collection for organizing and publishing datasets.
Key features
Storage: Harvard Dataverse caps uploads at 2.5 GB per file and 1 TB of total storage per researcher at the time of writing; limits vary for other Dataverse installations.
Cost: Free for Harvard Dataverse. Other institutional installations may have different policies.
Metadata: Rich, customizable metadata schemas with discipline-specific metadata blocks (social science, geospatial, life science, astronomy, and more).
DOI minting: DataCite DOIs assigned to datasets and individual files.
Data exploration: Supports in-browser tabular data exploration and analysis for common formats (CSV, TSV, Stata, SPSS, R).
Versioning: Full dataset versioning with change logs.
Hierarchical organization: Dataverses can contain sub-dataverses, datasets, and files, providing a flexible organizational structure.
API: Robust REST API for programmatic deposit, search, and data access.
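As a small illustration of programmatic access, the sketch below builds a request URL for the Dataverse Search API (`/api/search`) using its `q`, `type`, and `per_page` parameters. The helper name and the example query are hypothetical, and the code only constructs the URL; it does not send a request.

```python
from urllib.parse import urlencode


def dataverse_search_url(query: str,
                         base_url: str = "https://dataverse.harvard.edu",
                         per_page: int = 10) -> str:
    """Build a Search API URL that looks for datasets matching `query`."""
    params = {"q": query, "type": "dataset", "per_page": per_page}
    return f"{base_url}/api/search?{urlencode(params)}"


# Hypothetical query, for illustration only.
print(dataverse_search_url("voter turnout"))
```

Passing a different `base_url` would point the same call at another Dataverse installation, which fits the multi-installation model described above. The URL can then be fetched with any HTTP client and the response parsed as JSON.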
Best for
Harvard Dataverse is a strong fit for social sciences, political science, economics, and quantitative research that benefits from structured, discipline-specific metadata. Its open-source nature also makes it popular with institutions that want to run their own repository.
Limitations
The interface can feel dated compared to newer platforms. Because Dataverse is open-source software with many independent installations, the experience varies between institutions. Discovery is strong within the Dataverse network but less visible in general academic search engines than Zenodo or Figshare.
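Side-by-side comparison table
Each entry lists free storage, cost, distinguishing feature, and best fit, as described in the sections above.
Zenodo: 50 GB per record (200 GB on request); free; GitHub integration and versioned DOIs; best for general-purpose deposits of data and software.
Figshare: 20 GB on the free tier; free, with paid Figshare+ for larger deposits; institutional repositories and usage metrics; best for researchers at licensed institutions.
Dryad: no hard cap, with pricing that scales with size; $150 per dataset unless covered by a partner institution; human curation of every submission; best for life sciences, ecology, and environmental science.
OSF: 5 GB per file on default storage; free; project management and preregistration; best for managing a whole project lifecycle.
Mendeley Data: 10 GB per dataset; free; Elsevier submission integration and a cross-repository discovery index; best for small to medium datasets tied to Elsevier journals.
Harvard Dataverse: limits vary by installation; free; discipline-specific metadata blocks; best for social and quantitative sciences.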
What are the FAIR data principles and why do repositories need to support them?
The FAIR data principles (Findable, Accessible, Interoperable, and Reusable) are a set of guidelines published in Scientific Data in 2016 that have become a widely adopted standard for research data management. Major funders, including the NIH, Wellcome Trust, European Commission, and UKRI, now require or strongly encourage FAIR-compliant data sharing.
A FAIR-compliant repository must:
Assign globally unique, persistent identifiers (DOIs) to datasets
Provide rich, machine-readable metadata using standardized schemas
Use open, standardized protocols for data access
Include clear licensing and provenance information
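The four requirements above can also be applied to an individual dataset's metadata before deposit. Here is a minimal sketch of such a pre-deposit check. The field names (`doi`, `title`, `creators`, `description`, `keywords`, `access_url`, `license`, `provenance`) are assumptions chosen for illustration, not any repository's actual schema, and passing the check does not by itself make a dataset FAIR.

```python
def fair_gaps(record: dict) -> list[str]:
    """Return the checklist items a metadata record does not yet satisfy."""
    gaps = []
    # 1. Globally unique, persistent identifier (a DOI starts with "10.").
    if not str(record.get("doi") or "").startswith("10."):
        gaps.append("persistent identifier")
    # 2. Rich, machine-readable metadata: a few core descriptive fields.
    missing = [f for f in ("title", "creators", "description", "keywords")
               if not record.get(f)]
    if missing:
        gaps.append("metadata fields: " + ", ".join(missing))
    # 3. Open, standardized access protocol: approximated here as an HTTPS URL.
    if not str(record.get("access_url") or "").startswith("https://"):
        gaps.append("open access protocol")
    # 4. Clear licensing and provenance information.
    if not record.get("license"):
        gaps.append("license")
    if not record.get("provenance"):
        gaps.append("provenance")
    return gaps


# Hypothetical draft record with no license or keywords yet.
draft = {
    "doi": "10.1234/example.5678",
    "title": "Example survey data",
    "creators": ["A. Researcher"],
    "description": "Responses from a pilot survey.",
    "access_url": "https://repository.example.org/records/1",
    "provenance": "Collected 2025, cleaned with scripts in the project repo.",
}
print(fair_gaps(draft))  # ['metadata fields: keywords', 'license']
```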
All six repositories in this comparison support DOI minting and basic FAIR compliance. However, the depth of metadata support varies significantly. Harvard Dataverse offers the most structured, discipline-specific metadata. Dryad ensures FAIR compliance through human curation. Zenodo and Figshare rely on depositors to provide quality metadata — which is where the gap often appears.
For research teams managing datasets across multiple projects, ensuring FAIR compliance before deposit is a major workflow challenge. ScholarDock, a research project and reference management platform, helps teams organize datasets alongside the papers, protocols, and references that produced them — so when it comes time to deposit in a repository, the metadata, context, and documentation are already in place.
How to decide: a practical decision framework
Choosing a research data repository does not have to be complicated. Here is a simple decision framework:
Check your funder's requirements first. Some funders specify approved repositories. The NIH maintains a list of recommended repositories, and many funder lists include Zenodo, Dryad, Figshare, and Dataverse.
Look for a discipline-specific repository. Use re3data.org or FAIRsharing.org to search for a repository in your field. Domain repositories often have richer metadata and stronger community adoption.
If no domain repository fits, choose a generalist repository based on your needs:
Need it free with large storage? → Zenodo
Need professional curation? → Dryad
Need institutional integration and flexibility? → Figshare
Need to manage the entire research project lifecycle? → OSF
Need structured, discipline-specific metadata? → Harvard Dataverse
Need quick, free deposit with Elsevier journal integration? → Mendeley Data
Prepare your data and metadata before depositing. Clean file names, write documentation, apply consistent metadata, and check licensing requirements. This preparation step is where most researchers lose time, and it is where having an organized research workspace makes a significant difference.
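The first three steps of this framework amount to a simple ordered lookup, which can be sketched as follows. The function name and the keys in the mapping are hypothetical labels for the needs listed above; the recommendations themselves come straight from this article.

```python
from typing import Optional

# Generalist recommendations from the framework above, keyed by
# hypothetical short labels for each need.
GENERALIST_BY_NEED = {
    "free_large_storage": "Zenodo",
    "professional_curation": "Dryad",
    "institutional_integration": "Figshare",
    "project_lifecycle": "OSF",
    "structured_metadata": "Harvard Dataverse",
    "elsevier_integration": "Mendeley Data",
}


def choose_repository(need: str,
                      funder_required: Optional[str] = None,
                      domain_repository: Optional[str] = None) -> str:
    """Apply the framework in order: funder, then domain, then generalist."""
    if funder_required:        # Step 1: the funder's requirement wins.
        return funder_required
    if domain_repository:      # Step 2: prefer a discipline-specific home.
        return domain_repository
    try:                       # Step 3: fall back to a generalist by need.
        return GENERALIST_BY_NEED[need]
    except KeyError:
        raise ValueError(f"unknown need: {need!r}") from None


print(choose_repository("professional_curation"))  # Dryad
```

The ordering is the point of the sketch: a generalist repository is only consulted once the funder and domain checks have come back empty.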
Organizing your data before deposit: the missing step
Most guides about research data repositories focus on where to deposit — but the harder problem is what happens before deposit. Research teams typically manage dozens of datasets across multiple projects, each connected to different papers, collaborators, and funding sources. Without a structured system for organizing this information, data deposit becomes a chaotic last-minute task instead of a smooth part of the research workflow.
This is precisely the problem ScholarDock solves. As a research project and reference management platform, ScholarDock gives teams a single workspace to organize datasets alongside their associated references, project notes, and collaborator assignments. When a manuscript is accepted and the journal requests a data availability statement, the team already knows which datasets exist, where they are stored, and what metadata has been prepared — because all of that information lives in one connected workspace rather than scattered across email threads, shared drives, and spreadsheets.
By connecting your data management workflow to your project and reference management workflow, you eliminate the frantic pre-submission scramble and make FAIR compliance part of your everyday research process.
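Independent of any particular tool, the record-keeping described in this section can be sketched as a small inventory that links each dataset to its project and paper and shows what still lacks a deposit. This is an illustrative data structure with hypothetical names and placeholder values, not ScholarDock's data model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetEntry:
    name: str
    project: str
    paper: str
    repository: Optional[str] = None  # where it was deposited, if anywhere
    doi: Optional[str] = None         # DOI minted on publication

    @property
    def deposited(self) -> bool:
        """A dataset counts as deposited once it has a repository and a DOI."""
        return self.repository is not None and self.doi is not None


def pending_deposits(entries: list[DatasetEntry]) -> list[str]:
    """Names of datasets that still need to be deposited."""
    return [e.name for e in entries if not e.deposited]


# Hypothetical inventory for one project.
inventory = [
    DatasetEntry("survey_wave1", "Pilot study", "Paper A",
                 repository="Zenodo", doi="10.1234/example.5678"),
    DatasetEntry("interview_codes", "Pilot study", "Paper A"),
]
print(pending_deposits(inventory))  # ['interview_codes']
```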
Key takeaways
Choosing the right research data repository in 2026 comes down to matching your storage needs, curation expectations, funder requirements, and workflow to the right platform. Zenodo remains the most versatile free option. Dryad is unmatched for curated, high-quality data publication. Figshare offers the most flexibility for institutional users. OSF combines project management with data sharing. Harvard Dataverse leads on structured metadata. Mendeley Data provides quick, free deposits with publisher integration.
Whatever repository you choose, the real competitive advantage for research teams is organizing data, references, and projects in a connected workflow before deposit. If your team is tired of last-minute metadata scrambles, scattered datasets, and broken links between your papers and your data, ScholarDock brings your entire research workflow — sources, projects, datasets, and collaborators — into one connected workspace. Start organizing your research the way it should be: structured, connected, and ready to share.
