Improving FAIRness in COVID-19 Research: Enhancing Interoperability and Reusability of Shared Data

Abstract

This study identified substantial room for improvement across all dimensions of the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, particularly Interoperability and Reusability, in the datasets shared through general-purpose repositories during the COVID-19 pandemic.

Editor: Marcus Munafò, University of Bristol, UNITED KINGDOM

Received: October 23, 2024; Accepted: October 30, 2024; Published: November 18, 2024

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The COVID-19 pandemic transformed scientific publishing, driven by the need for rapid dissemination of findings during a fast-moving global health crisis. Preprint publication and open-access availability surged, giving researchers worldwide unrestricted access to both peer-reviewed and non-peer-reviewed findings. This open-access trend is part of the broader open science movement, which promotes transparency and collaboration in research.

Several major funders and journals, including the Canadian Institutes of Health Research (CIHR), the National Institutes of Health (NIH), the British Medical Journal (BMJ), and the Public Library of Science (PLOS), have committed to open science. These commitments rest on three core components: open protocols, open-access publications, and open data, which together promote transparency, collaboration, and effective dissemination of scientific knowledge.

Open data underpins research validation and replication, reinforcing the credibility of scientific inquiry: complete datasets form the evidence base for scientific conclusions, which in turn guide future research. During the COVID-19 pandemic, however, a shortage of high-quality, timely, and reliable data contributed to the rapid spread of misinformation, often referred to as an "infodemic": an overwhelming flood of information, much of it erroneous or misleading, that complicated public understanding.

Insufficient or erroneous data can breed skepticism and erode trust in research outcomes, undermining public confidence and hampering effective, science-based responses to health crises. To maximize the utility of open research data, adherence to the FAIR principles, which require that data be Findable, Accessible, Interoperable, and Reusable, is essential. These criteria extend the relevance of data beyond the original research context, enabling exploration of alternative theoretical frameworks, validation of research claims, and innovation through data reuse. Where privacy concerns preclude full data sharing, sharing metadata offers a meaningful intermediate solution: metadata describes data attributes and structure, improving interpretability and usability.

The FAIR principles have become a central framework for promoting effective data sharing and reuse in scientific research. However, evaluating FAIRness and translating the principles into concrete, quantifiable metrics remains challenging, and requires collaboration across research communities to establish metrics suited to different data types and sharing practices. Although several initiatives have developed FAIR assessment methods, including manual evaluations and automated tools such as FAIRshake and the FAIR Evaluator, substantial limitations remain in comprehensively and accurately gauging FAIR implementation, particularly for context-dependent, continuous aspects such as interoperability and reusability that require judgment within a specific context.

Methods

Data sources and study selection

We searched for all open-access records indexed in PubMed within the Europe PubMed Central (EPMC) database from January 1, 2020, the date Chinese health authorities first publicly announced the new virus, through June 30, 2023, using the europepmc package in the R programming language. EPMC integrates all records from PubMed and PubMed Central, which streamlines automated retrieval, including of records not directly accessible via the PubMed website. Because our automated tools were optimized for English, we restricted the analysis to open-access papers published in English during this period.
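
For illustration, a minimal sketch of such a retrieval with the europepmc package is shown below. The query string, field names (SRC, OPEN_ACCESS, LANG, FIRST_PDATE), and the record limit are assumptions based on Europe PMC query syntax, not the study's actual code.

```r
# Install once if needed: install.packages("europepmc")
library(europepmc)

# Open-access, English-language, PubMed-indexed records first published
# between 2020-01-01 and 2023-06-30 (assumed query; adjust as needed).
covid_query <- paste(
  'SRC:"MED"',
  'OPEN_ACCESS:"Y"',
  'LANG:"eng"',
  'FIRST_PDATE:[2020-01-01 TO 2023-06-30]',
  sep = " AND "
)

# epmc_search() pages through the Europe PMC REST API and returns a tibble.
records <- epmc_search(query = covid_query, limit = 10000)

head(records[, c("pmid", "title", "journalTitle", "firstPublicationDate")])
```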

Data extraction

We utilized the Scimago Journal & Country Rank (SJR) to extract critical metrics including SJR scores, H-index values, publishers, subject area classifications, and journal categories.
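
As a rough illustration (not the study's actual code), these journal-level metrics can be taken from the SJR portal's downloadable CSV export and joined to the article records by journal title. The file name and column names below are assumptions based on the public SJR export format.

```r
# The SJR yearly export is a semicolon-delimited CSV downloadable from
# scimagojr.com (file name and column names are assumptions).
sjr <- read.csv2("scimagojr_2022.csv", stringsAsFactors = FALSE)

# Keep the metrics used in the study: SJR score, H index, publisher,
# and subject areas/categories.
sjr_metrics <- sjr[, c("Title", "SJR", "H.index", "Publisher", "Categories")]

# Join to the article records retrieved from Europe PMC by journal title
# (a simple case-insensitive match, as a heuristic).
records$journal_key     <- tolower(records$journalTitle)
sjr_metrics$journal_key <- tolower(sjr_metrics$Title)
records_sjr <- merge(records, sjr_metrics, by = "journal_key", all.x = TRUE)
```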

FAIRness assessment

The FAIR Principles comprise four components that specify how shared data and metadata should be structured: data must be Findable, Accessible, Interoperable, and Reusable. To evaluate FAIRness, we used the FAIRsFAIR Research Data Object Assessment Service (F-UJI), developed within the FAIRsFAIR project. F-UJI is a web service that programmatically assesses the FAIRness of research data objects, scoring each principal component and its subcomponents and returning per-metric scores as well as an overall score. Each component has a minimum score of 0, with maximum scores ranging from 3 to 8 (findability 7, accessibility 3, interoperability 4, reusability 8); overall scores range from 1 to 22.

After finalizing the URLs, we used the F-UJI tool to evaluate the FAIRness of each dataset automatically, via the rfuji package in R, an application programming interface (API) client for F-UJI. The workflow detailing each step in operating the software is provided in a supplementary section.
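
Conceptually, the assessment amounts to posting each dataset URL to an F-UJI evaluation endpoint and collecting the returned scores. The sketch below calls the F-UJI REST API directly with httr rather than through rfuji; the endpoint, port, demo credentials, payload fields, response structure, and the example DOI are all assumptions, not details taken from the study.

```r
library(httr)
library(jsonlite)

# Evaluate one dataset URL against an F-UJI service instance
# (endpoint, credentials, and payload fields are assumptions; a
# self-hosted or public F-UJI instance can be substituted).
assess_fairness <- function(object_url,
                            fuji_endpoint = "http://localhost:1071/fuji/api/v1/evaluate",
                            user = "marvel", password = "wonderwoman") {
  resp <- POST(
    fuji_endpoint,
    authenticate(user, password),
    body = list(object_identifier = object_url, test_debug = FALSE),
    encode = "json"
  )
  stop_for_status(resp)
  fromJSON(content(resp, as = "text", encoding = "UTF-8"))
}

# Hypothetical dataset DOI; the returned list typically includes
# per-metric results and an aggregated summary of FAIR scores.
result <- assess_fairness("https://doi.org/10.7910/DVN/EXAMPLE")
```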

Analysis

We reported the general characteristics of papers that shared their data via general-purpose repositories. For the FAIRness assessment, we conducted a descriptive analysis of compliance with the FAIR metrics, examined differences in compliance across journals, and analyzed trends over time. For each component we defined four levels of compliance with the FAIR principles: a score of 0 indicates incomplete compliance; a score of 1 indicates initial compliance; scores between 1 and the component's maximum indicate moderate compliance; and the maximum score indicates advanced compliance.
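
A minimal sketch of this score-to-level mapping is given below; the function name and its vectorized form are illustrative of the scheme just described, not the study's actual code.

```r
# Map a raw component score to a compliance level, given that component's
# maximum possible score (e.g., 7 for findability, 3 for accessibility,
# 4 for interoperability, 8 for reusability).
compliance_level <- function(score, max_score) {
  ifelse(score == 0, "incomplete",
  ifelse(score == 1, "initial",
  ifelse(score < max_score, "moderate", "advanced")))
}

compliance_level(c(0, 1, 3, 7), max_score = 7)
#> "incomplete" "initial" "moderate" "advanced"
```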

Results

General characteristics

Of the analyzed records, 20,873 (8.1%) had shared their data; of these, 8,015 (38.4%) shared their data in a general-purpose repository. After screening the URLs, we included 6,180 URLs in the analysis.

FAIRness results

Among the assessed repositories, 480 had a FAIRness score of 1, indicating that the repository was either inaccessible or contained no usable data; these were excluded from further analyses. The final FAIRness analysis therefore covered 5,700 repositories.

The mean level of compliance with the FAIR metrics was 9.4 (SD 4.88). By component, mean (SD) scores were: findability 4.3 (1.85) out of 7; accessibility 1.2 (0.49) out of 3; interoperability 1.3 (1.30) out of 4; and reusability 2.6 (1.56) out of 8. Detailed compliance metrics are presented in the accompanying tables.

Analysis of moderate or advanced compliance percentages yielded the following results: Findability: 100.0%; Accessibility: 21.5%; Interoperability: 46.7%; and Reusability: 61.3%.

FAIRness by article type

Review articles achieved the highest mean FAIRness score, 9.80 (SD = 5.06), while research letters had the lowest, 7.83 (SD = 4.30). A Kruskal-Wallis rank sum test yielded P = 0.15, indicating no significant differences among article types.
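
For reference, the comparison reported above corresponds to a test of this form; the data frame and column names are assumptions.

```r
# fair_scores is assumed to hold one row per dataset, with its total FAIR
# score and the article type of the associated paper.
kruskal.test(total_score ~ article_type, data = fair_scores)
```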

FAIRness by journal subject area


FAIRness by repository

The Harvard Dataverse showed the highest mean FAIRness score, 15.79 (SD = 3.65, n = 244), while GitHub had the lowest, 4.50 (SD = 0.13, n = 2,152). Detailed results are presented in the accompanying tables.

The most influential factor

Among the examined variables, repository explained the most variance in FAIRness scores (R² = 0.809), while the full model including all three factors yielded R² = 0.812. P-values for the number of citations and SJR score were above 0.29 in all models, indicating no significant associations.
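
A sketch of the kind of model comparison this implies is shown below; the variable names are assumptions, and R² values are read from summary() of ordinary least-squares fits.

```r
# Single-predictor models versus the full model (assumed variable names).
m_repo      <- lm(total_score ~ repository, data = fair_scores)
m_citations <- lm(total_score ~ n_citations, data = fair_scores)
m_sjr       <- lm(total_score ~ sjr_score, data = fair_scores)
m_full      <- lm(total_score ~ repository + n_citations + sjr_score,
                  data = fair_scores)

# Compare explained variance across models.
sapply(list(repository = m_repo, citations = m_citations,
            sjr = m_sjr, full = m_full),
       function(m) summary(m)$r.squared)
```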

Discussion

Our study underscored the gap in data reusability, revealing that the shared datasets generally lack adequate provisions for reuse. Merely sharing open data does not inherently confer legal permissions for further utilization or redistribution; without such permissions, collaborative efforts to enhance scientific knowledge can be hampered. While advancements have been made in data-sharing practices among journals, extant policies often present insufficient guidance on optimizing data reusability.

Strengths and limitations

To our knowledge, this is the first study to programmatically assess the implementation of the FAIR principles in COVID-19 research since the outbreak was first reported by Chinese authorities, taking advantage of recent advances in automated assessment methods. A further strength is that we used an established framework, the FAIR principles, to provide a comprehensive assessment of data shared during the COVID-19 pandemic.

We also maintained transparency from inception to submission: the study protocol was pre-registered on the OSF website, and upon manuscript submission all source code and datasets were openly shared via OSF and GitHub. In addition, our approach programmatically detected data availability and repository accessibility.

However, the limitations inherent in utilizing automated approaches based on the commonly established FAIR requirements must be acknowledged, particularly as they pertain to in-depth assessment of FAIRness. For example, evaluating the complexities of data licensing and its broader implications for reuse necessitates a more nuanced examination than the F-UJI tool captures. In the future, a more refined investigation focusing on data FAIRness in specific contexts should be undertaken, going beyond a mere evaluation of compliance with foundational FAIR principles.

Our findings illuminate both the advancements achieved and the persisting challenges faced when establishing robust automated metrics for assessing FAIR implementation. Although our methodology successfully examined particular aspects like findability and technical accessibility across numerous datasets, substantial gaps still exist with respect to comprehensively measuring interoperability and reusability through automated evaluations. As highlighted by Carbon et al., critical elements such as licensing terms, semantic interoperability, and privacy safeguards related to sensitive data are frequently overlooked or inadequately represented in prevailing FAIR metrics.

Moreover, our automated assessment capability was limited to identifying the existence of persistent identifiers and ontology terms but fell short in determining whether their utilization genuinely enhances interoperability. Issues such as identifier management and the proliferation of redundant identifiers necessitate human scrutiny for proper evaluation. Similarly, while licensing information could be verified, understanding whether those terms effectively promote reuse often requires legal insights. The inherent context-dependency within interoperability and reusability commands a meticulous consideration for the development of universal metrics.

In the future, approaches for FAIR assessment are likely to necessitate a hybrid model that merges automated methodologies with targeted manual evaluations to address these detailed challenges. There are notable opportunities to enrich automated tools through advanced natural language processing and machine learning techniques, enhancing their capacity to parse licensing intricacies and evaluate semantic interoperability. Nevertheless, human oversight and specialized field knowledge will continue to be essential, especially in the assessment of sensitive datasets characterized by complex governance requirements. Ultimately, refining FAIR metrics and assessment methodologies remains an active pursuit, demanding ongoing collaboration among data providers, users, and governance authorities to foster a more transparent and efficient scientific landscape.

Conclusions

Our findings reveal a pressing need for enhancements across all fundamental components of FAIR, particularly with regard to Interoperability and Reusability of data shared within general repositories during the COVID-19 pandemic. Firstly, there is a clear imperative for greater data sharing, accompanied by improved standards of FAIRness. For example, the incorporation of data FAIRness principles in the formulation of new journal data-sharing policies is advisable. Collaborative efforts involving all parties in the scientific publishing ecosystem—researchers, editors, publishers, funders, and data repositories—are crucial for addressing these needs.

Additionally, our study highlights several challenges surrounding the large-scale automated assessment of data FAIRness. Increased automatic linking of research publications to shared datasets will be essential for conducting broader, comprehensive analyses. Moreover, strategies are needed to evaluate data FAIRness in greater detail alongside the automatic assessment of data residing within protected repositories.

References

1. Brainard J. No revolution: COVID-19 boosted open access, but preprints are only a fraction of pandemic papers. Science [Internet]. 2021 Sep 8 [cited 2023 Sep 10]; Available from:
2. Watson C. Rise of the preprint: how rapid data sharing during COVID-19 has changed science forever. Nat Med. 2022 Jan;28(1):2–5. pmid:35031791
3. McKiernan EC, Bourne PE, Brown CT, Buck S, Kenall A, Lin J, et al. How open science helps researchers succeed. eLife. 2016 Jul 7;5:e16800. pmid:27387362
4. Canadian Institutes of Health Research. Health Research Data: Strategies and policies [Internet]. Canadian Institutes of Health Research; 2021 [cited 2023 Oct 6]. Available from:
5. National Institutes of Health. Data Management and Sharing Policy [Internet]. National Institutes of Health; [cited 2023 Oct 6]. Available from:
6. British Medical Journal. Data sharing [Internet]. British Medical Journal; [cited 2023 Oct 6]. Available from:
7. Public Library of Science. Data Availability [Internet]. Public Library of Science; [cited 2023 Oct 6]. Available from:
8. Raittio E, Sofi-Mahmudi A, Uribe SE. Research transparency in dental research: A programmatic analysis. Eur J Oral Sci. 2023 Feb;131(1):e12908. pmid:36482006
9. Miyakawa T. No raw data, no science: another possible source of the reproducibility crisis. Mol Brain. 2020 Dec;13(1):24, s13041-020-0552–2. pmid:32079532
10. World Health Organization. Managing the COVID-19 infodemic: Promoting healthy behaviours and mitigating the harm from misinformation and disinformation. 2020;
11. Bromme R, Mede NG, Thomm E, Kremer B, Ziegler R. An anchor in troubled times: Trust in science before and within the COVID-19 pandemic. Gesser-Edelsburg A, editor. PLOS ONE. 2022 Feb 9;17(2):e0262823. pmid:35139103
12. Wilkinson MD, Dumontier M, Aalbersberg IjJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016 Dec;3(1):160018. pmid:26978244
13. Locher C, Le Goff G, Le Louarn A, Mansmann U, Naudet F. Making data sharing the norm in medical research. BMJ. 2023 Jul 11;p1434. pmid:37433610
14. De Kok JWTM, De La Hoz MÁA, De Jong Y, Brokke V, Elbers PWG, Thoral P, et al. A guide to sharing open healthcare data under the General Data Protection Regulation. Sci Data. 2023 Jun 24;10(1):404. pmid:37355751
15. Mons B, Neylon C, Velterop J, Dumontier M, Da Silva Santos LOB, Wilkinson MD. Cloudy, increasingly FAIR; revisiting the FAIR Data guiding principles for the European Open Science Cloud. Inf Serv Use. 2017 Mar 7;37(1):49–56.
16. Clarke DJB, Wang L, Jones A, Wojciechowicz ML, Torre D, Jagodnik KM, et al. FAIRshake: Toolkit to Evaluate the FAIRness of Research Digital Resources. Cell Syst. 2019 Nov;9(5):417–21. pmid:31677972
17. Wilkinson MD, Dumontier M, Sansone SA, Bonino Da Silva Santos LO, Prieto M, Batista D, et al. Evaluating FAIR maturity through a scalable, automated, community-governed framework. Sci Data. 2019 Sep 20;6(1):174. pmid:31541130
18. European Commission, Directorate-General for Research and Innovation. Turning FAIR into reality: Final report and action plan from the European Commission expert group on FAIR data. Publications Office; 2018.
19. Carbon S, Champieux R, McMurry JA, Winfree L, Wyatt LR, Haendel MA. An analysis and metric of reusable data licensing practices for biomedical resources. Mehmood R, editor. PLOS ONE. 2019 Mar 27;14(3):e0213090.
20. Devaraju A, Huber R. An automated solution for measuring the progress toward FAIR research data. Patterns. 2021 Nov;2(11):100370. pmid:34820651
21. Hamilton DG, Hong K, Fraser H, Rowhani-Farid A, Fidler F, Page MJ. Prevalence and predictors of data and code sharing in the medical and health sciences: systematic review with meta-analysis of individual participant data. BMJ. 2023 Jul 11;e075767. pmid:37433624
22. Uribe SE, Sofi-Mahmudi A, Raittio E, Maldupa I, Vilne B. Dental Research Data Availability and Quality According to the FAIR Principles. J Dent Res. 2022 Jun 2;00220345221101321.
23. Austin CC, Bernier A, Bezuidenhout L, Bicarregui J, Biro T, Cambon-Thomsen A, et al. Fostering global data sharing: highlighting the recommendations of the Research Data Alliance COVID-19 working group. Wellcome Open Res. 2021 May 26;5:267. pmid:33501381
24. Jahn N. europepmc: R Interface to the Europe PubMed Central RESTful Web Service [Internet]. 2021. Available from:
25. R Core Team. R: A Language and Environment for Statistical Computing [Internet]. Vienna, Austria: R Foundation for Statistical Computing; 2023. Available from:
26. Chen Q, Allot A, Lu Z. Keep up with the latest coronavirus research. Nature. 2020 Mar 12;579(7798):193. pmid:32157233
27. Chen Q, Allot A, Lu Z. LitCovid: an open database of COVID-19 literature. Nucleic Acids Res. 2021 Jan 8;49(D1):D1534–40. pmid:33166392
28. Chen Q, Allot A, Leaman R, Wei CH, Aghaarabi E, Guerrerio JJ, et al. LitCovid in 2022: an information resource for the COVID-19 literature. Nucleic Acids Res. 2023 Jan 6;51(D1):D1512–8. pmid:36350613
29. Wieland LS, Robinson KA, Dickersin K. Understanding why evidence from randomised clinical trials may not be retrieved from Medline: comparison of indexed and non-indexed records. BMJ. 2012 Jan 3;344:d7501. pmid:22214757
30. Cohen AM, Smalheiser NR, McDonagh MS, Yu C, Adams CE, Davis JM, et al. Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine. J Am Med Inform Assoc JAMIA. 2015 May;22(3):707–17. pmid:25656516
31. Edinger T, Cohen AM. A large-scale analysis of the reasons given for excluding articles that are retrieved by literature search during systematic review. AMIA Annu Symp Proc AMIA Symp. 2013;2013:379–87. pmid:24551345
32. Verdugo-Paiva F, Vergara C, Ávila C, Castro-Guevara JA, Cid J, Contreras V, et al. COVID-19 Living OVerview of Evidence repository is highly comprehensive and can be used as a single source for COVID-19 studies. J Clin Epidemiol. 2022 May;S0895435622001172. pmid:35597369
33. Serghiou S. metareadr: Downloads data often needed for meta-research. 2022; Available from:
34. Serghiou S. rtransparent: Identifies indicators of transparency. 2021; Available from:
35. Serghiou S, Contopoulos-Ioannidis DG, Boyack KW, Riedel N, Wallach JD, Ioannidis JPA. Assessment of transparency indicators across the biomedical literature: How open is open? Bero L, editor. PLOS Biol. 2021 Mar 1;19(3):e3001107.
36. Riedel N. oddpub: Detection of Open Data & Open Code statements in biomedical publications [Internet]. 2019. Available from:
37. Crosas M. The FAIR Guiding Principles: Implementation in Dataverse [Internet]. 2019 Mar 22 [cited 2023 Sep 14]; MIT. Available from:
38. GitHub. Referencing and citing content [Internet]. GitHub; [cited 2023 Sep 14]. Available from:
39. Sofi-Mahmudi A, Raittio E. Transparency of COVID-19-Related Research in Dental Journals. Front Oral Health. 2022 Apr 6;3:871033. pmid:35464778
40. Sofi-Mahmudi A, Raittio E, Uribe SE. Transparency of COVID-19-related research: A meta-research study. Lucas-Dominguez R, editor. PLOS ONE. 2023 Jul 26;18(7):e0288406.
41. Wallach JD, Boyack KW, Ioannidis JPA. Reproducible research practices, transparency, and open access data in the biomedical literature, 2015–2017. Dirnagl U, editor. PLOS Biol. 2018 Nov 20;16(11):e2006930. pmid:30457984
42. Vazquez E, Gouraud H, Naudet F, Gross CP, Krumholz HM, Ross JS, et al. Characteristics of available studies and dissemination of research using major clinical data sharing platforms. Clin Trials. 2021 Dec;18(6):657–66. pmid:34407656
43. Vasilevsky NA, Minnier J, Haendel MA, Champieux RE. Reproducible and reusable research: are journal data sharing policies meeting the mark? PeerJ. 2017 Apr 25;5:e3208. pmid:28462024
44. Landi A, Thompson M, Giannuzzi V, Bonifazi F, Labastida I, Da Silva Santos LOB, et al. The "A" of FAIR: As Open as Possible, as Closed as Necessary. Data Intell. 2020 Jan;2(1–2):47–55.
45. Mello MM, Francer JK, Wilenzick M, Teden P, Bierer BE, Barnes M. Preparing for Responsible Sharing of Clinical Trial Data. Hamel MB, editor. N Engl J Med. 2013 Oct 24;369(17):1651–8.
46. OECD. Open Science [Internet]. Organisation for Economic Co-operation and Development (OECD); 2018 [cited 2023 Oct 14]. Available from:
47. The Royal Society Science Policy Centre. Science as an open enterprise [Internet]. The Royal Society; 2012 [cited 2023 Sep 14]. Available from:
48. The Declaration on Research Assessment (DORA). San Francisco Declaration on Research Assessment [Internet]. The Declaration on Research Assessment (DORA); [cited 2023 Oct 6]. Available from:
49. Moher D, Bouter L, Kleinert S, Glasziou P, Sham MH, Barbour V, et al. The Hong Kong Principles for assessing researchers: Fostering research integrity. PLOS Biol. 2020 Jul 16;18(7):e3000737. pmid:32673304
