2024-10-24 20:01:00
Constellation Network and Common Crawl Foundation are revolutionizing web data accessibility and AI development through blockchain technology
The Common Crawl Foundation, a nonprofit organization founded in 2007 and dedicated to providing the public with a copy of the Internet, and Constellation Network, a Web3 blockchain ecosystem that excels in providing solutions to the U.S. Department of Defense, today announced a strategic partnership aimed at democratizing and improving the accessibility and utility of web crawl data through blockchain technology for artificial intelligence (AI) and data applications.
This collaboration will explore potential ways to improve large language models used by AI. The starting point is Common Crawl’s massive dataset, which is used by 80% of major language models. Common Crawl has crawled over 250 billion web pages to date (19 billion in 2024 alone), and its archive holds almost nine petabytes of crawl data. Leveraging Constellation’s Hypergraph decentralized network, which ensures data immutability, provenance and auditability, the partnership is designed to provide shared solutions for responsible and transparent AI.
With AI set to be a $3 trillion industry by 2030, there is a growing need for secure solutions for sharing datasets used for training large language models, improved storage of queried and cleaned data, opportunities to monetize data and improved transparency regarding the source of the data. With Constellation’s unique approach to providing tools to converge existing infrastructure with distributed and decentralized networks, and Common Crawl’s history of data and data usage growth, this partnership is designed to further democratize data.
“This partnership represents a significant step forward in ensuring the trusted distribution of Common Crawl,” said Rich Skrenta, Executive Director of the Common Crawl Foundation. “By combining our comprehensive web archive with Constellation’s proven blockchain technology, researchers and developers around the world can trust what they get from Common Crawl and have a model for authenticating large open data sets, such as those used for AI training.”
Ben Jorgensen, Managing Director of Constellation Network, explains: “The partnership between Constellation Network and Common Crawl underscores the widespread adoption of Web3 solutions outside the echo chambers of the crypto economy. This alignment continues Constellation’s mission to leverage our Zero Trust network as a public good for a data-driven future.” Jorgensen continues: “Our goal is to attract new developers by building capabilities such as integrating immutability into existing digital workflows, and thus to further differentiate ourselves from previous generations of blockchain technology.”
The two organizations will implement this initiative gradually, starting with a customizable subnet called a metagraph that incorporates a subset of Common Crawl’s data. This subnet is currently operational on Constellation’s testnet and will soon be integrated into Hypergraph, Constellation’s public network. More details about the live metagraph, as well as information about how organizations and developers can participate, will be announced in the coming weeks.
Information about the Common Crawl Foundation
The Common Crawl Foundation is a 501(c)(3) nonprofit organization dedicated to providing the public with a free copy of the Internet. Its web archive consists of petabytes of data collected through years of web crawling and serves as an important resource for researchers, companies, and developers worldwide.
Information about Constellation Network
Constellation Network is a Web3 blockchain ecosystem that builds bridges between crypto economies and traditional businesses. Its flagship network, Hypergraph, offers a solution for fast, scalable and fee-free transactions. Constellation’s network is validated by the U.S. Department of Defense, which has been a customer since 2019.
Note: This press release contains forward-looking statements. Actual results may differ materially from those projected.
Interview with Rich Skrenta, Executive Director of the Common Crawl Foundation
Host: Welcome, Rich, and thank you for joining us today to discuss the exciting partnership between the Common Crawl Foundation and Constellation Network. Let’s dive right in. Can you explain what this partnership aims to achieve?
Rich Skrenta: Thank you for having me! This partnership is all about democratizing access to web data and enhancing the utility of that data through blockchain technology. By combining Common Crawl’s extensive web archive, which has crawled over 250 billion web pages, with Constellation’s Hypergraph decentralized network, we aim to ensure trustworthy and transparent data distribution for the development of artificial intelligence.
Host: That sounds promising. How will this collaboration specifically impact the training of large language models?
Rich Skrenta: The dataset from Common Crawl is utilized by 80% of major language models today. With this partnership, we’re looking to provide a robust framework for authenticating these large datasets. This means that researchers and developers can have greater confidence in the quality and source of the data they are using to train AI models.
Host: You mentioned transparency and trust in data. Can you elaborate on the importance of these factors in AI development?
Rich Skrenta: Absolutely. With AI projected to become a $3 trillion industry by 2030, ensuring the integrity of data is crucial. We need to provide clear provenance so users can understand where the data comes from and how it’s been used. This builds trust and ultimately leads to more responsible AI development. Our partnership will offer solutions like data immutability and auditability, which are key for maintaining that trust.
Host: Ben Jorgensen from Constellation Network mentioned attracting new developers. How do you see this partnership facilitating that?
Rich Skrenta: By integrating blockchain technology into existing workflows, we’re not only enhancing security but also making it easier for developers to access and use web data. The customizable subnet or “metagraph” allows for a more seamless interaction with our datasets, which can inspire innovation and attract a broader developer community interested in exploring Web3 solutions.
Host: It’s clear that there’s a lot of potential here. What do you envision as the next steps in this partnership?
Rich Skrenta: We’ll be implementing this initiative gradually, starting with our metagraph on the testnet and continuing to integrate with Constellation’s Hypergraph. Our teams are excited to explore further applications and capabilities that will help democratize data even more. It’s an ongoing journey, and we’re just getting started!
Host: Thank you, Rich. This partnership undoubtedly holds great promise for the future of AI and web data accessibility. We look forward to following its progress!
Rich Skrenta: Thank you! It was a pleasure to share our vision with you.
Host: It’s fascinating how you’re addressing these challenges. Moving forward, what are the first steps you’ll take within this initiative?
Rich Skrenta: We’re starting off with the development of a customizable subnet, or metagraph, which will incorporate a subset of our data. This subnet is currently operational on Constellation’s testnet, and we plan to fully integrate it into Constellation’s Hypergraph soon. This will allow us to demonstrate the capabilities of our combined technologies and how they can be used in practice.
Host: That sounds exciting! For developers and organizations interested in getting involved, what can they expect in terms of participation?
Rich Skrenta: In the coming weeks, we’ll be releasing more details on how developers can participate in this initiative. We’re looking to create an inclusive environment where everyone can contribute to and benefit from the advancements in data accessibility and AI.
Host: Before we wrap up, what do you see as the long-term impact of this partnership on the AI landscape?
Rich Skrenta: I believe this partnership has the potential to redefine how data is managed and utilized in AI. By ensuring that data is not only accessible but also trusted, we’re paving the way for more innovative applications in AI. As a result, we could see significant advancements in areas like natural language processing, machine learning, and more, all built on a foundation of transparent and accountable data usage.
Host: Thank you so much, Rich. This partnership truly seems to be a game-changer for web data and AI technology.
Rich Skrenta: Thank you for having me! I’m excited to see where this journey takes us and how it will benefit the broader community.