The growth of IBM Storage Ceph – the ideal foundation for a modern data lakehouse

2024-02-15 20:04:30

By Gerald Sternagl, Technical Product Management Manager, IBM Storage Ceph

It’s been a year since IBM integrated Red Hat’s storage product roadmaps and teams into IBM Storage. During this period, organizations have faced unprecedented data challenges in scaling AI, as data grows rapidly across more locations and formats while its quality declines. Helping customers address this issue has meant modernizing their infrastructure with cutting-edge solutions as part of their digital transformations. To a large extent, this involves delivering consistent applications and data storage across on-premises and cloud environments. Crucially, this includes helping customers adopt cloud-native architectures to realize the benefits of public cloud such as cost, speed and elasticity. Formerly known as Red Hat Ceph and now called IBM Storage Ceph, this next-generation, open source, software-defined storage platform is a key element in this effort.

Software-defined storage (SDS) has emerged as a transformative force in data management, offering a number of advantages over traditional legacy storage arrays, including extreme flexibility and scalability, which are well suited to the use cases of modern technologies such as generative AI. With IBM Storage Ceph, storage resources are abstracted from the underlying hardware, enabling dynamic allocation and efficient utilization of data storage. This flexibility not only simplifies management, but also improves agility in adapting to evolving business needs and scaling compute and capacity as new workloads are introduced. This self-healing platform is designed to provide unified file, block, and object storage services at scale on industry-standard hardware. Unified storage helps provide customers with a bridge from legacy applications running on independent file or block storage to a common platform that includes these and object storage on a single device.

Ceph is optimized for large single- and multi-site deployments, and it scales efficiently to support hundreds of petabytes of data and tens of billions of objects, which is essential for both traditional workloads and newer generative AI ones. IBM Storage Ceph’s scalability, resilience, and security make it ideal for supporting open source data lakehouse and AI/ML (artificial intelligence/machine learning) frameworks, as well as more traditional workloads such as MySQL and MongoDB on Red Hat OpenShift or Red Hat OpenStack. That’s why 768 TiB of IBM Storage Ceph raw capacity is included in watsonx.data, IBM’s open, governed, fit-for-purpose data lakehouse architecture, optimized for data, analytics, and AI workloads.
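To put the raw-capacity figure in perspective, here is a minimal sketch of how raw capacity translates into usable capacity under two common Ceph data-protection schemes. The schemes used here (3x replication and a 4+2 erasure-coding profile) are illustrative assumptions, not the configuration shipped with watsonx.data, and the calculation ignores metadata and free-space headroom.

```python
# Illustrative only: the protection schemes below are assumptions,
# not the configuration shipped with watsonx.data.

RAW_TIB = 768  # raw capacity figure quoted in the article


def usable_replicated(raw_tib: float, replicas: int) -> float:
    """Usable capacity when every object is stored `replicas` times."""
    return raw_tib / replicas


def usable_erasure_coded(raw_tib: float, k: int, m: int) -> float:
    """Usable capacity with k data chunks and m coding chunks per object."""
    return raw_tib * k / (k + m)


if __name__ == "__main__":
    print(f"3x replication: {usable_replicated(RAW_TIB, 3):.0f} TiB usable")
    print(f"EC 4+2:         {usable_erasure_coded(RAW_TIB, 4, 2):.0f} TiB usable")
```

Under these assumptions, replication trades capacity for simplicity, while erasure coding keeps a larger share of the raw capacity usable.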


The right foundation for data-intensive workloads and information processing

The explosive growth of unstructured data and generative AI share a symbiotic relationship, influencing and benefiting each other. In its Top Trends in Enterprise Data Storage 2023 report, Gartner states that “by 2028, large enterprises will triple their unstructured data capacity across on-premises, edge, and public cloud compared to mid-2023.” The proliferation of unstructured data such as text, images and videos provides a vast and diverse source for training generative AI models. In turn, generative AI helps in understanding and extracting valuable insights from the ever-growing pool of unstructured data. This synergy results in a feedback loop in which generative AI thrives on the abundance of unstructured data, and AI’s continuous generation of realistic data further enriches and refines its understanding of unstructured datasets, driving innovation and advancement.

According to the same Gartner report, 70% of file and object data is expected to be deployed on a consolidated unstructured data storage platform by 2028, up from 35% in 2023. Organizations therefore need a management solution, such as software-defined storage, capable of accelerated data ingestion, data cleansing and classification, metadata management and augmentation, and cloud-scale capacity management and deployment. IBM Storage Ceph scales seamlessly to meet these growing data demands. Its self-management capabilities are designed to let the system continually adapt to changing conditions, reducing operational effort while maintaining data integrity.

To accelerate and scale the impact of data and AI on an organization and ultimately improve business outcomes, companies must be hybrid by design. This includes the ability to consume on-premises storage services with a cloud-native operating model to address issues such as the need for enterprise resource sets unavailable in the public cloud, data sovereignty considerations, and cost. IBM Storage Ceph’s plug-and-play architecture simplifies integration with existing infrastructures, including cross-platform and cloud environments, hypervisors, open source data repositories like Apache Iceberg or Apache Parquet, and end-to-end solution stacks like watsonx.ai, watsonx.data and others. New nodes or devices can be added to the cluster seamlessly, without interruptions or service downtime. IBM Storage Ceph gives customers an easy and efficient way to build a data lakehouse with watsonx.data and to support other next-generation AI workloads.

“At Snap, our need to store more and more data continues to expand, and we need a platform that can scale quickly, meet our performance KPIs, and be cost-effective at the same time. IBM Storage Ceph is the platform of choice with its simple scalable architecture, easy-to-manage interface, and cost-effective software-defined implementation. Having world-class knowledge and support from IBM is another important part of our decision to use IBM Storage Ceph for such a critical component of our business.” – Snap Inc.

Fast data access with NVMe over TCP

Over the past year, IBM has introduced several major updates to Ceph, including, most recently, IBM Storage Ceph 7.0. This next-generation Ceph platform prepares for NVMe/TCP capabilities designed to enable faster data transfer between storage devices, servers and cloud platforms while maintaining the low latency and high bandwidth characteristics of traditional NVMe. This makes it suitable for applications that require access to ultra-fast storage, such as databases, analytics and content delivery, and simplifies infrastructure due to its compatibility with traditional network technology investments. These benefits will help customers embrace a software-defined approach designed to deliver a cloud-like experience in terms of speed, agility and cost-effectiveness.

NVMe/TCP can help Ceph bridge the gap to traditional block storage with scale-out architectures. With NVMe/TCP, Ceph will be designed to integrate with platforms like VMware to help enterprises replicate cloud architectures in their own data center, moving away from expensive, rigid SAN networks and monolithic storage arrays.

Additional new features included in Ceph 7.0:

• SEC and FINRA compliance certification for WORM with Object Lock, enabling WORM compliance for object storage

• NFS support for CephFS file system access for non-native Ceph clients

• For more feature details, visit the IBM Storage community here
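As a rough illustration of what WORM retention with Object Lock looks like from a client's point of view, the sketch below builds a default-retention configuration in the shape used by the S3 Object Lock API. It assumes the bucket is accessed through an S3-compatible endpoint; the helper names, the retention mode, and the retention period are hypothetical choices for illustration, not values taken from IBM documentation.

```python
from datetime import datetime, timedelta, timezone


def default_retention_config(mode: str, days: int) -> dict:
    """Build an S3 Object Lock configuration with a default retention rule.

    Hypothetical helper. With boto3, a dict of this shape is what gets passed
    as the ObjectLockConfiguration argument of put_object_lock_configuration.
    """
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError("mode must be GOVERNANCE or COMPLIANCE")
    if days <= 0:
        raise ValueError("days must be positive")
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Days": days}},
    }


def retain_until(written_at: datetime, days: int) -> datetime:
    """Earliest time at which a locked object version may be deleted."""
    return written_at + timedelta(days=days)


if __name__ == "__main__":
    config = default_retention_config("COMPLIANCE", 365)  # assumed policy
    print(config)
    print(retain_until(datetime.now(timezone.utc), 365).isoformat())
```

In compliance mode, a retention period set this way cannot be shortened or removed until it expires, which is the property that regulated write-once workloads rely on.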

Cloud economies of scale with IBM Storage Ceph

Because IBM Storage Ceph stores data as objects within logical storage pools, a single cluster can have multiple pools, each tuned for different performance or capacity requirements. This allows customers to benefit from easier and faster access to data with content and context classifications, storage capacity limited only by the size of an organization’s infrastructure, and cost reductions at scale by removing the hardware constraints of traditional and legacy storage arrays.

Faster time to value

IBM has also made implementing Ceph easier than ever. With IBM Storage Ready Nodes for Ceph, the platform can be deployed as a complete software and hardware solution, offered in a variety of capacity configurations optimized to run IBM Storage Ceph workloads. We make setup easier by removing ambiguity, so the platform is simpler to consume, configure and administer.

The growth of IBM Storage Ceph is just another example of how IBM’s portfolio of storage hardware and software helps deliver faster time to value, with scaled capacity and performance to optimize costs for customers.
