The world of analytics is expanding, and the complexity is expanding right along with it. Data is stored in distributed file systems, on clustered server nodes, and on scale-up systems. It may be structured or unstructured. Applications that access the data can be written to run on physical servers, on virtual servers (VMs), or in containers. The result is a rapidly growing, complex environment for customers.
IBM’s approach to addressing these customer issues is IBM Spectrum Computing, a new generation of advanced resource-management software that supports software-defined infrastructure (SDI). The SDI functionality abstracts the control plane away from the underlying hardware, presenting a more unified view of all resources and supporting access to data that many business units store using different technologies, in different ways.
Faster computing with better data access for these distributed computing environments is the goal of IBM Spectrum Conductor, announced June 2, 2016. IBM Spectrum Conductor’s role is to find the data resources, to enable access to the data across platforms, and to protect that data as it is gathered and processed. It provides a multi-tenant, converged application and data fabric. A configuration manager and policy-based provisioning are included to ease management of large, distributed clustered resources. Based on two years of customer input on analytics best practices, the IBM Spectrum Conductor product shows a strong emphasis on performance. That emphasis matters because IBM’s resource-management solutions compete for analytics workloads with others built on Apache Mesos and Hadoop YARN.
Looking at this announcement, it is clear that IBM has chosen to focus on performance, optimization and predictable availability as differentiators that will lead analytics customers to evaluate its new IBM Spectrum Conductor product. The focus here is on scale-out computing, in which hundreds of individual server nodes are combined with clustering software to do work that once was done only on large scale-up servers or supercomputers. We expect to see more resource-management announcements in 2016, as IBM enhances the IBM Spectrum product portfolio.
Supporting Spark, Open Source, and NoSQL Databases
Open source is a powerful force as customers build out scale-out distributed storage for cloud and analytics. It’s clear from this announcement that IBM recognizes it must tap the momentum shown by Apache Spark in real-time analytics, because Spark leverages in-memory data for faster processing of large data volumes. As part of this offering, IBM will have a version of IBM Spectrum Conductor that supports Spark for analytics. The same customer sets often leverage NoSQL databases, such as Cassandra and MongoDB, in scale-out clusters.
It’s worth noting that we have seen interest in Spark increasing in the open-source community, including frequent mentions at the OpenStack conference in Austin, Texas, in April 2016. Driving that interest are Spark’s use of in-memory data and its deployment for real-time analytics in Internet of Things (IoT) and health-care decision-making.
Extending the IBM Spectrum LSF Customer Base
At the same time, IBM extended the functionality of IBM Spectrum LSF, which identifies and schedules work across high-performance computing (HPC) clusters supporting research and product design. Target market segments include genome sequencing, car design, and market simulations. These customers now see that HPC results can be combined with other data to yield more inclusive analysis for the overall business. That tracks with the way analytics has moved toward wider views of data and metadata throughout a corporation, organization, or government agency.
IBM Spectrum LSF, building on the Platform Computing LSF product that IBM acquired in 2012, supports workload scheduling and the sharing of storage resources across data center tiers. Now, IBM is providing faster processing, faster I/O, and reduced “wait” times between scheduled tasks. We view this as a logical and pragmatic step that addresses increased demand for the timely data analytics that drive business decisions.
The analytics market is changing, with support for larger data stores, the open-source ecosystem, and OpenStack, along with improved access to extremely diverse data resources. Another driver of this increased demand is hybrid computing, which links on-premises and off-premises computing. By optimizing rapid access to data resources in this complex environment, IBM Spectrum Conductor addresses many of the IT issues faced by customers building multiple scale-out clusters for advanced analytics.