Artificial intelligence, machine learning and cognitive capabilities are likely to anchor IBM’s marketing and product differentiation for years to come. As a result of this corporate direction, IBM Systems is moving away from a technical, roadmap-driven approach for its POWER-based servers – and toward marketing centered on analytics capabilities and business outcomes. We believe this approach will be seen more widely in 2018 in other IBM offerings.
Now, we are seeing an IBM server platform optimized for AI, ML and cognitive workloads coming to market. On Dec. 5, 2017, IBM announced the IBM Power System AC922 server, based on the POWER9 processor and OpenPOWER consortium technologies. [The next major POWER processor milestone, the POWER10 processor, is due in 2020, based on IBM’s 2015 POWER Systems roadmap].
The AC922 is intended to be used for AI analysis, high performance computing (HPC) and accelerated database workloads. This fits with IBM’s offerings for Big Data analytics, with a focus on analyzing large amounts of data to gain better, faster business insights that drive decision-making at businesses and organizations. The new system was designed to improve training time for deep learning, supporting faster development of AI applications.
We believe that the system optimization around cognitive, AI and high-performance computing is being done for three primary purposes:
- To ensure that AI, ML and Cognitive workloads will perform at high levels on the new POWER9-based hardware, increasing interest in POWER and OpenPower systems.
- To create product differentiation for purposes of competition with other types of scale-out systems for HPC and analytics.
- To drive continued IBM investment in POWER technologies to grow the installed base, and to engage with the wider OpenPower consortium of IBM partners, including Google, Nvidia, Mellanox and systems vendors.
The POWER9 processor comes in scale-out (Nimbus) and scale-up (Cumulus) variants, and servers like the AC922 will generally be deployed in clusters. The scale-out models will be delivered as one-socket or two-socket nodes, while the scale-up models will be delivered as servers with four or more sockets.
Solution-Based Design for AI and Cognitive Workloads
IBM is focusing on AI, ML and cognitive workloads as the design point for its AC922 POWER9-based system. Typical workloads in this space run in deep learning frameworks like TensorFlow, Caffe and Chainer. These workloads need on-chip acceleration, high-speed links (NVLink and PCIe 4.0) and coherent memory links so that workloads can be deployed across extensible clusters of many servers. IBM is using the second generation of NVLink, which provides twice the bandwidth of the NVLink interconnect that was designed for use with the POWER8 processor.
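The practical effect of higher link bandwidth is easiest to see with a rough data-staging estimate. The sketch below is illustrative only: the bandwidth figures are assumed round numbers for a PCIe 3.0 x16 link and an aggregated NVLink 2.0 connection, not measured AC922 specifications.

```python
# Back-of-envelope estimate of host-to-GPU staging time for one batch of
# training data at different interconnect speeds. All figures are assumed,
# illustrative round numbers, not measured AC922 values.

def transfer_seconds(payload_gb: float, bandwidth_gb_s: float) -> float:
    """Time to move payload_gb of data over a link of bandwidth_gb_s."""
    return payload_gb / bandwidth_gb_s

BATCH_GB = 2.0               # hypothetical per-step data volume
PCIE3_X16_GB_S = 16.0        # ~PCIe 3.0 x16, one direction (assumed)
NVLINK2_AGG_GB_S = 75.0      # ~aggregated NVLink 2.0 bricks (assumed)

pcie = transfer_seconds(BATCH_GB, PCIE3_X16_GB_S)
nvlink = transfer_seconds(BATCH_GB, NVLINK2_AGG_GB_S)
print(f"PCIe 3.0 x16: {pcie*1000:.1f} ms, NVLink 2.0: {nvlink*1000:.1f} ms")
```

Because deep learning training repeats this staging step millions of times, even modest per-transfer savings compound into materially shorter training runs.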
If arranged in a multi-node supercomputing cluster, POWER9-based systems can take on massively scalable computing workloads. The Dec. 5 announcement mentioned the CORAL project, in which several Department of Energy research laboratories have deployed very large clusters for supercomputing HPC workloads.
Three national laboratories – Argonne, Oak Ridge and Lawrence Livermore, all early adopters of POWER processors – are working together under the CORAL program to build supercomputers based on large clusters of POWER9-based systems. Another early adopter is Google, which has said it plans to deploy POWER-based systems in very large clusters for TensorFlow and other Google-based workloads.
Other uses will come in high-performance computing for science, research and industrial uses demanding on-chip acceleration, which POWER9 supports. It is also possible that these systems will be used in the IBM Cloud for AI, ML and cognitive analysis on behalf of end-customers of IBM Cloud services.
System Design With OpenPower Foundation Partners
By focusing on specific workloads, IBM has found a new way to bring POWER-based servers to market. This business model could be a Rosetta Stone that shows how IBM plans to build and deliver future POWER-based systems for analytics. IBM executives from its IBM Cognitive group said that IBM’s PowerAI capability speeds up data scientists’ tasks by optimizing and simplifying the deployment of deep learning frameworks and libraries on POWER9.
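Deep learning frameworks like those PowerAI packages typically scale across cluster nodes through synchronous data parallelism: each worker computes gradients on its shard of the training data, the gradients are averaged over the interconnect, and every worker applies the same weight update. The following is a minimal, framework-agnostic sketch of that pattern; all names and the toy model are hypothetical.

```python
# Minimal sketch of synchronous data-parallel training, the pattern deep
# learning frameworks distribute across cluster nodes. Purely illustrative;
# the toy model fits y = weight * x where the true weight is 1.0.

def local_gradient(shard: list, weight: float) -> float:
    # Gradient of mean squared error for targets y = x on this worker's shard.
    return sum(2 * (weight * x - x) * x for x in shard) / len(shard)

def allreduce_mean(grads: list) -> float:
    # Stand-in for an all-reduce over the cluster's high-speed interconnect.
    return sum(grads) / len(grads)

data = [1.0, 2.0, 3.0, 4.0]
shards = [data[0::2], data[1::2]]        # one shard per worker
weight, lr = 0.0, 0.05
for _ in range(100):
    grads = [local_gradient(s, weight) for s in shards]
    weight -= lr * allreduce_mean(grads)  # identical update on every worker
print(round(weight, 3))  # converges toward the true weight, 1.0
```

In a real cluster the all-reduce step is where NVLink and high-speed Mellanox interconnects matter: gradient exchange happens on every training step, so its latency and bandwidth bound how well training scales with node count.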
To build the AC922 system, IBM partnered with members of the OpenPOWER consortium, including Nvidia (NVLink interconnects), Mellanox (high-speed interconnects) and Google, which has said it will deploy OpenPOWER chips for its scale-out workloads, like TensorFlow for AI. Nvidia, Mellanox and Google are active in the OpenPOWER Foundation, which encourages system designs based on the POWER architecture and OpenPOWER open hardware specifications.
The centerpiece of the AC922 is the POWER9 processor itself, which brings a new microarchitecture into the long-running POWER line of RISC chips. The POWER9 design, four years in the making, enables high-speed data transfers via OpenCAPI and NVLink interconnects, helping to differentiate the IBM AC922 server.
Drivers for AI and ML Capabilities on POWER9
Why did IBM go to the trouble and expense of designing the 14nm POWER9 chip and its surrounding microarchitecture? It’s more about gaining and maintaining a competitive edge in AI, ML, cognitive and high-performance computing (HPC) than it is about selling record numbers of servers. These are growing market segments – and they demand high levels of performance to meet AI and ML customers’ aims of rapid analytics and actionable business insights.
This laser-like focus on AI and HPC has led to a set of specifications involving POWER9 processors and OpenPOWER technologies, Nvidia’s GPUs and Mellanox’s high-speed network interconnects. Key attributes are on-chip acceleration and links to fast GPUs developed by IBM partner Nvidia. Working together since 2014, IBM and Nvidia developed the NVLink interconnect that links POWER processors to Nvidia’s GPUs for graphics, analytics and visualization of data.
The Power Systems line of servers is important to enterprises, where its role as a platform for SAP applications and for telco and financial services workloads is well-established. It is also used as a platform for databases, including Oracle, Sybase and IBM DB2. Recurring sales of AIX and Linux applications that run on POWER RISC processors have sustained the traditional POWER systems business in recent years, although annual shipments have declined during that time.
Hardware competitors for the processors themselves are few. In terms of volume and scale-out deployments, Intel’s Xeon and Xeon SP processor chips account for more than 90% of the overall market opportunity in terms of server shipments. Intel’s Xeon is widely used for scale-out server nodes in clusters, especially Linux-based clusters for HPC and Hadoop-style analytics. We also note here that Oracle SPARC servers and SPARC SuperClusters, based on SPARC RISC processors, are deployed in enterprise data centers and in the Oracle Cloud, leveraging on-chip acceleration to run enterprise workloads and databases.
Customers who are focused on Big Data, analytics and high-performance computing (HPC) will likely understand the AC922’s value proposition right away. However, IBM has more work to do to build that same understanding of these key workloads among the broader IT and business audiences who will hear about the system.
Business managers play an important role in approving server acquisitions for on-premises use, which makes this audience highly valued by IBM. These business-focused managers, who approve budgets, must be made more aware of how enterprise IT organizations and cloud providers can leverage the IBM AC922 server for workload acceleration and analytics to gain business advantage.