I borrowed this blog title from a session at the recent Fujitsu Laboratories of America Technology Symposium 2015. As we know, software gets more attention than hardware these days. Marc Andreessen said that software is eating the world. Some even say that data is eating the world instead, as data attracts more and more attention.
In my opinion, hardware has recently been getting less attention than software and data. As we require more computing power in the days of intelligent computing, new computer architectures are in demand. After all, software cannot run without hardware, and only high-performance hardware can deliver that computing power. This session in the symposium was a timely discussion of current trends in the field.
Artificial intelligence (AI) requires more computing power, and that power comes from hardware; i.e., new and higher performance hardware architectures. This session was interesting because it discussed the new trend in hardware architecture from three different perspectives. All three of the panelists said that Moore’s Law has ended and that new approaches must be taken to cope with growing demands for computing power now and in the future. This means we are hitting the physical limit for transistor density on a chip, and alternative ways should be considered to cope with this limitation. Two possibilities are parallel/distributed computing and the use of some kind of accelerator. The combination of the two may also be applicable for this purpose.
(Source: Krste Asanović’s presentation at the symposium)
From left: Marc Hamilton of NVIDIA, Krste Asanović of UC Berkeley, and Andrew Putnam of Microsoft
The use of FPGA by Microsoft
Andrew Putnam of Microsoft Research discussed the new hardware architecture being applied to Microsoft businesses like Bing and Azure. According to Putnam, the life of servers in their data centers is about three years, during which they may be repurposed quite often as software demands change. Access to them is usually limited and, when they fail, they are discarded without repair and replaced with others. For Microsoft’s purposes, servers need to be homogeneous in their form factor and design as well as high performance and energy efficient. Note that commercial off-the-shelf (COTS) processors are flexible, while custom or dedicated hardware is more energy efficient and more powerful.
Because their servers, like those of Facebook, need custom designs, Microsoft joined the Open Compute Project (OCP) in January 2014 and holds a board seat. Microsoft made several contributions to OCP, including server/chassis specifications and designs.
Microsoft Open Compute Server (Source: Andrew Putnam’s presentation)
OCP was founded by Facebook in 2011. The most recent open compute US summit drew more than 2,500 people and got a lot of attention. In addition to Facebook and Microsoft, other participating companies include Apple, Rackspace, Cisco, Juniper Networks, Goldman Sachs, Fidelity, and Bank of America.
In short, Putnam said that their solution uses FPGAs (field-programmable gate arrays) in the Catapult FPGA accelerator. The use of FPGAs satisfies the two requirements of their data centers: specialization and homogeneity. That is, because every blade server uses the same form factor, the servers are physically identical, but the FPGA card allows each one to be customized, which provides specialization at the same time. More details are given here. Additionally, the use of the Azure SmartNIC with an FPGA allows implementation of software-defined networking (SDN).
Krste Asanović of UC Berkeley talked about the end of Moore’s Law and said that the new direction should be above the transistor level for Warehouse-Scale Computers (WSCs), leading to more specialized hardware.
Asanović stated that we are in the era of open compute based on custom hardware with COTS chips, and that in 2020 new chip-scale open standards will emerge with custom hardware.
Some observations he made about Instruction Set Architecture (ISA):
- ISA does not matter.
- The cost of custom chips will go down because of amortization of capital equipment, CAD tools, libraries, and training.
- New hardware description languages like Genesis2 (Stanford University) and Scala-based Chisel (UC Berkeley) will allow productivity improvement and design cost reduction.
He recommended RISC-V (free, open, and energy efficient; its white paper) over x86 or ARM as the ISA. The important points are the quality of COTS implementation and the business model (e.g., no custom chip with Intel).
FireBox is a processor module being developed in the Berkeley ASPIRE lab and presented as a component to implement warehouse-scale computers. Asanović described FireBox at high levels: processor, rack, cluster, and warehouse.
His conclusions were:
- Chips/systems to exploit solid-state drive (SSD) should be developed.
- We should take advantage of custom chips.
- We need specialized coprocessors but not specialized cores.
Marc Hamilton of NVIDIA emphasized the use of GPU to offload computing loads.
NVIDIA positions itself as a visual computing company. The graphics processing unit (GPU) is the center of their product line. According to Hamilton, the following five Internet companies use GPUs most: Google, Alibaba.com, Facebook, Amazon, and Tencent.
Hamilton gave two examples from the recent GPU Technology Conference of ways in which GPUs are being used. According to Jeff Dean of Google, Google uses them for speech, vision, language modeling, user prediction, and translation. His keynote speech at the conference is available in both text and video.
The other example came from Andrew Ng, who is the chief scientist at Baidu and a professor at Stanford University. In his keynote speech at the conference, he said that in five years 50% of search queries will be speech or images. His speech is available in text and video.
He said that the solution for an accelerated data center is a good combination of CPU and GPU, as shown below.
GPU and CPU to accelerate data centers (source: NVIDIA’s website)
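To get a feel for why pairing a CPU with a GPU can pay off, here is a minimal sketch based on Amdahl's law. This is my own back-of-the-envelope illustration, not something Hamilton presented. The function name `combined_speedup` and the sample numbers are hypothetical, and the model assumes that only the parallel part of a task is offloaded to the GPU while the rest stays on the CPU at its original speed.

```python
def combined_speedup(parallel_fraction: float, accelerator_speedup: float) -> float:
    """Overall speedup by Amdahl's law when only part of a task is accelerated.

    parallel_fraction: share of the original run time that can be offloaded
        to the accelerator (0.0 to 1.0). Hypothetical input, not measured data.
    accelerator_speedup: how many times faster the accelerator runs that share.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if accelerator_speedup <= 0.0:
        raise ValueError("accelerator_speedup must be positive")
    # The sequential share stays on the CPU and is not sped up at all.
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / accelerator_speedup)


if __name__ == "__main__":
    # Illustrative values only: a 10x faster accelerator for the parallel part.
    for fraction in (0.5, 0.9, 0.99):
        speedup = combined_speedup(fraction, 10.0)
        print(f"parallel share {fraction:.2f}: overall speedup {speedup:.2f}x")
```

The takeaway from this simple model is that the sequential part left on the CPU limits the overall gain, so the benefit depends heavily on how much of the workload is actually parallel.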
Sequential computing is done with a CPU, and parallel computing is done with a GPU. Dividing a task between the two yields more performance. He concluded the presentation by showing a complete product line for implementing accelerated data centers, as shown below.
NVIDIA’s products for accelerated data centers
Each component is explained below. Each description is taken from NVIDIA webpages and edited by the author.
- Pascal: accelerates deep learning applications 10X beyond the speed of NVIDIA's current-generation Maxwell processor
- NVLink: a fast interconnect between CPU and GPU, and among GPUs
- Iray: a highly interactive and intuitive physically based rendering technology that generates photorealistic imagery by simulating the physical behavior of light and materials
- vGPU: brings the full benefit of NVIDIA hardware-accelerated graphics to virtualized solutions
- cuDNN: a GPU-accelerated library of primitives for deep neural networks, designed to be integrated into higher-level machine learning frameworks such as the popular Caffe, Theano, or Torch
Three new approaches to accommodating intelligent computing were presented. I had not been following hardware much, so this session opened my eyes to current trends in computer hardware and architecture.
The three presentations summarized the trends as:
- Open hardware design
- Custom hardware design
- Accelerators based on FPGAs and GPUs
It seems that COTS chips will not die out anytime soon, but custom hardware, including chips, will be a must for warehouse-scale computing, which requires enormous computing power.