Google Data Centres are the large data centre facilities Google uses to provide its services. They combine large amounts of digital storage (mainly hard drives and SSDs), compute nodes organised in aisles of racks, internal and external networking, environmental controls (mainly cooling and dehumidification), and operations software (especially for load balancing and fault tolerance). This article is based on public announcements made by the company on Google's website.
Pursuing Google’s mission of organising the world’s information to make it universally accessible and useful takes an enormous amount of computing and storage. In fact, it requires coordination across a warehouse-scale computer. Ten years ago, Google realised that they could not purchase, at any price, a data centre network that could meet the combination of their scale and speed requirements. So, they set out to build their own data centre network hardware and software infrastructure. Today, at the ACM SIGCOMM conference, they are presenting a paper with the technical details of five generations of their in-house data centre network architecture.
From relatively humble beginnings, and after a misstep or two, they’ve built and deployed five generations of data centre network infrastructure. Their latest-generation Jupiter network has improved capacity by more than 100x relative to their first-generation network, delivering more than 1 petabit/sec of total bisection bandwidth. This means that each of 100,000 servers can communicate with one another in an arbitrary pattern at 10 Gb/s.
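The headline numbers are consistent with simple arithmetic: 100,000 servers each driving 10 Gb/s adds up to a petabit per second of aggregate bandwidth. A quick back-of-the-envelope check (illustrative only, not Google's accounting):

```python
# Sanity-check the claim: 100,000 servers at a flat 10 Gb/s each.
servers = 100_000
per_server_gbps = 10                     # any-to-any, no locality penalty

total_gbps = servers * per_server_gbps   # aggregate server bandwidth
total_pbps = total_gbps / 1_000_000      # 1 Pb/s = 10^6 Gb/s

print(total_pbps)  # 1.0 petabit/sec
```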
Such network performance has been tremendously empowering for Google services. Engineers were liberated from optimising their code for various levels of bandwidth hierarchy. For example, there were initially painful trade-offs between careful data locality (placing servers under the same top-of-rack switch) and the correlated failures caused by that single switch failing. A high-performance network supporting tens of thousands of servers with flat bandwidth also enabled individual applications to scale far beyond what was otherwise possible, and enabled tight coupling among multiple federated services. Finally, they were able to substantially improve the efficiency of compute and storage infrastructure. As quantified in their recent paper, scheduling a set of jobs over a single larger domain supports much higher utilisation than scheduling the same jobs over multiple smaller domains.
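The utilisation claim is easy to see with a toy bin-packing sketch (this is not Google's scheduler; the job sizes and greedy first-fit policy are illustrative). Splitting capacity into small domains strands fragments of free capacity that a single pooled domain could use:

```python
# Toy illustration: packing the same jobs into one large scheduling domain
# vs. several small ones. Fragmentation across small domains strands capacity.

def first_fit(jobs, capacities):
    """Greedy first-fit: place each job in the first domain with room."""
    free = list(capacities)
    placed = []
    for job in jobs:
        for i, cap in enumerate(free):
            if job <= cap:
                free[i] -= job
                placed.append(job)
                break
    return placed

jobs = [60] * 6                      # six jobs, 60 units each (360 total demand)

small  = first_fit(jobs, [100] * 4)  # four 100-unit domains: only one job fits per domain
pooled = first_fit(jobs, [400])      # one 400-unit domain: all six jobs fit

print(sum(small), sum(pooled))       # 240 360
```

With the same 400 units of total capacity, the four small domains run at 60% utilisation while the pooled domain reaches 90%, which is the effect a flat-bandwidth network makes possible at data centre scale.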
Delivering such a network meant Google had to solve some fundamental problems in networking. Ten years ago, networking was defined by the interaction of individual hardware elements, e.g., switches, speaking standardised protocols to dynamically learn what the network looks like. Based on this dynamically learned information, switches would set their forwarding behaviour. While robust, these protocols targeted deployment environments with perhaps tens of switches communicating between multiple organisations. Configuring and managing switches in such an environment was manual and error-prone. Changes in network state would spread slowly through the network using a high-overhead broadcast protocol. Most challenging of all, the system could not scale to meet their needs.
Google organised its networks around a set of principles that is now the primary driver of networking research and industrial innovation: Software-Defined Networking (SDN). They observed that they could arrange emerging merchant switch silicon around a Clos topology to scale to the bandwidth requirements of a data centre building. The topology of all five generations of their data centre networks follows the blueprint below. Unfortunately, this meant that they would potentially require 10,000+ individual switching elements. Even if they could overcome the scalability challenges of existing network protocols, managing and configuring such a vast number of switching elements would be impossible.
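The scaling arithmetic behind a Clos fabric built from identical merchant-silicon switches can be sketched with the textbook three-tier fat-tree numbers (these are the classic formulas for k-port switches, not Jupiter's actual parameters):

```python
# Classic 3-tier fat-tree (a folded Clos) built from identical k-port switches.
# Shows why building-scale bandwidth quickly implies thousands of switches.

def fat_tree(k):
    """Hosts supported at full bisection, and total switch count, for k-port switches."""
    hosts = k ** 3 // 4          # servers at full bisection bandwidth
    edge  = k * k // 2           # top-of-rack (edge) switches
    agg   = k * k // 2           # aggregation switches
    core  = (k // 2) ** 2        # core (spine) switches
    return hosts, edge + agg + core

for k in (16, 32, 48, 64):
    hosts, switches = fat_tree(k)
    print(f"{k}-port switches: {hosts:>6} hosts, {switches:>5} switches")
```

Even with 64-port switches, reaching tens of thousands of hosts takes over 5,000 switching elements, and multi-building fabrics push past 10,000, which is why manual per-switch management was a non-starter.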
Google’s next observation was that they knew the exact configuration of the network they were trying to build. Every switch would play a well-defined role in a larger ensemble. That is, from the perspective of a data centre building, they wished to build one logical building-scale network. For network management, they built a single configuration for the entire network and pushed that single global configuration to all switches; each switch would simply pull out its role in the larger whole. Finally, Google transformed routing from a pair-wise, fully distributed (but somewhat slow and high-overhead) scheme to a logically centralised scheme under the control of a single dynamically elected master, as illustrated in the figure below:
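The "single global configuration" idea can be sketched in a few lines. All names and fields here are hypothetical, invented for illustration; the point is only that every switch receives the identical fabric-wide blob and extracts its own role from it:

```python
# Minimal sketch of push-one-global-config management (hypothetical schema).
# Every switch gets the same configuration; each pulls out only its own entry.

GLOBAL_CONFIG = {
    "fabric": "fabric-block-1",
    "switches": {
        "tor-01":   {"role": "top-of-rack", "uplinks": ["agg-01", "agg-02"]},
        "agg-01":   {"role": "aggregation", "uplinks": ["spine-01"]},
        "spine-01": {"role": "spine",       "uplinks": []},
    },
}

class Switch:
    def __init__(self, name):
        self.name = name

    def apply(self, global_config):
        # Identical blob arrives everywhere; the switch looks up its own role.
        me = global_config["switches"][self.name]
        self.role, self.uplinks = me["role"], me["uplinks"]
        return self.role

print(Switch("agg-01").apply(GLOBAL_CONFIG))  # aggregation
```

Because the whole fabric is described in one place, validating, versioning, and rolling out a change becomes a single operation rather than thousands of per-device edits.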
Outcome of Research
Their SIGCOMM paper on Jupiter is one of four Google-authored papers being presented at that conference. Taken together, these papers show the benefits of embedding research within production teams, reflecting both collaborations with PhD students carrying out extended research efforts with Google engineers during internships as well as key insights from deployed production systems:
- Google’s work on Bandwidth Enforcer shows how they can allocate wide-area bandwidth among tens of thousands of individual applications based on centrally configured policy, substantially improving network utilisation while simultaneously isolating services from one another.
- Condor addresses the challenges of designing data center network topologies. Network designers can specify constraints for data center networks; Condor efficiently generates candidate network designs that meet these constraints, and evaluates these candidates against a variety of target metrics.
- Congestion control in data centre networks is challenging because of tiny buffers and very small round-trip times. TIMELY shows how to manage data centre bandwidth allocation while maintaining highly responsive, low-latency round trips in the data centre.
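The core idea behind TIMELY is to use the gradient of precisely measured RTTs, i.e., whether queues are growing or draining, as the congestion signal. A highly simplified sketch of that rate-update rule follows; the constants and the exact update form are illustrative, not taken from the paper:

```python
# Simplified RTT-gradient rate control in the spirit of TIMELY (illustrative
# constants): additive increase while RTTs are flat or falling, multiplicative
# decrease proportional to the gradient while RTTs are rising.

def timely_update(rate, rtt_gradient, delta=10.0, beta=0.8):
    """rate in Mb/s; rtt_gradient is the normalized RTT change per update."""
    if rtt_gradient <= 0:
        return rate + delta                          # queues draining: probe for bandwidth
    return rate * max(0.0, 1 - beta * rtt_gradient)  # queues building: back off

rate = 100.0
for g in (-0.1, -0.1, 0.5, -0.1):  # a short trace of measured RTT gradients
    rate = timely_update(rate, g)
print(round(rate, 1))  # 82.0
```

Reacting to the delay gradient rather than to packet loss lets a sender back off before buffers overflow, which matters when buffers are tiny and round trips are measured in microseconds.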
These efforts reflect the latest in a long series of substantial Google contributions in networking. They are excited about being increasingly open about the results of their research work: to solicit feedback on their approach, to influence future research and development directions so that they can benefit from community-wide efforts to improve networking, and to attract the next generation of great networking thinkers and builders to Google. Their focus on Google Cloud Platform further increases the importance of being open about their infrastructure. Since the same network that has powered Google's infrastructure for a decade also underpins their Cloud Platform, all developers can leverage the network to build highly robust, manageable, and globally scalable services.