Public Cloud has won. Almost no one deploys applications today without leveraging Azure, AWS, or GCP in some manner. Public Cloud has proven that consistency, availability, and abstraction lead to better business growth, technical outcomes, and customer experiences. A quick look at the unicorns that went public earlier this year, such as Lyft, Pinterest, and Slack, makes the point: not one of these companies can function without the Public Cloud. Even when it appears exorbitantly expensive, businesses have decided it is worth the cost for the flexibility and growth it brings, because it lets them focus on their revenue-generating applications rather than on supporting their infrastructure.
However, the Public Cloud has not solved every problem. The three major cloud providers have spent years creating, supporting, and perfecting hyper-centralized, hyper-dense data centers. As a result, data has been massively centralized into a few geographic locations close to the largest population centers. But as we all know, the world does not stand still. It is rapidly moving to a real-time digital paradigm where the physical and digital worlds are colliding. Applications are no longer just tools we use to buy products, play a song, or watch a video. More and more applications are becoming digital extensions of our physical world, requiring real-time responses for natural interaction and localized data processing due to bandwidth constraints. Here lies the problem with the hyper-centralization of data.
The human mind can process changes in an image in ~7-10 milliseconds (ms), and a car moving at 75 mph travels 20 ft in 180 ms, with each car generating multiple terabytes of data per day. Supporting such use cases by back-hauling this data to a highly centralized Public Cloud is not possible today, for reasons of both network latency and bandwidth, even when accounting for 5G. There is simply not enough fiber in the ground, and no one has solved that pesky speed-of-light problem! Public Cloud locations are at best 50 ms from their users, and 4G LTE adds another 30-50 ms. This leads us to the next wave of infrastructure design: the Distributed Cloud.
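The latency figures above can be sanity-checked with a little arithmetic. The Python sketch below assumes a constant speed, ignores any processing time, and simply adds the two stated figures (at best 50 ms to a Public Cloud location, plus 30-50 ms for 4G LTE); `feet_traveled` is a hypothetical helper introduced only for illustration.

```python
# Back-of-the-envelope check of the distance a car covers during network latency.
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def feet_traveled(speed_mph: float, latency_ms: float) -> float:
    """Distance in feet covered at a constant speed_mph during latency_ms."""
    feet_per_second = speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR  # 75 mph -> 110 ft/s
    return feet_per_second * latency_ms / 1000.0


# Figure from the text: 180 ms at 75 mph.
print(f"180 ms: {feet_traveled(75, 180):.1f} ft")  # 19.8 ft, i.e. ~20 ft

# Public Cloud location (at best 50 ms) plus 4G LTE (another 30-50 ms).
for lte_ms in (30, 50):
    total_ms = 50 + lte_ms
    print(f"{total_ms} ms: {feet_traveled(75, total_ms):.1f} ft")
```

Under those assumptions, a car at 75 mph covers roughly 9 to 11 ft before a cloud-hosted response could even arrive, and that is before any compute time in the cloud itself.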
What is the Distributed Cloud, you ask? It is the hyper-localization of compute, network, and storage close to the user, while providing and maintaining the experience, level of service, and abstraction of the Public Cloud. This transition from centralized to Distributed Cloud platforms is already happening today, but it is not being led by the three major cloud providers. Smaller regional service providers (SPs) and Distributed Cloud companies like Packet, Fastly, and Cloudflare are leading the way by leveraging hyper-scale “cloud native” principles. AWS, Azure, and GCP have taught organizations how to manage infrastructure in an abstract way, but their business models were never intended to extend to the edge. There are many reasons for this, but front and center is the difference in design and mindset between managing 20 sites with 10,000+ devices versus 10,000+ sites with 20 devices, not to mention the financial cost of maintaining so many sites.
When building the Distributed Cloud, you are limited by space, power, cost, and, more importantly, technical manpower. When deploying the network, you can’t send technical staff to set up every site at the bottom of a cell tower, in a retail branch, or in a remote office. Hardware redundancy is limited by cost, so remote moves, adds, and changes need to be hitless, reliable, and precisely targeted. You can’t risk upgrading BGP or the RIB just to fix an issue with LLDP. Nor can you leave vulnerable network code running in hundreds of unmanned sites without a way to fix it that doesn’t impact service.
We live in a 24/7 world, and application owners and users will not tolerate outages; they will simply stop using the service. They have become accustomed to an always-on, always-available world, and none of this can be delivered with legacy monolithic network software. Legacy network software upgrades are complicated, requiring weeks or months of preparation, and they are unreliable and risky when no local staff are available to jump in when things go wrong. This has forced operators to resort to service-impacting updates in an attempt to manage results and expectations. Security fixes cannot be deployed in any reasonable amount of time, so vulnerabilities are left unpatched, and technical experts are still needed on site to set up the equipment. The modern solution we have been offered for managing these old designs is an all-or-nothing centralized SDN controller. You can either have all forwarding and control-plane decisions made there, or continue box-by-box management with limited API access or CLI-only control. There is no in-between.
Glenn Sullivan and I lived through this in our former lives as operators, deploying real-time distributed applications and managing what happens the day after the vendor delivering the solution has claimed victory. We designed CN-NOS to combine what we saw as the best characteristics of networking with the innovations the Public Cloud created and our compute colleagues were already leveraging (I can’t recall a time when they were sent around the world for site bring-up, as we were). We firmly disagree with the centralized protocols and forwarding decisions that SDN architectures pushed us toward. Instead, we decided to intelligently segment and containerize network services, standing by our belief that networks should continue to have distributed control and data planes running standard protocols.
Rather than building a traditional (and proprietary) management application from scratch, we embraced Kubernetes orchestration as the way to command and control our network service containers, but without requiring a Kubernetes cluster. CN-NOS works without a separate Kubernetes master. If you choose to use one to make it easier to manage a fleet of network devices, and that Kubernetes master dies, loses connectivity, or is simply removed, your network continues to work and can still be managed. Your CN-NOS instances are autonomous nodes with locally available APIs and a standard CLI. We view this as an absolute must when managing hundreds of sites across a wide geographic region.
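To make the autonomy property concrete, here is a minimal Python sketch, assuming a simplified model in which each network service runs as a separately versioned container. It is not CN-NOS code and does not use the Kubernetes API; `AutonomousNode`, `sync_from_master`, `apply_local`, and `MasterUnreachable` are hypothetical names chosen for illustration. The optional master only supplies desired state; when it is absent or unreachable, the node keeps running its last-applied state and its local API keeps accepting changes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical model: desired state maps a network service name (e.g. "bgp")
# to the container image version that should be running for that service.
DesiredState = Dict[str, str]


class MasterUnreachable(Exception):
    """Raised by a fetch function when the optional master cannot be reached."""


@dataclass
class AutonomousNode:
    """Hypothetical network node that manages itself, with or without a master."""

    running: DesiredState = field(default_factory=dict)
    # Optional callable that fetches desired state from a central master.
    fetch_from_master: Optional[Callable[[], DesiredState]] = None

    def apply_local(self, service: str, version: str) -> None:
        """Local API/CLI path: available whether or not a master exists."""
        self.running[service] = version

    def sync_from_master(self) -> List[str]:
        """Reconcile against the master, if one is configured and reachable.

        Returns the names of the services that were changed. With no master,
        or an unreachable one, the node keeps its last-applied state.
        """
        if self.fetch_from_master is None:
            return []
        try:
            desired = self.fetch_from_master()
        except MasterUnreachable:
            return []
        # Touch only services whose version differs, so an LLDP update
        # leaves BGP and the RIB untouched.
        changed = [svc for svc, ver in desired.items() if self.running.get(svc) != ver]
        for svc in changed:
            self.running[svc] = desired[svc]
        return changed


def dead_master() -> DesiredState:
    raise MasterUnreachable("no route to master")


node = AutonomousNode(running={"lldp": "1.0", "bgp": "2.3", "rib": "2.3"})
node.fetch_from_master = lambda: {"lldp": "1.1", "bgp": "2.3", "rib": "2.3"}
print(node.sync_from_master())  # ['lldp']
node.fetch_from_master = dead_master
print(node.sync_from_master())  # []: master lost, node keeps running
node.apply_local("bgp", "2.4")  # local management still works
print(node.running)  # {'lldp': '1.1', 'bgp': '2.4', 'rib': '2.3'}
```

The design choice in this sketch is that the master is an optional source of desired state, never a dependency for forwarding or for local management, and that reconciliation touches only the services whose version changed, in the spirit of upgrading LLDP without risking BGP or the RIB.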
As the world continues to move from the centralized Public Cloud to the Distributed Cloud, with data hyper-localized to its users, a new NOS architecture and a new network are required. In this 24/7 world, where real-time applications are king, we need to embrace what we have learned from the Public Cloud and use it to enable the network innovation and agility that the Distributed Cloud demands. Containerized hitless upgrades, software redundancy, Kubernetes management, and realistic costs are no longer “nice to haves”; they are absolute requirements.