T-Mobile to slash $30 Million in cloud costs with Kubernetes

T-Mobile US has found success with Kubernetes and currently operates 20,000 containers across its virtualized cloud infrastructure. In fact, over the next year the carrier claims the combination of centralized management of Kubernetes containers and a normalized approach to clusters will help it to slash $30 million in cloud costs.

Thom McCann, senior manager and software engineer at T-Mobile, shared the progress of the carrier’s seven-year cloud journey this week as part of D2iQ’s “Cloud Native Virtual Summit featuring Kubernetes.”

From building applications manually, progressing to automation, and then implementing container-based workloads at scale, T-Mobile’s centralized cloud team has laid “bedrock foundations in place to deliver consistent cloud consumption in a large organization,” McCann said.

That foundation is called Conducktor, which is T-Mobile’s internal Kubernetes platform. It has enabled the carrier’s cloud team to establish a common way to access the cloud, applications, and security, and to let application teams self-serve in their own organizations. Conducktor is also backed by the carrier’s telemetry platform, which gathers all the data in its clusters. Application teams are then able to scrape the cluster data to identify memory percentages.

Driving down the cost of compute was no stroke of luck. T-Mobile hit an inflection point in late 2018, when the number of containers it was running in the cloud exceeded the number of virtual machines (VMs). Once that happened, McCann said the growth of VMs in the cloud eventually stopped and the number of containers started to surge. “We were on a path toward what I call ‘spending many millions of dollars’ to a path where we spend fewer millions of dollars,” he added.

McCann spoke about using the public cloud, and Kubernetes specifically, to run automated, container-based workloads at scale, along with a number of other tools to deliver features and new capabilities to customers faster. T-Mobile refers to its use of containers as “un-carrier moves” as the company continues to “disrupt a broken and arrogant industry.” While John Legere has stepped down as CEO of the carrier, his bombastic language and un-canny marketing are clearly here to stay.

In this case, however, it isn’t so much an “un-carrier move” as a commitment to adopt the container-based infrastructure needed to support and automate 5G networks and related services. These moves will soon be put to the test now that T-Mobile has finally closed its two-year quest to acquire smaller rival Sprint.

Observability to Normalize the Approach

Observability is one of the key pieces that McCann cited in his presentation.

“It’s taken a coordinated effort across teams at T-Mobile to be able to realize these savings, and it takes that extra effort of once someone’s in a container environment to make sure they’re optimized for container environments,” McCann said.

It’s hard to optimize clusters that are running in separate organizations, workgroups, or application teams if the cloud team is unable to track changes. To centralize the management of clusters, specific environments such as development, test, quality assurance (QA), and production are dedicated to individual clusters.

With Conducktor, users at T-Mobile generally don’t have to interact with the concept of a cluster. They get a namespace and in that namespace they have full permissions. Users can also work with the cloud team to define how they want to isolate certain aspects of their application, whether it be performance tests or components that need to be segmented.

“When you have a normalized approach to clusters and you own the platform that you’re running and most of your application teams are consuming Kubernetes in a normalized way, you then have the opportunity to optimize how you purchase from a cloud perspective, the underlying resources,” McCann explained.

Cost Optimization

Optimization, according to McCann, is focused on cost management and cost show back. On a cloud level, his team shows back all the costs of all the cloud usage that is available at T-Mobile to each individual application team and management. In order to ensure that application teams are using resources properly, his team also shows the optimized cost of action.
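As a rough illustration of what a show-back calculation can look like, the Python sketch below splits a cluster bill across application teams in proportion to the CPU core-hours their namespaces consumed. The article does not describe T-Mobile's actual allocation rule; the proportional split, the `NamespaceUsage` type, the team names, and the dollar figures are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class NamespaceUsage:
    team: str              # hypothetical owning team
    namespace: str         # hypothetical namespace name
    cpu_core_hours: float  # usage measured over the billing period


def show_back(usages, cluster_cost):
    """Split cluster_cost across teams in proportion to CPU core-hours.

    Proportional allocation is an assumption for this sketch, not the
    method described in the article.
    """
    total = sum(u.cpu_core_hours for u in usages)
    if total <= 0:
        raise ValueError("no usage recorded for the period")
    costs = {}
    for u in usages:
        share = u.cpu_core_hours / total
        costs[u.team] = costs.get(u.team, 0.0) + share * cluster_cost
    return costs


# Made-up sample data: two teams, three namespaces, a $10,000 cluster bill.
usages = [
    NamespaceUsage("billing", "billing-prod", 600.0),
    NamespaceUsage("billing", "billing-test", 200.0),
    NamespaceUsage("search", "search-prod", 200.0),
]
report = show_back(usages, cluster_cost=10_000.0)
for team, cost in sorted(report.items()):
    print(f"{team}: ${cost:,.2f}")
```

A real show-back report would likely weigh memory, storage, and network alongside CPU; a single metric keeps the sketch short.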

This is an automated process “because we have the data and we also have a large Prometheus infrastructure,” McCann said, referring to the open source monitoring tool. “We actually merge all the data together to show things like memory management, memory usage, and ship-off notifications via Slack.” In doing so, the cloud team can identify whether a particular application might be overusing memory or be underutilized in some way.
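The check McCann describes can be sketched in a few lines of Python: compare the memory each workload uses with what it requested, and build a notification for anything outside a target band. This is a minimal sketch, not T-Mobile's pipeline. The `MemorySample` type, the 30% and 90% thresholds, and the namespace names are hypothetical, and the sample values are hard-coded here rather than queried from Prometheus or posted to Slack.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class MemorySample:
    namespace: str        # hypothetical namespace name
    used_bytes: int       # observed memory in use
    requested_bytes: int  # memory requested for the workload


def classify(sample, low=0.30, high=0.90):
    """Label a sample by its used/requested ratio (thresholds are assumed)."""
    if sample.requested_bytes <= 0:
        raise ValueError("requested_bytes must be positive")
    ratio = sample.used_bytes / sample.requested_bytes
    if ratio < low:
        return "underutilized"
    if ratio > high:
        return "overusing"
    return "ok"


def build_notifications(samples):
    """Return one message per sample that falls outside the target band."""
    messages = []
    for s in samples:
        status = classify(s)
        if status != "ok":
            pct = 100 * s.used_bytes / s.requested_bytes
            messages.append(
                f"{s.namespace}: memory {status} ({pct:.0f}% of request)"
            )
    return messages


# Made-up samples standing in for merged telemetry data.
samples = [
    MemorySample("checkout", 1 * GIB, 10 * GIB),
    MemorySample("search", 5 * GIB, 10 * GIB),
    MemorySample("inventory", 19 * GIB, 20 * GIB),
]
for message in build_notifications(samples):
    print(message)
```

In practice the messages would be handed to whatever chat or alerting integration the platform team uses; that step is left out because the article gives no detail on it.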

“We really have developed an internal ethos around cost awareness for both the development teams and the team that runs the platform,” he added.

―SDX Central

Copyright © 2024 Communications Today