Network control in virtualized and containerized network environments is undergoing a paradigm shift driven by model-based analytics and machine-learning algorithms. Service providers are increasingly relying on automation both to minimize potential issues from human error and to enable real-time responses to network events, including cybersecurity threats, equipment failures and demand changes. Eventually, as machine learning matures by training on massive volumes of data, the network is expected to become self-healing and self-correcting.
In Part 1 of a two-part Q&A, Telco Transformation spoke to Andrew Dugan, chief technology officer for Level 3 Communications, about the challenges and opportunities of using artificial intelligence, machine learning and analytics to streamline the network. Dugan joined Level 3 in 1998 and has 30 years of experience leading technology teams and building telecom networks, switching platforms and services platforms.
Telco Transformation: How mature are model-driven, intent-based management methods in Level 3's current network environment?
Andrew Dugan: Automated methods exist across our network, including CDN [content delivery network], video, security, voice/VoIP, L2/L3 VPN, Internet and transport. Certainly, not everything is automated, but we have implemented some automation at every layer for provisioning and management, and model-driven management methods are used for these purposes.
We have an interesting vantage point on this challenge given Level 3's highly distributed global CDN, which was built over the last two decades. The core principles in developing our CDN assets mirror the promise of cloud and virtual services: a global, web-scale application that serves our customers' end users' sessions as close to their physical locations as possible. A highly instrumented and automated design of the network -- including service, platform and IP network metrics -- allows for load distribution, fault isolation and automated service repair in a constantly evolving environment, with dynamic adaptation to changing customer demand, capacity constraints and network events.
TT: How do you monitor real-time performance and load balancing?
AD: Level 3 has developed a control system that monitors real-time performance and dynamically adjusts load on servers to distribute it across the network. This is done by tracking the load on each server in the network, as well as the details of each content request and where the requesting user is located on the Internet. This allows the network to balance load distribution with end-user performance.
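As an illustration only (not Level 3's actual control system), the trade-off Dugan describes -- weighing a server's current load against its network proximity to the requesting user -- can be sketched as a simple scoring function. The weights, the 100 ms latency normalization and the 95% capacity cutoff are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    load: float        # current utilization, 0.0-1.0
    latency_ms: float  # estimated network distance to the requesting user

def pick_server(servers, load_weight=0.6, latency_weight=0.4):
    """Balance server load against end-user proximity: lowest score wins."""
    def score(s):
        # Normalize latency to a rough 0-1 scale (assume 100 ms is "far").
        return load_weight * s.load + latency_weight * min(s.latency_ms / 100.0, 1.0)
    # Skip servers already near capacity, then take the best-scoring candidate.
    candidates = [s for s in servers if s.load < 0.95]
    return min(candidates, key=score) if candidates else None
```

With these weights, a lightly loaded server somewhat farther from the user is preferred over a nearby server that is nearly saturated; a production system would feed both inputs from live telemetry rather than static fields.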
TT: What methods are you applying and how are you benefiting from them?
AD: Based on our internal best practices of building and maintaining this global CDN platform, we are applying network virtualization, disaggregation and more cloud-centric approaches to networking. Instead of the more traditional methods of monitoring for alarms and reacting to those alarms, service and network management in this environment is based on dynamic performance. For an operator to be effective in this new world where services can easily transition between data centers, geographies and providers, service management must be approached with methodologies that can accommodate this fluidity. Getting the network management approach right enables the insights needed to realize the full flexibility, scalability and resiliency potential of cloud and cloud-like applications.
While our CDN network and management infrastructure has had 20 years to mature in its control and management methods, Level 3 has also seen significant benefit from applying data analytics and machine learning over the last few years to our security services. Our security management is model-based and dynamic in the sense that our threat analytics detect bad traffic patterns on the Internet using a model-driven, pattern-matching approach. Once detected, those threats can be mitigated using our automated provisioning and orchestration platform to push filters into the network to stop the malicious traffic.
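The detect-then-mitigate loop Dugan outlines -- a model flags malicious traffic patterns, then orchestration pushes filters into the network -- can be sketched in miniature. This is a hypothetical toy, not Level 3's threat analytics: it uses a bare request-rate threshold as the "model," and emits ACL-style deny strings where a real platform would program routers via mechanisms such as BGP Flowspec.

```python
from collections import Counter

def detect_threats(flow_log, rate_threshold=100):
    """flow_log: iterable of (src_ip, dst_ip) pairs from one sampling window.

    Flags source IPs whose request count exceeds the threshold -- a stand-in
    for a real model-driven, pattern-matching detector.
    """
    counts = Counter(src for src, _dst in flow_log)
    return {ip for ip, n in counts.items() if n > rate_threshold}

def build_filters(bad_ips):
    # One deny rule per offending source, ready to hand to an orchestration
    # layer that would push it to the network elements.
    return [f"deny ip src {ip}" for ip in sorted(bad_ips)]
```

The point of the sketch is the pipeline shape: detection produces a structured set of threats, and mitigation is a mechanical translation of that set into network configuration, which is what makes the response automatable.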
TT: For virtualized networks, CSPs need real-time or streaming event data, and the capability to store it in a platform like Hadoop, rather than relying on the active or passive probes used in the past with physical networks. How feasible has this method of data collection been so far, and how good is the quality of the data for analytics?
AD: Whether you're reporting on CDN, Internet threats or virtualized networking resources, you need to leverage real-time (streamed) and historical (trended) network and service performance data. Level 3 uses high-frequency service polling and log collection, along with industry-leading big data technologies, to gather, stream and process service and platform quality data from our network. We collect logs from much of our network infrastructure -- router logs, CDN logs and DNS logs.
The variety of VNF data transports and models presents unique challenges that can delay real-time results or cause us to lose fidelity in the data if it's processed inefficiently. To help deal with the variety of data formats coming from different physical or virtual elements, we have built an efficient normalization layer that collects data from these disparate elements and generates a standardized feed to our collection and analysis systems. This does take additional effort for each new service or element we add to our network, and it can add development time and complexity to our environment. However, we look to drive normalized model standards to VNF and physical network function providers wherever reasonable.
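The normalization layer Dugan describes -- per-element adapters that turn disparate vendor formats into one standardized feed -- can be sketched as a registry of translation functions. The vendor names, field names and record schema below are invented for illustration; the real system would also handle transport, batching and schema versioning.

```python
# Hypothetical adapters: each maps one element type's raw payload into a
# common record shape consumed by the collection and analysis systems.
def from_vendor_a(raw):
    return {"element": raw["dev"], "metric": raw["counter"], "value": float(raw["val"])}

def from_vendor_b(raw):
    return {"element": raw["hostname"], "metric": raw["name"], "value": float(raw["reading"])}

ADAPTERS = {"vendor_a": from_vendor_a, "vendor_b": from_vendor_b}

def normalize(source, raw):
    """Translate a raw payload from a known element type into the standard feed."""
    try:
        return ADAPTERS[source](raw)
    except KeyError:
        # Each new service or element needs its own adapter -- the extra
        # development effort the interview mentions.
        raise ValueError(f"no adapter registered for {source!r}")
```

The design choice this illustrates is that the cost of onboarding a new element is isolated to writing one adapter, while everything downstream of `normalize` sees a single, stable schema.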
— Kishore Jethanandani, Contributing Writer, Telco Transformation