FTvNF: Fault Tolerant Virtual Network Functions

Yotam Harchol, David Hay, and Tal Orenstein
ACM/IEEE ANCS, 2018
Conferences & Workshops
Fault tolerance, NFV/SDN

Abstract

One of the major concerns about Network Function Virtualization (NFV) is the reduced stability of virtual network functions (VNFs) compared to dedicated hardware appliances. Stateful VNFs make recovery a complex process, in which a key challenge is handling non-determinism such as multi-threaded processing, time dependence, and randomness.

In this paper we present FTvNF, a new approach for network function recovery with very low overhead in failure-free time. This is in contrast to previous suggestions to take snapshots of the VNF state at certain checkpoints or to store the VNF state externally. Compared with state-of-the-art approaches, our approach significantly reduces the latency overhead incurred by the network elements, both in failure-free operation and when failures occur. In addition, our approach better suits the common case of NFV service chaining, as our mechanisms are applied once per chain, thus significantly improving performance over approaches that treat each VNF separately.

@inproceedings{10.1145/3230718.3230731,
  author    = {Harchol, Yotam and Hay, David and Orenstein, Tal},
  title     = {FTvNF: Fault Tolerant Virtual Network Functions},
  year      = {2018},
  isbn      = {9781450359023},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3230718.3230731},
  doi       = {10.1145/3230718.3230731},
  abstract  = {One of the major concerns about Network Function Virtualization (NFV) is the reduced stability of virtual network functions (VNFs), compared to dedicated hardware appliances. Stateful VNFs make recovery a complex process, where a major concern is how to handle non-determinism such as multi-threaded processing, time dependence, and randomness. In this paper we present FTvNF --- a new approach for network functions recovery with very low overhead in failure-free time. This is in contrast to previous suggestions to take snapshots of the VNF state at certain checkpoints or to store the VNF state externally. Compared with state-of-the-art approaches, our approach significantly reduces the latency overhead incurred by the network elements, both in failure-free operations and when failures occur. In addition, our approach better suits the common case of NFV service chaining, as our mechanisms are applied once per chain, thus significantly improve the performance over approaches that treat each VNF separately.},
  booktitle = {Proceedings of the 2018 Symposium on Architectures for Networking and Communications Systems},
  pages     = {141--147},
  numpages  = {7},
  keywords  = {network function virtualization, fault tolerance, NFV, service chaining},
  location  = {Ithaca, New York},
  series    = {ANCS '18}
}