Archive for January, 2015

Noction Intelligent Routing Platform (IRP) – What is it?

January 27, 2015 6 comments

I was contacted by some people at Noction and asked if I was interested in writing about their platform, the Intelligent Routing Platform (IRP). Since it’s a product that uses Border Gateway Protocol (BGP), it piqued my interest. First, let’s make the following things clear:

  • I am not being paid to write this blog post
  • My opinions can’t be bought
  • I will only write about a product if it’s something that interests me

BGP is the glue of the Internet (with DNS) and what keeps everything running. BGP is a well designed and scalable protocol which has been around for a long time. It has grown from carrying a few hundred routes to half a million routes. However, there will always be use cases where BGP might not fit your business model.

In Noction’s white paper they define the following as the network’s major challenges:

  • Meeting the customer’s demand for 100% uptime
  • Facing the low latency requirement
  • Achieving reliable data transmission
  • Avoiding network congestion and blackouts
  • Achieving consistency of throughput
  • Keeping bandwidth usage below predefined commit levels
  • Reducing the cost and time of network troubleshooting

The product is designed for multihomed networks running BGP. You can’t optimize network flows if you don’t have any other paths to switch to. Some of these challenges apply to all networks and some may be a bit more local. As an example, in Sweden (where I live), you usually pay a fixed amount for your bandwidth and you can use that all you want without going above some threshold defined by the Service Provider (SP).

So why do we have these challenges? Is it BGP’s fault? BGP has a lot of knobs but they are quite blunt tools. We need to keep in mind that BGP runs between organizations and every organization must make their own decisions on how to forward traffic. This means that there is no end to end policy to optimize the traffic flowing across these organizations.

If history has taught us anything, it is that protocols that try to keep too much state will eventually fail or hit scaling limitations. These protocols seem very intelligent and forward thinking at first, but as soon as they hit large scale, the burden becomes too much. One such protocol is Resource Reservation Protocol (RSVP). BGP’s design is what has kept the Internet running for decades. This would not be the case if we were to inject all kinds of metrics, such as latency and jitter, for all of the Network Layer Reachability Information (NLRI). As communities have grown more popular, there could be a use case where this information is attached to the NLRI as communities. The question is then, how often do we update the communities?

Does this mean that these are not real challenges or that there is no room for a product like Noction IRP? No, it means that unique forwarding decisions and intelligence need to be kept at the edge of the network, not in the core. We should keep as little state as possible in the core for networks that need high availability.

How does BGP select which routes are the best? By default, with no other policy applied, it simply looks at the AS-path: the shorter the AS-path, the better. This means that the traffic will pass through as few organizations as possible. It does not, however, give any consideration to how much bandwidth is available, nor does it take into account the latency, jitter or availability of the path.
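As a minimal sketch of that default behavior, the Python snippet below picks the route with the shortest AS-path from two candidates. The route dictionaries, the provider labels and the AS numbers (taken from the range reserved for documentation) are assumptions made for illustration. A real BGP implementation compares attributes such as local preference before the AS-path length and applies further tie-breakers after it.

```python
def best_by_as_path(routes):
    """Return the route with the fewest AS hops; on a tie the first one wins."""
    return min(routes, key=lambda route: len(route["as_path"]))


# Two hypothetical paths to the same prefix, learned from two providers.
routes = [
    {"provider": "A", "as_path": [64500, 64501, 64502]},
    {"provider": "B", "as_path": [64503, 64502]},
]

best = best_by_as_path(routes)
print(best["provider"])  # B, because its AS-path has two hops instead of three
```

Nothing in this selection knows whether provider B is congested or lossy, which is exactly the gap described above.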

How does this product work? The following picture shows the key components of IRP:


There is a Collector that passively analyzes the traffic to see which prefixes are being used the most, between which endpoints the traffic is flowing and so on. The Collector can gather this data from a mirror port or, preferably, from NetFlow/sFlow.

The Explorer will actively probe relevant prefixes for metrics such as latency, jitter and packet loss. This data is then sent to the Core.

Based on the data received from the Explorer, the Core calculates improvements to optimize metrics such as latency, jitter and packet loss, or to select the most cost-effective path. These improvements are sent to the BGP daemon, which advertises BGP Updates to the edge router(s).
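The kind of decision the Core makes can be sketched roughly as follows. This is not Noction’s algorithm, only a hypothetical illustration: the `Probe` record, the `pick_provider` function and the rule of preferring lower loss first and lower latency second are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    provider: str      # hypothetical provider label
    loss_pct: float    # measured packet loss towards the prefix
    latency_ms: float  # measured latency towards the prefix


def pick_provider(probes, current):
    """Return the provider to use for a prefix.

    Prefers the lowest loss, then the lowest latency, and only moves away
    from the current provider when another one is strictly better.
    """
    best = min(probes, key=lambda p: (p.loss_pct, p.latency_ms))
    cur = next(p for p in probes if p.provider == current)
    if (best.loss_pct, best.latency_ms) < (cur.loss_pct, cur.latency_ms):
        return best.provider
    return current


probes = [Probe("provider_a", 2.0, 40.0), Probe("provider_b", 0.0, 55.0)]
print(pick_provider(probes, current="provider_a"))  # provider_b
```

Keeping the current provider on a tie avoids moving a prefix back and forth when two paths measure the same.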

IRP is non-intrusive and does not sit in the data path. If IRP were to fail, traffic would fall back to its normal paths, following the shortest AS-path or any other policies defined on the edge router. IRP can also act in BGP non-intrusive mode, where it will report potential improvements without applying them.

If we pause here for a second, this sounds a lot like Performance Routing (PfR), doesn’t it? So what value would IRP add that PfR does not? I see mainly two benefits here. PfR may require a more senior network administrator to set up and administer, although PfR has been greatly simplified in later releases. The other main factor is the reporting through the frontend. PfR does not give you the monitoring platform, which is not to be expected of course.

When you log in to the IRP you get a dashboard showing the status of the system, the number of prefixes being probed and how many of those prefixes are being improved.


In the demo, there are two service providers called “SwiftWay” and “FiberRing”. There is a graph to show how many prefixes have been rerouted to one of the providers.


There is also a list that shows you which prefixes were moved, their AS number and the reason for the move. If you do a mouseover on the flash symbol, it will show if the improvement was due to loss or latency.


There are a lot of different reports that can be generated. A nice feature is that all reports are exportable to CSV, XLS or PDF.


This report shows how loss has been improved: 75% of loss was totally avoided and 25% of loss was reduced.


There are also graphs showing top usage of traffic by AS or, as in this case, the bandwidth used per provider.


The monitoring and reports are extensive and easy to use. The IRP is certainly an interesting platform and, depending on the business case, it could be very useful. The main considerations would be: How sensitive are you to loss and latency? How much does it cost you if you are not choosing the optimal path? Do you trust a system to make these decisions for you? If you do, then certainly take a look at the Noction IRP.


Cisco Reveals New Products – The Time of Multigigabit is Here

January 20, 2015 2 comments

Wireless networks are becoming faster and faster. With 802.11ac Wave 2, wireless networks will be capable of achieving speeds up to 6.8 Gbps. This creates challenges when connecting APs to switches which normally run Ethernet at 1GE or 10GE. To meet these evolving demands, Cisco has as of today revealed some new products.

Cisco is releasing a new compact switch supporting multigigabit technology, the Cisco Catalyst 3560-CX. The most compelling new features are support for multigigabit interfaces, more power available for PoE, support for 10GE on the uplinks and the ability to be deployed as an Instant Access switch. It also supports PoE pass-through, which can help save on long cable runs. The Catalyst 3560-CX supports two multigigabit interfaces.


This device is fanless, so it can be deployed in cubicles to decrease the need for a wiring closet. It also has the support for role based security. Cisco’s goal is to provide for a better working environment, which they call “Next Generation Workspace”.


If you are a technical person, you are probably wondering about the multigigabit ports. IEEE only has 1GE, 10GE and so on. Cisco started the NBASE-T Alliance with Aquantia, Freescale, and Xilinx. Other members have joined since. They are also working with the IEEE to make this multigigabit Ethernet technology a standard.

With 802.11ac Wave 2 comes the possibility for having multiple conversations at the same time. Basically taking wireless technology from being a hub to a switch.


This then creates challenges with Cat5e cables being limited to 1 Gbps and the support for PoE on multigigabit interfaces.

The new rates for multigigabit ports will be 2.5 Gbps and 5 Gbps, and PoE is also supported on these ports.


There is also a new line card for the Catalyst 4500E with 48 ports where 12 of the ports are multigigabit capable. Then there’s also a new version of the Catalyst 3850 in either 24 or 48 port models where half of the ports support multigigabit, so either 12 or 24 ports will be multigigabit capable. The Catalyst 3850 will also support 40GE uplinks which is another nice addition.


The new Catalyst 3850 is compatible with the older model so you can stack them together if you want to.


To support the increase of traffic in the backbone, there is a new line card for the Catalyst 6800 and Catalyst 6500-E switches which supports 32 ports of 10GE. There is also the possibility of converting ports to 40GE.

Cisco has also increased the scale of Instant Access from around 1000 ports to 2000 ports. The scalability was a bit limited earlier for larger networks, so this is a welcome increase.


Wired and wireless networks are converging. To support this there is a need for interfaces capable of more than 1GE. Cisco is now preparing for the next wave of 802.11ac and more of their products are getting support for Instant Access. This will speed up the convergence of wired and wireless networks and make it easier for network administrators to manage their network. Follow this link to find out more on multigigabit.

QoS Design Notes for CCDE

January 17, 2015 14 comments

Trying to get my CCDE studies going again. I’ve finished the End to End QoS Design book (relevant parts) and here are my notes on QoS design.

Basic QoS

Different applications require different treatment, the most important parameters are:

  • Delay: The time it takes a packet to travel from the sending endpoint to the receiving endpoint
  • Jitter: The variation in end to end delay between sequential packets
  • Packet loss: The share of sent packets that are never received, expressed as a percentage
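To make the three parameters concrete, here is a small Python sketch that computes them from measurements. The function names and the sample values are made up for illustration, and jitter is computed here as the mean absolute difference between consecutive delays, which is one of several common definitions.

```python
def mean_delay(delays_ms):
    """Average one-way delay in milliseconds."""
    return sum(delays_ms) / len(delays_ms)


def mean_jitter(delays_ms):
    """Mean absolute delay variation between sequential packets."""
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return sum(diffs) / len(diffs)


def loss_percent(sent, received):
    """Packets that never arrived, as a percentage of packets sent."""
    return 100.0 * (sent - received) / sent


delays = [20.0, 22.0, 21.0, 25.0]   # hypothetical one-way delays in ms
print(mean_delay(delays))           # 22.0
print(round(mean_jitter(delays), 2))
print(loss_percent(1000, 990))      # 1.0
```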

Characteristics of voice traffic:

  • Smooth
  • Benign
  • Drop sensitive
  • Delay sensitive
  • UDP priority

One-way requirements for voice:

  • Latency ≤ 150 ms
  • Jitter ≤ 30 ms
  • Loss ≤ 1%
  • Bandwidth (30-128Kbps)

Characteristics for video traffic:

  • Bursty
  • Greedy
  • Drop sensitive
  • Delay sensitive
  • UDP priority

One-way requirements for video:

  • Latency ≤ 200-400 ms
  • Jitter ≤ 30-50 ms
  • Loss ≤ 0.1-1%
  • Bandwidth (384Kbps-20+ Mbps)

Characteristics for data traffic:

  • Smooth/bursty
  • Benign/greedy
  • Drop insensitive
  • Delay insensitive
  • TCP retransmits

Quality of Service (QoS) – Managed unfairness, measured numerically in latency, jitter and packet loss

Quality of Experience (QoE) – End user perception of network performance, subjective and can’t be measured


Classification and marking tools: Sessions, or flows, are analyzed to determine what class the packets belong to and what treatment they should receive. Packets are marked so that analysis happens a limited number of times, usually at ingress as close to the source as possible. Reclassification and remarking are common as the packets traverse the network.

Policing, shaping and markdown tools: Different classes of traffic are allotted portions of the network resources. Traffic may be selectively dropped, delayed or remarked to avoid congestion when it exceeds the available network resources. Traffic can be dropped (policing), slowed down (shaping) or remarked (markdown) to conform.

Congestion management or scheduling tools: When there is more traffic than available network resources, it will be queued. Traffic classes that don’t react well to queueing can be denied access by a scheduling tool to avoid lowering the quality of the existing flows.

Link-specific tools: Link fragmentation and interleaving fits into this category.

Packet Header

An IPv4 packet has an 8-bit Type of Service (ToS) field and an IPv6 packet has an 8-bit Traffic Class field. The first three bits are the IP Precedence (IPP) bits, for a total of 8 classes. The first three bits in combination with the next three are known as DSCP, for a total of 64 classes.
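The bit layout can be illustrated with a few shifts in Python. The function names are made up for this sketch. The remaining two low-order bits of the byte are the ECN bits defined in RFC 3168.

```python
def ipp_from_tos(tos):
    """IP Precedence: the three most significant bits of the ToS byte."""
    return (tos >> 5) & 0x07


def dscp_from_tos(tos):
    """DSCP: the six most significant bits of the ToS / Traffic Class byte."""
    return (tos >> 2) & 0x3F


def ecn_from_tos(tos):
    """ECN: the two least significant bits."""
    return tos & 0x03


tos = 0xB8  # the byte value carried by a packet marked EF
print(dscp_from_tos(tos))  # 46
print(ipp_from_tos(tos))   # 5
print(ecn_from_tos(tos))   # 0
```

Because IPP is simply the top three bits of the DSCP, a device that only understands IP Precedence still sees EF traffic as precedence 5.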

At layer two the most common marking is 802.1p Class of Service (CoS) or MPLS EXP bits, each using three bits for a total of 8 classes.

QoS Deployment Principles

  1. Define business/organizational objectives of QoS deployment. This may include provisioning real-time services for voice/video traffic or guaranteeing bandwidth for critical business applications and also managing scavenger traffic. Seek executive endorsement of the business objectives to not derail the process later on.
  2. Based on the business objectives, determine how many classes of traffic are needed. Define an end-to-end strategy for how to identify the traffic and treat it across the network.
  3. Analyze the requirements of each application class so that the proper QoS tools can be deployed to meet these requirements.
  4. Design platform-specific QoS policies to meet the requirements with consideration for appropriate Place In the Network (PIN).
  5. Test the QoS designs in a controlled environment.
  6. Begin deployment with a closely monitored and evaluated pilot rollout.
  7. The tested and pilot proven QoS designs can be deployed to the production network in phases during scheduled downtime.
  8. Monitor service levels to make sure that the QoS objectives are being met.

A common mistake is to make it a technical process only and not research the business objectives and requirements.

QoS Feature Sequencing

Classification: The identification of each traffic stream.

Pre-queuing: Admission decisions, and dropping and marking the packet, are best applied before the packet enters a queue for egress scheduling and transmission.

Queueing: Scheduling the order of packets before transmission.

Post-queueing: Usually optional, sometimes needed to apply actions that are dependent on the transmission order of packets, such as sequence numbering (e.g. compression and encryption), which isn’t known until the QoS scheduling function dequeues the packets based on the priority rules.

Security and QoS

Trust Boundaries

A trust boundary is a network location where packet markings are not accepted and may be rewritten. Trust domains are network locations where packet markings are accepted and acted on.

Network Attacks

QoS tools can mitigate the effects of worms and DoS attacks to keep critical applications available during an attack.

Recommendations and Guidelines

  • Classify and mark traffic as close to the source as technically and administratively feasible
  • Classification and marking can be done on ingress or egress but queuing and shaping are usually done on egress
  • Use an end-to-end Diffserv PHB model for packet marking
  • Less granular fields such as CoS and MPLS EXP should be mapped to DSCP as close to the traffic source as possible
  • Set a trust boundary and mark or remark traffic that comes in beyond the boundary
  • Follow standards based Diffserv PHB markings if possible to ensure interoperability with SP networks, enterprise networks or merging networks together
  • Use set dscp and set precedence to mark all IP traffic; set ip dscp and set ip precedence only mark IPv4 packets
  • When using tunnel interfaces, think of feature sequencing to make sure that the inner or outer packet headers (or both) are marked as intended

Policing and Shaping Tools

Policer: Checks for traffic violations against a configured rate. Does not delay packets, takes immediate action to drop or remark packet if exceeding rate.

Shaper: Traffic smoothing tool with the objective to buffer packets instead of dropping them, smoothing out any peaks of traffic arrival to not exceed configured rate.

Characteristics of a policer:

  • Causes TCP resends when traffic is dropped
  • Inflexible and inadaptable; makes instantaneous packet drop decisions
  • An ingress or egress interface tool
  • Does not add any delay or jitter to packets
  • Rate limiting without buffering

Characteristics of a shaper:

  • Typically delays rather than drops exceeding traffic, causes fewer TCP resends
  • Adapts to congestion by buffering exceeding traffic
  • Typically an egress interface tool
  • Adds delay and jitter if rate exceeds the shaper
  • Rate limiting with buffering
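The difference between the two tools can be sketched in Python. This is a simplified model, not how any particular platform implements it: `police` is a single-rate, two-color token bucket, and `shape` is a plain serializer with an unbounded buffer and no burst allowance. Both function names and the traffic values are assumptions.

```python
def police(arrivals, rate_bps, burst_bytes):
    """Single-rate two-color policer.

    arrivals is a list of (time_s, size_bytes). Returns "transmit" or "drop"
    per packet; nothing is ever delayed.
    """
    tokens = burst_bytes
    last = 0.0
    actions = []
    for t, size in arrivals:
        # Refill tokens for the elapsed time, capped at the burst size.
        tokens = min(burst_bytes, tokens + (t - last) * rate_bps / 8)
        last = t
        if size <= tokens:
            tokens -= size
            actions.append("transmit")
        else:
            actions.append("drop")
    return actions


def shape(arrivals, rate_bps):
    """Shaper: returns the departure time of each packet; nothing is dropped."""
    free_at = 0.0
    departures = []
    for t, size in arrivals:
        start = max(t, free_at)              # wait behind buffered packets
        free_at = start + size * 8 / rate_bps
        departures.append(free_at)
    return departures


burst = [(0.0, 1500), (0.0, 1500), (0.0, 1500)]  # three packets at once
print(police(burst, rate_bps=1_000_000, burst_bytes=3000))
print(shape(burst, rate_bps=1_000_000))
```

With these numbers the policer transmits the first two packets and drops the third, while the shaper sends all three but spreads them roughly 12 ms apart, which is the added delay and jitter mentioned in the list above.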

Placing Policers and Shapers in the Network

Policers make instantaneous decisions and should be deployed ingress, don’t transport packets if they are going to be dropped anyway. Policers can also be placed on egress to limit a traffic class at the edge of the network.

Shapers are often deployed as egress tools, commonly on enterprise to SP links to not exceed the committed rate of the SP.

Tail Drop and Random Drop

Tail drop means dropping the packet that is at the end of a queue. The TX ring is always FIFO; if a voice packet is trying to get into the TX ring but it’s full, it will get dropped because it’s at the tail of the queue. Random drop via Random Early Detection (RED) or Weighted Random Early Detection (WRED) tries to keep the queues from becoming full by dropping packets from traffic classes early, causing TCP to slow down.

Recommendations and Guidelines

  • Police as close to the source as possible, preferably on ingress.
  • Single rate three color policer handles bursts better than single rate two color policer resulting in fewer TCP retransmissions
  • Use a shaper on interfaces where there is a speed mismatch, such as buying a lower rate than physical speed or between a remote-end access link and the aggregated head-end link
  • When shaping on an interface carrying real-time traffic, set the Tc value to 10 ms
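The Tc recommendation translates into a committed burst (Bc) through the relationship Tc = Bc / CIR. A short Python sketch, with a hypothetical function name and a 10 Mbps shaped rate chosen purely as an example:

```python
def committed_burst_bits(cir_bps, tc_ms):
    """Bc in bits for a given shaped rate (CIR) and interval (Tc)."""
    return cir_bps * tc_ms / 1000


bc = committed_burst_bits(10_000_000, 10)
print(bc)      # 100000.0 bits per 10 ms interval
print(bc / 8)  # 12500.0 bytes per interval
```

A smaller Tc means the shaper releases traffic in smaller, more frequent bursts, so a voice packet waits less time for the next interval.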

Scheduling Algorithms

Strict priority: Lower priority queues are only served when higher priority queues are empty. Can potentially starve traffic in lower priority queues.

Round robin: Queues are served in a set sequence. Does not starve traffic but can add unpredictable delays for real-time, delay sensitive traffic.

Weighted fair: Packets in the queue are weighted, usually by IP precedence so that some queues get served more often than others. Does not provide bandwidth guarantee, the bandwidth per flow varies based on number of flows and the weight of each flow.
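The service order produced by these algorithms can be sketched in Python. The functions below are illustrative assumptions: `weighted_round_robin` is a simple per-queue weighted variant used as a stand-in, not an implementation of flow-based weighted fair queuing.

```python
from collections import deque


def strict_priority(queues):
    """queues are ordered highest priority first; lower ones wait until higher are empty."""
    order = []
    while any(queues):
        for q in queues:
            if q:
                order.append(q.popleft())
                break  # always restart from the highest priority queue
    return order


def weighted_round_robin(queues, weights):
    """Serve up to weights[i] packets from queue i in each round."""
    order = []
    while any(queues):
        for q, w in zip(queues, weights):
            for _ in range(w):
                if q:
                    order.append(q.popleft())
    return order


print(strict_priority([deque(["v1", "v2"]), deque(["d1", "d2"])]))
# ['v1', 'v2', 'd1', 'd2']
print(weighted_round_robin(
    [deque(["a1", "a2", "a3"]), deque(["b1", "b2", "b3"])], weights=[2, 1]))
# ['a1', 'a2', 'b1', 'a3', 'b2', 'b3']
```

In the strict priority case the data packets only move once the voice queue is empty, which is the starvation risk described above.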

WRED is a congestion avoidance tool and manages the tail of the queue. The goal is to avoid TCP synchronization where all TCP flows speed up and slow down at the same time, which leads to poor utilization of the link. WRED has little or no effect on UDP flows. WRED can be used to set the RFC 3168 IP ECN bits to indicate that it is experiencing congestion.
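The classic RED drop curve gives a feel for how this works. The sketch below is a simplified model under stated assumptions: the parameter names (`min_th`, `max_th`, `max_p`) and the example thresholds are not from the notes above, and real implementations work on a weighted moving average of the queue depth.

```python
def wred_drop_probability(avg_depth, min_th, max_th, max_p):
    """Drop probability for a packet given the average queue depth.

    Below min_th nothing is dropped, at or above max_th everything is
    dropped (tail drop), and in between the probability rises linearly
    up to max_p.
    """
    if avg_depth < min_th:
        return 0.0
    if avg_depth >= max_th:
        return 1.0
    return max_p * (avg_depth - min_th) / (max_th - min_th)


# Hypothetical profile: thresholds in packets, 10% maximum drop probability.
for depth in (10, 30, 40):
    print(depth, wred_drop_probability(depth, min_th=20, max_th=40, max_p=0.1))
```

The "weighted" part comes from giving each DSCP or IP Precedence value its own thresholds, so that less important traffic starts being dropped earlier.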

Recommendations and Guidelines

  • Critical applications like VoIP require service guarantees regardless of network conditions. This requires enabling queueing on all nodes with a potential for congestion.
  • A large number of applications end up in the default class, reserve 25% for this default Best Effort class
  • For a link carrying a mix of voice, video and data traffic, limit the priority queue to 33% of the link bandwidth
  • Enable LLQ if real-time, latency sensitive traffic is present
  • Use WRED for congestion avoidance on TCP flows but evaluate whether it has any effect on UDP flows
  • Use DSCP-based WRED wherever possible

Bandwidth Reservation Tools

Measurement based: Counting mechanism to only allow a limited number of calls (sessions). Normally statically configured by an administrator.

Resource based: Based on the availability of resources in the network, usually bandwidth. Uses the current status of the network to base its decision.

Resource Reservation Protocol (RSVP) is a resource based protocol, commonly used with MPLS-TE. The drawback of RSVP is that it requires a lot of state in the devices.

Admission control (AC) functionality is most effectively deployed at the application level, such as with Cisco Unified Communications Manager (CUCM). It works well in networks with limited complexity and where flows are of predictable bandwidth.

RSVP can be used in combination with Diffserv in an Intserv/Diffserv model where RSVP is only responsible for admission control and Diffserv for the queuing.

An RSVP proxy can be used because end devices such as phones and video endpoints usually don’t support the RSVP stack. A router closest to the endpoint is then used as a proxy together with CUCM to act as an AC mechanism.

Recommendations and Guidelines

Cisco recommends using RSVP Intserv/Diffserv model with a router-based proxy device. This allows for scaling of policies together with a dynamic network aware AC.

IPv6 and QoS

IPv6 headers are larger in size so bandwidth consumption for small packet sizes is higher. IPv4 header is normally 20 bytes but IPv6 is 40 bytes. IPv6 has a 20-bit Flow Label field and 8-bit Traffic Class field.
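A quick calculation shows why this matters mostly for small packets. The function name and the payload sizes below are assumptions picked for illustration; the 40-byte payload stands for a small real-time packet and the 1460-byte payload for a large data packet.

```python
IPV4_HEADER = 20  # bytes, without options
IPV6_HEADER = 40  # bytes, fixed header


def header_share(payload_bytes, ip_header_bytes):
    """Fraction of the IP packet that is taken up by the IP header."""
    return ip_header_bytes / (payload_bytes + ip_header_bytes)


for payload in (40, 1460):
    v4 = header_share(payload, IPV4_HEADER)
    v6 = header_share(payload, IPV6_HEADER)
    print(f"{payload} bytes of payload: IPv4 {v4:.1%}, IPv6 {v6:.1%}")
```

For the small packet the header share grows from a third to half of the packet, while for the large packet the difference is barely noticeable.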


Modern applications can be difficult to classify and can consist of multiple types of traffic. Webex provides text, audio, instant messaging, application sharing and desktop video conferencing through the same application. NBAR2 can be used to identify applications.

Application Visibility Control (AVC)

Consists of NBAR2, Flexible Netflow (FNF) and MQC. NBAR2 is used to identify traffic through Deep Packet Inspection (DPI), FNF reports on usage and MQC is used for the configuration.

FNF uses Netflow v9 and IPFIX to export flow record information. It can monitor L2 to L7 and identify apps by port and through NBAR2. When using NBAR2, CPU usage may increase significantly as well as memory usage. This is also true for FNF. Consider the performance impact before deploying it.

QoS Requirements and Recommendations by Application Class

Voice requirements:

  • One-way latency should be no more than 150 ms
  • One-way peak-to-peak jitter should be no more than 30 ms
  • Per-hop peak-to-peak jitter should be no more than 10 ms
  • Packet loss should be no more than 1%
  • A range of 20 – 320 Kbps of guaranteed priority bandwidth per call (depends on sampling rate, codec and L2 overhead)

Voice recommendations:

  • Mark to Expedited Forwarding (EF) / DSCP 46
  • Treat with EF PHB (priority queuing)
  • Voice should be admission controlled

May use jitter buffers to reduce the effects of jitter, however it does add delay. Voice packets are constant in size which means bandwidth can be provisioned accurately. Don’t forget to account for L2 overhead.
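Because the packets are constant in size, the per-call bandwidth is simple arithmetic. The sketch below is an illustration with assumed inputs that are not part of the notes above: a codec sending 160 bytes of payload 50 times per second (typical for G.711 with 20 ms packetization), 40 bytes of IPv4/UDP/RTP headers and 18 bytes of Ethernet header and trailer.

```python
IP_UDP_RTP_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP headers


def voice_bandwidth_kbps(payload_bytes, packets_per_second, l2_overhead_bytes=0):
    """Bandwidth of one direction of a call in Kbps."""
    frame_bytes = payload_bytes + IP_UDP_RTP_BYTES + l2_overhead_bytes
    return frame_bytes * 8 * packets_per_second / 1000


print(voice_bandwidth_kbps(160, 50))                        # 80.0 at layer 3
print(voice_bandwidth_kbps(160, 50, l2_overhead_bytes=18))  # about 87.2 on Ethernet
print(voice_bandwidth_kbps(20, 50))                         # 24.0 for a 20-byte payload
```

The jump from 80 to roughly 87 Kbps is the L2 overhead, and it is proportionally much larger for low bit rate codecs with small payloads.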

Broadcast video requirements:

  • Packet loss should be no more than 0.1%

Broadcast video recommendations:

  • Mark to CS5 / DSCP 40
  • May be treated with EF PHB (priority queuing)
  • Should be admission controlled

Flows are usually unidirectional and include application level buffering. Does not have strict jitter or latency requirements.

Real-time interactive video requirements:

  • One-way latency should be no more than 200 ms
  • One-way peak-to-peak jitter should be no more than 50 ms
  • Per-hop peak-to-peak jitter should be no more than 10 ms
  • Packet loss should be no more than 0.1%
  • Provisioned bandwidth depends on codec, resolution, frame rates, additional data components and network overhead

Real-time interactive video recommendations:

  • Should be marked with CS4 / DSCP 32
  • May be treated with an EF PHB (priority queuing)
  • Should be admission controlled

Multimedia conferencing requirements:

  • One-way latency should be no more than 200 ms
  • Packet loss should be no more than 1%

Multimedia conferencing recommendations:

  • Mark to AF4 class (AF41/AF42/AF43 or DSCP 34/36/38)
  • Treat with AF PHB with guaranteed bandwidth and DSCP-based WRED
  • Should be admission controlled

Multimedia streaming requirements:

  • One-way latency should be no more than 400 ms
  • Packet loss should be no more than 1%

Multimedia streaming recommendations:

  • Should be marked to AF3 class (AF31/AF32/AF33 or DSCP 26/28/30)
  • Treat with AF PHB with guaranteed bandwidth and DSCP-based WRED
  • May be admission controlled

Data applications can be divided into Transactional Data (low latency) or Bulk Data (high throughput)

Transactional data recommendations:

  • Should be marked to AF2 class (AF21/AF22/AF23 or DSCP 18/20/22)
  • Treat with AF PHB with guaranteed bandwidth and DSCP-based WRED

This class may be subject to policing and remarking. Applications in this class can be Enterprise Resource Planning (ERP) or Customer Relationship Management (CRM).

Bulk data recommendations:

  • Should be marked to AF1 class (AF11/AF12/AF13 or DSCP 10/12/14)
  • Treat with AF PHB with guaranteed bandwidth and DSCP-based WRED
  • Deployed in a moderately provisioned queue to provide a degree of bandwidth constraint during congestion, to prevent long TCP sessions from dominating network bandwidth

Example applications are e-mail, backup operations, FTP/SFTP transfers, video and content distribution.

Best effort data recommendations:

  • Mark to DF (DSCP 0)
  • Provision in dedicated queue
  • May be provisioned with guaranteed bandwidth allocation and WRED/RED

Scavenger traffic recommendations:

  • Should be marked to CS1 (DSCP 8)
  • Should be assigned a minimally provisioned queue

Example traffic is YouTube, Xbox Live/360 movies, iTunes, BitTorrent.

Control plane traffic can be divided into Network Control, Signaling and Operations/Administration/Management (OAM).

Network Control recommendations:

  • Should be marked to CS6 (DSCP 48)
  • May be assigned a moderately provisioned guaranteed bandwidth queue

Do not enable WRED. Example traffic is EIGRP, OSPF, BGP, HSRP and IKE.

Signaling traffic recommendations:

  • Should be marked to CS3 (DSCP 24)
  • May be assigned a moderately provisioned guaranteed bandwidth queue

Do not enable WRED. Example traffic is SCCP, SIP and H.323.

OAM traffic recommendations:

  • Should be marked to CS2 (DSCP 16)
  • May be assigned a moderately provisioned guaranteed bandwidth queue

Do not enable WRED. Example traffic is SSH, SNMP, Syslog, HTTP/HTTPS.
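The decimal DSCP values listed for the application classes follow a simple pattern: a Class Selector CSx is 8x, and an Assured Forwarding value AFxy is 8x + 2y. A small Python sketch (the helper names and the dictionary are made up for illustration) reproduces the numbers used above:

```python
def cs(cls):
    """Decimal DSCP for Class Selector CS<cls>."""
    return 8 * cls


def af(cls, drop):
    """Decimal DSCP for Assured Forwarding AF<cls><drop>."""
    return 8 * cls + 2 * drop


EF = 46

MARKINGS = {
    "voice": EF,
    "broadcast video": cs(5),
    "real-time interactive": cs(4),
    "multimedia conferencing": af(4, 1),
    "multimedia streaming": af(3, 1),
    "transactional data": af(2, 1),
    "bulk data": af(1, 1),
    "best effort": 0,
    "scavenger": cs(1),
    "network control": cs(6),
    "signaling": cs(3),
    "oam": cs(2),
}

for name, dscp in MARKINGS.items():
    print(f"{name}: DSCP {dscp}")
```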

QoS Design Recommendations:

  • Always enable QoS in hardware as opposed to software if possible
  • Classify and mark as close to the source as possible
  • Use DSCP markings where available
  • Follow standards based DSCP PHB markings
  • Police flows as close to source as possible
  • Mark down traffic according to standards based rules if possible
  • Enable queuing at every node that has potential for congestion
  • Limit LLQ to 33% of link capacity
  • Use AC mechanism for LLQ
  • Do not enable WRED for LLQ
  • Provision at least 25% for Best Effort traffic
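The two percentage rules are easy to check mechanically. The function below is a hypothetical helper, not part of any tool, and the class names and percentages in the example are assumptions.

```python
def check_policy(priority_pct, best_effort_pct, other_pct):
    """Return a list of problems with a bandwidth allocation (empty if none).

    other_pct maps the remaining class names to their percentages.
    """
    problems = []
    if priority_pct > 33:
        problems.append("priority queue (LLQ) exceeds 33% of link capacity")
    if best_effort_pct < 25:
        problems.append("Best Effort has less than 25%")
    if priority_pct + best_effort_pct + sum(other_pct.values()) > 100:
        problems.append("allocations add up to more than 100%")
    return problems


print(check_policy(33, 25, {"signaling": 5, "transactional": 37}))  # []
print(check_policy(40, 10, {"signaling": 5}))
```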

QoS Models:

Four-Class Model:

  • Voice
  • Control
  • Transactional Data
  • Best Effort

Eight-Class Model:

  • Voice
  • Multimedia-conferencing
  • Multimedia-streaming
  • Network Control
  • Signaling
  • Transactional Data
  • Best Effort
  • Scavenger

Twelve-Class Model:

  • Voice
  • Broadcast Video
  • Real-time interactive
  • Multimedia-conferencing
  • Multimedia-streaming
  • Network Control
  • Signaling
  • Management/OAM
  • Transactional Data
  • Bulk Data
  • Best Effort
  • Scavenger

This picture shows how different size models can be expanded or vice versa.

QoS models

Campus QoS Design Considerations and Recommendations:

The primary role of QoS in campus networks is not to control latency or jitter, but to manage packet loss. Endpoints normally connect to the campus at high speeds, so it may take only a few milliseconds of congestion to overrun the buffers of switches/linecards/routers.

Trust Boundaries:

Conditionally trusted endpoints: Cisco IP phones, Cisco Telepresence, Cisco IP video surveillance cameras, Cisco digital media players.

Trusted endpoints: Centrally administered PCs and endpoints, IP video conferencing units, managed APs, gateways and other similar devices.

Untrusted endpoints: Unsecure PCs, printers and similar devices.

Port-Based QoS versus VLAN-based QoS versus Per-Port/Per-VLAN QoS

Design recommendations:

  • Use port-based QoS when simplicity and modularity are the key design drivers
  • Use VLAN-based QoS when looking to scale policies for classification, trust and marking
  • Do not use VLAN-based QoS to scale (aggregate) policing policies
  • Use per-port/per-VLAN when supported and policy granularity is the key design driver

EtherChannel QoS

  • Load balance based on source and destination IP or what is expected to give the best distribution of traffic
  • Be aware that multiple real-time flows may end up on the same physical link and oversubscribe the real-time queue

EtherChannel QoS will vary by platform and some policies are applied to the bundle and some to the physical interface.
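Why several flows can end up on the same member link is easier to see with a toy hash. The function below is purely hypothetical (an XOR of source and destination address, modulo the number of links); real platforms use their own hash algorithms, and the addresses are examples.

```python
import ipaddress


def member_link(src, dst, n_links):
    """Pick a member link for a flow from its source and destination IP."""
    s = int(ipaddress.ip_address(src))
    d = int(ipaddress.ip_address(dst))
    return (s ^ d) % n_links


flows = [("10.0.0.1", "10.0.1.10"), ("10.0.0.3", "10.0.1.8")]
for src, dst in flows:
    print(src, dst, "-> link", member_link(src, dst, n_links=2))
```

Both example flows hash to the same link, so if each of them is a real-time stream they compete for the same priority queue while the other link’s queue sits idle.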

Ingress QoS Models:

Design recommendations:

  • Deploy ingress QoS models such as trust, classification and policing on all access edge ports
  • Deploy ingress queuing (if supported and required)

The probability for congestion on ingress is less than on egress.

Egress QoS Models:

Design recommendations:

  • Deploy egress queuing policies on all switch ports
  • Use a queuing structure of at least 1 priority queue and 3 normal queues (1P3Q)

Enable trust on ports leading to network infrastructure and similar devices.

Trusted Endpoint:

  • Trust DSCP
  • Optional ingress marking and/or policing
  • Minimum 1P3Q

Untrusted Endpoint:

  • No trust
  • Optional ingress marking and/or policing
  • Minimum 1P3Q

Conditionally Trusted Endpoint:

  • Conditional trust with trust CoS
  • Optional ingress marking and/or policing
  • Minimum 1P3Q

Switch to Switch/Router Port QoS:

  • Trust DSCP
  • Minimum 1P3Q

Control Plane Policing

Can be used to harden the network infrastructure. Packets handled by main CPU typically include the following:

  • Routing protocols
  • Packets destined to the local IP of the router
  • Packets from management protocols such as SNMP
  • Interactive access protocols such as Telnet and SSH
  • ICMP or packets with IP options may have to be handled by CPU
  • Layer two packets such as BPDUs, CDP, DTP and so on

Wireless QoS

The IEEE 802.11e amendment, ratified in 2005 and incorporated into IEEE 802.11-2007, added QoS enhancements to the 802.11 standard. These were carried forward in IEEE 802.11-2012. The Wi-Fi Alliance has a compatibility standard called Wi-Fi Multimedia (WMM).

In Wi-Fi networks only one station may transmit at a time, a physical constraint that is not present on wired networks. The Radio Frequency (RF) is shared between devices. This is similar to a hub environment. Wireless networks operate at variable speeds.

Distributed Coordination Function (DCF) is responsible for scheduling and transmitting frames onto the wireless medium.

Wireless uses Carrier Sense Multiple Access/Collision Avoidance (CSMA/CA). It actively tries to avoid collisions. A wireless client has a random period where it may send traffic to try to avoid collisions.

DCF evolved to Enhanced Distributed Channel Access (EDCA) which is a MAC layer protocol. It has the following additions compared to DCF:

  • Four priority queues, or access categories
  • Different interframe spacing for each AC as compared to a single fixed value for all traffic
  • Different contention window for each AC
  • Transmission Opportunity (TXOP)
  • Call admission control (TSpec)

An 802.11e frame uses a 3-bit field known as User Priority (UP) for traffic marking. It is analogous to 802.1p CoS. One difference is that voice is marked with UP 6 as compared to CoS 5.

Interframe spacing is a time the client needs to wait before starting to send traffic, the wait time is lower for higher priority traffic.

The contention window is used when the wireless media is not free, higher priority traffic waits a shorter period of time before trying to send again than lower priority data.

TXOP is a bounded period of time during which the client is allowed to send, so that it cannot hog the media for a long period of time.

TSpec is used for admission control. The client sends its requirements, such as data rate and frame size, to the AP and the AP only admits it if there is available bandwidth.
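The relationship between User Priority, access categories and waiting times can be sketched in Python. The UP to access category mapping is the one defined by EDCA. The AIFSN and minimum contention window values, the 16 µs SIFS and the 9 µs slot time are the commonly cited defaults for an OFDM PHY and are assumptions here, since an AP can advertise different values.

```python
UP_TO_AC = {
    1: "AC_BK", 2: "AC_BK",  # background
    0: "AC_BE", 3: "AC_BE",  # best effort
    4: "AC_VI", 5: "AC_VI",  # video
    6: "AC_VO", 7: "AC_VO",  # voice
}

# Assumed default EDCA parameters per access category.
AIFSN = {"AC_VO": 2, "AC_VI": 2, "AC_BE": 3, "AC_BK": 7}
CW_MIN = {"AC_VO": 3, "AC_VI": 7, "AC_BE": 15, "AC_BK": 15}

SIFS_US = 16  # assumed short interframe space in microseconds
SLOT_US = 9   # assumed slot time in microseconds


def aifs_us(ac):
    """Interframe spacing an access category waits before contending."""
    return SIFS_US + AIFSN[ac] * SLOT_US


for up in (6, 5, 0, 1):
    ac = UP_TO_AC[up]
    print(f"UP {up} -> {ac}: AIFS {aifs_us(ac)} us, CWmin {CW_MIN[ac]} slots")
```

With these defaults voice and video share the same interframe spacing and are separated by the smaller contention window for voice, while best effort and background wait noticeably longer before they may contend for the medium.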

Upstream QoS is packets from the wireless network onto the wired network. Downstream QoS is packets from the wired network onto the wireless network.

Wireless marking may not be consistent with wired markings so mapping may have to be done to map traffic into the correct classes on the wired network.

Upstream QoS:

  1. 802.11e UP marking on upstream frame from client to AP is translated to a DSCP value on the outside of the CAPWAP tunnel. The inner DSCP marking is preserved
  2. After the CAPWAP packet is decapsulated at the WLC, the original IP header’s DSCP value is used to derive the 802.1p CoS value

Downstream QoS:

  1. A frame with 802.1p CoS marking arrives at a WLC wired interface. The DSCP value of the IP packet is used to set the DSCP of the outer CAPWAP header.
  2. The DSCP value of the CAPWAP header is used to set the 802.11e UP value on the wireless frame

The 802.1p CoS value is not used in the above process.
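The two mapping sequences above can be sketched roughly as follows. The UP-to-DSCP table is purely illustrative (controllers ship with their own, often configurable, tables), and deriving CoS from the top three DSCP bits is an assumption about the default behavior.

```python
# Hypothetical UP <-> DSCP table, for illustration only.
UP_TO_DSCP = {0: 0, 1: 8, 2: 10, 3: 18, 4: 26, 5: 34, 6: 46, 7: 56}
DSCP_TO_UP = {dscp: up for up, dscp in UP_TO_DSCP.items()}

def upstream(up, inner_dscp):
    """Client -> AP -> WLC -> wired network."""
    outer_dscp = UP_TO_DSCP[up]   # step 1: UP sets the outer CAPWAP DSCP
    cos = inner_dscp >> 3         # step 2: inner DSCP derives 802.1p CoS (assumed: top 3 bits)
    return {"outer_dscp": outer_dscp, "inner_dscp": inner_dscp, "cos": cos}

def downstream(inner_dscp):
    """Wired network -> WLC -> AP -> client. The incoming CoS is ignored."""
    outer_dscp = inner_dscp                 # step 1: inner DSCP copied to CAPWAP header
    up = DSCP_TO_UP.get(outer_dscp, 0)      # step 2: outer DSCP sets the 802.11e UP
    return {"outer_dscp": outer_dscp, "up": up}

print(upstream(6, 46))   # voice: UP 6 on the air, CoS 5 on the wire
print(downstream(46))
```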

Data Center QoS

Primary goal is to manage packet loss. A few milliseconds of traffic during congestion can cause buffer overruns.

Various data center designs have different QoS needs. These are a few data center architectures:

  • High-Performance Trading (HPT)
  • Big data architectures, including High-Performance Computing (HPC), High-Throughput Computing (HTC) and grid data
  • Virtualized Multiservice Data Center (VMDC)
  • Secure Multitenant Data Center (SMDC)
  • Massively Scalable Data center (MSDC)

High-Performance Trading:

Minimal or no QoS requirements because the goal of the architecture is to introduce as little delay as possible using low latency platforms such as the Nexus.

Big Data (HPC/HTC/Grid) Architectures

Have similar QoS needs as a campus network. The goal is to process large and complex data sets that are too difficult to handle by traditional data processing applications.

High-Performance Computing: Uses large amounts of computing power for a short period of time. Often measured in Floating-point Operations Per Second (FLOPS)

High-Throughput Computing: Also uses large amounts of computing power but for a longer period of time. More focused on operations per month or year.

Grid: A federation of computer resources from multiple locations to reach a common goal. A distributed system with noninteractive workloads that involve a large number of files. Compared to HPC, Grid is usually more heterogeneous, loosely coupled and geographically dispersed.

Virtualized Multiservice Data Center (VMDC):

VMDC comes with unique requirements due to compute and storage virtualization, including provisioning a lossless Ethernet service.

  • Applications no longer map to physical servers (or cluster of servers)
  • Storage is no longer tied to a physical disk (or array)
  • Network infrastructure is no longer tied to hardware

Lossless compute and storage virtualization protocols such as RoCE and FCoE need to be supported as well as Live Migration/vMotion.

Secure Multitenant Data Center (SMDC):

Virtualization is leveraged to support multitenants over a common infrastructure and this affects the QoS design. SMDC has similar needs as VMDC but a different marking model.

Massively Scalable Data Center:

A framework used to build elastic data centers that host a few applications that are distributed across thousands of servers. Geographically distributed homogeneous pools of compute and storage. The goal is to maximize throughput. It is common to use a leaf and spine design.

Data Center Bridging Toolset

IEEE 802.1 Data Center Bridging Task Group has defined enhancements to Ethernet to support requirements of converged data center networks.

  • Priority flow control (IEEE 802.1Qbb)
  • Enhanced transmission selection (IEEE 802.1Qaz)
  • Congestion notification (IEEE 802.1Qau)
  • DCB exchange (DCBX) (IEEE 802.1Qaz combined with 802.1AB)

Priority Flow Control (802.1Qbb): PFC provides a link level flow control mechanism that can be controlled independently for each 802.1p CoS priority. The goal is to provide zero frame loss due to congestion in DCB networks and to mitigate Head of Line (HoL) blocking. Uses PAUSE frames.

Skid Buffers

Buffer management is critical to PFC. If transmit or receive buffers overflow, transmission will not be lossless. A switch needs sufficient buffers to:

  • Store frames sent during the time it takes to send the PAUSE frame across the network between stations
  • Store frames that are already in transit when the sender receives the PFC PAUSE frame

The buffers used for this are called skid buffers and are usually engineered on a per port basis in hardware on ingress.
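A back-of-the-envelope estimate of the skid buffer needed per no-drop priority might look like the sketch below. It covers the two items listed above: the bits in flight during the cable round trip, plus one maximum-size frame in each direction that was already being serialized. The 5 ns per metre cable delay and the two-frame allowance are assumptions, and real platforms add further margins (for example for PAUSE processing time).

```python
import math

def skid_buffer_bytes(link_bps, cable_m, mtu_bytes=1500, ns_per_m=5, response_ns=0):
    """Rough per-priority headroom estimate; all defaults are assumptions."""
    # Time for the PAUSE to reach the sender plus time for the last
    # frames already on the wire to arrive back at the receiver.
    rtt_ns = 2 * cable_m * ns_per_m + response_ns
    in_flight_bytes = link_bps * rtt_ns / 8e9
    # One frame the sender had already started, and one frame that may
    # have delayed the PAUSE frame on the receiver's own transmit side.
    return math.ceil(in_flight_bytes + 2 * mtu_bytes)

print(skid_buffer_bytes(10_000_000_000, 100))
```

The estimate grows linearly with both link speed and cable length, which is why lossless classes are usually limited to short distances.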

An incast flow is a flow from many senders to one receiver.

Virtual Output Queuing (VOQ)

Artificially induce congestion on ingress ports where there is an incast flow going to a host. This lessens the need for deep buffers on egress. VOQ absorbs congestion at every ingress port and optimizes switch buffering capacity for incast flows. Traffic does not consume fabric bandwidth only to be dropped on the egress port.

Enhanced Transmission Selection – IEEE 802.1Qaz

Uses a virtual lane concept on a DCB enabled NIC, also called Converged Network Adapter (CNA). Each virtual interface queue is accountable for managing its allotted bandwidth for its traffic group. If a group is not using all its bandwidth it may be used by other groups.

ETS virtual interface queues can be serviced as follows:

  • Priority – a virtual lane can be assigned a strict priority service
  • Guaranteed bandwidth – a percentage of the physical link capacity
  • Best effort – the default virtual lane service
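The bandwidth sharing behavior of ETS can be sketched as a work-conserving allocation: each group first gets up to its guaranteed share, and whatever a group leaves unused is handed to the groups that still have traffic, in proportion to their guarantees. This is a simplified model with made-up group names, not the exact scheduler of any NIC or switch, and strict priority lanes are not modeled.

```python
def ets_allocate(link_mbps, guarantees_pct, demand_mbps):
    """Share link capacity among traffic groups.

    guarantees_pct: group -> guaranteed percentage of the link
    demand_mbps:    group -> offered load in Mbps
    """
    alloc = {g: 0.0 for g in guarantees_pct}
    active = {g for g in guarantees_pct if demand_mbps.get(g, 0) > 0}
    remaining = float(link_mbps)
    while active and remaining > 1e-9:
        total_weight = sum(guarantees_pct[g] for g in active)
        if total_weight == 0:
            break
        satisfied, spent = set(), 0.0
        for g in active:
            share = remaining * guarantees_pct[g] / total_weight
            need = demand_mbps[g] - alloc[g]
            give = min(share, need)
            alloc[g] += give
            spent += give
            if need <= share:
                satisfied.add(g)
        remaining -= spent
        if not satisfied:
            break          # every active group is limited by its share
        active -= satisfied
    return alloc

# Hypothetical groups on a 10 Gbps link
print(ets_allocate(10000,
                   {"fcoe": 50, "lan": 30, "ipc": 20},
                   {"fcoe": 2000, "lan": 9000, "ipc": 1000}))
```

In this example the LAN group ends up with 7000 Mbps although it is only guaranteed 30%, because the other two groups leave part of their allotment unused.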

Congestion Notification IEEE 802.1Qau

A layer two traffic management system that pushes congestion to the edge of the network by instructing rate limiters to shape the traffic that is causing congestion. The congestion point, such as a distribution switch connecting to several access switches, can send control frames instructing those switches, called reaction points, to throttle the traffic.

Data Center Bridging Exchange (DCBX) IEEE 802.1Qaz + 802.1AB

DCB capabilities:

  • DCB peer discovery
  • Mismatched configuration detection
  • DCB link configuration of peers

The following DCB parameters can be exchanged by DCBX:

  • PFC
  • ETS
  • Congestion notification
  • Applications
  • Logical link-down
  • Network interface virtualization

DCBX can be used between switches and with some endpoints.

Data Center Transmission Control Protocol (DCTCP)

A goal of the data center is to maximize the goodput which is the application level throughput excluding protocol overhead. Goodput is reduced by TCP flow control and congestion avoidance, specifically TCP slow start.

DCTCP is based on two key concepts:

  • React in proportion to the extent of congestion, not its presence – this reduces variance in sending rates
  • Mark ECN based on instantaneous queue length – this enables fast feedback and corresponding window adjustments to better deal with bursts
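The first concept, reacting in proportion to the extent of congestion, can be sketched with the sender-side update from the published DCTCP algorithm: the sender keeps a running estimate (alpha) of the fraction of ECN-marked packets and cuts its window by alpha/2 instead of always halving it. The function name and the weight g = 1/16 are assumptions for this sketch.

```python
def dctcp_update(alpha, cwnd, marked, total, g=1 / 16):
    """One per-window update of the DCTCP sender state.

    alpha:  running estimate of the fraction of marked packets
    cwnd:   congestion window in segments
    marked: ECN-marked ACKs seen in the last window
    total:  all ACKs seen in the last window
    """
    frac = marked / total if total else 0.0
    alpha = (1 - g) * alpha + g * frac
    if marked:
        # Mild congestion gives a small cut; persistent marking of every
        # packet drives alpha to 1 and the cut to a classic halving.
        cwnd = max(1.0, cwnd * (1 - alpha / 2))
    return alpha, cwnd

print(dctcp_update(0.0, 100.0, marked=1, total=10))    # gentle reduction
print(dctcp_update(1.0, 100.0, marked=10, total=10))   # full halving
```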

Considerations affecting the marking model to be used in the data center include the following:

  • Data center applications and protocols
  • CoS/DSCP marking
  • CoS 3 overlapping considerations
  • Application-based marking models
  • Application- and tenant-based marking models

Data Center Applications and Protocols


  • Consider what applications/protocols are present in the data center and may not already be reflected in the enterprise QoS model and how these may be integrated
  • Consider what applications/protocols may not be present or have a significantly reduced presence in the DC

Compute Virtualization Protocols:

Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE):
Supports direct memory access of one computer into another over converged Ethernet without involving either one’s operating system. Permits high-throughput low-latency networking, especially useful in massively parallel computer clusters. It’s a link layer protocol that allows communication between any two hosts in the same broadcast domain. RoCE requires lossless service via PFC. When implemented along with FCoE, it should be assigned its own no-drop class/virtual lane, such as CoS 4. Other applications such as video using CoS 4 need to be reassigned to improve RoCE performance.

Internet Wide Area RDMA Protocol (iWARP):
Extends the reach of RDMA over IP networks. Does not require lossless service because it runs over TCP or SCTP, which provide reliable transport. It can be marked to an unused CoS/DSCP or combined with internetwork control (CS6/CoS 6) or network control (CS7/CoS 7).

Virtual machine control and live migration protocols (VM control):
Virtual Machines (VMs) require control traffic to be passed between hypervisors. VM control is control plane traffic and should be marked to CoS 6 or CoS 7, depending on QoS model in use.

Live migration:
Protocols that support the process of moving a running VM (or application) between different physical machines without disconnecting the client or the application. Memory, storage and network connections are moved from the original host machine to the destination. A common example is vMotion. It can be argued to be a candidate for internetwork control (CoS 6) since it is a control plane protocol, but it sends too much traffic to be put in that class. Use an available marking or combine with CoS 4, CoS 2 or even CoS 1.

Storage Virtualization Protocols:

Fibre Channel over Ethernet (FCoE):
Encapsulates Fibre Channel (FC) frames over Ethernet networks. It is a layer two protocol that can’t be natively routed. Requires lossless service via PFC and is usually marked with CoS 3, which should be dedicated to FCoE.

Internet Protocol Small Computer System Interface (iSCSI):
Encapsulates SCSI commands within IP to enable data transfers. Can be used to transmit data over LANs, WANs or even the Internet and can enable location independent data storage and retrieval. Does not require lossless service due to using TCP. Can be provisioned in a dedicated class or in another class such as CoS 2 or CoS 1.

CoS/DSCP Marking:


  • Some layer two protocols within the DC require CoS marking
  • CoS marking has limitations so consider a hybrid CoS/DSCP model (when supported)

CoS 3 Overlap Considerations and Tactical Options:


  • Recognize the potential overlap of signaling (and multimedia streaming) markings with FCoE
  • Select a tactical option to address this overlap

Signaling is normally marked with CoS 3, but so is FCoE. Some administrators prefer to dedicate CoS 3 to FCoE, but that leaves the question of what to do with signaling. Options to handle the overlap:

Hardware Isolation:
Some platforms and interface modules do not support FCoE, such as the Nexus 7k M-Series modules, whereas the F-Series modules do. M-Series modules can connect to CUCM and multimedia streaming servers, and F-Series modules to the DCB extended fabric supporting FCoE.

Layer 2 Versus Layer 3 Classification:
Signaling and multimedia streaming can be classified by DSCP values (CS3 and AF3) to be assigned to queues and FCoE can be classified by CoS 3 to its own dedicated queue.

Asymmetrical CoS/DSCP Marking:
Asymmetrical meaning that the three bits forming the CoS do not match the first three bits of the DSCP value. Signaling could be marked with CoS 4 but DSCP CS3.

DC/Campus DSCP Mutation:
Perform ingress and egress DSCP mutation on data center to campus links. Signaling and multimedia streams can be assigned DSCP values that map to CoS 4 (rather than CoS 3).

Coexistence:
Allow signaling and FCoE to coexist in CoS 3. The reasoning is that if the CUCM server has a CNA, then both signaling and FCoE will be provided a lossless service.
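To make the asymmetrical marking option concrete, the small sketch below checks whether a CoS value matches the first three bits of a DSCP value. The helper names are made up for illustration.

```python
CS3 = 24    # DSCP binary 011000
AF31 = 26   # DSCP binary 011010

def symmetric_cos(dscp):
    """CoS obtained by simply copying the first three bits of the DSCP."""
    return dscp >> 3

def is_symmetric(cos, dscp):
    return cos == symmetric_cos(dscp)

print(symmetric_cos(CS3), symmetric_cos(AF31))  # both collide with FCoE in CoS 3
print(is_symmetric(4, CS3))                     # signaling as CoS 4 / CS3 is asymmetrical
```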

Data Center QoS Models:

Trusted Server Model:
Trust L2/L3 markings sent by application servers. Only approved servers should be deployed in the DC.

Untrusted Server Model:
Do not trust markings, reset markings to 0.

Single-Application Server Model:
Same as the untrusted server model but remarked to a non zero value.

Multi-Application Server Model:
Access-lists are used for classification and traffic is marked to multiple codepoints. Application server does not mark traffic at all or it marks it to different values than the enterprise QoS model.

Server Policing Model:
One or more application classes are metered via one-rate or two-rate policers, with conforming, exceeding and optionally violating traffic marked to different DSCP values.
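A one-rate policer with three outcomes can be sketched as a single-rate three-color marker with two token buckets, in the spirit of RFC 2697 (color-blind mode). The class name and the DSCP values chosen for conform/exceed/violate are assumptions for illustration, not a recommendation from the source.

```python
# Hypothetical remarking choice: conform/exceed/violate -> AF31/AF32/AF33
REMARK_DSCP = {"conform": 26, "exceed": 28, "violate": 30}

class SingleRateThreeColorMarker:
    def __init__(self, cir_bps, cbs_bytes, ebs_bytes):
        self.rate = cir_bps / 8          # token fill rate in bytes per second
        self.cbs, self.ebs = cbs_bytes, ebs_bytes
        self.tc, self.te = float(cbs_bytes), float(ebs_bytes)  # buckets start full
        self.last = 0.0

    def _refill(self, now):
        tokens = (now - self.last) * self.rate
        self.last = now
        to_c = min(tokens, self.cbs - self.tc)   # fill the committed bucket first
        self.tc += to_c
        self.te = min(self.ebs, self.te + tokens - to_c)  # overflow goes to excess

    def mark(self, now, size_bytes):
        """Return 'conform', 'exceed' or 'violate' for a packet arriving at `now`."""
        self._refill(now)
        if self.tc >= size_bytes:
            self.tc -= size_bytes
            return "conform"
        if self.te >= size_bytes:
            self.te -= size_bytes
            return "exceed"
        return "violate"

meter = SingleRateThreeColorMarker(cir_bps=8000, cbs_bytes=1500, ebs_bytes=1500)
for t in (0.0, 0.0, 0.0, 1.5):
    result = meter.mark(t, 1500)
    print(t, result, REMARK_DSCP[result])
```

A two-rate policer follows the same idea but fills the second bucket at its own peak rate instead of from the overflow of the first.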

Lossless Transport Model:
Provision lossless service to FCoE.

Trusted Server/Network Interconnect:

  • Trust CoS/DSCP
  • Ingress queuing
  • Egress queuing

Untrusted Server:

  • Set CoS/DSCP to 0
  • Ingress queuing
  • Egress queuing

Single-App Server:

  • Set CoS/DSCP to non zero value
  • Ingress queuing
  • Egress queuing

Multi-App Server:

  • Classify by ACL
  • Set CoS/DSCP values
  • Ingress queuing
  • Egress queuing

Policed Server:

  • Police flows
  • Remark/drop
  • Ingress queuing
  • Egress queuing

Lossless Transport:

  • Enable PFC
  • Enable ETS
  • Enable DCBX
  • Ingress queuing
  • Egress queuing

WAN & Branch QoS Design Considerations & Recommendations:

  • To manage packet loss (and jitter) by queuing policies
  • To enhance classification granularity by leveraging deep packet inspection engines

Packet jitter is most apparent at WAN/branch edge because of downshift in link speeds.

Latency and Jitter:

  • Choose service provider paths to target 150 ms for one-way latency. If this target can’t be met, 200 ms is generally acceptable
  • Only queuing delay is manageable by QoS policies

Network latency consists of:

  • Serialization delay (fixed)
  • Propagation delay (fixed)
  • Queuing delay (variable)

Serialization delay is the time it takes to convert a layer two frame into electrical or optical pulses onto the transmission media. The delay is fixed and a function of the line rate.

Propagation delay is also fixed and a function of the physical distance between endpoints. The gating factor is the speed of light at 300 000 km/s in vacuum, but light in fiber circuits travels roughly a third slower than that. A commonly used planning figure for propagation delay is approximately 6.3 microseconds per km. Propagation delay is what makes up most of the network delay.

Queuing delay is variable and a function of whether a node is congested or not and if scheduling policies have been applied to resolve congestion events.
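As a quick worked example of the two fixed components, the sketch below computes serialization delay from frame size and line rate, and propagation delay from distance using the 6.3 microseconds per km figure. The function names and the sample link are made up for illustration.

```python
def serialization_delay_ms(frame_bytes, link_bps):
    """Time to clock a frame onto the wire at the given line rate."""
    return frame_bytes * 8 * 1000 / link_bps

def propagation_delay_ms(distance_km, us_per_km=6.3):
    """Fixed delay from physical distance (planning figure, not a measurement)."""
    return distance_km * us_per_km / 1000

# A 1500-byte frame on a 1 Mbps link versus a 1 Gbps link
print(serialization_delay_ms(1500, 1_000_000))       # 12.0 ms
print(serialization_delay_ms(1500, 1_000_000_000))   # 0.012 ms
# 1000 km of fiber
print(propagation_delay_ms(1000))
```

On slow links serialization dominates, while on fast long-haul links propagation is the part that cannot be engineered away.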


  • Be aware of the Tx-Ring function and depth; tune only if necessary

The Tx-Ring is the final IOS output buffer for an interface. It’s a relatively small FIFO queue that maximizes physical link bandwidth utilization by matching the outbound packet rate on the router with the physical interface rate. If the Tx-Ring is too large, packets will be subject to latency and jitter while waiting to be served. If the Tx-Ring is too small, the CPU will be continually interrupted, causing higher CPU usage.


  • Use a dual-LLQ design when deploying voice and real-time video applications
  • Limit sum of all LLQs to 33% of bandwidth
  • Tune the burst parameter if needed

Some applications like Telepresence may be bursty by nature, so the burst value may have to be adjusted to account for this.


  • Optionally tune WRED thresholds as required
  • Optionally enable ECN

To match behavior of AF PHB defined in RFC 2597 use these values:

  • Set minimum WRED threshold for AFx3 to 60% of queue depth
  • Set minimum WRED threshold for AFx2 to 70% of queue depth
  • Set minimum WRED threshold for AFx1 to 80% of queue depth
  • Set all WRED maximum thresholds to 100%
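The effect of these staggered thresholds can be sketched with the usual WRED drop curve: no drops below the minimum threshold, a linear ramp up to 1/(mark probability denominator) at the maximum threshold, and tail drop beyond it. The queue depth of 100 packets and the denominator of 10 are assumptions for the example.

```python
def wred_drop_probability(avg_depth, min_th, max_th, mark_prob_denominator=10):
    """Drop probability for one drop precedence at a given average queue depth."""
    if avg_depth < min_th:
        return 0.0
    if avg_depth >= max_th:
        return 1.0   # beyond the maximum threshold everything is dropped
    ramp = (avg_depth - min_th) / (max_th - min_th)
    return ramp / mark_prob_denominator

QUEUE_DEPTH = 100  # packets (assumed)
MIN_THRESHOLDS = {"AFx3": 60, "AFx2": 70, "AFx1": 80}  # 60/70/80% of queue depth

for name, min_th in MIN_THRESHOLDS.items():
    p = wred_drop_probability(75, min_th, QUEUE_DEPTH)
    print(f"{name}: drop probability at average depth 75 = {p:.4f}")
```

At an average depth of 75 packets, AFx3 is already being dropped more aggressively than AFx2 while AFx1 is still untouched, which matches the intent of the three drop precedences.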


  • Enable RSVP for dynamic network-aware admission control requirements
  • Use the Intserv/Diffserv RSVP model to increase efficiency and scalability
  • Use application-identification RSVP policies for greater policy granularity

Ingress QoS Models

  • DSCP is trusted by default in IOS
  • Enable ingress classification with NBAR2 on LAN edges, as required
  • Enable ingress/internal queuing, if required

Egress QoS Models

  • Deploy egress queuing policies on all WAN edge interfaces
  • Egress queuing policies may not be required on LAN edge interfaces

Recommendation for queues:

Real-time queues:

  • Limit the sum of all LLQs to 33%
  • Use an admission control mechanism
  • Do not enable WRED

Guaranteed-bandwidth queues:

  • Provision guaranteed bandwidth according to application requirements
  • Enable fair-queuing presorters
  • Enable DSCP-based WRED

Control queue:

  • Provision guaranteed bandwidth according to control traffic requirements
  • Do not enable presorters
  • Do not enable WRED

Scavenger queue:

  • Provision with a minimum bandwidth allocation such as 1%
  • Do not enable presorters
  • Do not enable WRED

Default/Best effort:

  • Allocate at least 25% for the default/Best effort queue
  • Enable fair-queuing pre-sorters
  • Enable WRED

WAN and Branch Interface QoS Roles:

WAN aggregator LAN edge:

  • Ingress DSCP trust should be enabled
  • Ingress NBAR2 classification and marking policies may be applied
  • Ingress Medianet metadata classification and marking policies may be applied
  • Egress LLQ/CBWFQ/WRED policies may be applied (if required)

WAN aggregator WAN edge:

  • Ingress DSCP trust should be enabled
  • Egress LLQ/CBWFQ/WRED policies should be applied
  • RSVP policies may be applied
  • Additional VPN specific policies may be applied

Branch WAN edge:

  • Ingress DSCP trust should be enabled
  • Egress LLQ/CBWFQ/WRED policies should be applied
  • RSVP policies may be applied
  • Additional VPN specific policies may be applied

Branch LAN edge:

  • Ingress DSCP trust should be enabled
  • Ingress NBAR2 classification and marking policies may be applied
  • Ingress Medianet metadata classification and marking policies may be applied
  • Egress LLQ/CBWFQ/WRED policies may be applied (if required)

MPLS VPN QoS Design Considerations & Recommendations
The role of QoS over MPLS VPNs may include the following:

  • Shaping traffic to contracted service rates
  • Performing hierarchical queuing and dropping within these shaped rates
  • Mapping enterprise classes to the service provider classes
  • Policing traffic according to contracted rates
  • Restoring packet markings

MEF Ethernet Connectivity Services

E-Line:
A service connecting two customer Ethernet ports over a WAN. It is based on a point-to-point Ethernet Virtual Connection (EVC).

Ethernet Private Line (EPL):
A basic point-to-point service characterized by low frame delay, frame delay variation and frame loss ratio. Service multiplexing is not allowed. No CoS bandwidth profiling is allowed, only a Committed Information Rate (CIR).

Ethernet Virtual Private Line (EVPL):
Multiplexing of EVCs is allowed. The individual EVCs can be defined with different bandwidth profiles and layer two control processing methods.

E-LAN:
A multipoint service connecting customer endpoints and acting as a bridged Ethernet network. It is based on a multipoint EVC and service multiplexing is allowed. It can be configured with a CIR, Committed Burst Size (CBS) and Excess Information Rate (EIR).

E-Tree:
A point-to-multipoint version of the E-LAN. Essentially it’s a hub and spoke topology where the spokes can only communicate with the hub but not each other. Common for franchise operations.

Sub-Line-Rate Ethernet Design Implications

  • Sub line rate may require hierarchical shaping with nested queuing policies
  • Configure the CE shaper’s Committed Burst (Bc) value to be no more than half of the SP’s policer Bc

If the Bc of the shaper is set too high, packets may be dropped by the policer even though the shaper is shaping to CIR of the service.

When using sub line rate there will be no congestion on the physical interface. Congestion is artificially induced by using a shaper, and a nested policy then handles the queuing. This may be referred to as Hierarchical QoS (HQoS).
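The Bc recommendation above can be turned into a small calculation. A shaper sends Bc bits per interval Tc, with Bc = CIR × Tc, so capping Bc at half of the provider's policer burst also fixes the shaping interval. The function name, the 10 ms starting interval and the sample numbers are assumptions, and both burst values are expressed in bits here (policers are often configured in bytes, so convert first).

```python
def ce_shaper_burst(cir_bps, policer_bc_bits, desired_tc_ms=10):
    """Pick a shaper Bc no larger than half of the SP policer's Bc.

    Returns (bc_bits, resulting_tc_ms).
    """
    desired_bc = cir_bps * desired_tc_ms // 1000   # Bc = CIR * Tc
    bc = min(desired_bc, policer_bc_bits // 2)
    tc_ms = bc * 1000 / cir_bps                    # interval implied by the chosen Bc
    return bc, tc_ms

# 50 Mbps sub-line-rate service, SP policer burst of 600 000 bits (made-up values)
print(ce_shaper_burst(50_000_000, 600_000))
```

In this example a 10 ms interval would produce bursts of 500 000 bits, so the burst is cut to 300 000 bits, which shortens the interval to 6 ms.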

QoS Paradigm Shift

  • Enterprises and service providers must cooperate to jointly administer QoS over MPLS VPNs

MPLS VPNs offer a full mesh of connectivity between campus and branch networks. This fully meshed connectivity has implications for the QoS design. Previously WANs were usually point-to-point or hub and spoke which made the QoS design simpler. Branch to branch traffic would pass through the hub which controlled the QoS.

When using MPLS VPNs traffic from branch to branch will not pass the hub meaning that QoS needs to be deployed on all the branches as well. However, this is not enough, contending traffic may not be coming from the same site, it could be coming from any site. To overcome this the service provider needs to deploy QoS policies that are compatible with the enterprise policies on the PE routers. This is a paradigm shift in QoS administration and requires the enterprise and SP to jointly administer the QoS policies.

Service Provider Class of Service Models

  • Fully understand the CoS models of the SP
  • Select the model that most closely matches your strategic end-to-end model

MPLS DiffServ Tunneling Modes

  • Understand the different MPLS Diffserv tunneling modes and how they affect customer DSCP markings
  • Short pipe mode offers enterprise customers the most transparency and control of their traffic classes

Uniform Mode

  • If the provider uses uniform mode, be aware that your packets’ DSCP values may be remarked

Uniform mode is generally used when the customer and SP share the same Diffserv domain, which would be the case for an enterprise deploying MPLS.

Uniform mode is the default mode. The first three bits of the IP ToS field (the IP Precedence, IPP) are mapped to the MPLS EXP bits on the ingress PE when it adds the label. If a policer or other mechanism remarks the MPLS EXP value, this value is copied to lower level labels, and at the egress PE the MPLS EXP value is used to set the IPP value.

Short Pipe Mode

It is used when the customer and SP are in different Diffserv domains. This mode is useful when the SP wants to enforce its own Diffserv policy but the customer wants its Diffserv information to be preserved across the MPLS VPN.

The ingress PE sets the MPLS EXP value based on the SP’s policies. Any remarking will only propagate to the MPLS EXP bits of labels but not to the IPP bits of the customer’s IP packet. On egress the queuing is based on the IPP marking of the customer’s packet, giving the customer maximum control.

Pipe Mode

Pipe mode is the same as short pipe mode except that the queuing at the egress PE is based on the MPLS EXP bits and not on the customer’s IPP marking.
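The difference between the three tunneling modes comes down to two questions: what happens to the customer's IPP at the egress PE, and which marking the egress PE queues on. The sketch below is a simplified model of that behavior with made-up function and parameter names. It is not a router implementation.

```python
def traverse(mode, customer_ipp, sp_exp=None, core_remark=None):
    """Follow one packet across the MPLS VPN.

    Returns (ipp_delivered_to_customer, marking_used_for_egress_queuing).
    """
    # Ingress PE: uniform copies the IPP into EXP, the pipe modes apply SP policy.
    if mode == "uniform":
        exp = customer_ipp
    elif mode in ("pipe", "short-pipe"):
        exp = customer_ipp if sp_exp is None else sp_exp
    else:
        raise ValueError(f"unknown mode: {mode}")

    # Core: a policer or similar mechanism may remark the EXP bits.
    if core_remark is not None:
        exp = core_remark

    # Egress PE
    if mode == "uniform":
        return exp, exp                    # EXP is written back into the IPP
    if mode == "pipe":
        return customer_ipp, exp           # IPP preserved, queuing on EXP
    return customer_ipp, customer_ipp      # short pipe: IPP preserved and used for queuing

for mode in ("uniform", "pipe", "short-pipe"):
    print(mode, traverse(mode, customer_ipp=5, sp_exp=4, core_remark=2))
```

Only uniform mode changes what the customer receives, which is why short pipe mode is described as giving enterprise customers the most transparency.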

Enterprise-to-Service Provider Mapping

  • Map the enterprise application classes to the SP CoS classes as efficiently as possible

Enterprise to service provider mapping considerations include the following:

  • Mapping real-time voice and video traffic
  • Mapping signaling and control traffic
  • Separating TCP-based applications from UDP-based applications (where possible)
  • Remarking and restoring packet markings (where required)

Mapping Real-Time Voice and Video

  • Balance the service level requirements for real-time voice and video with the SP premium for real-time bandwidth
  • In either scenario, use a dual LLQ policy at CE egress edge

SPs often offer only a single real-time CoS. If you are deploying both real-time voice and video, you will have to choose whether to put the video in the real-time class or not. Putting both voice and video into the real-time class may be costly or even cost prohibitive. You should still use a dual LLQ at the CE edge since that is under your control, and that way you can protect voice from video. Downgrading video to a non real-time class may only produce slightly lower quality, which could be acceptable.

Mapping Control and Signaling Traffic

  • Avoid mixing control plane traffic with data plane traffic in a single SP CoS

Signaling should be separated from data traffic if possible, since the signaling could get dropped if the class is oversubscribed, producing voice/video instability. If the SP does not offer enough classes to put signaling in its own, consider putting it in the real-time class since these flows are lightweight, but critical.

Separating TCP from UDP

  • Separate TCP traffic from UDP traffic when mapping to SP CoS classes

It is generally best to not mix TCP-based traffic with UDP-based traffic (especially if the UDP traffic is streaming video such as broadcast video) within a single SP CoS. These protocols behave differently under congestion. Some UDP applications may have application-level windowing, flow control and retransmission capabilities but most UDP transmitters are oblivious to drops and don’t lower transmission rates due to dropping.

When TCP and UDP share an SP CoS and that class experiences congestion, the TCP flows continually lower their transmission rates, potentially giving up their bandwidth to UDP flows that are oblivious to drops. This is called TCP starvation/UDP dominance.

Even with WRED enabled the same behavior would be seen, because WRED (primarily) manages congestion only on TCP-based flows.
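TCP starvation/UDP dominance can be illustrated with a deliberately crude model: a constant-rate UDP stream shares a class with one TCP flow that adds a little bandwidth each round and halves its rate whenever the class is oversubscribed. All numbers are invented and real TCP behavior is far more nuanced, so read this as an intuition aid only.

```python
def simulate_shared_class(capacity=100.0, udp_rate=80.0, tcp_start=50.0, rounds=200):
    """Return the average TCP rate over the last 50 rounds."""
    tcp = tcp_start
    history = []
    for _ in range(rounds):
        if tcp + udp_rate > capacity:
            tcp = tcp / 2      # TCP backs off when it sees drops
        else:
            tcp = tcp + 1.0    # TCP probes for more bandwidth
        history.append(tcp)    # UDP never changes its rate
    return sum(history[-50:]) / 50

avg_tcp = simulate_shared_class()
print(f"UDP holds 80.0 of 100.0, TCP averages {avg_tcp:.1f}")
```

The TCP flow never manages to hold even the capacity that UDP leaves free, which is the argument for giving the two their own SP classes.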

Re-Marking and Restoring Markings

  • Remark application classes on CE edge on egress (as required)
  • Restore markings on the CE edge on ingress via deep packet inspection policies (as required)

If packets need to be remarked to fit with the SP CoS model, do it at the CE edge on egress. This requires less of an effort than doing it in the campus.

To restore DSCP markings, traffic can be classified on ingress on the CE edge via DPI.


CE LAN edge:

  • Ingress DSCP trust should be enabled (enabled by default)
  • Ingress NBAR2 classification and marking policies may be applied
  • Ingress Medianet metadata classification and marking policies may be applied
  • Egress LLQ/CBWFQ/WRED policies may be applied (if required)

CE VPN edge:

  • Ingress DSCP trust should be enabled (enabled by default)
  • Ingress NBAR2 classification and marking policies may be applied (to restore markings lost in transit)
  • Ingress Medianet metadata classification and marking policies may be applied (to restore markings lost in transit)
  • RSVP policies may be applied
  • Egress LLQ/CBWFQ/WRED policies should be applied
  • Egress hierarchical shaping with nested LLQ/CBWFQ/WRED policies may be applied
  • Egress DSCP remarking policies may be applied (used to map application classes into specific SP CoS)

PE customer-facing edge:

  • Ingress DSCP trust should be enabled (enabled by default)
  • Ingress policing policies to meter customer traffic should be applied
  • Ingress MPLS tunneling mode policies may be applied
  • Egress MPLS tunneling mode policies may be applied
  • Egress LLQ/CBWFQ/WRED policies should be applied

PE core-facing edge:

  • Ingress DSCP trust should be enabled (enabled by default)
  • Ingress policing policies to meter customer traffic should be applied
  • Egress MPLS EXP-based LLQ/CBWFQ policies should be applied
  • Egress MPLS EXP-based WRED policies may be applied

P edges:

  • Ingress DSCP trust should be enabled (enabled by default)
  • Egress MPLS EXP-based LLQ/CBWFQ policies may be applied
  • Egress MPLS EXP-based WRED policies may be applied

IPSEC QoS Design

Tunnel Mode

Default IPSEC mode of operation on Cisco IOS routers. The entire IP packet is protected by IPSEC: the sending VPN router encrypts the entire original IP packet and adds a new IP header to the packet. It supports multicast and routing protocols.

Transport Mode

Often used for encrypting peer-to-peer communications, transport mode does not encase the original IP packet in a new packet. Only the payload is encrypted while the original IP header is preserved and reused as the header of the protected packet. Because the header is left intact it’s not possible to run multicast or routing protocols in transport mode.


GRE can be used to enable VPN services that connect disparate networks. It’s a key building block when using VRF Lite, a technology allowing related Virtual Routing and Forwarding (VRF) instances running on different routers to be interconnected across an IP network, while maintaining their separation from both the global routing table and other VRFs.

When using GRE as a VPN technology, it is often desirable to encrypt the GRE tunnel so that privacy and authentication of the connection can be ensured. GRE can be used with IPSEC tunnel mode or transport mode but if the tunnel transits a NAT or PAT device, tunnel mode is required.

Remote-Access VPNs

Cisco’s primary remote-access VPN client is AnyConnect Secure Mobility Client, which supports both IPSEC and Secure Sockets Layer (SSL) encryption.

AnyConnect uses Datagram Transport Layer Security (DTLS) to optimize real-time flows over an SSL encrypted tunnel. AnyConnect connects to a remote headend concentrator (such as an ASA firewall) through TCP-based SSL. All traffic from the client including voice, video and data traverses the SSL TCP connection. When TCP loses packets it pauses and waits for them to be resent, which is not good for real-time UDP based packets.

DTLS is a datagram technology, meaning it uses UDP packets instead of TCP. After AnyConnect establishes the TCP SSL tunnel it also establishes a UDP-based DTLS tunnel which is reserved for the use of real-time applications. This allows RTP voice and video packets to be sent unhindered. In case of packet loss, the session does not pause.

The decision on which tunnel to send the packets to is dynamic and made by the AnyConnect client.

QoS Classification of IPsec Packets

  • Understand the default behavior of Cisco VPN routers to copy the ToS byte from the inner packet to the VPN packet header

Cisco routers by default copy the ToS field from the original IP packet and write it into the new IPSEC packet header, thus allowing classification to still be accomplished by matching DSCP values. The same holds true for GRE packets as well. The IP packet is encrypted so it’s not possible to match on other fields such as IP addresses, ports, protocol and so on without using another feature.
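A tiny sketch of why DSCP-based classification keeps working after encryption: the ToS byte of the inner packet is copied to the outer header, and the DSCP is simply the upper six bits of that byte. The function names are made up for illustration.

```python
def outer_tos(inner_tos, copy_tos=True):
    """ToS byte placed in the new IPsec/GRE header (copied by default)."""
    return inner_tos if copy_tos else 0

def dscp_of(tos):
    """DSCP is the upper six bits of the ToS byte."""
    return tos >> 2

EF_TOS = 46 << 2   # a voice packet marked EF has ToS byte 0xB8

print(hex(EF_TOS), dscp_of(outer_tos(EF_TOS)))          # still classifiable as EF
print(dscp_of(outer_tos(EF_TOS, copy_tos=False)))       # marking lost without the copy
```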

The IOS Preclassify Feature

  • Be aware of the limitations of QoS classification when using something other than the ToS byte
  • Use the IOS preclassify feature for all non ToS types of QoS classification
  • As a best practice, enable this feature for all VPN connections

Normally tunneling and encryption take place before QoS classification in the order of operations. QoS preclassify reverses the order so that classification can be done on the IP header before it gets encrypted. Strictly speaking the order isn’t reversed; the router clones the original IP header and keeps it in memory so that it can be used for QoS classification after tunneling and encryption.

This feature is only applicable on the encrypting router’s outbound interface (physical or tunnel). Downstream routers can’t make decisions on the inner header because the packet will be encrypted at that point. Always enable the feature, since tests have shown that it has very little impact on the router’s performance.

MTU Considerations

  • Be aware that MTU issues can severely impact network connectivity and the quality of user experience in VPN networks

When tunneling technologies are used there is always the risk of exceeding the MTU somewhere in the path. Unless jumbo frames are available end-to-end, MTU issues will almost always need to be addressed when dealing with any kind of VPN technology. A common symptom of MTU issues is that applications using small packets, such as voice, work while e-mail, file server connections and many other applications do not.

Path MTU Discovery (PMTUD) can be used to discover what the MTU is along the path but it relies on ICMP messages which may be blocked on intermediary devices.

TCP Adjust-MSS

TCP Maximum Segment Size (MSS) is the maximum amount of payload data that a host is willing to accept in a single TCP/IP datagram. During TCP connection setup between two hosts (TCP SYN), each side advertises its MSS to the other. It’s the responsibility of the sending host to limit the size of the datagram to a value less than or equal to the receiving host’s MSS.

For a 1500-byte IP packet carrying TCP, the MSS is 1460 bytes: 20 bytes for the IP header and 20 bytes for the TCP header are subtracted from the 1500 bytes.

Two hosts may not be aware that they are communicating through a tunnel and send a TCP SYN with MSS 1460 even though the MTU along the path is lower. TCP Adjust-MSS can rewrite the MSS of the SYN packet so that when the receiving host gets it, the value is set to something lower, which allows traffic to be sent through the tunnel without fragmentation. The receiving host will then reply with this value to the sending host. The router is acting as a middleman for the TCP session.

When using IPSEC over GRE, an MSS of 1378 bytes can be used:

  • Original IP packet = 1500 bytes
  • Subtract 20 bytes for IP header = 1480 bytes
  • Subtract 20 bytes for TCP header = 1460 bytes
  • Subtract 24 bytes for GRE header = 1436 bytes
  • Subtract a maximum of 58 bytes for IPSEC = 1378 bytes
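
The subtraction above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the header sizes are the ones from the list, while the function and constant names are made up for the example.

```python
# Header overheads in bytes, taken from the list above.
IP_HEADER = 20
TCP_HEADER = 20
GRE_HEADER = 24
IPSEC_MAX_OVERHEAD = 58  # worst-case IPSEC overhead used in the text


def max_tcp_payload(mtu, overheads):
    """Return the largest TCP payload that fits in `mtu` after all overheads."""
    return mtu - sum(overheads)


# Plain TCP/IP over a 1500-byte MTU.
plain_mss = max_tcp_payload(1500, [IP_HEADER, TCP_HEADER])

# Same packet once GRE and IPSEC overhead are added.
tunnel_mss = max_tcp_payload(
    1500, [IP_HEADER, TCP_HEADER, GRE_HEADER, IPSEC_MAX_OVERHEAD]
)

print(plain_mss, tunnel_mss)
```

Keeping the overheads as a list makes it easy to redo the calculation for another encapsulation by swapping in that technology's header sizes.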

Adjusting MSS is a CPU intensive process. Enable it at the remote sites rather than at the headend, since the headend may be terminating a lot of tunnels. Adjusting MSS only needs to be done at one point in the path.

TCP Adjust-MSS only has an impact on TCP packets. UDP packets are less likely to be large compared to TCP packets.

Compression Strategies Over VPN

  • Compression can improve overall throughput, latency and user experience on VPN connections
  • Some compression technologies tunnel and may hide the fields used for QoS classification

TCP Optimization Using WAAS

Wide Area Application Services (WAAS) is a WAN accelerator. It uses compression technologies such as LZ compression, Data Redundancy Elimination (DRE) and specific Application Optimizers (AO). This significantly reduces the amount of data sent over the WAN or VPN. For a technology like WAAS to work, the compression must take place before encryption.

Compression technologies can have a significant effect on the QoE but they work mainly for TCP traffic. Some WAN acceleration solutions may break classification if the traffic is tunneled so that the original IP header is obfuscated. WAAS only compresses the data portion of the packet and keeps the header intact, leaving the ToS byte available for classification.

Using Voice Codecs over a VPN Connection

To improve voice quality over bandwidth constrained VPN links, administrators may use compression codecs such as ILBC or G.729.

G.729 uses about a third of the bandwidth of G.711 but this also increases the effect of packet loss since more data is lost in every packet. To overcome this, when a packet is lost and the jitter buffer expires, the voice from the previous packet can be replayed to hide the gap, essentially tricking the listener. Through this technology, up to 5% of packet loss can be acceptable.

Internet Low Bitrate Codec (ILBC) uses 15.2 Kbit/s or 13.33 Kbit/s and performs similarly to G.729. The Mean Opinion Score (MOS) for ILBC is significantly better, though, when there is packet loss.

Compressed Real-Time Protocol (cRTP) is not compatible with IPSEC because the packets are already encrypted when cRTP would try to compress them.

Antireplay Implications

  • Antireplay drops may be introduced in an IPSEC VPN network with QoS enabled

When ESP authentication is configured in an IPSEC transform set, every Security Association (SA) keeps a 64-packet sliding window in which it checks the sequence numbers of incoming encrypted packets. This stops someone from replaying packets and is called connectionless integrity. If packets arrive out of order due to queuing, they must still fit inside the window or they will be dropped and counted as antireplay errors. A data packet may get stuck behind voice in a queue for so long that it falls outside the sliding window and is dropped. To overcome this, use a separate line in the ACL for every type of traffic, such as voice, data and video. This creates an SA for each type of traffic.
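
To make the window behaviour concrete, here is a minimal Python sketch of a 64-packet antireplay check. It is a simplification for illustration only: real implementations use a bitmap per SA, and the class and method names here are invented for the example.

```python
WINDOW_SIZE = 64  # window size mentioned in the text


class ReplayWindow:
    """Simplified antireplay window for a single SA."""

    def __init__(self, size=WINDOW_SIZE):
        self.size = size
        self.highest = 0   # highest sequence number accepted so far
        self.seen = set()  # accepted sequence numbers still inside the window

    def accept(self, seq):
        """Return True if the packet is accepted, False if it is dropped."""
        if seq > self.highest:
            # New highest sequence number: slide the window forward.
            self.highest = seq
            self.seen.add(seq)
            low = self.highest - self.size
            self.seen = {s for s in self.seen if s > low}
            return True
        if seq <= self.highest - self.size:
            return False  # arrived too late, outside the window
        if seq in self.seen:
            return False  # duplicate, treated as a replay
        self.seen.add(seq)
        return True


window = ReplayWindow()

# Packets 1-20 arrive, except 11 which is briefly delayed in a queue.
for seq in range(1, 21):
    if seq != 11:
        window.accept(seq)
slightly_late = window.accept(11)   # still inside the window

# Packets 21-100 arrive, except 25 which is stuck behind other traffic.
for seq in range(21, 101):
    if seq != 25:
        window.accept(seq)
replayed = window.accept(99)        # already seen
far_too_late = window.accept(25)    # window has moved past it

print(slightly_late, replayed, far_too_late)
```

With one SA per traffic type, each class gets its own `ReplayWindow`, so data delayed behind voice is only compared against other data packets.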

TCP will be affected by the packet loss, since it has no way of knowing that the packets were dropped due to antireplay.

Antireplay drops are around 1 to 1.5% on congested VPN links with queuing enabled. A CBWFQ policy will often hold 64 packets per queue. Decreasing this will lead to fewer antireplay drops, since the packets are dropped before traversing the VPN, but it may also increase the CPU usage.

DMVPN QoS Design

DMVPN offers some advantages regarding QoS compared to IPSEC, such as the following:

  • Reduction of overall hub router QoS configuration
  • Scalability to thousands of sites, with QoS for each tunnel on the hub router
  • Zero-touch QoS support on the hub router for new spokes
  • Flexibility of both hub and spoke and spoke to spoke (full mesh) deployment models

DMVPN Building Blocks

mGRE: Multi-point GRE allows a single tunnel interface to serve a large number of remote spokes. One outbound QoS policy can be applied instead of one per tunnel as with normal GRE, which is point-to-point.

Dynamic discovery of IPSEC tunnel endpoints and crypto profiles: Dynamic creation of crypto maps, no need to statically build a crypto map for each tunnel endpoint.

NHRP: Allows spokes to be configured with dynamically assigned IP addresses. Also enables zero-touch deployment that makes DMVPN spokes easy to set up. Think of the hub router as a “next-hop server” rather than a traditional VPN router. NHRP is also used for the per-tunnel QoS feature.

The Per-Tunnel QoS for DMVPN Feature

Allows the administrator to enable QoS on a per-tunnel or per-spoke basis. QoS policy is applied to the mGRE tunnel interface. This protects spokes from each other and keeps one spoke from using all the BW so that there is none left for the others. The QoS policy at the hub is automatically generated for each tunnel when a spoke registers with the hub.

Queuing only kicks in when there is congestion. To signal to the router’s QoS mechanism that there is congestion, a shaper is used: shape the traffic flows to the real VPN tunnel bandwidth to produce artificial back pressure. With per-tunnel QoS for DMVPN, a shaper is automatically applied by the system to each and every tunnel. This allows the router to implement differentiated services for the various data flows corresponding to each tunnel. This technique is called Hierarchical Queuing Framework (HQF).

Using NHRP, multiple spokes can be grouped together to use the same QoS policy.

This technique provides QoS in the egress direction of the hub towards the spokes. For QoS from the spokes to the hub, a QoS policy needs to be applied at the spokes.

At this time it is not possible to have a unique policy for spoke-to-spoke traffic because the spokes do not have access to the NHRP database.

GET VPN QoS Design

Group Encrypted Transport (GET) VPN is a technology to encrypt traffic between IPSEC endpoints without the use of tunnels. Transmitted packets use IPSEC tunnel mode but are not defined by a traditional IPSEC SA.

Because there are no tunnels, the QoS configuration is simplified.

GET VPN QoS Overview

DMVPN is suitable for hub and spoke VPNs over a public untrusted network such as the Internet. GET VPN is suitable for private networks such as an MPLS VPN. An MPLS VPN is private but not encrypted, and GET VPN can encrypt the traffic between the MPLS sites. GET VPN has no real concept of hub and spoke, which simplifies the QoS architecture. There is not one major hub aggregating all the remote sites and being liable to massive oversubscription.

These are some of the major differences between the DMVPN and GET VPN models:

Choosing VPN

Group Domain of Interpretation (GDOI)

GDOI is a technology that supports any-to-any IPSEC VPN without the use of tunnels. There is no concept of an SA between specific routers. Instead it uses a group SA which is shared by all the encrypting nodes in the network. No per-tunnel QoS is needed since there are no tunnels; QoS is simply applied on egress on each GET VPN router.

The GDOI control plane protocol uses UDP port 848 and ISAKMP uses UDP port 500. These packets are normally marked DSCP CS6 by the router.

IP Header Preservation

Normally with IPSEC tunnel mode the ToS byte is copied to the new IP header but the original IP header is not preserved. On a public network such as the Internet it makes good sense to hide the source and destination IP addresses but GET VPN is deployed on MPLS networks which are private.

GET VPN keeps the original IP header intact which simplifies QoS, dynamic routing and multicast. The packet is still considered an ESP IPSEC packet, not TCP or UDP, so to classify based on port numbers the QoS preclassify feature will still be needed.

How and When to Use the QoS Preclassify Feature
Design principles:

  • If classification is based on source or destination IP, preclassify is not needed but still recommended
  • If classification is based on TCP or UDP port numbers, QoS preclassify is needed
  • Enable the QoS preclassify feature in GET VPN deployments
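
The first two principles boil down to a simple rule: fields that are still visible in the outer header after encryption can be matched directly, anything else needs preclassify. A small Python sketch of that rule follows; the field names and the function are hypothetical and only mirror the cases named in this section.

```python
# Fields that remain readable after GET VPN encryption, according to the
# text: the original IP header (addresses and ToS/DSCP) is preserved.
# The field names themselves are made up for this example.
VISIBLE_AFTER_ENCRYPTION = {"src_ip", "dst_ip", "dscp"}


def needs_preclassify(match_fields):
    """Return True if any match field is hidden inside the ESP payload."""
    return not set(match_fields) <= VISIBLE_AFTER_ENCRYPTION


ip_only = needs_preclassify(["src_ip", "dst_ip"])
with_ports = needs_preclassify(["dst_ip", "tcp_port"])

print(ip_only, with_ports)
```

Even when the function returns False, the recommendation above still applies: enable preclassify anyway.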

A Case for Combining GET VPN and DMVPN

DMVPN has some drawbacks. The spoke-to-hub tunnel is always up but spoke-to-spoke tunnels are brought up dynamically. This causes a delay of a second or two, which may have a negative impact on real-time traffic. The delay is not caused by NHRP or the packetization of the GRE tunnel but rather by the exchange of ISAKMP messaging and the establishment of the IPSEC SAs between the routers.

DMVPN could then be used solely for setting up GRE tunnels and GET VPN for encryption of the packets going into the tunnel. This allows for fast establishment of tunnels while still encrypting the packets, improving the overall user experience.

Working with Your Service Provider When Deploying GET VPN
Design principles:

  • Ensure that the service provider handles DSCP consistently throughout the MPLS WAN network
Categories: CCDE, QoS

Unique RD per PE in MPLS VPN for Load Sharing and Faster Convergence

January 11, 2015 3 comments

This post describes how load sharing and faster convergence in MPLS VPNs are possible by using a unique RD per VRF per PE. It assumes you are already familiar with MPLS but here is a quick recap.

The Route Distinguisher (RD) is used in MPLS VPNs to create unique routes. With IPv4, an IP address is 32 bits long but several customers may, and probably will, use the same networks. If CustomerA and CustomerX both use the same prefix, we must in some way make each route unique to transport it over MPBGP. The RD does exactly this by prepending a 64-bit value to the IPv4 address, creating a 96-bit VPNv4 prefix. This is all the RD does; it has nothing to do with the VPN in itself. It is common to create an RD consisting of AS_number:VPN_identifier so that a VPN has the same RD on all PEs where it exists.
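
As a rough illustration of how the 64 + 32 bits fit together, the Python sketch below builds a VPNv4 address from a type 0 RD (2-byte type, 2-byte AS number, 4-byte assigned number). The prefix 10.0.0.0 and the RD values are placeholders chosen for the example, not values from the lab in this post.

```python
import ipaddress
import struct


def rd_type0(asn, number):
    """Encode a type 0 RD: 2-byte type, 2-byte AS number, 4-byte number."""
    return struct.pack("!HHI", 0, asn, number)


def vpnv4_address(rd, ipv4):
    """Prepend the 8-byte RD to the 4-byte IPv4 address."""
    return rd + ipaddress.IPv4Address(ipv4).packed


# Two customers using the same (hypothetical) network with different RDs.
customer_a = vpnv4_address(rd_type0(64512, 1), "10.0.0.0")
customer_x = vpnv4_address(rd_type0(64512, 2), "10.0.0.0")

print(len(customer_a) * 8)       # total length in bits
print(customer_a != customer_x)  # the two routes no longer collide
```

The IPv4 part is identical in both values; only the RD in front differs, which is enough for BGP to treat them as separate routes.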

The Route Target (RT) is what defines the VPN, which routes are imported to the VPN and the topology of the VPN. These are extended communities that are tagged on to the BGP Update and transported over MPBGP.

MPLS uses labels. The transport label, which is used to transport the packet through the network, is generated by LDP. The VPN label, which is used to make sure the packets make it to the right VPN, is generated by MPBGP and can be per prefix or per VRF.

Below is a configuration snippet for creating a VRF with the newer syntax.

PE1#sh run vrf
Building configuration...

Current configuration : 401 bytes
vrf definition CUST1
 address-family ipv4
  route-target export 64512:1
  route-target import 64512:1
interface GigabitEthernet1
 vrf forwarding CUST1
 ip address
 negotiation auto
router bgp 64512
 address-family ipv4 vrf CUST1
  neighbor remote-as 65000
  neighbor activate

The values for the RD and RT are defined under the VRF. Now the topology we will be using is the one below.


This topology uses a Route Reflector (RR), like most decently sized networks will, to overcome the scalability limitations of a BGP full mesh. The downside of using a RR is that we will have fewer routes because only the best routes will be reflected. This means that load sharing may not take place and that convergence takes longer when a link between a PE and a CE goes down.

This diagram shows PE1 and PE2 advertising the same network to the RR. The RR then picks one as best and reflects that to PE3 (and others). This means that the path through PE2 will never be used until something happens with PE1. This is assuming that they are both using the same RD.



When PE1 loses its prefix it sends a BGP WITHDRAW to the RR. The RR then sends a WITHDRAW to PE3, followed by an UPDATE advertising the prefix via PE2. The path via PE2 is not used until this happens. This means that load sharing is not taking place and that all traffic destined for the prefix has to converge.

If every PE uses a unique RD for the VRF, the advertisements from the PEs become two different routes and both can be reflected by the RR. The RD is then usually written in the form PE_loopback:VPN_identifier. This also helps with troubleshooting, since it shows where the prefix originated.
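
The effect of the RD on what the RR reflects can be modelled with a dictionary keyed on (RD, prefix). This is a toy Python model, not real BGP best path selection: the "first advertisement wins" rule is a stand-in, and the prefix and RD values are placeholders.

```python
def reflect(advertisements):
    """Return what a route reflector passes on: one path per VPNv4 prefix."""
    best = {}
    for rd, prefix, next_hop in advertisements:
        key = (rd, prefix)  # a VPNv4 prefix is the RD plus the IPv4 prefix
        # Stand-in for best path selection: the first path seen wins.
        best.setdefault(key, next_hop)
    return best


# Same RD on both PEs: the two advertisements are the same VPNv4 prefix.
same_rd = reflect([
    ("64512:1", "10.0.0.0/24", "PE1"),
    ("64512:1", "10.0.0.0/24", "PE2"),
])

# Unique RD per PE: two distinct VPNv4 prefixes, both get reflected.
unique_rd = reflect([
    ("1.1.1.1:1", "10.0.0.0/24", "PE1"),
    ("2.2.2.2:1", "10.0.0.0/24", "PE2"),
])

print(sorted(same_rd.values()))
print(sorted(unique_rd.values()))
```

The second result also hints at the cost discussed later in the post: every extra RD is another entry that the RR and the PEs have to store.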


PE3 now has two routes to the prefix in its routing table.

PE3#sh ip route vrf CUST1

Routing Table: CUST1
Routing entry for
  Known via "bgp 64512", distance 200, metric 0
  Tag 65000, type internal
  Last update from 01:10:52 ago
  Routing Descriptor Blocks:
  * (default), from, 01:10:52 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 65000
      MPLS label: 17
      MPLS Flags: MPLS Required
    (default), from, 01:10:52 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 65000
      MPLS label: 28
      MPLS Flags: MPLS Required

The PE is now doing load sharing meaning that some traffic will take the path over PE1 and some over PE2.


We have achieved load sharing and this also means that if something happens with PE1 or PE2, not all traffic will be affected. To see which path is being used from PE3 we can use the show ip cef exact-route command.

PE3#sh ip cef vrf CUST1 exact-route -> => label 17 label 16TAG adj out of GigabitEthernet1, addr
PE3#sh ip cef vrf CUST1 exact-route -> => label 28 label 17TAG adj out of GigabitEthernet1, addr

What is the drawback of using this? It consumes more memory because the prefixes are now unique, in effect doubling the memory required to store BGP paths. The PEs have to store several copies of the prefix, each with a different RD, before importing it into the RIB.

PE3#sh bgp vpnv4 uni all
BGP table version is 46, local router ID is
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher:
 *>i              0    100      0 65000 i
Route Distinguisher:
 *>i              0    100      0 65000 i
Route Distinguisher: (default for vrf CUST1)
 *>               0             0 65001 i
 *mi              0    100      0 65000 i
 *>i                      0    100      0 65000 i

For the multipathing to take place, PE3 must allow more than one route to be installed via BGP. This is done through the maximum-paths eibgp command.

address-family ipv4 vrf CUST1
  maximum-paths eibgp 2

In newer releases there are other features to overcome the limitation of only reflecting one route, such as BGP Add Path. This post showed the benefits of enabling unique RD for a VRF per PE to enable load sharing and better convergence. It also showed that doing so will use more memory due to having to store multiple copies of essentially the same route. Because multiple routes get installed into the FIB, that should also be a consideration depending on how large the FIB is for your platform.

Categories: BGP, MPLS

Book Review – End-to-End QoS Network Design: Quality of Service for Rich-Media & Cloud Networks, Second Edition

January 9, 2015 4 comments

As part of my CCDE studies, I needed a good resource on QoS. There have basically been two good books on QoS before, the first edition of End-to-End QoS Network Design and QoS-Enabled Networks: Tools and Foundations. The first edition of this book is good but very dated, as it was released back in 2004. QoS-Enabled Networks is a great book but it’s written to not be vendor specific, so you will not get details on platforms or configuration snippets.

In my opinion, earlier books gave a good foundation for understanding QoS concepts but they had too few design cases, lacked platform information and did not have enough examples to act as a reference. A lot has happened since the first edition of this book: new products and new Places In the Network (PIN) such as Datacenter, Wireless and to some degree MPLS.

The book is written by Tim Szigeti, Christina Hattingh, Robert Barton and Kenneth Briley Jr. Tim is a long time CCIE and technical leader at Cisco. He is the QoS guru responsible for a lot of the Cisco Validated Designs (CVDs) and a frequent presenter at Cisco Live. Christina is a former Technical Marketing Engineer (TME) at Cisco now working independently, writing books, teaching and consulting. Robert is a senior Systems Engineer (SE), dual CCIE and CCDE. Kenneth is a CCIE and technical lead at Cisco, focusing on convergence of QoS for wired and wireless networks.

This book was written by some of the best minds in the world on QoS, and it shows.

The book is divided into different parts. The first part consists of a QoS overview and describes Diffserv, Intserv, classification and marking, policing, shaping, congestion management and avoidance, QoS in IPv6 networks and more. The book does a very good job of laying a good foundation for the reader to build on. It has nice graphics to explain queueing, policing, shaping and so on. Every chapter also has a “Further Reading” part if you want to dive deeper into a subject.

The next part of the book is about business and application QoS requirements. What requirements do different applications have? How do you differentiate business critical apps on port 80 from bulk traffic? What are the design principles for QoS? How many classes should be deployed? The book tries to answer these questions, which is where many books fall short.

After that there is a part on Campus QoS. This is where the book really starts to shine. It shows the difference between Multi Layer Switching (MLS) QoS and Modular QoS CLI (MQC), and how to apply QoS on 3750, 4500 and 6500. What are the different trust states, where should you trust and where should you mark? It also shows how to apply QoS on Etherchannels and how it behaves on different platforms, information that is otherwise difficult to find and scattered across multiple documents. It ends with a design case and in my opinion all books should be written like this. This shows the reader how to apply the different concepts and to think about how all the pieces fit together.

Then there is a part on wireless QoS. It starts with an overview of how packets are scheduled on the radio, which standards are relevant, why the earlier standards were not good enough and what has changed. QoS is shown on different platforms and controllers and at the end there is a case study. I don’t work much with wireless but if I did this would be a very good reference since earlier books don’t discuss wireless QoS. I was surprised to learn that there are some discrepancies in wireless QoS compared to 802.1p and DSCP.

Datacenter QoS is in the next part and this is definitely a great addition compared to earlier books. It discusses the different Nexus platforms, what additions are needed in the Datacenter to be able to deliver lossless Ethernet and also ends with a case study.

WAN and branch QoS design comes after that and this is probably what most readers will recognize as QoS. It has examples on the ISR G2 but also on the ASR1k and as usual ends with a case study.

I really like the next part which is on MPLS QoS. This is not easy to find in other books. It explains the difference between short pipe, pipe and uniform mode. It also has examples on QoS on the ASR9k, CRS and also examples on how the customer should configure QoS when connecting to a Service Provider (SP). As usual a case study at the end.

The final part of the book is on QoS in VPNs, such as IPSEC, GET VPN, DMVPN and connecting from a home office. This part is also difficult to find in other books so it’s great that it’s included in here. It also has a case study at the end.

This book is written by some of the best people out there. It has a nice flow to it and covers all the relevant areas of QoS. It covers different platforms and shows examples of how to configure QoS on these platforms. It can serve as a book for learning more, for a certification or simply as a reference for all of your needs on QoS. This book is VERY extensive but it is so for a reason. It’s not long just for the sake of it, it’s all relevant material. Read it end to end or pick the parts you are interested in. If you want to get one book for QoS, get this one! If you are studying for the CCIE, this should be your reference. I can’t recommend this book enough, and you’ll see from the ratings on Amazon, Safari etc. that everyone agrees that this is an awesome book.

Categories: Announcement