
QoS Terminology – Comparing Cisco to MEF and RFC Terminology

July 31, 2015

Have you ever thought that you knew a topic pretty well, and then someone uses terminology that you aren't used to? People that use Cisco a lot, or live outside the MEF world, use different terminology than people working on MEF-certified networks. Even if we both know the concepts, if we don't speak a common language it will be difficult to communicate and to get to the right end result.

When I took the CCDE written at Cisco Live, some of the QoS-related material felt a bit off to me. I feel quite confident with QoS, so this took me by surprise. My theory is that some of the material was written by someone coming from another background, using wording that I'm simply not used to. I thought that I would read through some of the MEF material to broaden my QoS horizon and see what other terms are being used. At the very least I will have learned something new.

If we start with the basics, we have flows in our networks, and these flows have different needs regarding delay, jitter and packet loss. Below I list the different terms and indicate which belong to MEF terminology; the other terms are what Cisco calls them, or what they would be called in general outside of the MEF world.

Delay

Latency

Round Trip Time (RTT)

Frame Delay (MEF)

These all relate to how much delay is acceptable in the network. The requirement may be one-way or two-way depending on the nature of the traffic. RTT always refers to the two-way delay.

Jitter

Frame Delay Variation (MEF)

The MEF term is actually a bit clearer here as jitter is the variation of delay.

Packet Loss

Frame Loss Ratio (MEF)

Once again, the MEF term is a bit clearer because we are interested in seeing packet loss as a ratio, such as 1/100 packets, which we then use as a percentage for what is acceptable loss on a circuit.

Committed Burst (Bc)

Committed Burst Size (CBS) (MEF)

The Bc or CBS value defines how much traffic, in bits or bytes, can be sent during each time interval. Picking too low a value can lead to the customer dropping a lot of packets, and picking too high a value can lead to long time intervals, which could affect high priority traffic. The formula Tc = Bc / CIR can be used for calculations.
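
To make the relationship concrete, here is a minimal sketch of the Tc = Bc / CIR arithmetic (plain Python, with made-up example values):

```python
def tc_ms(bc_bits, cir_bps):
    """Time interval Tc (in ms) given committed burst Bc (bits) and CIR (bits/s)."""
    return bc_bits / cir_bps * 1000

# Example: CIR of 50 Mbit/s
cir = 50_000_000

# A Bc of 500,000 bits gives a 10 ms interval (common for real-time traffic)
print(tc_ms(500_000, cir))    # 10.0

# A Bc of 5,000,000 bits gives a 100 ms interval, which could hold back
# high priority traffic for a long time between intervals
print(tc_ms(5_000_000, cir))  # 100.0
```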

Burst Excess (Be)

Excess Burst Size (EBS)(MEF)

Be or EBS is normally used to provide the customer a more “fair” use of a circuit by allowing them to send unused credits from one or more previous time intervals. This means that they can burst momentarily until they have used up the Bc + Be credits.

Committed Information Rate (CIR)

This is the rate that is guaranteed to the customer in the contract. The physical line rate could be 100 Mbit/s while the CIR is 50 Mbit/s. It should be noted that this is an average rate; traffic is always sent at line rate, which produces bursts of traffic. This means that the customer will, for short periods of time, send above the CIR rate, but on average they get the CIR rate on the circuit.

Excess Information Rate (EIR)(MEF)

A provider/carrier may allow a customer to send above the CIR rate, but only those packets that are within the CIR are guaranteed the performance characteristics defined in the SLA. This is commonly implemented with a single rate Three Color Marker (srTCM), where packets that are within the CIR/CBS are marked green, packets above the CIR but within the EIR/EBS are marked yellow, and packets that exceed the EIR/EBS are marked red. Green packets are guaranteed performance as defined in the SLA, yellow packets are delivered best effort and red packets are dropped.
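
Here is a rough Python sketch of that two-bucket behavior (color-blind srTCM in the spirit of RFC 2697; the class name, bucket sizes and traffic pattern are my own example values, not MEF-mandated ones):

```python
class SrTCM:
    """Single rate Three Color Marker, color-blind mode (RFC 2697 style)."""

    def __init__(self, cir_bps, cbs_bytes, ebs_bytes):
        self.rate = cir_bps / 8.0   # token fill rate, bytes per second
        self.cbs = cbs_bytes
        self.ebs = ebs_bytes
        self.tc = float(cbs_bytes)  # committed bucket starts full
        self.te = float(ebs_bytes)  # excess bucket starts full
        self.last = 0.0

    def _refill(self, now):
        new_tc = self.tc + (now - self.last) * self.rate
        self.last = now
        if new_tc > self.cbs:
            # committed bucket full: spill leftover tokens into the excess bucket
            self.te = min(self.ebs, self.te + (new_tc - self.cbs))
            new_tc = self.cbs
        self.tc = new_tc

    def mark(self, now, size):
        self._refill(now)
        if size <= self.tc:
            self.tc -= size
            return "green"   # within CIR/CBS: SLA guarantees apply
        if size <= self.te:
            self.te -= size
            return "yellow"  # burst credit: delivered best effort
        return "red"         # exceeds both buckets: dropped

# 10 Mbit/s CIR with 12.5 kB committed and excess buckets (made-up values)
meter = SrTCM(cir_bps=10_000_000, cbs_bytes=12_500, ebs_bytes=12_500)
for t in (0.000, 0.001, 0.002, 0.003):
    print(t, meter.mark(t, 9_000))   # green, yellow, red, red
```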

This illustration shows the concept of srTCM:

[Figure: srTCM]

Peak Information Rate (PIR)

As noted by Faisal in the comments, PIR is not the same as EIR. PIR is actually CIR + EIR, which means that we have two token buckets filling at the same time; incoming packets are checked against both to see if they match the CIR rate or the EIR rate, which then sets the color of the packet to green or yellow (sketched below). One example could be a customer with a CIR of 10 Mbit/s and an EIR of 10 Mbit/s, which gives a combined rate (PIR) of 20 Mbit/s. The first 10 Mbit/s is guaranteed and the other 10 Mbit/s is sent through the provider network as long as there is capacity available.
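
A minimal sketch of the dual-bucket check (the refill and token decrement are omitted for brevity; in the full two rate Three Color Marker of RFC 2698 both buckets also drain as packets are marked):

```python
def color(tp_tokens, tc_tokens, size):
    """Check a packet against the PIR (P) and CIR (C) buckets, in that order."""
    if size > tp_tokens:
        return "red"      # exceeds even the PIR: dropped
    if size > tc_tokens:
        return "yellow"   # within PIR but above CIR: delivered if capacity allows
    return "green"        # within CIR: guaranteed

cir = 10_000_000          # 10 Mbit/s guaranteed
eir = 10_000_000          # 10 Mbit/s excess
pir = cir + eir           # combined 20 Mbit/s: the rate the P bucket fills at
print(pir)                                            # 20000000
print(color(tp_tokens=5_000, tc_tokens=0, size=1_500))       # yellow
print(color(tp_tokens=5_000, tc_tokens=5_000, size=1_500))   # green
print(color(tp_tokens=1_000, tc_tokens=0, size=1_500))       # red
```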

This is a short post on different QoS terminology. Which terminology are you most used to?


QoS Design Notes for CCDE

January 17, 2015

Trying to get my CCDE studies going again. I've finished the relevant parts of the End-to-End QoS Design book and here are my notes on QoS design.

Basic QoS

Different applications require different treatment, the most important parameters are:

  • Delay: The time it takes from the sending endpoint to reach the receiving endpoint
  • Jitter: The variation in end to end delay between sequential packets
  • Packet loss: The number of packets sent compared to the number received, expressed as a percentage

Characteristics of voice traffic:

  • Smooth
  • Benign
  • Drop sensitive
  • Delay sensitive
  • UDP priority

One-way requirements for voice:

  • Latency ≤ 150 ms
  • Jitter ≤ 30 ms
  • Loss ≤ 1%
  • Bandwidth (30-128 Kbps)

Characteristics for video traffic:

  • Bursty
  • Greedy
  • Drop sensitive
  • Delay sensitive
  • UDP priority

One-way requirements for video:

  • Latency ≤ 200-400 ms
  • Jitter ≤ 30-50 ms
  • Loss ≤ 0.1-1%
  • Bandwidth (384 Kbps-20+ Mbps)

Characteristics for data traffic:

  • Smooth/bursty
  • Benign/greedy
  • Drop insensitive
  • Delay insensitive
  • TCP retransmits

Quality of Service (QoS) – Managed unfairness, measured numerically in latency, jitter and packet loss

Quality of Experience (QoE) – End user perception of network performance, subjective and can’t be measured

Tools

Classification and marking tools: Sessions, or flows, are analyzed to determine what class the packets belong to and what treatment they should receive. Packets are marked so that analysis happens only a limited number of times, usually at ingress as close to the source as possible. Reclassification and remarking are common as the packets traverse the network.

Policing, shaping and markdown tools: Different classes of traffic are allotted portions of the network resources. Traffic may be selectively dropped, delayed or remarked to avoid congestion when it exceeds the available network resources. Traffic can be dropped (policing), slowed down (shaping) or remarked (markdown) to conform.

Congestion management or scheduling tools: When there is more traffic than available network resources, traffic gets queued. Traffic classes that don't react well to queuing can be denied access by a scheduling tool to avoid lowering the quality of existing flows.

Link-specific tools: Link fragmentation and interleaving fits into this category.

Packet Header

The IPv4 packet has an 8-bit Type of Service (ToS) field; the IPv6 packet has an 8-bit Traffic Class field. The first three bits are the IP Precedence (IPP) bits, for a total of 8 classes. The first three bits in combination with the next three are known as the DSCP, for a total of 64 classes.

At layer two the most common markings are 802.1p Class of Service (CoS) and the MPLS EXP bits, each using three bits for a total of 8 classes.
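
As a sanity check on the bit layout, a small Python sketch of how IPP, the DSCP and the full ToS/Traffic Class byte relate:

```python
def tos_from_dscp(dscp):
    """The DSCP occupies the upper 6 bits of the 8-bit ToS/Traffic Class byte."""
    return dscp << 2

def precedence_from_dscp(dscp):
    """IP Precedence is just the top 3 of those 6 bits."""
    return dscp >> 3

# EF (voice) is DSCP 46: ToS byte 184 (0xB8), IP Precedence 5
print(tos_from_dscp(46), precedence_from_dscp(46))  # 184 5

# 3 bits give 8 classes, 6 bits give 64
print(2 ** 3, 2 ** 6)  # 8 64
```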

QoS Deployment Principles

  1. Define the business/organizational objectives of the QoS deployment. This may include provisioning real-time services for voice/video traffic, guaranteeing bandwidth for critical business applications and managing scavenger traffic. Seek executive endorsement of the business objectives so that the process doesn't get derailed later on.
  2. Based on the business objectives, determine how many classes of traffic are needed. Define an end-to-end strategy for how to identify the traffic and treat it across the network.
  3. Analyze the requirements of each application class so that the proper QoS tools can be deployed to meet these requirements.
  4. Design platform-specific QoS policies to meet the requirements with consideration for appropriate Place In the Network (PIN).
  5. Test the QoS designs in a controlled environment.
  6. Begin deployment with a closely monitored and evaluated pilot rollout.
  7. The tested and pilot proven QoS designs can be deployed to the production network in phases during scheduled downtime.
  8. Monitor service levels to make sure that the QoS objectives are being met.

The common mistake is to make it a technical process only and not research the business objectives and requirements.

QoS Feature Sequencing

Classification: The identification of each traffic stream.

Pre-queuing: Admission decisions, and dropping and marking the packet, are best applied before the packet enters a queue for egress scheduling and transmission.

Queueing: Scheduling the order of packets before transmission.

Post-queueing: Usually optional; sometimes needed to apply actions that depend on the transmission order of packets, such as sequence numbering (e.g. compression and encryption), which isn't known until the QoS scheduling function dequeues the packets based on the priority rules.

Security and QoS

Trust Boundaries

A trust boundary is a network location where packet markings are not accepted and may be rewritten. Trust domains are network locations where packet markings are accepted and acted on.

Network Attacks

QoS tools can mitigate the effects of worms and DoS attacks to keep critical applications available during an attack.

Recommendations and Guidelines

  • Classify and mark traffic as close to the source as technically and administratively feasible
  • Classification and marking can be done on ingress or egress but queuing and shaping are usually done on egress
  • Use an end-to-end Diffserv PHB model for packet marking
  • Less granular fields such as CoS and MPLS EXP should be mapped to DSCP as close to the traffic source as possible
  • Set a trust boundary and mark or remark traffic that comes in beyond the boundary
  • Follow standards based Diffserv PHB markings if possible to ensure interoperability with SP networks, enterprise networks or when merging networks together
  • Set dscp and set precedence should be used to mark all IP traffic; set ip dscp and set ip precedence only mark IPv4 packets
  • When using tunnel interfaces, think of feature sequencing to make sure that the inner or outer packet headers (or both) are marked as intended

Policing and Shaping Tools

Policer: Checks for traffic violations against a configured rate. Does not delay packets, takes immediate action to drop or remark packet if exceeding rate.

Shaper: Traffic smoothing tool with the objective to buffer packets instead of dropping them, smoothing out any peaks of traffic arrival to not exceed configured rate.

Characteristics of a policer:

  • Causes TCP resends when traffic is dropped
  • Inflexible and inadaptable; makes instantaneous packet drop decisions
  • An ingress or egress interface tool
  • Does not add any delay or jitter to packets
  • Rate limiting without buffering

Characteristics of a shaper:

  • Typically delays rather than drops exceeding traffic, causes fewer TCP resends
  • Adapts to congestion by buffering exceeding traffic
  • Typically an egress interface tool
  • Adds delay and jitter when the traffic rate exceeds the shaped rate
  • Rate limiting with buffering

Placing Policers and Shapers in the Network

Policers make instantaneous decisions and should be deployed ingress, don’t transport packets if they are going to be dropped anyway. Policers can also be placed on egress to limit a traffic class at the edge of the network.

Shapers are often deployed as egress tools, commonly on enterprise to SP links, to not exceed the committed rate of the SP.

Tail Drop and Random Drop

Tail drop means dropping the packet that is at the tail of a queue. The TX ring is always FIFO; if a voice packet is trying to get into the TX ring but the ring is full, the packet will get dropped because it's at the tail of the queue. Random drop via Random Early Detection (RED) or Weighted Random Early Detection (WRED) tries to keep the queues from becoming full by dropping packets from traffic classes to make TCP slow down.

Recommendations and Guidelines

  • Police as close to the source as possible, preferably on ingress.
  • A single rate three color policer handles bursts better than a single rate two color policer, resulting in fewer TCP retransmissions
  • Use a shaper on interfaces where speed mismatches, such as buying a lower rate than physical speed or between a remote-end access link and the aggregated head-end link
  • When shaping on an interface carrying real-time traffic, set the Tc value to 10 ms

Scheduling Algorithms

Strict priority: Lower priority queues are only served when higher priority queues are empty. Can potentially starve traffic in lower priority queues.

Round robin: Queues are served in a set sequence, does not starve traffic but can add unpredictable delays in real-time, delay sensitive traffic.

Weighted fair: Packets in the queue are weighted, usually by IP precedence, so that some queues get served more often than others. Does not provide a bandwidth guarantee; the bandwidth per flow varies based on the number of flows and the weight of each flow. A simplified weighted scheduler is sketched below.
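
A weighted round robin is easy to sketch; here each queue is visited in a fixed sequence and allowed a number of packets proportional to its weight (a deliberate simplification of the real schedulers above, with hypothetical queue contents):

```python
from collections import deque

def wrr(queues, weights):
    """Serve each queue in a fixed sequence, up to `weight` packets per turn."""
    out = []
    while any(queues):
        for q, w in zip(queues, weights):
            for _ in range(w):
                if q:
                    out.append(q.popleft())
    return out

q_voice = deque(["v1", "v2", "v3"])
q_data = deque(["d1", "d2", "d3"])
# Voice weighted 2:1 over data: served twice as often, but data never starves
print(wrr([q_voice, q_data], [2, 1]))  # ['v1', 'v2', 'd1', 'v3', 'd2', 'd3']
```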

WRED is a congestion avoidance tool that manages the tail of the queue. The goal is to avoid TCP synchronization, where all TCP flows speed up and slow down at the same time, which leads to poor utilization of the link. WRED has little or no effect on UDP flows. WRED can also be used to set the RFC 3168 IP ECN bits to indicate that the node is experiencing congestion.

Recommendations and Guidelines

  • Critical applications like VoIP require service guarantees regardless of network conditions. This requires enabling queueing on all nodes with a potential for congestion.
  • A large number of applications end up in the default class; reserve 25% for this default Best Effort class
  • For a link carrying a mix of voice, video and data traffic, limit the priority queue to 33% of the link bandwidth
  • Enable LLQ if real-time, latency sensitive traffic is present
  • Use WRED for congestion avoidance on TCP flows, but evaluate whether it has any effect on UDP flows
  • Use DSCP-based WRED wherever possible

Bandwidth Reservation Tools

Measurement based: Counting mechanism to only allow a limited number of calls (sessions). Normally statically configured by an administrator.

Resource based: Based on the availability of resources in the network, usually bandwidth. Uses the current status of the network to base its decision.

Resource Reservation Protocol (RSVP) is a resource based protocol, commonly used with MPLS-TE. The drawback of RSVP is that it requires a lot of state in the devices.

Admission control (AC) functionality is most effectively deployed at the application level, such as with Cisco Unified Communications Manager (CUCM). It works well in networks with limited complexity and where flows are of predictable bandwidth.

RSVP can be used in combination with Diffserv in an Intserv/Diffserv model where RSVP is only responsible for admission control and Diffserv for the queuing.

An RSVP proxy can be used because end devices such as phones and video endpoints usually don't support the RSVP stack. A router close to the endpoint is then used as a proxy, together with CUCM, to act as an AC mechanism.

Recommendations and Guidelines

Cisco recommends using RSVP Intserv/Diffserv model with a router-based proxy device. This allows for scaling of policies together with a dynamic network aware AC.

IPv6 and QoS

IPv6 headers are larger, so bandwidth consumption for small packet sizes is higher: the IPv4 header is normally 20 bytes while the IPv6 header is 40 bytes. IPv6 has a 20-bit Flow Label field and an 8-bit Traffic Class field.
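
To see why the larger header matters mostly for small packets, here is a quick comparison for a voice-sized payload (20 bytes, roughly a G.729 sample; the RTP/UDP header sizes are standard, the example itself is mine):

```python
def header_overhead_pct(payload, l3_header):
    """Share of the packet taken by RTP (12) + UDP (8) + L3 headers."""
    headers = 12 + 8 + l3_header
    return headers / (payload + headers) * 100

print(round(header_overhead_pct(20, 20), 1))    # IPv4: 66.7% of the packet is headers
print(round(header_overhead_pct(20, 40), 1))    # IPv6: 75.0%
print(round(header_overhead_pct(1400, 40), 1))  # large packets: only ~4.1%
```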

Medianet

Modern applications can be difficult to classify and can consist of multiple types of traffic. Webex provides text, audio, instant messaging, application sharing and desktop video conferencing through the same application. NBAR2 can be used to identify such applications.

Application Visibility Control (AVC)

Consists of NBAR2, Flexible Netflow (FNF) and MQC. NBAR2 is used to identify traffic through Deep Packet Inspection (DPI), FNF reports on usage and MQC is used for the configuration.

FNF uses Netflow v9 and IPFIX to export flow record information. It can monitor L2 to L7 and identify applications by port and through NBAR2. When using NBAR2, CPU and memory usage may increase significantly; this is also true for FNF. Consider the performance impact before deploying them.

QoS Requirements and Recommendations by Application Class

Voice requirements:

  • One-way latency should be no more than 150 ms
  • One-way peak-to-peak jitter should be no more than 30 ms
  • Per-hop peak-to-peak jitter should be no more than 10 ms
  • Packet loss should be no more than 1%
  • A range of 20 – 320 Kbps of guaranteed priority bandwidth per call (depends on sampling rate, codec and L2 overhead)

Voice recommendations:

  • Mark to Expedited Forwarding (EF) / DSCP 46
  • Treat with EF PHB (priority queuing)
  • Voice should be admission controlled

Jitter buffers may be used to reduce the effects of jitter; however, they add delay. Voice packets are constant in size, which means bandwidth can be provisioned accurately. Don't forget to account for L2 overhead.
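
A sketch of that provisioning arithmetic (the codec payload sizes are the standard G.711/G.729 values at 20 ms packetization; the 18-byte L2 figure is an Ethernet example and will differ per media):

```python
def voice_bw_kbps(payload_bytes, pps, l2_overhead):
    """Per-call bandwidth: (payload + RTP/UDP/IP + L2) * 8 bits * packets/second."""
    ip_udp_rtp = 20 + 8 + 12
    return (payload_bytes + ip_udp_rtp + l2_overhead) * 8 * pps / 1000

# 20 ms packetization = 50 packets per second
print(voice_bw_kbps(160, 50, 18))  # G.711: 87.2 kbps per call
print(voice_bw_kbps(20, 50, 18))   # G.729: 31.2 kbps per call
```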

Broadcast video requirements:

  • Packet loss should be no more than 0.1%

Broadcast video recommendations:

  • Mark to CS5 / DSCP 40
  • May be treated with EF PHB (priority queuing)
  • Should be admission controlled

Flows are usually unidirectional and include application level buffering. Does not have strict jitter or latency requirements.

Real-time interactive video requirements:

  • One-way latency should be no more than 200 ms
  • One-way peak-to-peak jitter should be no more than 50 ms
  • Per-hop peak-to-peak jitter should be no more than 10 ms
  • Packet loss should be no more than 0.1%
  • Provisioned bandwidth depends on codec, resolution, frame rates, additional data components and network overhead

Real-time interactive video recommendations:

  • Should be marked with CS4 / DSCP 32
  • May be treated with an EF PHB (priority queuing)
  • Should be admission controlled

Multimedia conferencing requirements:

  • One-way latency should be no more than 200 ms
  • Packet loss should be no more than 1%

Multimedia conferencing recommendations:

  • Mark to AF4 class (AF41/AF42/AF43 or DSCP 34/36/38)
  • Treat with AF PHB with guaranteed bandwidth and DSCP-based WRED
  • Should be admission controlled

Multimedia streaming requirements:

  • One-way latency should be no more than 400 ms
  • Packet loss should be no more than 1%

Multimedia streaming recommendations:

  • Should be marked to AF3 class (AF31/AF32/AF33 or DSCP 26/28/30)
  • Treat with AF PHB with guaranteed bandwidth and DSCP-based WRED
  • May be admission controlled

Data applications can be divided into Transactional Data (low latency) or Bulk Data (high throughput)

Transactional data recommendations:

  • Should be marked to AF2 class (AF21/AF22/AF23 or DSCP 18/20/22)
  • Treat with AF PHB with guaranteed bandwidth and DSCP-based WRED

This class may be subject to policing and remarking. Applications in this class can be Enterprise Resource Planning (ERP) or Customer Relationship Management (CRM).

Bulk data recommendations:

  • Should be marked to AF1 class (AF11/AF12/AF13 or DSCP 10/12/14)
  • Treat with AF PHB with guaranteed bandwidth and DSCP-based WRED
  • Deployed in a moderately provisioned queue to provide a degree of bandwidth constraint during congestion, to prevent long TCP sessions from dominating network bandwidth

Example applications are e-mail, backup operations, FTP/SFTP transfers, video and content distribution.

Best effort data recommendations:

  • Mark to DF (DSCP 0)
  • Provision in dedicated queue
  • May be provisioned with guaranteed bandwidth allocation and WRED/RED

Scavenger traffic recommendations:

  • Should be marked to CS1 (DSCP 8)
  • Should be assigned a minimally provisioned queue

Example traffic is YouTube, Xbox Live/360 movies, iTunes and BitTorrent.

Control plane traffic can be divided into Network Control, Signaling and Operations/Administration/Management (OAM).

Network Control recommendations:

  • Should be marked to CS6 (DSCP 48)
  • May be assigned a moderately provisioned guaranteed bandwidth queue

Do not enable WRED. Example traffic is EIGRP, OSPF, BGP, HSRP and IKE.

Signaling traffic recommendations:

  • Should be marked to CS3 (DSCP 24)
  • May be assigned a moderately provisioned guaranteed bandwidth queue

Do not enable WRED. Example traffic is SCCP, SIP and H.323.

OAM traffic recommendations:

  • Should be marked to CS2 (DSCP 16)
  • May be assigned a moderately provisioned guaranteed bandwidth queue

Do not enable WRED. Example traffic is SSH, SNMP, Syslog and HTTP/HTTPS.

QoS Design Recommendations:

  • Always enable QoS in hardware as opposed to software if possible
  • Classify and mark as close to the source as possible
  • Use DSCP markings where available
  • Follow standards based DSCP PHB markings
  • Police flows as close to source as possible
  • Mark down traffic according to standards based rules if possible
  • Enable queuing at every node that has potential for congestion
  • Limit LLQ to 33% of link capacity
  • Use AC mechanism for LLQ
  • Do not enable WRED for LLQ
  • Provision at least 25% for Best Effort traffic

QoS Models:

Four-Class Model:

  • Voice
  • Control
  • Transactional Data
  • Best Effort

Eight-Class Model:

  • Voice
  • Multimedia-conferencing
  • Multimedia-streaming
  • Network Control
  • Signaling
  • Transactional Data
  • Best Effort
  • Scavenger

Twelve-Class Model:

  • Voice
  • Broadcast Video
  • Real-time interactive
  • Multimedia-conferencing
  • Multimedia-streaming
  • Network Control
  • Signaling
  • Management/OAM
  • Transactional Data
  • Bulk Data
  • Best Effort
  • Scavenger

This picture shows how the smaller models can be expanded into the larger ones, and vice versa.

[Figure: QoS models]

Campus QoS Design Considerations and Recommendations:

The primary role of QoS in campus networks is not to control latency or jitter, but to manage packet loss. Endpoints normally connect to the campus at high speeds; it may take only a few milliseconds of congestion to overrun the buffers of switches/linecards/routers.

Trust Boundaries:

Conditionally trusted endpoints: Cisco IP phones, Cisco Telepresence, Cisco IP video surveillance cameras, Cisco digital media players.

Trusted endpoints: Centrally administered PCs and endpoints, IP video conferencing units, managed APs, gateways and other similar devices.

Untrusted endpoints: Unsecure PCs, printers and similar devices.

Port-Based QoS versus VLAN-based QoS versus Per-Port/Per-VLAN QoS

Design recommendations:

  • Use port-based QoS when simplicity and modularity are the key design drivers
  • Use VLAN-based QoS when looking to scale policies for classification, trust and marking
  • Do not use VLAN-based QoS to scale (aggregate) policing policies
  • Use per-port/per-VLAN when supported and policy granularity is the key design driver

EtherChannel QoS

  • Load balance based on source and destination IP or what is expected to give the best distribution of traffic
  • Be aware that multiple real-time flows may end up on the same physical link, oversubscribing the real-time queue

EtherChannel QoS varies by platform; some policies are applied to the bundle and some to the physical interfaces.

Ingress QoS Models:

Design recommendations:

  • Deploy ingress QoS models such as trust, classification and policing on all access edge ports
  • Deploy ingress queuing (if supported and required)

The probability for congestion on ingress is less than on egress.

Egress QoS Models:

Design recommendations:

  • Deploy egress queuing policies on all switch ports
  • Use a queuing structure with 1 priority queue and 3 normal queues (1P3Q) or better

Enable trust on ports leading to network infrastructure and similar devices.

Trusted Endpoint:

  • Trust DSCP
  • Optional ingress marking and/or policing
  • Minimum 1P3Q

Untrusted Endpoint:

  • No trust
  • Optional ingress marking and/or policing
  • Minimum 1P3Q

Conditionally Trusted Endpoint:

  • Conditional trust with trust CoS
  • Optional ingress marking and/or policing
  • Minimum 1P3Q

Switch to Switch/Router Port QoS:

  • Trust DSCP
  • Minimum 1P3Q

Control Plane Policing

Can be used to harden the network infrastructure. Packets handled by the main CPU typically include the following:

  • Routing protocols
  • Packets destined to the local IP of the router
  • Packets from management protocols such as SNMP
  • Interactive access protocols such as Telnet and SSH
  • ICMP or packets with IP options may have to be handled by CPU
  • Layer two packets such as BPDUs, CDP, DTP and so on

Wireless QoS

The 802.11e Working Group (WG) proposed QoS enhancements to the 802.11 standard in 2007. These were also revised in IEEE 802.11-2012. The Wi-Fi Alliance has a compatibility standard called Wi-Fi Multimedia (WMM).

In Wi-Fi networks only one station may transmit at a time, a physical constraint that does not exist on wired networks. The Radio Frequency (RF) spectrum is shared between devices, similar to a hub environment. Wireless networks also operate at variable speeds.

Distributed Coordination Function (DCF) is responsible for scheduling and transmitting frames onto the wireless medium.

Wireless uses Carrier Sense Multiple Access/Collision Avoidance (CSMA/CA), which actively tries to avoid collisions. A wireless client waits a random period before it may send traffic, to try to avoid collisions.

DCF evolved to Enhanced Distributed Channel Access (EDCA) which is a MAC layer protocol. It has the following additions compared to DCF:

  • Four priority queues, or access categories
  • Different interframe spacing for each AC as compared to a single fixed value for all traffic
  • Different contention window for each AC
  • Transmission Opportunity (TXOP)
  • Call admission control (TSpec)

802.11e frames use a 3-bit field known as User Priority (UP) for traffic marking. It is analogous to 802.1p CoS. One difference is that voice is marked with UP 6 as compared to CoS 5.

Interframe spacing is the time a client needs to wait before starting to send traffic; the wait time is lower for higher priority traffic.

The contention window is used when the wireless medium is not free; higher priority traffic waits a shorter period of time before trying to send again than lower priority traffic.

TXOP is a period of time during which the client is allowed to send, so that it cannot hog the medium for a long period of time.

TSpec is used for admission control; the client sends its requirements, such as data rate and frame size, to the AP, and the AP only admits it if there is available bandwidth.

Upstream QoS is packets from the wireless network onto the wired network. Downstream QoS is packets from the wired network onto the wireless network.

Wireless markings may not be consistent with wired markings, so mapping may have to be done to place traffic into the correct classes on the wired network.

Upstream QoS:

  1. The 802.11e UP marking on the upstream frame from client to AP is translated to a DSCP value on the outside of the CAPWAP tunnel. The inner DSCP marking is preserved.
  2. After the CAPWAP packet is decapsulated at the WLC, the original IP header's DSCP value is used to derive the 802.1p CoS value.

Downstream QoS:

  1. A frame with an 802.1p CoS marking arrives at a WLC wired interface. The DSCP value of the IP packet is used to set the DSCP of the outer CAPWAP header.
  2. The DSCP value of the CAPWAP header is used to set the 802.11e UP value on the wireless frame

The 802.1p CoS value is not used in the above process.

Data Center QoS

Primary goal is to manage packet loss. A few milliseconds of traffic during congestion can cause buffer overruns.

Various data center designs have different QoS needs. These are a few data center architectures:

  • High-Performance Trading (HPT)
  • Big data architectures, including High-Performance Computing (HPC), High-Throughput Computing (HTC) and grid data
  • Virtualized Multiservice Data Center (VMDC)
  • Secure Multitenant Data Center (SMDC)
  • Massively Scalable Data Center (MSDC)

High-Performance Trading:

Minimal or no QoS requirements because the goal of the architecture is to introduce as little delay as possible using low latency platforms such as the Nexus.

Big Data (HPC/HTC/Grid) Architectures

Have similar QoS needs as a campus network. The goal is to process large and complex data sets that are too difficult to handle by traditional data processing applications.

High-Performance Computing: Uses large amounts of computing power for a short period of time. Often measured in Floating-point Operations Per Second (FLOPS)

High-Throughput Computing: Also uses large amounts of computing power, but for a longer period of time. More focused on operations per month or year.

Grid: A federation of computer resources from multiple locations to reach a common goal. A distributed system with noninteractive workloads that involve a large number of files. Compared to HPC, Grid is usually more heterogeneous, loosely coupled and geographically dispersed.

Virtualized Multiservice Data Center (VMDC):

VMDC comes with unique requirements due to compute and storage virtualization, including provisioning a lossless Ethernet service.

  • Applications no longer map to physical servers (or cluster of servers)
  • Storage is no longer tied to a physical disk (or array)
  • Network infrastructure is no longer tied to hardware

Lossless compute and storage virtualization protocols such as RoCE and FCoE need to be supported as well as Live Migration/vMotion.

Secure Multitenant Data Center (SMDC):

Virtualization is leveraged to support multitenants over a common infrastructure and this affects the QoS design. SMDC has similar needs as VMDC but a different marking model.

Massively Scalable Data Center:

A framework used to build elastic data centers that host a few applications distributed across thousands of servers. Geographically distributed, homogeneous pools of compute and storage. The goal is to maximize throughput. It is common to use a leaf and spine design.

Data Center Bridging Toolset

IEEE 802.1 Data Center Bridging Task Group has defined enhancements to Ethernet to support requirements of converged data center networks.

  • Priority flow control (IEEE 802.1Qbb)
  • Enhanced transmission selection (IEEE 802.1Qaz)
  • Congestion notification (IEEE 802.1Qau)
  • DCB exchange (DCBX) (IEEE 802.1Qaz combined with 802.1AB)

Priority Flow Control (802.1Qbb): PFC provides a link-level flow control mechanism that can be controlled independently for each 802.1p CoS priority. The goal is to provide zero frame loss due to congestion in DCB networks and to mitigate Head of Line (HoL) blocking. Uses PAUSE frames.

Skid Buffers

Buffer management is critical to PFC; if transmit or receive buffers overflow, transmission will not be lossless. A switch needs sufficient buffers to:

  • Store frames sent during the time it takes to send the PAUSE frame across the network between stations
  • Store frames that are already in transit when the sender receives the PFC PAUSE frame

The buffers used for this are called skid buffers and usually engineered on a per port basis in hardware on ingress.

An incast flow is a flow from many senders to one receiver.

Virtual Output Queuing (VOQ)

Artificially induce congestion on ingress ports where there is an incast flow going to a host. This lessens the need for deep buffers on egress. VOQ absorbs congestion at every ingress port and optimizes switch buffering capacity for incast flows. Traffic does not consume fabric bandwidth only to be dropped at the egress port.

Enhanced Transmission Selection – IEEE 802.1Qaz

Uses a virtual lane concept on a DCB-enabled NIC, also called a Converged Network Adapter (CNA). Each virtual interface queue is accountable for managing its allotted bandwidth for its traffic group. If a group is not using all of its bandwidth, it may be used by other groups.

ETS virtual interface queues can be serviced as follows:

  • Priority – a virtual lane can be assigned a strict priority service
  • Guaranteed bandwidth – a percentage of the physical link capacity
  • Best effort – the default virtual lane service

Congestion Notification IEEE 802.1Qau

A layer two traffic management system that pushes congestion to the edge of the network by instructing rate limiters to shape the traffic that is causing the congestion. The congestion point, such as a distribution switch connecting to several access switches, can instruct these switches, called reaction points, to throttle the traffic by sending control frames.

Data Center Bridging Exchange (DCBX) IEEE 802.1Qaz + 802.1AB

DCB capabilities:

  • DCB peer discovery
  • Mismatched configuration detection
  • DCB link configuration of peers

The following DCB parameters can be exchanged by DCBX:

  • PFC
  • ETS
  • Congestion notification
  • Applications
  • Logical link-down
  • Network interface virtualization

DCBX can be used between switches and with some endpoints.

Data Center Transmission Control Protocol (DCTCP)

A goal of the data center is to maximize goodput, which is the application-level throughput excluding protocol overhead. Goodput is reduced by TCP flow control and congestion avoidance, specifically TCP slow start.

DCTCP is based on two key concepts:

  • React in proportion to the extent of congestion, not its presence – this reduces variance in sending rates
  • Mark ECN based on instantaneous queue length – this enables fast feedback and corresponding window adjustments to better deal with bursts
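
The first concept is usually expressed as a running estimate α of the fraction of ECN-marked packets, with the congestion window cut in proportion to α rather than always halved. A sketch of that update logic (g = 1/16 is the gain suggested in the DCTCP paper; the window trace is a made-up example):

```python
g = 1.0 / 16                      # gain for the moving average

def update_alpha(alpha, marked, total):
    """Per-window update of the estimated fraction of ECN-marked packets."""
    f = marked / total            # fraction of packets marked this window
    return (1 - g) * alpha + g * f

def react(cwnd, alpha):
    """Cut the window in proportion to congestion, not by a fixed half."""
    return cwnd * (1 - alpha / 2)

alpha = 0.0
for marked in (0, 2, 10):         # marks seen in three successive windows of 10
    alpha = update_alpha(alpha, marked, 10)
print(round(alpha, 3))            # ~0.074: a modest estimate of congestion
print(react(100, alpha))          # ~96.3: a modest window reduction
```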

Considerations affecting the marking model to be used in the data center include the following:

  • Data center applications and protocols
  • CoS/DSCP marking
  • CoS 3 overlapping considerations
  • Application-based marking models
  • Application- and tenant-based marking models

Data Center Applications and Protocols

Recommendations:

  • Consider what applications/protocols are present in the data center and may not already be reflected in the enterprise QoS model and how these may be integrated
  • Consider what applications/protocols may not be present or have a significantly reduced presence in the DC

Compute Virtualization Protocols:

Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE):
Supports direct memory access from one computer into another over converged Ethernet without involving either one's operating system. Permits high-throughput, low-latency networking, which is especially useful in massively parallel computer clusters. It's a link layer protocol that allows communication between any two hosts in the same broadcast domain. RoCE requires lossless service via PFC. When implemented along with FCoE, it should be assigned its own no-drop class/virtual lane, such as CoS 4. Other applications such as video using CoS 4 need to be reassigned to improve RoCE performance.

Internet Wide Area RDMA Protocol (iWARP):
Extends the reach of RDMA over IP networks. Does not require lossless service because it runs over TCP or SCTP, which provide reliable transport. It can be marked to an unused CoS/DSCP or combined with internetwork control (CS6/CoS 6) or network control (CS7/CoS 7).

Virtual machine control and live migration protocols (VM control):
Virtual Machines (VMs) require control traffic to be passed between hypervisors. VM control is control plane traffic and should be marked to CoS 6 or CoS 7, depending on QoS model in use.

Live migration:
Protocols that support the process of moving a running VM (or application) between different physical machines without disconnecting the client or the application. Memory, storage and network connections are moved from the original host machine to the destination, a common example being vMotion. It can be argued to be a candidate for internetwork control (CoS 6), being a control plane protocol, but it sends too much traffic to be put in that class. Use an available marking or combine with CoS 4, CoS 2 or even CoS 1.

Storage Virtualization Protocols:

Fibre Channel over Ethernet (FCoE):
Encapsulates Fibre Channel (FC) frames over Ethernet networks. It is a layer two protocol that can't be natively routed. It requires lossless service via PFC and is usually marked with CoS 3, which should be dedicated to FCoE.

Internet Protocol Small Computer System Interface (iSCSI):
Encapsulates SCSI commands within IP to enable data transfers. Can be used to transmit data over LANs, WANs or even the Internet, and can enable location-independent data storage and retrieval. Does not require lossless service due to using TCP. Can be provisioned in a dedicated class or in another class such as CoS 2 or CoS 1.

CoS/DSCP Marking:

Recommendations:

  • Some layer two protocols within the DC require CoS marking
  • CoS marking has limitations so consider a hybrid CoS/DSCP model (when supported)

CoS 3 Overlap Considerations and Tactical Options:

Recommendations:

  • Recognize the potential overlap of signaling (and multimedia streaming) markings with FCoE
  • Select a tactical option to address this overlap

Signaling is normally marked with CoS 3, but so is FCoE. Some administrators prefer to dedicate CoS 3 to FCoE, but that leaves the question of what to do with signaling. Options to handle the overlap:

Hardware Isolation:
Some platforms and interface modules do not support FCoE, such as the Nexus 7000 M-Series modules, while the F-Series modules do. The M-Series modules can connect to CUCM and multimedia streaming servers, and the F-Series modules to the DCB extended fabric supporting FCoE.

Layer 2 Versus Layer 3 Classification:
Signaling and multimedia streaming can be classified by DSCP values (CS3 and AF3) to be assigned to queues and FCoE can be classified by CoS 3 to its own dedicated queue.

Asymmetrical CoS/DSCP Marking:
Asymmetrical meaning that the three bits forming the CoS do not match the first three bits of the DSCP value. Signaling could be marked with CoS 4 but DSCP CS3.

DC/Campus DSCP Mutation:
Perform ingress and egress DSCP mutation on data center to campus links. Signaling and multimedia streams can be assigned DSCP values that map to CoS 4 (rather than CoS 3).

Coexistence:
Allow signaling and FCoE to coexist in CoS 3. The reasoning is that if the CUCM server has a CNA, then both signaling and FCoE will be provided a lossless service.

Data Center QoS Models:

Trusted Server Model:
Trust the L2/L3 markings set by application servers. Only approved servers should be deployed in the DC.

Untrusted Server Model:
Do not trust markings; reset markings to 0.

Single-Application Server Model:
Same as the untrusted server model, but remarked to a nonzero value.

Multi-Application Server Model:
Access lists are used for classification and traffic is marked to multiple codepoints. The application server either does not mark traffic at all or marks it to different values than the enterprise QoS model.

Server Policing Model:
One or more application classes are metered via one-rate or two-rate policers, with conforming, exceeding and optionally violating traffic marked to different DSCP values.

Lossless Transport Model:
Provision lossless service to FCoE.

Trusted Server/Network Interconnect:

  • Trust CoS/DSCP
  • Ingress queuing
  • Egress queuing

Untrusted Server:

  • Set CoS/DSCP to 0
  • Ingress queuing
  • Egress queuing

Single-App Server:

  • Set CoS/DSCP to a nonzero value
  • Ingress queuing
  • Egress queuing

Multi-App Server:

  • Classify by ACL
  • Set CoS/DSCP values
  • Ingress queuing
  • Egress queuing

Policed Server:

  • Police flows
  • Remark/drop
  • Ingress queuing
  • Egress queuing

Lossless Transport:

  • Enable PFC
  • Enable ETS
  • Enable DCBX
  • Ingress queuing
  • Egress queuing

WAN & Branch QoS Design Considerations & Recommendations:

  • To manage packet loss (and jitter) by queuing policies
  • To enhance classification granularity by leveraging deep packet inspection engines

Packet jitter is most apparent at the WAN/branch edge because of the downshift in link speeds.

Latency and Jitter:
Recommendations:

  • Choose service provider paths to target 150 ms of one-way latency. If this target can't be met, 200 ms is generally acceptable
  • Only queuing delay is manageable by QoS policies

Network latency consists of:

  • Serialization delay (fixed)
  • Propagation delay (fixed)
  • Queuing delay (variable)

Serialization delay is the time it takes to clock a layer two frame onto the transmission media as electrical or optical pulses. The delay is fixed and a function of the line rate.

Propagation delay is also fixed and a function of the physical distance between endpoints. The gating factor is the speed of light, 300,000 km/s in a vacuum; the speed in fiber circuits is roughly a third lower. Propagation delay is then approximately 6.3 microseconds per km, and it is what makes up most of the network delay.

Queuing delay is variable; it is a function of whether a node is congested and of whether scheduling policies have been applied to resolve congestion events.
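
Both fixed components are simple to compute; a sketch using the figures above (the link speeds and the 8,600 km fiber distance are example values of my own):

```python
def serialization_ms(frame_bytes, line_rate_bps):
    """Time to clock a frame onto the wire: frame size divided by line rate."""
    return frame_bytes * 8 / line_rate_bps * 1000

def propagation_ms(distance_km, us_per_km=6.3):
    """Propagation delay using the ~6.3 microseconds/km rule of thumb for fiber."""
    return distance_km * us_per_km / 1000

# A 1500-byte frame on a 2 Mbit/s link vs. a 1 Gbit/s link
print(serialization_ms(1500, 2_000_000))      # 6.0 ms
print(serialization_ms(1500, 1_000_000_000))  # 0.012 ms

# A hypothetical 8,600 km fiber path
print(propagation_ms(8600))                   # ~54 ms one way
```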

Tx-Ring:
Recommendation:

  • Be aware of the Tx-Ring function and depth; tune only if necessary

The Tx-Ring is the final IOS output buffer for an interface. It's a relatively small FIFO queue that maximizes physical link bandwidth utilization by matching the outbound packet rate on the router with the physical interface rate. If the Tx-Ring is too large, packets will be subject to latency and jitter while waiting to be served. If the Tx-Ring is too small, the CPU will be continually interrupted, causing higher CPU usage.

LLQ:
Recommendations:

  • Use a dual-LLQ design when deploying voice and real-time video applications
  • Limit sum of all LLQs to 33% of bandwidth
  • Tune the burst parameter if needed

Some applications, like Telepresence, may be bursty by nature; the burst value may have to be adjusted to account for this.

WRED:
Recommendations:

  • Optionally tune WRED thresholds as required
  • Optionally enable ECN

To match the behavior of the AF PHB defined in RFC 2597, use these values (computed in the sketch after this list):

  • Set minimum WRED threshold for AFx3 to 60% of queue depth
  • Set minimum WRED threshold for AFx2 to 70% of queue depth
  • Set minimum WRED threshold for AFx1 to 80% of queue depth
  • Set all WRED maximum thresholds to 100%
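
Given a queue depth, those recommendations translate directly into threshold values; a sketch with a hypothetical 64-packet queue:

```python
def wred_thresholds(queue_depth):
    """Minimum thresholds per AF drop precedence, per the percentages above."""
    return {
        "AFx3": int(queue_depth * 0.60),  # highest drop precedence: dropped first
        "AFx2": int(queue_depth * 0.70),
        "AFx1": int(queue_depth * 0.80),  # lowest drop precedence: dropped last
        "max":  queue_depth,              # all maximum thresholds at 100%
    }

print(wred_thresholds(64))  # {'AFx3': 38, 'AFx2': 44, 'AFx1': 51, 'max': 64}
```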

RSVP
Recommendations:

  • Enable RSVP for dynamic network-aware admission control requirements
  • Use the Intserv/Diffserv RSVP model to increase efficiency and scalability
  • Use application-identification RSVP policies for greater policy granularity

Ingress QoS Models
Recommendations:

  • DSCP is trusted by default in IOS
  • Enable ingress classification with NBAR2 on LAN edges, as required
  • Enable ingress/internal queuing, if required

Egress QoS Models
Recommendations:

  • Deploy egress queuing policies on all WAN edge interfaces
  • Egress queuing policies may not be required on LAN edge interfaces

Recommendation for queues:
LLQ:

  • Limit the sum of all LLQs to 33%
  • Use an admission control mechanism
  • Do not enable WRED

Multimedia/Data:

  • Provision guaranteed bandwidth according to application requirements
  • Enable fair-queuing presorters
  • Enable DSCP-based WRED

Control:

  • Provision guaranteed bandwidth according to control traffic requirements
  • Do not enable presorters
  • Do not enable WRED

Scavenger:

  • Provision with a minimum bandwidth allocation such as 1%
  • Do not enable presorters
  • Do not enable WRED

Default/Best effort:

  • Allocate at least 25% for the default/Best effort queue
  • Enable fair-queuing pre-sorters
  • Enable WRED

WAN and Branch Interface QoS Roles:

WAN aggregator LAN edge:

  • Ingress DSCP trust should be enabled
  • Ingress NBAR2 classification and marking policies may be applied
  • Ingress Medianet metadata classification and marking policies may be applied
  • Egress LLQ/CBWFQ/WRED policies may be applied (if required)

WAN aggregator WAN edge:

  • Ingress DSCP trust should be enabled
  • Egress LLQ/CBWFQ/WRED policies should be applied
  • RSVP policies may be applied
  • Additional VPN specific policies may be applied

Branch WAN edge:

  • Ingress DSCP trust should be enabled
  • Egress LLQ/CBWFQ/WRED policies should be applied
  • RSVP policies may be applied
  • Additional VPN specific policies may be applied

Branch LAN edge:

  • Ingress DSCP trust should be enabled
  • Ingress NBAR2 classification and marking policies may be applied
  • Ingress Medianet metadata classification and marking policies may be applied
  • Egress LLQ/CBWFQ/WRED policies may be applied (if required)

MPLS VPN QoS Design Considerations & Recommendations
The role of QoS over MPLS VPNs may include the following:

  • Shaping traffic to contracted service rates
  • Performing hierarchical queuing and dropping within these shaped rates
  • Mapping enterprise classes to the service provider classes
  • Policing traffic according to contracted rates
  • Restoring packet markings

MEF Ethernet Connectivity Services

E-line:
A service connecting two customer Ethernet ports over a WAN. It is based on a point-to-point Ethernet Virtual Connection (EVC).

Ethernet Private Line (EPL):
A basic point-to-point service characterized by low frame delay, frame delay variation and frame loss ratio. Service multiplexing is not allowed. No CoS bandwidth profiling is allowed, only a Committed Information Rate (CIR).

Ethernet Virtual Private Line (EVPL):
Multiplexing of EVCs is allowed. The individual EVCs can be defined with different bandwidth profiles and layer two control processing methods.

E-LAN:
A multipoint service connecting customer endpoints and acting as a bridged Ethernet network. It is based on multipoint EVC and service multiplexing is allowed. It can be configured with a CIR, Committed Burst Size (CBS) and Excess Information Rate (EIR).

E-Tree:
A point-to-multipoint version of the E-LAN; essentially it's a hub-and-spoke topology where the spokes can only communicate with the hub, not with each other. Common for franchise operations.

Sub-Line-Rate Ethernet Design Implications
Recommendations:

  • Sub line rate may require hierarchical shaping with nested queuing policies
  • Configure the CE shaper’s Committed Burst (Bc) value to be no more than half of the SP’s policer Bc

If the Bc of the shaper is set too high, packets may be dropped by the policer even though the shaper is shaping to the CIR of the service.

When using a sub-line rate there will be no congestion on the interface; congestion is artificially induced by using a shaper and then a nested policy for the queuing. This may be referred to as Hierarchical QoS (HQoS).
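
The shaper's Bc follows from the Tc you choose (Bc = CIR × Tc), and the recommendation above caps it at half the SP policer's Bc; a sketch with example values (the SP policer Bc here is hypothetical):

```python
def shaper_bc_bits(cir_bps, tc_seconds=0.010):
    """Committed burst for a shaper: Bc = CIR * Tc (10 ms Tc for real-time traffic)."""
    return int(cir_bps * tc_seconds)

cir = 50_000_000                 # 50 Mbit/s sub-line-rate service
bc = shaper_bc_bits(cir)
print(bc)                        # 500000 bits per 10 ms interval

sp_policer_bc = 1_500_000        # hypothetical SP policer Bc, in bits
print(bc <= sp_policer_bc / 2)   # True: within the "half the SP's Bc" guideline
```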

QoS Paradigm Shift
Recommendation:

  • Enterprises and service providers must cooperate to jointly administer QoS over MPLS VPNs

MPLS VPNs offer a full mesh of connectivity between campus and branch networks. This fully meshed connectivity has implications for the QoS design. Previously WANs were usually point-to-point or hub and spoke which made the QoS design simpler. Branch to branch traffic would pass through the hub which controlled the QoS.

When using MPLS VPNs, traffic from branch to branch will not pass the hub, meaning that QoS needs to be deployed on all the branches as well. However, this is not enough; contending traffic may not be coming from the same site, it could be coming from any site. To overcome this, the service provider needs to deploy QoS policies on the PE routers that are compatible with the enterprise policies. This is a paradigm shift in QoS administration and requires the enterprise and the SP to jointly administer the QoS policies.

Service Provider Class of Service Models
Recommendations:

  • Fully understand the CoS models of the SP
  • Select the model that most closely matches your strategic end-to-end model

MPLS DiffServ Tunneling Modes
Recommendations:

  • Understand the different MPLS Diffserv tunneling modes and how they affect customer DSCP markings
  • Short pipe mode offers enterprise customers the most transparency and control of their traffic classes

Uniform Mode
Recommendation:

  • If the provider uses uniform mode, be aware that your packets' DSCP values may be remarked

Uniform mode is generally used when the customer and SP share the same Diffserv domain, which would be the case for an enterprise deploying MPLS.

Uniform mode is the default mode. The first three bits of the IP ToS field are mapped to the MPLS EXP bits on the ingress PE when it adds the label. If a policer or other mechanism remarks the MPLS EXP value, this value is copied to lower-level labels, and at the egress PE the MPLS EXP value is used to set the IPP value.

Short Pipe Mode

It is used when the customer and SP are in different Diffserv domains. This mode is useful when the SP wants to enforce its own Diffserv policy but the customer wants its Diffserv information to be preserved across the MPLS VPN.

The ingress PE sets the MPLS EXP value based on the SP's policies. Any remarking will only propagate to the MPLS EXP bits of the labels, not to the IPP bits of the customer's IP packet. On egress, the queuing is based on the IPP marking of the customer's packet, giving the customer maximum control.

Pipe Mode

Pipe mode is the same as short pipe mode, except that the queuing at the egress PE is based on the MPLS EXP bits and not on the customer's IPP marking.

Enterprise-to-Service Provider Mapping
Recommendation:

  • Map the enterprise application classes to the SP CoS classes as efficiently as possible

Enterprise to service provider mapping considerations include the following:

  • Mapping real-time voice and video traffic
  • Mapping signaling and control traffic
  • Separating TCP-based applications from UDP-based applications (where possible)
  • Remarking and restoring packet markings (where required)

Mapping Real-Time Voice and Video
Recommendation:

  • Balance the service level requirements for real-time voice and video with the SP premium for real-time bandwidth
  • In either scenario, use a dual LLQ policy at CE egress edge

SPs often offer only a single real-time CoS; if you are deploying both real-time voice and video, you will have to choose whether or not to put the video in the real-time class. Putting both voice and video into the real-time class may be costly or even cost prohibitive. You should still use a dual LLQ at the CE edge, since that is under your control, and that way you can protect voice from video. Downgrading video to a non-real-time class may only produce slightly lower quality, which could be acceptable.

Mapping Control and Signaling Traffic
Recommendation:

  • Avoid mixing control plane traffic with data plane traffic in a single SP CoS

Signaling should be separated from data traffic if possible, since the signaling could get dropped if the class is oversubscribed, producing voice/video instability. If the SP does not offer enough classes to put signaling in its own class, consider putting it in the real-time class, since these flows are lightweight but critical.

Separating TCP from UDP
Recommendation:

  • Separate TCP traffic from UDP traffic when mapping to SP CoS classes

It is generally best not to mix TCP-based traffic with UDP-based traffic (especially if the UDP traffic is streaming video, such as broadcast video) within a single SP CoS. These protocols behave differently under congestion. Some UDP applications may have application-level windowing, flow control and retransmission capabilities, but most UDP transmitters are oblivious to drops and don't lower their transmission rates when drops occur.

When TCP and UDP share a SP CoS and that class experiences congestion, the TCP flows continually lower their transmission rates, potentially giving up their bandwidth to UDP flows that are oblivious to drops. This is called TCP starvation/UDP dominance.

Even with WRED enabled the same behavior would be seen, because WRED (primarily) manages congestion only on TCP-based flows.

Re-Marking and Restoring Markings
Recommendation:

  • Remark application classes on CE edge on egress (as required)
  • Restore markings on the CE edge on ingress via deep packet inspection policies (as required)

If packets need to be remarked to fit with the SP CoS model, do it at the CE edge on egress. This requires less of an effort than doing it in the campus.

To restore DSCP markings, traffic can be classified on ingress on the CE edge via DPI.

MPLS VPN QoS Roles

CE LAN edge:

  • Ingress DSCP trust should be enabled (enabled by default)
  • Ingress NBAR2 classification and marking policies may be applied
  • Ingress Medianet metadata classification and marking policies may be applied
  • Egress LLQ/CBWFQ/WRED policies may be applied (if required)

CE VPN edge:

  • Ingress DSCP trust should be enabled (enabled by default)
  • Ingress NBAR2 classification and marking policies may be applied (to restore markings lost in transit)
  • Ingress Medianet metadata classification and marking policies may be applied (to restore markings lost in transit)
  • RSVP policies may be applied
  • Egress LLQ/CBWFQ/WRED policies should be applied
  • Egress hierarchical shaping with nested LLQ/CBWFQ/WRED policies may be applied
  • Egress DSCP remarking policies may be applied (used to map application classes into specific SP CoS)

PE customer-facing edge:

  • Ingress DSCP trust should be enabled (enabled by default)
  • Ingress policing policies to meter customer traffic should be applied
  • Ingress MPLS tunneling mode policies may be applied
  • Egress MPLS tunneling mode policies may be applied
  • Egress LLQ/CBWFQ/WRED policies should be applied

PE core-facing edge:

  • Ingress DSCP trust should be enabled (enabled by default)
  • Ingress policing policies to meter customer traffic should be applied
  • Egress MPLS EXP-based LLQ/CBWFQ policies should be applied
  • Egress MPLS EXP-based WRED policies may be applied

P edges:

  • Ingress DSCP trust should be enabled (enabled by default)
  • Egress MPLS EXP-based LLQ/CBWFQ policies may be applied
  • Egress MPLS EXP-based WRED policies may be applied

IPSEC QoS Design

Tunnel Mode

The default IPSEC mode of operation on Cisco IOS routers. The entire IP packet is protected by IPSEC; the sending VPN router encrypts the entire original IP packet and adds a new IP header to the packet. Tunnel mode supports multicast and routing protocols.

Transport Mode

Often used for encrypting peer-to-peer communications. Transport mode does not encase the original IP packet in a new packet; only the payload is encrypted while the original IP header is preserved, in effect serving as the outer header. Because the header is left intact, it's not possible to do multicast or routing protocols in transport mode.

IPSEC with GRE

GRE can be used to enable VPN services that connect disparate networks. It’s a key building block when using VRF Lite, a technology allowing related Virtual Routing and Forwarding (VRF) instances running on different routers to be interconnected across an IP network, while maintaining their separation from both the global routing table and other VRFs.

When using GRE as a VPN technology, it is often desirable to encrypt the GRE tunnel so that privacy and authentication of the connection can be ensured. GRE can be used with IPSEC tunnel mode or transport mode but if the tunnel transits a NAT or PAT device, tunnel mode is required.

Remote-Access VPNs

Cisco’s primary remote-access VPN client is AnyConnect Secure Mobility Client, which supports both IPSEC and Secure Sockets Layer (SSL) encryption.

AnyConnect uses Datagram Transport Layer Security (DTLS) to optimize real-time flows over the SSL-encrypted tunnel. AnyConnect connects to a remote headend concentrator (such as an ASA firewall) through TCP-based SSL. All traffic from the client, including voice, video and data, traverses the SSL TCP connection. When TCP loses packets it pauses and waits for them to be resent, which is not good for real-time UDP-based packets.

DTLS is a datagram technology, meaning it uses UDP packets instead of TCP. After AnyConnect establishes the TCP SSL tunnel, it also establishes a UDP-based DTLS tunnel which is reserved for the use of real-time applications. This allows RTP voice and video packets to be sent unhindered. In case of packet loss, the session does not pause.

The decision on which tunnel to send packets over is dynamic and made by the AnyConnect client.

QoS Classification of IPsec Packets
Recommendation:

  • Understand the default behavior of Cisco VPN routers to copy the ToS byte from the inner packet to the VPN packet header

Cisco routers by default copy the ToS field from the original IP packet and write it into the new IPSEC packet header, thus allowing classification to still be accomplished by matching DSCP values. The same holds true for GRE packets as well. The IP packet is encrypted so it's not possible to match on other fields such as IP addresses, ports, protocol and so on without using another feature.

The IOS Preclassify Feature
Recommendations:

  • Be aware of the limitations of QoS classification when using something other than the ToS byte
  • Use the IOS preclassify feature for all non ToS types of QoS classification
  • As a best practice, enable this feature for all VPN connections

Normally tunneling and encryption take place before QoS classification in the order of operations; QoS preclassify reverses this so that classification can be done on the IP header before it gets encrypted. Strictly speaking the order isn't reversed: the router clones the original IP header and keeps it in memory so that it can be used for QoS classification after tunneling and encryption.

This feature is only applicable on the encrypting router's outbound interface (physical or tunnel). Downstream routers can't make decisions on the header because the packet is already encrypted at that point. Always enable the feature, since tests have shown that it has very little impact on the router's performance.
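
For reference, a minimal sketch of what this looks like (the interface and crypto map names are made up for illustration); the qos pre-classify command is applied on the tunnel interface and/or under the crypto map:

! Classify GRE/IPSEC traffic on the cloned cleartext header
interface Tunnel0
 qos pre-classify
!
! The same feature under a crypto map for IPSEC-only traffic
crypto map VPNMAP 10 ipsec-isakmp
 qos pre-classify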

MTU Considerations
Recommendations:

  • Be aware that MTU issues can severely impact network connectivity and the quality of user experience in VPN networks

When tunneling technologies are used there is always the risk of exceeding the MTU somewhere in the path. Unless jumbo frames are available end-to-end, MTU issues will almost always need to be addressed when dealing with any kind of VPN technology. A common symptom of MTU issues is that applications using small packets, such as voice, work while e-mail, file server connections and many other applications do not.

Path MTU Discovery (PMTUD) can be used to discover what the MTU is along the path but it relies on ICMP messages which may be blocked on intermediary devices.

TCP Adjust-MSS

TCP Maximum Segment Size (MSS) is the maximum amount of payload data that a host is willing to accept in a single TCP/IP datagram. During TCP connection setup between two hosts (TCP SYN), the MSS for each side of the connection is reported to the other. It's the responsibility of the sending host to limit the size of the datagram to a value less than or equal to the receiving host's MSS.

For a 1500-byte IP packet carrying TCP, the MSS is 1460 bytes: 20 bytes of IP header and 20 bytes of TCP header subtracted from the 1500-byte packet.

Two hosts may not be aware they are communicating through a tunnel and send a TCP SYN with MSS 1460 even though the path MTU is lower. TCP Adjust-MSS can rewrite the MSS of the SYN packet so that when the receiving host gets it, the value is lowered enough to send traffic through the tunnel without fragmentation. The receiving host will then reply with this value to the sending host. The router is acting as a middleman for the TCP session.

When using IPSEC over GRE, an MSS value of 1378 bytes can be used:

  • Original IP packet = 1500 bytes
  • Subtract 20 bytes for IP header = 1480 bytes
  • Subtract 20 bytes for TCP header = 1460 bytes
  • Subtract 24 bytes for GRE header = 1436 bytes
  • Subtract a maximum of 58 bytes for IPSEC = 1378 bytes

Adjusting MSS is a CPU-intensive process. Enable it at the remote sites rather than at the headend, since the headend might be terminating a lot of tunnels. Adjusting MSS only needs to be done at one point in the path.

TCP Adjust-MSS only affects TCP packets; UDP packets are less likely to be large compared to TCP.
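
As a minimal sketch of where this is configured (the tunnel interface is illustrative; the 1378-byte MSS comes from the calculation above):

interface Tunnel0
 ! 1378 bytes of MSS + 20 bytes IP + 20 bytes TCP = 1418 bytes of tunnel IP MTU
 ip mtu 1418
 ! Rewrite the MSS option in TCP SYNs transiting the tunnel
 ip tcp adjust-mss 1378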

Compression Strategies Over VPN
Recommendations:

  • Compression can improve overall throughput, latency and user experience on VPN connections
  • Some compression technologies tunnel and may hide the fields used for QoS classification

TCP Optimization Using WAAS

Wide Area Application Services (WAAS) is a WAN accelerator; it uses compression technologies such as LZ compression, Data Redundancy Elimination (DRE) and specific Application Optimizers (AO). This significantly reduces the amount of data sent over the WAN or VPN. For a technology like WAAS to work, the compression must take place before encryption.

Compression technologies can have a significant effect on the QoE, but they work mainly for TCP traffic. Some WAN acceleration solutions may break classification if the traffic is tunneled so that the original IP header is obfuscated. WAAS only compresses the data portion of the packet and keeps the header intact, leaving the ToS byte available for classification.

Using Voice Codecs over a VPN Connection

To improve voice quality over bandwidth constrained VPN links, administrators may use compression codecs such as ILBC or G.729.

G.729 uses about a third of the bandwidth of G.711, but this also increases the effect of packet loss since more data is lost in every packet. To overcome this, when a packet is lost and the jitter buffer expires, the voice from the previous packet can be replayed to hide the gap, essentially tricking the listener. Through this technique, up to 5% packet loss can be acceptable.

Internet Low Bitrate Codec (ILBC) uses 15.2 kbit/s or 13.33 kbit/s and performs similarly to G.729, though the Mean Opinion Score (MOS) for ILBC is significantly better when there is packet loss.

Compressed Real-Time Protocol (cRTP) is not compatible with IPSEC because the packets are already encrypted when cRTP would try to compress them.

Antireplay Implications
Recommendation:

  • Antireplay drops may be introduced in an IPSEC VPN network with QoS enabled

When ESP authentication is configured in an IPSEC transform set, every Security Association (SA) keeps a 64-packet sliding window where it checks the incoming sequence numbers of the encrypted packets. This prevents someone from replaying packets and is called connectionless integrity. If packets arrive out of order due to queuing, they must still fit inside the window or they will be dropped and counted as antireplay errors. A data packet may get stuck behind voice in a queue so that it falls outside its sliding window, and the packet then gets dropped. To overcome this, use one line in the crypto ACL for every type of traffic such as voice, data and video. This creates a separate SA for each type of traffic.
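
A sketch of that idea (the subnets and ACL name are hypothetical): each permit line in the crypto ACL builds its own SA, and therefore its own antireplay window:

ip access-list extended CRYPTO-TRAFFIC
 ! Voice (RTP port range) gets its own SA and antireplay window
 permit udp 10.1.0.0 0.0.255.255 10.2.0.0 0.0.255.255 range 16384 32767
 ! Remaining traffic matches a second line, building a second SA
 permit ip 10.1.0.0 0.0.255.255 10.2.0.0 0.0.255.255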

TCP will be affected by this packet loss; it has no way of knowing that the packets are dropped due to antireplay.

Antireplay drops are around 1 to 1.5% on congested VPN links with queuing enabled. A CBWFQ policy will often hold 64 packets per queue; decreasing this leads to fewer antireplay drops, as the packets are dropped before traversing the VPN, but it may also increase the CPU usage.

DMVPN QoS Design

DMVPN offers some advantages regarding QoS compared to IPSEC, such as the following:

  • Reduction of overall hub router QoS configuration
  • Scalability to thousands of sites, with QoS for each tunnel on the hub router
  • Zero-touch QoS support on the hub router for new spokes
  • Flexibility of both hub and spoke and spoke to spoke (full mesh) deployment models

DMVPN Building Blocks

mGRE: Multipoint GRE allows a single tunnel interface to serve a large number of remote spokes. One outbound QoS policy can be applied instead of one per tunnel as with normal GRE, which is point-to-point.

Dynamic discovery of IPSEC tunnel endpoints and crypto profiles: crypto maps are created dynamically, with no need to statically build a crypto map for each tunnel endpoint.

NHRP: Allows spokes to be configured with dynamically assigned IP addresses. Also enables the zero-touch deployment that makes DMVPN spokes easy to set up. Think of the hub router as a “next-hop server” rather than a traditional VPN router. NHRP is also used for the Per-Tunnel QoS feature.

The Per-Tunnel QoS for DMVPN Feature

Allows the administrator to enable QoS on a per-tunnel or per-spoke basis. QoS policy is applied to the mGRE tunnel interface. This protects spokes from each other and keeps one spoke from using all the BW so that there is none left for the others. The QoS policy at the hub is automatically generated for each tunnel when a spoke registers with the hub.

Queuing only kicks in when there is congestion; to signal congestion to the router's QoS mechanism, a shaper is used. The traffic flows are shaped to the real VPN tunnel bandwidth to produce artificial back pressure. With Per-Tunnel QoS for DMVPN, a shaper is automatically applied by the system to each and every tunnel. This allows the router to implement differentiated services for the various data flows corresponding to each tunnel. This technique is called Hierarchical Queuing Framework (HQF).

Using NHRP, multiple spokes can be grouped together to use the same QoS policy.
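
A minimal sketch of the grouping (group and policy names are made up): the spoke advertises an NHRP group when it registers, and the hub maps that group to an egress policy that is instantiated per tunnel:

! Hub mGRE tunnel
interface Tunnel0
 ip nhrp map group SPOKES-10MB service-policy output SHAPE-10MB
!
! Spoke tunnel
interface Tunnel0
 ip nhrp group SPOKES-10MB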

This technique provides QoS in the egress direction of the hub towards the spokes. For QoS from the spokes to the hub, a QoS policy needs to be applied at the spokes.

At this time it is not possible to have a unique policy for spoke-to-spoke traffic, because spokes do not have access to the NHRP database.

GET VPN QoS Design

Group Encrypted Transport (GET) VPN is a technology to encrypt traffic between IPSEC endpoints without the use of tunnels. Packets are transmitted using IPSEC tunnel mode, but the encryption is not defined by traditional IPSEC SAs.

Because there are no tunnels, the QoS configuration is simplified.

GET VPN QoS Overview

DMVPN is suitable for hub-and-spoke VPNs over a public untrusted network such as the Internet, while GET VPN is suitable for private networks such as an MPLS VPN. An MPLS VPN is private but not encrypted, and GET VPN can encrypt the traffic between the MPLS sites. GET VPN has no real concept of hub and spoke, which simplifies the QoS architecture: there is no single major hub aggregating all the remote sites and being liable to massive oversubscription.

These are some of the major differences between the DMVPN and GET VPN models:

[Table: Choosing VPN]

Group Domain of Interpretation (GDOI)

GDOI is a technology that supports any-to-any IPSEC VPNs without the use of tunnels. There is no concept of an SA between specific routers; instead a group SA is used by all the encrypting nodes in the network. No per-tunnel QoS is needed since there are no tunnels; QoS is simply applied egress on each GET VPN router.

The GDOI control plane protocol uses UDP port 848, and ISAKMP uses UDP port 500. These packets are normally marked DSCP CS6 by the router.

IP Header Preservation

Normally with IPSEC tunnel mode the ToS byte is copied to the new IP header but the original IP header is not preserved. On a public network such as the Internet it makes good sense to hide the source and destination IP addresses but GET VPN is deployed on MPLS networks which are private.

GET VPN keeps the original IP header intact which simplifies QoS, dynamic routing and multicast. The packet is still considered an ESP IPSEC packet, not TCP or UDP, so to classify based on port numbers the QoS preclassify feature will still be needed.

How and When to Use the QoS Preclassify Feature
Design principles:

  • If classification is based on source or destination IP, preclassify is not needed but still recommended
  • If classification is based on TCP or UDP port numbers, QoS preclassify is needed
  • Enable the QoS preclassify feature in GET VPN deployments

A Case for Combining GET VPN and DMVPN

DMVPN has some drawbacks: the spoke-to-hub tunnel is always up, but spoke-to-spoke tunnels are brought up dynamically. This causes a delay of a second or two which may have a negative impact on real-time traffic. The delay is not caused by NHRP or the packetization of the GRE tunnel, but rather by the exchange of ISAKMP messages and the establishment of the IPSEC SAs between the routers.

DMVPN could then be used solely for setting up GRE tunnels and GET VPN for encryption of the packets going into the tunnel. This allows for fast tunnel establishment while still encrypting the packets, improving the overall user experience.

Working with Your Service Provider When Deploying GET VPN
Design principles:

  • Ensure that the service provider handles DSCP consistently throughout the MPLS WAN network
Categories: CCDE, QoS

Borrowing Credits When Using Shaper on Cisco IOS

April 15, 2014 4 comments

Introduction

When using a shaper on IOS, the shaper allows a deficit to be created, borrowing
future credits. It's common knowledge that a shaper queues or buffers packets,
but it's less well known that it also allows this deficit.

To demonstrate the concepts I have set up a very simple network with two routers
connected by a FastEthernet link.

Their clocks have been synchronized to show the timing of the events going on.

This post assumes prior knowledge of QoS with regard to concepts such as Bc, Be,
CIR and Tc.

Using a Policer

A policer does not allow a deficit to be created. This can be proven very easily.
To prove the concept a single rate, two color policer will be used. A two color
policer does not have a Be bucket so no tokens will be spilled over from the Bc
bucket.

The Bc bucket starts out full. When a packet arrives, the packet size is compared
to the number of tokens (bytes) in the Bc bucket. If the packet fits then the appropriate
number of tokens is taken from the Bc bucket and the packet is sent on its way.

The next time a packet arrives, the number of tokens in the bucket will depend on the
time interval between the packets. This is in contrast to a shaper that submits tokens
to the bucket at fixed intervals.

The policer is configured with a Bc value of 1000 bytes and the CIR is set to
10 kbit/s. With such a low value for Bc, any packet larger than 1000 bytes in
total will be dropped.

R1#sh policy-map
  Policy Map POLICER
    Class class-default
     police cir 10000 bc 1000
       conform-action transmit 
       exceed-action drop
R1#ping 10.0.0.2 size 1000

Type escape sequence to abort.
Sending 5, 1000-byte ICMP Echos to 10.0.0.2, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

No packets made it through: each packet is 1000 bytes of payload plus 20 bytes
of IP header and 8 bytes of ICMP header, 1028 bytes in total, which is more than
the 1000-byte Bc. The policer does not allow a deficit to be created, so all
packets had to be dropped.

If we ping with a 972-byte payload (1000 bytes in total) some packets should make it through.

R1#ping 10.0.0.2 size 972 ti 1

Type escape sequence to abort.
Sending 5, 972-byte ICMP Echos to 10.0.0.2, timeout is 1 seconds:
!.!.!
Success rate is 60 percent (3/5), round-trip min/avg/max = 32/46/68 ms

The policer shows that some packets have exceeded.

R1#sh policy-map int f0/0
 FastEthernet0/0 

  Service-policy output: POLICER

    Class-map: class-default (match-any)
      35 packets, 12728 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
      Match: any 
      police:
          cir 10000 bps, bc 1000 bytes
        conformed 5 packets, 3138 bytes; actions:
          transmit 
        exceeded 7 packets, 7042 bytes; actions:
          drop 
        conformed 0 bps, exceed 0 bps

While sending the packets I had debugs running on both devices. This is the timing
of the events.

Apr 15 12:08:10.183: IP: tableid=0, s=10.0.0.1 (local), d=10.0.0.2 (FastEthernet0/0), routed via FIB
Apr 15 12:08:10.187: IP: s=10.0.0.1 (local), d=10.0.0.2 (FastEthernet0/0), len 972, sending
Apr 15 12:08:10.247: IP: tableid=0, s=10.0.0.2 (FastEthernet0/0), d=10.0.0.1 (FastEthernet0/0), routed via RIB
Apr 15 12:08:10.247: IP: s=10.0.0.2 (FastEthernet0/0), d=10.0.0.1 (FastEthernet0/0), len 972, rcvd 3

A packet is sent at 10.183 and the reply is received at 10.247. A look at R2
confirms that it sent the reply at 10.195.

Apr 15 12:08:10.195: ICMP: echo reply sent, src 10.0.0.2, dst 10.0.0.1

The next packet is sent at 10.255 but this does not make it through the policer.

Apr 15 12:08:10.255: IP: tableid=0, s=10.0.0.1 (local), d=10.0.0.2 (FastEthernet0/0), routed via FIB
Apr 15 12:08:10.259: IP: s=10.0.0.1 (local), d=10.0.0.2 (FastEthernet0/0), len 972, sending

With a CIR of 10 kbit/s, we can only send 1250 bytes every second.

The router then waits for the ICMP packet to time out, which was set to one second.
Then the next packet is sent at 11.255 and received at 11.287.

Apr 15 12:08:11.255: IP: tableid=0, s=10.0.0.1 (local), d=10.0.0.2 (FastEthernet0/0), routed via FIB
Apr 15 12:08:11.255: IP: s=10.0.0.1 (local), d=10.0.0.2 (FastEthernet0/0), len 972, sending
Apr 15 12:08:11.287: IP: tableid=0, s=10.0.0.2 (FastEthernet0/0), d=10.0.0.1 (FastEthernet0/0), routed via RIB
Apr 15 12:08:11.287: IP: s=10.0.0.2 (FastEthernet0/0), d=10.0.0.1 (FastEthernet0/0), len 972, rcvd 3

Output from the other router shows it was sent at 11.255.

Apr 15 12:08:11.255: ICMP: echo reply sent, src 10.0.0.2, dst 10.0.0.1

It is clear that a policer does not allow a deficit; either the packet makes it
through or it is dropped.

Using a Shaper

A shaper allows a deficit to be created. This can be proven by creating a shaper
that uses only Bc and no Be. If a packet is sent with a size larger than Bc it
should in theory be dropped. This is however not the case. The following shaper
is used.

R1#sh policy-map
  Policy Map SHAPER
    Class class-default
      Traffic Shaping
         Average Rate Traffic Shaping
         CIR 10000 (bps) Max. Buffers Limit 1000 (Packets)
         Bc 8000 Be 0
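
For reference, a policy with these values could be built as follows (a sketch;
shape average takes the CIR in bit/s followed by Bc and Be in bits):

policy-map SHAPER
 class class-default
  ! CIR 10 kbit/s, Bc 8000 bits, Be 0 bits
  shape average 10000 8000 0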

If a shaper did not allow a deficit, then all packets larger than 1000 bytes
(the 8000-bit Bc) should be dropped.

R1#ping 10.0.0.2 size 972 ti 1

Type escape sequence to abort.
Sending 5, 972-byte ICMP Echos to 10.0.0.2, timeout is 1 seconds:
!!!!.
Success rate is 80 percent (4/5), round-trip min/avg/max = 32/409/808 ms

Almost all packets made it through, which could be due to buffering, but let's
have a look at the timing of what happened.

Apr 15 12:19:45.683: IP: tableid=0, s=10.0.0.1 (local), d=10.0.0.2 (FastEthernet0/0), routed via FIB
Apr 15 12:19:45.687: IP: s=10.0.0.1 (local), d=10.0.0.2 (FastEthernet0/0), len 972, sending
Apr 15 12:19:45.775: IP: tableid=0, s=10.0.0.2 (FastEthernet0/0), d=10.0.0.1 (FastEthernet0/0), routed via RIB
Apr 15 12:19:45.775: IP: s=10.0.0.2 (FastEthernet0/0), d=10.0.0.1 (FastEthernet0/0), len 972, rcvd 3

The Bc bucket starts out full so the packet is immediately transmitted.
Packet was sent at 45.683 and received at 45.775. We confirm with output
from the other router.

Apr 15 12:19:45.714: ICMP: echo reply sent, src 10.0.0.2, dst 10.0.0.1

The interesting part is that R1 sent its second packet at 45.783.

Apr 15 12:19:45.783: IP: tableid=0, s=10.0.0.1 (local), d=10.0.0.2 (FastEthernet0/0), routed via FIB
Apr 15 12:19:45.787: IP: s=10.0.0.1 (local), d=10.0.0.2 (FastEthernet0/0), len 972, sending
Apr 15 12:19:45.811: IP: tableid=0, s=10.0.0.2 (FastEthernet0/0), d=10.0.0.1 (FastEthernet0/0), routed via RIB
Apr 15 12:19:45.815: IP: s=10.0.0.2 (FastEthernet0/0), d=10.0.0.1 (FastEthernet0/0), len 972, rcvd 3

This packet was then received at 45.811. Once again output from the other router.

Apr 15 12:19:45.782: ICMP: echo reply sent, src 10.0.0.2, dst 10.0.0.1

R1 should not have been allowed to send this packet so quickly after the first one.
With our shaper applied it should have had to wait around 800 ms before sending the
next one. However, a deficit was created that allowed the packet to be sent sooner.
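
The 800 ms figure follows directly from the token bucket parameters:

Tc = Bc / CIR = 8000 bits / 10000 bit/s = 800 ms

A 972-byte ping is a 1000-byte IP packet, or 8000 bits, so a single packet empties
the entire Bc bucket and the next packet should have had to wait a full Tc.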

If we look at the five packets that R2 replied to we can see a pattern.

Apr 15 12:19:45.714: ICMP: echo reply sent, src 10.0.0.2, dst 10.0.0.1
Apr 15 12:19:45.782: ICMP: echo reply sent, src 10.0.0.2, dst 10.0.0.1
Apr 15 12:19:46.498: ICMP: echo reply sent, src 10.0.0.2, dst 10.0.0.1
Apr 15 12:19:47.298: ICMP: echo reply sent, src 10.0.0.2, dst 10.0.0.1
Apr 15 12:19:48.918: ICMP: echo reply sent, src 10.0.0.2, dst 10.0.0.1

The first two packets came in very quickly. Between packets two and three there is
a 716 ms gap. Between three and four there is an 800 ms gap. Between four and five
there is a 1620 ms gap.

It is clear that at the end the router had to pay its dues.

Conclusion

Shapers on Cisco IOS allow a deficit to be created. This means that packets larger
than the size of the Bc bucket can be sent. The internals of this mechanism are
known only to Cisco.

What is the reason for this behavior? I can only speculate but it could be to try to
send packets rather than dropping them. What are your ideas?

Categories: QoS

Catalyst QoS – A deeper look at the egress queues

October 8, 2012 2 comments

I’ve done a post earlier on Catalyst QoS. That described how to
configure the QoS features on the Catalyst but I didn’t describe
in detail how the buffers work on the Catalyst platform. In this
post I will go into more detail about the buffers and thresholds
that are used.

By default, QoS is disabled. When we enable QoS all ports
will be assigned to queue-set 1. We can configure up to two
different queue-sets.

sh mls qos queue-set 
Queueset: 1
Queue     :       1       2       3       4
----------------------------------------------
buffers   :      25      25      25      25
threshold1:     100     200     100     100
threshold2:     100     200     100     100
reserved  :      50      50      50      50
maximum   :     400     400     400     400
Queueset: 2
Queue     :       1       2       3       4
----------------------------------------------
buffers   :      25      25      25      25
threshold1:     100     200     100     100
threshold2:     100     200     100     100
reserved  :      50      50      50      50
maximum   :     400     400     400     400

These are the default settings. Every port on the Catalyst has
4 egress queues (TX). When a port is experiencing congestion
it needs to place the packet into a buffer. If a packet gets
dropped it is because there were not enough buffers to store it.

So by default each queue gets 25% of the buffers. The value is in percent to make
it usable across different versions of the Catalyst since they may have different
sizes of buffers. The ASIC will have buffers of some size, maybe a couple of
megabytes, but this size is not known to us so we have to use the percentages.

Of the buffers we assign to a queue, a share can be made reserved. This means
that no other queue can borrow from these buffers. If we compare it to CBWFQ it
would be similar to the bandwidth percent command, because that guarantees X
percent of the bandwidth but the class may use more if there is bandwidth
available. The buffers work the same way. There is a common pool of buffers:
the buffers that are not reserved go into the common pool. By default 50% of
the buffers are reserved and the rest go into the common pool.

There is a maximum on how many buffers a queue may use, and by default this is
set to 400%. This means that the queue may use up to 4x the buffers it has been
allocated (25%).

To differentiate between packets assigned to the same queue the thresholds
can be used. You can configure two thresholds and then there is an implicit
threshold that is not configurable (threshold3). It is always set to the maximum the queue
can support. If a threshold is set to 100% that means it can use 100% of
the buffers allocated to a queue. It is not recommended to put a low value
for the thresholds. IOS enforces a limit of at least 16 buffers assigned
to a queue. Every buffer is 256 bytes which means that 4096 bytes are
reserved.

         Q1%  Q1 buf   Q2%  Q2 buf   Q3%  Q3 buf   Q4%  Q4 buf
buffers   25            25            25            25
thresh1  100  50       100  50       100  50       100  50
thresh2  100  50       100  50       100  50       100  50
reserved  50  25        50  25        50  25        50  25
maximum  400  200      400  200      400  200      400  200

This table shows how the buffers work. Let's say that this port on the ASIC has
been assigned 200 buffers. Every queue gets 25% of the buffers, which is 50
buffers. However, out of these 50 buffers only 50% are reserved, which means 25
buffers; the rest go to the common pool. The thresholds are set to 100%, which
means packets can use 100% of the buffers allocated to the queue, that is 50
buffers. Packets that go to threshold3 can use 400% of the buffers, which means
200 buffers. A single queue can therefore use up all the non-reserved buffers
if the other queues are not using them.

To see which queue packets are getting queued to we can use the show
platform port-asic stats enqueue command.

Switch#show platform port-asic stats enqueue gi1/0/25
Interface Gi1/0/25 TxQueue Enqueue Statistics
Queue 0
Weight 0 Frames 2
Weight 1 Frames 0
Weight 2 Frames 0
Queue 1
Weight 0 Frames 3729
Weight 1 Frames 91
Weight 2 Frames 1894
Queue 2
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 0
Queue 3
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 577

In this output we have the four queues with three thresholds. Note that queue 0
here is actually queue 1, queue 1 is queue 2 and so on. Weight 0 is
threshold1, weight 1 is threshold2 and weight 2 is the maximum threshold.

We can also list which frames are being dropped. To do this we use the
show platform port-asic stats drop command.

Switch-38#show platform port-asic stats drop gi1/0/25
Interface Gi1/0/25 TxQueue Drop Statistics
Queue 0
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 0
Queue 1
Weight 0 Frames 5
Weight 1 Frames 0
Weight 2 Frames 0
Queue 2
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 0
Queue 3
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 0

The queues are displayed in the same way here, where queue 0 = queue 1.
This command is useful for finding out whether important traffic, such as IPTV,
is being dropped in a certain queue.

The documentation for Catalyst QoS can be a bit shady, and with this post I
hope that you now have a better understanding of how the egress queueing works.

Categories: Catalyst, CCIE, QoS

Catalyst QoS

March 7, 2012 22 comments

I’m back studying and I have already booked a new lab date.
I won’t make an announcement until I get back.
This is to keep a bit of the pressure off from taking the lab.

The last couple of days I have been studying Catalyst QoS. It can get a bit messy.

There are no Catalyst 3550’s in the lab any longer, only 3560. So when practicing
forget about 3550. If you have a 3750 that is fine since it is basically the same
switch as 3560 but with stacking.

The Catalyst has two ingress queues, of which one can be used for priority. The
switch also has four egress queues, where one can be used for priority. It is more
likely to end up with congestion on egress than ingress, but we have the option of
configuring both.

Lets assume that we have a switch with a Cisco IP phone connected to it.
The switchport will have a general configuration like the one below.

interface FastEthernet0/10
switchport mode access
switchport access vlan 10
switchport voice vlan 100

Even though the port is configured as access this is actually a form of trunk, since
voice and data are using different VLANs. By default QoS is disabled. This means that
the switch will be transparent to the QoS markings: whatever markings the phone or
computers set will be forwarded through the switch unchanged. As soon as we turn on
QoS with the mls qos command this behaviour is no longer true.
If we just enable QoS and do nothing more, all markings will be set to BE (0).
This is true both for CoS and DSCP. To check if QoS is enabled use show mls qos.

Rack18SW1#sh mls qos
QoS is disabled
QoS ip packet dscp rewrite is enabled

If we trust the device connecting to a port, most likely a phone, then we set up
a trust boundary. We can trust CoS, IP precedence or DSCP. CoS or DSCP will be
more common than precedence.
The CoS is a layer 2 marking, sometimes also called the 802.1p priority bits.
CoS is only available in tagged frames, like on an 802.1Q trunk. There is a risk
of losing the marking when the frame gets forwarded through different media, from
Ethernet to frame relay or PPP or whatever your links are running. Because of this
it makes much sense to either trust DSCP or use the CoS value to map to a DSCP value.
If we want to configure trusting of CoS on a port we configure it like this.

interface FastEthernet0/10
mls qos trust cos

This means that the CoS marking coming in to the port is trusted. Untagged frames
will receive BE treatment since there is no marking of those packets. If we want to
mark the untagged frames we use the following configuration.

interface FastEthernet0/10
mls qos trust cos
mls qos cos 3

All untagged frames will get a CoS marking of 3. What if we want the port to
mark all the packets the same no matter what comes in to the port?
We can use the override command for this.

interface FastEthernet0/10
mls qos cos 1
mls qos cos override

This will effectively set the CoS value to 1 for all frames entering the port.
We can also use the switchport priority command to instruct the Cisco phone to set a
CoS marking on packets from the computer (data) entering the IP phone.

interface FastEthernet0/10
switchport priority extend cos 1

This will set all frames from the computer entering the phone to have a
marking of 1 no matter what the computer tries to set them to.

It is important to know that the Catalyst switch uses the concept of an
internal QoS label. This is a DSCP value which is used internally and
defines which queues the traffic ends up in.
If you type show mls qos map you will see a lot of different maps that
the Catalyst uses. The CoS-to-DSCP map is used by the switch so that if we trust
CoS, a DSCP value will be derived from it, and when the frame is exiting
the switch to another switch the CoS value will be set according
to the DSCP-to-CoS mapping table. This effectively keeps the QoS labels synchronized.

Now let's take a look at the ingress queues. We have two of them.
By default queue 2 will be the priority queue. To see the default settings
use the show mls qos input-queue command. We can manipulate which queue becomes
the priority queue and this is done with the
mls qos srr-queue input priority-queue bandwidth command.
If you want to use queue 1 as the priority queue then enter a 1 in the command.
The weight defines how much bandwidth the priority queue can use. By default it
uses 10%. You can set this value from 0 to 40 so that it cannot starve the other
queues of all the bandwidth.

Rack18SW1#sh mls qos input-queue
Queue     :       1       2
----------------------------------------------
buffers   :      90      10
bandwidth :       4       4
priority  :       0      10
threshold1:     100     100
threshold2:     100     100
Rack18SW1#
Rack18SW1(config)#mls qos srr-queue input priority-queue 1 bandwidth ?
    enter bandwidth number [0-40]

The switch uses buffers if there is a need to queue packets. Remember the basic
function of QoS: without congestion there is no queuing to begin with, only forwarding.
Unfortunately Cisco does not tell us a lot about how many buffers are available
in the Catalyst platforms. To tune the buffers we use mls qos srr-queue input buffers.
We should not assign too much of the buffers to the priority queue.
Finding optimal values depends a lot on your network and takes a lot of testing.
The safest bet might be to use Auto QoS and look at what Cisco is using.
These values have been researched by Cisco and should be safe to use.
Let's temporarily enable Auto QoS and look at which values we get.

Rack18SW2#sh mls qos input-queue
Queue     :       1       2
----------------------------------------------
buffers   :      67      33
bandwidth :      90      10
priority  :       0      10
threshold1:       8      34
threshold2:      16      66

With Auto QoS configured the priority queue gets 10% of the bandwidth
and 33% of the buffers. The thresholds for the non-priority queue are significantly
lower than the default settings. So the buffers command assigns buffer space to the
queues, but it does not say how much bandwidth is available to each queue.
We control this with mls qos srr-queue input bandwidth.
It is important to note here that the priority queue gets served first and then a
Shared Round Robin (SRR) algorithm is used to divide the traffic between the
two queues according to the weights. These are just weights and not necessarily
percentages, although you could configure them to be.

Rack18SW1(config)#mls qos srr-queue input bandwidth ?
    enter bandwidth weight for queue id 1

If we look at show mls qos map cos-input-q and show mls qos map dscp-input-q
we can see the maps that are used to define which queue the traffic ends up in.
We can of course set these values according to our needs.

 Rack18SW1#sh mls qos map cos-input-q
   Cos-inputq-threshold map:
              cos:  0   1   2   3   4   5   6   7
              ------------------------------------
  queue-threshold: 1-1 1-1 1-1 1-1 1-1 2-1 1-1 1-1

Everything is by default mapped to queue 1 except for CoS 5, which is
mapped to queue 2. The general idea is to map VoIP to queue 2 and everything
else to queue 1. Let's look at the DSCP table as well.

Rack18SW1#sh mls qos map dscp-input-q
   Dscp-inputq-threshold map:
     d1 :d2    0     1     2     3     4     5     6     7     8     9
     ------------------------------------------------------------
      0 :    01-01 01-01 01-01 01-01 01-01 01-01 01-01 01-01 01-01 01-01
      1 :    01-01 01-01 01-01 01-01 01-01 01-01 01-01 01-01 01-01 01-01
      2 :    01-01 01-01 01-01 01-01 01-01 01-01 01-01 01-01 01-01 01-01
      3 :    01-01 01-01 01-01 01-01 01-01 01-01 01-01 01-01 01-01 01-01
      4 :    02-01 02-01 02-01 02-01 02-01 02-01 02-01 02-01 01-01 01-01
      5 :    01-01 01-01 01-01 01-01 01-01 01-01 01-01 01-01 01-01 01-01
      6 :    01-01 01-01 01-01 01-01

To read this table, combine the first digit of the DSCP value from the d1 column
on the left with the second digit from the d2 row at the top. Almost everything is
mapped to queue 1 except for DSCP 40-47, which are mapped to queue 2.

Up until now we have only discussed queues. The Catalyst switch also uses
a congestion avoidance mechanism that is called Weighted Tail Drop (WTD).
The switch has three thresholds for every queue, where the third threshold
is not configurable; it is always set to 100%.
We can set the other two thresholds to values of our liking. Now we will
map CoS 6 to queue 2, threshold 3. We don't want this traffic to get dropped unless
there is no other option.

Rack18SW1(config)#mls qos srr-queue input cos-map queue 2 threshold 3 ?
    8 cos values separated by spaces

Rack18SW1(config)#mls qos srr-queue input cos-map queue 2 threshold 3 6

Always confirm your result with the show mls qos map command.

Rack18SW1#sh mls qos map cos-input-q
   Cos-inputq-threshold map:
              cos:  0   1   2   3   4   5   6   7
              ------------------------------------
  queue-threshold: 1-1 1-1 1-1 1-1 1-1 2-1 2-3 1-1

Now lets try to map DSCP EF to queue 1, threshold 3.

Rack18SW1(config)#mls qos srr-queue input dscp-map queue 1 threshold 3 ?
    dscp values separated by spaces (up to 8 values total)

Rack18SW1(config)#mls qos srr-queue input dscp-map queue 1 threshold 3 46

That covers the ingress queues. Note that all the commands affect all ports on the
switch; there is no way of setting port-specific QoS settings for the input queues.

Now lets look at our options for egress queues. We have four queues
where every queue has three thresholds. We start by looking at the default settings.

Rack18SW1#sh mls qos map cos-output-q
   Cos-outputq-threshold map:
              cos:  0   1   2   3   4   5   6   7
              ------------------------------------
  queue-threshold: 2-1 2-1 3-1 3-1 4-1 1-1 4-1 4-1
Rack18SW1#sh mls qos map dscp-output-q
   Dscp-outputq-threshold map:
     d1 :d2    0     1     2     3     4     5     6     7     8     9
     ------------------------------------------------------------
      0 :    02-01 02-01 02-01 02-01 02-01 02-01 02-01 02-01 02-01 02-01
      1 :    02-01 02-01 02-01 02-01 02-01 02-01 03-01 03-01 03-01 03-01
      2 :    03-01 03-01 03-01 03-01 03-01 03-01 03-01 03-01 03-01 03-01
      3 :    03-01 03-01 04-01 04-01 04-01 04-01 04-01 04-01 04-01 04-01
      4 :    01-01 01-01 01-01 01-01 01-01 01-01 01-01 01-01 04-01 04-01
      5 :    04-01 04-01 04-01 04-01 04-01 04-01 04-01 04-01 04-01 04-01
      6 :    04-01 04-01 04-01 04-01

The egress queueing is a bit more flexible. With the SRR algorithm we
can do some port-specific bandwidth control. When it comes to egress queues
we can either shape or share a queue. A shaped queue is guaranteed an amount
of bandwidth but is also policed to that value. Even if there is no
congestion, that queue still can't use more bandwidth than it has been assigned.
The shaped value is calculated against the physical interface speed.
Look at the following command.

Rack18SW1(config-if)#srr-queue bandwidth shape 25 0 0 0

How much bandwidth did we just assign to queue 1? 25 Mbit?
We assigned 4 Mbit, since the weight is inverted: (1/25)*100 = 4% of the
100 Mbit interface. When we set the other queues to 0 this means that they are
operating in shared mode instead of shaped.
Now we configure the three other queues.

Rack18SW1(config-if)#srr-queue bandwidth share 33 33 33 33

How much does every queue get? Notice I put a 33 for queue 1 but
that will do nothing since it is operating in shaped mode. That leaves
us with the other three queues. To calculate their share we use
(33/(33+33+33))*96 = 32 Mbit, where 96 Mbit is what remains after the
4 Mbit of the shaped queue is subtracted. So these values are just weights,
and we have to subtract the bandwidth of the shaped queue when calculating
how much the other queues get. When operating in shared mode, if one
queue is not using all of its bandwidth the other queues may cut into this.
This is different compared to the shaped mode.

Assigning values to the egress queue depending on CoS or DSCP works
the same way as for ingress.

Rack18SW1(config)#mls qos srr-queue output cos-map queue 3 threshold 3 ?
    8 cos values separated by spaces

Rack18SW1(config)#mls qos srr-queue output cos-map queue 3 threshold 3 5
Rack18SW1(config)#mls qos srr-queue output dscp-map queue 2 threshold 3 ?
    dscp values separated by spaces (up to 8 values total)

Rack18SW1(config)#mls qos srr-queue output dscp-map queue 2 threshold 3 46

The egress queues also use buffers. These can be tuned by configuring a
queue-set. By default all ports will use queue-set 1 with these settings.

Rack18SW1#show mls qos queue-set
Queueset: 1
Queue     :       1       2       3       4
----------------------------------------------
buffers   :      25      25      25      25
threshold1:     100     200     100     100
threshold2:     100     200     100     100
reserved  :      50      50      50      50
maximum   :     400     400     400     400

We can configure one of our own queue-sets and tell a port to use this instead.

Rack18SW1(config)#mls qos queue-set output 2 buffers 10 10 40 40

We can also configure thresholds and how much of the buffers are reserved.

Rack18SW1(config)#mls qos queue-set output 2 threshold ?
    enter queue id in this queue set

Rack18SW1(config)#mls qos queue-set output 2 threshold 1 ?
    enter drop threshold1 1-3200

Rack18SW1(config)#mls qos queue-set output 2 threshold 1 50 ?
    enter drop threshold2 1-3200

Rack18SW1(config)#mls qos queue-set output 2 threshold 1 50 200 ?
    enter reserved threshold 1-100

Rack18SW1(config)#mls qos queue-set output 2 threshold 1 50 200 75 ?
    enter maximum threshold 1-3200

Rack18SW1(config)#mls qos queue-set output 2 threshold 1 50 200 75 300

Then we need to actually assign the queue-set to an interface.

Rack18SW1(config)#interface FastEthernet0/10
Rack18SW1(config-if)#queue-set 2

We can check our settings with show mls qos queue-set

Rack18SW1#sh mls qos queue-set 2
Queueset: 2
Queue     :       1       2       3       4
----------------------------------------------
buffers   :      10      10      40      40
threshold1:      50     200     100     100
threshold2:     200     200     100     100
reserved  :      75      50      50      50
maximum   :     300     400     400     400

The buffer values can be a bit confusing. First we define how big a share
the queue gets of the buffers, in percent. The thresholds define when
traffic will be dropped. For queue 1 we start dropping traffic in threshold 1
at 50%. Then we drop traffic in threshold 2 at 200%.
How can a queue get to 200%?! The secret here is that a queue can outgrow
the buffers we assign if there are buffers available in the common pool.
This is where the reserved values come into play.
Every queue gets assigned buffers, but we can define that only 50% of
these buffers are strictly reserved for the queue.
The other 50% go into a common pool and can be used by the other queues as well.
We then set a maximum value for the queue which says that it can grow
up to 400% but no more than that.

Early in this post I talked about the priority queue for egress queues.
This is how we enable it.

Rack18SW1(config)#int f0/10
Rack18SW1(config-if)#priority-queue out

This will always be queue 1 and is not configurable.

Now let's move on to some other things we can do with QoS. Let's assume that we
have a customer connecting to the switch who internally uses totally different
DSCP values than we want. We can use a DSCP mutation map for that.

Rack18SW1(config)#mls qos map dscp-mutation MUTATE 40 ?
    DSCP values separated by spaces (up to 8 values total)
  to      to keyword

Rack18SW1(config)#mls qos map dscp-mutation MUTATE 40 to 46
Rack18SW1(config)#int f0/10
Rack18SW1(config-if)#mls qos trust dscp
Rack18SW1(config-if)#mls qos dscp-mutation MUTATE

So in this example we are mutating DSCP 40 (CS5) to DSCP 46 (EF).

We also have the option of using policy maps just like on routers.
And we can even police traffic. This policy-map will match all ICMP and police
it to 128k with a marking of EF; any exceeding traffic will be remarked
to DSCP 0.

Rack18SW1(config)#ip access-list extended ICMP
Rack18SW1(config-ext-nacl)#permit icmp any any
Rack18SW1(config-ext-nacl)#class-map CM_ICMP
Rack18SW1(config-cmap)#match access-group name ICMP
Rack18SW1(config-cmap)#policy-map POLICE
Rack18SW1(config-pmap)#class CM_ICMP
Rack18SW1(config-pmap-c)#police 128000 32000 exceed-action policed-dscp-transmit
Rack18SW1(config-pmap-c)#set dscp ef
Rack18SW1(config-pmap-c)#exit
Rack18SW1(config-pmap)#exit
Rack18SW1(config)#mls qos map policed-dscp 46 ?
    DSCP values separated by spaces (up to 8 values total)
  to      to keyword
Rack18SW1(config)#mls qos map policed-dscp 46 to 0
Rack18SW1(config)#int f0/10
Rack18SW1(config-if)#service-policy input POLICE

If you are used to configuring MQC on routers then you will be surprised
to know that show policy-map does not work for these switches.
We need to use show mls qos interface statistics instead.

Rack18SW1#sh mls qos int f0/10 statistics
FastEthernet0/10 (All statistics are in packets)

  dscp: incoming
-------------------------------

  0 -  4 :           0            0            0            0            0
  5 -  9 :           0            0            0            0            0
 10 - 14 :           0            0            0            0            0
 15 - 19 :           0            0            0            0            0
 20 - 24 :           0            0            0            0            0
 25 - 29 :           0            0            0            0            0
 30 - 34 :           0            0            0            0            0
 35 - 39 :           0            0            0            0            0
 40 - 44 :           0            0            0            0            0
 45 - 49 :           0            0            0            0            0
 50 - 54 :           0            0            0            0            0
 55 - 59 :           0            0            0            0            0
 60 - 64 :           0            0            0            0
  dscp: outgoing
-------------------------------

  0 -  4 :           0            0            0            0            0
  5 -  9 :           0            0            0            0            0
 10 - 14 :           0            0            0            0            0
 15 - 19 :           0            0            0            0            0
 20 - 24 :           0            0            0            0            0
 25 - 29 :           0            0            0            0            0
 30 - 34 :           0            0            0            0            0
 35 - 39 :           0            0            0            0            0
 40 - 44 :           0            0            0            0            0
 45 - 49 :           0            0            0            0            0
 50 - 54 :           0            0            0            0            0
 55 - 59 :           0            0            0            0            0
 60 - 64 :           0            0            0            0
  cos: incoming
-------------------------------

  0 -  4 :           2            0            0            0            0
  5 -  7 :           0            0            0
  cos: outgoing
-------------------------------

  0 -  4 :           0            0            0            0            0
  5 -  7 :           0            0            0
Policer: Inprofile:            0 OutofProfile:            0

This is a huge table showing how much traffic with different markings
is coming in and going out. At the very end you see the policer counters, which
show how much traffic is in profile and out of profile.

So that is one way of configuring policy-maps. The Catalyst switches can use QoS
in either VLAN-based mode or port-based mode.
If we use VLAN-based mode we apply the policy to an SVI instead.
This might be more scalable depending on your setup.
The caveat with using a policy-map on an SVI is that you can't police in the
parent map; you need a child map for that. Let's look at an example using a
parent and child map. Any IP traffic from the trunks (Fa0/13 - 21) may use 256k
and traffic from Fa0/6 will be restricted to 56k. This will all be configured
for VLAN 146.

Rack18SW1(config)#int range fa0/13 -21 , fa0/6
Rack18SW1(config-if)#mls qos vlan-based
Rack18SW1(config-if)#exit
Rack18SW1(config)#ip access-list extended IP_ANY
Rack18SW1(config-ext-nacl)#permit ip any any
Rack18SW1(config-ext-nacl)#class-map CM_IP_ANY
Rack18SW1(config-cmap)#match access-group name IP_ANY
Rack18SW1(config-cmap)#class-map CM_TRUNKS
Rack18SW1(config-cmap)#match input-interface fa0/13 - fa0/21
Rack18SW1(config-cmap)#class-map CM_R6
Rack18SW1(config-cmap)#match input-interface fa0/6
Rack18SW1(config-cmap)#policy-map CHILD
Rack18SW1(config-pmap)#class CM_TRUNKS
Rack18SW1(config-pmap-c)#police 256000 32000
Rack18SW1(config-pmap-c)#class CM_R6
Rack18SW1(config-pmap-c)#police 56000 28000
Rack18SW1(config-pmap-c)#policy-map PARENT
Rack18SW1(config-pmap)#class CM_IP_ANY
Rack18SW1(config-pmap-c)#service-policy CHILD
Rack18SW1(config-pmap-c)#set dscp cs1
Rack18SW1(config-pmap-c)#int vlan 146
Rack18SW1(config-if)#service-policy input PARENT

The final thing I want to show is how to use an aggregate policer.
We can use this if we want several classes to share a bandwidth instead
of setting bandwidth per class. Take a look at this.

Rack18SW1(config)#ip access-list extended ICMP
Rack18SW1(config-ext-nacl)#permit icmp any any
Rack18SW1(config-ext-nacl)#ip access-list extended HTTP
Rack18SW1(config-ext-nacl)#permit tcp any eq www any
Rack18SW1(config-ext-nacl)#permit tcp any any eq www
Rack18SW1(config-ext-nacl)#class-map CM_ICMP
Rack18SW1(config-cmap)#match access-group name ICMP
Rack18SW1(config-cmap)#class-map CM_HTTP
Rack18SW1(config-cmap)#match access-group name HTTP
Rack18SW1(config-cmap)#exit
Rack18SW1(config)#mls qos aggregate-policer AGG256k ?
    Bits per second (postfix k, m, g optional; decimal point
                      allowed)

Rack18SW1(config)#mls qos aggregate-policer AGG256k 256000 ?
    Normal burst bytes

Rack18SW1(config)#mls qos aggregate-policer AGG256k 256000 32000 ?
  exceed-action  action when rate is exceeded

Rack18SW1(config)#$regate-policer AGG256k 256000 32000 exceed-action drop
Rack18SW1(config)#policy-map AGG_POLICER
Rack18SW1(config-pmap)#class CM_ICMP
Rack18SW1(config-pmap-c)#police aggregate AGG256k
Rack18SW1(config-pmap-c)#class CM_HTTP
Rack18SW1(config-pmap-c)#police aggregate AGG256k
Rack18SW1(config-if)#int f0/2
Rack18SW1(config-if)#service-policy input AGG_POLICER

And before I leave you there is one final thing I want to show:
how to limit the egress traffic on an interface with the SRR command.
Let's say that we have a 100 Mbit interface but the customer only pays for 4 Mbit.
We can use this command.

Rack18SW1(config-if)#srr-queue bandwidth limit ?
    enter bandwidth limit for interface  as percentage

The lowest we can set is 10, though. If we have a port running at
100 Mbit that would leave us with 10 Mbit. Instead we can set the port to 10 Mbit
and configure this to 40%. Then we will achieve the 4 Mbit that was requested.

This post turned out to be very long but I hope it has been informative
and I know for sure it helped me to solidify the concepts.
I hope it will help a lot of other people as well.

QoS studies

September 12, 2011 2 comments

Did some more QoS labs yesterday. I have completed roughly 50/80 labs so far. Here is a quick tip for looking at what policy-maps are assigned to interfaces.

show run | i interface|service-policy

Here is a question for my readers.

I want to shape to an average rate of 512k. The Tc should be 10 ms. What command and parameters do I need to use? Post in comments.

Categories: CCIE, QoS

Quality of Service – notes

December 13, 2010 Leave a comment

Hardware queue

  • Packets can be sent through the hardware queue without interrupting the CPU
  • Always uses FIFO logic
  • Cannot be affected by IOS queuing tools

Class-Based Weighted Fair Queuing

  • Every class (queue) gets a defined percentage/amount of bandwidth
  • If a class does not use all its bandwidth, the remainder is distributed across the other classes

Max reserved bandwidth

Is 75% by default and can be set by the user. If an interface has 1 Mbit, 750 kbit will be available
for reservation and 250 kbit kept in reserve. Bandwidth can be reserved in percentages with bandwidth percent
and/or bandwidth remaining percent.

LLQ

Low Latency Queuing: the low latency queue is a priority queue and the packets in this queue get sent first (usually voice). The LLQ has a built-in policer, so the guaranteed amount of bandwidth for the queue is also the maximum amount of bandwidth. QoS, as always, is only active when there is congestion. If there is no congestion the LLQ can use any available bandwidth just as any other queue.
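
As a minimal MQC sketch (class names, match criteria and rates are made up for illustration): the priority command creates the LLQ with its built-in policer, while bandwidth percent creates ordinary CBWFQ classes:

class-map match-all VOICE
 match dscp ef
class-map match-all DATA
 match dscp af21
!
policy-map WAN-EDGE
 class VOICE
  ! LLQ: guaranteed, and under congestion policed, to 128 kbit/s
  priority 128
 class DATA
  ! CBWFQ: guaranteed 40 percent of the available bandwidth
  bandwidth percent 40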

Queuing

Queuing only occurs when there is congestion. IOS considers there to be congestion when the TX ring is full, which might happen before the line rate of the interface is reached.

Tail drop

Occurs when the queue is full and has no more room for packets. The packets that come in last (at the tail) are dropped. Most sessions are TCP, which means that when packets get dropped the sending rate will lower. Performance can be improved by dropping random packets before the queue is full. This can be done by using WRED.

Weighted Random Early Detection

When the queue depth is below the minimum threshold no packets are dropped. Between the minimum and maximum thresholds packets are dropped at a linearly growing rate. When the maximum threshold has been reached, all packets are dropped. The Mark Probability Denominator (MPD) decides how many packets will be dropped at the maximum threshold; if set to ten, one packet in ten will be dropped.
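
A sketch of these knobs in MQC (the thresholds and MPD are illustrative values): random-detect takes a minimum threshold, a maximum threshold and the MPD:

policy-map WAN-EDGE
 class class-default
  bandwidth percent 50
  random-detect dscp-based
  ! AF21: start random drops at queue depth 25, drop 1 in 10 at depth 40,
  ! tail drop everything above 40
  random-detect dscp af21 25 40 10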

Modified Deficit Round Robin

This queuing mode serves packets in a round-robin way. It does have support for a priority queue, and that queue can be served in strict mode or in alternate mode. If strict mode is used there is a risk of starvation of the other queues. If alternate mode is used the priority queue is served in between the other queues, which means no starvation but more jitter and latency for the prioritized packets. Uses a Quantum Value (QV) to decide how many bytes to send for each queue every cycle. If too many bytes have been sent in one round this is a deficit and fewer bytes will be sent the next round; this gives every queue a certain amount of bandwidth which over time will be accurate.

Catalyst 3560 queuing

Has support for both ingress and egress queueing; two ingress queues are supported, of which one can be configured as a priority queue. Uses Shared Round Robin (SRR) to schedule the packets being sent. Bandwidth for each queue is guaranteed but not limited; if other queues are empty that bandwidth may be used.

Default values

  • Queue two is the priority queue
  • It gets 10 percent of the bandwidth
  • CoS 5 traffic gets placed into queue two

Egress queuing

Can use shared or shaped round robin; shared can use excess bandwidth when other queues are not full, but shaped only uses the configured amount of bandwidth.

RSVP

Resource Reservation Protocol (RSVP) is a protocol that reserves bandwidth through the entire path that the packets take. The path is unidirectional. Uses PATH messages to set up the path and RESV messages to reserve the bandwidth needed.

Shaping

Interfaces can only send at line rate. To send traffic “slower”, traffic is sent during shorter periods of time. To halve the bandwidth, traffic can be sent only half of the time. Cisco uses a time interval (Tc) to define the time period. Every Tc an amount of committed burst (Bc) can be sent. Excess burst (Be) is the number of bits that can be sent in excess of Bc.

Tc = Bc / shaping rate
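
A quick worked example of the formula (numbers chosen for illustration): shaping to 64 kbit/s with a Bc of 8000 bits gives

Tc = Bc / shaping rate = 8000 / 64000 = 125 ms

so every 125 ms the shaper releases 8000 bits at line rate and stays quiet for the rest of the interval.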

Frame relay

Traffic shaping adaptation lowers the shaping rate when there is congestion until it reaches the Minimum Information Rate (MIR), or mincir. The shaper notices congestion if it receives a frame with BECN set or a Cisco ForeSight message. Every time a BECN or ForeSight message is received the shaper slows down the rate by 25%. If no messages have been received for 16 consecutive Tc the shaper starts increasing the rate again. The shaping rate grows by 1/16 each Tc.

GTS

Generic Traffic Shaping is applied on the interface. It shapes all traffic leaving the interface by default; this can be modified with an access-list. GTS can also be used to do adaptive shaping.

Class-Based shaping

Can only shape on egress traffic. Configured with MQC.

Shape average vs shape peak

Shape average fills the token bucket with Bc bits every Tc, while shape peak fills the bucket with
Bc+Be tokens every Tc, which means that it can burst in every Tc.

Shape peak

Shaping rate = configured_rate * (1 + Be/Bc)
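
A worked example with illustrative numbers: with a configured rate of 64 kbit/s, Bc 8000 bits and Be 8000 bits,

Shaping rate = 64000 * (1 + 8000/8000) = 128 kbit/s

since shape peak refills both the Bc and Be buckets every Tc, doubling the effective rate in this case.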

FRTS

Frame Relay Traffic Shaping (FRTS) is only available for frame relay interfaces. Cannot classify traffic to shape a subset of traffic. FRTS can dynamically learn the Bc, Be and CIR by using Enhanced Local Management Interface (ELMI).

Policing

Policing can be done on ingress or egress. The policer meters the interface bandwidth. The difference between policing and shaping is that policing does not hold packets while waiting for more tokens; it drops them or remarks them with a lower priority.

Single-rate two-color policing

Uses one bucket with Bc bits. Packets either conform to or exceed the configured rate. Does not use time intervals like shapers; tokens are replenished based on packet arrival times:

Tokens added = (current_packet_arrival_time - previous_packet_arrival_time) * police_rate / 8

Single-rate three-color

Has support for excess burst. Packets can either conform to, exceed or violate the configured rate. Uses dual buckets; tokens left over when the Bc bucket is full spill into the Be bucket.

Two-rate three-color policer

Uses two policing rates; the lower one is the Committed Information Rate (CIR) and the higher
is the Peak Information Rate (PIR). Packets that fall under the CIR conform to the rate, packets that exceed the CIR but are below the PIR exceed it, and packets that exceed the PIR violate the policy. Tokens are filled into both buckets, instead of the Be bucket relying on spillage from the Bc bucket, which means bursting is always available.
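
A two-rate policer sketched in MQC (rates and actions are illustrative): traffic under the CIR is transmitted, traffic between CIR and PIR is remarked, and traffic above the PIR is dropped:

policy-map TWO-RATE-POLICER
 class class-default
  ! CIR 64 kbit/s, PIR 128 kbit/s: remark between CIR and PIR, drop above PIR
  police cir 64000 pir 128000 conform-action transmit exceed-action set-dscp-transmit 0 violate-action drop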

Categories: CCIE, Notes, QoS