
QoS Terminology – Comparing Cisco to MEF and RFC Terminology

July 31, 2015

Have you ever thought that you knew a topic pretty well, but then someone uses terminology that you aren't used to? People who use Cisco a lot or live outside the MEF world use different terminology than people who work on MEF-certified networks. Even if we both know the concepts, if we don't speak a common language it will be difficult to communicate and to get the right end result.

When I took the CCDE written at Cisco Live, some of the QoS-related material felt a bit off to me. I feel quite confident with QoS, so this took me by surprise. My theory is that some of the material was written by someone coming from another background, using wording that I'm not used to. I thought that I would read through some of the MEF material to broaden my QoS horizon and see what other terms are being used. At the very least I will have learned something new.

If we start with the basics, we have flows in our networks and these flows have different needs regarding delay, jitter and packet loss. I will list the different terms and indicate which belong to MEF terminology; the other terms are what Cisco calls them or what they would be called in general outside of the MEF world.

Delay

Latency

Round Trip Time (RTT)

Frame Delay (MEF)

These all relate to how much delay is acceptable in the network. The requirement may be one-way or two-way depending on the nature of the traffic. RTT always refers to the two-way delay.

Jitter

Frame Delay Variation (MEF)

The MEF term is actually a bit clearer here as jitter is the variation of delay.

Packet Loss

Frame Loss Ratio (MEF)

Once again, the MEF term is a bit clearer because we are interested in seeing packet loss as a ratio, such as 1/100 packets, which we then use as a percentage for what is acceptable loss on a circuit.

Committed Burst (Bc)

Committed Burst Size (CBS) (MEF)

The Bc or CBS value defines how much traffic, in bits or bytes, can be sent during each time interval. Picking too low a value can lead to the customer dropping a lot of packets, and picking too high a value can lead to long time intervals, which could affect high priority traffic. The formula Tc = Bc / CIR can be used for calculations.
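
As a small illustration of the formula, here is a hedged IOS MQC sketch (the policy name and rates are hypothetical): shaping to a CIR of 50 Mbit/s with Bc = 500,000 bits gives Tc = Bc / CIR = 500,000 / 50,000,000 = 10 ms.

! Hypothetical shaper: 500,000 bits of credits are released every 10 ms interval
policy-map SHAPE-EXAMPLE
 class class-default
  shape average 50000000 500000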

Excess Burst (Be)

Excess Burst Size (EBS) (MEF)

Be or EBS is normally used to give the customer a more “fair” use of a circuit by allowing them to send unused credits from one or more previous time intervals. This means that they can burst momentarily until they have used up the Bc + Be credits.

Committed Information Rate (CIR)

This is the rate that is guaranteed to the customer in the contract. The physical line rate could be 100 Mbit/s while the CIR is 50 Mbit/s. It should be noted that this is an average rate and that traffic is always sent at line rate, which produces bursts of traffic. This means that the customer will send above the CIR for short periods of time, but on average they get the CIR on the circuit.

Excess Information Rate (EIR)(MEF)

A provider/carrier may allow a customer to send above the CIR, but only those packets that are within the CIR are guaranteed the performance characteristics defined in the SLA. This is commonly implemented with a single rate Three Color Marker (srTCM) where packets that are within the CIR/CBS are marked green, packets above the CIR but within the EIR/EBS are marked yellow, and packets that exceed the EIR/EBS are marked red. Green packets are guaranteed the performance defined in the SLA, yellow packets are delivered best effort and red packets are dropped.

This illustration shows the concept of srTCM:

[Figure: srTCM marking of conforming (green), exceeding (yellow) and violating (red) traffic]

Peak Information Rate (PIR)

As noted by Faisal in the comments, PIR is not the same as EIR. PIR is actually CIR + EIR, which means that we have two token buckets filling at the same time, and incoming packets are checked against both to see if they match the CIR rate or the EIR rate, which then sets the color of the packet to green or yellow. One example could be where a customer has a CIR of 10 Mbit/s and an EIR of 10 Mbit/s, which gives a combined rate (PIR) of 20 Mbit/s. The first 10 Mbit/s is guaranteed and the other 10 Mbit/s is sent through the provider network as long as there is capacity available.
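
A hedged sketch of what this could look like as a two-rate policer in IOS MQC (the policy name, rates and the yellow remarking value are assumptions, not taken from the post):

! Hypothetical: CIR 10 Mbit/s, PIR 20 Mbit/s (CIR + EIR). Green traffic is
! transmitted, yellow traffic is remarked before transmission, red is dropped.
policy-map TWO-RATE-POLICER
 class class-default
  police cir 10000000 pir 20000000 conform-action transmit exceed-action set-dscp-transmit af11 violate-action drop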

This is a short post on different QoS terminology. Which terminology are you most used to?


QoS Design Notes for CCDE

January 17, 2015

Trying to get my CCDE studies going again. I’ve finished the End to End QoS Design book (relevant parts) and here are my notes on QoS design.

Basic QoS

Different applications require different treatment; the most important parameters are:

  • Delay: The time it takes a packet to travel from the sending endpoint to the receiving endpoint
  • Jitter: The variation in end to end delay between sequential packets
  • Packet loss: The number of packets received compared to the number sent, expressed as a percentage

Characteristics of voice traffic:

  • Smooth
  • Benign
  • Drop sensitive
  • Delay sensitive
  • UDP priority

One-way requirements for voice:

  • Latency ≤ 150 ms
  • Jitter ≤ 30 ms
  • Loss ≤ 1%
  • Bandwidth (30-128Kbps)

Characteristics for video traffic:

  • Bursty
  • Greedy
  • Drop sensitive
  • Delay sensitive
  • UDP priority

One-way requirements for video:

  • Latency ≤ 200-400 ms
  • Jitter ≤ 30-50 ms
  • Loss ≤ 0.1-1%
  • Bandwidth (384Kbps-20+ Mbps)

Characteristics for data traffic:

  • Smooth/bursty
  • Benign/greedy
  • Drop insensitive
  • Delay insensitive
  • TCP retransmits

Quality of Service (QoS) – Managed unfairness, measured numerically in latency, jitter and packet loss

Quality of Experience (QoE) – The end user's perception of network performance; subjective and can't be measured numerically

Tools

Classification and marking tools: Sessions, or flows, are analyzed to determine what class the packets belong to and what treatment they should receive. Packets are marked so that analysis happens a limited number of times, usually at ingress as close to the source as possible. Reclassification and remarking are common as the packets traverse the network.

Policing, shaping and markdown tools: Different classes of traffic are allotted portions of the network resources. Traffic may be selectively dropped, delayed or remarked to avoid congestion when it exceeds the available network resources. Traffic can be dropped (policing), slowed down (shaping) or remarked (markdown) to conform.

Congestion management or scheduling tools: When there is more traffic than available network resources, it will be queued. Traffic classes that don't react well to queueing can be denied admission by a scheduling tool to avoid lowering the quality of existing flows.

Link-specific tools: Link fragmentation and interleaving fits into this category.

Packet Header

An IPv4 packet has an 8-bit Type of Service (ToS) field; an IPv6 packet has an 8-bit Traffic Class field. The first three bits are the IP Precedence (IPP) bits, for a total of 8 classes. The first three bits in combination with the next three are known as the DSCP field, for a total of 64 classes.

At layer two the most common marking is 802.1p Class of Service (CoS) or MPLS EXP bits, each using three bits for a total of 8 classes.

QoS Deployment Principles

  1. Define the business/organizational objectives of the QoS deployment. This may include provisioning real-time services for voice/video traffic, guaranteeing bandwidth for critical business applications and managing scavenger traffic. Seek executive endorsement of the business objectives so that the process doesn't get derailed later on.
  2. Based on the business objectives, determine how many classes of traffic are needed. Define an end-to-end strategy for how to identify the traffic and treat it across the network.
  3. Analyze the requirements of each application class so that the proper QoS tools can be deployed to meet these requirements.
  4. Design platform-specific QoS policies to meet the requirements with consideration for appropriate Place In the Network (PIN).
  5. Test the QoS designs in a controlled environment.
  6. Begin deployment with a closely monitored and evaluated pilot rollout.
  7. The tested and pilot proven QoS designs can be deployed to the production network in phases during scheduled downtime.
  8. Monitor service levels to make sure that the QoS objectives are being met.

The common mistake is to make it a technical process only and not research the business objectives and requirements.

QoS Feature Sequencing

Classification: The identification of each traffic stream.

Pre-queuing: Admission decisions, and dropping and marking the packet, are best applied before the packet enters a queue for egress scheduling and transmission.

Queueing: Scheduling the order of packets before transmission.

Post-queueing: Usually optional; sometimes needed to apply actions that depend on the transmission order of packets, such as sequence numbering (e.g. compression and encryption), which isn't known until the QoS scheduling function dequeues the packets based on the priority rules.

Security and QoS

Trust Boundaries

A trust boundary is a network location where packet markings are not accepted and may be rewritten. Trust domains are network locations where packet markings are accepted and acted on.

Network Attacks

QoS tools can mitigate the effects of worms and DoS attacks to keep critical applications available during an attack.

Recommendations and Guidelines

  • Classify and mark traffic as close to the source as technically and administratively feasible
  • Classification and marking can be done on ingress or egress but queuing and shaping are usually done on egress
  • Use an end-to-end Diffserv PHB model for packet marking
  • Less granular fields such as CoS and MPLS EXP should be mapped to DSCP as close to the traffic source as possible
  • Set a trust boundary and mark or remark traffic that comes in beyond the boundary
  • Follow standards-based Diffserv PHB markings if possible to ensure interoperability with SP networks, enterprise networks or when merging networks together
  • Use set dscp and set precedence to mark all IP traffic; set ip dscp and set ip precedence only mark IPv4 packets
  • When using tunnel interfaces, think of feature sequencing to make sure that the inner or outer packet headers (or both) are marked as intended
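
To illustrate the classification and marking recommendations above, here is a hedged IOS MQC sketch; the ACLs, class names, ports and interface are hypothetical placeholders, not a definitive design:

! Hypothetical ingress marking policy at the access edge, close to the source
ip access-list extended VOICE-RTP
 permit udp any any range 16384 32767
ip access-list extended SIGNALING-SIP
 permit tcp any any eq 5060
 permit udp any any eq 5060
class-map match-all VOICE
 match access-group name VOICE-RTP
class-map match-all SIGNALING
 match access-group name SIGNALING-SIP
policy-map ACCESS-EDGE-MARKING
 class VOICE
  set dscp ef
 class SIGNALING
  set dscp cs3
 class class-default
  set dscp default
interface GigabitEthernet0/1
 service-policy input ACCESS-EDGE-MARKING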

Policing and Shaping Tools

Policer: Checks traffic against a configured rate. Does not delay packets; takes immediate action to drop or remark packets that exceed the rate.

Shaper: A traffic smoothing tool with the objective of buffering packets instead of dropping them, smoothing out any peaks of traffic arrival so the configured rate is not exceeded.

Characteristics of a policer:

  • Causes TCP resends when traffic is dropped
  • Inflexible and inadaptable; makes instantaneous packet drop decisions
  • An ingress or egress interface tool
  • Does not add any delay or jitter to packets
  • Rate limiting without buffering

Characteristics of a shaper:

  • Typically delays rather than drops exceeding traffic, causes fewer TCP resends
  • Adapts to congestion by buffering exceeding traffic
  • Typically an egress interface tool
  • Adds delay and jitter when the traffic rate exceeds the shaping rate
  • Rate limiting with buffering

Placing Policers and Shapers in the Network

Policers make instantaneous decisions and should be deployed ingress, don’t transport packets if they are going to be dropped anyway. Policers can also be placed on egress to limit a traffic class at the edge of the network.

Shapers are often deployed as egress tools, commonly on enterprise-to-SP links, to not exceed the committed rate of the SP.

Tail Drop and Random Drop

Tail drop means dropping the packet that is at the end of a queue. The TX ring is always FIFO; if a voice packet is trying to get into the TX ring but it's full, it will get dropped because it's at the tail of the queue. Random drop via Random Early Detection (RED) or Weighted Random Early Detection (WRED) tries to keep the queues from becoming full by dropping packets from traffic classes to cause TCP to slow down.

Recommendations and Guidelines

  • Police as close to the source as possible, preferably on ingress.
  • A single rate three color policer handles bursts better than a single rate two color policer, resulting in fewer TCP retransmissions
  • Use a shaper on interfaces where speeds mismatch, such as when buying a lower rate than the physical speed or between a remote-end access link and the aggregated head-end link
  • When shaping on an interface carrying real-time traffic, set the Tc value to 10 ms

Scheduling Algorithms

Strict priority: Lower priority queues are only served when higher priority queues are empty. Can potentially starve traffic in lower priority queues.

Round robin: Queues are served in a set sequence; does not starve traffic but can add unpredictable delays to real-time, delay-sensitive traffic.

Weighted fair: Packets in the queue are weighted, usually by IP precedence so that some queues get served more often than others. Does not provide bandwidth guarantee, the bandwidth per flow varies based on number of flows and the weight of each flow.

WRED is a congestion avoidance tool and manages the tail of the queue. The goal is to avoid TCP synchronization, where all TCP flows speed up and slow down at the same time, which leads to poor utilization of the link. WRED has little or no effect on UDP flows. WRED can be used to set the RFC 3168 IP ECN bits to indicate that the network is experiencing congestion.

Recommendations and Guidelines

  • Critical applications like VoIP require service guarantees regardless of network conditions. This requires enabling queueing on all nodes with a potential for congestion.
  • A large number of applications end up in the default class; reserve 25% for this default Best Effort class
  • For a link carrying a mix of voice, video and data traffic, limit the priority queue to 33% of the link bandwidth
  • Enable LLQ if real-time, latency sensitive traffic is present
  • Use WRED for congestion avoidance on TCP flows but evaluate whether it has any effect on UDP flows
  • Use DSCP-based WRED wherever possible
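
A hedged sketch of an egress queuing policy following these recommendations (class names and percentages are examples and assume the class-maps are already defined):

! Hypothetical WAN-edge egress queuing: LLQ capped at 33%, 25% reserved for
! Best Effort, DSCP-based WRED on the TCP-oriented data class
policy-map WAN-EDGE-QUEUING
 class REALTIME
  priority percent 33
 class TRANSACTIONAL-DATA
  bandwidth percent 20
  fair-queue
  random-detect dscp-based
 class class-default
  bandwidth percent 25
  fair-queue
  random-detect
interface GigabitEthernet0/0
 service-policy output WAN-EDGE-QUEUING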

Bandwidth Reservation Tools

Measurement based: Counting mechanism to only allow a limited number of calls (sessions). Normally statically configured by an administrator.

Resource based: Based on the availability of resources in the network, usually bandwidth. Uses the current status of the network to base its decision.

Resource Reservation Protocol (RSVP) is a resource based protocol, commonly used with MPLS-TE. The drawback of RSVP is that it requires a lot of state in the devices.

Admission control (AC) functionality is most effectively deployed at the application level, such as with Cisco Unified Communications Manager (CUCM). It works well in networks with limited complexity and where flows are of predictable bandwidth.

RSVP can be used in combination with Diffserv in an Intserv/Diffserv model where RSVP is only responsible for admission control and Diffserv for the queuing.

An RSVP proxy can be used because end devices such as phones and video endpoints usually don't support the RSVP stack. A router closest to the endpoint is then used as a proxy, together with CUCM, to act as an AC mechanism.

Recommendations and Guidelines

Cisco recommends using RSVP Intserv/Diffserv model with a router-based proxy device. This allows for scaling of policies together with a dynamic network aware AC.
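
A minimal sketch of the Intserv/Diffserv RSVP model on an IOS interface, assuming RSVP only performs admission control while MQC policies handle the queuing (the bandwidth value is an example):

! Hypothetical: admit up to 5 Mbit/s of reservations, leave classification
! and scheduling to the Diffserv (MQC) policies
interface GigabitEthernet0/0
 ip rsvp bandwidth 5000
 ip rsvp data-packet classification none
 ip rsvp resource-provider none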

IPv6 and QoS

IPv6 headers are larger in size so bandwidth consumption for small packet sizes is higher. IPv4 header is normally 20 bytes but IPv6 is 40 bytes. IPv6 has a 20-bit Flow Label field and 8-bit Traffic Class field.

Medianet

Modern applications can be difficult to classify and can consist of multiple types of traffic. Webex provides text, audio, instant messaging, application sharing and desktop video conferencing through the same application. NBAR2 can be used to identify applications.

Application Visibility and Control (AVC)

Consists of NBAR2, Flexible Netflow (FNF) and MQC. NBAR2 is used to identify traffic through Deep Packet Inspection (DPI), FNF reports on usage and MQC is used for the configuration.

FNF uses Netflow v9 and IPFIX to export flow record information. It can monitor L2 to L7 and identify applications by port and through NBAR2. When using NBAR2, CPU and memory usage may increase significantly; the same is true for FNF. Consider the performance impact before deploying them.

QoS Requirements and Recommendations by Application Class

Voice requirements:

  • One-way latency should be no more than 150 ms
  • One-way peak-to-peak jitter should be no more than 30 ms
  • Per-hop peak-to-peak jitter should be no more than 10 ms
  • Packet loss should be no more than 1%
  • A range of 20 – 320 Kbps of guaranteed priority bandwidth per call (depends on sampling rate, codec and L2 overhead)

Voice recommendations:

  • Mark to Expedited Forwarding (EF) / DSCP 46
  • Treat with EF PHB (priority queuing)
  • Voice should be admission controlled

Jitter buffers may be used to reduce the effects of jitter; however, they add delay. Voice packets are constant in size, which means bandwidth can be provisioned accurately. Don't forget to account for L2 overhead.

Broadcast video requirements:

  • Packet loss should be no more than 0.1%

Broadcast video recommendations:

  • Mark to CS5 / DSCP 40
  • May be treated with EF PHB (priority queuing)
  • Should be admission controlled

Flows are usually unidirectional and include application level buffering. Does not have strict jitter or latency requirements.

Real-time interactive video requirements:

  • One-way latency should be no more than 200 ms
  • One-way peak-to-peak jitter should be no more than 50 ms
  • Per-hop peak-to-peak jitter should be no more than 10 ms
  • Packet loss should be no more than 0.1%
  • Provisioned bandwidth depends on codec, resolution, frame rates, additional data components and network overhead

Real-time interactive video recommendations:

  • Should be marked with CS4 / DSCP 32
  • May be treated with an EF PHB (priority queuing)
  • Should be admission controlled

Multimedia conferencing requirements:

  • One-way latency should be no more than 200 ms
  • Packet loss should be no more than 1%

Multimedia conferencing recommendations:

  • Mark to AF4 class (AF41/AF42/AF43 or DSCP 34/36/38)
  • Treat with AF PHB with guaranteed bandwidth and DSCP-based WRED
  • Should be admission controlled

Multimedia streaming requirements:

  • One-way latency should be no more than 400 ms
  • Packet loss should be no more than 1%

Multimedia streaming recommendations:

  • Should be marked to AF3 class (AF31/AF32/AF33 or DSCP 26/28/30)
  • Treat with AF PHB with guaranteed bandwidth and DSCP-based WRED
  • May be admission controlled

Data applications can be divided into Transactional Data (low latency) or Bulk Data (high throughput).

Transactional data recommendations:

  • Should be marked to AF2 class (AF21/AF22/AF23 or DSCP 18/20/22)
  • Treat with AF PHB with guaranteed bandwidth and DSCP-based WRED

This class may be subject to policing and remarking. Applications in this class can be Enterprise Resource Planning (ERP) or Customer Relationship Management (CRM).

Bulk data recommendations:

  • Should be marked to AF1 class (AF11/AF12/AF13 or DSCP 10/12/14)
  • Treat with AF PHB with guaranteed bandwidth and DSCP-based WRED
  • Deployed in a moderately provisioned queue to provide a degree of bandwidth constraint during congestion, to prevent long TCP sessions from dominating network bandwidth

Example applications are e-mail, backup operations, FTP/SFTP transfers, video and content distribution.

Best effort data recommendations:

  • Mark to DF (DSCP 0)
  • Provision in dedicated queue
  • May be provisioned with guaranteed bandwidth allocation and WRED/RED

Scavenger traffic recommendations:

  • Should be marked to CS1 (DSCP 8)
  • Should be assigned a minimally provisioned queue

Example traffic is YouTube, Xbox Live/360 movies, iTunes and BitTorrent.

Control plane traffic can be divided into Network Control, Signaling and Operations/Administration/Management (OAM).

Network Control recommendations:

  • Should be marked to CS6 (DSCP 48)
  • May be assigned a moderately provisioned guaranteed bandwidth queue

Do not enable WRED. Example traffic is EIGRP, OSPF, BGP, HSRP and IKE.

Signaling traffic recommendations:

  • Should be marked to CS3 (DSCP 24)
  • May be assigned a moderately provisioned guaranteed bandwidth queue

Do not enable WRED. Example traffic is SCCP, SIP and H.323.

OAM traffic recommendations:

  • Should be marked to CS2 (DSCP 16)
  • May be assigned a moderately provisioned guaranteed bandwidth queue

Do not enable WRED. Example traffic is SSH, SNMP, Syslog and HTTP/HTTPS.

QoS Design Recommendations:

  • Always enable QoS in hardware as opposed to software if possible
  • Classify and mark as close to the source as possible
  • Use DSCP markings where available
  • Follow standards based DSCP PHB markings
  • Police flows as close to source as possible
  • Mark down traffic according to standards based rules if possible
  • Enable queuing at every node that has potential for congestion
  • Limit LLQ to 33% of link capacity
  • Use AC mechanism for LLQ
  • Do not enable WRED for LLQ
  • Provision at least 25% for Best Effort traffic

QoS Models:

Four-Class Model:

  • Voice
  • Control
  • Transactional Data
  • Best Effort

Eight-Class Model:

  • Voice
  • Multimedia-conferencing
  • Multimedia-streaming
  • Network Control
  • Signaling
  • Transactional Data
  • Best Effort
  • Scavenger

Twelve-Class Model:

  • Voice
  • Broadcast Video
  • Real-time interactive
  • Multimedia-conferencing
  • Multimedia-streaming
  • Network Control
  • Signaling
  • Management/OAM
  • Transactional Data
  • Bulk Data
  • Best Effort
  • Scavenger

This picture shows how the smaller models can be expanded into the larger models, or vice versa.

[Figure: mapping between the four-, eight- and twelve-class QoS models]

Campus QoS Design Considerations and Recommendations:

The primary role of QoS in campus networks is not to control latency or jitter, but to manage packet loss. Endpoints normally connect to the campus at high speeds; it may take only a few milliseconds of congestion to overrun the buffers of switches/linecards/routers.

Trust Boundaries:

Conditionally trusted endpoints: Cisco IP phones, Cisco Telepresence, Cisco IP video surveillance cameras, Cisco digital media players.

Trusted endpoints: Centrally administered PCs and endpoints, IP video conferencing units, managed APs, gateways and other similar devices.

Untrusted endpoints: Unsecure PCs, printers and similar devices.
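
As an illustration of a conditional trust boundary, a hedged sketch for an older Catalyst access switch (syntax varies by platform and newer switches use different commands; the interface is an example):

! Hypothetical: extend trust only if a Cisco IP phone is detected via CDP,
! otherwise the port stays untrusted
mls qos
interface GigabitEthernet1/0/1
 mls qos trust device cisco-phone
 mls qos trust cos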

Port-Based QoS versus VLAN-based QoS versus Per-Port/Per-VLAN QoS

Design recommendations:

  • Use port-based QoS when simplicity and modularity are the key design drivers
  • Use VLAN-based QoS when looking to scale policies for classification, trust and marking
  • Do not use VLAN-based QoS to scale (aggregate) policing policies
  • Use per-port/per-VLAN when supported and policy granularity is the key design driver

EtherChannel QoS

  • Load balance based on source and destination IP or what is expected to give the best distribution of traffic
  • Be aware that multiple real-time flows may end up on the same physical link, oversubscribing the real-time queue

EtherChannel QoS will vary by platform and some policies are applied to the bundle and some to the physical interface.

Ingress QoS Models:

Design recommendations:

  • Deploy ingress QoS models such as trust, classification and policing on all access edge ports
  • Deploy ingress queuing (if supported and required)

The probability for congestion on ingress is less than on egress.

Egress QoS Models:

Design recommendations:

  • Deploy egress queuing policies on all switch ports
  • Use a queuing structure with at least 1 priority queue and 3 normal queues (1P3Q) or better

Enable trust on ports leading to network infrastructure and similar devices.

Trusted Endpoint:

  • Trust DSCP
  • Optional ingress marking and/or policing
  • Minimum 1P3Q

Untrusted Endpoint:

  • No trust
  • Optional ingress marking and/or policing
  • Minimum 1P3Q

Conditionally Trusted Endpoint:

  • Conditional trust with trust CoS
  • Optional ingress marking and/or policing
  • Minimum 1P3Q

Switch to Switch/Router Port QoS:

  • Trust DSCP
  • Minimum 1P3Q

Control Plane Policing

Can be used to harden the network infrastructure. Packets handled by the main CPU typically include the following:

  • Routing protocols
  • Packets destined to the local IP of the router
  • Packets from management protocols such as SNMP
  • Interactive access protocols such as Telnet and SSH
  • ICMP or packets with IP options may have to be handled by CPU
  • Layer two packets such as BPDUs, CDP, DTP and so on
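
A hedged sketch of a basic CoPP policy (the ACL contents and rate are hypothetical; a real policy needs classes for routing protocols, management traffic and so on):

! Hypothetical: rate-limit SSH traffic punted to the control plane
ip access-list extended COPP-SSH
 permit tcp any any eq 22
class-map match-all COPP-SSH
 match access-group name COPP-SSH
policy-map COPP
 class COPP-SSH
  police 512000 conform-action transmit exceed-action drop
control-plane
 service-policy input COPP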

Wireless QoS

The 802.11e Working Group (WG) proposed QoS enhancements to the 802.11 standard; these were incorporated into IEEE 802.11-2007 and later revised in IEEE 802.11-2012. The Wi-Fi Alliance has a compatibility certification called Wi-Fi Multimedia (WMM).

In Wi-Fi networks, only one station may transmit at a time, a physical constraint that does not exist on wired networks. The Radio Frequency (RF) spectrum is shared between devices, similar to a hub environment. Wireless networks also operate at variable speeds.

Distributed Coordination Function (DCF) is responsible for scheduling and transmitting frames onto the wireless medium.

Wireless uses Carrier Sense Multiple Access/Collision Avoidance (CSMA/CA), which actively tries to avoid collisions. A wireless client waits a random period before sending traffic to try to avoid collisions.

DCF evolved to Enhanced Distributed Channel Access (EDCA) which is a MAC layer protocol. It has the following additions compared to DCF:

  • Four priority queues, or access categories
  • Different interframe spacing for each AC as compared to a single fixed value for all traffic
  • Different contention window for each AC
  • Transmission Opportunity (TXOP)
  • Call admission control (TSpec)

An 802.11e frame uses a 3-bit field known as User Priority (UP) for traffic marking. It is analogous to 802.1p CoS. One difference is that voice is marked with UP 6 as compared to CoS 5.

Interframe spacing is the time a client needs to wait before starting to send traffic; the wait time is lower for higher priority traffic.

The contention window is used when the wireless medium is not free; higher priority traffic waits a shorter period of time before trying to send again than lower priority traffic.

TXOP is a period of time during which the client is allowed to send, so that it doesn't hog the medium for a long period of time.

TSpec is used for admission control: the client sends its requirements, such as data rate and frame size, to the AP, and the AP only admits it if there is available bandwidth.

Upstream QoS is packets from the wireless network onto the wired network. Downstream QoS is packets from the wired network onto the wireless network.

Wireless markings may not be consistent with wired markings, so mapping may be needed to place traffic into the correct classes on the wired network.

Upstream QoS:

  1. The 802.11e UP marking on the upstream frame from client to AP is translated to a DSCP value on the outer CAPWAP header. The inner DSCP marking is preserved
  2. After the CAPWAP packet is decapsulated at the WLC, the original IP header's DSCP value is used to derive the 802.1p CoS value

Downstream QoS:

  1. A frame with an 802.1p CoS marking arrives at a WLC wired interface. The DSCP value of the IP packet is used to set the DSCP of the outer CAPWAP header.
  2. The DSCP value of the CAPWAP header is used to set the 802.11e UP value on the wireless frame

The 802.1p CoS value is not used in the above process.

Data Center QoS

Primary goal is to manage packet loss. A few milliseconds of traffic during congestion can cause buffer overruns.

Various data center designs have different QoS needs. These are a few data center architectures:

  • High-Performance Trading (HPT)
  • Big data architectures, including High-Performance Computing (HPC), High-Throughput Computing (HTC) and grid data
  • Virtualized Multiservice Data Center (VMDC)
  • Secure Multitenant Data Center (SMDC)
  • Massively Scalable Data Center (MSDC)

High-Performance Trading:

Minimal or no QoS requirements because the goal of the architecture is to introduce as little delay as possible using low latency platforms such as the Nexus.

Big Data (HPC/HTC/Grid) Architectures

Have similar QoS needs as a campus network. The goal is to process large and complex data sets that are too difficult to handle by traditional data processing applications.

High-Performance Computing: Uses large amounts of computing power for a short period of time. Often measured in Floating-point Operations Per Second (FLOPS)

High-Throughput Computing: Also uses large amounts of computing power, but over a longer period of time. More focused on operations per month or year.

Grid: A federation of computer resources from multiple locations to reach a common goal. A distributed system with noninteractive workloads that involve a large number of files. Compared to HPC, Grid is usually more heterogeneous, loosely coupled and geographically dispersed.

Virtualized Multiservice Data Center (VMDC):

VMDC comes with unique requirements due to compute and storage virtualization, including provisioning a lossless Ethernet service.

  • Applications no longer map to physical servers (or cluster of servers)
  • Storage is no longer tied to a physical disk (or array)
  • Network infrastructure is no longer tied to hardware

Lossless compute and storage virtualization protocols such as RoCE and FCoE need to be supported as well as Live Migration/vMotion.

Secure Multitenant Data Center (SMDC):

Virtualization is leveraged to support multitenants over a common infrastructure and this affects the QoS design. SMDC has similar needs as VMDC but a different marking model.

Massively Scalable Data Center:

A framework used to build elastic data centers that host a few applications distributed across thousands of servers. Geographically distributed, homogeneous pools of compute and storage. The goal is to maximize throughput. It is common to use a leaf and spine design.

Data Center Bridging Toolset

IEEE 802.1 Data Center Bridging Task Group has defined enhancements to Ethernet to support requirements of converged data center networks.

  • Priority flow control (IEEE 802.1Qbb)
  • Enhanced transmission selection (IEEE 802.1Qaz)
  • Congestion notification (IEEE 802.1Qau)
  • DCB exchange (DCBX) (IEEE 802.1Qaz combined with 802.1AB)

Priority Flow Control (802.1Qbb): PFC provides a link-level flow control mechanism that can be controlled independently for each 802.1p CoS priority. The goal is to provide zero frame loss due to congestion in DCB networks while mitigating Head of Line (HoL) blocking. Uses PAUSE frames.

Skid Buffers

Buffer management is critical to PFC, if transmit or receive buffers are overflowed, transmission will not be lossless. A switch needs sufficient buffers to:

  • Store frames sent during the time it takes to send the PAUSE frame across the network between stations
  • Store frames that are already in transit when the sender receives the PFC PAUSE frame

The buffers used for this are called skid buffers and usually engineered on a per port basis in hardware on ingress.

An incast flow is a flow from many senders to one receiver.

Virtual Output Queuing (VOQ)

Artificially induce congestion on ingress ports where there is an incast flow going to a host. This lessens the need for deep buffers on egress. VOQ absorbs congestion at every ingress port and optimizes switch buffering capacity for incast flows. Traffic does not consume fabric bandwidth only to be dropped on the egress port.

Enhanced Transmission Selection – IEEE 802.1Qaz

Uses a virtual lane concept on a DCB-enabled NIC, also called a Converged Network Adapter (CNA). Each virtual interface queue is accountable for managing its allotted bandwidth for its traffic group. If a group is not using all its bandwidth, it may be used by other groups.

ETS virtual interface queues can be serviced as follows:

  • Priority – a virtual lane can be assigned a strict priority service
  • Guaranteed bandwidth – a percentage of the physical link capacity
  • Best effort – the default virtual lane service

Congestion Notification IEEE 802.1Qau

A layer two traffic management system that pushes congestion to the edge of the network by instructing rate limiters to shape the traffic that is causing congestion. The congestion point, such as a distribution switch connecting to several access switches, can instruct these switches, called reaction points, to throttle the traffic by sending control frames.

Data Center Bridging Exchange (DCBX) IEEE 802.1Qaz + 802.1AB

DCB capabilities:

  • DCB peer discovery
  • Mismatched configuration detection
  • DCB link configuration of peers

The following DCB parameters can be exchanged by DCBX:

  • PFC
  • ETS
  • Congestion notification
  • Applications
  • Logical link-down
  • Network interface virtualization

DCBX can be used between switches and with some endpoints.

Data Center Transmission Control Protocol (DCTCP)

A goal of the data center is to maximize goodput, which is the application-level throughput excluding protocol overhead. Goodput is reduced by TCP flow control and congestion avoidance, specifically TCP slow start.

DCTCP is based on two key concepts:

  • React in proportion to the extent of congestion, not its presence – this reduces variance in sending rates
  • Mark ECN based on instantaneous queue length – this enables fast feedback and corresponding window adjustments to better deal with bursts

Considerations affecting the marking model to be used in the data center include the following:

  • Data center applications and protocols
  • CoS/DSCP marking
  • CoS 3 overlapping considerations
  • Application-based marking models
  • Application- and tenant-based marking models

Data Center Applications and Protocols

Recommendations:

  • Consider what applications/protocols are present in the data center and may not already be reflected in the enterprise QoS model and how these may be integrated
  • Consider what applications/protocols may not be present or have a significantly reduced presence in the DC

Compute Virtualization Protocols:

Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE):
Supports direct memory access of one computer into another over converged Ethernet without involving either one's operating system. Permits high-throughput, low-latency networking, especially useful in massively parallel computer clusters. It's a link layer protocol that allows communication between any two hosts in the same broadcast domain. RoCE requires lossless service via PFC. When implemented along with FCoE, it should be assigned its own no-drop class/virtual lane, such as CoS 4. Other applications such as video using CoS 4 need to be reassigned to improve RoCE performance.

Internet Wide Area RDMA Protocol (iWARP):
Extends the reach of RDMA over IP networks. Does not require lossless service because it runs over TCP or SCTP, which use reliable transport. It can be marked to an unused CoS/DSCP or combined with internetwork control (CS6/CoS 6) or network control (CS7/CoS 7).

Virtual machine control and live migration protocols (VM control):
Virtual Machines (VMs) require control traffic to be passed between hypervisors. VM control is control plane traffic and should be marked to CoS 6 or CoS 7, depending on QoS model in use.

Live migration:
Protocols that support the process of moving a running VM (or application) between different physical machines without disconnecting the client or the application. Memory, storage and network connections are moved from the original host machine to the destination, a common example being vMotion. It can be argued to be a candidate for internetwork control (CoS 6) due to being a control plane protocol, but it sends too much traffic to be put in that class. Use an available marking or combine with CoS 4, CoS 2 or even CoS 1.

Storage Virtualization Protocols:

Fibre Channel over Ethernet (FCoE):
Encapsulates Fibre Channel (FC) frames over Ethernet networks. FCoE is a layer two protocol that can't be natively routed; it requires lossless service via PFC and is usually marked with CoS 3, which should be dedicated to FCoE.

Internet Protocol Small Computer System Interface (iSCSI):
Encapsulates SCSI commands within IP to enable data transfers. Can be used to transmit data over LANs, WANs or even the Internet and can enable location-independent data storage and retrieval. Does not require lossless service due to using TCP. Can be provisioned in a dedicated class or in another class such as CoS 2 or CoS 1.

CoS/DSCP Marking:

Recommendations:

  • Some layer two protocols within the DC require CoS marking
  • CoS marking has limitations so consider a hybrid CoS/DSCP model (when supported)

CoS 3 Overlap Considerations and Tactical Options:

Recommendations:

  • Recognize the potential overlap of signaling (and multimedia streaming) markings with FCoE
  • Select a tactical option to address this overlap

Signaling is normally marked with CoS 3, but so is FCoE. Some administrators prefer to dedicate CoS 3 to FCoE, but that leaves the question of what to do with signaling. Options to handle the overlap:

Hardware Isolation:
Some platforms and interface modules do not support FCoE, such as the Nexus 7k M-Series modules, while the F-Series modules do. M-Series modules can connect to CUCM and multimedia streaming servers, and F-Series modules to the DCB extended fabric supporting FCoE.

Layer 2 Versus Layer 3 Classification:
Signaling and multimedia streaming can be classified by DSCP values (CS3 and AF3) to be assigned to queues and FCoE can be classified by CoS 3 to its own dedicated queue.

Asymmetrical CoS/DSCP Marking:
Asymmetrical meaning that the three bits forming the CoS do not match the first three bits of the DSCP value. Signaling could be marked with CoS 4 but DSCP CS3.

DC/Campus DSCP Mutation:
Perform ingress and egress DSCP mutation on data center to campus links. Signaling and multimedia streams can be assigned DSCP values that map to CoS 4 (rather than CoS 3).

Coexistence:
Allow signaling and FCoE to coexist in CoS 3. The reasoning is that if the CUCM server has a CNA, both signaling and FCoE will be provided a lossless service.

Data Center QoS Models:

Trusted Server Model:
Trust L2/L3 markings sent on application servers. Only approved servers should be deployed in the DC.

Untrusted Server Model:
Do not trust markings, reset markings to 0.

Single-Application Server Model:
Same as the untrusted server model, but traffic is remarked to a nonzero value.

Multi-Application Server Model:
Access-lists are used for classification and traffic is marked to multiple codepoints. Application server does not mark traffic at all or it marks it to different values than the enterprise QoS model.

Server Policing Model:
One or more application classes are metered via one-rate or two-rate policers, with conforming, exceeding and optionally violating traffic marked to different DSCP values.

Lossless Transport Model:
Provision lossless service to FCoE.

Trusted Server/Network Interconnect:

  • Trust CoS/DSCP
  • Ingress queuing
  • Egress queuing

Untrusted Server:

  • Set CoS/DSCP to 0
  • Ingress queuing
  • Egress queuing

Single-App Server:

  • Set CoS/DSCP to non zero value
  • Ingress queuing
  • Egress queuing

Multi-App Server:

  • Classify by ACL
  • Set CoS/DSCP values
  • Ingress queuing
  • Egress queuing

Policed Server:

  • Police flows
  • Remark/drop
  • Ingress queuing
  • Egress queuing

Lossless Transport:

  • Enable PFC
  • Enable ETS
  • Enable DCBX
  • Ingress queuing
  • Egress queuing

WAN & Branch QoS Design Considerations & Recommendations:

The roles of QoS at the WAN and branch edge include the following:

  • Managing packet loss (and jitter) with queuing policies
  • Enhancing classification granularity by leveraging deep packet inspection engines

Packet jitter is most apparent at the WAN/branch edge because of the downshift in link speeds.

Latency and Jitter:
Recommendations:

  • Choose service provider paths to target 150 ms for one-way latency. If this target can’t be met, 200 ms is generally acceptable
  • Only queuing delay is manageable by QoS policies

Network latency consists of:

  • Serialization delay (fixed)
  • Propagation delay (fixed)
  • Queuing delay (variable)

Serialization delay is the time it takes to convert a layer two frame into electrical or optical pulses onto the transmission media. The delay is fixed and a function of the line rate.

Propagation delay is also fixed and a function of the physical distance between endpoints. The gating factor is the speed of light, 300,000 km/s in a vacuum; the speed in fiber circuits is roughly a third slower. A common rule of thumb for propagation delay is approximately 6.3 microseconds per km. Over long distances, propagation delay makes up most of the network delay.

Queuing delay is variable and a function of whether a node is congested or not and if scheduling policies have been applied to resolve congestion events.

Tx-Ring:
Recommendation:

  • Be aware of the Tx-Ring function and depth; tune only if necessary

The Tx-Ring is the final IOS output buffer for an interface, it’s a relatively small FIFO queue that maximizes physical link bandwidth utilization by matching the outbound packet rate on the router with the physical interface rate. If the size of the Tx-Ring is too large, packets will be subject to latency and jitter while waiting to be served. If the Tx-Ring is too small the CPU will be continually interrupted, causing higher CPU usage.

LLQ:
Recommendations:

  • Use a dual-LLQ design when deploying voice and real-time video applications
  • Limit sum of all LLQs to 33% of bandwidth
  • Tune the burst parameter if needed

Some applications like Telepresence may be bursty by nature, the burst value may have to be adjusted to account for this.
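
A hedged sketch of a dual-LLQ design in IOS MQC (class names and percentages are examples; the class-maps are assumed to match EF and CS4 traffic respectively, and the two priority classes together stay within the 33% guideline):

! Hypothetical dual-LLQ: separate priority queues for voice and video so that
! bursty video cannot starve voice
policy-map DUAL-LLQ
 class VOICE
  priority percent 10
 class REALTIME-VIDEO
  priority percent 23
 class class-default
  bandwidth percent 25
  fair-queue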

WRED:
Recommendations:

  • Optionally tune WRED thresholds as required
  • Optionally enable ECN

To match behavior of AF PHB defined in RFC 2597 use these values:

  • Set minimum WRED threshold for AFx3 to 60% of queue depth
  • Set minimum WRED threshold for AFx2 to 70% of queue depth
  • Set minimum WRED threshold for AFx1 to 80% of queue depth
  • Set all WRED maximum thresholds to 100%
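
A hedged sketch of tuning DSCP-based WRED to approximate these thresholds, assuming a queue depth of 64 packets (the class name, bandwidth and queue depth are illustrative):

! Hypothetical: with queue-limit 64, thresholds of 60/70/80% of queue depth
! give minimums of roughly 38, 45 and 51 packets
policy-map WAN-EDGE-WRED
 class MULTIMEDIA-STREAMING
  bandwidth percent 10
  queue-limit 64 packets
  random-detect dscp-based
  random-detect dscp 30 38 64
  random-detect dscp 28 45 64
  random-detect dscp 26 51 64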

RSVP
Recommendations:

  • Enable RSVP for dynamic network-aware admission control requirements
  • Use the Intserv/Diffserv RSVP model to increase efficiency and scalability
  • Use application-identification RSVP policies for greater policy granularity

Ingress QoS Models
Recommendations:

  • DSCP is trusted by default in IOS
  • Enable ingress classification with NBAR2 on LAN edges, as required
  • Enable ingress/internal queuing, if required

Egress QoS Models
Recommendations:

  • Deploy egress queuing policies on all WAN edge interfaces
  • Egress queuing policies may not be required on LAN edge interfaces

Recommendation for queues:
LLQ:

  • Limit the sum of all LLQs to 33%
  • Use an admission control mechanism
  • Do not enable WRED

Multimedia/Data:

  • Provision guaranteed bandwidth according to application requirements
  • Enable fair-queuing presorters
  • Enable DSCP-based WRED

Control:

  • Provision guaranteed bandwidth according to control traffic requirements
  • Do not enable presorters
  • Do not enable WRED

Scavenger:

  • Provision with a minimum bandwidth allocation such as 1%
  • Do not enable presorters
  • Do not enable WRED

Default/Best effort:

  • Allocate at least 25% for the default/Best effort queue
  • Enable fair-queuing pre-sorters
  • Enable WRED

WAN and Branch Interface QoS Roles:

WAN aggregator LAN edge:

  • Ingress DSCP trust should be enabled
  • Ingress NBAR2 classification and marking policies may be applied
  • Ingress Medianet metadata classification and marking policies may be applied
  • Egress LLQ/CBWFQ/WRED policies may be applied (if required)

WAN aggregator WAN edge:

  • Ingress DSCP trust should be enabled
  • Egress LLQ/CBWFQ/WRED policies should be applied
  • RSVP policies may be applied
  • Additional VPN specific policies may be applied

Branch WAN edge:

  • Ingress DSCP trust should be enabled
  • Egress LLQ/CBWFQ/WRED policies should be applied
  • RSVP policies may be applied
  • Additional VPN specific policies may be applied

Branch LAN edge:

  • Ingress DSCP trust should be enabled
  • Ingress NBAR2 classification and marking policies may be applied
  • Ingress Medianet metadata classification and marking policies may be applied
  • Egress LLQ/CBWFQ/WRED policies may be applied (if required)

MPLS VPN QoS Design Considerations & Recommendations
The role of QoS over MPLS VPNs may include the following:

  • Shaping traffic to contracted service rates
  • Performing hierarchical queuing and dropping within these shaped rates
  • Mapping enterprise classes to the service provider classes
  • Policing traffic according to contracted rates
  • Restoring packet markings

MEF Ethernet Connectivity Services

E-line:
A service connecting two customer Ethernet ports over a WAN. It is based on a point-to-point Ethernet Virtual Connection (EVC).

Ethernet Private Line(EPL):
A basic point-to-point service characterized by low frame delay, frame delay variation and frame loss ratio. Service multiplexing is not allowed. No CoS bandwidth profiling is allowed, only a Committed Information Rate (CIR).

Ethernet Virtual Private Line (EVPL):
Multiplexing of EVCs is allowed. The individual EVCs can be defined with different bandwidth profiles and layer two control processing methods.

E-LAN:
A multipoint service connecting customer endpoints and acting as a bridged Ethernet network. It is based on multipoint EVC and service multiplexing is allowed. It can be configured with a CIR, Committed Burst Size (CBS) and Excess Information Rate (EIR).

E-Tree:
A point-to-multipoint version of the E-LAN; essentially it's a hub and spoke topology where the spokes can only communicate with the hub but not with each other. Common for franchise operations.

Sub-Line-Rate Ethernet Design Implications
Recommendations:

  • Sub line rate may require hierarchical shaping with nested queuing policies
  • Configure the CE shaper’s Committed Burst (Bc) value to be no more than half of the SP’s policer Bc

If the Bc of the shaper is set too high, packets may be dropped by the policer even though the shaper is shaping to the CIR of the service.

When using a sub-line rate there will be no congestion on the interface; congestion is artificially induced by using a shaper and then a nested policy for the queuing. This may be referred to as Hierarchical QoS (HQoS).
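
A hedged sketch of such a hierarchical (nested) policy on a CE, assuming a 50 Mbit/s sub-line-rate service on a faster physical interface; the names and values are examples, and WAN-EDGE-QUEUING stands in for a child queuing policy like the ones sketched earlier:

! Hypothetical: the parent shaper induces back-pressure at the 50 Mbit/s CIR;
! Bc is kept at 250,000 bits, no more than half of the SP policer's Bc
policy-map CE-HQOS
 class class-default
  shape average 50000000 250000
  service-policy WAN-EDGE-QUEUING
interface GigabitEthernet0/0
 service-policy output CE-HQOS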

QoS Paradigm Shift
Recommendation:

  • Enterprises and service providers must cooperate to jointly administer QoS over MPLS VPNs

MPLS VPNs offer a full mesh of connectivity between campus and branch networks. This fully meshed connectivity has implications for the QoS design. Previously WANs were usually point-to-point or hub and spoke which made the QoS design simpler. Branch to branch traffic would pass through the hub which controlled the QoS.

When using MPLS VPNs, traffic from branch to branch will not pass through the hub, meaning that QoS needs to be deployed on all the branches as well. However, this is not enough; contending traffic may not be coming from the same site, it could be coming from any site. To overcome this, the service provider needs to deploy QoS policies on the PE routers that are compatible with the enterprise policies. This is a paradigm shift in QoS administration and requires the enterprise and SP to jointly administer the QoS policies.

Service Provider Class of Service Models
Recommendations:

  • Fully understand the CoS models of the SP
  • Select the model that most closely matches your strategic end-to-end model

MPLS DiffServ Tunneling Modes
Recommendations:

  • Understand the different MPLS Diffserv tunneling modes and how they affect customer DSCP markings
  • Short pipe mode offers enterprise customers the most transparency and control of their traffic classes

Uniform Mode
Recommendation:

  • If the provider uses uniform mode, be aware that your packets' DSCP values may be remarked

Uniform mode is generally used when the customer and SP share the same Diffserv domain, which would be the case for an enterprise deploying MPLS.

Uniform mode is the default mode. The first three bits of the IP ToS field are mapped to the MPLS EXP bits on the ingress PE when it adds the label. If a policer or other mechanism remarks the MPLS EXP value, this value is copied to lower-level labels, and at the egress PE the MPLS EXP value is used to set the IPP value.

Short Pipe Mode

It is used when the customer and SP are in different Diffserv domains. This mode is useful when the SP wants to enforce its own Diffserv policy but the customer wants its Diffserv information to be preserved across the MPLS VPN.

The ingress PE sets the MPLS EXP value based on the SP's policies. Any remarking will only propagate to the MPLS EXP bits of labels, but not to the IPP bits of the customer's IP packet. On egress, queuing is based on the IPP marking of the customer's packet, giving the customer maximum control.

Pipe Mode

Pipe mode is the same as short pipe mode, except that queuing at the egress PE is based on the MPLS EXP bits and not on the customer's IPP marking.

Enterprise-to-Service Provider Mapping
Recommendation:

  • Map the enterprise application classes to the SP CoS classes as efficiently as possible

Enterprise to service provider mapping considerations include the following:

  • Mapping real-time voice and video traffic
  • Mapping signaling and control traffic
  • Separating TCP-based applications from UDP-based applications (where possible)
  • Remarking and restoring packet markings (where required)

Mapping Real-Time Voice and Video
Recommendation:

  • Balance the service level requirements for real-time voice and video with the SP premium for real-time bandwidth
  • In either scenario, use a dual LLQ policy at CE egress edge

SPs often offer only a single real-time CoS. If you are deploying both real-time voice and video, you will have to choose whether or not to put the video in the real-time class. Putting both voice and video into the real-time class may be costly or even cost prohibitive. You should still use a dual LLQ at the CE edge, since that is under your control and that way you can protect voice from video. Downgrading video to a non-real-time class may only produce slightly lower quality, which could be acceptable.

Mapping Control and Signaling Traffic
Recommendation:

  • Avoid mixing control plane traffic with data plane traffic in a single SP CoS

Signaling should be separated from data traffic if possible, since the signaling could get dropped if the class is oversubscribed, producing voice/video instability. If the SP does not offer enough classes to put signaling in its own class, consider putting it in the real-time class, since these flows are lightweight but critical.

Separating TCP from UDP
Recommendation:

  • Separate TCP traffic from UDP traffic when mapping to SP CoS classes

It is generally best to not mix TCP-based traffic with UDP-based traffic (especially if the UDP traffic is streaming video such as broadcast video) within a single SP CoS. These protocols behave differently under congestion. Some UDP applications may have application-level windowing, flow control and retransmission capabilities but most UDP transmitters are oblivious to drops and don’t lower transmission rates due to dropping.

When TCP and UDP share a SP CoS and that class experiences congestion, the TCP flows continually lower their transmission rates, potentially giving up their bandwidth to UDP flows that are oblivious to drops. This is called TCP starvation/UDP dominance.

Even if enabling WRED the same behavior would be seen because WRED (primarily) manages congestion only on TCP-based flows.

Re-Marking and Restoring Markings
Recommendation:

  • Remark application classes on CE edge on egress (as required)
  • Restore markings on the CE edge on ingress via deep packet inspection policies (as required)

If packets need to be remarked to fit with the SP CoS model, do it at the CE edge on egress. This requires less of an effort than doing it in the campus.

To restore DSCP markings, traffic can be classified on ingress on the CE edge via DPI.
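
A hedged sketch of remarking at CE egress and restoring via DPI at CE ingress (the class, matched protocol and the SP's expected marking are hypothetical):

! Hypothetical: map the enterprise bulk class into the SP's data CoS on egress
class-map match-any BULK-APPS
 match protocol ftp
policy-map CE-EGRESS-REMARK
 class BULK-APPS
  set dscp af21
! ...and restore the enterprise marking on ingress using NBAR2 DPI
policy-map CE-INGRESS-RESTORE
 class BULK-APPS
  set dscp af11
interface GigabitEthernet0/1
 service-policy output CE-EGRESS-REMARK
 service-policy input CE-INGRESS-RESTORE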

MPLS VPN QoS Roles

CE LAN edge:

  • Ingress DSCP trust should be enabled (enabled by default)
  • Ingress NBAR2 classification and marking policies may be applied
  • Ingress Medianet metadata classification and marking policies may be applied
  • Egress LLQ/CBWFQ/WRED policies may be applied (if required)

CE VPN edge:

  • Ingress DSCP trust should be enabled (enabled by default)
  • Ingress NBAR2 classification and marking policies may be applied (to restore markings lost in transit)
  • Ingress Medianet metadata classification and marking policies may be applied (to restore markings lost in transit)
  • RSVP policies may be applied
  • Egress LLQ/CBWFQ/WRED policies should be applied
  • Egress hierarchical shaping with nested LLQ/CBWFQ/WRED policies may be applied
  • Egress DSCP remarking policies may be applied (used to map application classes into specific SP CoS)

PE customer-facing edge:

  • Ingress DSCP trust should be enabled (enabled by default)
  • Ingress policing policies to meter customer traffic should be applied
  • Ingress MPLS tunneling mode policies may be applied
  • Egress MPLS tunneling mode policies may be applied
  • Egress LLQ/CBWFQ/WRED policies should be applied

PE core-facing edge:

  • Ingress DSCP trust should be enabled (enabled by default)
  • Ingress policing policies to meter customer traffic should be applied
  • Egress MPLS EXP-based LLQ/CBWFQ policies should be applied
  • Egress MPLS EXP-based WRED policies may be applied

P edges:

  • Ingress DSCP trust should be enabled (enabled by default)
  • Egress MPLS EXP-based LLQ/CBWFQ policies may be applied
  • Egress MPLS EXP-based WRED policies may be applied

IPSEC QoS Design

Tunnel Mode

The default IPSEC mode of operation on Cisco IOS routers. The entire IP packet is protected by IPSEC; the sending VPN router encrypts the entire original IP packet and adds a new IP header to the packet. It supports multicast and routing protocols.

Transport Mode

Often used for encrypting peer-to-peer communications; does not encase the original IP packet into a new packet. Only the payload is encrypted, while the original IP header is preserved on the outside of the packet. Because the header is left intact, it's not possible to run multicast or routing protocols in transport mode.

IPSEC with GRE

GRE can be used to enable VPN services that connect disparate networks. It’s a key building block when using VRF Lite, a technology allowing related Virtual Routing and Forwarding (VRF) instances running on different routers to be interconnected across an IP network, while maintaining their separation from both the global routing table and other VRFs.

When using GRE as a VPN technology, it is often desirable to encrypt the GRE tunnel so that privacy and authentication of the connection can be ensured. GRE can be used with IPSEC tunnel mode or transport mode but if the tunnel transits a NAT or PAT device, tunnel mode is required.

Remote-Access VPNs

Cisco’s primary remote-access VPN client is AnyConnect Secure Mobility Client, which supports both IPSEC and Secure Sockets Layer (SSL) encryption.

AnyConnect uses Datagram Transport Layer Security (DTLS) to optimize real-time flows over an SSL encrypted tunnel. AnyConnect connects to a remote headend concentrator (such as an ASA firewall) through TCP-based SSL. All traffic from the client, including voice, video and data, traverses the SSL TCP connection. When TCP loses packets, it pauses and waits for them to be resent, which is not good for real-time UDP-based packets.

DTLS is a datagram technology, meaning it uses UDP packets instead of TCP. After AnyConnect establishes the TCP SSL tunnel, it also establishes a UDP-based DTLS tunnel which is reserved for the use of real-time applications. This allows RTP voice and video packets to be sent unhindered. In case of packet loss, the session does not pause.

The decision on which tunnel to send the packets over is dynamic and made by the AnyConnect client.

QoS Classification of IPsec Packets
Recommendation:

  • Understand the default behavior of Cisco VPN routers to copy the ToS byte from the inner packet to the VPN packet header

Cisco routers by default copy the ToS field from the original IP packet and write it into the new IPSEC packet header, thus allowing classification to still be accomplished by matching DSCP values. The same holds true for GRE packets. The IP packet is encrypted, so it’s not possible to match on other fields such as IP addresses, ports and protocol without using another feature.

The IOS Preclassify Feature
Recommendations:

  • Be aware of the limitations of QoS classification when using something other than the ToS byte
  • Use the IOS preclassify feature for all non-ToS types of QoS classification
  • As a best practice, enable this feature for all VPN connections

Normally, tunneling and encryption take place before QoS classification in the order of operations; QoS preclassify reverses this so that classification can be done on the IP header before it gets encrypted. Strictly speaking, the order isn’t really reversed: the router clones the original IP header and keeps it in memory so that it can be used for QoS classification after tunneling and encryption.

This feature is only applicable on the encrypting router’s outbound interface (physical or tunnel). Downstream routers can’t make decisions on the inner header because the packet is encrypted at that point. Always enable the feature, since tests have shown that enabling it has very little impact on the router’s performance.
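
A minimal sketch of enabling the feature, with hypothetical interface and crypto map names; the command can be applied on a tunnel interface or on a crypto map entry.

interface Tunnel0
 qos pre-classify
!
crypto map VPN-MAP 10 ipsec-isakmp
 qos pre-classify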

MTU Considerations
Recommendations:

  • Be aware that MTU issues can severely impact network connectivity and the quality of user experience in VPN networks

When tunneling technologies are used there is always the risk of exceeding the MTU somewhere in the path. Unless jumbo frames are available end-to-end, MTU issues will almost always need to be addressed when dealing with any kind of VPN technology. A common symptom of MTU issues is that applications using small packets, such as voice, work, while e-mail, file server connections and many other applications do not.

Path MTU Discovery (PMTUD) can be used to discover what the MTU is along the path but it relies on ICMP messages which may be blocked on intermediary devices.

TCP Adjust-MSS

TCP Maximum Segment Size (MSS) is the maximum amount of payload data that a host is willing to accept in a single TCP/IP datagram. During TCP connection setup between two hosts (TCP SYN), each side of the connection reports its MSS to the other. It’s the responsibility of the sending host to limit the size of the datagram to a value less than or equal to the receiving host’s MSS.

For an IP packet that is 1500 bytes and carries TCP, the MSS is 1460 bytes: the 20-byte IP header and the 20-byte TCP header are excluded from the 1500-byte packet.

Two hosts may not be aware that they are communicating through a tunnel and send a TCP SYN with an MSS of 1460 even though the path MTU is lower. TCP Adjust-MSS can rewrite the MSS of the SYN packet so that when the receiving host gets it, the value is set to something lower, allowing traffic to pass through the tunnel without fragmentation. The receiving host will then reply with this value to the sending host. The router is acting as a middleman for the TCP session.

When using IPSEC with GRE, an MSS of 1378 bytes can be used:

  • Original IP packet = 1500 bytes
  • Subtract 20 bytes for IP header = 1480 bytes
  • Subtract 20 bytes for TCP header = 1460 bytes
  • Subtract 24 bytes for GRE header = 1436 bytes
  • Subtract a maximum of 58 bytes for IPSEC = 1378 bytes

Adjusting MSS is a CPU-intensive process. Enable it at the remote sites rather than at the headend, since the headend may be terminating a lot of tunnels. Adjusting MSS only needs to be done at one point in the path.

TCP Adjust-MSS only has an impact on TCP packets; UDP packets are less likely to be large compared to TCP.
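
A minimal sketch of applying the value from the calculation above on a tunnel interface (the interface name is hypothetical):

interface Tunnel0
 ip tcp adjust-mss 1378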

Compression Strategies Over VPN
Recommendations:

  • Compression can improve overall throughput, latency and user experience on VPN connections
  • Some compression technologies tunnel and may hide the fields used for QoS classification

TCP Optimization Using WAAS

Wide Area Application Services (WAAS) is a WAN accelerator. It uses compression technologies such as LZ compression, Data Redundancy Elimination (DRE) and specific Application Optimizers (AO). This significantly reduces the amount of data sent over the WAN or VPN. For a technology like WAAS to work, the compression must take place before encryption.

Compression technologies can have a significant effect on the QoE, but they work mainly for TCP traffic. Some WAN acceleration solutions may break classification if the traffic is tunneled so that the original IP header is obfuscated. WAAS only compresses the data portion of the packet and keeps the header intact, leaving the ToS byte available for classification.

Using Voice Codecs over a VPN Connection

To improve voice quality over bandwidth constrained VPN links, administrators may use compression codecs such as ILBC or G.729.

G.729 uses about a third of the bandwidth of G.711; with 20-ms packetization, G.711 plus IP/UDP/RTP headers is roughly 80 kbit/s while G.729 is roughly 24 kbit/s. This also increases the effect of packet loss since more audio is lost with every packet. To overcome this, when a packet is lost and the jitter buffer expires, the voice from the previous packet can be replayed to hide the gap, essentially tricking the listener. Through this technique, up to 5% packet loss can be acceptable.

Internet Low Bitrate Codec (ILBC) uses 15.2 Kbit/s or 13.33 Kbit/s and performs similarly to G.729, the Mean Opinion Score (MOS) for ILBC is significantly better though when there is packet loss.

Compressed Real-Time Protocol (cRTP) is not compatible with IPSEC because the packets are already encrypted when cRTP would try to compress them.

Antireplay Implications
Recommendation:

  • Antireplay drops may be introduced in an IPSEC VPN network with QoS enabled

When ESP authentication is configured in an IPSEC transform set, every Security Association (SA) keeps a 64-packet sliding window where it checks the incoming sequence numbers of the encrypted packets. This is there to stop someone from replaying packets and is called connectionless integrity. If packets arrive out of order due to queuing, they must still fit inside the window or they will be dropped and counted as antireplay errors. A data packet may get stuck behind voice in a queue long enough that it falls outside its sliding window, and the packet then gets dropped. To overcome this, use a separate line in the crypto ACL for every type of traffic, such as voice, data and video. This will create a separate SA for each type of traffic, as sketched below.
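
A minimal sketch of such a crypto ACL, with hypothetical subnets and the assumption that voice uses the typical RTP UDP port range; each permit line will result in its own SA pair and therefore its own antireplay window.

ip access-list extended CRYPTO-ACL
 permit udp 10.1.1.0 0.0.0.255 10.2.2.0 0.0.0.255 range 16384 32767
 permit tcp 10.1.1.0 0.0.0.255 10.2.2.0 0.0.0.255
 permit ip 10.1.1.0 0.0.0.255 10.2.2.0 0.0.0.255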

TCP will be affected by this packet loss; it has no way of knowing that the packets were dropped due to antireplay.

Antireplay drops are typically around 1 to 1.5% on congested VPN links with queuing enabled. A CBWFQ policy will often hold 64 packets per queue; decreasing this will lead to fewer antireplay drops, as packets are dropped before traversing the VPN, but it may also increase the CPU usage.

DMVPN QoS Design

DMVPN offers some advantages regarding QoS compared to IPSEC, such as the following:

  • Reduction of overall hub router QoS configuration
  • Scalability to thousands of sites, with QoS for each tunnel on the hub router
  • Zero-touch QoS support on the hub router for new spokes
  • Flexibility of both hub-and-spoke and spoke-to-spoke (full mesh) deployment models

DMVPN Building Blocks

mGRE: Multipoint GRE allows a single tunnel interface to serve a large number of remote spokes. One outbound QoS policy can be applied instead of one per tunnel as with normal GRE, which is point-to-point.

Dynamic discovery of IPSEC tunnel endpoints and crypto profiles: Crypto maps are created dynamically, so there is no need to statically build a crypto map for each tunnel endpoint.

NHRP: Allows spokes to be configured with dynamically assigned IP addresses. Also enables the zero-touch deployment that makes DMVPN spokes easy to set up. Think of the hub router as a “next-hop server” rather than a traditional VPN router. NHRP is also used for the per-tunnel QoS feature.

The Per-Tunnel QoS for DMVPN Feature

Allows the administrator to enable QoS on a per-tunnel or per-spoke basis. QoS policy is applied to the mGRE tunnel interface. This protects spokes from each other and keeps one spoke from using all the BW so that there is none left for the others. The QoS policy at the hub is automatically generated for each tunnel when a spoke registers with the hub.

Queuing only kicks in when there is congestion, so to signal to the router’s QoS mechanism that there is congestion, a shaper is used. Shape the traffic flows to the real VPN tunnel bandwidth to produce artificial back pressure. With per-tunnel QoS for DMVPN, a shaper is automatically applied by the system to each and every tunnel. This allows the router to implement differentiated services for the various data flows corresponding to each tunnel. This technique is called the Hierarchical Queuing Framework (HQF).

Using NHRP, multiple spokes can be grouped together to use the same QoS policy, as in the sketch below.
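
A minimal sketch of the NHRP group mapping, assuming a shaping policy named SHAPE-10MB already exists; the group and policy names are hypothetical.

! Hub
interface Tunnel0
 ip nhrp map group SPOKE-10MB service-policy output SHAPE-10MB
!
! Spoke
interface Tunnel0
 ip nhrp group SPOKE-10MB

When the spoke registers with the hub via NHRP, it reports its group name and the hub instantiates the corresponding policy for that spoke’s tunnel session.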

This technique provides QoS in the egress direction of the hub towards the spokes. For QoS from the spokes to the hub, a QoS policy needs to be applied at the spokes.

At this time it is not possible to have a unique policy for traffic between spokes, because the spokes do not have access to the NHRP database.

GET VPN QoS Design

Group Encrypted Transport (GET) VPN is a technology to encrypt traffic between IPSEC endpoints without the use of tunnels. Packets are transmitted using IPSEC tunnel mode, but the encryption is not defined by a traditional IPSEC SA.

Because there are no tunnels, the QoS configuration is simplified.

GET VPN QoS Overview

DMVPN is suitable for hub-and-spoke VPNs over a public untrusted network such as the Internet, while GET VPN is suitable for private networks such as an MPLS VPN. An MPLS VPN is private but not encrypted, and GET VPN can encrypt the traffic between the MPLS sites. GET VPN has no real concept of hub and spoke, which simplifies the QoS architecture. There is not one major hub aggregating all the remote sites and being subject to massive oversubscription.

These are some of the major differences between the DMVPN and GET VPN models:

[“Choosing VPN” comparison table]

Group Domain of Interpretation (GDOI)

GDOI is a technology that supports any-to-any IPSEC VPNs without the use of tunnels. There is no concept of an SA between specific routers; instead a group SA is used by all the encrypting nodes in the network. No per-tunnel QoS is needed since there are no tunnels; QoS is simply applied egress on each GET VPN router.

The GDOI control plane uses UDP port 848, and ISAKMP uses UDP port 500. These packets are normally marked DSCP CS6 by the router.
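
If this control plane traffic needs to be matched explicitly in a QoS policy, a hedged sketch could look like this (the ACL and class-map names are hypothetical):

ip access-list extended GETVPN-CONTROL
 permit udp any any eq 848
 permit udp any any eq isakmp
!
class-map match-any CM-GETVPN-CONTROL
 match access-group name GETVPN-CONTROL
 match dscp cs6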

IP Header Preservation

Normally with IPSEC tunnel mode the ToS byte is copied to the new IP header, but the original IP header is not preserved. On a public network such as the Internet it makes good sense to hide the source and destination IP addresses, but GET VPN is deployed on MPLS networks, which are private.

GET VPN keeps the original IP header intact which simplifies QoS, dynamic routing and multicast. The packet is still considered an ESP IPSEC packet, not TCP or UDP, so to classify based on port numbers the QoS preclassify feature will still be needed.

How and When to Use the QoS Preclassify Feature
Design principles:

  • If classification is based on source or destination IP, preclassify is not needed but still recommended
  • If classification is based on TCP or UDP port numbers, QoS preclassify is needed
  • Enable the QoS preclassify feature in GET VPN deployments

A Case for Combining GET VPN and DMVPN

DMVPN has some drawbacks: the spoke-to-hub tunnel is always up, but spoke-to-spoke tunnels are brought up dynamically. This causes a delay, which can take a second or two and may have a negative impact on real-time traffic. The delay is not caused by NHRP or the packetization of the GRE tunnel but rather by the exchange of ISAKMP messages and the establishment of the IPSEC SAs between the routers.

DMVPN can then be used solely for setting up the GRE tunnels and GET VPN for encrypting the packets going into the tunnel. This allows for fast establishment of tunnels and encryption of the packets, improving the overall user experience.

Working with Your Service Provider When Deploying GET VPN
Design principles:

  • Ensure that the service provider handles DSCP consistently throughout the MPLS WAN network
Categories: CCDE, QoS Tags: , , , , , ,

Book Review – End-to-End QoS Network Design: Quality of Service for Rich-Media & Cloud Networks, Second Edition

January 9, 2015 4 comments

As part of my CCDE studies, I needed a good resource on QoS. There have basically been two good books on QoS before: the first edition of End-to-End QoS Network Design and QoS-Enabled Networks: Tools and Foundations. The first edition of this book is good but very dated; it was released back in 2004. QoS-Enabled Networks is a great book but it’s written to not be vendor specific, so you will not get details on platforms or configuration snippets.

In my opinion, earlier books gave a good foundation for understanding QoS concepts, but there were too few design cases, they lacked platform information and they didn’t have enough examples to act as a reference. Since the first edition of this book, a lot has happened: new products and new Places In the Network (PIN) such as Datacenter, Wireless and to some degree MPLS.

The book is written by Tim Szigeti, Christina Hattingh, Robert Barton and Kenneth Briley Jr. Tim is a long-time CCIE and technical leader at Cisco. He is the QoS guru responsible for a lot of the Cisco Validated Designs (CVDs) and a frequent presenter at Cisco Live. Christina is a former Technical Marketing Engineer (TME) at Cisco, now acting as an independent, writing books, teaching and consulting. Robert is a senior Systems Engineer (SE), dual CCIE and CCDE. Kenneth is a CCIE and technical lead at Cisco, focusing on the convergence of QoS for wired and wireless networks.

This book was written by some of the best minds in the world on QoS, and it shows.

The book is divided into different parts. The first part consists of a QoS overview and describes DiffServ, IntServ, classification and marking, policing, shaping, congestion management and avoidance, QoS in IPv6 networks and more. The book does a very good job of laying a foundation for the reader to build on. It has nice graphics to explain queueing, policing, shaping and so on. Every chapter also has a “Further Reading” section if you want to dive deeper into a subject.

The next part of the book is about business and application QoS requirements. What requirements do different applications have? How do you differentiate business-critical apps on port 80 from bulk traffic? What are the design principles for QoS? How many classes should be deployed? The book tries to answer these questions; many books fall short in this area.

After that there is a part on Campus QoS. This is where the book really starts to shine. It shows the difference between Multi Layer Switching (MLS) QoS and the Modular QoS CLI (MQC), and how to apply QoS on the 3750, 4500 and 6500. What are the different trust states? Where should you trust, and where should you mark? It also shows how to apply QoS on Etherchannels and how that behaves on different platforms, information that is otherwise difficult to find and scattered across multiple documents. It ends with a design case, and in my opinion all books should be written like this. It shows the reader how to apply the different concepts and how all the pieces fit together.

Then there is a part on wireless QoS: first an overview of how packets are scheduled on the radio, which standards are relevant, why the earlier standards were not good enough and what has changed. QoS is shown on different platforms and controllers, and at the end there is a case study. I don’t work much with wireless, but if I did this would be a very good reference since earlier books don’t discuss wireless QoS. I was surprised to learn that there are some discrepancies in wireless QoS compared to 802.1p and DSCP.

Datacenter QoS is in the next part and this is definitely a great addition compared to earlier books. It discusses the different Nexus platforms, what additions are needed in the Datacenter to be able to deliver lossless Ethernet and also ends with a case study.

WAN and branch QoS design comes after that and this is probably what most readers will recognize as QoS. It has examples on the ISR G2 but also on the ASR1k and as usual ends with a case study.

I really like the next part, which is on MPLS QoS. This is not easy to find in other books. It explains the difference between short pipe, pipe and uniform mode. It also has examples of QoS on the ASR9k and CRS, as well as examples of how the customer should configure QoS when connecting to a Service Provider (SP). As usual, there is a case study at the end.

The final part of the book is on QoS in VPNs, such as IPSEC, GET VPN, DMVPN and connecting from a home office. This part is also difficult to find in other books, so it’s great that it’s included here. It also has a case study at the end.

This book is written by some of the best people out there. It has a nice flow to it and covers all the relevant areas of QoS. It covers different platforms and shows examples of how to configure QoS on them. It can serve as a book for learning more, for a certification, or simply as a reference for all of your QoS needs. This book is VERY extensive, but it is so for a reason. It’s not long just for the sake of it; it’s all relevant material. Read it end to end or pick the parts you are interested in. If you want to get one book for QoS, get this one! If you are studying for the CCIE, this should be your reference. I can’t recommend this book enough; you’ll see from the ratings on Amazon, Safari etc. that everyone agrees that this is an awesome book.

Categories: Announcement Tags: , ,

Borrowing Credits When Using Shaper on Cisco IOS

April 15, 2014 4 comments

Introduction

When using a shaper on IOS, the shaper allows a deficit to be created, borrowing
future credits. It’s common knowledge that a shaper queues or buffers packets but
it’s not common knowledge that the shaper allows a deficit to be created.

To demonstrate the concepts I have setup a very simple network with two routers
connected by a FastEthernet link.

Their clocks have been synchronized to show the timing of the events going on.

This post assumes prior knowledge of QoS with regard to concepts such as Bc, Be
CIR and Tc.

Using a Policer

A policer does not allow a deficit to be created. This can be proven very easily.
To prove the concept a single-rate, two-color policer will be used. A two-color
policer does not have a Be bucket so no tokens will spill over from the Bc
bucket.

The Bc bucket starts out full. When a packet arrives, the packet size is compared
to the number of tokens (bytes) in the Bc bucket. If the packet fits then the appropriate
number of tokens is taken from the Bc bucket and the packet is sent on its way.

The next time a packet arrives, the number of tokens in the bucket will depend on the
time interval between the packets. This is in contrast to a shaper that submits tokens
to the bucket at fixed intervals.

To demonstrate, a policer is created with a Bc value of 1000 bytes. The CIR is
set to 10 kbit/s. With such a low value for Bc, any packet with a size over
1000 bytes will be dropped.
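
For reference, a configuration along these lines should produce the policy
shown below; it is a sketch, but the names and values match the output.

policy-map POLICER
 class class-default
  police cir 10000 bc 1000 conform-action transmit exceed-action drop
!
interface FastEthernet0/0
 service-policy output POLICER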

R1#sh policy-map
  Policy Map POLICER
    Class class-default
     police cir 10000 bc 1000
       conform-action transmit 
       exceed-action drop
R1#ping 10.0.0.2 size 1000

Type escape sequence to abort.
Sending 5, 1000-byte ICMP Echos to 10.0.0.2, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

No packets made it through: each packet consists of 1000 bytes of payload plus
20 bytes of IP header and 8 bytes of ICMP header, which is more than 1000 bytes
in total. The policer does not allow a deficit to be created, so all packets
had to be dropped.

If we ping with a 972 byte payload some packets should make it through.

R1#ping 10.0.0.2 size 972 ti 1

Type escape sequence to abort.
Sending 5, 972-byte ICMP Echos to 10.0.0.2, timeout is 1 seconds:
!.!.!
Success rate is 60 percent (3/5), round-trip min/avg/max = 32/46/68 ms

The policer shows that some packets have exceeded.

R1#sh policy-map int f0/0
 FastEthernet0/0 

  Service-policy output: POLICER

    Class-map: class-default (match-any)
      35 packets, 12728 bytes
      5 minute offered rate 0 bps, drop rate 0 bps
      Match: any 
      police:
          cir 10000 bps, bc 1000 bytes
        conformed 5 packets, 3138 bytes; actions:
          transmit 
        exceeded 7 packets, 7042 bytes; actions:
          drop 
        conformed 0 bps, exceed 0 bps

While sending the packets I had debugs going on both devices. This is the timing
of the event.

Apr 15 12:08:10.183: IP: tableid=0, s=10.0.0.1 (local), d=10.0.0.2 (FastEthernet0/0), routed via FIB
Apr 15 12:08:10.187: IP: s=10.0.0.1 (local), d=10.0.0.2 (FastEthernet0/0), len 972, sending
Apr 15 12:08:10.247: IP: tableid=0, s=10.0.0.2 (FastEthernet0/0), d=10.0.0.1 (FastEthernet0/0), routed via RIB
Apr 15 12:08:10.247: IP: s=10.0.0.2 (FastEthernet0/0), d=10.0.0.1 (FastEthernet0/0), len 972, rcvd 3

A packet is sent at 10.183 and the reply is received at 10.247. A look at R2
confirms that it sent the reply at 10.195.

Apr 15 12:08:10.195: ICMP: echo reply sent, src 10.0.0.2, dst 10.0.0.1

The next packet is sent at 10.255 but this does not make it through the policer.

Apr 15 12:08:10.255: IP: tableid=0, s=10.0.0.1 (local), d=10.0.0.2 (FastEthernet0/0), routed via FIB
Apr 15 12:08:10.259: IP: s=10.0.0.1 (local), d=10.0.0.2 (FastEthernet0/0), len 972, sending

With a CIR of 10 kbit/s, we can only send 1250 bytes every second.

The router then waits for the ICMP packet to timeout which was set to one second.
Then the next packet is sent at 11.255 and received at 11.287.

Apr 15 12:08:11.255: IP: tableid=0, s=10.0.0.1 (local), d=10.0.0.2 (FastEthernet0/0), routed via FIB
Apr 15 12:08:11.255: IP: s=10.0.0.1 (local), d=10.0.0.2 (FastEthernet0/0), len 972, sending
Apr 15 12:08:11.287: IP: tableid=0, s=10.0.0.2 (FastEthernet0/0), d=10.0.0.1 (FastEthernet0/0), routed via RIB
Apr 15 12:08:11.287: IP: s=10.0.0.2 (FastEthernet0/0), d=10.0.0.1 (FastEthernet0/0), len 972, rcvd 3

Output from the other router shows it was sent at 11.255.

Apr 15 12:08:11.255: ICMP: echo reply sent, src 10.0.0.2, dst 10.0.0.1

It is clear that a policer does not allow a deficit: either the packet makes it
through or it is dropped.

Using a Shaper

A shaper allows a deficit to be created. This can be proven by creating a shaper
that uses only Bc and no Be. If a packet is sent with a size larger than Bc it
should in theory be dropped. This is however not the case. The following shaper
is used.

R1#sh policy-map
  Policy Map SHAPER
    Class class-default
      Traffic Shaping
         Average Rate Traffic Shaping
         CIR 10000 (bps) Max. Buffers Limit 1000 (Packets)
         Bc 8000 Be 0
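
For reference, a shaper along these lines should give the output above; it is
a sketch, and note that the CLI takes Bc and Be in bits for shaping.

policy-map SHAPER
 class class-default
  shape average 10000 8000 0
!
interface FastEthernet0/0
 service-policy output SHAPER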

If a shaper did not allow a deficit, then all packets larger than 1000 bytes
(the 8000-bit Bc bucket) should be dropped.

R1#ping 10.0.0.2 size 972 ti 1

Type escape sequence to abort.
Sending 5, 972-byte ICMP Echos to 10.0.0.2, timeout is 1 seconds:
!!!!.
Success rate is 80 percent (4/5), round-trip min/avg/max = 32/409/808 ms

Almost all packets made it through, which could be due to buffering, but let’s
have a look at the timing of what happened.

Apr 15 12:19:45.683: IP: tableid=0, s=10.0.0.1 (local), d=10.0.0.2 (FastEthernet0/0), routed via FIB
Apr 15 12:19:45.687: IP: s=10.0.0.1 (local), d=10.0.0.2 (FastEthernet0/0), len 972, sending
Apr 15 12:19:45.775: IP: tableid=0, s=10.0.0.2 (FastEthernet0/0), d=10.0.0.1 (FastEthernet0/0), routed via RIB
Apr 15 12:19:45.775: IP: s=10.0.0.2 (FastEthernet0/0), d=10.0.0.1 (FastEthernet0/0), len 972, rcvd 3

The Bc bucket starts out full so the packet is immediately transmitted.
Packet was sent at 45.683 and received at 45.775. We confirm with output
from the other router.

Apr 15 12:19:45.714: ICMP: echo reply sent, src 10.0.0.2, dst 10.0.0.1

The interesting part is that R1 sent its second packet at 45.783.

Apr 15 12:19:45.783: IP: tableid=0, s=10.0.0.1 (local), d=10.0.0.2 (FastEthernet0/0), routed via FIB
Apr 15 12:19:45.787: IP: s=10.0.0.1 (local), d=10.0.0.2 (FastEthernet0/0), len 972, sending
Apr 15 12:19:45.811: IP: tableid=0, s=10.0.0.2 (FastEthernet0/0), d=10.0.0.1 (FastEthernet0/0), routed via RIB
Apr 15 12:19:45.815: IP: s=10.0.0.2 (FastEthernet0/0), d=10.0.0.1 (FastEthernet0/0), len 972, rcvd 3

This packet was then received at 45.811. Once again output from the other router.

Apr 15 12:19:45.782: ICMP: echo reply sent, src 10.0.0.2, dst 10.0.0.1

R1 should not have been allowed to send this packet so quickly after the first one.
With our shaper applied it should have had to wait around 800ms before sending the
next one. However, a deficit was created, allowing the packet to be sent more quickly.

If we look at the five packets that R2 replied to we can see a pattern.

Apr 15 12:19:45.714: ICMP: echo reply sent, src 10.0.0.2, dst 10.0.0.1
Apr 15 12:19:45.782: ICMP: echo reply sent, src 10.0.0.2, dst 10.0.0.1
Apr 15 12:19:46.498: ICMP: echo reply sent, src 10.0.0.2, dst 10.0.0.1
Apr 15 12:19:47.298: ICMP: echo reply sent, src 10.0.0.2, dst 10.0.0.1
Apr 15 12:19:48.918: ICMP: echo reply sent, src 10.0.0.2, dst 10.0.0.1

The first two packets came in very quickly. Between packet two and three there is
a 716ms gap. Between three and four there is an 800ms gap. Between four and five
there is a 1620ms gap.

It is clear that at the end the router had to pay its dues.

Conclusion

Shapers on Cisco IOS allow a deficit to be created. This means that packets larger
than the size of the Bc bucket can be sent. The internals of this mechanism are only
known by Cisco.

What is the reason for this behavior? I can only speculate but it could be to try to
send packets rather than dropping them. What are your ideas?

Categories: QoS Tags: , , , ,

Cisco releases new switch – Catalyst 3850

January 30, 2013 8 comments

Cisco is releasing a new Catalyst switch and it will be called the 3850. Some of
you might already have heard about it. I’ve seen some presentations about it
but now it’s official. Let’s look at some of the highlights of this new switch.
If you want the full info, check out BRKCRS-2887 from Cisco Live in London.

  • Integrated wireless controller
  • Clean Air
  • Radio Resource Management (RRM)
  • 802.11ac ready
  • Stacking, Stackpower
  • Flexible Netflow
  • Granular QoS
  • Energywise

It terminates CAPWAP and DTLS in hardware. One switch/stack can support up to
50 APs and 2000 clients. Wireless capacity is 40G/switch. Supports IPv4 and
IPv6 client mobility. IP base license level is required to use wireless capabilities.

The stack supports 480Gbps and the fans and power supplies are field replaceable.
It also has support for StackPower and line rate on all ports. The switch supports
SFP and SFP+ modules. Different network modules with different capabilities can be
inserted. The WS-C3850-NM4-1G has 4x 1G ports (SFP). The WS-C3850-NM-2-10G has
2x 10G, OR 4x 1G, OR 2x 1G AND 1x 10G. The WS-C3850-NM-4-10G is autosensing and
supports all combinations up to 4x 10G; it is only supported on the WS-C3850-48.

Power modules are available at 350W, 715W and 1100W and are called PWR-C1-350WAC,
PWR-C1-715WAC, PWR-C1-1100WAC.

Here is a comparison to the 3750-X.

Comparision_3750X

As you can see it’s a pretty good improvement compared to the 3750.

Some more features:

  • Cavium 6230 800 MHz 4-core CPU
  • IOS XE
  • 2GB flash, 4GB DRAM
  • 84Mpps per ASIC
  • Line rate for 64-byte packets
  • 8 queues per port (wired), 4 queues per port (AP ports)
  • Flexible Netflow

So one major thing here is that it is actually running IOS XE, with IOS running
as a daemon (IOSd) on top of it. This enables support for the multicore CPU. It
allows for hosted applications like Wireshark. Here is a look under the hood
of the switch.

3850_under_the_hood

Finally, the Catalyst 3850 uses MQC and not MLS QoS, which is nice to see. This
means the QoS features will be more comparable to those of a router. A nice
feature is that you can apply different QoS settings depending on the SSID.

All in all this looks like a very interesting switch for the enterprise that has
both wired and wireless needs.

Catalyst QoS – A deeper look at the egress queues

October 8, 2012 2 comments

I’ve done an earlier post on Catalyst QoS. That post described how to
configure the QoS features on the Catalyst but didn’t describe in
detail how the buffers work on the platform. In this post I will go
into more detail about the buffers and thresholds that are used.

By default, QoS is disabled. When we enable QoS all ports
will be assigned to queue-set 1. We can configure up to two
different queue-sets.

sh mls qos queue-set 
Queueset: 1
Queue     :       1       2       3       4
----------------------------------------------
buffers   :      25      25      25      25
threshold1:     100     200     100     100
threshold2:     100     200     100     100
reserved  :      50      50      50      50
maximum   :     400     400     400     400
Queueset: 2
Queue     :       1       2       3       4
----------------------------------------------
buffers   :      25      25      25      25
threshold1:     100     200     100     100
threshold2:     100     200     100     100
reserved  :      50      50      50      50
maximum   :     400     400     400     400

These are the default settings. Every port on the Catalyst has
4 egress queues (TX). When a port is experiencing congestion
it needs to place the packet into a buffer. If a packet gets
dropped it is because there were not enough buffers to store it.

So by default each queue gets 25% of the buffers. The value is
in percent to make it usable across different versions of the Catalyst,
since they may have different buffer sizes. The ASIC will have
buffers of some size, maybe a couple of megs, but this size is not
known to us so we have to work with the percentages.

Of the buffers we assign to a queue we can make the buffers reserved.
This means that no other queue can borrow from these buffers. If we
compare it to CBWFQ it would be the same as the bandwidth percent command
because that guarantees X percent of the bandwidth but it may use more
if there is bandwidth available. The buffers work the same way. There is
a common pool of buffers. The buffers that are not reserved go into the
common pool. By default 50% of the buffers are reserved and the rest go
into the common pool.

There is a maximum for how many buffers the queue may use and by default this
is set to 400%. This means that the queue may use up to 4x more buffers than
it has been allocated (25%).

To differentiate between packets assigned to the same queue, the thresholds
can be used. You can configure two thresholds; there is also an implicit
threshold that is not configurable (threshold3), which is always set to the
maximum the queue can support. If a threshold is set to 100%, packets mapped
to it can use 100% of the buffers allocated to the queue. It is not
recommended to set a low value for the thresholds. IOS enforces a limit of
at least 16 buffers assigned to a queue; every buffer is 256 bytes, which
means that at least 4096 bytes are reserved.

	 Q1% Q1buffer Q2% Q2buffer Q3% Q3buffer Q4% Q4buffer
buffers  25           25           25           25
Thresh1  100 50       100 50       100 50       100 50
Thresh2  100 50       100 50       100 50       100 50
Reserved 50  25       50  25       50  25       50  25
maximum  400 200      400 200      400 200      400 200

This table explains how the buffers work. Let’s say that this port
on the ASIC has been assigned 200 buffers. Every queue gets 25% of the
buffers, which is 50 buffers. However, out of these 50 buffers only 50%
are reserved, which means 25 buffers. The rest of the buffers go to the
common pool. The thresholds are set to 100%, which means packets can use
100% of the buffers allocated to the queue, which was 50 buffers. For
packets that map to threshold3, 400% of the buffers can be used, which
means 200 buffers. This means that a single queue can use up all the
non-reserved buffers if the other queues are not using them.
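
To tune these values we can modify a queue-set and assign it to a port.
A minimal sketch follows (the numbers are illustrative, not
recommendations); the buffer percentages must add up to 100, and the
threshold command takes the two drop thresholds followed by the reserved
and maximum values.

mls qos queue-set output 2 buffers 40 20 20 20
mls qos queue-set output 2 threshold 1 100 200 50 400
!
interface GigabitEthernet1/0/25
 queue-set 2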

To see which queue packets are getting queued to we can use the show
platform port-asic stats enqueue command.

Switch#show platform port-asic stats enqueue gi1/0/25
Interface Gi1/0/25 TxQueue Enqueue Statistics
Queue 0
Weight 0 Frames 2
Weight 1 Frames 0
Weight 2 Frames 0
Queue 1
Weight 0 Frames 3729
Weight 1 Frames 91
Weight 2 Frames 1894
Queue 2
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 0
Queue 3
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 577

In this output we have the four queues with three thresholds. Note that queue 0
here is actually queue 1, queue 1 is queue 2 and so on. Weight 0 is
threshold1, weight 1 is threshold2 and weight 2 is the maximum threshold.

We can also list which frames are being dropped. To do this we use the
show platform port-asic stats drop command.

Switch-38#show platform port-asic stats drop gi1/0/25
Interface Gi1/0/25 TxQueue Drop Statistics
Queue 0
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 0
Queue 1
Weight 0 Frames 5
Weight 1 Frames 0
Weight 2 Frames 0
Queue 2
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 0
Queue 3
Weight 0 Frames 0
Weight 1 Frames 0
Weight 2 Frames 0

The queues are displayed in the same way here, where queue 0 = queue 1.
This command is good for finding out if important traffic, such as IPTV,
is being dropped in a certain queue.

The documentation for Catalyst QoS can be a bit vague, and with this post I
hope that you now have a better understanding of how the egress queueing works.

Categories: Catalyst, CCIE, QoS Tags: , , ,

INE 10 day bootcamp – Review

September 4, 2012 10 comments

I’m back from London and it’s been a great experience. Many readers are interested in what
the bootcamp is like. It is a big investment to go for so it is understandable that you
want to know if it will be worth it. I’ll start by describing the teacher and his teaching
methods.

Brian Dennis is a well known and respected man in the network industry. He is CCIE #2210
and has 5x CCIEs. That is among the very best in the world. Brian is not one of those
academic guys that only know what is written in a book. He has a solid background in the
industry which means he can explain WHY things are the way they are and not just stating
facts without any reasoning behind it. There will be NO powerpoints, it is CLI only and
although he has a topology he is using the configurations are not prebuilt. He will do
them live which means there will be issues, which is GOOD. You get to see a 5x CCIE
troubleshooting and since he hasn’t prepared the faults before you will see how he would
troubleshoot a live problem which is very good practice for the TS lab in the CCIE lab.
Brian is a strong believer in that there are no tips and tricks. If you have an
instructor teaching you all these tips and tricks then that instructor is a fake.
If you know the technology there are no tips and tricks. Sure he can teach you some
useful commands but there are no tips and tricks in routing protocols.

Jeremy Brown is the bootcamp coordinator. He’s a very nice guy and he will help you
with any queries you have about the bootcamp. If you are attending you will be
talking to him for sure.

When you start the class the first day you will be handed a folder with paper and
a pen and some contact information. Brian will introduce himself and give some
general guidelines and explain how the real lab works with TS section and
configuration section etc. Then everyone gets to introduce themselves. My class
had a lot of nationalities, Bolivia, France, Venezuela, Sweden, UK, Ireland,
Norway, Hungary were all represented.

The bootcamp runs from 9 AM in the morning to about 7-8 PM in the evening.
There will be some 15 minute breaks and a lunch break for 1.5h. It is long
days indeed so make sure to get enough sleep in the evening. This is a pure
learning experience, leave the partying for another time. If you want to
have some fun there will be time in the weekend for that.

The first day is about layer two. Since the configuration is built from
scratch it makes sense to start out with layer two. The topology used
is based on Cisco 360 with 5 routers and 4 switches. The routers are ISR
routers and the switches are 3560’s. It is good that this topology is
used since that is very similar to what is being used in the real lab.
When attending the bootcamp you are expected to have a good knowledge
of protocols and that you have watched the INE ATC videos. This is so
that you don’t get overwhelmed by the information in the bootcamp.
The layer two section focused on MST, PPP and frame relay and
spanning tree features like BPDU guard, BPDU filter etc. One piece of advice
that Brian gave is to try to mix in things like PPP, PPPoE, PPPoFR
etc in your labs so that you get used to using these technologies.

Later in the week we moved on to IGPs. OSPF will be the main topic.
This is natural since OSPF is guaranteed to be in your lab and you
REALLY need to know OSPF to pass the lab. Brian is an OSPF
machine, he knows the LSDB like the back of his hand. He is very
methodical and will confirm each step and show you in the LSDB
what we are seeing and why we are seeing it. He’s not one of
those guys that clears the routing table when he runs into a
glitch, he will explain how and why it is there. He had a very
good section about the forwarding address, this is an important
part of OSPF and Brian explained why it is used. He had a very
good analogy with BGP where basically if the FA is not set then
you are using next-hop-self and if it is set then the next-hop
is preserved. He also had a good explanation of the capability
transit feature and he did some great diagrams showing which
LSAs go where. This is basic knowledge but he put it so well in
that diagram. We also talked about virtual links and things like
that. One good command he showed was the show ip ospf rib
command. EIGRP and RIP will be shorter sections, he will only
show some more advanced configuration since these protocols are
a bit simpler to understand. For EIGRP he showed hot do do
unequal cost load balancing and how to calculate the metric
if you want to get a certain ratio. He showed how to do
offset-list, leak maps and authentication.

After we were done with IGPs we moved on to route redistribution.
This topic alone is enough to provide a good bootcamp experience.
Brian will in detail explain the difference between control plane
and data plane loops and why loops can occur. The important thing
to remember is that we are trying to protect the routes with a
high AD from being learned in a protocol with a lower AD. Usually
RIP is involved or EIGRP external routes since those have a high
AD. Brian will show you how to take any INE Vol2 lab topology
diagram and just look at it and identify potential issues.
This is a very good practice and when you can look at a diagram
and know what to do without even thinking about configuration
yet then you are in a good place. Brian will with his diagrams
show you where every command lives like the OSPF LSDB, OSPF RIB,
RIB, FIB etc. This is very good practice to make sure you have
a full understanding of what is going on.

BGP is of course an important topic and Brian is covering that
for sure. Brian starts by describing peering and goes through
some common misconceptions. BGP has no authentication,
wait for it…TCP has, this is a common misconception. It is
TCP providing the authentication of packets and not BGP.
He will explain concepts like hot potato vs cold potato routing.
He will show you the difference between disable-connected-check and
ebgp-multihop. He will teach you about route reflectors and
confederations and why you want to use the one or the other.
He will also explain MED in detail, something I found very useful,
explaining how deterministic MED works and always-compare-med.
He has such knowledge of everything and one thing I didn’t know
before is that networks in the BGP table are sorted by age where
the youngest network is listed first.

Building on BGP means MPLS comes naturally. These go hand in
hand and for the v4 CCIE lab you need to know MPLS. Brian
will of course explain the use of RD and RT. Remember that RD
only has a use in BGP. He shows where all the commands and
routes live and how to do troubleshooting for MPLS. The good
thing is that you will run into things that you didn’t maybe
think about and that will provide great troubleshooting. OSPF
is the most complicated PE-CE protocol and he will give you all
the details how to use Domain-ID, sham links and how the
external route tag and DN bit works.

First week is over. Time for some recovery. Have some fun and
go for some sightseeing or just do labs, the choice is yours.
Just make sure that you are well rested for when monday comes.

The second week started out with multicast. This was maybe my
favourite topic and I learned a lot from this section.
As I mentioned earlier Brian doesn’t believe in tips and tricks
and multicast is one of those topics where people have a lack
of understanding and that is why they go looking for tricks.
Multicast is 90% about PIM, you need to know PIM if you want
to be good with multicast. Brian shows common errors like having
a broken SPT or RPF failures and things like that. These usually
occur when hub and spoke frame relay is involved. With just a
few commands you can become very good with analyzing multicast.
Show ip pim interface, show ip pim neighbor,
show ip rpf x.x.x.x and show ip pim rp mapping will give you most
of the information you will need. The best thing about the
multicast section was that when we ran into errors Brian was very
methodical, instead of just pinging over and over he showed us
what was wrong and then cleared the mroute table, this will
make the mtree build again so that you always go back to a
well known state. It is probably common to have the correct
configuration but move away from it due to lack of patience
or lack of understanding of what is really going on.

Time for the killer topic, probably the most hated topic in
the entire blueprint for most candidates. You guessed it, it is
time for PfR. Where does this hate come from? Well it comes
from the fact that the 12.4 implementation of PfR is just so
incredibly bad. If I were to select one topic that is difficult
to study on your own and that you can really benefit from going
to a bootcamp then that would be PfR. Brian starts out with some
basic topologies and then moves on to some more advanced scenarios.
This topic runs for one day or even a bit more. You WILL run into
a lot of issues due to the implementation of PfR in 12.4. If you
have seen the PfR Vseminar then this will be a lot like that
with the added benefit that you can ask Brian questions of course.

The next big topic is QoS. Brian goes through frame relay
traffic shaping using both legacy syntax and MQC. He will go
through how to use policing and shaping. The coolest thing
about this part was how we configured values for policing
like Bc and then Brian showed by sending ICMP packets how the
token buckets are really working. You might be in for some
surprises here! No powerpoints here for sure! He will explain
the difference between single rate and dual rate policers and
why you would configure them for which scenarios. Then he will
go through the Catalyst QoS. This is a confusing section for
many since the Catalyst QoS is a bit convoluted. Brian shows
how the L2 QoS is very similar to MQC but the syntax is just
a bit strange. He shows how to use the priority queue and how
to use the share and shape queues for the SRR queues.

Whatever time is left will be spent on topics like EEM and
services that you would like to go through. If you feel that
you are weak in some service then this would be a good time to
ask Brian to go through it. I left the bootcamp at 3 PM on
friday and I probably missed a couple of hours in the end.
If you can find a later flight or go home on saturday that
could be a good option.

So now you have gone through a wall of text and you are
whondering what I think about it? Well if it wasn’t obvious
from my text then Yes! Go for it! Yes it costs to go and
with everything to account for like living expense and hotel,
yes it is costly. However if you look just at the price for the
bootcamp which is around 5990$. That is actually a good price,
if you consider that you can get 1500$ paid for your lab then
the cost is actually 4500$ Where I live one week of training
at Global Knowledge is usually around 3000$ for a week and
then often you get some Power Point guy reading slides or you
doing labs while the instructor is watching. The one thing I
found best about the bootcamp was that you learn how to think
at a higher level. Being a CCIE is not about knowing a lot of
commands, it is about thinking at a high level. You get to pick
the brain of a 5x CCIE with real world experience, you won’t
find many guys like that in the world and from what I’ve seen
I would rank Brian among the very best of them. The IGP, Multicast,
Redistribution, PfR sections were very good and you will learn a lot
for sure even if you were strong in these areas before.

Hopefully in class you will meet some new friends. I met some
people people in class I had only seen online before and also made
some new friends. I had a great time with David Rothera, Gian Paolo,
Jose Leitao, Susana and Harald. I also met Darren
for the first time, we have known each other online for a while now
but never met. I also had the chance to meet Patrick Barnes which is
another of my online friends 🙂

I’ve tried to cover as much as I can remember but always feel free
to ask questions in the comments section if you have anything you are
still thinking about.