STP Notes for CCDE

February 8, 2015

These are my study notes for CCDE, based on “CCIE Routing and Switching v5.0 Official Cert Guide, Volume 1, Fifth Edition”, “Designing Cisco Network Service Architectures (ARCH) Foundation Learning Guide: (CCDP ARCH 642-874), Third Edition”, “INE – Understanding MSTP” and “Spanning Tree Design Guidelines for Cisco NX-OS Software and Virtual PortChannels”. This post is not meant to cover every aspect of STP; it is a summary of key concepts and design aspects of running STP.

STP

STP was originally defined in IEEE 802.1D, and improvements were defined in amendments to the standard. RSTP was defined in amendment 802.1w and MSTP in 802.1s. The latest 802.1D-2004 standard does not include “legacy STP”; it covers RSTP. MSTP was integrated into 802.1Q-2005 and later revisions.

STP has two types of BPDUs: Configuration BPDUs and Topology Change Notification BPDUs. To handle topology change, there are two flags in the Configuration BPDU: Topology Change Acknowledgment flag and Topology Change flag.

MessageAge is an estimate of the age of a BPDU since it was generated by the root. The root sends it with an age of 0 and each switch that relays it increments the value by 1. The remaining lifetime of a BPDU is MaxAge – MessageAge. MaxAge, HelloTime and ForwardDelay are set by the root; locally configured values are only used if that switch becomes the root.
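To make the lifetime arithmetic concrete, here is a minimal sketch. The function name is my own, and the MaxAge of 20 seconds is the default value.

```python
def remaining_lifetime(max_age: int, message_age: int) -> int:
    """Seconds a stored BPDU stays valid: MaxAge - MessageAge, never negative."""
    return max(max_age - message_age, 0)


# A BPDU straight from the root (MessageAge 0) versus one that has been
# relayed and arrives with MessageAge 3.
print(remaining_lifetime(20, 0))  # 20
print(remaining_lifetime(20, 3))  # 17
```

The further a switch is from the root, the less time it keeps a stored BPDU before expiring it.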

STP works by comparing which Configuration BPDU is superior according to the following ordered list where lower values are better:

  1. Root Bridge ID (RBID)
  2. Root Path Cost (RPC)
  3. Sender Bridge ID (SBID)
  4. Sender Port ID (SPID)
  5. Receiver Port ID (RPID; not included in the BPDU, evaluated locally)
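The ordered comparison above maps naturally onto tuple comparison. This is only a sketch: the class and field names are mine, and the IDs are example values written as plain integers.

```python
from typing import NamedTuple


class BpduVector(NamedTuple):
    """Fields in the order STP compares them; lower is better at every step."""
    root_bridge_id: int
    root_path_cost: int
    sender_bridge_id: int
    sender_port_id: int
    receiver_port_id: int  # local value, not carried in the BPDU


def is_superior(a: BpduVector, b: BpduVector) -> bool:
    """True if a beats b. Tuples compare field by field, left to right."""
    return a < b


# Same root and same cost, so the sender bridge ID breaks the tie.
via_sw2 = BpduVector(0x4001AABBCC000100, 4, 0x8001AABBCC000200, 0x8001, 0x8001)
via_sw3 = BpduVector(0x4001AABBCC000100, 4, 0x8001AABBCC000300, 0x8001, 0x8001)
print(is_superior(via_sw2, via_sw3))  # True
```

A later field is only consulted when all earlier fields are equal, which is exactly how the tiebreakers in the list work.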

Each port stores the superior BPDU that has been sent or received, depending on the port role. Root and blocking ports store the received BPDU, designated ports store the sent BPDU.

To determine port roles and which ports forward and block, the following three-step process is used:

  1. Elect the root switch
  2. Determine each switch’s Root port
  3. Determine the Designated port for each segment

The root bridge is elected based on the lowest bridge ID (BID), which consists of a 4-bit Priority, a 12-bit System ID Extension and a 6-byte System ID (MAC address). Before 802.1t, a lot of MAC addresses were consumed to make the BID unique when using PVST+ or MST.
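As a sketch of how the three fields pack into a BID, the helper below builds the dotted form that Cisco switches print. The function name is hypothetical. Because the priority occupies the top 4 bits of a 16-bit field, it can only be a multiple of 4096.

```python
def bridge_id(priority: int, sys_id_ext: int, mac: str) -> str:
    """Build a bridge ID string from priority, System ID Extension and MAC."""
    if priority % 4096 != 0 or not 0 <= priority <= 61440:
        raise ValueError("priority must be a multiple of 4096 between 0 and 61440")
    if not 0 <= sys_id_ext <= 4095:
        raise ValueError("System ID Extension must fit in 12 bits")
    mac_hex = mac.replace(".", "").replace(":", "").replace("-", "").lower()
    if len(mac_hex) != 12:
        raise ValueError("MAC address must be 6 bytes")
    int(mac_hex, 16)  # raises ValueError if the MAC is not valid hex
    # The priority value already sits in the top 4 bits, so OR in the low 12.
    prefix = priority | sys_id_ext
    return f"{prefix:04x}.{mac_hex[0:4]}.{mac_hex[4:8]}.{mac_hex[8:12]}"


# Priority 16384 in VLAN 1 is advertised as 16385, or 0x4001.
print(bridge_id(16384, 1, "aabb.cc00.0100"))  # 4001.aabb.cc00.0100
```

With PVST+ the System ID Extension carries the VLAN ID, which is why a configured priority of 32768 shows up as 32769 in VLAN 1.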

BPDUs are only sent on designated ports. Root ports and blocking ports do not send them because their BPDUs would be inferior on the segment. A designated port is the port with the superior BPDU on a segment.

Topology Change

A topology change event occurs when:

  • A TCN BPDU is received by a Designated Port of a switch
  • A port moves to the Forwarding state and the switch has at least one Designated Port
  • A port moves from Learning or Forwarding to Blocking
  • A switch becomes the root switch

STP is slow to converge, especially with indirect failures where a link fails between the root switch and an intermediary switch. When inferior BPDUs are received, MaxAge has to expire before a switch will act on them.

When the topology has changed, the CAM table needs to be updated on all switches. A timer equal to ForwardDelay is used to time out unused entries.

A topology change starts at a switch, which sends a TCN BPDU out its root port. The designated switch on that segment sets the TCA bit in the flags field of its configuration BPDU to acknowledge the TCN. The TCN then travels upstream until it reaches the root. The root then sends configuration BPDUs with the TC bit set for MaxAge + ForwardDelay seconds, and all switches shorten the aging time for the CAM table to ForwardDelay seconds.

PVST+

PVST+ runs one spanning tree instance per VLAN. This does not scale well for a large number of VLANs and normally there will only be a few logical topologies anyway.

Switches that do not support PVST+ run Common Spanning Tree (CST) which has one instance of STP for all VLANs. Cisco switches can interact with CST through VLAN 1 by sending untagged BPDUs. All other VLANs in the PVST+ region will tag their BPDUs and tunnel the BPDUs through the CST region by using a special destination MAC address. The CST region is treated as a loop-free shared segment from the viewpoint of the PVST+ region. The destination MAC address is a multicast address that will get flooded by the CST switches.

RPVST+

RSTP has four different port roles:

  • Root Port
  • Designated Port
  • Alternate Port
  • Backup port

The first two are the same as in legacy STP and the last two are new. An Alternate Port is a potential replacement for the Root Port. A Backup Port is a replacement for a Designated Port; you would rarely, if ever, see one because it is only used on shared segments.

RSTP uses a synchronization process to achieve fast convergence. This only works on point-to-point links, and the link type is derived from the duplex mode of the interface. The link type can be hard coded in the rare case where a port is half duplex but not on a shared segment.

RSTP uses more bits in the Configuration BPDU to encode additional information. These are the Proposal bit, Port Role bits, Learning bit, Forwarding bit and Agreement bit.

RSTP switches send their own BPDUs as opposed to only relaying the root’s BPDUs as in legacy STP. If no BPDU is heard for 3x the hello interval, the stored BPDU is expired. RSTP does not rely on the MaxAge timer to expire BPDUs. RSTP can also act on inferior BPDUs directly instead of waiting for MaxAge to expire. This speeds up indirect link failure scenarios.

RSTP uses a proposal/agreement process where switches negotiate which port will become Designated. If the Proposal bit is set, the switch is proposing that its port should become Designated, and the other switch replies with an Agreement to immediately allow this. When ports first come up they are in the Designated Discarding state. To avoid creating a temporary loop during the synchronization process, all Non-Edge Designated ports are put into a Discarding state. I have described this process in detail in an earlier post.

With RSTP, only ports moving to a Forwarding state will cause a topology change. RSTP sets the TC bit in the BPDU to notify of a topology change and sends it out its Root Port and Designated Ports that are Non-Edge. MAC addresses are immediately flushed on these ports.

MST

MST uses the same underlying structure as RSTP with regard to BPDU parameters, but it decouples VLANs from spanning tree instances so that multiple VLANs can be mapped to a single instance. MST is more efficient because the operator can define the number of instances needed and map the VLANs to these instances. MST is the only standard that is VLAN aware, which makes it suitable in a multi vendor environment.

MST switches organize the network into regions; switches within a region use MST in a consistent way. For switches to be in the same region, the name, revision and instance to VLAN mapping must match.
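The region membership rule can be sketched as a plain equality check on the three attributes. The class and helper names are hypothetical. VLANs left unmapped fall into instance 0, so the sketch drops explicit mappings to instance 0 before comparing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MstConfig:
    name: str
    revision: int
    vlan_to_instance: tuple  # sorted (vlan, instance) pairs, instance 0 omitted


def make_config(name: str, revision: int, mapping: dict) -> MstConfig:
    """Normalize the mapping so that ordering and explicit instance 0 do not matter."""
    pairs = tuple(sorted((v, i) for v, i in mapping.items() if i != 0))
    return MstConfig(name, revision, pairs)


def same_region(a: MstConfig, b: MstConfig) -> bool:
    """Two switches share a region only if all three attributes match."""
    return a == b


sw1 = make_config("DC1", 1, {10: 1, 20: 2})
sw2 = make_config("DC1", 1, {20: 2, 10: 1, 30: 0})
sw3 = make_config("DC1", 2, {10: 1, 20: 2})
print(same_region(sw1, sw2))  # True
print(same_region(sw1, sw3))  # False
```

A single mismatched VLAN mapping or revision number is enough to split a switch off into its own region.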

In MST, the System ID Extension carries the Instance ID instead of the VLAN ID when building the BID used in BPDUs. MST sends a single BPDU containing information about all instances. In MST, a port sends BPDUs if it is Designated for at least one MST instance.

MST instance 0 is special and contains all VLANs by default. It is called the Internal Spanning Tree (IST). The IST interacts with STP switches that are outside the region. The port role and state determined by the interaction of the IST with a neighboring switch will be inherited by all VLANs on that port, not just the VLANs mapped to the IST. This behavior makes the region appear as a single switch to the outside of the region. If running multiple regions, each region can be seen as a single switch from the outside. The resulting network can still contain loops if there are multiple inter region links. MST blocks these loops by building a Common Spanning Tree (CST) running between the regions. The CST is also used to interact with non MST switches. The tree built by the CST will be used for all VLANs. The IST and CST are then merged together and called the Common and Internal Spanning Tree (CIST).

The CIST Root switch is elected based on the lowest BID among all switches in all regions. This switch also becomes the root for the IST (instance 0) within its own region, where it is called the CIST Regional Root.

In regions that do not contain the CIST Root, only boundary switches are allowed to become the IST Root. A boundary switch is a switch that has a link (or several) to other MST regions. The IST Root is elected based on external root path cost, which is the cost of using the inter region links between MST regions. If there is a tie in cost, the lowest BID is used as a tiebreaker to elect the CIST Regional Root. Cost inside a region is not taken into account.
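A sketch of that election for a region that does not contain the CIST Root: sort the boundary switches by external root path cost first and BID second. The function name, the switch names and the cost values are made up for illustration.

```python
def elect_regional_root(boundary_switches: list) -> str:
    """Pick the CIST Regional Root from (name, external_cost, bid) tuples.

    Lowest external root path cost wins, lowest BID breaks a tie.
    Cost inside the region plays no part.
    """
    name, _cost, _bid = min(boundary_switches, key=lambda s: (s[1], s[2]))
    return name


candidates = [
    ("B1", 20000, 0x8000AABBCC000200),
    ("B2", 20000, 0x8000AABBCC000100),  # same cost as B1, lower BID
    ("B3", 40000, 0x1000AABBCC000300),  # best BID but higher external cost
]
print(elect_regional_root(candidates))  # B2
```

Note that B3 loses despite having the lowest BID, because the BID only matters when the external cost is tied.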

The CIST Regional Root switch will have its Root Port towards the CIST Root, this is called the master port and this port is used by all MST instances to reach the CIST Root.

The following pictures show the different concepts of MST, starting with a physical topology:

MST1

The IST runs within the region to block ports, to break up the physical loop. One switch will be the CIST root and one switch will be the CIST Regional root.

MST2

In reality, all these things tie in together and happen simultaneously but to solidify the understanding, we divide them into steps. The IST has run internally and blocked ports. This is what the CST looks like:

MST3

The CST runs between regions and/or non MST devices and makes sure there is no loop between regions or to non MST domains. If we combine the CST and the IST, we get the CIST which is the final topology:

MST4

Interoperability Between MST and Other STP Versions

When communicating with an IEEE STP or RSTP switch, the MST switch must share the role and state on the port towards the non MST switch for all VLANs. STP or RSTP can’t see into the MST region so it is treated as a single logical switch. The MST switch will speak by using the IST (instance 0) on boundary ports and format the BPDU to be STP or RSTP. The IST will also process inbound BPDUs from the non MST switch.

When communicating with a PVST+ or RPVST+ region, things get a bit more complex. One STP instance is run for each VLAN, and port role and state are calculated individually per VLAN. The IST will communicate with the non MST switch and must make sure that every PVST+/RPVST+ instance receives the same information so that each one makes a consistent choice. MST and PVST+ must arrive at the same port role and state for all instances even though only a single MST instance and a single PVST+ instance directly interact with each other. This is known as the PVST Simulation mechanism.

The IST will replicate BPDUs for all active VLANs towards the PVST+ switch, meaning that the PVST+ switch will make a consistent choice for port role and state for all VLANs. The IST does this by formatting the BPDUs as PVST+ BPDUs.

In the opposite direction, the IST takes the BPDU from VLAN 1 as a representative for the entire PVST+ region and processes this in the IST. The boundary port’s role and state will be binding for all active VLANs on that port. The MST switch must make certain that the result of the IST interaction with the VLAN 1 STP instance is consistent with the state of the STP instances run in other VLANs.

An MST boundary port will become a Designated Port if the BPDUs it sends out are superior to incoming VLAN 1 PVST+ BPDUs. The port will then be forwarding for all VLANs. To make sure that other PVST+ instances make a consistent decision, the MST switch must check that all incoming PVST+ BPDUs are inferior to its own outgoing BPDUs. If not, the PVST Simulation mechanism will fail.

The CIST Root can be located in the PVST+ region, and the boundary port can have a port role of Root if the incoming VLAN 1 PVST+ BPDUs are not only superior to the MST switch’s own BPDUs but also better than any other VLAN 1 PVST+ BPDUs received on any other boundary port. Once again, to keep the port role consistent, the Root bridges for all VLANs must be located in the PVST+ region and must be reached through the same boundary port. The PVST Simulation mechanism will check that incoming PVST+ BPDUs for VLANs other than VLAN 1 are identical or superior to the VLAN 1 PVST+ BPDUs.

An MST boundary port will become Non-Designated if it receives VLAN 1 PVST+ BPDUs that are superior to its own but not superior enough to make it a Root Port.
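The two consistency checks can be sketched as follows. This is a heavy simplification of my own: each BPDU is reduced to a single integer where lower means superior, and the role that the IST picked from the VLAN 1 exchange is passed in as a string. The function name is hypothetical.

```python
def pvst_simulation_consistent(role: str, own: int, received: dict) -> bool:
    """Check a boundary port against the PVST Simulation rules.

    role     -- "designated" or "root", as decided by the IST from VLAN 1
    own      -- the BPDU the MST switch sends (lower value is superior)
    received -- {vlan: incoming PVST+ BPDU}
    """
    if role == "designated":
        # Every incoming PVST+ BPDU, in every VLAN, must be inferior to ours.
        return all(own < bpdu for bpdu in received.values())
    if role == "root":
        vlan1 = received[1]
        # BPDUs for the other VLANs must be identical or superior to VLAN 1's.
        return all(bpdu <= vlan1 for vlan, bpdu in received.items() if vlan != 1)
    raise ValueError(f"unsupported role: {role}")


print(pvst_simulation_consistent("designated", 4096, {1: 32768, 10: 32768}))  # True
print(pvst_simulation_consistent("designated", 4096, {1: 32768, 10: 0}))      # False
print(pvst_simulation_consistent("root", 32768, {1: 4096, 10: 4096, 20: 0}))  # True
print(pvst_simulation_consistent("root", 32768, {1: 4096, 10: 8192}))         # False
```

In the second case one VLAN in the PVST+ region has a better bridge than the MST region, so the port can’t be Designated for all VLANs and the simulation fails.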

It is recommended to have the MST region appear as a Root switch to all PVST+ instances by lowering the IST root’s priority below the priorities of all PVST+ switches in all VLANs.

When an MST switch is communicating to a PVST+ or RPVST+ switch it will always revert back to PVST+. There is less state involved with PVST+ due to not having a Proposal/Agreement process which simplifies the interworking of MST and PVST+.

Portfast Ports

  • Transitions directly to Forwarding state, saving 2x ForwardDelay
  • Does not generate topology change events
  • Does not flush CAM due to topology change
  • DOES send BPDUs
  • Does not expect to receive BPDUs
  • Not influenced by the Sync step in the Proposal/Agreement procedure (RSTP)

Portfast enabled ports may also be referred to as Edge ports. If a Portfast enabled port receives BPDUs it will lose its Portfast status until the port has gone down and come back up. RSTP uses the Proposal/Agreement process, and when going through Sync it will put all Non-Edge Designated ports into a Discarding state. Unless end-user ports are configured as Edge ports, they will be affected and lose connectivity briefly during the Sync process. Portfast is also important so that a PC that boots up and requests an IP address via DHCP gets one assigned before the request times out while waiting for the port to go into a Forwarding state. Portfast can be enabled per port or globally for all access ports.

  • BPDU Guard: Enabled per port or globally for all Portfast enabled ports, will error-disable the port upon receiving ANY BPDU
  • Root Guard: Only enabled per port, ignores any superior BPDUs received to prevent the port from becoming a Root Port. If a superior BPDU is received, the port is put into a root-inconsistent blocking state and stops forwarding and receiving data frames until the superior BPDUs cease

After BPDU Guard has error-disabled a port, the port must be recovered manually or by using the error-disable recovery feature.

Root Guard will block the port if a superior BPDU comes in. This does not have to be the best BPDU, simply one that is better than what the local switch is originating. Root Guard will recover the port after the superior BPDU has expired, which takes MaxAge – MessageAge for STP and 3x Hello for RSTP.

BPDU Filter

  • If enabled on a port it will unconditionally stop sending and receiving BPDUs
  • If enabled globally for Edge ports, it will send 11 BPDUs after enabling the feature and then stop sending BPDUs. If a BPDU is received at any point in time, BPDU Filter is operationally disabled on the port and will revert to normal STP rules, sending and receiving BPDUs.

Protecting Against Unidirectional Link Issues

Several mechanisms are available to protect against unidirectional links, such as Loop Guard, UDLD, the RSTP Dispute mechanism and Bridge Assurance.

UDLD

UDLD is a Cisco-proprietary layer 2 protocol that serves as an echo mechanism between a pair of devices. Each device sends UDLD messages advertising its own switch/port identity pair as well as a list of all neighboring switch/port pairs heard on the same segment. The following explicit conditions are used by UDLD to detect a unidirectional link:

  • UDLD messages arrive from a neighbor but do not contain the exact switch/port pair of the receiving switch and its port in the list of detected neighbors. This would suggest that either the neighbor does not hear this switch at all (fiber cut) or that the neighbor’s port sending these UDLD messages is different from the neighbor’s port receiving the UDLD messages. This could be the case if the TX fiber is plugged into a different port than the RX fiber.
  • The incoming UDLD messages contain the same switch/port originator pair as the receiving switch, which would indicate that the port is self-looped.
  • A switch has detected only a single neighbor but the neighbor’s UDLD messages contain several switch/port pairs in the list of neighbors. This would indicate shared media and a lack of visibility between all connected devices.

The above are explicit conditions that will error-disable a port because it is unidirectional. UDLD runs in either normal or aggressive mode. If incoming UDLD messages stop arriving, UDLD tries to reconnect with its neighbor(s) up to 8 times. Normal mode does not react to this implicit condition if the reconnects fail, whereas aggressive mode will error-disable the port. UDLD can be enabled globally or per port; enabling it globally will only enable UDLD on fiber ports.
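The three explicit conditions can be sketched as a small classifier. The function name, argument names and return labels are my own and not UDLD terminology; a switch/port pair is modeled as a tuple.

```python
def udld_check(local: tuple, origin: tuple, neighbor_list: list,
               neighbors_detected: int) -> str:
    """Classify one received UDLD message.

    local              -- (switch, port) of the receiving port
    origin             -- (switch, port) that sent the message
    neighbor_list      -- switch/port pairs the sender says it has heard
    neighbors_detected -- how many neighbors the local port has detected
    """
    if origin == local:
        return "self-looped"      # we are hearing our own messages
    if local not in neighbor_list:
        return "unidirectional"   # the neighbor does not hear this port
    if neighbors_detected == 1 and len(neighbor_list) > 1:
        return "shared-media"     # the neighbor sees devices that we cannot
    return "bidirectional"


me = ("SW1", "Gi0/1")
peer = ("SW2", "Gi0/1")
print(udld_check(me, peer, [me], 1))                     # bidirectional
print(udld_check(me, peer, [], 1))                       # unidirectional
print(udld_check(me, me, [me], 1))                       # self-looped
print(udld_check(me, peer, [me, ("SW3", "Gi0/2")], 1))   # shared-media
```

Anything other than "bidirectional" corresponds to one of the explicit conditions that error-disable the port.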

Loop Guard prevents Root and Alternate ports from becoming Designated in the case of loss of incoming BPDUs. When the stored BPDU on a port expires, Loop Guard will put the port into a loop-inconsistent state. Loop Guard can be configured globally or per port.

Bridge Assurance is another mechanism that is available on select platforms and works with RPVST+ and MST on point-to-point links. A port will send BPDUs regardless of state if Bridge Assurance is enabled. If BPDUs are not received, the port will be put into a BA-inconsistent state. This protects against unidirectional links as well as malfunctioning switches that stop participating in RPVST+/MST.

Finally, the Dispute mechanism available in RPVST+/MST works by checking the incoming BPDU flags. If an inferior BPDU is received but its flags indicate a Designated port in the Learning or Forwarding state, the local port will move into a Discarding state.

Port Channel

Interfaces can be bundled into a Port Channel, which increases the available bandwidth by carrying multiple frames over multiple links. A hashing mechanism run over selected address fields of each frame determines which physical link the frame is sent over. The hashing is deterministic, meaning that frames of the same flow will travel over the same physical link.

Load sharing can be based on MAC address, IP address or on some platforms even port numbers. A choice needs to be made depending on the type of flow, which load sharing mechanism will be most beneficial. Normally only one type of load sharing can be used for all flows on a switch. Normally load sharing will be more balanced if using a number of links divisible by 2. This varies by platform and the number of hash buckets.
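To illustrate the deterministic part, here is a sketch that hashes the source and destination MAC address onto a member link. CRC32 is only a stand-in; real switches use platform-specific hash functions, and the function and interface names are made up.

```python
import zlib


def pick_link(src_mac: str, dst_mac: str, links: list) -> str:
    """Map a flow (source/destination MAC pair) to one member link."""
    key = f"{src_mac.lower()}-{dst_mac.lower()}".encode()
    return links[zlib.crc32(key) % len(links)]


links = ["Gi1/0/1", "Gi1/0/2", "Gi1/0/3", "Gi1/0/4"]
first = pick_link("aabb.cc00.0100", "aabb.cc00.0200", links)
again = pick_link("aabb.cc00.0100", "aabb.cc00.0200", links)
print(first == again)  # True
```

The same address pair always lands on the same link, so a single large flow can never use more than one member link. This is why the choice of hash input matters for how evenly the load is spread.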

To bring interfaces into a bundle, several parameters must match, such as speed, duplex, trunk/access, allowed VLANs, STP cost and so on.

It is recommended to run a dynamic protocol such as LACP to set up the bundle. This prevents failure modes where a switching loop is created because one side is unconditionally bundling links and the other side has not yet formed the bundle. Port Channels are treated as a single logical interface by STP, and a single physical interface is responsible for transmitting BPDUs for the bundle. Etherchannel misconfig guard can protect against failures where multiple BPDUs arrive with different source MAC addresses on ports in the bundle.

STP Scalability and vPC

MST offers greater scalability than RPVST+ due to sending only one BPDU and the decoupling of VLANs from instances. Normally two instances are enough with MST. With MST, VLANs can be created without affecting the STP instances. MST can also better support stretched layer 2 domains through the use of regions.

To achieve load balancing with MST, at least two STP instances need to be defined and different switches will be the root for each of these instances.

Recommendations for MST:

  • Define a region configuration to be copied to all the switches that are part of the Layer 2 topology
  • As part of the region configuration, define to which instances all the VLANs belong. Normally two instances would be enough
  • Define primary and secondary root switches for all the instances that you have defined, also for instance 0. Typically one switch would be the root for instance 0 and instance 1 and a redundant aggregation switch for instance 2
  • Preprovision all VLAN mappings and topologies and later create VLANs as needed

Special Considerations for Spanning Tree with vPCs

Virtual Port Channel (vPC) is a technology used on Nexus switches where two switches act as if they were one by having the primary switch generate BPDUs, LACP messages and so on. The two switches use a link between them to synchronize state and to pass traffic over; this link is called the vPC peer link. Ports that are not configured for vPC behave as normal ports, meaning that BPDUs get generated by the local switch.

Some modifications have been done to STP to be used in combination with vPC, they are the following:

  • The peer link should never be blocking because it carries important traffic such as Cisco Fabric Services over Ethernet (CFSoE) Protocol. The peer link is always forwarding
  • On vPC ports, only the primary switch generates BPDUs. The secondary switch will relay incoming BPDUs to the primary switch

The following picture shows the behavior of Spanning Tree on Nexus switches:

VPC1

The operational primary switch sends BPDUs towards Access1 even though it is not the STP Root. BPDUs that come from Access1 are relayed by Agg2. On ports that are not members of a vPC, normal rules apply, meaning that both Agg switches will send BPDUs towards Access2.

It is recommended to align the operational primary role with the STP Root role. If the peer-link fails, the vPC ports on the secondary switch will be shut down. To keep SVIs up for non vPC VLANs if the peer-link fails, use a backup link between the switches that is independent from the peer-link or the dual-active exclude command. If using an extra link, remove all the non vPC VLANs from the vPC peer-link.

MST and vPC Best Practices

  • Associate the root and secondary root role at the aggregation layer and match the vPC primary and secondary roles with the STP root role.
  • One MST instance is enough
  • Configure regions during the deployment phase
  • If changing the VLAN to instance mapping, change both the primary and secondary vPC to avoid global inconsistency
  • Use dual-active exclude command to not isolate non vPC VLANs when the peer-link is lost

If using RPVST+, use pathcost method so that lower speed interfaces do not get the same metric as higher speed interfaces. This should be the default for MST but may vary by platform.

Scaling Considerations

Scaling may be affected by the following parameters:

  • The number of PortChannels
  • The number of VLANs supported by the switch
  • Logical interface count
  • Oversubscription rate

The number of logical ports is the sum over all ports of the number of VLANs carried on each port. When vPC is used, the secondary device passes BPDUs to the primary device which increases the scale of logical interfaces. A PortChannel is a logical interface so it counts as a single port regardless of the number of links it contains. To calculate the logical ports for vPCs, multiply the number of vPCs by the number of VLANs on each vPC. For non vPC switches, the number of logical ports is the number of trunks * number of VLANs + number of access ports. For a switch with 10 trunks carrying 100 VLANs each and 10 access ports, that is 1010 logical ports.
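The calculation is simple enough to put into a helper when comparing designs. A minimal sketch with hypothetical function names, assuming every trunk or vPC carries the same number of VLANs:

```python
def logical_ports(trunks: int, vlans_per_trunk: int, access_ports: int) -> int:
    """Logical ports on a non vPC switch: trunks * VLANs + access ports."""
    return trunks * vlans_per_trunk + access_ports


def vpc_logical_ports(vpcs: int, vlans_per_vpc: int) -> int:
    """Logical ports contributed by vPCs: number of vPCs * VLANs on each."""
    return vpcs * vlans_per_vpc


print(logical_ports(10, 100, 10))  # 1010
```

Pruning each trunk from 100 VLANs down to 20 in the same example brings the count from 1010 down to 210, which is why manual pruning shows up in every list of scaling recommendations.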

Virtual ports are a line card limitation: each line card can support a maximum number of logical ports. Virtual ports are calculated the same way as logical ports, except that for a PortChannel all physical member interfaces count individually.

To reduce the number of logical ports, the following concepts are important:

  • Implement multiple aggregation modules
  • Perform manual pruning on trunks
  • Use MST instead of (R)PVST+
  • Distribute trunks and access ports across line cards
  • Remove unused VLANs going to Content Switching Modules (CSM) – The CSM automatically has all VLANs defined in the system configuration

This post describes key concepts of STP, different STP optimizations and which scaling factors are important in designing a layer 2 network.

Ethernet, STP, Topology change and the behaviour of Ethernet

June 24, 2014

Introduction

This post is inspired by a post at IEOC about Uplinkfast and TCN which
can be found here.

Before we get to those parts, let’s recap how Ethernet and STP work together.

Spanning Tree

The Spanning Tree Algorithm builds a loop free tree by comparing Bridge ID (BID) and
least cost paths to the root bridge. By doing this it blocks all links not leading
to the root.

STP1

MAC Learning

Switches learn where to forward frames by looking at the source MAC address of the frame
on the port that the frame was received on. This learning is done in the data plane
as opposed to routing where the routes are learned in control plane. I will come back
to this later in the post.

MAC learn1

S4 learns that A is located on port 1 after A has sent a frame. This is stored in
the MAC address table located in Content Addressable Memory (CAM). The CAM is a
fast memory optimized for quick lookups in the table. By default there is a 300
second aging timeout for learned MAC addresses, meaning that if the switch
does not see any traffic from a source MAC within five minutes the entry will
age out of the table. This is used to remove stale entries and to keep the
MAC address table from becoming too large.
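A toy model of the learning and aging behaviour, using the default 300 second timeout. The class and method names are mine, and time is passed in explicitly to keep the example deterministic.

```python
class MacTable:
    """Minimal MAC address table with per-entry aging."""

    def __init__(self, aging: int = 300):
        self.aging = aging
        self.entries = {}  # mac -> (port, time the source MAC was last seen)

    def learn(self, mac: str, port: int, now: int) -> None:
        """Called for the source MAC of every received frame; refreshes the age."""
        self.entries[mac] = (port, now)

    def lookup(self, mac: str, now: int):
        """Return the port for a destination MAC, or None if it must be flooded."""
        entry = self.entries.get(mac)
        if entry is None:
            return None
        port, last_seen = entry
        if now - last_seen >= self.aging:
            del self.entries[mac]  # stale entry, age it out
            return None
        return port


table = MacTable()
table.learn("A", 1, now=0)
print(table.lookup("A", now=299))  # 1
print(table.lookup("A", now=300))  # None
```

A lookup that returns None is what turns a frame into unknown unicast that has to be flooded.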

Potential Issues

As I mentioned briefly earlier in the post, MAC learning is done in the data plane.
When we exchange routes through protocols such as OSPF, EIGRP and BGP, this is
done in the control plane. A single /24 route in the routing table pointing
at a router covers up to 254 hosts behind that router. With MAC learning
every source MAC has its own entry, which would be the same as if we had /32 routes
for every host in the network. Not very efficient! This can also become a scalability
issue in large networks if there are more hosts than the CAM can hold.

There are also other issues such as not being able to use all the links in the
network. Spanning tree will block the redundant links so we don’t get more bandwidth
if we add more links unless we put them into an Etherchannel or use technologies
such as vPC. In datacenter designs, using STP will lead to low bisection bandwidth,
meaning that even if there are lots of links between two sections of the network, most of
them will actually be blocked.

Another issue is that broadcast and unknown unicast traffic is flooded in the network.
Imagine a scenario as below where A is sending unicast traffic to B and it’s
a unidirectional flow. B rarely sends any traffic so its entry has been aged out
of the MAC address table.

Unknown unicast

In this scenario the unknown unicast will be flooded to all the switches and
all servers will have to receive the 300 Mbit/s stream and then discard the
traffic until the switches have learned the MAC of B again!

There is also a potential for black holing of traffic. In the topology below there
are four switches connected together and the primary path is through S4-S1-S2-S3.

Linkfail1

Then the link between S1 and S2 fails.

Linkfail2

When using 802.1D, there is no synchronization of the topology. It will take up to
50 seconds for the link between S3 and S4 to come up unless Backbonefast has been
deployed. When traffic is going from A to B, it will be blackholed. S4 still has an
entry for B towards S1. When the traffic reaches S1 it has nowhere to go.
Without faster aging of stale entries, this would take up to five minutes. This is
the purpose of topology change in STP, to age out stale entries faster.
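The 50 second figure comes straight from the default timers: MaxAge of 20 seconds plus two times ForwardDelay of 15 seconds for the Listening and Learning states. A small sketch with function names of my own:

```python
MAX_AGE = 20        # default MaxAge in seconds
FORWARD_DELAY = 15  # default ForwardDelay in seconds


def indirect_failure_convergence(max_age: int = MAX_AGE,
                                 forward_delay: int = FORWARD_DELAY) -> int:
    """Wait for the stored BPDU to expire, then Listening + Learning."""
    return max_age + 2 * forward_delay


def listening_learning_delay(forward_delay: int = FORWARD_DELAY) -> int:
    """Time a port spends in Listening and Learning before Forwarding."""
    return 2 * forward_delay


print(indirect_failure_convergence())  # 50
print(listening_learning_delay())      # 30
```

Compare that with the 300 second default aging time and it is clear why stale CAM entries are the bigger problem once the new path is forwarding.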

Topology Change

Like I described above, without a mechanism for topology change, traffic could
potentially be black holed for quite a while. In 802.1D, when a link goes up
or down, the switch will generate a TCN BPDU which is a special BPDU sent out
the root port. Normally switches only relay BPDUs from the root on their designated
ports but this is a special case. A switch that receives a TCN BPDU will reply
to it with a configuration BPDU with the TC Acknowledge bit set.

TCN1

The TCN BPDU will eventually reach the root which will then send out a configuration
BPDU with the TC bit set. This is done for a duration of MaxAge + FwDelay
seconds which is 20 + 15 seconds by default.

TCN2

When switches receive this BPDU from the root with the TC bit set, they will age out
entries in the CAM at a faster pace. The aging timeout will be set to 15 seconds.
This will age out any stale entries in the CAM. If there are active flows they will
not be aged out because the age will be reset as the switch sees frames coming in
with the source MAC in question. As I described earlier there could be unidirectional
flows leading to flooding. Also flows that are inactive for a while and then resume
can get flooded if their entries time out during the period that the root bridge is
sending out these configuration BPDUs with TC set.

Uplinkfast

Uplinkfast is a feature deployed on access switches which have dual links to
the distribution layer. Because the switches are located at the edge of the network
it is safe to bring up an alternate port immediately without going through the regular
listening and learning phase, saving up to 30 seconds.

After a switch has failed over to the alternate link it will start to send out
dummy multicast frames. This is to speed up convergence. Even if a configuration
BPDU with TC set is sent by the root, it can still take up to 15 seconds before
stale entries age out.

Uplinkfast

So based on the thread at IEOC, what is the consequence of Uplinkfast and TC together?
The configuration BPDU with TC is sent for 35 seconds by default. Dummy multicast frames
will be sent out for a duration that is unknown. It depends on how many entries there are
in the CAM and the rate that the packets are sent at. So depending on when the multicast
frame is sent and if you have a unidirectional flow or a host gone silent, then yes
the configuration BPDU with TC could be counter productive. Traffic would still reach its
destination, but only through flooding.

In reality I doubt this would be much of an issue and most networks would be running
RSTP today. RSTP works differently by synchronizing the topology and when the TC bit
is set in BPDUs the entire CAM is flushed on all ports except where the BPDU was
received.

RSTP synchronization – behind the scenes

August 8, 2013

Intro

It is well known that RSTP uses synchronization to speed up convergence in
switched networks. Not many articles or books give the full picture of how this
process really works. The synchronization process is often oversimplified
and readers are left with the IEEE standard if they want to understand all
of the details. This post will give you a better understanding of how the
RSTP synchronization really works.

Initial synchronization

In regular 802.1D, when a switch first boots up and its ports are brought online,
the switch claims to be root because it has not yet heard any better BPDUs.
This is no different in RSTP or RPVST+, which is Cisco’s implementation.
Take a look at the following topology.

RSTP-synch-1

The goal here is to make SW1 the root bridge. But until better BPDUs have
been heard all switches will claim root. That is how STP works, it stores
the best BPDU received on a port. To emulate a network coming online,
we will start with all ports shut down and then try to bring them up
at the same time. Debugs and captures will be run to show how the synchronization
process works. The following debugs have been enabled.

SW1#sh debug
Spanning Tree:
  Spanning Tree event debugging is on
  Spanning Tree state sync support debugging is on

We start by looking at the debugs from each switch in order.

SW1

setting bridge id (which=3) prio 16385 prio cfg 16384 sysid 1 (on) id 4001.aabb.cc00.0100
RSTP(1): initializing port Et0/0
RSTP(1): Et0/0 is now designated
%LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0/0, changed state to up
RSTP(1): transmitting a proposal on Et0/0
RSTP(1): received an agreement on Et0/0
STP[1]: Generating TC trap for port Ethernet0/0

SW1 assumes its port is designated and sends out a proposal. SW2 will agree to this
proposal.
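
The "setting bridge id" line in the debug shows how the bridge ID is built: the configured priority (16384) plus the system ID extension (1, the VLAN) gives 16385, which is 0x4001 in front of the MAC address. A minimal sketch for pulling such an ID apart and comparing IDs follows; the function names are my own and it assumes the dotted IOS notation used in the debugs.

```python
def parse_bridge_id(bridge_id):
    """Split an IOS-style bridge ID such as '4001.aabb.cc00.0100'."""
    head, mac = bridge_id.split(".", 1)
    value = int(head, 16)
    priority = value & 0xF000     # upper 4 bits, steps of 4096
    sys_id_ext = value & 0x0FFF   # lower 12 bits, VLAN or instance number
    return priority, sys_id_ext, mac


def bridge_id_key(bridge_id):
    """Sort key for bridge IDs: the numerically lowest ID is the best."""
    priority, sys_id_ext, mac = parse_bridge_id(bridge_id)
    return (priority + sys_id_ext, int(mac.replace(".", ""), 16))


# Bridge IDs taken from the debugs of SW1 to SW4.
bridge_ids = [
    "8001.aabb.cc00.0200",
    "4001.aabb.cc00.0100",
    "8001.aabb.cc00.0400",
    "8001.aabb.cc00.0300",
]

print(parse_bridge_id("4001.aabb.cc00.0100"))
print(min(bridge_ids, key=bridge_id_key))
```

With these values SW1 wins the election because of its lower priority, and among the other three the MAC address is the tie breaker.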

SW2

%LINK-3-UPDOWN: Interface Ethernet0/0, changed state to up
%LINK-3-UPDOWN: Interface Ethernet0/1, changed state to up
setting bridge id (which=3) prio 32769 prio cfg 32768 sysid 1 (on) id 8001.aabb.cc00.0200
RSTP(1): initializing port Et0/0
RSTP(1): Et0/0 is now designated
RSTP(1): initializing port Et0/1
RSTP(1): Et0/1 is now designated
%LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0/1, changed state to up
RSTP(1): transmitting a proposal on Et0/0
RSTP(1): transmitting a proposal on Et0/1
%LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0/0, changed state to up
RSTP(1): transmitting a proposal on Et0/0
RSTP(1): transmitting a proposal on Et0/1
RSTP(1): received an agreement on Et0/1
STP[1]: Generating TC trap for port Ethernet0/1
RSTP(1): transmitting a proposal on Et0/0
RSTP(1): transmitting a proposal on Et0/0
RSTP(1): updt roles, received superior bpdu on Et0/0 
RSTP(1): Et0/0 is now root port
RSTP(1): syncing port Et0/1
RSTP(1): synced Et0/0
STP[1]: Generating TC trap for port Ethernet0/0
RSTP(1): transmitting an agreement on Et0/0 as a response to a proposal
RSTP(1): transmitting a proposal on Et0/1
RSTP(1): received an agreement on Et0/1

SW2 initializes all ports as designated and starts sending out proposals. It
then receives a better BPDU from SW1 so it has to sync its downstream ports (Et0/1).

SW3

%LINK-3-UPDOWN: Interface Ethernet0/0, changed state to up
%LINK-3-UPDOWN: Interface Ethernet0/1, changed state to up
%LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0/0, changed state to up
%LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0/1, changed state to up
setting bridge id (which=3) prio 32769 prio cfg 32768 sysid 1 (on) id 8001.aabb.cc00.0300
RSTP(1): initializing port Et0/0
RSTP(1): Et0/0 is now designated
RSTP(1): initializing port Et0/1
RSTP(1): Et0/1 is now designated
RSTP(1): transmitting a proposal on Et0/0
RSTP(1): transmitting a proposal on Et0/1
RSTP(1): updt roles, received superior bpdu on Et0/0 
RSTP(1): Et0/0 is now root port
RSTP(1): syncing port Et0/1
RSTP(1): synced Et0/0
STP[1]: Generating TC trap for port Ethernet0/0
RSTP(1): transmitting an agreement on Et0/0 as a response to a proposal
RSTP(1): transmitting a proposal on Et0/1
RSTP(1): transmitting a proposal on Et0/1
RSTP(1): received an agreement on Et0/1
STP[1]: Generating TC trap for port Ethernet0/1
RSTP(1): updt roles, received superior bpdu on Et0/0 
RSTP(1): syncing port Et0/1
RSTP(1): synced Et0/0
RSTP(1): transmitting an agreement on Et0/0 as a response to a proposal
RSTP(1): transmitting a proposal on Et0/1
RSTP(1): received an agreement on Et0/1

SW3 goes through the same process. Claims root at first, then hears a better BPDU
and must sync its downstream port.

SW4

%LINK-3-UPDOWN: Interface Ethernet0/0, changed state to up
setting bridge id (which=3) prio 32769 prio cfg 32768 sysid 1 (on) id 8001.aabb.cc00.0400
RSTP(1): initializing port Et0/0
RSTP(1): Et0/0 is now designated
%LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0/0, changed state to up
RSTP(1): transmitting a proposal on Et0/0
RSTP(1): updt roles, received superior bpdu on Et0/0 
RSTP(1): Et0/0 is now root port
RSTP(1): synced Et0/0
STP[1]: Generating TC trap for port Ethernet0/0
RSTP(1): transmitting an agreement on Et0/0 as a response to a proposal
RSTP(1): updt roles, received superior bpdu on Et0/0 
RSTP(1): synced Et0/0
RSTP(1): transmitting an agreement on Et0/0 as a response to a proposal

SW4 also claims root, then hears a better BPDU. It has no downstream ports to
synchronize so the process ends there.

To visualize the process this is what happens at time 0.

RSTP-synch-2

This can be seen in the BPDUs as well. This is the BPDU that SW1 sends out.

BPDU-SW1-1

The BPDU is a proposal and the designated bit is set. It’s not yet trying to learn or
forward on the port.

SW2 sends out the following BPDU.

BPDU-SW2-1

When SW1 has received the agreement BPDU it can start forwarding on its designated port.
SW2 can forward on its root port as soon as it has decided that the port is its root port.
When SW2 has learned better root information it must synchronize its downstream ports, so
the port to SW3 is still blocking.

RSTP-synch-3

This is the agreement BPDU that SW2 sends to SW1.

BPDU-SW2-2

The learning and forwarding bits are set and the role is root. The agreement
bit is also set, as well as TC so that MAC address tables can be updated. The
TC bit stays set for 2x the hello time; the timer that controls this is called TcWhile.
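
Since the captures are screenshots, here is a small sketch for reading the flags octet of an RSTP BPDU yourself. The bit layout is the standard one (TC in bit 0, proposal in bit 1, port role in bits 2-3, learning in bit 4, forwarding in bit 5, agreement in bit 6 and TCA in bit 7). The two sample values are constructed from the descriptions in this post and are not copied from the captures.

```python
ROLES = {0: "unknown", 1: "alternate/backup", 2: "root", 3: "designated"}


def decode_rstp_flags(flags):
    """Decode the flags octet of an RSTP BPDU into a dictionary."""
    return {
        "tc": bool(flags & 0x01),
        "proposal": bool(flags & 0x02),
        "role": ROLES[(flags >> 2) & 0x03],
        "learning": bool(flags & 0x10),
        "forwarding": bool(flags & 0x20),
        "agreement": bool(flags & 0x40),
        "tca": bool(flags & 0x80),
    }


# Proposal from a designated port that is not yet learning or forwarding.
print(decode_rstp_flags(0x0E))

# Agreement from a root port with learning, forwarding and TC set.
print(decode_rstp_flags(0x79))
```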

The next segment to be synchronized is the one between SW2 and SW3.
At first SW3 claims to be root.

BPDU-SW3-1

Then SW2 sends out a better BPDU.

BPDU-SW2-3

SW2 sends a BPDU with TC set because for a brief period of time SW2 was believed to
be root before SW2 heard a better BPDU from SW1. Then SW3 sends agreement BPDU.

BPDU-SW3-2

After SW2 has received the agreement BPDU it can bring its downstream port (Et0/1)
to forwarding making the topology look like this.

RSTP-synch-4

Finally the segment between SW3 and SW4 is synchronized. SW3 sends out the BPDU
and then SW4 agrees to it. TC is set because for a brief period SW3 was believed
to be root.

BPDU-SW3-3

BPDU-SW4-1

The final topology is then that all links are forwarding because we have no physical
loop in this topology.

RSTP-synch-5
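
The hop by hop handshake for this chain can be summarized with a short simulation. It is a simplified sketch under my own assumptions: the switches are modelled as a plain list with the root first, the first moments where every switch claims root are skipped, and the function name and event strings are invented for illustration.

```python
def synchronize_chain(switches):
    """Walk the proposal/agreement handshake down a chain of switches.

    switches[0] is the root bridge. Returns the ordered events and the
    final state of every inter-switch segment.
    """
    events = []
    segments = {}
    for index, (upstream, downstream) in enumerate(zip(switches, switches[1:])):
        segment = f"{upstream}-{downstream}"
        segments[segment] = "discarding"
        events.append(f"{upstream} sends proposal to {downstream}")
        events.append(f"{downstream} makes its port towards {upstream} the root port")
        if index + 2 < len(switches):
            events.append(f"{downstream} syncs (blocks) its downstream port")
        else:
            events.append(f"{downstream} has no downstream ports to sync")
        events.append(f"{downstream} sends agreement to {upstream}")
        # The agreement lets the upstream designated port forward at once.
        segments[segment] = "forwarding"
    return events, segments


events, segments = synchronize_chain(["SW1", "SW2", "SW3", "SW4"])
for line in events:
    print(line)
print(segments)
```

The point of the model is that the blocked boundary moves one hop away from the root per handshake, and no timers are involved as long as every agreement arrives.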

Receiving better root information

So far we had no physical loop in the topology. This is not a very realistic
scenario and to see how RSTP works when receiving better root information we
will add a link between SW1 and SW4 meaning that SW4 has a direct path to the
root like this.

RSTP-synch-6

Before we look at what happens when bringing up the port between SW1 and SW4
let us assign port roles to all the ports on the drawing. This is good practice
to understand how STP works and you should be able to do this manually if you
fully understand STP. We are expecting the topology to converge like this.

RSTP-synch-7

After SW4 receives better root information, which ports do we need to synchronize
to converge the topology? SW1 does not receive better information, it is the root.
SW4 has a designated port towards SW3 so it needs to synchronize that segment.
SW3 has no designated ports so we expect the synchronization process to stop
there. Let’s look at the debugs and I’ll do a play by play with the drawings.

SW1

%LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0/1, changed state to up
RSTP(1): initializing port Et0/1
RSTP(1): Et0/1 is now designated
RSTP(1): transmitting a proposal on Et0/1
RSTP(1): transmitting a proposal on Et0/1
RSTP(1): received an agreement on Et0/1
STP[1]: Generating TC trap for port Ethernet0/1

SW1 initializes the port and waits for agreement BPDU from SW4 before it can
bring the port into forwarding.

SW4

%LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0/1, changed state to up
RSTP(1): initializing port Et0/1
RSTP(1): Et0/1 is now designated
RSTP(1): transmitting a proposal on Et0/1
RSTP(1): updt roles, received superior bpdu on Et0/1 
RSTP(1): Et0/1 is now root port
RSTP(1): Et0/0 blocked by re-root
RSTP(1): synced Et0/1
RSTP(1): Et0/0 is now designated
STP[1]: Generating TC trap for port Ethernet0/1
RSTP(1): transmitting an agreement on Et0/1 as a response to a proposal
RSTP(1): transmitting a proposal on Et0/0
RSTP(1): received an agreement on Et0/0
STP[1]: Generating TC trap for port Ethernet0/0

SW4 initializes the port but then receives a better BPDU. Et0/0 is then blocked by
reroot because Et0/1 is now the root port. Et0/0 must be synchronized because it
is now a designated (downstream) port. SW3 then sends an agreement. So looking at
the topology this is what has happened so far.

RSTP-synch-8

Then SW3 sends agreement so that SW4 can bring Et0/0 into forwarding.

RSTP-synch-9

So now the question is, what happens at SW3?

RSTP(1): updt roles, received superior bpdu on Et0/1
RSTP(1): Et0/1 is now alternate

SW3 did not receive any better root information and it has no designated ports.
This means that the synchronization process can stop. Making the final topology
look like this.

RSTP-synch-10
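
Why SW3 keeps Et0/0 as root port and makes Et0/1 alternate follows from the ordered BPDU comparison (root bridge ID, root path cost, sender bridge ID, sender port ID, receiver port ID). Below is a minimal sketch of that comparison from the point of view of SW3. The link cost of 100 and the sender port IDs are assumptions; the bridge IDs are the ones from the debugs.

```python
from collections import namedtuple

Bpdu = namedtuple("Bpdu", "root_id root_path_cost sender_id sender_port_id")

LINK_COST = 100  # assumed cost of every link in the lab

SW1 = 0x4001_AABB_CC00_0100
SW2 = 0x8001_AABB_CC00_0200
SW3 = 0x8001_AABB_CC00_0300
SW4 = 0x8001_AABB_CC00_0400


def root_port(received):
    """received maps local port -> Bpdu heard there. The lowest vector wins."""
    def key(item):
        port, bpdu = item
        return (bpdu.root_id, bpdu.root_path_cost + LINK_COST,
                bpdu.sender_id, bpdu.sender_port_id, port)
    return min(received.items(), key=key)[0]


def role_of(port, received, my_bridge_id):
    """Return 'root', 'designated' or 'alternate' for a port that hears BPDUs."""
    best = root_port(received)
    if port == best:
        return "root"
    my_cost = received[best].root_path_cost + LINK_COST
    mine = (received[best].root_id, my_cost, my_bridge_id)
    theirs = tuple(received[port][:3])
    return "designated" if mine < theirs else "alternate"


# What SW3 hears once SW4 has a direct link to the root.
sw3_received = {
    "Et0/0": Bpdu(SW1, LINK_COST, SW2, 0x8002),  # from SW2
    "Et0/1": Bpdu(SW1, LINK_COST, SW4, 0x8001),  # from SW4
}

print(root_port(sw3_received))
print(role_of("Et0/1", sw3_received, SW3))
```

Both paths have the same cost in this model, so the lower sender bridge ID of SW2 decides the root port, and on the other segment SW4 advertises the better path cost, which leaves SW3's port as alternate.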

What happens when synchronization fails?

RSTP synchronization depends on all links in the topology being
point to point. The link type is derived from the duplex setting: a full duplex
link is treated as point to point.
It is possible to force a link to point to point but if you are running your
interfaces in half duplex STP is not your biggest problem!

In the case that the proposal and agreement process fails RSTP has to fall
back on the timers used in regular STP. There is a timer called
FdWhile which is the same as the forward delay, defaulting to 15s. After trying to
send proposals for 15s it will start to bring the port through discarding, learning
and then to forwarding. I simulated this scenario below by blocking BPDUs between
SW3 and SW4.

20:02:23.338: RSTP(1): Et0/1 is now root port
20:02:23.338: RSTP(1): Et0/0 blocked by re-root
20:02:23.338: RSTP(1): Et0/0 is now designated
20:02:23.338: STP[1]: Generating TC trap for port Ethernet0/1
20:02:23.339: RSTP(1): transmitting a proposal on Et0/0
20:02:23.509: RSTP(1): transmitting a proposal on Et0/0
20:02:25.509: RSTP(1): transmitting a proposal on Et0/0
20:02:27.509: RSTP(1): transmitting a proposal on Et0/0
20:02:29.517: RSTP(1): transmitting a proposal on Et0/0
20:02:31.517: RSTP(1): transmitting a proposal on Et0/0
20:02:33.517: RSTP(1): transmitting a proposal on Et0/0
20:02:35.517: RSTP(1): transmitting a proposal on Et0/0
20:02:37.521: RSTP(1): transmitting a proposal on Et0/0
20:02:38.338: RSTP(1): Et0/0 fdwhile Expired

Every 2 seconds it tries to send a proposal but gets no agreement back.
After 15 seconds the timer expires and RSTP has to go through the regular
phases instead of immediately bringing the port online.
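
The timing can be checked straight from the timestamps. The sketch below parses a subset of the debug lines above and measures how long the port stayed designated before FdWhile expired, as well as the gaps between proposals. The parsing helper is my own and assumes the "HH:MM:SS.mmm: " prefix seen in this output.

```python
from datetime import datetime

# Subset of the debug output above, copied verbatim.
LOG = """\
20:02:23.338: RSTP(1): Et0/0 is now designated
20:02:23.339: RSTP(1): transmitting a proposal on Et0/0
20:02:23.509: RSTP(1): transmitting a proposal on Et0/0
20:02:25.509: RSTP(1): transmitting a proposal on Et0/0
20:02:27.509: RSTP(1): transmitting a proposal on Et0/0
20:02:29.517: RSTP(1): transmitting a proposal on Et0/0
20:02:31.517: RSTP(1): transmitting a proposal on Et0/0
20:02:33.517: RSTP(1): transmitting a proposal on Et0/0
20:02:35.517: RSTP(1): transmitting a proposal on Et0/0
20:02:37.521: RSTP(1): transmitting a proposal on Et0/0
20:02:38.338: RSTP(1): Et0/0 fdwhile Expired
"""


def timestamp(line):
    """Parse the leading 'HH:MM:SS.mmm' of a debug line."""
    return datetime.strptime(line.split(": ", 1)[0], "%H:%M:%S.%f")


lines = LOG.splitlines()
designated = next(l for l in lines if "is now designated" in l)
expired = next(l for l in lines if "fdwhile Expired" in l)
proposals = [timestamp(l) for l in lines if "transmitting a proposal" in l]

fdwhile_seconds = (timestamp(expired) - timestamp(designated)).total_seconds()
gaps = [round((b - a).total_seconds(), 3) for a, b in zip(proposals, proposals[1:])]

print(fdwhile_seconds)
print(len(proposals), gaps)
```

Apart from the first retransmission the gaps sit at the 2 second hello interval, and the expiry lands exactly one forward delay after the port became designated.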

Conclusion

RSTP is a rapid protocol that works by synchronizing the topology. This process
is often overlooked in books on switching and spanning tree. This post describes
in detail how the synchronization process actually works. RSTP is a distance vector
protocol since the cost is learned by listening to BPDUs from other switches. In
some cases this can lead to issues like counting to infinity. For detail on this
refer to INE STP convergence PDF by Petr Lapukhov.
RSTP converges fast as long as the synchronization process works. This process relies
on all links running in full duplex and all switches running in the same STP mode.

Busting myths – Spanning tree portfast on the interface

August 4, 2013 6 comments

I would like to bust a common myth. One that I’m probably guilty of preaching
before as well. We all know that when portfast is enabled the port will bypass
the listening and learning phase of STP. Portfast can be enabled globally for
all access ports or be configured directly under the interface.

The myth is as follows. When portfast is enabled globally, the port will lose its
portfast status when receiving BPDUs and go through the regular STP phases,
but if BPDUs are received on an interface where portfast was enabled specifically
it will continue to use portfast and bridging loops may appear.

Part of the myth is also that portfast enabled ports do not send BPDUs.

So we start out by busting the second part of the myth. To do this I will
simply enable portfast on a port and we will see the number of BPDUs
incrementing.

SW1#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
SW1(config)#int fa0/1
SW1(config-if)#span portfast
%Warning: portfast should only be enabled on ports connected to a single
 host. Connecting hubs, concentrators, switches, bridges, etc... to this
 interface  when portfast is enabled, can cause temporary bridging loops.
 Use with CAUTION

%Portfast has been configured on FastEthernet0/1 but will only
 have effect when the interface is in a non-trunking mode.
SW1(config-if)#^Z
SW1#sh span int fa0/1 portfast
VLAN0001            enabled

Portfast is definitely enabled. Are any BPDUs going out?

SW1#sh span int fa0/1 det | i BPDU
   BPDU: sent 43, received 0
SW1#sh span int fa0/1 det | i BPDU
   BPDU: sent 46, received 0

So clearly that was easy to bust. BPDUs are still sent on portfast enabled ports.
Anyone claiming STP is disabled and that portfast enabled ports don’t send BPDUs
is clearly wrong. All it does is to bypass the listening and learning phase.

So what happens when BPDUs are received? I will enable IRB on a router to
start sending BPDUs. I will set the priority to something better than the
switch so that the router side will become designated.

R1#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R1(config)#bridge irb
R1(config)#bridge 1 protocol ieee
R1(config)#bridge 1 priority 8192
R1(config)#int fa0/0
R1(config-if)#bridge-group 1

R1 is now running STP:

R1#sh span

 Bridge group 1 is executing the ieee compatible Spanning Tree protocol
  Bridge Identifier has priority 8192, address 000d.bc01.3400
  Configured hello time 2, max age 20, forward delay 15
  We are the root of the spanning tree
  Topology change flag set, detected flag set
  Number of topology changes 1 last change occurred 00:00:13 ago
          from FastEthernet0/0
  Times:  hold 1, topology change 35, notification 2
          hello 2, max age 20, forward delay 15 
  Timers: hello 0, topology change 24, notification 0, aging 15

 Port 4 (FastEthernet0/0) of Bridge group 1 is listening
   Port path cost 19, Port priority 128, Port Identifier 128.4.
   Designated root has priority 8192, address 000d.bc01.3400
   Designated bridge has priority 8192, address 000d.bc01.3400
   Designated port id is 128.4, designated path cost 0
   Timers: message age 0, forward delay 0, hold 0
   Number of transitions to forwarding state: 0
   BPDU: sent 3, received 3

So now we should be receiving BPDUs on SW1.

SW1#sh span int fa0/1 det | i BPDU
   BPDU: sent 221, received 41
SW1#sh span int fa0/1 det | i BPDU
   BPDU: sent 221, received 43

And we are. So what happened to portfast now that we are receiving BPDUs?

SW1#sh span int fa0/1 portfast
VLAN0001            disabled

It’s disabled! Myth busted! Even if portfast is enabled under the interface it
will still lose its portfast status if BPDUs are received. If we debug we
can see that the port did not have to go back through blocking -> listening
-> learning. It stayed in forwarding even though BPDUs were received.

STP CFG: found port cfg FastEthernet0/1 (2E9E1E0)
STP: VLAN0001 heard root  8192-000d.bc01.3400 on Fa0/1
    supersedes 32769-000c.3058.7c80
STP: VLAN0001 new root is 8192, 000d.bc01.3400 on port Fa0/1, cost 19
STP: VLAN0001 Fa0/16 -> listening
STP: VLAN0001 Fa0/17 -> listening
STP: VLAN0001 Fa0/18 -> listening

So bridging loops can form because the listening and learning phase is bypassed
if we enable portfast on Inter Switch Links. They would only be temporary though
because as I showed the ports would lose their portfast status when BPDUs are
received.
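
The behaviour observed in this test can be captured in a tiny state model. It is only a sketch of what the show commands and debugs above displayed, not of how IOS implements it, and the class and attribute names are made up.

```python
class PortfastPort:
    """Toy model of an access port with portfast configured."""

    def __init__(self):
        self.portfast_configured = True
        self.portfast_operational = True
        self.state = "forwarding"  # listening and learning were skipped
        self.bpdus_sent = 0
        self.bpdus_received = 0

    def hello_timer_fires(self):
        # Portfast does not stop the port from sending BPDUs.
        self.bpdus_sent += 1

    def receive_bpdu(self):
        # The operational portfast status is lost, the configuration is
        # untouched, and the port is not sent back through listening.
        self.bpdus_received += 1
        self.portfast_operational = False


port = PortfastPort()
for _ in range(3):
    port.hello_timer_fires()
port.receive_bpdu()
print(port.bpdus_sent, port.portfast_operational, port.state)
```

The split between configured and operational status is the key point: the configuration stays in place, but the port stops behaving as a portfast port once a BPDU has been heard.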

Assume that we have a topology like this:

Portfast

Portfast has been enabled on all Inter Switch Links. Which ports keep their portfast
status and which would not? Give me suggestions in the comments section.

I’m doing two lectures at Cisco Learning Network on spanning tree

July 15, 2013 8 comments

Hi guys,

I am going to be doing two lectures on spanning tree over at the Cisco Learning Network.
The first one will be aimed at CCNA level plus and the second one will be aimed at
CCNP level plus.

Please post in comments if you are attending and if you have something that you want
me to bring up during the lecture. The session will be ending with a Q&A session.

STP convergence – MST

May 8, 2013 4 comments

In the comments I received a request to compare RPVST+ with MST.
RPVST+ is Cisco’s proprietary STP running one instance per VLAN over
802.1Q trunks. MST is an industry standard which can run multiple
instances but not one per VLAN. MST does run RSTP as the underlying
protocol so in theory there should be no difference at all. Let’s
give it a try. The topology is very similar to last time but a couple
of extra routers are involved. We’ll get back to these later. This is
the topology:

STP-convergence-MST

These are the current port roles:

STP-port-roles-MST

I have just put some basic MST configuration and NTP on the switches.

SW3(config)#ntp server 13.13.13.1
SW3(config)#span mode mst
SW3(config)#span mst 0 prio 16384
SW3(config)#span mst 1 prio 16384
SW3(config)#span mst conf
SW3(config-mst)#name TST       
SW3(config-mst)#revision 1

Verify initial reachability between the routers.

R1#ping 13.13.13.3

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 13.13.13.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/1 ms

R2#ping 25.25.25.5

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 25.25.25.5, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/4 ms

Now let’s shut down Gi0/21 on SW3, which leads to the root port of SW2.
Debug spanning-tree events will show the sequence of events.

May  7 20:32:18.975: MST[0]: Fa0/21 state change forwarding -> disabled
May  7 20:32:18.975: MST[0]: updt roles, root port Fa0/21 going down
May  7 20:32:18.975: MST[0]: Fa0/23 is now root port
May  7 20:32:18.975: MST[0]: Fa0/21 state change disabled -> blocking
May  7 20:32:18.975: MST[0]: Fa0/23 state change blocking -> forwarding
May  7 20:32:18.979: MST[0]: sending proposal on Fa0/3
May  7 20:32:18.983: MST[0]: sending proposal on Fa0/5

The switchover is immediate as expected. Now let’s try to simulate a passive
error by implementing BPDU filter.

SW3(config-if)#span bpdufilter enable
SW3(config-if)#do sh clock
20:36:14.354 UTC Tue May 7 2013

This is from SW2:

May  7 20:36:20.008: MST[0]: updt roles, information on root port Fa0/21 expired
May  7 20:36:20.008: MST[0]: Fa0/23 is now root port
May  7 20:36:20.008: MST[0]: Fa0/21 state change forwarding -> blocking
May  7 20:36:20.008: MST[0]: Fa0/3 state change forwarding -> blocking
May  7 20:36:20.008: MST[0]: Fa0/5 state change forwarding -> blocking
May  7 20:36:20.008: MST[0]: Fa0/23 state change blocking -> forwarding
May  7 20:36:20.008: MST[0]: Fa0/21 is now designated
May  7 20:36:20.012: MST[0]: sending proposal on Fa0/21
May  7 20:36:20.012: MST[0]: sending proposal on Fa0/3
May  7 20:36:20.012: MST[0]: sending proposal on Fa0/5

So it took roughly 6 seconds which was expected. Because MST runs
RSTP the results are exactly the same. The only thing that’s really different
with MST is that the information for all instances is piggybacked onto the BPDU
of the IST (instance 0). If you have VLANs mapped to instance 0 and there is a
change then the other instances (MSTIs) may have to recalculate as well.

So using MST is no different than using RPVST+ from a convergence standpoint.
In future posts I will look at running a mix of RPVST+ and MST and see how
they interconnect.

Spanning tree convergence

May 7, 2013 10 comments

Someone asked the other day how fast STP converges depending on whether PVST+,
RPVST+ or MST is running. Usually the answer for PVST+ is 30-50 seconds
and for RPVST+ it’s fast, maybe less than a second. I thought I would
explore this and check the difference between PVST+ and RPVST+, and also
look at PVST+ with features like uplinkfast.

This post assumes you already have a good basic understanding of STP. This
is not an introductory post on STP.

This is the topology being used:

STP-convergence

SW1 is the root and ports towards the routers have been configured with VLAN 23
and portfast. I will run NTP to have the clocks properly synchronized. Currently
the port roles look like this:

STP-port-roles

I will configure the routers in 23.23.23.0/24 subnet and do a ping to verify connectivity.

R2#ping 23.23.23.3

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 23.23.23.3, timeout is 2 seconds:
.!!!!
Success rate is 80 percent (4/5), round-trip min/avg/max = 1/3/4 ms

Working fine so far. Now let’s take a look at some different failure scenarios.
We turn on logging to a buffer to not flood the console. We will be looking at
spanning tree events.

SW1(config)#logging con 6
SW1(config)#logging buff 7
SW1(config)#logging buff 32768
SW1(config)#do debug spanning-tree events
Spanning Tree event debugging is on

What happens when the root port is shut down? In theory, when loss of carrier is
detected on the link, the switch should look at the BPDU stored on the alternate
port and start to take that port through the different port states. This should take around 30 seconds.

This is output from SW2.

May  7 10:27:03.314: STP: VLAN0023 new root port Fa0/16, cost 38
May  7 10:27:18.321: STP: VLAN0023 Fa0/16 -> learning
May  7 10:27:33.329: STP: VLAN0023 sent Topology Change Notice on Fa0/16
May  7 10:27:33.329: STP: VLAN0023 Fa0/16 -> forwarding

The timing is almost perfect. The port goes through listening and learning
at 15 seconds each before it goes to forwarding almost exactly 30 seconds after
the port was shutdown.

What happens when there is an indirect failure? The switch has to expire the root BPDU
before it believes other BPDUs with worse cost. This should take around 20 seconds. By
default Maxage will be set to 20 seconds.

SW1#sh span | i Age
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
SW2#sh span int f0/13 det | i age
   Timers: message age 1, forward delay 0, hold 0

We will this time simulate a passive error by configuring BPDU filter on SW1 towards
SW2.

SW1(config-if)#span bpdufilter enable   
SW1(config-if)#do sh clock
10:39:05.598 UTC Tue May 7 2013

This has created a bridging loop but in this case we just want to see how long it
takes before the alternate port comes up as root port.

May  7 10:39:24.046: STP: VLAN0023 new root port Fa0/16, cost 38
May  7 10:39:24.046: STP: VLAN0023 Fa0/16 -> listening
May  7 10:39:39.053: STP: VLAN0023 Fa0/16 -> learning
May  7 10:39:54.061: STP: VLAN0023 sent Topology Change Notice on Fa0/16
May  7 10:39:54.061: STP: VLAN0023 Fa0/16 -> forwarding

So it took almost 20 seconds (about 18.5) for the BPDU to expire. Then the port goes
through the ordinary state changes. Roughly 48.5 seconds after the filter was applied
the port went into forwarding. For passive failures when running PVST+ the
maximum recovery time should be 50 seconds.
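
The 30 and 50 second figures are just sums of the timers, and the measured values can be set against them. The sketch below does that with the default timers and the timestamps from the outputs above; the helper names are my own. Note that the direct failure is measured from the "new root port" log entry and the indirect failure from the `sh clock` output taken after the filter was applied.

```python
def to_seconds(clock):
    """Convert 'HH:MM:SS.mmm' to seconds since midnight."""
    hours, minutes, seconds = clock.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def legacy_stp_budget(max_age=20, forward_delay=15):
    """Return (direct, indirect) worst-case recovery times in seconds."""
    direct = 2 * forward_delay              # listening + learning
    indirect = max_age + 2 * forward_delay  # MaxAge must expire first
    return direct, indirect


direct, indirect = legacy_stp_budget()

# Timestamps copied from the debug and show clock outputs in this post.
observed_direct = to_seconds("10:27:33.329") - to_seconds("10:27:03.314")
observed_indirect = to_seconds("10:39:54.061") - to_seconds("10:39:05.598")

print(direct, round(observed_direct, 3))
print(indirect, round(observed_indirect, 3))
```

The indirect case comes in under 50 seconds because the stored BPDU was already a bit old when the filter was applied.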

Now let’s look at PVST+ with Uplinkfast configured. The theory is that when a
root port fails the Alternate port should bypass the listening and learning
states and go directly to forwarding. Let’s try this out.

SW2(config)#spanning-tree uplinkfast
May  7 10:46:37.260: STP: VLAN0023 new root port Fa0/16, cost 3038
May  7 10:46:38.249: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/13, changed state to down
May  7 10:46:39.264: %LINK-3-UPDOWN: Interface FastEthernet0/13, changed state to down
May  7 10:46:39.264: STP: VLAN0023 sent Topology Change Notice on Fa0/16

It took only 2 seconds from realizing the port was down to putting the alternate
port into forwarding. For PVST+ this is a great enhancement. What if there is
a passive error?

SW1(config-if)#span bpdufilter enable
SW1(config-if)#do sh clock
10:51:11.870 UTC Tue May 7 2013
May  7 10:51:30.216: STP: VLAN0023 new root port Fa0/16, cost 3038
May  7 10:51:30.216: STP: VLAN0023 sent Topology Change Notice on Fa0/16

There is nothing to be done about the Maxage expiring but the port is
brought up after that. So instead of 50 seconds it takes about 20 seconds.

That’s it for PVST+. Now let’s move on to RPVST+. RPVST+ works by synchronizing
the topology and it has optimizations built in. If a port fails then it should
converge almost instantly.

May  7 10:56:34.421: RSTP(1): updt roles, root port Fa0/13 going down
May  7 10:56:34.421: RSTP(1): Fa0/16 is now root port
May  7 10:56:34.421: RSTP(1): syncing port Fa0/4
May  7 10:56:34.421: RSTP(1): syncing port Fa0/6
May  7 10:56:34.421: RSTP(1): syncing port Fa0/24
May  7 10:56:34.421: RSTP(23): updt roles, root port Fa0/13 going down
May  7 10:56:34.421: RSTP(23): Fa0/16 is now root port
May  7 10:56:34.438: RSTP(1): transmitting a proposal on Fa0/4
May  7 10:56:34.438: RSTP(1): transmitting a proposal on Fa0/6
May  7 10:56:34.438: RSTP(1): transmitting a proposal on Fa0/24
May  7 10:56:35.419: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/13, changed state to down
May  7 10:56:35.578: RSTP(1): transmitting a proposal on Fa0/4
May  7 10:56:35.578: RSTP(1): transmitting a proposal on Fa0/6
May  7 10:56:35.578: RSTP(1): transmitting a proposal on Fa0/24
May  7 10:56:36.434: %LINK-3-UPDOWN: Interface FastEthernet0/13, changed state to down

It instantly fails over to the Alternate port and then starts synchronizing
the topology by sending out proposals. What if there was a passive failure?
In theory after RPVST+ misses 3 BPDUs it should realize that it needs to
start using the alternate path. Let’s try it out.

SW1(config-if)#span bpdufilter enable
SW1(config-if)#do sh clock
11:01:12.960 UTC Tue May 7 2013
May  7 11:01:16.648: RSTP(1): Fa0/13 rcvd info expired
May  7 11:01:16.648: RSTP(1): updt roles, information on root port Fa0/13 expired
May  7 11:01:16.648: RSTP(1): Fa0/16 is now root port
May  7 11:01:16.648: RSTP(1): Fa0/13 blocked by re-root
May  7 11:01:16.648: RSTP(1): syncing port Fa0/4
May  7 11:01:16.648: RSTP(1): syncing port Fa0/6
May  7 11:01:16.648: RSTP(1): syncing port Fa0/24
May  7 11:01:16.648: RSTP(1): Fa0/13 is now designated
May  7 11:01:16.648: RSTP(23): Fa0/13 rcvd info expired
May  7 11:01:16.648: RSTP(23): updt roles, information on root port Fa0/13 expired
May  7 11:01:16.648: RSTP(23): Fa0/16 is now root port
May  7 11:01:16.648: RSTP(23): Fa0/13 blocked by re-root
May  7 11:01:16.648: RSTP(23): Fa0/13 is now designated

Around 4 seconds later the topology has already converged. It should take
at most 6 seconds depending on when the last BPDU was received before the
failure.
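
The "at most 6 seconds" follows from the rule of three missed BPDUs at the default hello time of 2 seconds. Here is a minimal sketch of that arithmetic. It is an idealized model that ignores timer granularity in a real switch, and the function name is hypothetical.

```python
def seconds_until_expiry(since_last_bpdu, hello=2.0, missed=3):
    """Time from the failure until the stored information expires.

    since_last_bpdu is how old the last received BPDU was when the
    failure happened; it must be less than one hello interval.
    """
    if not 0 <= since_last_bpdu < hello:
        raise ValueError("the last BPDU must be less than one hello old")
    return missed * hello - since_last_bpdu


# Failure right after a BPDU arrived: the full 3 x hello has to pass.
print(seconds_until_expiry(0.0))

# Failure 1.5 seconds after the last BPDU: expiry comes sooner.
print(seconds_until_expiry(1.5))
```

In this model the worst case is a failure that happens just after a BPDU was received, which gives the full 6 seconds.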

As you can see it’s very important to detect carrier down. If you do detect it
and are running RPVST+ then convergence is almost immediate. So when designing your
network try to avoid using fiber converters and such that won’t shut down the RJ45 side
if the optical side goes down. Designing for convergence is not just about protocols, you
also need to consider the physical infrastructure.

I hope this post has given you good insight into the convergence of STP.