Archive for the ‘Switching’ Category

Ethernet, STP, Topology change and the behaviour of Ethernet

June 24, 2014 2 comments

Introduction

This post is inspired by a post at IEOC about Uplinkfast and TCN which
can be found here.

Before we get to those parts, let’s recap how Ethernet and STP work together.

Spanning Tree

The Spanning Tree Algorithm builds a loop-free tree by comparing Bridge IDs (BIDs) to
elect a root bridge and then choosing the least-cost path to that root. Redundant links
that are not part of the tree are blocked.

STP1

MAC Learning

Switches learn where to forward frames by looking at the source MAC address of the frame
on the port that the frame was received on. This learning is done in the data plane
as opposed to routing where the routes are learned in control plane. I will come back
to this later in the post.

MAC learn1

S4 learns that A is located on port 1 after A has sent a frame. This is stored in
the MAC address table located in Content Addressable Memory (CAM). The CAM is a
fast memory optimized for quick lookups in the table. By default there is a 300
second aging timeout for learned MAC addresses, meaning that if the switch
does not see any traffic from a source MAC within five minutes the entry will
age out of the table. This is used to remove stale entries and to keep the
MAC address table from becoming too large.
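To make the learning and aging behaviour concrete, here is a minimal Python sketch of a MAC address table. It is only a toy model, not how a real CAM is implemented; the class name, method names and the example MAC address are made up for illustration.

```python
class MacTable:
    """Toy MAC address table: learns source MACs per port and ages out idle entries."""

    def __init__(self, aging_time=300.0):
        self.aging_time = aging_time      # seconds; 300 is the default mentioned above
        self.entries = {}                 # mac -> (port, last_seen)

    def learn(self, src_mac, port, now):
        # Learning happens on every received frame and refreshes the timer.
        self.entries[src_mac] = (port, now)

    def age_out(self, now):
        # Drop every entry that has been idle longer than the aging time.
        stale = [m for m, (_, seen) in self.entries.items()
                 if now - seen > self.aging_time]
        for mac in stale:
            del self.entries[mac]

    def lookup(self, dst_mac, now):
        # Returns the egress port, or None for an unknown unicast (which is flooded).
        self.age_out(now)
        entry = self.entries.get(dst_mac)
        return entry[0] if entry else None


s4 = MacTable()
s4.learn("aaaa.aaaa.aaaa", port=1, now=0.0)
print(s4.lookup("aaaa.aaaa.aaaa", now=10.0))    # entry is still fresh
print(s4.lookup("aaaa.aaaa.aaaa", now=301.0))   # idle for more than 300 seconds
```

A lookup that returns None corresponds to an unknown unicast, which the switch has to flood.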

Potential Issues

As I mentioned briefly earlier in the post, MAC learning is done in the data plane.
When we exchange routes through protocols such as OSPF, EIGRP and BGP, this is
done in the control plane. If there is a /24 route in the routing table pointing
at a router, then up to 254 hosts behind that router are covered by that single
entry. With MAC learning every source MAC has its own entry, which would be the
same as if we had /32 routes for every host in the network. Not very efficient!
This can also become a scalability issue in large networks if there are more hosts
than the CAM can hold.
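The difference in table size can be shown with a few lines of Python. This is just arithmetic on an example prefix; 192.0.2.0/24 is a documentation prefix picked for illustration and does not come from any topology in this post.

```python
import ipaddress

# Example prefix, chosen only for illustration.
prefix = ipaddress.ip_network("192.0.2.0/24")
hosts = list(prefix.hosts())

routing_entries = 1           # one /24 route covers every host behind the router
mac_entries = len(hosts)      # a switch needs one CAM entry per active host

print(routing_entries, mac_entries)
```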

There are also other issues such as not being able to use all the links in the
network. Spanning tree will block the redundant links so we don’t get more bandwidth
if we add more links unless we put them into an Etherchannel or use technologies
such as vPC. In datacenter designs, using STP will lead to low bisection bandwidth,
meaning that even if there are lots of links between two halves of the network, most of
them will actually be blocked.

Another issue is that broadcast and unknown unicast traffic is flooded in the network.
Imagine a scenario as below where A is sending unicast traffic to B and it’s
a unidirectional flow. B rarely sends any traffic so its entry has been aged out
of the MAC address table.

Unknown unicast

In this scenario the unknown unicast will be flooded to all the switches and
all servers will have to receive the 300 Mbit/s stream and then discard the
traffic until the switches have learned the MAC of B again!

There is also a potential for black holing of traffic. In the topology below there
are four switches connected together and the primary path is through S4-S1-S2-S3.

Linkfail1

Then the link between S1 and S2 fails.

Linkfail2

When using 802.1D, there is no synchronization of the topology. It will take up to
50 seconds for the link between S3 and S4 to come up unless Backbonefast has been
deployed. When traffic is going from A to B, it will be black holed. S4 still has an
entry for B pointing towards S1. When the traffic reaches S1 it has nowhere to go.
With only the regular aging timer, the stale entry could stay in place for up to five
minutes. This is the purpose of topology change in STP: to age out stale entries faster.

Topology Change

Like I described above, without a mechanism for topology change, traffic could
potentially be black holed for quite a while. In 802.1D, when a link goes up
or down, the switch will generate a TCN BPDU which is a special BPDU sent out
the root port. Normally switches only relay BPDUs from the root on their designated
ports but this is a special case. A switch that receives a TCN BPDU will reply
to it with a configuration BPDU with the TC Acknowledge bit set.

TCN1

The TCN BPDU will eventually reach the root which will then send out a configuration
BPDU with the TC bit set. This is done for a duration of MaxAge + FwDelay
seconds which is 20 + 15 seconds by default.

TCN2

When switches receive this BPDU from the root with the TC bit set, they will age out
entries in the CAM at a faster pace. The aging timeout will be set to 15 seconds.
This will age out any stale entries in the CAM. If there are active flows they will
not be aged out because the age will be reset as the switch sees frames coming in
with the source MAC in question. As I described earlier there could be unidirectional
flows leading to flooding. Also flows that are inactive for a while and then resume
can get flooded if their entries time out during the period that the root bridge is
sending out these configuration BPDUs with TC set.
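The effect of the TC bit on aging can be sketched in Python. The timer values are the defaults quoted above; the idle times for hosts A and B are hypothetical and only there to show which entries survive.

```python
MAX_AGE = 20         # seconds, 802.1D default
FORWARD_DELAY = 15   # seconds, 802.1D default
NORMAL_AGING = 300   # seconds, default MAC aging time


def aging_time(tc_active):
    # While BPDUs with TC set are received, the aging time drops to the forward delay.
    return FORWARD_DELAY if tc_active else NORMAL_AGING


def surviving_entries(idle_seconds_by_mac, tc_active):
    # An entry survives if it has been refreshed more recently than the aging time.
    limit = aging_time(tc_active)
    return {mac for mac, idle in idle_seconds_by_mac.items() if idle <= limit}


tc_duration = MAX_AGE + FORWARD_DELAY   # how long the root keeps TC set

# Hypothetical idle times: A is an active sender, B has been silent for a minute.
idle = {"A": 1, "B": 60}
print(tc_duration)
print(sorted(surviving_entries(idle, tc_active=False)))
print(sorted(surviving_entries(idle, tc_active=True)))
```

With TC active only the entry for the active sender survives, so traffic towards the silent host B is flooded until B sends a frame again.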

Uplinkfast

Uplinkfast is a feature deployed on access switches which have dual links to
the distribution layer. Because the switches are located at the edge of the network
it is safe to bring up an alternate port immediately without going through the regular
listening and learning phase, saving up to 30 seconds.

After a switch has failed over to the alternate link it will start to send out
dummy multicast frames. This is to speed up convergence. Even if a configuration
BPDU with TC set is sent by the root, it can still take up to 15 seconds before
stale entries age out.

Uplinkfast

So based on the thread at IEOC, what is the consequence of Uplinkfast and TC together?
The configuration BPDU with TC is sent for 35 seconds by default. Dummy multicast frames
will be sent out for a duration that is unknown. It depends on how many entries there are
in the CAM and the rate that the packets are sent at. So depending on when the multicast
frame is sent and if you have a unidirectional flow or a host gone silent, then yes
the configuration BPDU with TC could be counterproductive. Traffic would still reach its
destination, but only because it is flooded.
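As a back-of-the-envelope check, the time spent sending dummy multicast frames can be estimated from the number of CAM entries and the transmit rate. Both numbers below are hypothetical, and the sketch assumes one dummy frame per learned MAC address.

```python
TC_DURATION = 20 + 15   # MaxAge + FwDelay, the default quoted above


def dummy_multicast_duration(cam_entries, frames_per_second):
    # Assumes one dummy frame is sent per MAC address in the local table.
    return cam_entries / frames_per_second


# Hypothetical values, not taken from any real switch.
duration = dummy_multicast_duration(cam_entries=3000, frames_per_second=150)
print(duration, duration < TC_DURATION)
```

If the dummy frames are done well before the root stops setting TC, an entry refreshed by a dummy frame can still age out again for a host that stays silent.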

In reality I doubt this would be much of an issue and most networks would be running
RSTP today. RSTP works differently by synchronizing the topology and when the TC bit
is set in BPDUs the entire CAM is flushed on all ports except where the BPDU was
received.
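The RSTP flush rule is easy to express in code. The following Python sketch models the MAC table as a plain dictionary; the port names and MAC labels are made up for illustration.

```python
def flush_on_tc(mac_table, receiving_port):
    # Keep only the entries learned on the port where the TC BPDU arrived;
    # everything learned on other ports is flushed immediately.
    return {mac: port for mac, port in mac_table.items() if port == receiving_port}


table = {"A": "Et0/0", "B": "Et0/1", "C": "Et0/2"}
print(flush_on_tc(table, receiving_port="Et0/0"))
```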


Busting myths – PAgP desirable runs in silent mode by default

August 14, 2013 5 comments

Time to bust another myth! Supposedly PAgP runs in silent mode by default
in both desirable and auto mode. So what does silent mode do?

“Use the silent mode when the switch is connected to a device that is not PAgP-capable
and seldom, if ever, sends packets. An example of a silent partner is a file server
or a packet analyzer that is not generating traffic. In this case, running PAgP on a
physical port connected to a silent partner prevents that switch port from ever becoming
operational. However, the silent setting allows PAgP to operate, to attach the port to a
channel group, and to use the port for transmission.”

So now for the myth itself. The first quote is from David Hucaby’s CCNP SWITCH 642-813
Official Certification Guide.

“By default, PAgP operates in silent submode with the desirable and auto modes, and
allows ports to be added to an EtherChannel even if the other end of the link is silent
and never transmits PAgP packets. This might seem to go against the idea of PAgP,
in which two endpoints are supposed to negotiate a channel. After all, how can two
switches negotiate anything if no PAgP packets are received?”

And then from the Cisco configuration guide for the 3560, release 12.2(58)SE.

“If your switch is connected to a partner that is PAgP-capable, you can configure the
switch port for nonsilent operation by using the non-silent keyword. If you do not specify
non-silent with the auto or desirable mode, silent mode is assumed.”

So even Cisco themselves claim that both modes operate in silent mode. Surely Cisco
can’t be wrong?! Doesn’t it seem strange to operate in silent mode by default? The
most common use must be to connect to other switches?

To test this we set up two switches with two trunks between them. One side is
set to auto and one side is set to desirable. Then debug pagp all is run to check
what mode they are running in.

SW1(config)#int range f0/13 - 14
SW1(config-if-range)#sh
SW1(config-if-range)#channel-group 1 mode des
SW2(config)#int range f0/13 - 14
SW2(config-if-range)#sh
SW2(config-if-range)#channel-group 1 mode auto
PAgP: Fa0/13 enabling PAgP with mode desirable-nonsl
PAgP: set hello interval from 0 to 1000 for port Fa0/13 
PAgP: Fa0/14 enabling PAgP with mode desirable-nonsl
PAgP: set hello interval from 0 to 1000 for port Fa0/14 
PAgP: Fa0/13 enabling PAgP with mode auto-sl
PAgP: set hello interval from 0 to 1000 for port Fa0/13
PAgP: Fa0/14 enabling PAgP with mode auto-sl
PAgP: set hello interval from 0 to 1000 for port Fa0/14

Myth busted! Desirable runs in non-silent mode but auto runs in silent mode.
So this myth exists both in official certification books and in Cisco’s documents,
which is bad. I’ll look into getting it updated if I can. The point of forming
an Etherchannel is to negotiate with the other side before forming it to make
sure that the links are not unidirectional and that they agree on all parameters.
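If you want to check many ports at once, the debug lines can be parsed with a small script. This is a rough Python sketch that only understands the "enabling PAgP with mode" lines shown above and reads the "-sl" and "-nonsl" suffixes as silent and non-silent; the function name is made up.

```python
import re

DEBUG_OUTPUT = """\
PAgP: Fa0/13 enabling PAgP with mode desirable-nonsl
PAgP: Fa0/14 enabling PAgP with mode desirable-nonsl
PAgP: Fa0/13 enabling PAgP with mode auto-sl
PAgP: Fa0/14 enabling PAgP with mode auto-sl
"""

LINE = re.compile(r"PAgP: (\S+) enabling PAgP with mode (\w+)-(nonsl|sl)$")


def parse_modes(text):
    # Returns (port, mode, is_silent) for every matching debug line.
    results = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            port, mode, submode = match.groups()
            results.append((port, mode, submode == "sl"))
    return results


for port, mode, silent in parse_modes(DEBUG_OUTPUT):
    print(port, mode, "silent" if silent else "non-silent")
```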

RSTP synchronization – behind the scenes

August 8, 2013 20 comments

Intro

It is well known that RSTP uses synchronization to speed up convergence in
switched networks. Not many articles or books give the full picture of how this
process really works. The synchronization process is often oversimplified
and readers are left with the IEEE standard if they want to understand all
of the details. This post will give you a better understanding of how the
RSTP synchronization really works.

Initial synchronization

In regular 802.1D, when a switch first boots up and its ports are brought online,
the switch claims to be root because it has not yet heard any better BPDUs.
This is no different in RSTP or RPVST+, which is Cisco’s implementation.
Take a look at the following topology.

RSTP-synch-1

The goal here is to make SW1 the root bridge. But until better BPDUs have
been heard all switches will claim root. That is how STP works, it stores
the best BPDU received on a port. To emulate a network coming online
we will begin with all ports shut down and then try to bring them up
at the same time. Debugs and captures will be run to show how the synchronization
process works. The following debugs have been enabled.

SW1#sh debug
Spanning Tree:
  Spanning Tree event debugging is on
  Spanning Tree state sync support debugging is on

We start by looking at the debugs from each switch in order.

SW1

setting bridge id (which=3) prio 16385 prio cfg 16384 sysid 1 (on) id 4001.aabb.cc00.0100
RSTP(1): initializing port Et0/0
RSTP(1): Et0/0 is now designated
%LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0/0, changed state to up
RSTP(1): transmitting a proposal on Et0/0
RSTP(1): received an agreement on Et0/0
STP[1]: Generating TC trap for port Ethernet0/0

SW1 assumes its port is designated and sends out a proposal. SW2 will agree to this
proposal.
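The "setting bridge id" line in the debug shows how the bridge ID is built: the first 16 bits are the configured priority plus the system ID extension (the VLAN number), followed by the MAC address. A short Python sketch can decode it; the function name is made up for illustration.

```python
def decode_bridge_id(bridge_id):
    # "4001.aabb.cc00.0100" -> the first group holds priority + system ID extension.
    first, *mac_groups = bridge_id.split(".")
    value = int(first, 16)
    configured_priority = value & 0xF000   # upper 4 bits, in steps of 4096
    system_id_ext = value & 0x0FFF         # lower 12 bits, the VLAN number
    return value, configured_priority, system_id_ext, ".".join(mac_groups)


print(decode_bridge_id("4001.aabb.cc00.0100"))
print(decode_bridge_id("8001.aabb.cc00.0200"))
```

The decoded values line up with "prio 16385 prio cfg 16384 sysid 1" for SW1 and "prio 32769 prio cfg 32768 sysid 1" for the other switches.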

SW2

%LINK-3-UPDOWN: Interface Ethernet0/0, changed state to up
%LINK-3-UPDOWN: Interface Ethernet0/1, changed state to up
setting bridge id (which=3) prio 32769 prio cfg 32768 sysid 1 (on) id 8001.aabb.cc00.0200
RSTP(1): initializing port Et0/0
RSTP(1): Et0/0 is now designated
RSTP(1): initializing port Et0/1
RSTP(1): Et0/1 is now designated
%LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0/1, changed state to up
RSTP(1): transmitting a proposal on Et0/0
RSTP(1): transmitting a proposal on Et0/1
%LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0/0, changed state to up
RSTP(1): transmitting a proposal on Et0/0
RSTP(1): transmitting a proposal on Et0/1
RSTP(1): received an agreement on Et0/1
STP[1]: Generating TC trap for port Ethernet0/1
RSTP(1): transmitting a proposal on Et0/0
RSTP(1): transmitting a proposal on Et0/0
RSTP(1): updt roles, received superior bpdu on Et0/0 
RSTP(1): Et0/0 is now root port
RSTP(1): syncing port Et0/1
RSTP(1): synced Et0/0
STP[1]: Generating TC trap for port Ethernet0/0
RSTP(1): transmitting an agreement on Et0/0 as a response to a proposal
RSTP(1): transmitting a proposal on Et0/1
RSTP(1): received an agreement on Et0/1

SW2 initializes all ports as designated and starts sending out proposals. It
then receives a better BPDU from SW1 so it has to sync its downstream ports (Et0/1).

SW3

%LINK-3-UPDOWN: Interface Ethernet0/0, changed state to up
%LINK-3-UPDOWN: Interface Ethernet0/1, changed state to up
%LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0/0, changed state to up
%LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0/1, changed state to up
setting bridge id (which=3) prio 32769 prio cfg 32768 sysid 1 (on) id 8001.aabb.cc00.0300
RSTP(1): initializing port Et0/0
RSTP(1): Et0/0 is now designated
RSTP(1): initializing port Et0/1
RSTP(1): Et0/1 is now designated
RSTP(1): transmitting a proposal on Et0/0
RSTP(1): transmitting a proposal on Et0/1
RSTP(1): updt roles, received superior bpdu on Et0/0 
RSTP(1): Et0/0 is now root port
RSTP(1): syncing port Et0/1
RSTP(1): synced Et0/0
STP[1]: Generating TC trap for port Ethernet0/0
RSTP(1): transmitting an agreement on Et0/0 as a response to a proposal
RSTP(1): transmitting a proposal on Et0/1
RSTP(1): transmitting a proposal on Et0/1
RSTP(1): received an agreement on Et0/1
STP[1]: Generating TC trap for port Ethernet0/1
RSTP(1): updt roles, received superior bpdu on Et0/0 
RSTP(1): syncing port Et0/1
RSTP(1): synced Et0/0
RSTP(1): transmitting an agreement on Et0/0 as a response to a proposal
RSTP(1): transmitting a proposal on Et0/1
RSTP(1): received an agreement on Et0/1

SW3 goes through the same process. Claims root at first, then hears a better BPDU
and must sync its downstream port.

SW4

%LINK-3-UPDOWN: Interface Ethernet0/0, changed state to up
setting bridge id (which=3) prio 32769 prio cfg 32768 sysid 1 (on) id 8001.aabb.cc00.0400
RSTP(1): initializing port Et0/0
RSTP(1): Et0/0 is now designated
%LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0/0, changed state to up
RSTP(1): transmitting a proposal on Et0/0
RSTP(1): updt roles, received superior bpdu on Et0/0 
RSTP(1): Et0/0 is now root port
RSTP(1): synced Et0/0
STP[1]: Generating TC trap for port Ethernet0/0
RSTP(1): transmitting an agreement on Et0/0 as a response to a proposal
RSTP(1): updt roles, received superior bpdu on Et0/0 
RSTP(1): synced Et0/0
RSTP(1): transmitting an agreement on Et0/0 as a response to a proposal

SW4 also claims root, then hears a better BPDU. It has no downstream ports to
synchronize so the process ends there.

To visualize the process this is what happens at time 0.

RSTP-synch-2

This can be seen in the BPDUs as well. This is the BPDU that SW1 sends out.

BPDU-SW1-1

The BPDU is a proposal and the designated bit is set. It’s not yet trying to learn or
forward on the port.

SW2 sends out the following BPDU.

BPDU-SW2-1

When SW1 has received the agreement BPDU it can start forwarding on its designated port.
SW2 can forward on its root port as soon as it has decided that the port is its root
port. When SW2 has learned better root information it must synchronize its downstream
ports, so the port to SW3 is still blocking.

RSTP-synch-3

This is the agreement BPDU that SW2 sends to SW1.

BPDU-SW2-2

The learning and forwarding bits are set and the role is root. The agreement
bit is also set, as well as TC so that MAC address tables can be updated. TC stays
set for as long as the TcWhile timer runs, which is 2x the hello time.

The next segment to be synchronized is the one between SW2 and SW3.
At first SW3 claims to be root.

BPDU-SW3-1

Then SW2 sends out a better BPDU.

BPDU-SW2-3

SW2 sends a BPDU with TC set because for a brief period of time SW2 was believed to
be root before it heard a better BPDU from SW1. Then SW3 sends an agreement BPDU.

BPDU-SW3-2

After SW2 has received the agreement BPDU it can bring its downstream port (Et0/1)
to forwarding making the topology look like this.

RSTP-synch-4

Finally the segment between SW3 and SW4 is synchronized. SW3 sends out the BPDU
and then SW4 agrees to it. TC is set because for a brief period SW3 was believed
to be root.

BPDU-SW3-3

BPDU-SW4-1

The final topology is then that all links are forwarding because we have no physical
loop in this topology.

RSTP-synch-5
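The segment-by-segment handshake can be summarized with a very simplified Python model. It assumes a plain chain where every switch has one upstream and at most one downstream port, and it ignores timers and the initial phase where every switch claims root. The event strings are my own wording, not debug output.

```python
def synchronize_chain(switches):
    """Walk the proposal/agreement handshake down a chain, root first.

    Returns the ordered list of events and the links that end up forwarding.
    """
    events = []
    forwarding_links = []
    for upstream, downstream in zip(switches, switches[1:]):
        events.append(f"{upstream} sends proposal to {downstream}")
        events.append(f"{downstream} selects root port towards {upstream}")
        # Sync: the downstream switch keeps its own designated port discarding
        # before it agrees, so no temporary loop can form.
        if downstream != switches[-1]:
            events.append(f"{downstream} syncs its downstream port (discarding)")
        events.append(f"{downstream} sends agreement to {upstream}")
        events.append(f"{upstream} moves designated port to forwarding")
        forwarding_links.append((upstream, downstream))
    return events, forwarding_links


events, links = synchronize_chain(["SW1", "SW2", "SW3", "SW4"])
for event in events:
    print(event)
```

The point of the model is the ordering: a segment only goes forwarding after the downstream switch has blocked its own downstream port and agreed.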

Receiving better root information

So far we had no physical loop in the topology. This is not a very realistic
scenario and to see how RSTP works when receiving better root information we
will add a link between SW1 and SW4 meaning that SW4 has a direct path to the
root like this.

RSTP-synch-6

Before we look at what happens when bringing up the port between SW1 and SW4
let us assign port roles to all the ports on the drawing. This is good practice
to understand how STP works and you should be able to do this manually if you
fully understand STP. We are expecting the topology to converge like this.

RSTP-synch-7

After SW4 receives better root information, which ports do we need to synchronize
to converge the topology? SW1 does not receive better information, it is the root.
SW4 has a designated port towards SW3 so it needs to synchronize that segment.
SW3 has no designated ports so we expect the synchronization process to stop
there. Let’s look at the debugs and I’ll do a play by play with the drawings.

SW1

%LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0/1, changed state to up
RSTP(1): initializing port Et0/1
RSTP(1): Et0/1 is now designated
RSTP(1): transmitting a proposal on Et0/1
RSTP(1): transmitting a proposal on Et0/1
RSTP(1): received an agreement on Et0/1
STP[1]: Generating TC trap for port Ethernet0/1

SW1 initializes the port and waits for agreement BPDU from SW4 before it can
bring the port into forwarding.

SW4

%LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0/1, changed state to up
RSTP(1): initializing port Et0/1
RSTP(1): Et0/1 is now designated
RSTP(1): transmitting a proposal on Et0/1
RSTP(1): updt roles, received superior bpdu on Et0/1 
RSTP(1): Et0/1 is now root port
RSTP(1): Et0/0 blocked by re-root
RSTP(1): synced Et0/1
RSTP(1): Et0/0 is now designated
STP[1]: Generating TC trap for port Ethernet0/1
RSTP(1): transmitting an agreement on Et0/1 as a response to a proposal
RSTP(1): transmitting a proposal on Et0/0
RSTP(1): received an agreement on Et0/0
STP[1]: Generating TC trap for port Ethernet0/0

SW4 initializes the port but then receives a better BPDU. Et0/0 is then blocked by
reroot because Et0/1 is now the root port. Et0/0 must be synchronized because it
is now a designated (downstream) port. SW3 then sends an agreement. So looking at
the topology this is what has happened so far.

RSTP-synch-8

Then SW3 sends agreement so that SW4 can bring Et0/0 into forwarding.

RSTP-synch-9

So now the question is, what happens at SW3?

RSTP(1): updt roles, received superior bpdu on Et0/1
RSTP(1): Et0/1 is now alternate

SW3 did not receive any better root information and it has no designated ports.
This means that the synchronization process can stop. Making the final topology
look like this.

RSTP-synch-10
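Assigning the port roles by hand can be cross-checked with a small Python sketch. It uses the bridge IDs and port names from the debugs above, assumes an equal cost on every link (the value 100 is arbitrary) and leaves out the port ID tie-breakers, which are not needed in this topology.

```python
# Bridge IDs from the debugs above: (priority, MAC). Lower wins.
BRIDGES = {
    "SW1": (16385, "aabb.cc00.0100"),
    "SW2": (32769, "aabb.cc00.0200"),
    "SW3": (32769, "aabb.cc00.0300"),
    "SW4": (32769, "aabb.cc00.0400"),
}
# Links as ((switch, port), (switch, port)); port names follow the debugs.
LINKS = [
    (("SW1", "Et0/0"), ("SW2", "Et0/0")),
    (("SW2", "Et0/1"), ("SW3", "Et0/0")),
    (("SW3", "Et0/1"), ("SW4", "Et0/0")),
    (("SW4", "Et0/1"), ("SW1", "Et0/1")),
]
COST = 100  # assumed equal cost on every link


def compute_roles():
    root = min(BRIDGES, key=BRIDGES.get)
    # Root path cost per switch via repeated relaxation.
    cost = {sw: (0 if sw == root else float("inf")) for sw in BRIDGES}
    for _ in BRIDGES:
        for (a, _pa), (b, _pb) in LINKS:
            cost[a] = min(cost[a], cost[b] + COST)
            cost[b] = min(cost[b], cost[a] + COST)

    roles = {}
    # Root port: lowest (cost via neighbor, neighbor bridge ID) per non-root switch.
    for sw in BRIDGES:
        if sw == root:
            continue
        candidates = []
        for end_a, end_b in LINKS:
            for (me, my_port), (peer, _) in ((end_a, end_b), (end_b, end_a)):
                if me == sw:
                    candidates.append((cost[peer] + COST, BRIDGES[peer], my_port))
        roles[(sw, min(candidates)[2])] = "root"

    # Per link: the end with the better (cost, bridge ID) is designated;
    # the other end is alternate unless it is already a root port.
    for end_a, end_b in LINKS:
        designated, other = sorted(
            (end_a, end_b), key=lambda e: (cost[e[0]], BRIDGES[e[0]]))
        roles[designated] = "designated"
        roles.setdefault(other, "alternate")
    return roles


for (sw, port), role in sorted(compute_roles().items()):
    print(sw, port, role)
```

SW3 has two equal-cost paths to the root, so the lower bridge ID of SW2 decides which port becomes the root port and which becomes alternate.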

What happens when synchronization fails?

RSTP synchronization depends on all links in the topology being
point-to-point. This is determined by whether the link is running in full duplex or not.
It is possible to force a link to point-to-point, but if you are running your
interfaces in half duplex STP is not your biggest problem!

If the proposal and agreement process fails, RSTP has to fall
back on the old timers used in regular STP. There is a timer called
FdWhile which is the same as the forward delay, defaulting to 15s. After trying to
send proposals for 15s it will start to bring the port through discarding, learning
and then to forwarding. I simulated this scenario below by blocking BPDUs between
SW3 and SW4.

20:02:23.338: RSTP(1): Et0/1 is now root port
20:02:23.338: RSTP(1): Et0/0 blocked by re-root
20:02:23.338: RSTP(1): Et0/0 is now designated
20:02:23.338: STP[1]: Generating TC trap for port Ethernet0/1
20:02:23.339: RSTP(1): transmitting a proposal on Et0/0
20:02:23.509: RSTP(1): transmitting a proposal on Et0/0
20:02:25.509: RSTP(1): transmitting a proposal on Et0/0
20:02:27.509: RSTP(1): transmitting a proposal on Et0/0
20:02:29.517: RSTP(1): transmitting a proposal on Et0/0
20:02:31.517: RSTP(1): transmitting a proposal on Et0/0
20:02:33.517: RSTP(1): transmitting a proposal on Et0/0
20:02:35.517: RSTP(1): transmitting a proposal on Et0/0
20:02:37.521: RSTP(1): transmitting a proposal on Et0/0
20:02:38.338: RSTP(1): Et0/0 fdwhile Expired

Every 2 seconds it tries to send a proposal but gets no agreement back.
After 15 seconds the timer expires and RSTP has to go through the regular
phases instead of immediately bringing the port online.
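The timers can be verified directly from the debug timestamps. A minimal Python sketch, where the helper name is made up:

```python
from datetime import datetime


def elapsed(start, end):
    # Difference in seconds between two debug timestamps on the same day.
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# From "Et0/0 is now designated" to "Et0/0 fdwhile Expired".
print(elapsed("20:02:23.338", "20:02:38.338"))
# Between two consecutive proposals.
print(elapsed("20:02:23.509", "20:02:25.509"))
```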

Conclusion

RSTP is a rapid protocol that works by synchronizing the topology. This process
is often overlooked in books on switching and spanning tree. This post describes
in detail how the synchronization process actually works. RSTP is a distance vector
protocol since the cost is learned by listening to BPDUs from other switches. In
some cases this can lead to issues like counting to infinity. For details on this,
refer to the INE STP convergence PDF by Petr Lapukhov.
RSTP converges fast as long as the synchronization process works. This process relies
on all links running in full duplex and all switches running in the same STP mode.

Cisco updates the Catalyst 2960 – Catalyst 2960-X and Catalyst 2960-XR

June 12, 2013 1 comment

The Catalyst 2960 is a very common switch in any environment that has
Cisco devices. A couple of years ago the 2960 got stacking via the
2960-S model. It also got the ability to do static routes which
was a nice feature. I used it in some deployments to do routing
locally in 2960 and then add a default route towards WAN provider.
That way I didn’t have to go through a slow CPE to route my local
VLANs.

The 2960-X and -XR are available in 24 or 48 port configurations.
Uplinks are either 2x 10 Gbit SFP+ or 4x 1 Gbit SFP. The PoE models
can support 370W or 740W of power.

The 2960-X provides up to 80 Gbps of stack bandwidth which is 2x more
compared to the 2960-S. It is now also possible to stack up to 8 switches
compared to the earlier maximum of 4. The 2960-S model uses FlexStack while
the newer -X and -XR models use FlexStack-Plus. FlexStack-Plus supports
detecting stack port operational state in hardware and changing the forwarding
accordingly. This takes 100 ms or less. The older model does it in CPU
which can take 1 or 2 seconds.

Here are some notable differences between 2960-X and -XR compared to 2960-S.

  • Dual core CPU @ 600 MHz. 2960-S has single core
  • 2960-XR has support for dual power supplies
  • 256 MB of flash for -XR, 128 MB for -X. The S model has 64 MB
  • 512 MB of DRAM compared to 256 for 2960-S
  • 1k active VLANs compared to 255 for 2960-S
  • 48 Etherchannel groups for -XR, 24 for -X and 6 for -S
  • 4 MB of egress buffers instead of 2 MB
  • 4 SPAN sessions instead of 2
  • 32k MACs for -XR, 16k for -X and 8k for -S
  • 24k unicast routes for -XR, 16 static routes for -X and -S

The newer models also support Netflow lite, hibernation mode and EEE.

The 2960-XR does support dynamic routing. It has support for RIP, OSPF stub,
OSPFv3 stub, EIGRP stub, HSRP, VRRP and PIM.

Here are some performance numbers:

2960-X Lan Lite has 100 Gbps of switching bandwidth and 64 active VLANs.
2960-X Lan Base has 216 Gbps of switching bandwidth and 1023 active VLANs.
The same holds true for 2960-XR with IP Lite feature set. The 2960-S had
a maximum of 255 VLANs and 176 Gbps switching bandwidth. Depending on
model the 2960-X tops out at 130.9 Mpps compared to 101.2 for 2960-S.

The switches also have added support for IPv6. Notable features are:

  • IPv6 MLDv1 and v2 snooping
  • IPv6 First Hop Security (RA guard, source guard, and binding integrity guard)
  • IPv6 ACLs
  • IPv6 QoS
  • HTTP/HTTPS over IPv6
  • SNMP over IPv6
  • Syslog over IPv6

I’m expecting more information to come out as it gets presented during Cisco Live
in Orlando.

CCIE link #11

July 14, 2011 Leave a comment

To be a CCIE we need a good grasp of switching and STP. This post by Petr Lapukhov (again) is one of the best I’ve ever seen on STP. This post describes PVST+ in detail. Read it here.

Integrated Routing and Bridging

March 30, 2011 7 comments

Sorry for the lack of updates lately but I spent the whole last week skiing and recharging my
batteries and now I’m back fully motivated to continue my path to the lab.

This time we will be talking about Integrated Routing and Bridging (IRB). Before studying for
the lab I had never used this feature. I’m not sure why we would use this feature in a
production network, maybe because we need to bridge two networks instead of routing
them due to some badly written application. If you have used it in real networks please post
in the comments. It is fair game for the lab so we need to know about it.

IRB is a feature used on routers that lets us bridge between a bridged domain and a
routed domain. Remember that in order for a VLAN to span a router the router must
be able to forward frames from one interface to another while maintaining the VLAN
header. If a network protocol is configured on a router interface (IP) it will terminate
the VLAN. This means that the VLAN header will not be maintained. When configuring
IRB we will be using a Bridge-Group Virtual Interface (BVI), which can be compared to an SVI
on a switch. A BVI gives the bridged interfaces a connection to the routed world.

When IRB is configured and traffic comes in on a routed interface (IP address configured)
that is destined for a host in the bridge group the traffic will first be routed to the BVI.
The packet will then be forwarded to the bridging engine which forwards it through a
bridged interface, the forwarding is based on the destination MAC address. If a packet
comes in on a bridged interface destined for a host in a routed network the traffic will
first go to the BVI and then be sent to the routing engine before it sends it out the
routed interface. If bridging between two interfaces with no routed protocols the traffic
will not pass the BVI interface. Think of the bridge-group as an external switch and
the BVI lets us connect this external switch to the router.
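The three forwarding cases described above can be written down as a small decision function. This Python sketch is only a mental model of the packet path, not how IOS implements it; the function name and the step labels are made up.

```python
def irb_path(ingress, destination):
    """Return the processing steps for a packet on a router running IRB.

    ingress and destination are each "routed" or "bridged".
    """
    if ingress == "routed" and destination == "bridged":
        # Routed to the BVI first, then handed to the bridging engine.
        return ["routing engine", "BVI", "bridging engine", "bridged interface"]
    if ingress == "bridged" and destination == "routed":
        # Goes to the BVI first, then to the routing engine.
        return ["BVI", "routing engine", "routed interface"]
    if ingress == "bridged" and destination == "bridged":
        # Pure bridging inside the bridge group never touches the BVI.
        return ["bridging engine", "bridged interface"]
    return ["routing engine", "routed interface"]


print(irb_path("routed", "bridged"))
print(irb_path("bridged", "routed"))
print(irb_path("bridged", "bridged"))
```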

The image below describes the scenario. R1 and R3 are in different VLANs but in
the same subnet, we need communication between the two routers. Between the
routers we have a couple of switches.

The configuration on R1 and R3 is straightforward. They have physical interfaces
with an IP address.

R1:

interface FastEthernet0/0
ip address 136.1.136.1 255.255.255.0

R3:

interface FastEthernet0/1
ip address 136.1.136.3 255.255.255.0

R1 is connected to SW1 and R3 to SW3. The switch configuration is just a basic access port.

SW1:

interface FastEthernet0/1
switchport access vlan 16

SW3:

interface FastEthernet0/3
switchport access vlan 36

Router R6 is connected to SW2 and it needs a trunk port.

SW2:

interface FastEthernet0/6
switchport trunk encapsulation dot1q
switchport trunk allowed vlan 16,36
switchport mode trunk

Now we need to configure R6 to bridge between the two different VLANs. We start by activating IRB.

bridge irb

Then we need to tie the interfaces to the bridge-group.

interface FastEthernet0/0.16
bridge-group 1
!
interface FastEthernet0/0.36
bridge-group 1

Now we create a BVI interface in the subnet.

interface BVI1
ip address 136.1.136.6 255.255.255.0

Lastly we need to activate spanning-tree and activate routing for the bridged interfaces.

bridge 1 protocol ieee
bridge 1 route ip

So using IRB we can both bridge and route between interfaces on a
router, something that is not possible otherwise.

Finally, these are some useful commands to show what is going on when using IRB.

show interfaces irb
show bridge
show spanning-tree


Private VLANs

March 7, 2011 2 comments

Private VLANs are a method to segment devices at layer 2 that are in the same IP network. Different VLANs are used but they share a common IP network.

The most common scenario for a private VLAN is a residential network where customers
connect to a switch provisioned by the ISP and the ISP wants to provision only one
subnet but the customers should not be able to reach each other at layer 2.
The reason to disallow layer two intercommunication is for security, to prevent someone
from interfering with or eavesdropping on another customer’s traffic. Another scenario could
be a hosting environment where servers are connected to a switch and a common VLAN
is used instead of provisioning one VLAN for every new customer.

Take a look at this picture.

PCs in the grey VLAN can only communicate with each other and the router. The same goes for the PCs in the green VLAN. PCs in the blue VLAN can ONLY communicate with the router, not with each other. The picture shows only one PC but if there was another PC it would not be able to communicate with the other PC in the same VLAN.

Let’s look at some of the building blocks of private VLANs.

Types of VLAN:

Primary VLAN – The VLAN that is used for receiving traffic from the device connected to the promiscuous port.

Community VLAN – Everybody that is located in a community VLAN may communicate with others in the same
community VLAN and with the primary VLAN but not with other VLANs.

Isolated VLAN – Can only reach the device on the promiscuous port, can not reach any other devices.

Types of ports:

Promiscuous port – A port that is connected to the primary VLAN where a promiscuous device is connected. This device will route traffic between the different VLANs. Requires mapping between primary VLAN and all secondary VLANs.

Host port – Hosts are connected to host ports, requires an association between the secondary VLAN in use on the port and the primary VLAN.
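The layer 2 reachability rules can be captured in a few lines of Python. This is a sketch of the rules as described above, not of any switch implementation; the function name and the tuple format are made up, and the VLAN numbers are only example values.

```python
def can_communicate(a, b):
    """a and b are (port_type, secondary_vlan) tuples.

    port_type is "promiscuous", "community" or "isolated"; the secondary
    VLAN is ignored for promiscuous ports.
    """
    type_a, vlan_a = a
    type_b, vlan_b = b
    if "promiscuous" in (type_a, type_b):
        return True               # everyone can reach the promiscuous port
    if type_a == type_b == "community":
        return vlan_a == vlan_b   # only within the same community VLAN
    return False                  # isolated ports reach nothing else at layer 2


router = ("promiscuous", None)
print(can_communicate(("community", 1000), ("community", 1000)))
print(can_communicate(("community", 1000), ("community", 2000)))
print(can_communicate(("isolated", 3000), ("isolated", 3000)))
print(can_communicate(("isolated", 3000), router))
```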

This picture shows the traffic flow.

When communicating in the same community VLAN the traffic forwarding is direct (layer 2), but if traffic is sent between different secondary VLANs it must pass through the router. This allows us to do packet filtering at layer 3, and it also means that ARP can not be sent directly between hosts even though they are in the same IP subnet.

The arrows from the PC in the blue VLAN to the PC in the black VLAN show the traffic flow with numbering. First the PC in the blue VLAN sends a packet; this packet is always sourced with the VID of the secondary VLAN. The router receives the traffic and, if no filtering is done, sends the packet out sourced with the primary VLAN. The PC in the black VLAN receives the packet from the primary VLAN and sends its response with its secondary VLAN. Finally the router sends the packet back to the blue VLAN with the VID of the primary VLAN.

Let’s have a look at what needs to be configured, starting with the VLAN configuration. The scenario is that there are two switches connected by a trunk and routers are connected to the switchports (INE topology).

vlan 100
 name PRIMARY
  private-vlan primary
  private-vlan association 1000,2000,3000
!
vlan 1000
 name COMMUNITY_1
  private-vlan community
!
vlan 2000
 name COMMUNITY_2
  private-vlan community
!
vlan 3000
 name ISOLATED
  private-vlan isolated

We create the VLANs and configure them to be primary, community or isolated. The primary VLAN needs to know the secondary VLANs it should be associated with. Next is the interface configuration.

interface FastEthernet0/1
 switchport private-vlan mapping 100 1000,2000,3000
 switchport mode private-vlan promiscuous
!
interface FastEthernet0/3
 switchport private-vlan host-association 100 1000
 switchport mode private-vlan host
!
interface FastEthernet0/5
 switchport private-vlan host-association 100 2000
 switchport mode private-vlan host

One port is configured as promiscuous and the others as host ports. The host ports need to know which primary VLAN their secondary VLAN belongs to, and the promiscuous port needs to know what the secondary VLANs are.
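These dependencies are easy to get wrong, so here is a rough Python sketch that checks them against the SW1 configuration above. The function and variable names are my own invention and the checks only cover the rules mentioned in this post; it is not a replacement for verifying on the switch.

```python
def check_private_vlan_config(primary, associated, promiscuous_mapping, host_ports):
    """Return a list of problems; an empty list means the pieces agree.

    primary             -- primary VLAN ID
    associated          -- secondary VLANs listed under the primary VLAN
    promiscuous_mapping -- (primary, secondaries) from the promiscuous port
    host_ports          -- {port: (primary, secondary)} from host-association
    """
    problems = []
    mapped_primary, mapped_secondaries = promiscuous_mapping

    if mapped_primary != primary:
        problems.append(f"promiscuous port maps primary {mapped_primary}, expected {primary}")

    # The promiscuous port needs a mapping to every secondary VLAN.
    for vlan in sorted(set(associated) - set(mapped_secondaries)):
        problems.append(f"secondary VLAN {vlan} is not mapped on the promiscuous port")

    # Each host port must point at the right primary and a known secondary.
    for port, (host_primary, host_secondary) in sorted(host_ports.items()):
        if host_primary != primary:
            problems.append(f"{port}: primary VLAN {host_primary}, expected {primary}")
        if host_secondary not in associated:
            problems.append(f"{port}: secondary VLAN {host_secondary} is not associated with {primary}")

    return problems


# Values copied from the SW1 configuration in this post.
SW1_ASSOCIATED = [1000, 2000, 3000]
SW1_PROMISCUOUS = (100, [1000, 2000, 3000])
SW1_HOST_PORTS = {"Fa0/3": (100, 1000), "Fa0/5": (100, 2000)}

if __name__ == "__main__":
    print(check_private_vlan_config(100, SW1_ASSOCIATED, SW1_PROMISCUOUS, SW1_HOST_PORTS))
```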

Show vlan private-vlan will show what has been configured.

SW1#show vlan private-vlan
Primary Secondary Type              Ports
------- --------- ----------------- ------------------------------------------
100     1000      community         Fa0/1, Fa0/3
100     2000      community         Fa0/1, Fa0/5
100     3000      isolated          Fa0/1
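If you want to check this output with a script, something like the following Python sketch could be used to turn it into rows. It assumes the column layout shown above, where every data row starts with the primary and secondary VLAN IDs; the function name is just for illustration, and rows that do not fit that layout are skipped.

```python
def parse_show_vlan_private_vlan(output):
    """Turn 'show vlan private-vlan' text into a list of row dictionaries."""
    rows = []
    for line in output.splitlines():
        fields = line.split(None, 3)
        # Data rows start with two numeric VLAN IDs. This skips the prompt
        # line, the column header and the dashed separator.
        if len(fields) < 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        ports = fields[3].replace(" ", "").split(",") if len(fields) == 4 else []
        rows.append({
            "primary": int(fields[0]),
            "secondary": int(fields[1]),
            "type": fields[2],
            "ports": ports,
        })
    return rows


SAMPLE = """\
SW1#show vlan private-vlan
Primary Secondary Type              Ports
------- --------- ----------------- ------------------------------------------
100     1000      community         Fa0/1, Fa0/3
100     2000      community         Fa0/1, Fa0/5
100     3000      isolated          Fa0/1
"""

if __name__ == "__main__":
    for row in parse_show_vlan_private_vlan(SAMPLE):
        print(row)
```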

We also need configuration for SW2.

vlan 100
 name PRIMARY
  private-vlan primary
  private-vlan association 1000,2000,3000
!
vlan 1000
 name COMMUNITY_1
  private-vlan community
!
vlan 2000
 name COMMUNITY_2
  private-vlan community
!
vlan 3000
 name ISOLATED
  private-vlan isolated
!
interface FastEthernet0/2
 switchport private-vlan host-association 100 1000
 switchport mode private-vlan host
!
interface FastEthernet0/4
 switchport private-vlan host-association 100 2000
 switchport mode private-vlan host
!
interface FastEthernet0/6
 switchport private-vlan host-association 100 3000
 switchport mode private-vlan host

Show interface switchport will show how the port is configured.

SW1#show interfaces f0/1 switchport
Name: Fa0/1
Switchport: Enabled
Administrative Mode: private-vlan promiscuous
Operational Mode: private-vlan promiscuous
Administrative Trunking Encapsulation: negotiate
Operational Trunking Encapsulation: native
Negotiation of Trunking: Off
Access Mode VLAN: 1 (default)
Trunking Native Mode VLAN: 1 (default)
Administrative Native VLAN tagging: enabled
Voice VLAN: none
Administrative private-vlan host-association: none
Administrative private-vlan mapping: 100 (PRIMARY) 1000 (COMMUNITY_1) 2000 (COMMUNITY_2) 3000 (ISOLATED)
Administrative private-vlan trunk native VLAN: none
Administrative private-vlan trunk Native VLAN tagging: enabled
Administrative private-vlan trunk encapsulation: dot1q
Administrative private-vlan trunk normal VLANs: none
Administrative private-vlan trunk associations: none
Administrative private-vlan trunk mappings: none
Operational private-vlan:
  100 (PRIMARY) 1000 (COMMUNITY_1) 2000 (COMMUNITY_2) 3000 (ISOLATED)
Trunking VLANs Enabled: ALL
Pruning VLANs Enabled: 2-1001
Capture Mode Disabled
Capture VLANs Allowed: ALL
Protected: false
Unknown unicast blocked: disabled
Unknown multicast blocked: disabled
Appliance trust: none

Let's try the configuration. We will start at R1, which is on the promiscuous port, and see if it can ping R2-R6.

R1#ping 255.255.255.255 re 1
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 255.255.255.255, timeout is 2 seconds:
Reply to request 0 from 100.0.0.5, 4 ms
Reply to request 0 from 100.0.0.2, 4 ms
Reply to request 0 from 100.0.0.3, 4 ms
Reply to request 0 from 100.0.0.4, 4 ms
Reply to request 0 from 100.0.0.6, 4 ms

As expected we can ping all the devices. R2 should only be able to ping R3 and R1.

R2#ping 255.255.255.255 re 1
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 255.255.255.255, timeout is 2 seconds:
Reply to request 0 from 100.0.0.3, 4 ms
Reply to request 0 from 100.0.0.1, 4 ms

Working as expected. R6 should only be able to ping R1 since it is in an isolated VLAN.

R6#ping 255.255.255.255 re 1
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 255.255.255.255, timeout is 2 seconds:
Reply to request 0 from 100.0.0.1, 4 ms
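The three ping results follow from a simple set of rules, which can be sketched in Python. The role table is partly an assumption: the post only confirms that R1 is on the promiscuous port, that R2 and R3 share a community VLAN and that R6 is isolated. Placing R4 and R5 in community VLAN 2000 is my reading of the port numbering, and the names in the sketch are hypothetical.

```python
# Role and secondary VLAN per router. R4 and R5 are assumed, see above.
ROLES = {
    "R1": ("promiscuous", None),
    "R2": ("community", 1000),
    "R3": ("community", 1000),
    "R4": ("community", 2000),  # assumption
    "R5": ("community", 2000),  # assumption
    "R6": ("isolated", 3000),
}


def can_talk_l2(a, b):
    """True if a and b can exchange frames directly at layer 2."""
    kind_a, vlan_a = ROLES[a]
    kind_b, vlan_b = ROLES[b]
    # A promiscuous port can talk to every port in the private VLAN.
    if "promiscuous" in (kind_a, kind_b):
        return True
    # Community ports can talk to ports in the same community VLAN.
    if kind_a == kind_b == "community":
        return vlan_a == vlan_b
    # Isolated ports can only talk to promiscuous ports.
    return False


def expected_replies(source):
    """Routers expected to answer a broadcast ping sent by source."""
    return sorted(peer for peer in ROLES if peer != source and can_talk_l2(source, peer))


if __name__ == "__main__":
    for router in ("R1", "R2", "R6"):
        print(router, "->", expected_replies(router))
```

Under these assumptions the sketch gives the same answers as the pings above: R1 hears from everyone, R2 only from R1 and R3, and R6 only from R1.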

The configuration is working. What if we want to create an SVI on one of the switches? This is the configuration.

SW1(config)#int vlan 100
SW1(config-if)#ip add 100.0.0.7 255.255.255.0
SW1(config-if)#no sh

Let's try to ping.

SW1#ping 100.0.0.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 100.0.0.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/8 ms
SW1#ping 100.0.0.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 100.0.0.2, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

Why can’t we ping R2? We have no mapping to the secondary VLAN!

SW1(config)#int vlan 100
SW1(config-if)#private-vlan mapping 1000
SW1(config-if)#^Z
SW1#
*Mar  1 01:08:47.983: %PV-6-PV_MSG: Created a private vlan mapping, Primary 100, Secondary 1000
SW1#
*Mar  1 01:08:49.267: %SYS-5-CONFIG_I: Configured from console by console
SW1#ping 100.0.0.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 100.0.0.2, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
SW1#sh run int vlan 100
Building configuration...
Current configuration : 88 bytes
!
interface Vlan100
 ip address 100.0.0.7 255.255.255.0
 private-vlan mapping 1000
end

Still no success, why?

SW1#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
SW1(config)#ip routing
SW1(config)#^Z
SW1#
*Mar  1 01:14:26.858: %SYS-5-CONFIG_I: Configured from console by console
SW1#ping 100.0.0.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 100.0.0.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/8 ms

IP routing was needed! If you need to find the documentation at Cisco, here is how you find it:

Support -> Configure -> Products -> Switches -> LAN Switches -> Access -> Cisco Catalyst 3560 Series Switches -> Configuration Guides -> Catalyst 3560 Software Configuration Guide, Release 12.2(52)SE -> Configuring Private VLANs
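To sum up the SVI troubleshooting above, here is a tiny Python sketch of the two conditions that had to be met before the switch could ping a host in a secondary VLAN. It only encodes what was observed in this lab. The case with ip routing enabled but no mapping was not tested here, so treating it as a failure is an assumption, and the function name is made up.

```python
def svi_can_ping(target_secondary, svi_mappings, ip_routing_enabled):
    """Predict whether the SVI in the primary VLAN can ping a target.

    target_secondary   -- secondary VLAN of the target, or None if the
                          target sits on the promiscuous port
    svi_mappings       -- secondary VLANs listed under 'private-vlan mapping'
    ip_routing_enabled -- True if 'ip routing' is configured
    """
    if target_secondary is None:
        # R1 on the promiscuous port answered without any extra configuration.
        return True
    # Hosts in a secondary VLAN needed both the mapping and ip routing.
    return ip_routing_enabled and target_secondary in svi_mappings


if __name__ == "__main__":
    print(svi_can_ping(None, [], False))       # ping to R1
    print(svi_can_ping(1000, [], False))       # ping to R2, no mapping
    print(svi_can_ping(1000, [1000], False))   # mapping added, no ip routing
    print(svi_can_ping(1000, [1000], True))    # mapping and ip routing
```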

Categories: CCIE, Switching