Archive

Archive for the ‘Network Design’ Category

Network Design Webinar With Yours Truly at CLN

February 12, 2015 3 comments

I’m hosting a network design webinar at the Cisco Learning Network on Feb 19th, 20.00 UTC+1.

As you may know, I am studying for the CCDE, so I'm focusing on design right now. My other reason for hosting this is to remind people that with all the buzzwords around SDN and NFV going around, the networking fundamentals still hold true. TCP/IP is as important as ever, and a properly designed network is a must if you want to run overlays on top of it. If you build a house and do a sloppy job with the foundation, what will happen? The same holds true in networking.

I will introduce the concepts of network design. What does a network designer do? What tools are used? What is CAPEX? What is OPEX? What certifications are available? What is important in network design? We will also look at a couple of design scenarios and reason about the impact of our choices. There is always a tradeoff!

If you are interested in network design or just want to tune in to yours truly, follow this link to CLN.

I hope to see you there!


Routing Considerations in DDoS Protection Environments

July 7, 2014 2 comments

Lately I have done some studying for the CCDE and one of the things I was
looking at is how to protect against DDoS attacks. I’m not expecting it
to be a big topic for the CCDE but it has some interesting concepts relating
to routing. Take a look at the following topology:

Main

There is an attacker at the top left. R1 is the edge device and then there are a
few more routers, all peering BGP with the RR, which is R5. The server of interest
is 100.100.100.100 and there is a scrubbing device to the far right. All routers
peer iBGP from their loopbacks to the RR, including the scrubbing device.

Normally traffic to 100.100.100.100 would flow through R1 to R4 and then to the
server.

Normal_flow

The attacker now starts to flood the server with malicious traffic. This is detected
by the DDoS scrubbing device which starts to announce via BGP a more specific route
than the one advertised by R4. R4 normally advertises 100.100.100.0/24 but the
scrubbing device advertises 100.100.100.100/32. All the other routers will start
to forward traffic to 100.100.100.100 towards the scrubbing device. The traffic flow
is then like this:

Bad_flow1

The scrubbing device does its job, sends the traffic towards R3 and BAM!
Traffic is looped… Why? R3 has a route to 100.100.100.100/32 pointing at 6.6.6.6,
the loopback of the scrubbing device. When the scrubbing device sends the traffic
to R3, it gets looped straight back.

Loop

So this is what I find interesting. What can we do to bypass normal forwarding rules
and to keep the traffic from looping?

One method is to apply policy based routing. Put a route-map on the interface that
is facing the scrubbing device with an ACL for traffic that needs special treatment.

PBR

Create an extended ACL called “PBR_SPECIAL_TRAFFIC” that matches traffic destined
for 100.100.100.100, and set a next-hop of 2.2.2.2 in the route-map. The problem is
that this would not work in this particular topology, because R2 has a route back
towards R3. For this to work, R2 would have to be excluded from the forwarding path
and the server would instead be connected to R2. R2 would then need a static route
to override the BGP /32 route.
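As a sketch, the PBR configuration on the router facing the scrubbing device could look like this in IOS. The ACL name and next-hop are from the example above; the route-map name and interface are my own assumptions:

```
! Match traffic destined for the attacked server
ip access-list extended PBR_SPECIAL_TRAFFIC
 permit ip any host 100.100.100.100
!
! Override normal forwarding for matched traffic
route-map PBR_SPECIAL permit 10
 match ip address PBR_SPECIAL_TRAFFIC
 set ip next-hop 2.2.2.2
!
! Apply on the interface facing the scrubbing device
interface GigabitEthernet0/1
 ip policy route-map PBR_SPECIAL
```

Any traffic not matching the ACL falls through to normal destination-based forwarding.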

PBR2

This solution is a bit of a kludge and traffic can easily get looped if you are
not careful. Also, PBR always has the risk of becoming CPU forwarded if you configure
unsupported actions in the route-map.

Another solution is to create a GRE tunnel between the scrubbing device and the router
where the server resides.

GRE

The traffic would then be tunneled from the scrubbing device to the attaching router.
Because R4 has a /32 route in BGP, a static route should be added to overcome this.
Make sure that GRE is forwarded in hardware and account for the increased packet
size, which could lead to fragmentation if not careful. Make sure that the MTU
supports GRE tunneling and a payload of at least 1460 bytes.
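A minimal sketch of the tunnel, assuming loopbacks as the tunnel endpoints (R4's loopback taken to be 4.4.4.4), made-up addressing and a 1500-byte transport MTU; lowering the IP MTU to 1476 leaves room for the 24 bytes of GRE/IP overhead:

```
! On the scrubbing device
interface Tunnel0
 ip address 192.0.2.1 255.255.255.252
 tunnel source Loopback0
 tunnel destination 4.4.4.4
 ip mtu 1476
 ip tcp adjust-mss 1436
!
! On R4: prefer the directly attached server over the
! scrubber's BGP /32 (server-facing interface assumed)
ip route 100.100.100.100 255.255.255.255 GigabitEthernet0/2
```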

There is also the possibility of reinjecting the traffic into another VRF.
The VRF can have other routing than the global table. Either VRF Lite or
MPLS would be needed to support this solution.

VRF

The final solution seems like the best to me but it requires that you have
an infrastructure supporting the use of VRFs.
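A VRF-Lite sketch of the reinjection point; the VRF name and all addressing here are made up for illustration:

```
vrf definition CLEAN
 address-family ipv4
 exit-address-family
!
! The interface receiving scrubbed traffic is placed in the VRF
interface GigabitEthernet0/2
 vrf forwarding CLEAN
 ip address 198.51.100.1 255.255.255.252
!
! The VRF routes towards the server independently of the
! global table, so the BGP /32 cannot pull the traffic back
ip route vrf CLEAN 100.100.100.0 255.255.255.0 198.51.100.2
```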

This post gives you a brief overview of how a DDoS protection appliance works
and how we sometimes must overcome the normal forwarding rules of the network.

Why fast IGP timers aren’t always beneficial

March 31, 2014 4 comments

Introduction

When tuning your IGP of choice, the first thing people look at is usually the hello
and dead interval. This logic is flawed; it is true that tuning timers can help in
certain cases, but convergence consists of much more than just hello timers.

Why tune timers?

Detecting that the other side of the link is down is an important part of converging.
That's why your design should avoid putting any bump in the wire, such as media
converters or an L2 cloud, between the L3 endpoints. If you avoid such devices, then
when one end of the link goes down the other end goes down as well, which provides
fast detection of the failure.

In rare cases the link can be up while traffic is not passing over it. For such
cases, or where there was no way to avoid a converter or L2 cloud, tuning the hello
timers can help with failure detection. The better answer is almost always BFD,
though, if the platform supports it.
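For illustration, OSPF fast hellos and BFD on IOS might look like this; the interface and the timer values are examples, not recommendations:

```
interface GigabitEthernet0/0
 ! Sub-second detection: 1 s dead interval, four hellos within it
 ip ospf dead-interval minimal hello-multiplier 4
 ! BFD is usually the better option where supported
 bfd interval 100 min_rx 100 multiplier 3
!
router ospf 1
 bfd all-interfaces
```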

Topologies where tuning timers is bad

When using a topology where VSS is involved such as Catalyst 6500 or Catalyst 4500,
tuning the timers is very bad. A common topology might look like this:

VSS1

The L3 switches are dually connected to the VSS. These L3 switches might be in the
distribution layer and the VSS is part of the core. The distribution switches run
LACP towards the VSS which acts as one device from an outside perspective.

The VSS runs Stateful Switchover (SSO), which syncs configuration, boots the standby
supervisor with the software and keeps the line cards ready to go in case the
primary chassis fails. Hardware forwarding tables are also synchronized. An SSO
switchover takes somewhere up to 10 seconds.

SSO

The active VSS chassis runs the control plane. Routing protocols such as OSPF are not
HA aware, meaning that the state of the routing protocols is not synchronized between
the chassis.

When using fast timers and a switchover occurs, OSPF detects that the neighbor is not
replying and tears down the adjacency. The secondary chassis then has to bring the
adjacency back up by sending out hello packets, exchanging LSAs and updating the
RIB/FIB. Including the switchover itself, this may take as long as 20 seconds.

VSS_failure

Non Stop Forwarding (NSF)

NSF, combined with graceful restart, is a technology used to keep forwarding packets
after a switchover has occurred. The goal of NSF is to delay the failure detection,
which may sound strange from a convergence perspective. Remember, though, that the
VSS acts as one device.

With NSF the forwarding is done according to the last known FIB entries. After a
switchover the secondary VSS will use graceful restart to inform its neighbors that
it has restarted and needs to synchronize its LSDB. This is done by sending hello packets
with a special bit set and the synchronization is done Out Of Band (OOB) to not tear
down the existing adjacency. The neighbors exchange LSAs and run SPF as normal. The
RIB and FIB can then be updated and normal forwarding ensues.

This process depends on the neighbors also being NSF aware; otherwise they would
tear down the adjacency while the secondary VSS is restarting its routing processes.
So the key here is that the adjacency must stay up, and that's why timers should be
left at default if running VSS. This goes for both the VSS and any routers that are
neighbors to the VSS.
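On the VSS side, enabling NSF for OSPF is a one-liner; the neighbors just need an IOS version that is NSF-aware. Exact syntax varies by platform and protocol, so treat this as a sketch:

```
router ospf 1
 ! Cisco-style NSF; use "nsf ietf" for IETF graceful restart instead
 nsf cisco
```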

Conclusion

When using VSS, always leave IGP timers at the default. Fast timers ruin the NSF
process and will lead to much higher convergence times than leaving them at the
default.

Network Campus Design

October 18, 2013 13 comments

Introduction

Modern networks need to be enabled for voice and video. These applications
do not tolerate much loss before quality becomes unacceptable. This
requires us to build networks that are scalable, resilient and converge
quickly. This post will describe the key points of building a network that
fulfills those requirements.

Hierarchical Network Design

It’s important to think of the network in terms of building blocks and
hierarchy. Define different building blocks, like campus, small remote site,
medium remote site and large remote site, so that not every new network
needs a unique design. Building a hierarchical network will:

  • Make it easier to understand, grow and troubleshoot
  • Create small fault domains with clear separation of layer 2 and layer 3
  • Allow for load sharing and redundancy
  • Provide deterministic traffic patterns and convergence

The key point is to not end up with a network looking like this:

Bad_design

Like anyone working in the real world, I know that budget constraints, lack of
available fibre or many other factors can limit our network designs. We can't
always win, but we should make it clear to the company/management that we can't
support voice and video unless we design the network as it should be. Do you want
to be on call for a network that is poorly designed, where maybe you are the only
one who knows how it works? In Optimal Routing Design, Russ White talks about
the 2 AM test: if someone calls you up at 2 AM, do you know how your network works?
If you don't, it's a sign that the network is too complex.

The different building blocks of a hierarchical network

Traditionally networks have been built in a three tier model. This model
consists of access, distribution and core. In smaller networks it may be
acceptable to have a layer that acts as both distribution and core and
this is called the collapsed core model.

Access Layer

The role of the access layer is to:

  • Provide connectivity into the network
  • Enforce security to prevent ARP/IP spoofing
  • Act as the trust boundary for the QoS model
  • Provide PoE for phones and access points

Distribution layer

The role of the distribution layer is to:

  • Aggregate wiring closets (access layer) and uplinks to the core
  • Provide high availability, load sharing and QoS
  • Protect the core from high-density peering and problems in the access layer
  • Summarize routes towards the core and provide fast convergence
  • Provide first hop redundancy towards the access layer

Core Layer

The role of the core layer is to:

  • Provide connectivity between all the building blocks
  • Provide high performance and high availability
  • Aggregate the distribution layer
  • A separate core layer helps with scalability

When do I need a core layer?

There is no text book answer to this question but consider the following
topology:

2blocks

There are currently two building blocks. Every distribution layer switch has
three IGP neighbors and three links. What happens if we add another building
block?

3blocks

Each distribution switch went from three IGP peers to five. The total number
of links also went from six to fifteen. You can see that this gets out of hand
pretty quickly.

Different Campus Designs

There are a few common designs that can be used to build the campus access and
distribution layer. Which design fits best depends on whether VLANs need to
span the topology and how modern the equipment in your network is.

Layer 3 Distribution

Layer 3 distribution

This design has no VLANs spanning the switches. This is what we want, but usually
there is some requirement that keeps us from doing it. It could be some
application, vMotion or maybe one common wireless network across the entire
campus. There is no layer 2 loop in this design, which means we are not relying
on STP for convergence. Here are some points to consider for this design:

  • Tune CEF to avoid polarization leading to underusing links
  • Summarize routes towards the core
  • Don’t peer IGP across links unless you intend to use them
  • Set the trunk mode to on/nonegotiate
  • Ports toward users hardcoded to access and enable portfast
  • Configure root guard or BPDU guard towards users
  • Enable security features such as DHCP snooping and DAI
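A hedged example of what several of these points translate to on an IOS access switch; the VLAN and interface numbers are made up:

```
! Security features for the user VLAN
ip dhcp snooping
ip dhcp snooping vlan 10
ip arp inspection vlan 10
!
! User-facing port: hardcoded access, portfast, BPDU guard
interface GigabitEthernet1/0/1
 switchport mode access
 switchport access vlan 10
 spanning-tree portfast
 spanning-tree bpduguard enable
!
! Uplink trunk: mode on, no DTP negotiation
interface TenGigabitEthernet1/0/1
 switchport mode trunk
 switchport nonegotiate
```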

Layer 2 Distribution

Layer 2 distribution

It’s quite common that some VLANs need to span the campus. This means that an L2
link must exist between the distribution switches. There is now a loop in the
topology, so convergence is dependent on spanning tree. Some points to consider
for this design:

  • Tune CEF to avoid polarization leading to underusing links
  • Summarize routes towards the core
  • Don’t peer IGP across links unless you intend to use them
  • Set the trunk mode to on/nonegotiate
  • Ports toward users hardcoded to access and enable portfast
  • Configure root guard or BPDU guard towards users
  • Enable security features such as DHCP snooping and DAI
  • Align STP Root and HSRP primary on the same distribution switch
  • Put Root Guard on downlinks (facing access switches)
  • Put Loop Guard on uplinks (facing distribution switches)
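Aligning the STP root and the HSRP primary could look like this on the primary distribution switch; the VLAN, priorities and addressing are assumptions:

```
! Make this switch the STP root for VLAN 10
spanning-tree vlan 10 priority 0
!
! ...and the HSRP active router for the same VLAN
interface Vlan10
 ip address 10.0.10.2 255.255.255.0
 standby 10 ip 10.0.10.1
 standby 10 priority 110
 standby 10 preempt
```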

Routed Access

Routed access

The routed access design has no layer 2 links. It's all routing, which means
convergence is fast, no links are blocking and equal cost routing can be used.
The drawback is that no VLANs can span the topology. If MPLS is enabled in the core,
some VLANs could still span through the use of EoMPLS, VPLS etc. Key
points to consider for this design:

  • How much more will routed access cost me?
  • Do I need the performance/convergence gain?
  • Do I have the need to span any VLANs?
  • How many routes do my access layer devices support?
  • Summarize routes towards the core
  • Summarize routes towards the access
  • Tune CEF to avoid polarization
  • Don’t peer IGP across links unless you intend to use them
  • Ports toward users hardcoded to access and enable portfast
  • Configure root guard or BPDU guard towards users
  • Enable security features such as DHCP snooping and DAI

Layer 2 Distribution with MLAG

Layer 2 MLAG

Newer designs can utilize features like stacking, VSS and vPC. This means that
VLANs can span access switches, but there is no logical loop because MLAG is used.
This gives us the advantage of a layer 2 distribution without the disadvantage of
relying on spanning tree for convergence. There is no need to run HSRP because the
distribution layer is acting as one device. The key points are similar to the layer 2
distribution:

  • Tune CEF to avoid polarization leading to underusing links
  • Summarize routes towards the core
  • Set the trunk mode to on/nonegotiate
  • Ports toward users hardcoded to access and enable portfast
  • Configure root guard or BPDU guard towards users
  • Enable security features such as DHCP snooping and DAI
  • Put Root Guard on downlinks (facing access switches)
  • Put Loop Guard on uplinks (facing distribution switches)

If doing a new design, I would definitely go with some form of stacking, VSS or,
if deploying Nexus switches, vPC. This gives us the flexibility of using layer 2 in
the distribution while not needing to rely on STP and FHRP for convergence.

Recommendations for Fast Convergence

  • Use only point-to-point interconnections
  • Use fiber between all devices for fast convergence (debounce timer)
  • Tune the carrier-delay timer
  • When possible, put the IP configuration on the physical interface instead of an SVI
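Tuning carrier delay is a single interface command; zero is a common value on point-to-point fiber links (the interface name here is an example):

```
interface TenGigabitEthernet1/1
 ! Report link-down to the control plane immediately
 carrier-delay msec 0
```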

I did a separate post on Detecting Network Failure which goes into more detail
on detecting failure.

Why should physical interfaces be used over SVI? The following steps take place when
converging on a physical interface:

  1. Link Down
  2. Interface Down
  3. Routing Update

When using an SVI there are some additional steps however:

  1. Link Down
  2. Interface Down
  3. Autostate
  4. SVI Down
  5. Routing Update

When using an SVI, the switch must check whether any other ports are up with that
VLAN configured when the link goes down. If there are, the SVI won't be brought
down. Even if there aren't, it takes time to go through all the interfaces before
declaring the SVI down. This can worsen convergence by a good 200 ms. If you do use
an SVI, make sure that it is point to point, so that the VLAN is not allowed on any
links other than the link connecting the two switches.
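If an SVI must be used, restricting its VLAN to the single interconnect keeps the autostate check trivial. A sketch with made-up VLAN and addressing:

```
! Allow the point-to-point VLAN only on the interconnect
interface TenGigabitEthernet1/1
 switchport mode trunk
 switchport trunk allowed vlan 900
!
interface Vlan900
 ip address 10.0.255.1 255.255.255.252
```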

Recommendations for Spanning Tree

  • Don’t span VLANs across switches unless necessary
  • Use RSTP or MST for best convergence
  • Even if you have no loops, STP is needed to protect against user side loops
  • STP can protect against misconfiguration or hardware failures creating loops

Layer 2 Hardening

Cisco recommends the following features for hardening layer 2 in a campus design:

Layer 2 hardening

I agree with most of this but there are some caveats.

One issue is with Root Guard. Why do we run Root Guard? To protect against another
switch dictating the bridging topology. I would set the root to a priority of 0 and the
secondary root to a priority of 4096. That should provide protection, and if you don't
trust your employees not to mess up the network, that is an education or
management problem. Use TACACS+ command authorization to restrict user accounts
from potentially dangerous commands such as switchport trunk allowed vlan, no router bgp,
no router ospf and so on.
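Setting the root and backup root explicitly takes one command per switch; the VLAN range is an example:

```
! On the primary distribution switch
spanning-tree vlan 1-999 priority 0
!
! On the secondary distribution switch
spanning-tree vlan 1-999 priority 4096
```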

So what is the issue with Root Guard? The STP root will also be the HSRP primary device.
Because the network is designed with Equal Cost Multipath (ECMP), some traffic may
arrive at the standby HSRP router. This is the network without blocking links:

Root Guard step 1

No issues so far, except that the crosslink is being used, which is not a major
deal. But what happens if the link between the HSRP primary and standby fails?

Root Guard step 2

The access switches are sending superior BPDUs but the secondary distribution switch will
block the link due to Root Guard being implemented. This means that any traffic arriving
at the secondary distribution switch destined for the access layer switches will be
black holed. That is why I would not implement Root Guard towards the access layer.

Recommendations for Layer 3

Here are some recommendations for Layer 3:

  • Build triangles, not squares, for deterministic convergence
  • Use passive-interface default and only peer on links intended for transit
  • Design the network with dual layer 3 paths for resiliency
  • Summarize from the distribution to the core to cut down on flooding and EIGRP Active queries
  • Tune CEF to avoid polarization of links
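The passive-interface recommendation, sketched for OSPF (the transit interface name is assumed):

```
router ospf 1
 ! No adjacencies form unless explicitly enabled
 passive-interface default
 no passive-interface TenGigabitEthernet1/1
```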

What happens if we design with squares?

Routing Square

If a device goes down the network has to rely on flooding of updates or LSAs before it
can converge. There is no secondary path that can be immediately installed.

But if the design is a triangle instead:

Routing Triangle

There are already dual paths so losing one won’t affect convergence and the other route
is already in the FIB so traffic can keep flowing.

Conclusion

There are many network designs out there. Learn the strengths and weaknesses of
different designs. Look at best practice designs from the vendors but don’t follow
them blindly. As I have shown sometimes recommendations will not work for all
scenarios.

Finding the right design depends on business needs, budget and what kind of
applications are running. Read more on campus design in BRKCRS-2031 and also look
at the Cisco Validated Designs.