
Unique RD per PE in MPLS VPN for Load Sharing and Faster Convergence

January 11, 2015

This post describes how using a unique RD per VRF per PE makes load sharing and faster convergence possible in MPLS VPNs. It assumes you are already familiar with MPLS, but here is a quick recap.

The Route Distinguisher (RD) is used in MPLS VPNs to make routes unique. An IPv4 address is 32 bits long, and several customers may, and probably will, use the same networks. If CustomerA uses 10.0.0.0/24 and CustomerX also uses 10.0.0.0/24, we must somehow make each route unique before transporting it over MPBGP. The RD does exactly this: it prepends a 64-bit value to the 32-bit IPv4 address, creating a 96-bit VPNv4 prefix. That is all the RD does; it has nothing to do with VPN membership itself. It is common to build the RD as AS_number:VPN_identifier so that a VPN has the same RD on all PEs where it exists.
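To make the "64 bits plus 32 bits" arithmetic concrete, here is a minimal Python sketch that encodes an RD the way RFC 4364 lays it out (type 0 is ASN:number, type 1 is IPv4:number) and prepends it to an IPv4 network address. The function names are my own; 4-byte ASNs (RD type 2) are left out to keep it short.

```python
import ipaddress
import struct


def encode_rd(rd: str) -> bytes:
    """Encode an RD string as its 8-byte wire format (RFC 4364).

    Type 0: 2-byte ASN + 4-byte assigned number, e.g. "64512:1"
    Type 1: IPv4 address + 2-byte assigned number, e.g. "11.11.11.11:1"
    """
    admin, assigned = rd.rsplit(":", 1)
    if "." in admin:
        ip = ipaddress.IPv4Address(admin)
        return struct.pack("!H4sH", 1, ip.packed, int(assigned))
    return struct.pack("!HHI", 0, int(admin), int(assigned))


def vpnv4_prefix(rd: str, prefix: str) -> tuple[bytes, int]:
    """Return the 12-byte (96-bit) RD + IPv4 address and the VPNv4 prefix length."""
    net = ipaddress.IPv4Network(prefix)
    return encode_rd(rd) + net.network_address.packed, 64 + net.prefixlen


# The same customer network under two different RDs gives two distinct routes.
cust_a = vpnv4_prefix("64512:1", "10.0.0.0/24")
cust_x = vpnv4_prefix("64512:2", "10.0.0.0/24")
print(cust_a[0].hex(), cust_a[1])
print(cust_a != cust_x)
```

The VPNv4 prefix length grows by 64 because the whole RD is always part of the match; a /24 becomes an /88.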

The Route Target (RT) is what defines the VPN: it controls which routes are imported into each VRF and therefore the topology of the VPN. RTs are extended communities attached to the BGP UPDATE and transported over MPBGP.

MPLS VPNs use two labels. The transport label, which carries the packet through the network to the egress PE, is generated by LDP. The VPN label, which makes sure the packet reaches the right VRF on the egress PE, is generated by MPBGP and can be allocated per prefix or per VRF.

Below is a configuration snippet that creates a VRF using the newer vrf definition syntax.

PE1#sh run vrf
Building configuration...

Current configuration : 401 bytes
vrf definition CUST1
 rd 11.11.11.11:1
 !
 address-family ipv4
  route-target export 64512:1
  route-target import 64512:1
 exit-address-family
!
!
interface GigabitEthernet1
 vrf forwarding CUST1
 ip address 111.0.0.0 255.255.255.254
 negotiation auto
!
router bgp 64512
 !
 address-family ipv4 vrf CUST1
  neighbor 111.0.0.1 remote-as 65000
  neighbor 111.0.0.1 activate
 exit-address-family
!         
end

The RD and RT values are defined under the VRF. The topology we will be using is shown below.

[Figure: MPLS1, lab topology]

This topology uses a Route Reflector (RR), as most decently sized networks do, to overcome the scalability limitations of a BGP full mesh. The downside of an RR is that the PEs receive fewer routes, because the RR only reflects its best path for each prefix. This means that load sharing may not take place and that convergence takes longer when a link between a PE and a CE goes down.

The diagram shows PE1 and PE2 advertising the same network, 10.0.10.0/24, to the RR. The RR picks one as best and reflects only that path to PE3 (and the other PEs). The path through PE2 will therefore never be used until something happens to PE1. This assumes that both PEs use the same RD.

[Figure: MPLS BGP1]

[Figure: MPLS BGP2]

When PE1 loses the prefix, it sends a BGP WITHDRAW to the RR. The RR then sends a WITHDRAW to PE3, followed by an UPDATE containing the prefix via PE2. The path via PE2 is not used until this happens, so there is no load sharing, and all traffic destined for 10.0.10.0/24 has to wait for BGP to converge.

If every PE uses a unique RD for the VRF, the two advertisements become two different VPNv4 routes and the RR can reflect both. The RD is then usually written in the form PE_loopback:VPN_identifier. This also helps with troubleshooting, because the RD shows which PE the prefix originated from.
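A minimal Python sketch of why the RD decides what the RR reflects, assuming a classic RR that keeps exactly one best path per VPNv4 route (RD plus prefix). The real BGP decision process has many more steps; here it is reduced to a hypothetical IGP metric with the next-hop address as tie-breaker. The shared RD 64512:1 and the metrics are made up for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    rd: str          # route distinguisher set by the originating PE
    prefix: str      # customer IPv4 prefix
    next_hop: str    # loopback of the originating PE
    igp_metric: int  # hypothetical RR cost to next_hop, used as the tie-breaker


def _rank(p: Path):
    # Lower IGP metric wins, then lower next-hop address (simplified best path).
    return (p.igp_metric, ipaddress.IPv4Address(p.next_hop))


def reflect_best(paths):
    """Keep one best path per (RD, prefix), which is all a classic RR reflects."""
    best = {}
    for p in paths:
        key = (p.rd, p.prefix)
        if key not in best or _rank(p) < _rank(best[key]):
            best[key] = p
    return list(best.values())


SAME_RD = [
    Path("64512:1", "10.0.10.0/24", "11.11.11.11", 10),
    Path("64512:1", "10.0.10.0/24", "22.22.22.22", 20),
]
UNIQUE_RD = [
    Path("11.11.11.11:1", "10.0.10.0/24", "11.11.11.11", 10),
    Path("22.22.22.22:1", "10.0.10.0/24", "22.22.22.22", 20),
]

print(len(reflect_best(SAME_RD)))    # 1: PE3 never learns the path via PE2
print(len(reflect_best(UNIQUE_RD)))  # 2: both paths are reflected
```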

[Figure: MPLS BGP3]

PE3 now has two routes to 10.0.10.0/24 in its routing table.

PE3#sh ip route vrf CUST1 10.0.10.0 255.255.255.0

Routing Table: CUST1
Routing entry for 10.0.10.0/24
  Known via "bgp 64512", distance 200, metric 0
  Tag 65000, type internal
  Last update from 11.11.11.11 01:10:52 ago
  Routing Descriptor Blocks:
  * 22.22.22.22 (default), from 111.111.111.111, 01:10:52 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 65000
      MPLS label: 17
      MPLS Flags: MPLS Required
    11.11.11.11 (default), from 111.111.111.111, 01:10:52 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 65000
      MPLS label: 28
      MPLS Flags: MPLS Required

PE3 is now load sharing, meaning that some traffic will take the path via PE1 and some via PE2.

[Figure: MPLS BGP4]

We have achieved load sharing, which also means that if something happens to PE1 or PE2, not all traffic will be affected. To see which path PE3 uses for a given flow, we can use the show ip cef exact-route command.

PE3#sh ip cef vrf CUST1 exact-route 10.0.0.10 10.0.10.1
10.0.0.10 -> 10.0.10.1 => label 17 label 16TAG adj out of GigabitEthernet1, addr 23.23.23.0
PE3#sh ip cef vrf CUST1 exact-route 10.0.0.5 10.0.10.1 
10.0.0.5 -> 10.0.10.1 => label 28 label 17TAG adj out of GigabitEthernet1, addr 23.23.23.0
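The idea behind per-destination load sharing can be sketched in Python: hash the source/destination pair and use the result to pick one of the equal paths, so a given flow always takes the same path while different flows spread across both PEs. SHA-256 is only a stand-in here; CEF uses its own hash, so this sketch will not reproduce the exact choices PE3 made above.

```python
import hashlib
import ipaddress


def pick_path(src: str, dst: str, paths: list[str]) -> str:
    """Choose a path per flow by hashing the source/destination pair.

    The hash function is an illustrative assumption, not the CEF algorithm.
    """
    key = ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return paths[bucket % len(paths)]


PATHS = ["via PE1 (11.11.11.11)", "via PE2 (22.22.22.22)"]

# The same flow always hashes to the same path...
print(pick_path("10.0.0.10", "10.0.10.1", PATHS))
# ...while many sources towards one destination end up using both paths.
print({pick_path(f"10.0.0.{i}", "10.0.10.1", PATHS) for i in range(1, 255)})
```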

What is the drawback? It consumes more memory, because the prefixes are now unique; in this two-PE example that roughly doubles the memory required to store BGP paths. PE3 has to store a copy of the prefix for each originating RD before it can import the paths into its VRF, as the output below shows.

PE3#sh bgp vpnv4 uni all
BGP table version is 46, local router ID is 33.33.33.33
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 11.11.11.11:1
 *>i 10.0.10.0/24     11.11.11.11              0    100      0 65000 i
Route Distinguisher: 22.22.22.22:1
 *>i 10.0.10.0/24     22.22.22.22              0    100      0 65000 i
Route Distinguisher: 33.33.33.33:1 (default for vrf CUST1)
 *>  10.0.0.0/24      32.32.32.1               0             0 65001 i
 *mi 10.0.10.0/24     22.22.22.22              0    100      0 65000 i
 *>i                  11.11.11.11              0    100      0 65000 i

For multipathing to take place, PE3 must allow more than one BGP route to be installed. This is done with the maximum-paths eibgp command:

router bgp 64512
 address-family ipv4 vrf CUST1
  maximum-paths eibgp 2

Newer releases offer other features to overcome the limitation of reflecting only one route, such as BGP Add Path. This post showed the benefits of a unique RD per VRF per PE for load sharing and faster convergence. It also showed that doing so uses more memory, because multiple copies of essentially the same route must be stored. Because multiple paths are installed in the FIB, FIB capacity should also be a consideration, depending on how large the FIB is on your platform.

Categories: BGP, MPLS

A Quick Look at MPLS-TE

August 24, 2014

Introduction

I’m currently designing and implementing a large network that will run MPLS.
It will replace an old network that was mainly L2-based and did not run MPLS,
only VRF-lite. A few customers need diverse paths through the network and
quick convergence when a failure occurs. This led me to consider MPLS-TE for
those customers, and plain MPLS through LDP for the other customers buying
VPNs. So what is MPLS-TE used for?

Weaknesses of IGP

With normal IP forwarding, a least-cost path is calculated by an IGP such as
OSPF or ISIS. The problem is that only the least-cost path is used (or the
equal-cost paths, when ECMP applies); any links not on the best path sit idle,
which wastes bandwidth. IGP metrics can be manipulated, but that only moves the
problem to other links; it does not solve the root cause. Manipulating metrics
is cumbersome and error-prone, and it is difficult to account for all the
traffic flows in the network and get every metric right. IGP metrics also lack
the granularity needed to utilize all the bandwidth in the network.
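A small Python sketch of the problem, using plain Dijkstra on the core of the lab topology shown later in this post. I am assuming every link has the IS-IS default metric of 10 and that the upper path runs IOS2, IOS5, XR1 (inferred from the 25.25.25.x and 57.57.57.x addressing) while the lower path runs IOS2, IOS3, IOS4, XR1. All traffic from IOS2 to XR1 follows the single cheapest path, and the lower links carry nothing.

```python
import heapq


def spf(graph, source):
    """Plain Dijkstra: return cost and predecessor maps from source."""
    dist, prev = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            if d + cost < dist.get(nbr, float("inf")):
                dist[nbr] = d + cost
                prev[nbr] = node
                heapq.heappush(heap, (d + cost, nbr))
    return dist, prev


def path_to(prev, source, dest):
    hops = [dest]
    while hops[-1] != source:
        hops.append(prev[hops[-1]])
    return hops[::-1]


# Assumed metrics: 10 on every core link (IS-IS default).
TOPOLOGY = {
    "IOS2": {"IOS5": 10, "IOS3": 10},
    "IOS5": {"IOS2": 10, "XR1": 10},
    "IOS3": {"IOS2": 10, "IOS4": 10},
    "IOS4": {"IOS3": 10, "XR1": 10},
    "XR1": {"IOS5": 10, "IOS4": 10},
}

dist, prev = spf(TOPOLOGY, "IOS2")
print(path_to(prev, "IOS2", "XR1"), dist["XR1"])  # ['IOS2', 'IOS5', 'XR1'] 20
```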

RSVP-TE

RSVP was originally a quality of service protocol for the IntServ model. It
never gained much traction because of the scalability issues of keeping state
in the core of the network. RSVP was later extended to support MPLS, and that
protocol is known as RSVP-TE. RSVP-TE provides support for:

  • Explicit path configuration
  • Path numbering
  • Route recording

RSVP-TE assigns the labels for the LSPs. The headend of the tunnel sends PATH
messages towards the tailend, and the tailend answers with RESV messages that
travel hop by hop back towards the headend, carrying the label each upstream
router should use for the LSP.
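A toy Python model of that exchange, assuming the lower path of the lab (IOS2, IOS3, IOS4, XR1). PATH state lets each router remember its previous hop; the RESV then retraces those hops, and each router advertises upstream the label it expects to receive. The tailend advertises implicit null (label 3) so the penultimate hop pops the tunnel label. Label values come from a hypothetical counter starting at 16; real routers allocate from their own label space, so they will not match the lab output.

```python
from itertools import count

IMPLICIT_NULL = 3  # reserved label value: request penultimate hop popping


def signal_lsp(path, first_label=16):
    """Simulate RSVP-TE label distribution along path (headend first).

    Returns {router: (incoming_label, outgoing_label)}.
    """
    # PATH (headend -> tailend): every hop records its previous hop (PHOP).
    phop = {path[i]: path[i - 1] for i in range(1, len(path))}

    # RESV (tailend -> headend): retrace the PHOPs, allocating labels upstream.
    allocator = count(first_label)
    labels = {}
    router, downstream_label = path[-1], None
    while True:
        if router == path[-1]:
            in_label = IMPLICIT_NULL  # tailend: no tunnel label wanted
        elif router == path[0]:
            in_label = None           # headend pushes the label, receives none
        else:
            in_label = next(allocator)
        labels[router] = (in_label, downstream_label)
        if router == path[0]:
            return labels
        downstream_label = in_label
        router = phop[router]


for router, (in_l, out_l) in signal_lsp(["IOS2", "IOS3", "IOS4", "XR1"]).items():
    print(router, "in:", in_l, "out:", out_l)
```

IOS4 ends up with outgoing label 3, which matches the traceroute later in the post where XR1 receives only the VPN label.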

[Figure: RSVP-TE1]

Constrained SPF (CSPF)

To overcome the IGP limitations described above, the SPF algorithm has to be
modified. The result is called Constrained SPF (CSPF); besides a simple metric,
it can take other factors into account, such as:

  • Bandwidth
  • Affinity
  • Administrative weight
  • Explicitly defined path

To support CSPF, the IGPs were extended to carry the extra information in their
OSPF LSAs and IS-IS LSPs. OSPF carries TE information in opaque LSAs. ISIS,
which is easily extended with TLVs, supports TE through new TLVs. When using
ISIS as the IGP, wide metrics must be used to support TE.

The CSPF path can either be calculated dynamically by the router or the user can
configure an explicit path. Both methods support the use of constraints to build
the path.
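A minimal CSPF sketch in Python: first prune every link that fails the constraints, then run ordinary SPF on what is left. The link data is hypothetical: the 750000 kbps reservable bandwidth on the lower path mirrors the XR1 topology output later in the post, while the 100000 kbps on the upper path is made up to force a detour. Affinity is simplified to "exclude any link with these bits set"; real affinity matching uses an attribute value and mask.

```python
import heapq


def cspf(links, source, dest, bandwidth_kbps, exclude_mask=0):
    """Constrained SPF sketch: prune links that fail the constraints, then run SPF.

    links maps (from_router, to_router) to a dict with "metric",
    "avail_bw" (unreserved kbps) and "affinity" (attribute bit flags).
    """
    graph = {}
    for (a, b), attrs in links.items():
        if attrs["avail_bw"] < bandwidth_kbps:
            continue  # not enough unreserved bandwidth on this link
        if attrs["affinity"] & exclude_mask:
            continue  # link carries an attribute bit the tunnel must avoid
        graph.setdefault(a, []).append((b, attrs["metric"]))

    dist, prev = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dest:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, metric in graph.get(node, []):
            if d + metric < dist.get(nbr, float("inf")):
                dist[nbr] = d + metric
                prev[nbr] = node
                heapq.heappush(heap, (d + metric, nbr))

    if dest not in dist:
        return None  # no path satisfies the constraints
    path = [dest]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def bidirectional(a, b, metric, avail_bw, affinity=0):
    attrs = {"metric": metric, "avail_bw": avail_bw, "affinity": affinity}
    return {(a, b): dict(attrs), (b, a): dict(attrs)}


LINKS = {}
LINKS.update(bidirectional("IOS2", "IOS5", 10, 100_000))
LINKS.update(bidirectional("IOS5", "XR1", 10, 100_000))
LINKS.update(bidirectional("IOS2", "IOS3", 10, 750_000))
LINKS.update(bidirectional("IOS3", "IOS4", 10, 750_000))
LINKS.update(bidirectional("IOS4", "XR1", 10, 750_000))

print(cspf(LINKS, "IOS2", "XR1", 50_000))     # ['IOS2', 'IOS5', 'XR1']
print(cspf(LINKS, "IOS2", "XR1", 200_000))    # ['IOS2', 'IOS3', 'IOS4', 'XR1']
print(cspf(LINKS, "IOS2", "XR1", 1_000_000))  # None
```

A 200 Mbps request cannot fit on the upper links, so CSPF picks the longer lower path, something a plain IGP metric could never express.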

Routing Across a TE Tunnel

Once the tunnel has been built, traffic must be sent through the tunnel.
This can be achieved in a couple of different ways:

  • Static routing
  • Dynamic routing
  • Policy-based routing

The Lab

That was a brief overview of MPLS-TE. To test it in a lab, I have set up a
topology like this:

[Figure: MPLS-TE1, lab topology]

All routers run IOS except one, XR1, which runs IOS-XR. This is to show the
IOS-XR syntax and for me to practice using it. IOS1 and IOS6 advertise their
loopbacks to the PE routers. Normal routing has been set up, as well as MPLS
and BGP. This is the configuration so far:

IOS1:

router bgp 1
 bgp log-neighbor-changes
 network 1.1.1.1 mask 255.255.255.255
 network 11.11.11.11 mask 255.255.255.255
 neighbor 12.12.12.1 remote-as 100

IOS2:

ip vrf CUST_1
 rd 1:1
 route-target export 1:1
 route-target import 1:1
!
interface Loopback0
 ip address 2.2.2.2 255.255.255.255
 ip router isis backbone
!
interface Loopback1
 ip address 22.22.22.22 255.255.255.255
 ip router isis backbone
!
interface GigabitEthernet0/0
 ip vrf forwarding CUST_1
 ip address 12.12.12.1 255.255.255.254
! 
interface GigabitEthernet0/1
 ip address 23.23.23.0 255.255.255.254
 ip router isis backbone
 mpls ip
 isis circuit-type level-2-only
!
interface GigabitEthernet0/2
 ip address 25.25.25.0 255.255.255.254
 ip router isis backbone
 mpls ip
 isis circuit-type level-2-only
!
router isis backbone
 net 49.0001.0002.0002.0002.0002.00
 is-type level-2-only
 metric-style wide
 passive-interface default
 no passive-interface GigabitEthernet0/1
 no passive-interface GigabitEthernet0/2
!
router bgp 100
 bgp router-id 2.2.2.2
 bgp log-neighbor-changes
 neighbor 7.7.7.7 remote-as 100
 neighbor 7.7.7.7 update-source Loopback0
 !        
 address-family vpnv4
  neighbor 7.7.7.7 activate
  neighbor 7.7.7.7 send-community extended
 exit-address-family
 !
 address-family ipv4 vrf CUST_1
  neighbor 12.12.12.0 remote-as 1
  neighbor 12.12.12.0 activate
  neighbor 12.12.12.0 as-override
!
mpls ldp router-id Loopback0 force

XR1:

vrf CUST_1
 address-family ipv4 unicast
  import route-target
   1:1
  !
  export route-target
   1:1
  !
 !
!
interface Loopback0
 ipv4 address 7.7.7.7 255.255.255.255
!
interface GigabitEthernet0/0/0/0
 ipv4 address 47.47.47.1 255.255.255.254
!
interface GigabitEthernet0/0/0/1
 ipv4 address 57.57.57.1 255.255.255.254
!
interface GigabitEthernet0/0/0/2
 vrf CUST_1
 ipv4 address 76.76.76.1 255.255.255.254
!
route-policy PASS
  pass
end-policy
!
router isis backbone
 is-type level-2-only
 net 49.0001.0007.0007.0007.0007.00
 address-family ipv4 unicast
  metric-style wide
 !
 interface Loopback0
  passive
  address-family ipv4 unicast
  !
 !
 interface GigabitEthernet0/0/0/0
  address-family ipv4 unicast
  !
 !
 interface GigabitEthernet0/0/0/1
  address-family ipv4 unicast
  !
 !
!
router bgp 100
 bgp router-id 7.7.7.7
 address-family ipv4 unicast
 !
 address-family vpnv4 unicast
 !
 neighbor 2.2.2.2
  remote-as 100
  update-source Loopback0
  address-family vpnv4 unicast
  !
 !
 vrf CUST_1
  rd 1:1
  address-family ipv4 unicast
  !
  neighbor 76.76.76.0
   remote-as 1
   address-family ipv4 unicast
    route-policy PASS in
    route-policy PASS out
    as-override
   !
  !
 !
!
mpls ldp
 router-id 7.7.7.7
 interface GigabitEthernet0/0/0/0
  address-family ipv4
  !
 !
 interface GigabitEthernet0/0/0/1
  address-family ipv4

IOS6:

interface Loopback0
 ip address 6.6.6.6 255.255.255.255
!
interface Loopback1
 ip address 66.66.66.66 255.255.255.255
!
router bgp 1
 bgp log-neighbor-changes
 network 6.6.6.6 mask 255.255.255.255
 network 66.66.66.66 mask 255.255.255.255
 neighbor 76.76.76.1 remote-as 100

The remaining configuration has been left out; it is just plain IGP routing and
enabling MPLS. The lab is currently using MPLS VPN and taking the upper path,
because the IGP path is shorter. Let’s confirm this with a traceroute.

IOS1#traceroute 6.6.6.6 numeric source lo0
Type escape sequence to abort.
Tracing the route to 6.6.6.6
VRF info: (vrf in name/id, vrf out name/id)
  1 12.12.12.1 1 msec 0 msec 0 msec
  2 25.25.25.1 [MPLS: Labels 24/16012 Exp 0] 8 msec 9 msec 4 msec
  3 57.57.57.1 [MPLS: Label 16012 Exp 0] 23 msec 5 msec 8 msec
  4 76.76.76.0 2 msec *  2 msec

The traceroute confirms this.

MPLS-TE tunnels are always unidirectional. To configure MPLS-TE we need to
go through the following steps:

  • Enable CEF (default)
  • Enable TE support in IGP
  • Enable MPLS-TE tunnels globally
  • Enable MPLS-TE tunnels on interface(s) in path
  • Enable RSVP on interface(s) in path

The following configuration is added to all IOS routers:

IOS2(config)#mpls traffic-eng tunnels 
IOS2(config)#int range gi0/1 - 2
IOS2(config-if-range)#mpls traffic-eng tunnels
IOS2(config-if-range)#ip rsvp bandwidth
IOS2(config-if-range)#router isis backbone
IOS2(config-router)#metric-style wide
IOS2(config-router)#mpls traffic-eng level-2
IOS2(config-router)#mpls traffic-eng router-id lo0

The following configuration is added to the IOS-XR router:

router isis backbone
 address-family ipv4 unicast 
  metric-style wide 
  mpls traffic-eng level-2-only 
  mpls traffic-eng router-id Loopback0 
rsvp 
 interface GigabitEthernet0/0/0/0
 ! 
 interface GigabitEthernet0/0/0/1
 ! 
mpls traffic-eng 
 interface GigabitEthernet0/0/0/0
 ! 
 interface GigabitEthernet0/0/0/1

This is enough to have ISIS support MPLS-TE. From the IOS-XR router we can see
that the IGP now carries more information in its link-state PDUs (LSPs).

RP/0/0/CPU0:ios#show mpls traffic-eng topology 
Sun Aug 24 17:16:57.583 UTC
My_System_id: 0007.0007.0007.00 (IS-IS backbone level-2)
My_BC_Model_Type: RDM 

Signalling error holddown: 10 sec Global Link Generation 19

IGP Id: 0002.0002.0002.00, MPLS TE Id: 2.2.2.2 Router Node  (IS-IS backbone level-2)

  Link[0]:Broadcast, DR:0002.0002.0002.01, Nbr Node Id:12, gen:10
      Frag Id:0, Intf Address:23.23.23.0, Intf Id:0
      Nbr Intf Address:0.0.0.0, Nbr Intf Id:0
      TE Metric:10, IGP Metric:10
      Attribute Flags: 0x0
      Ext Admin Group: 
          Length: 256 bits
          Value : 0x::
      Attribute Names: 
      Switching Capability:None, Encoding:unassigned
      BC Model ID:RDM
      Physical BW:1000000 (kbps), Max Reservable BW Global:750000 (kbps)
      Max Reservable BW Sub:0 (kbps)
                                 Global Pool       Sub Pool
               Total Allocated   Reservable        Reservable
               BW (kbps)         BW (kbps)         BW (kbps)
               ---------------   -----------       ----------
        bw[0]:            0         750000                0
        bw[1]:            0         750000                0
        bw[2]:            0         750000                0
        bw[3]:            0         750000                0
        bw[4]:            0         750000                0
        bw[5]:            0         750000                0
        bw[6]:            0         750000                0
        bw[7]:            0         750000                0

The next step is to create the tunnels. We will build a tunnel from IOS2 to XR1
and, because MPLS-TE tunnels are unidirectional, a second one from XR1 back to
IOS2.

IOS2:

interface Tunnel0
 ip unnumbered Loopback0
 tunnel mode mpls traffic-eng
 tunnel destination 7.7.7.7
 tunnel mpls traffic-eng autoroute announce
 tunnel mpls traffic-eng path-option 10 dynamic

XR1:

interface tunnel-te0
 ipv4 unnumbered Loopback0
 autoroute announce
 destination 2.2.2.2
 path-option 10 dynamic

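Autoroute announce makes the headend’s IGP treat the tunnel as a directly
connected link to the tailend when calculating routes, so traffic to the
tailend and to destinations reached through it is sent into the tunnel. The
tunnel itself is not advertised to other routers. Another option would be a
static route for the tunnel destination across the tunnel interface.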
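A simplified Python sketch of the autoroute idea on the headend, assuming the same five-router core with metric 10 everywhere: run normal SPF, then send every destination whose shortest path crosses the tailend into the tunnel, and leave everything else on its normal next hop. Real autoroute has more rules (tunnel metric handling, load sharing between tunnel and native paths), and Tunnel0 is just the label used here. In the lab this matters because the VPN routes have XR1’s loopback 7.7.7.7 as BGP next hop, which now resolves through the tunnel.

```python
import heapq


def spf(graph, source):
    """Plain Dijkstra returning cost and predecessor maps."""
    dist, prev = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            if d + cost < dist.get(nbr, float("inf")):
                dist[nbr] = d + cost
                prev[nbr] = node
                heapq.heappush(heap, (d + cost, nbr))
    return dist, prev


def autoroute_next_hops(graph, headend, tailend, tunnel):
    """Next hop per destination when the headend announces a tunnel to tailend."""
    dist, prev = spf(graph, headend)
    next_hops = {}
    for dest in dist:
        if dest == headend:
            continue
        hops = [dest]
        while hops[-1] != headend:
            hops.append(prev[hops[-1]])
        hops.reverse()  # headend ... dest
        # Destinations reached through the tailend use the tunnel instead.
        next_hops[dest] = tunnel if tailend in hops else hops[1]
    return next_hops


TOPOLOGY = {
    "IOS2": {"IOS5": 10, "IOS3": 10},
    "IOS5": {"IOS2": 10, "XR1": 10},
    "IOS3": {"IOS2": 10, "IOS4": 10},
    "IOS4": {"IOS3": 10, "XR1": 10},
    "XR1": {"IOS5": 10, "IOS4": 10},
}

print(autoroute_next_hops(TOPOLOGY, "IOS2", "XR1", "Tunnel0"))
```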

The tunnel is now up and traffic is forwarding across it.

IOS1#traceroute 6.6.6.6 numeric so lo0
Type escape sequence to abort.
Tracing the route to 6.6.6.6
VRF info: (vrf in name/id, vrf out name/id)
  1 12.12.12.1 1 msec 0 msec 0 msec
  2 25.25.25.1 [MPLS: Labels 19/16012 Exp 0] 8 msec 1 msec 3 msec
  3 57.57.57.1 [MPLS: Label 16012 Exp 0] 2 msec 3 msec 7 msec
  4 76.76.76.0 9 msec *  3 msec

Label 19 is the label used for the tunnel, and label 16012 is the VPN label.
We are still using the upper path, though. Let’s configure an explicit path to
use the lower path of the topology.

IOS2:

ip explicit-path name TO_XR1 enable
 next-address 23.23.23.1
 next-address 34.34.34.1
 next-address 47.47.47.1
!
interface Tunnel0
 tunnel mpls traffic-eng path-option 1 explicit name TO_XR1

XR1:

explicit-path name TO_IOS2
 index 1 next-address strict ipv4 unicast 47.47.47.0
 index 2 next-address strict ipv4 unicast 34.34.34.0
 index 3 next-address strict ipv4 unicast 23.23.23.0
!
interface tunnel-te0
 path-option 1 explicit name TO_IOS2

We try a traceroute from IOS1 to see if the traffic is following the lower path.

IOS1#traceroute 6.6.6.6 numeric so lo0
Type escape sequence to abort.
Tracing the route to 6.6.6.6
VRF info: (vrf in name/id, vrf out name/id)
  1 12.12.12.1 1 msec 0 msec 0 msec
  2 23.23.23.1 [MPLS: Labels 19/16012 Exp 0] 5 msec 2 msec 2 msec
  3 34.34.34.1 [MPLS: Labels 21/16012 Exp 0] 2 msec 2 msec 4 msec
  4 47.47.47.1 [MPLS: Label 16012 Exp 0] 1 msec 3 msec 1 msec
  5 76.76.76.0 3 msec *  2 msec

This is the power of MPLS-TE: the ability to define, at the headend, where the
traffic should go. This is essentially source routing. Newer features such as
segment routing do something similar by extending OSPF and ISIS to generate
labels through the IGP.

What happens if the explicit path goes down? We can use a dynamic path as a
fallback by giving it a higher path-option number in the tunnel configuration,
because path options are tried in ascending order. This is the current
configuration of the tunnel:

interface Tunnel0
 ip unnumbered Loopback0
 tunnel mode mpls traffic-eng
 tunnel destination 7.7.7.7
 tunnel mpls traffic-eng autoroute announce
 tunnel mpls traffic-eng path-option 1 explicit name TO_XR1
 tunnel mpls traffic-eng path-option 10 dynamic
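The fallback logic can be sketched in Python: try path options from the lowest number upwards and use the first one that can be built. Two simplifications are assumptions of this sketch: an explicit option fails if any of its hops is unreachable, and a dynamic option is assumed to always find a path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathOption:
    number: int
    kind: str          # "explicit" or "dynamic"
    hops: tuple = ()   # next-address hops for explicit options


def select_path_option(options, unreachable_hops):
    """Return the lowest-numbered path option that can currently be built."""
    for opt in sorted(options, key=lambda o: o.number):
        if opt.kind == "dynamic":
            return opt  # simplification: CSPF is assumed to find some path
        if not set(opt.hops) & unreachable_hops:
            return opt
    return None


OPTIONS = [
    PathOption(10, "dynamic"),
    PathOption(1, "explicit", ("23.23.23.1", "34.34.34.1", "47.47.47.1")),
]

print(select_path_option(OPTIONS, set()).number)           # 1: explicit TO_XR1
print(select_path_option(OPTIONS, {"34.34.34.1"}).number)  # 10: dynamic fallback
```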

We will start a ping from IOS1 and then shut down the IOS4 interface towards
IOS3. Traffic should then move to the upper path again.

IOS1#ping 6.6.6.6 so lo0 re 100000
Type escape sequence to abort.
Sending 100000, 100-byte ICMP Echos to 6.6.6.6, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!.!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 99 percent (2876/2878), round-trip min/avg/max = 1/3/22 ms
IOS1# 

Barely any packets were lost (2876 of 2878 echoes succeeded). We confirm on
IOS2 that the secondary path option is being used.

IOS2#show mpls traffic-eng tunnels 

Name: IOS2_t0                             (Tunnel0) Destination: 7.7.7.7
  Status:
    Admin: up         Oper: up     Path: valid       Signalling: connected
    path option 10, type dynamic (Basis for Setup, path weight 20)
    path option 1, type explicit TO_XR1

IOS1 is using the upper path again.

IOS1#traceroute 6.6.6.6 numeric so lo0
Type escape sequence to abort.
Tracing the route to 6.6.6.6
VRF info: (vrf in name/id, vrf out name/id)
  1 12.12.12.1 2 msec 0 msec 0 msec
  2 25.25.25.1 [MPLS: Labels 19/16012 Exp 0] 8 msec 8 msec 6 msec
  3 57.57.57.1 [MPLS: Label 16012 Exp 0] 2 msec 6 msec 10 msec
  4 76.76.76.0 1 msec *  2 msec

Conclusion

MPLS-TE can help overcome some of the limitations that come bundled with IGPs,
such as leaving links unused and being unable to consider constraints, relying
solely on a metric that says nothing about the available bandwidth. MPLS-TE can
provide fast convergence by defining several path options, and combined with
Fast Reroute (FRR) it can provide convergence times of around 50 ms.

Categories: MPLS

Some pointers on OSPF as PE to CE protocol

February 23, 2014

Following a discussion on the Cisco Learning Network (CLN) about OSPF as the
PE to CE protocol, I wanted to provide some pointers on using it.

RFC 4577 describes how to use OSPF as the PE to CE protocol. When BGP carries
the OSPF routes, the MPLS backbone is seen as a super backbone. This adds
another level of hierarchy, giving OSPF three levels instead of the usual two
in plain OSPF.

[Figure: Superbackbone]

Because the MPLS backbone is seen as a super area 0, OSPF routes going across
the MPLS backbone can never be better than a type 3 summary LSA. Even if the
same area is used on both sides of the backbone and the input is a type 1 or
type 2 LSA, it will be advertised as a summary LSA on the other side.

[Figure: LSA across superbackbone]

The only way to keep type 1 or type 2 LSAs as they are is to use a sham link.
A sham link sets up a control plane mechanism that acts as a tunnel for the
LSAs passing over the MPLS backbone. Sham links are outside the scope of this
article.

An LSA can never become “better” than the type it was input to the PE as. If
the input to the PE is a type 3 LSA, it can never be converted to a type 1 or
type 2 LSA on the other side. If the LSA was a type 5 external to begin with,
it will be sent as type 5 on the other side as well.

To understand how the LSAs are sent over the backbone, look at this picture.

[Figure: MPBGP]

The CPE sends an OSPF LSA to the PE, which runs OSPF in a VRF towards the CPE.
The PE installs the LSA as a route in the OSPF RIB. If the route is the best one
known to the router, it can be installed in the VRF routing table.

The PE redistributes from OSPF into BGP. Only routes that are installed as OSPF
routes in the RIB will be redistributed. To carry OSPF-specific information, the
PE adds extended communities. To make the IPv4 route a VPNv4 route, the PE
prepends the RD and attaches the RT. The OSPF-specific communities are:

Domain-ID

The domain ID can either be configured manually or derived from the OSPF
process ID. It is used to identify whether LSAs are sent into the same domain
they originated from. If the domain ID matches, type 3 summary LSAs can be sent
for routes that were intra-area or inter-area. If the domain ID does not match,
all routes must be sent as external.
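The translation rules described so far fit in a few lines of Python. This sketch maps the OSPF route type carried in BGP (1 or 2 intra-area, 3 inter-area, 5 external, 7 NSSA external) to the LSA type the remote PE originates towards its CPE. It deliberately ignores sham links and NSSA areas on the remote side, where a type 7 LSA could be originated instead.

```python
def remote_lsa_type(route_type: int, domain_id_match: bool) -> int:
    """LSA type originated by the remote PE (no sham links, no NSSA areas)."""
    if route_type in (1, 2, 3):
        # Intra- and inter-area routes become summaries only within one domain.
        return 3 if domain_id_match else 5
    if route_type in (5, 7):
        return 5  # externals stay external
    raise ValueError(f"unknown OSPF route type {route_type}")


for rt in (1, 2, 3, 5):
    print(rt, "->", remote_lsa_type(rt, True), "/", remote_lsa_type(rt, False))
```

Nothing in the table ever produces a type 1 or type 2 LSA, which is the "can never be better than a type 3" rule in code form.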

[Figure: Domain ID match (Domain ID 1)]

[Figure: Domain ID non match (Domain ID 2)]

OSPF Route Type

The route type community consists of the area number, the route type and an options field.

[Figure: Route Type]

If we look at an MPBGP update, we can see the route type encoded.

R4#sh bgp vpnv4 uni rd 1:1 1.1.1.1/32
BGP routing table entry for 1:1:1.1.1.1/32, version 5
Paths: (1 available, best #1, table cust)
Flag: 0x820
  Not advertised to any peer
  Local
    2.2.2.2 (metric 21) from 2.2.2.2 (2.2.2.2)
      Origin incomplete, metric 11, localpref 100, valid, internal, best
      Extended Community: RT:1:1 OSPF DOMAIN ID:0x0005:0x000000020200 
        OSPF RT:0.0.0.0:2:0 OSPF ROUTER ID:22.22.22.22:0
      mpls labels in/out nolabel/18

Something a bit peculiar is that this update has a route type of 2 even though
it originated from a type 1 LSA. In the end it makes no difference, because it
will be advertised to the CPE as a type 3 LSA.
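To see what "OSPF RT:0.0.0.0:2:0" means on the wire, here is a small Python sketch that packs and unpacks the OSPF Route Type extended community as RFC 4577 defines it: type 0x0306, followed by a 4-byte area number, a 1-byte route type and a 1-byte options field. The function names are my own; the decoded string only mimics the IOS display format.

```python
import ipaddress
import struct

OSPF_ROUTE_TYPE = 0x0306  # extended community type for OSPF route type (RFC 4577)


def encode_route_type(area: str, route_type: int, options: int = 0) -> bytes:
    """Pack area, route type and options into the 8-byte extended community."""
    return struct.pack("!H4sBB", OSPF_ROUTE_TYPE,
                       ipaddress.IPv4Address(area).packed, route_type, options)


def decode_route_type(community: bytes) -> str:
    """Render the community as area:route_type:options, like the IOS output."""
    ctype, area, route_type, options = struct.unpack("!H4sBB", community)
    if ctype != OSPF_ROUTE_TYPE:
        raise ValueError(f"not an OSPF route type community: 0x{ctype:04x}")
    return f"{ipaddress.IPv4Address(area)}:{route_type}:{options}"


community = encode_route_type("0.0.0.0", 2)
print(community.hex())                # 0306000000000200
print(decode_route_type(community))   # 0.0.0.0:2:0
```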

OSPF Router ID

The router ID of the router that originated the LSA (PE) is also carried as an
extended community, shown as OSPF ROUTER ID:22.22.22.22:0 in the output above.


MED

The MED is set to the OSPF metric + 1 as defined by the RFC.



The goal of these extended communities is to extend BGP so that OSPF LSAs can be
carried transparently as if BGP hadn’t been involved at all. LSAs are translated
to BGP updates and then translated back to LSAs.

If we look at a packet capture, we can see the extended communities attached.
This BGP UPDATE originated from a type 5 external LSA with metric type 1.

[Figure: Capture]

When using OSPF as the PE to CE protocol, it is important to remember the design
rules of OSPF. For that reason, you should avoid designs like this:

[Figure: OSPF1]

In this design, area 1 is used on both sides, but the CPE is also connected to
area 0, which makes it an ABR. The rules of OSPF dictate that an ABR only uses
summary LSAs received over area 0. This means the topology is broken and would
require changing the area or using a virtual link.

OSPF as the PE to CE protocol has some complexity, but most of it is still plain
OSPF, which is a complicated protocol in itself. Combine that with BGP and MPLS,
and it is easy to get confused about which protocol is responsible for what.
That is also one of the reasons I recommend using eBGP or static routing when
customers connect to their ISP.

Categories: BGP, MPLS, OSPF

Scaling PEs in MPLS VPN – Route Target Constraint (RTC)

September 23, 2013

Introduction

Any decently sized service provider network, or even enterprise network,
running MPLS VPN will most likely be using Route Reflectors (RR). As described
in a previous post, a full iBGP mesh does not really scale. By default, all
PEs receive all routes reflected by the RR, even if the PE does not have a VRF
configured with an import matching the route. To mitigate this inefficient
behavior, Route Target Constraint (RTC) can be configured. It is defined in
RFC 4684.

Route Target Constraint

The way this feature works is that the PE advertises to the RR which RTs it
intends to import. The RR then applies an outbound filter, only sending routes
matching those RTs to the PE. This is much more efficient than the default
behavior. The RR itself still needs to receive all routes, so no filtering is
done towards the RR. To enable this feature, a new Subsequent Address Family
Identifier (SAFI) called rtfilter is used. To demonstrate the feature we will
implement the following topology.

[Figure: RTC topology]
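The RR side of the feature boils down to an outbound filter per PE, which this Python sketch models. The route set mimics the VRFs created by the bash script below (i from 0 to 255, RD 1:i, RT 1:i, prefix 10.0.i.0/24). A membership of None stands for a PE that has not negotiated rtfilter, which keeps receiving everything; that is my simplified reading of RFC 4684, and the names are hypothetical.

```python
from typing import NamedTuple, Optional


class VpnRoute(NamedTuple):
    rd: str
    prefix: str
    route_targets: frozenset


def rr_outbound(routes, pe_rt_membership: Optional[set]):
    """Routes the RR sends to one PE under Route Target Constraint."""
    if pe_rt_membership is None:
        return list(routes)  # PE did not negotiate rtfilter: no filtering
    # Send a route only if it carries at least one RT the PE imports.
    return [r for r in routes if r.route_targets & pe_rt_membership]


PE1_ROUTES = [
    VpnRoute(f"1:{i}", f"10.0.{i}.0/24", frozenset({f"1:{i}"})) for i in range(256)
]

print(len(rr_outbound(PE1_ROUTES, None)))              # 256: no RTC
print(len(rr_outbound(PE1_ROUTES, {"1:256"})))         # 0: nothing PE2 imports
print(rr_outbound(PE1_ROUTES, {"1:256", "1:1"})[0].prefix)  # 10.0.1.0/24
```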

The scenario here is that PE1 is located in a large PoP where there are already
plenty of customers; it currently has 255 customers. PE2 is located in a new PoP
and so far only one customer is connected there. It is unnecessary for the RR to
send PE2 the routes for all of PE1’s customers, because PE2 does not need them.
To simulate the customers, I wrote a simple bash script to create the VRFs on PE1.

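#!/bin/bash
# Print IOS configuration for one test customer VRF per value of i (0-255).
# Each VRF gets its own RD and RT, a loopback inside the VRF and a BGP
# network statement for that loopback. Paste the output into PE1.
for i in {0..255}
do
   cat <<EOF
ip vrf $i
 rd 1:$i
 route-target 1:$i
interface loopback$i
 ip vrf forwarding $i
 ip address 10.0.$i.1 255.255.255.0
router bgp 65000
 address-family ipv4 vrf $i
  network 10.0.$i.0 mask 255.255.255.0
EOF
done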

PE2 will not import these routes, because their RTs do not match any import
statement in its only configured VRF. If we debug BGP, we can see lots of
messages like:

BGP(4): Incoming path from 4.4.4.4
BGP(4): 4.4.4.4 rcvd UPDATE w/ attr: nexthop 1.1.1.1, origin i, localpref 100, 
metric 0, originator 1.1.1.1, clusterlist 4.4.4.4, extended community RT:1:104
BGP(4): 4.4.4.4 rcvd 1:104:10.0.104.0/24, label 120 -- DENIED due to:  extended 
community not supported;

In this case we have 255 routes, but what if there were 1 million? That would be
a big waste of both processing power and bandwidth, not to mention that the RR
would have to format all those BGP updates. These are the benefits of enabling RTC:

  • No wasted processing power on the PE and RR, and no wasted bandwidth
  • Fewer VPNv4 updates to format
  • Reduced BGP convergence time

Currently the RR is advertising 257 prefixes to PE2.

RR#sh bgp vpnv4 uni all neighbors 3.3.3.3 advertised-routes | i Total
Total number of prefixes 257

Implementation

Implementing RTC is simple, but it has to be supported on both the RR and the PE.
Add the following commands under BGP:

RR:

RR(config)#router bgp 65000
RR(config-router)#address-family rtfilter unicast
RR(config-router-af)#nei 3.3.3.3 activate
RR(config-router-af)#nei 3.3.3.3 route-reflector-client

PE2:

PE2(config)#router bgp 65000
PE2(config-router)#address-family rtfilter unicast
PE2(config-router-af)#nei 4.4.4.4 activate

The BGP session will be torn down when doing this! Now let’s see how many routes
the RR is sending.

RR#sh bgp vpnv4 uni all neighbors 3.3.3.3 advertised-routes | i Total
Total number of prefixes 0

No prefixes! To see the RT filter in effect, use this command:

RR#sh bgp rtfilter unicast all
BGP table version is 3, local router ID is 4.4.4.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
     0:0:0:0          0.0.0.0                                0 i
 *>i 65000:2:1:256    3.3.3.3                  0    100  32768 i

Now we add an import under the VRF in PE2 and one route should be sent.

PE2(config)#ip vrf 0
PE2(config-vrf)#route-target import 1:1
PE2#sh ip route vrf 0

Routing Table: 0
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area 
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route, H - NHRP, l - LISP
       + - replicated route, % - next hop override

Gateway of last resort is not set

      10.0.0.0/8 is variably subnetted, 3 subnets, 2 masks
B        10.0.1.0/24 [200/0] via 1.1.1.1, 00:00:16
C        10.1.1.0/24 is directly connected, Loopback1
L        10.1.1.1/32 is directly connected, Loopback1
RR#sh bgp vpnv4 uni all neighbors 3.3.3.3 advertised-routes | i Total
Total number of prefixes 1 
RR#sh bgp rtfilter unicast all                                       
BGP table version is 4, local router ID is 4.4.4.4
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
     0:0:0:0          0.0.0.0                                0 i
 *>i 65000:2:1:1      3.3.3.3                  0    100  32768 i
 *>i 65000:2:1:256    3.3.3.3                  0    100  32768 i

Works as expected. From the output we can see that the origin AS is 65000 and the extended community type is 2 (0x0002, a route target with a two-octet AS). The RTs for which routes should be sent to PE2 (3.3.3.3) are 1:1 and 1:256.
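To make the filtering concrete, here is a minimal Python sketch of what the RR does with these RT membership entries. It assumes each VPNv4 route is modeled as a prefix mapped to the set of RTs it carries. The function names, the RD values and the second route are hypothetical. The sketch does not model the 0:0:0:0 default entry or the BGP wire encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RtMembership:
    """One RT Constraint entry as displayed by 'sh bgp rtfilter', e.g. 65000:2:1:256."""
    origin_as: int
    ext_type: int       # 2 = 0x0002, route target with a two-octet AS
    route_target: str   # the RT itself, e.g. "1:256"


def parse_rtfilter(text: str) -> RtMembership:
    # "65000:2:1:256" -> origin AS 65000, type 2, RT "1:256"
    origin_as, ext_type, rt_as, rt_value = text.split(":")
    return RtMembership(int(origin_as), int(ext_type), f"{rt_as}:{rt_value}")


def routes_to_advertise(vpnv4_routes: dict, memberships: list) -> dict:
    """Keep only the VPNv4 routes carrying at least one RT the peer asked for.

    vpnv4_routes maps a VPNv4 prefix (RD:prefix) to the set of RTs attached to it.
    """
    wanted = {m.route_target for m in memberships}
    return {prefix: rts for prefix, rts in vpnv4_routes.items() if rts & wanted}


# PE2 (3.3.3.3) sent the two entries shown above; the RR table is hypothetical.
pe2 = [parse_rtfilter("65000:2:1:1"), parse_rtfilter("65000:2:1:256")]
rr_table = {
    "1.1.1.1:1:10.0.1.0/24": {"1:1"},   # hypothetical RD, RT imported by PE2
    "1.1.1.1:2:10.0.2.0/24": {"2:2"},   # RT not imported by PE2
}
print(routes_to_advertise(rr_table, pe2))  # {'1.1.1.1:1:10.0.1.0/24': {'1:1'}}
```

An empty membership list results in no routes being sent, which matches the "Total number of prefixes 0" output before the import was added.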

Conclusion

Route Target Constraint is a powerful feature that reduces the load on both the Route Reflectors and the PE devices in an MPLS VPN network, since PEs no longer receive VPNv4 routes that they would only discard. It can also help BGP converge faster. Support is needed on both the PE and the RR. Enabling it tears down the BGP session, so it has to be done during a maintenance window.

Categories: BGP, MPLS

MPLS troubleshooting scenario

April 23, 2012 16 comments

I’m in final preparation for my second attempt, and I have been doing a lot of troubleshooting scenarios lately. I created an MPLS topology in GNS3 and sent it to my friend Darren for testing. He is taking his lab very soon, and he performed well on this one. The lab contains multiple faults, but I won’t say how many since that would spoil some of the surprise.

The assignment is to make sure CE1 can ping CE2's loopback, 6.6.6.6.

Post in the comments what you did to make it work, or ask if you need a hint to get going in the right direction. You need to edit the .net file to use your own working directory and IOS images; the lab needs IOS images for the 3725 and the 7200. Start with the provided configurations, either by importing them or by pasting them in, whichever you prefer. Do not look at the startup configs before starting.

Download the .net and config files here.

This is what the topology looks like.

MPLS – notes

December 5, 2010

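  • MPLS architecture defined in RFC 3031; BGP/MPLS VPNs first described in RFC 2547 (obsoleted by RFC 4364)
  • Originally called tag switching, which was Cisco proprietary
  • MPLS is the open standard
  • Often described as operating at layer 2.5, between layer 2 switching and layer 3 routing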

Terms used in MPLS:

LER = Label Edge Router – MPLS capable, placed at the edge of the network.

LSR = Label Switch Router – MPLS capable. Note that an LER is also an LSR.

CE = Customer Edge device – the demarcation between service provider and customer. The CE is often managed by the provider.

PE = Provider Edge device – the router that the CE connects to.

P = Provider router, used in the core of the provider network.

LSP = Label Switched Path – the unidirectional path taken through the MPLS network between the edge devices.

Push – The ingress LSR pushes a label onto the packet.

Swap – Swap incoming label with outgoing label.

Pop – The top label is removed, either by the egress LSR or, with penultimate hop popping (PHP), by the second-to-last LSR. Once no labels remain, the packet is forwarded according to the IP routing table.

BGP-free core – The core routers do not need to know customer or external routes for MPLS VPN connectivity. They only need to know how to reach the BGP next hop.
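The push, swap and pop operations can be illustrated with a minimal Python sketch that walks a single label along an LSP. The router names and label values are hypothetical. The top of the label stack is modeled as the last element of a list.

```python
def walk_lsp(hops):
    """Apply each hop's label operation and record the resulting label stack.

    hops: list of (router, operation, label) where operation is "push", "swap" or "pop".
    The top of the stack is the last element of the list.
    """
    stack = []
    trace = []
    for router, op, label in hops:
        if op == "push":
            stack.append(label)
        elif op == "swap":
            stack[-1] = label
        elif op == "pop":
            stack.pop()
        else:
            raise ValueError(f"unknown operation {op!r}")
        trace.append((router, op, list(stack)))
    return trace


# Hypothetical LSP: PE1 pushes, P swaps, PE2 pops and then does an IP lookup.
for step in walk_lsp([("PE1", "push", 100), ("P", "swap", 200), ("PE2", "pop", None)]):
    print(step)
```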

Types of VPN

Overlay VPN – A layer 1 or layer 2 network with point-to-point links or virtual circuits that separate customer traffic. The customer does not peer with the ISP and is responsible for its own routing. Generic Routing Encapsulation (GRE) tunnels can also be used to carry the traffic.

Peer-to-peer VPN – The provider carries customer traffic and also peers with the customer, taking part in its routing. Previously, traffic separation had to be provided with filtering and access-lists. MPLS now solves this in a much more scalable way.

Reasons to use MPLS

  • One infrastructure carrying multiple services and protocols
  • BGP-free core
  • Scalable VPN solutions
  • Traffic engineering
  • Less configuration needed in a fully meshed network than with overlay VPNs

Running MPLS to gain speed is a bogus reason. Traffic is forwarded by Application Specific Integrated Circuits (ASICs), and the difference between looking up a route and looking up a label is minimal, if any.

BGP-free core

Normally a service provider needs to run BGP on all transit routers so they know how to reach external prefixes. With MPLS, BGP is not needed in the core, since the core routers only need to know how to reach the BGP next hop. This is all great in theory, but is it really implemented? It would require MPLS to be used as the transport for all traffic, including regular (non-VPN) IP traffic.

MPLS labels

The MPLS header is four bytes (32 bits) per label. More than one label can be added to a packet when MPLS VPNs and/or traffic engineering are used. A stack of three labels adds 12 bytes of overhead, which needs to be accounted for in the MTU of MPLS-enabled interfaces.

Of the 32 bits, 20 are used for the label value itself. This gives 2^20, or roughly one million, labels, of which labels 0-15 are reserved. Three bits are the experimental (EXP) bits. These are used for Quality of Service (QoS) and aren’t really experimental at this stage; RFC 5462 renamed them Traffic Class. One bit indicates Bottom of Stack (BoS), and when it is set to one, the label is the last one in the stack. The remaining eight bits are the Time To Live (TTL), just as in an IP header.
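The bit layout above can be checked with a minimal Python sketch that packs and unpacks one 32-bit label stack entry. The function names are illustrative, and the field values in the example are hypothetical.

```python
def encode_lse(label: int, exp: int, bos: bool, ttl: int) -> int:
    """Pack one 32-bit label stack entry: label (20) | EXP (3) | BoS (1) | TTL (8)."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    return (label << 12) | (exp << 9) | (int(bos) << 8) | ttl


def decode_lse(word: int) -> tuple:
    """Unpack a 32-bit label stack entry into (label, exp, bos, ttl)."""
    return word >> 12, (word >> 9) & 0x7, bool((word >> 8) & 0x1), word & 0xFF


print(2**20)                   # 1048576 label values, 0-15 reserved
entry = encode_lse(16, 5, True, 64)
print(hex(entry))              # 0x10b40
print(decode_lse(entry))       # (16, 5, True, 64)
print(3 * 4, "bytes for a three-label stack")
```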

FEC

Forwarding Equivalence Class (FEC) is a group of packets that are forwarded along the same path and get the same forwarding treatment. All packets belonging to a FEC use the same label. However, not all packets with the same label belong to the same FEC. For example, their EXP values might differ, so they get different treatment.

Examples of FEC

  • Packets with a layer 3 destination address matching a certain prefix
  • Multicast packets that belong to the same group
  • Packets that have equal Diffserv markings
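The first FEC example, packets grouped by destination prefix, can be sketched in Python with a longest-prefix match. The prefixes and labels in the hypothetical FEC_TABLE are for illustration only.

```python
import ipaddress

# Hypothetical FEC table: destination prefix -> outgoing label.
FEC_TABLE = {
    ipaddress.ip_network("10.0.0.0/8"): 100,
    ipaddress.ip_network("10.1.1.0/24"): 200,
}


def classify(dst):
    """Return (FEC, label) for a destination using longest-prefix match, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in FEC_TABLE if addr in net]
    if not matches:
        return None
    fec = max(matches, key=lambda net: net.prefixlen)
    return fec, FEC_TABLE[fec]


# Both packets match 10.1.1.0/24, so they belong to the same FEC and get label 200.
print(classify("10.1.1.5"), classify("10.1.1.99"))
print(classify("10.2.3.4"))    # falls back to the 10.0.0.0/8 FEC, label 100
```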

Label distribution modes

Downstream on Demand (DoD) – An LSR requests a label for a FEC from its downstream neighbor (the IP next hop) and receives one label for that FEC.

Unsolicited Downstream (UD) – Each LSR distributes its local label bindings to its adjacent LSRs without them requesting it, and the neighbors store them as remote labels. DoD produces only one remote label per FEC in the LIB, whereas UD can produce several (one per neighbor). UD is the default in Cisco IOS except on ATM interfaces.

Label retention modes

Liberal Label Retention (LLR) keeps all received labels in the LIB, even those that will not end up in the LFIB. The label from the current next hop goes into the LFIB. The others are kept in the LIB in case a routing event forces reconvergence, because the label for the new next hop is then already in the LIB, which means faster convergence.

Conservative Label Retention (CLR) keeps only the label from the next hop in the LIB. It is the default for ATM.
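A minimal Python sketch of the two retention modes follows. It assumes unsolicited downstream distribution, so each neighbor advertises its own binding. The class, prefix, neighbor names and labels are hypothetical. The sketch only shows why liberal retention already has the label for a new next hop after a reroute.

```python
class Lib:
    """Toy LIB holding remote labels per prefix, keyed by advertising neighbor."""

    def __init__(self, liberal=True):
        self.liberal = liberal
        self.remote = {}  # prefix -> {neighbor: label}

    def receive(self, prefix, neighbor, label, next_hop):
        # Liberal: keep every binding. Conservative: keep only the next hop's binding.
        if self.liberal or neighbor == next_hop:
            self.remote.setdefault(prefix, {})[neighbor] = label

    def outgoing_label(self, prefix, next_hop):
        """Label to install in the LFIB for prefix via next_hop, or None if unknown."""
        return self.remote.get(prefix, {}).get(next_hop)


for liberal in (True, False):
    lib = Lib(liberal)
    lib.receive("10.0.0.0/24", "P1", 100, next_hop="P1")
    lib.receive("10.0.0.0/24", "P2", 200, next_hop="P1")
    # P1 fails and the IGP reconverges with P2 as the new next hop.
    print(liberal, lib.outgoing_label("10.0.0.0/24", "P2"))
# True 200
# False None
```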

LSP control modes

Independent LSP control mode creates a local binding for a FEC independently of other LSRs. It does this as soon as it recognizes the FEC, meaning the prefix is in the routing table, even if it is not the egress LSR.

Ordered LSP control mode creates a local binding only if it is the egress LSR for the FEC or if it has received a label for the FEC from the next hop.
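The two control modes reduce to a simple decision rule. The following minimal Python sketch expresses it. The function and parameter names are hypothetical.

```python
def creates_local_binding(mode, in_routing_table, is_egress, has_next_hop_label):
    """Return True if the LSR allocates a local label for the FEC under `mode`."""
    if not in_routing_table:
        return False                      # the FEC is not recognized at all
    if mode == "independent":
        return True                       # bind as soon as the prefix is in the RIB
    if mode == "ordered":
        return is_egress or has_next_hop_label
    raise ValueError(f"unknown mode {mode!r}")


# Transit LSR that has not yet received a label from its next hop:
print(creates_local_binding("independent", True, False, False))  # True
print(creates_local_binding("ordered", True, False, False))      # False
```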

Reserved labels

0 – IPv4 explicit null – Instead of popping the label with PHP, the penultimate router swaps the top label to 0. This means the EXP bits are preserved all the way to the egress LSR.
1 – Router alert – Alerts the LSR that the packet needs a closer look. It can’t be forwarded in hardware; software processing is needed.
2 – IPv6 explicit null
3 – Implicit null – Used for PHP. The egress LSR advertises it (for directly connected prefixes and summaries), the penultimate router pops the label, and the egress LSR only needs to do an IP lookup. Label 3 is never actually carried in a packet.
14 – OAM alert
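The difference between implicit and explicit null at the penultimate router can be shown with a minimal Python sketch. Each stack entry is modeled as a (label, EXP) tuple, with the top of the stack as the last element. The function name and label values are hypothetical.

```python
IPV4_EXPLICIT_NULL = 0
IMPLICIT_NULL = 3


def penultimate_hop(stack, label_from_egress):
    """Label operation at the penultimate LSR.

    stack: list of (label, exp) tuples, top of stack = last element.
    label_from_egress: the label the egress LSR advertised for this FEC.
    """
    stack = list(stack)
    if label_from_egress == IMPLICIT_NULL:
        stack.pop()                           # PHP: label 3 itself is never sent
    else:
        _, exp = stack[-1]
        stack[-1] = (label_from_egress, exp)  # normal swap, or swap to explicit null 0
    return stack


print(penultimate_hop([(30, 5)], IMPLICIT_NULL))       # [] - EXP 5 no longer carried
print(penultimate_hop([(30, 5)], IPV4_EXPLICIT_NULL))  # [(0, 5)] - EXP 5 kept
```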

LDP

  • Hello packets are sent over UDP port 646 to the multicast address 224.0.0.2. The session itself is set up over TCP port 646.
  • Hellos are sent every five seconds with a default holdtime of 15 seconds. These timers are used for discovery.
  • Once the session is established, a keepalive is sent every 60 seconds and the holdtime is 180 seconds. Any LDP packet resets the holdtime.
  • LDP assigns a local label to every IGP prefix and stores it in the LIB. All of these bindings are advertised to every neighbor, even the neighbor that owns the prefix (there is no split horizon).
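The last point, local bindings for every IGP prefix advertised without split horizon, can be sketched in Python. The router, neighbor names and prefixes are hypothetical. Starting dynamic labels at 16 is an assumption (the first value after the reserved range). Connected prefixes get implicit null, matching the reserved-label list above.

```python
from itertools import count


def local_bindings(igp_prefixes, connected, first_label=16):
    """Allocate a local label for every IGP prefix; connected ones get implicit null.

    first_label=16 is an assumption: the first value after the reserved range 0-15.
    """
    labels = count(first_label)
    return {
        prefix: (3 if prefix in connected else next(labels))
        for prefix in sorted(igp_prefixes)
    }


def advertise(bindings, neighbors):
    """No split horizon: every neighbor receives every binding."""
    return {neighbor: dict(bindings) for neighbor in neighbors}


# Hypothetical R1 (loopback 1.1.1.1/32) with neighbors R2 (owner of 2.2.2.2/32) and R3.
r1 = local_bindings(
    {"1.1.1.1/32", "2.2.2.2/32", "3.3.3.3/32", "10.0.12.0/24"},
    connected={"1.1.1.1/32", "10.0.12.0/24"},
)
print(r1)  # {'1.1.1.1/32': 3, '10.0.12.0/24': 3, '2.2.2.2/32': 16, '3.3.3.3/32': 17}
print(advertise(r1, ["R2", "R3"])["R2"])  # R2 also receives 2.2.2.2/32, its own prefix
```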

MPLS VPN

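neighbor ip-address as-override – Configured on the PE towards the CE. It replaces the customer's AS number in the AS path with the service provider's AS. Customer sites that reuse the same AS then accept each other's routes, which the BGP loop check would otherwise drop.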

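allowas-in – Loosens the loop check by accepting updates that contain the router's own AS number in the AS path. It is typically configured on the CE as an alternative to as-override.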
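Both knobs act on the same AS path loop check. This minimal Python sketch shows where each one applies. The AS numbers are hypothetical, and the path handling is simplified to a flat list with no AS sets.

```python
PROVIDER_AS = 64512   # hypothetical provider AS
CUSTOMER_AS = 65001   # hypothetical AS reused by every customer site


def as_override(as_path, customer_as, provider_as):
    """PE side: replace the customer's AS in the AS path with the provider's AS."""
    return [provider_as if asn == customer_as else asn for asn in as_path]


def passes_loop_check(as_path, local_as, allowas_in=0):
    """CE side: accept only if the local AS appears at most allowas_in times."""
    return as_path.count(local_as) <= allowas_in


learned = [CUSTOMER_AS]            # route from site A, as received by the PE via MP-BGP
plain = [PROVIDER_AS] + learned    # PE prepends its own AS towards site B
overridden = [PROVIDER_AS] + as_override(learned, CUSTOMER_AS, PROVIDER_AS)

print(passes_loop_check(plain, CUSTOMER_AS))                # False: dropped by site B
print(passes_loop_check(plain, CUSTOMER_AS, allowas_in=1))  # True: allowas-in on the CE
print(passes_loop_check(overridden, CUSTOMER_AS))           # True: as-override on the PE
```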

SoO – Site of Origin, an extended community used to prevent routing loops in MPLS VPNs. Every site has a unique SoO, and a PE does not advertise a route back to a CE at the site whose SoO the route carries.
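Here is a minimal Python sketch of the SoO check a PE performs before advertising routes to a CE. It assumes each route carries at most one SoO value. The prefixes and SoO values are hypothetical.

```python
def routes_for_ce(vrf_routes, ce_soo):
    """Return the prefixes a PE may advertise to a CE whose site has SoO ce_soo.

    vrf_routes maps prefix -> SoO carried by the route (None if not set).
    """
    return [prefix for prefix, soo in vrf_routes.items() if soo != ce_soo]


vrf = {
    "10.1.0.0/16": "65001:1",  # originated at site 1 (hypothetical SoO)
    "10.2.0.0/16": "65001:2",  # originated at site 2
    "10.3.0.0/16": None,       # no SoO attached
}
print(routes_for_ce(vrf, "65001:1"))  # ['10.2.0.0/16', '10.3.0.0/16']
```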

The outer label, also called the IGP label, is used to reach the next hop (the egress PE) through the provider network. The inner label is the VPN label, which the egress PE uses to find the right VRF. The IGP label is distributed by LDP, based on the routing table. The VPN label and the VPNv4 prefixes are distributed by MP-BGP.
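The two-label forwarding path can be traced with a minimal Python sketch. The label values are hypothetical; the VRF name CUST1 is borrowed from the configuration at the top of this archive. It also illustrates the BGP-free core, since the P routers only consult their label table for the top label.

```python
def ingress_pe(vpn_label, igp_label):
    """Ingress PE: push the VPN label (bottom) and then the IGP label (top)."""
    return [vpn_label, igp_label]           # top of stack = last element


def p_router(stack, lfib):
    """P router: look only at the top (IGP) label; no VRF or customer routes needed."""
    out = lfib[stack[-1]]
    return stack[:-1] if out == "pop" else stack[:-1] + [out]


def egress_pe(stack, vpn_labels):
    """Egress PE: the remaining VPN label selects the VRF."""
    if len(stack) != 1:
        raise ValueError("expected only the VPN label after PHP")
    return vpn_labels[stack[0]]


stack = ingress_pe(vpn_label=30, igp_label=16)   # [30, 16]
stack = p_router(stack, {16: 17})                # P1 swaps 16 -> 17: [30, 17]
stack = p_router(stack, {17: "pop"})             # P2 is penultimate, PHP: [30]
print(egress_pe(stack, {30: "CUST1"}))           # CUST1
```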

Categories: CCIE, MPLS, Notes