Archive for the ‘Ethernet’ Category

Ethernet, STP, Topology change and the behaviour of Ethernet

June 24, 2014 2 comments


This post is inspired by a post at IEOC about Uplinkfast and TCN which
can be found here.

Before we get to those parts, let’s recap how Ethernet and STP work together.

Spanning Tree

The Spanning Tree Algorithm builds a loop free tree by comparing Bridge IDs (BID) and
least cost paths to the root bridge. It then blocks every redundant link that does not
lead to the root.


MAC Learning

Switches learn where to forward frames by looking at the source MAC address of the frame
on the port that the frame was received on. This learning is done in the data plane
as opposed to routing where the routes are learned in control plane. I will come back
to this later in the post.

MAC learn1

S4 learns that A is located on port 1 after A has sent a frame. This is stored in
the MAC address table located in Content Addressable Memory (CAM). The CAM is a
fast memory optimized for quick lookups in the table. By default there is a 300
second aging timeout for learned MAC addresses, meaning that if the switch
does not see any traffic from a source MAC within five minutes the entry will
age out of the table. This is used to remove stale entries and to keep the
MAC address table from becoming too large.
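
To make the learning and aging behaviour concrete, here is a toy model in Python. It is only a sketch: the class and method names are made up for illustration, and a real switch does this in hardware rather than with a dictionary.

```python
from dataclasses import dataclass, field

AGING_SECONDS = 300  # default aging timeout described above


@dataclass
class MacTable:
    """Toy MAC address table: maps source MAC -> (port, last_seen)."""
    aging: float = AGING_SECONDS
    entries: dict = field(default_factory=dict)

    def learn(self, src_mac, port, now):
        # Every received frame (re)learns the source and refreshes its timer.
        self.entries[src_mac] = (port, now)

    def lookup(self, dst_mac, now):
        """Return the egress port, or None if the frame must be flooded."""
        entry = self.entries.get(dst_mac)
        if entry is None:
            return None
        port, last_seen = entry
        if now - last_seen >= self.aging:
            # Stale entry: age it out, the frame will be flooded.
            del self.entries[dst_mac]
            return None
        return port


table = MacTable()
table.learn("aa:aa:aa:aa:aa:aa", port=1, now=0.0)
print(table.lookup("aa:aa:aa:aa:aa:aa", now=10.0))   # 1
print(table.lookup("aa:aa:aa:aa:aa:aa", now=301.0))  # None
```

A lookup that returns None corresponds to the unknown unicast case, where the switch has to flood the frame.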

Potential Issues

As I mentioned briefly earlier in the post, MAC learning is done in the data plane.
When we exchange routes through protocols such as OSPF, EIGRP and BGP, this is
done in the control plane. If there is a /24 route in the routing table pointing
at a router, then up to 254 hosts sit behind that router and are covered by a single
entry. With MAC learning every source MAC has its own entry, which would be the same
as if we had /32 routes for every host in the network. Not very efficient! This can
also become a scalability issue in large networks if there are more hosts than the
CAM can hold.

There are also other issues such as not being able to use all the links in the
network. Spanning tree will block the redundant links so we don’t get more bandwidth
if we add more links unless we put them into an Etherchannel or use technologies
such as vPC. In datacenter designs, using STP will lead to low bisection bandwidth,
meaning that even if there are lots of links between two sections of the network, most of
them will actually be blocked.

Another issue is that broadcast and unknown unicast traffic is flooded in the network.
Imagine a scenario as below where A is sending unicast traffic to B and it’s
a unidirectional flow. B rarely sends any traffic so its entry has been aged out
of the MAC address table.

Unknown unicast

In this scenario the unknown unicast will be flooded to all the switches and
all servers will have to receive the 300 Mbit/s stream and then discard the
traffic until the switches have learned the MAC of B again!

There is also a potential for black holing of traffic. In the topology below there
are four switches connected together and the primary path is through S4-S1-S2-S3.


Then the link between S1 and S2 fails.


When using 802.1D, there is no synchronization of the topology. It will take up to
50 seconds for the link between S3 and S4 to come up unless Backbonefast has been
deployed. When traffic is going from A to B, it will be blackholed. S4 still has an
entry for B towards S1. When the traffic reaches S1 it has nowhere to go.
Without a way to age out stale entries faster, this could last up to five minutes,
the default aging time. This is the purpose of topology change in STP: to age out
stale entries faster.

Topology Change

Like I described above, without a mechanism for topology change, traffic could
potentially be black holed for quite a while. In 802.1D, when a link goes up
or down, the switch will generate a TCN BPDU which is a special BPDU sent out
the root port. Normally switches only relay BPDUs from the root on their designated
ports but this is a special case. A switch that receives a TCN BPDU will reply
to it with a configuration BPDU with the TC Acknowledge bit set.


The TCN BPDU will eventually reach the root which will then send out a configuration
BPDU with the TC bit set. This is done for a duration of MaxAge + FwDelay
seconds which is 20 + 15 seconds by default.


When switches receive this BPDU from the root with the TC bit set, they will age out
entries in the CAM at a faster pace. The aging timeout will be set to 15 seconds.
This will age out any stale entries in the CAM. If there are active flows they will
not be aged out because the age will be reset as the switch sees frames coming in
with the source MAC in question. As I described earlier there could be unidirectional
flows leading to flooding. Also flows that are inactive for a while and then resume
can get flooded if their entries time out during the period that the root bridge is
sending out these configuration BPDUs with TC set.
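
The timers involved can be put into a few lines of Python. This is a sketch using the default values mentioned above; the function names are my own and the model ignores everything except the aging timeout.

```python
MAX_AGE = 20        # seconds, default
FORWARD_DELAY = 15  # seconds, default
NORMAL_AGING = 300  # seconds, default MAC aging timeout


def tc_duration(max_age=MAX_AGE, forward_delay=FORWARD_DELAY):
    """How long the root keeps sending configuration BPDUs with TC set."""
    return max_age + forward_delay


def aging_timeout(tc_active):
    """Aging timeout applied to dynamic MAC entries."""
    return FORWARD_DELAY if tc_active else NORMAL_AGING


def survives(idle_seconds, tc_active):
    """True if an entry that has been idle this long is still in the CAM."""
    return idle_seconds < aging_timeout(tc_active)


print(tc_duration())        # 35
print(survives(60, False))  # True
print(survives(60, True))   # False
```

A host that has been silent for a minute keeps its entry under normal aging but loses it while TC is active, which is exactly the case where flooding shows up.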


Uplinkfast is a feature deployed on access switches which have dual links to
the distribution layer. Because the switches are located at the edge of the network
it is safe to bring up an alternate port immediately without going through the regular
listening and learning phase, saving up to 30 seconds.

After a switch has failed over to the alternate link it will start to send out
dummy multicast frames. This is to speed up convergence. Even if a configuration
BPDU with TC set is sent by the root, it can still take up to 15 seconds before
stale entries age out.


So based on the thread at IEOC, what is the consequence of Uplinkfast and TC together?
The configuration BPDU with TC is sent for 35 seconds by default. Dummy multicast frames
will be sent out for a duration that is unknown. It depends on how many entries there are
in the CAM and the rate that the frames are sent at. So depending on when the multicast
frame is sent, and if you have a unidirectional flow or a host gone silent, then yes,
the configuration BPDU with TC could be counterproductive. Traffic would still reach its
destination, but only by being flooded.

In reality I doubt this would be much of an issue and most networks would be running
RSTP today. RSTP works differently by synchronizing the topology and when the TC bit
is set in BPDUs the entire CAM is flushed on all ports except the one where the BPDU
was received.

Cisco Flex link

November 14, 2013 9 comments


Flex link is a Cisco solution which replaces STP in certain network topologies. It
works by detecting link down on a primary interface and then bringing up the interface
that has been defined as its backup. It is most commonly implemented at the access
layer where the switch has dual uplinks to the distribution layer.

Flex link

How does it work?

Under the primary interface the backup interface is defined with the switchport backup
interface command. This command can be applied to L2 links or portchannels. The backup
interface is kept in down state until the primary fails. Under normal conditions traffic
will flow through the primary interface so all dynamic MAC entries are learned via the
primary interface.

As soon as the primary interface goes down the backup interface is brought online.
These things happen when the primary fails:

  • All dynamic MAC entries are moved to the backup interface
  • The backup link is moved into a forwarding state
  • Dummy multicast frames are transmitted to multicast destination 01:00:0c:cd:cd:cd
  • The sources of these frames are the source MACs learned by the switch on its local ports
This is quite similar to the STP Uplinkfast feature. However with Flex link no BPDUs are
transmitted and STP is disabled on the interfaces that are enabled for Flex link.
Bringing the backup interface up is very fast and should take less than a second. To send
out dummy multicast frames the MAC-address table move update feature needs to be enabled.


Preemption is disabled by default. Enabling preemption means that the primary interface
will be brought into forwarding when it comes back. There is a preemption delay that can
be set to prevent flapping. Enable preemption if you have a primary interface of
higher bandwidth than the backup one.

Load balancing

Flex link can support load balancing. This means that one interface is primary for a set
of VLANs and backup for other VLANs and vice versa. Enable this if you need to use both
uplinks to support the amount of traffic exiting the switch.

Advantages of Flex links

What are the advantages of Flex link?

  • Light weight, no BPDUs transmitted.
  • Fast to converge
  • The topology is deterministic and not subject to STP reconverging due to misconfig

Disadvantages of Flex link

Every solution and protocol in networking has its downsides. Making the right design
is always a matter of weighing the trade-offs.

  • Relies on link down to detect failure
  • Can’t detect unidirectional links
  • Can’t detect wonky SFP or hardware failure not leading to link down
  • Risk of loops in certain topologies

Flex link could be used together with UDLD to solve some of these issues.

Risk of loops

So how could a loop be formed with Flex link? The first scenario is that someone
accidentally connects two access switches together.

Flex link loop 1

Because Flex link has no concept of STP, a loop forms if the link between the access
switches is brought into forwarding. This could be stopped by implementing BPDU
guard on all non uplink ports.

There could also be a situation where a link is added between the access and distribution
layer and because the Flex link does not consume/send BPDUs a loop could form.

Flex link loop 2


Flex link is an STP replacement from Cisco that works by bringing up a backup interface
when the primary interface has gone link down. It is light weight and fast but relies
on links going physically down. It also has the risk of loops in certain topologies.
It’s a viable solution where STP is not wanted, for example when buying an L2 service
from a provider and you don’t want to mix your STP with the provider’s.

Categories: Ethernet

Resilient Ethernet Protocol (REP)

November 11, 2013 8 comments


I’m writing a short summary of REP as part of my CCDE studies. REP is an alternative protocol
used in place of STP and is most often run in ring based topologies. It is not limited to
these topologies however and it can also interact with STP if there is a desire to do so.
REP is Cisco proprietary, other vendors have similar protocols like EAPS from Extreme Networks.

Basic REP

REP uses the concept of segments. A segment ID is configured on all switches
belonging to the same segment. Two edge ports are selected where the REP
segment ends. These edge ports must not have connectivity with each other.

One port is blocking and this port is called the alternate port. All other
ports are transit ports.


Traffic flows towards the edge ports.

REP port roles

REP ports are either failed, open or alternate.

  • All regular segment ports start out as failed ports
  • After adjacencies have been determined, ports move to Alternate state. After negotiation of the Alternate port is done, the remaining ports move to open state while one port stays in Alternate state.
  • When a failure occurs on a link all ports move to failed state. When the Alternate port receives the notification it is moved to open state.

Failure Detection

REP does not work the same way that EAPS does. EAPS sends out a poll on one port
and expects to see it back on the other port facing the ring. It has a master node
that is responsible for this action.

REP works by detecting link failure (Loss of Signal). REP also forms adjacencies
with directly connected switches. Because the main method of converging is to detect LoS
that means that the network should be designed without converters or shared segments that
could affect the detection of a failure. REP Link Status Layer (LSL) is responsible for
detecting REP aware neighbors and establishing connectivity within a segment. After
connectivity has been setup, REP will choose which port is to be alternate and the other
ports will be forwarding. The alternate port can also manually be selected if desired.


As mentioned earlier, the main mechanism is to detect Loss of Signal. In the rare case
that the interface does not go down but connectivity is lost, REP must rely on timers.
By default the interface will stay up for five seconds after LSL hellos have
stopped arriving from a neighbor.

When a link fails a notification is sent to a multicast destination address. This notification
is flooded in hardware speeding up the convergence. When a switch receives the notification
it must flush its L2 MAC table.

Interaction with STP

REP can interact with STP by generating TCN BPDUs. This could be desirable if you run REP
in a metro network and then have STP running in the network above that. Generally though
it would be best to not have that large an L2 segment, so the REP segment should be
connected to a PE that runs MPLS/IP to the core.

End Port Advertisements

Starting from the edge ports, End Port Advertisements (EPA) are sent out every four seconds.
These messages are used to discover the REP topology. The messages are relayed by all
intermediate ports, which means that all the switches in the same segment know what the
topology looks like and the state of all the ports in the segment. This can also be used
to see what the topology looked like before a failure because REP has an archive feature.

Other features of REP

REP supports preemption, meaning that when a failed link comes back the network can go
back to what it looked like before the failure. Manual preemption can also be used but
it will cause a temporary loss of traffic.

REP also supports VLAN load balancing meaning that the topology can look different
depending on the VLAN. However REP is not per VLAN in the sense that the hellos are
always sent on one VLAN compared to PVST+/RPVST+ which sends BPDUs per VLAN.
REP uses a concept of administrative VLAN which can be configured, the default is
to use VLAN 1.


Like any control plane protocol running in our networks, REP can be open to
attacks. What would happen if someone faked REP PDUs to make the network
converge in an unexpected manner, or kept sending these PDUs to flap ports at a
very high rate?

Obviously this could be a dangerous scenario. Cisco thought of this and implemented a key
mechanism that starts from the Alternate port. The key consists of a port ID and a randomly
generated number created when the port activates. This key is distributed through the
segment to the other devices which can then use this key to unblock the alternate port.


REP is a Cisco proprietary protocol mainly used in metro based ring networks. It is likely
to converge faster than STP and can achieve best case convergence of around 50 ms. REP
can interact with STP by sending TCN BPDUs. REP is a similar technology to EAPS with some
differences. REP is supported on Cisco ME switches.

In the future I think protocols like REP and EAPS will start to fade away as metro based
networks go all MPLS/IP.

Categories: Convergence, Ethernet

Detecting Network Failure

September 26, 2013 7 comments


In today’s networks, reliability is critical. Reliability needs to be high and
convergence needs to be fast. There are several ways of detecting network failure
but not all of them scale. This post takes a look at different methods of
detection and discusses when one or the other should be used.

Routing Convergence Components

There are mainly four components of routing convergence:

  1. Failure detection
  2. Failure propagation (flooding)
  3. Topology/Routing recalculation
  4. Update of the routing and forwarding table (RIB and FIB)

With modern networking equipment and CPUs it’s actually the first
one that takes the most time and not the flooding or recalculation of the topology.

Failure can be detected at different levels of the OSI model. It can be layer 1, 2
or 3. When designing the network it’s important to look at complexity and cost
vs the convergence gain. A more complex solution could increase the Mean Time
Between Failure (MTBF) but also increase the Mean Time To Repair (MTTR) leading
to a lower reliability in the end.
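
The MTBF versus MTTR trade-off is easy to see with numbers. The sketch below uses the common steady-state availability formula MTBF / (MTBF + MTTR), which is not from the session this post is based on, and the hour values are invented purely for illustration.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical numbers: the complex design fails less often but takes
# longer to troubleshoot and repair.
simple_design = availability(mtbf_hours=10_000, mttr_hours=2)
complex_design = availability(mtbf_hours=12_000, mttr_hours=8)

print(f"simple:  {simple_design:.5f}")
print(f"complex: {complex_design:.5f}")
print(simple_design > complex_design)  # True
```

With these made-up values the more complex design ends up with the lower availability even though its MTBF is higher.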


Layer 1 Failure Detection – Ethernet

Ethernet has builtin detection of link failure. This works by sending
pulses across the link to test the integrity of it. This is dependent on
auto negotiation so don’t hard code links unless you must! In the case of
running a P2P link over a CWDM/DWDM network make sure that link failure
detection is still operational or use higher layer methods for detecting
failures.
Carrier Delay

  • Runs in software
  • Filters link up and down events, notifies protocols
  • Most IOS versions default to 2 seconds to suppress flapping
  • Not recommended to set it to 0 on SVI
  • Router feature

Debounce Timer

  • Delays link down event only
  • Runs in firmware
  • 100 ms default in NX-OS
  • 300 ms default on copper in IOS and 10 ms for fiber
  • Recommended to keep it at default
  • Switch feature

IP Event Dampening

If modifying the carrier delay and/or debounce timer look at implementing IP
event dampening. Otherwise there is a risk of having the interface flap a lot
if the timers are too fast.

Layer 2 Failure Detection

Some layer 2 protocols have their own keepalives like Frame Relay and PPP. This
post only looks at Ethernet.

UDLD


  • Detects one-way connections due to hardware failure
  • Detects one-way connections due to soft failure
  • Detects miswiring
  • Runs on any single Ethernet link even inside a bundle
  • Typically centralized implementation

UDLD is not a fast protocol. Detecting a failure can take more than 20 seconds so
it shouldn’t be used for fast convergence. There is a fast version of UDLD but this
still runs centralized so it does not scale well and should only be used on a select
few ports. It supports sub second convergence.

Spanning Tree Bridge Assurance

  • Turns STP into a bidirectional protocol
  • Ensures spanning tree fails “closed” rather than “open”
  • If port type is “network” send BPDU regardless of state
  • If network port stops receiving BPDU it’s put in BA-inconsistent state


Bridge Assurance (BA) can help protect against bridging loops where a port becomes
designated because it has stopped receiving BPDUs. This is similar to the function
of loop guard.

LACP


It’s not common knowledge that LACP has builtin mechanisms to detect failures.
This is why you should never hardcode Etherchannels between switches, always
use LACP. LACP is used to:

  • Ensure configuration consistency across bundle members on both ends
  • Ensure wiring consistency (bundle members between 2 chassis)
  • Detect unidirectional links
  • Bundle member keepalive

LACP peers will negotiate the requested send rate through the use of PDUs.
If keepalives are not received a port will be suspended from the bundle.
LACP is not a fast protocol, default timers are usually 30 seconds for keepalive
and 90 seconds for dead. The timer can be tuned but it doesn’t scale well if you
have many links because it’s a control plane protocol. IOS XR has support for
sub second timers for LACP.

Layer 3 Failure Detection

There are plenty of protocol timers available at layer 3. OSPF, EIGRP, ISIS,
HSRP and so on. Tuning these from their default values is common and many of
these protocols support sub second timers but because they must run to the
RP/CPU they don’t scale well if you have many interfaces enabled. Tuning these
timers can work well in small and controlled environments though. These are
some reasons to not tune layer 3 timers too low:

  • Each interface may have several protocols like PIM, HSRP, OSPF running
  • Increased supervisor CPU utilization leading to false positives
  • More complex configuration and bandwidth wasted
  • Might not support ISSU/SSO

BFD


Bidirectional Forwarding Detection (BFD) is a lightweight protocol designed to
detect liveness over links/bundles. BFD is:

  • Designed for sub second failure detection
  • Any interested client (OSPF, HSRP, BGP) registers with BFD and is notified when BFD detects loss
  • All registered clients benefit from uniform failure detection
  • Uses UDP port 3784/3785 (echo)

Because any interested protocol can register with BFD there are fewer packets
going across the link which means less wasted bandwidth, and the packets
are also smaller in size which reduces it even more.

Many platforms also support offloading BFD to line cards which means that the
CPU does not get increased load when BFD is enabled. It also supports ISSU/SSO.

BFD negotiates the transmit and receive interval. If we have a router R1
that wants to transmit at 50 ms interval but R2 can only receive at 100 ms
then R1 has to transmit at 100ms interval.
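
The negotiation boils down to taking the slower of the two values. A minimal sketch in Python, where the function and parameter names are my own:

```python
def negotiated_tx_interval(desired_min_tx_ms, remote_required_min_rx_ms):
    """Interval a router ends up transmitting at, in milliseconds.

    A router may not send faster than its neighbor is willing to receive,
    so the larger (slower) of the two values wins.
    """
    return max(desired_min_tx_ms, remote_required_min_rx_ms)


# R1 wants to send every 50 ms, R2 can only receive every 100 ms.
print(negotiated_tx_interval(50, 100))  # 100
```

The result is per direction, so R2 would run the same calculation with its own desired transmit interval and R1's receive interval.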

BFD can run in asynchronous mode or echo mode. In asynchronous mode the BFD
packets go to the control plane to detect liveness. This can also be combined
with echo mode which sends a packet with a source and destination IP of the
sending router itself. This way the packet is looped back at the other end
testing the data plane. When echo mode is enabled the control plane packets
are sent at a slower pace.

Link bundles

There can be challenges running BFD over link bundles. Due to CEF polarization
control plane/data plane packets might all be sent over the same link. This
means that not all links in the bundle can be properly tested. There is
a per link BFD mode but it seems to have limited support so far.

Event Driven vs Polled

Generally event driven mechanisms are both faster and scale better than polling
based mechanisms of detecting failure. Rely on event driven if you have the option
and only use polled mechanisms when necessary.


Detecting a network failure is a very important part of network convergence. It
is generally the step that takes the most time. Which protocols to use depends
on network design and the platforms used. Don’t enable all protocols on a link
without knowing what they actually do. Don’t tune timers too low unless you
know why you are tuning them. Use BFD if you can as it is faster and uses
fewer resources. For more information refer to BRKRST-2333.

The history of Ethernet – DIX vs 802.3

June 6, 2012 14 comments

I’m planning to do a post on BPDUs sent by Cisco switches and analyze why they are sent. To fully understand the coming post first we need to understand the different versions of Ethernet. There is more than one version? Yes, there is although mainly one is used for all communication.

Most people will know that Robert Metcalfe was one of the inventors of Ethernet. Robert was working for Xerox back then. Digital, Intel and Xerox worked together on standardizing Ethernet. This is why it is often referred to as a DIX frame. The DIX version 1 standard was published in 1980 and the version used today is version 2. This is why we refer to Ethernet II or Ethernet version 2. The DIX version is the frame type that is most often used.

IEEE was also working on standardizing Ethernet. They began working on it in February 1980 and that is why the standard is called 802 where 802.3 is the Ethernet standard. We refer to it as Ethernet even though when IEEE released their standard it was called “IEEE 802.3 Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications”. So here we see the term CSMA/CD for the first time.

I’m not here to give you a history lesson but instead explain the frame types and briefly discuss the fields in them. We start with the DIX frame or Ethernet II frame. This is the frame that is most commonly used today. It looks like this.

The preamble is a pattern of alternating ones and zeroes that ends with two ones. When this pattern is received, the receiver knows that anything that comes after it is the actual frame.

The source and destination MAC addresses are used to switch the frame.

The EtherType field specifies the upper level protocol. Some of the most well known ones are:

0x0800 – IP
0x8100 – 802.1Q tagged frame
0x0806 – ARP
0x86DD – IPv6

After that follows the actual payload which should be between 46 and 1500 bytes in size.

At the end there is a Frame Check Sequence (FCS) which is used to check the validity of the frame. If the CRC check fails the frame is dropped.

In total the frame will be a maximum of 1514 bytes, or 1518 if counting the FCS.
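
The numbers add up as follows. This is just the arithmetic in Python, with field sizes for an untagged Ethernet II frame; the function name is made up for the example.

```python
DST_MAC = 6    # bytes
SRC_MAC = 6
ETHERTYPE = 2
FCS = 4
MIN_PAYLOAD, MAX_PAYLOAD = 46, 1500


def frame_size(payload_len, include_fcs=True):
    """Size in bytes of an untagged Ethernet II frame, preamble excluded."""
    if not MIN_PAYLOAD <= payload_len <= MAX_PAYLOAD:
        raise ValueError("payload must be between 46 and 1500 bytes")
    size = DST_MAC + SRC_MAC + ETHERTYPE + payload_len
    return size + FCS if include_fcs else size


print(frame_size(1500, include_fcs=False))  # 1514
print(frame_size(1500))                     # 1518
print(frame_size(46))                       # 64
```

The last line shows where the well known 64 byte minimum frame size comes from.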

When it comes to 802.3 Ethernet there are actually two frame formats. One is 802.3 with 802.2 LLC SAP header. It looks like this.

This was the original version from the IEEE. Many of the fields are the same. Let’s look at those that are not.

The preamble is now divided in preamble and Start Frame Delimiter (SFD) but the function is the same.

The length field is used to indicate how many bytes of data follow this field before the FCS. It can also be used to distinguish between a DIX frame and an 802.3 frame, as for DIX the values in this field will be higher, e.g. 0x0806 for ARP. If this value is 1536 (0x0600) or greater then it is a DIX frame and the value is an EtherType value.
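
Here is a small Python sketch of how a receiver could tell the two frame types apart. It assumes the frame starts at the destination MAC, with no preamble and no 802.1Q tag, and treats values of 0x0600 (1536) and above as EtherTypes and values up to 1500 as lengths. The function name is hypothetical.

```python
import struct


def classify_frame(frame):
    """Return ("ethernet_ii", ethertype) or ("802.3", length)."""
    if len(frame) < 14:
        raise ValueError("frame shorter than the 14-byte header")
    # Bytes 12-13 follow the two 6-byte MAC addresses, network byte order.
    (type_or_length,) = struct.unpack("!H", frame[12:14])
    if type_or_length >= 0x0600:
        return ("ethernet_ii", type_or_length)
    if type_or_length <= 1500:
        return ("802.3", type_or_length)
    return ("undefined", type_or_length)


arp_header = bytes(12) + bytes.fromhex("0806")
llc_header = bytes(12) + bytes.fromhex("0026")
print(classify_frame(arp_header))  # ('ethernet_ii', 2054)
print(classify_frame(llc_header))  # ('802.3', 38)
```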

Then we have some interesting values called DSAP, SSAP and Control. SAP stands for Service Access Point, the S and D in SSAP and DSAP stand for source and destination.

They have a similar function as the Ethertype. The SAP is used to distinguish between different data exchanges on the same station. The SSAP indicates from which service the LLC data unit was sent and the DSAP indicates the service to which the LLC data unit is being sent. IP has a SAP of 0x06 and 802.1D (STP) has a SAP of 0x42. It would be very strange to have a different SSAP and DSAP so these values should be the same. IP to IP would be SSAP of 0x06 and DSAP of 0x06. One bit (LSB) in the DSAP is used to indicate if it is a group address or an individual address. If it is set to zero it refers to an individual address going to a Local SAP (LSAP). One bit in the SSAP (LSB) indicates if it is a command or response packet. That leaves us with 128 possible different SAPs for SSAP and DSAP.

The control field is used to select if communication should be connection-less or connection-oriented. Usually error recovery and flow control are performed by higher level services such as TCP.

The IEEE had problems addressing all the layer 3 processes due to the short DSAP and SSAP fields in the header. This is why they introduced a new frame format called Subnetwork Access Protocol (SNAP). Basically this header is using the type field found in the DIX header. If the SSAP and DSAP are set to 0xAA and the Control field is set to 0x03 then SNAP encapsulation will follow. SNAP has a five byte extension to the standard 802.2 LLC header and it consists of a three byte OUI and a two byte Type field.

From a vendor perspective this is good because then they can have an OUI and then create their own types to use. If we look at PVST+ BPDUs from a Cisco device we will see that they are SNAP encapsulated where the organization code is Cisco (0x00000c) and the PID is PVST+ (0x010b). CDP is also using SNAP and it has a PID of CDP (0x2000). I will talk more about BPDUs and STP in a following post but first I wanted to provide the background on the Ethernet frame types used.
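
To show how the LLC and SNAP fields line up, here is a small parsing sketch in Python. It takes the bytes that follow the 802.3 length field; the function name and the returned dictionary are my own choices, and the sample bytes are built from the Cisco OUI and PVST+ PID mentioned above.

```python
def parse_llc_snap(payload):
    """Parse the LLC header and, if present, the SNAP extension."""
    if len(payload) < 3:
        raise ValueError("need at least the 3-byte LLC header")
    dsap, ssap, control = payload[0], payload[1], payload[2]
    result = {"dsap": dsap, "ssap": ssap, "control": control}
    # DSAP = SSAP = 0xAA with Control = 0x03 announces a SNAP header.
    if (dsap, ssap, control) == (0xAA, 0xAA, 0x03):
        if len(payload) < 8:
            raise ValueError("SNAP header truncated")
        result["oui"] = int.from_bytes(payload[3:6], "big")  # 3-byte OUI
        result["pid"] = int.from_bytes(payload[6:8], "big")  # 2-byte type
    return result


pvst = parse_llc_snap(bytes.fromhex("aaaa03" "00000c" "010b"))
print(hex(pvst["oui"]), hex(pvst["pid"]))  # 0xc 0x10b

stp = parse_llc_snap(bytes.fromhex("424203"))
print("oui" in stp)  # False
```

The second call is a plain 802.2 LLC header with the STP SAP, so no SNAP fields are returned.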

In summary there are three different Ethernet frame types used. DIX frame, also called Ethernet II, IEEE 802.3 with LLC and IEEE 802.3 with SNAP encapsulation. There are others out there as well but these are the three major ones and the DIX one is by far the most common one.

Categories: CCIE, Ethernet

Ethernet – notes

December 16, 2010 Leave a comment

RJ 45 pinouts

10BASE-T and 100BASE-TX use pairs two and three, gigabit Ethernet uses all four pairs.
Pinout for straight cable: 1-1;2-2;3-3;6-6
Pinout for crossover cable: 1-3;2-6;3-1;6-2

A standard PC transmits on pins one and two and receives on pins three and six. A switchport is
the opposite. If two alike devices are connected a crossover cable should be used, although
auto MDI-X is standard today.
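
The pinouts can be checked with a few lines of Python. This is a toy model for 10/100 Mbit/s links only, and the names are made up for illustration.

```python
# Cable pinouts from above: pin on one end -> pin on the other end.
STRAIGHT = {1: 1, 2: 2, 3: 3, 6: 6}
CROSSOVER = {1: 3, 2: 6, 3: 1, 6: 2}

# Which pins each device type transmits and receives on.
TX_PINS = {"pc": (1, 2), "switch": (3, 6)}
RX_PINS = {"pc": (3, 6), "switch": (1, 2)}


def link_works(dev_a, dev_b, cable):
    """True if every transmit pin of dev_a lands on a receive pin of dev_b."""
    return all(cable[pin] in RX_PINS[dev_b] for pin in TX_PINS[dev_a])


print(link_works("pc", "switch", STRAIGHT))       # True
print(link_works("switch", "switch", STRAIGHT))   # False
print(link_works("switch", "switch", CROSSOVER))  # True
```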

Cisco switches can detect the speed of a link through Fast Link Pulses (FLP) even if autonegotiation is disabled, but the duplex cannot be detected, which means that half duplex must be assumed. This is true for 10BASE-T and 100BASE-TX. Gigabit Ethernet uses all four pairs in the cable and can only use full duplex mode of operation. Also note that for gigabit Ethernet autonegotiation is mandatory although it is possible to hardcode speed and duplex.

Ethernet uses Carrier Sense Multiple Access/Collision Detection (CSMA/CD). Before a client can send a frame it listens to the wire to see that it is not busy. It sends the frame and listens to ensure a collision has not occurred. If a collision occurs all stations that sent a frame send a jamming signal to ensure that all stations recognized the collision. The senders of the original collided frames wait for a random amount of time before sending again.

Deferred frames

Frames that were meant to be sent but were held back because frames were being received at that moment. In half duplex, sending and receiving cannot occur at the same time.


Collisions that are detected while the first 64 bytes are being transmitted are called collisions and collisions detected after the first 64 bytes are called late collisions.


Provides synchronization and signal transitions to allow proper clocking of the transmitted signal. Consists of 62 alternating ones and zeroes and then ends with a pair of ones.
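
As a quick illustration, the bit pattern can be generated in Python. The string shows the bits in the order they go on the wire; the function name is made up.

```python
def preamble_bits():
    """64-bit preamble: 62 alternating bits followed by two ones."""
    return "10" * 31 + "11"


bits = preamble_bits()
print(len(bits))   # 64
print(bits[:8])    # 10101010
print(bits[-8:])   # 10101011
```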

I/G bit and U/L bit

The I/G bit is the least significant bit of the first byte of the MAC address, which makes it the first bit sent on the wire. If set to zero it is an Individual (I) address and if set to one it is a Group (G) address. IPv4 multicast at layer two is always sent to addresses starting with 01.00.5E, which means that the G bit is set. The bit next to the I/G bit is the U/L bit, which indicates if it is a Universally (U) administered address or a Locally (L) assigned address. If it is a MAC address set by a manufacturer this should be set to zero.
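
Checking the two bits in code makes it clearer. In the usual hex notation the I/G bit is 0x01 and the U/L bit is 0x02 of the first octet. The sketch below assumes a colon-separated MAC address string and the function name is my own.

```python
def mac_flags(mac):
    """Return (is_group, is_locally_administered) for a colon-separated MAC."""
    first_octet = int(mac.split(":")[0], 16)
    is_group = bool(first_octet & 0x01)  # I/G bit
    is_local = bool(first_octet & 0x02)  # U/L bit
    return is_group, is_local


print(mac_flags("01:00:5e:00:00:01"))  # (True, False)
print(mac_flags("00:00:0c:12:34:56"))  # (False, False)
print(mac_flags("02:00:00:00:00:01"))  # (False, True)
```

The first address is an IPv4 multicast MAC, the second a burned in address with the Cisco OUI and the third a locally administered unicast address.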


SPAN and RSPAN are used to mirror traffic. The source of traffic can be a VLAN or a switchport or a routed port. Traffic can be mirrored from both rx and tx or just one of them. SPAN sends the traffic to a local destination port, RSPAN sends the traffic to a RSPAN VLAN which is used to transfer the traffic to its destination. Note that some layer two frames are not sent by default, including CDP, VTP, DTP, BPDU and PAgP. To include these use the command encapsulation replicate. SPAN is configured with the monitor session command.

Categories: CCIE, Ethernet, Notes

The facts of Ethernet – Round three

August 9, 2010 Leave a comment

The previous post talked about autonegotiation. This time I will talk about cables and pinouts and how auto MDIX works. Although I’m not very old I still like to do it the old school way. I don’t rely on auto MDIX, instead I use the right cable. Let’s look at a pinout for T568B:

A regular end device like a PC transmits on pins one and two and receives on pins three and six. Although we have four pairs only two are actually used, unless we are using gigabit Ethernet but that is another topic. A device like a switch does the opposite, it receives on pins one and two and sends on three and six. This is why we use a straight through cable. When connecting similar devices like a switch to a switch we need to use a cross over cable since they want to send on the same pins and receive on the same pins. So when choosing a cable remember that similar devices require a cross over cable and different devices need a straight through.

An engineer at HP developed the auto MDIX standard since he was tired of looking for cross over cables. But how does it work?

The NIC expects to receive Fast Link Pulses (FLP) on pins three and six. If it receives FLPs it will know that the configuration is correct. If it doesn’t receive FLPs it will switch over to MDI-X mode. This is a very simplified view of it, the process involves different timers and an XOR algorithm. If you want to know more check out the IEEE 802.3 specification section 3, clause 40.4.4.

Categories: Ethernet

The facts of Ethernet – Round two

August 7, 2010 1 comment

Autonegotiation – Either you love it or you hate it but pretty much everyone has an opinion on it. I was going to write something more lengthy at first but decided a blog was the wrong place.

Autonegotiation works by sending electrical pulses. In 10Base-T these are called Normal Link Pulses (NLP). They are sent every 16 ms with a tolerance of 8 ms. They are only sent when the Network Interface Card (NIC) is not receiving or sending traffic. They look like this:

In the fast Ethernet standard (802.3u) these are called Fast Link Pulses (FLP) and they look like this:

These electrical pulses let us determine the speed and duplex mode that is available in autonegotiation. The priority for choosing a speed and duplex mode goes like this:

  • 1000Base-T – Full duplex
  • 1000Base-T – Half duplex
  • 100Base-T2 – Full duplex
  • 100Base-TX – Full duplex
  • 100Base-T2 – Half duplex
  • 100Base-T4
  • 100Base-TX – Half duplex
  • 10BaseT – Full duplex
  • 10BaseT – Half duplex
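
The priority list can be turned into a small resolution function. This is a sketch in Python of the idea only: both sides advertise what they support and the highest common entry in the list wins. The mode strings and the function name are made up for the example.

```python
# Highest priority first, same order as the list above.
PRIORITY = [
    "1000Base-T full", "1000Base-T half",
    "100Base-T2 full", "100Base-TX full",
    "100Base-T2 half", "100Base-T4", "100Base-TX half",
    "10Base-T full", "10Base-T half",
]


def resolve(local, remote):
    """Return the highest priority mode advertised by both sides, or None."""
    common = set(local) & set(remote)
    for mode in PRIORITY:
        if mode in common:
            return mode
    return None


nic = ["100Base-TX full", "100Base-TX half", "10Base-T full", "10Base-T half"]
switch = ["1000Base-T full"] + nic
print(resolve(nic, switch))  # 100Base-TX full
```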

If one side is set to auto and the other side is hardcoded, parallel detection kicks in. Parallel detection can determine the speed by looking at the format of the electrical pulses it is receiving from its link partner. Duplex can’t be detected so that will default to half duplex. This is why we sometimes see links with 100/half duplex. If one side is auto and the other 100/full the auto side will be set to 100/half.

Half duplex is of course very bad, it leads to frame errors, dropped packets and late collisions.

The facts of Ethernet round one

August 5, 2010 Leave a comment

Ethernet is the most used layer 2 protocol today and its dominance is not likely to end anytime soon. I decided to make a section with some quick facts about Ethernet. There is a lot to know about Ethernet but we usually neglect this because we are very focused on IP. Take a look at an Ethernet frame:

The preamble field is not known to many people. It won’t show up in a packet capture since the network card will already have stripped it before it’s available for capture. So what is the purpose of preamble? The preamble field contains a synchronization pattern that consists of alternating ones and zeros and ends with two consecutive ones. It is used to synchronize node communication but also to indicate where the frame starts. Because it is not processed in the same way as the rest of the frame we do not have to count the eight bytes of preamble when calculating Ethernet frame size. This is what preamble looks like:


Categories: Ethernet