Solaris Link Aggregation
From Genunix
Link aggregation is the process of combining multiple physical Ethernet links into a single logical link. It is formally specified as IEEE 802.3ad Link Aggregation, the name that replaced the earlier term "trunking" in 2000 when 802.3ad was accepted.
Aggregations are also known as "Trunks", "Port Trunks", "Teaming", "Port Teaming", and "Bonding". Cisco's variant of 802.3ad is branded "EtherChannel". It's important to note that in some cases "Teaming" and "Bonding" also refer to link multipathing, which Solaris handles by means of IP Multipathing (IPMP) and which is separate from link aggregation.
It is extremely important to understand: Link Aggregation does *not* work by passing packets across all the links in an aggregate group in a round robin fashion. When a packet is sent, a simple calculation is made by XOR'ing the source and destination addresses (which can be L2, L3, or L4) and taking the result modulo the number of links in the aggregate. The result is that any given source-destination pair is "pinned" to one of the links in the aggregate. Hence a single TCP connection can never exceed the throughput of a single link. So while you might combine four 1Gbps links into a single aggregate, you'll never get more than 1Gbps in any single data transfer. To test an aggregate you should run multiple tests in parallel.
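The pinning behaviour can be illustrated with a minimal sketch. It assumes an L3-style hash over IPv4 addresses; the function name `pick_link` and the peer addresses are hypothetical, and the real Solaris hash is not necessarily this exact calculation.

```python
import ipaddress

def pick_link(src_ip: str, dst_ip: str, n_links: int) -> int:
    """Return the index of the link a flow is pinned to (illustrative hash only)."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    # XOR the two addresses, then reduce to a link index.
    return (src ^ dst) % n_links

# One source-destination pair always lands on the same link,
# no matter how many packets are sent.
flow = ("172.16.165.6", "172.16.165.7")
links_for_flow = {pick_link(*flow, 4) for _ in range(1000)}

# Several different peers, on the other hand, spread across the links.
peers = [f"172.16.165.{host}" for host in range(10, 18)]
links_for_peers = {pick_link("172.16.165.6", peer, 4) for peer in peers}

print(links_for_flow)    # a single link index
print(links_for_peers)   # several link indexes
```

This is why a benchmark with one stream tops out at the speed of one member link, while several streams to different peers can use the whole aggregate.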
Switch Config: It's worth noting that the balancing algorithm you choose on the host should match the one on the switch. If it doesn't, you'll end up with asymmetric data flows and in some cases random behaviour. L4 balancing is usually only available on fairly "high end" switches. Most L3 switches default to using IP source and destination addresses. Pure L2 switches only support XOR'ing on the MAC addresses, if they support link aggregation at all.
The Emeryville 172 storage network is based on aggregating at least 2 ports on each system, for a combined throughput of at least 2Gbps.
[private:/tmp] root# /sbin/ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
aggr1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 172.16.165.6 netmask ffff0000 broadcast 172.16.255.255
ether 0:14:4f:20:dc:1
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
inet 10.71.165.6 netmask ffffff00 broadcast 10.71.165.255
ether 0:14:4f:20:dc:0
[private:/tmp] root# dladm show-aggr
key: 1 (0x0001) policy: L4 address: 0:14:4f:20:dc:1 (auto)
device address speed duplex link state
e1000g1 0:14:4f:20:dc:1 1000 Mbps full up attached
e1000g2 0:14:4f:20:dc:2 1000 Mbps full up attached
e1000g3 0:14:4f:20:dc:3 1000 Mbps full up attached
Modifying Aggregates
Interfaces can be added to or removed from an aggregate dynamically.
[atlantis:/] root# dladm show-aggr
key: 1 (0x0001) policy: L4 address: 0:14:4f:3f:b7:42 (auto)
device address speed duplex link state
e1000g2 0:14:4f:3f:b7:42 1000 Mbps full up attached
e1000g3 0:14:4f:3f:b7:43 1000 Mbps full up attached
[atlantis:/] root# dladm remove-aggr -d e1000g3 1
[atlantis:/] root# dladm show-aggr
key: 1 (0x0001) policy: L4 address: 0:14:4f:3f:b7:42 (auto)
device address speed duplex link state
e1000g2 0:14:4f:3f:b7:42 1000 Mbps full up attached
[atlantis:/] root# dladm add-aggr -d e1000g3 1
[atlantis:/] root# dladm show-aggr
key: 1 (0x0001) policy: L4 address: 0:14:4f:3f:b7:42 (auto)
device address speed duplex link state
e1000g2 0:14:4f:3f:b7:42 1000 Mbps full up attached
e1000g3 0:14:4f:3f:b7:43 0 Mbps half down standby
Statistics
Statistics per device, link, or aggregation can be found with the appropriate subcommand and the -s argument.
[atlantis:/] root# dladm show-aggr -s
key: 1 ipackets rbytes opackets obytes %ipkts %opkts
Total 2357125 273500473 2524810 3729981936
e1000g2 880647 111667510 2514674 3728911734 37.4 99.6
e1000g3 1476478 161832963 10136 1070202 62.6 0.4
[atlantis:/] root# dladm show-link -s
ipackets rbytes ierrors opackets obytes oerrors
e1000g0 16035714 4689453445 0 4283198 4691755344 0
e1000g1 10175974 973179872 0 15377977 9061905989 0
e1000g2 0 0 0 0 0 0
e1000g3 0 0 0 0 0 0
aggr1 2357147 273504377 0 2524831 3729984088 0
[atlantis:/] root# dladm show-dev -s
ipackets rbytes ierrors opackets obytes oerrors
e1000g0 16036762 4689521110 0 4283231 4691758802 0
e1000g1 10176143 973196714 0 15378199 9061924565 0
e1000g2 880710 111675568 0 2514705 3728914930 0
e1000g3 1476508 161836413 0 10146 1071246 0
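The %ipkts and %opkts columns of dladm show-aggr -s can be recomputed from the raw counters, which is handy when checking how evenly traffic is spread. The following is a minimal sketch that parses the sample output shown above; the function name `port_shares` is hypothetical, and the parser assumes the column layout of that sample.

```python
SAMPLE = """\
key: 1 ipackets rbytes opackets obytes %ipkts %opkts
Total 2357125 273500473 2524810 3729981936
e1000g2 880647 111667510 2514674 3728911734 37.4 99.6
e1000g3 1476478 161832963 10136 1070202 62.6 0.4
"""

def port_shares(text: str) -> dict:
    """Map each port to its (inbound %, outbound %) share of packets."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and the Total row; keep per-device rows only.
        if len(fields) < 5 or fields[0] in ("key:", "Total"):
            continue
        counters[fields[0]] = (int(fields[1]), int(fields[3]))

    total_in = sum(i for i, _ in counters.values()) or 1   # avoid dividing by zero
    total_out = sum(o for _, o in counters.values()) or 1
    return {
        name: (round(100 * i / total_in, 1), round(100 * o / total_out, 1))
        for name, (i, o) in counters.items()
    }

for name, (in_pct, out_pct) in port_shares(SAMPLE).items():
    print(f"{name}: {in_pct}% in, {out_pct}% out")
```

In this sample nearly all outbound packets leave through e1000g2, which is what the pinning of source-destination pairs produces when most traffic goes to a single peer.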
Additionally, tools like nicstat can be used:
[private:/tmp] root# ./nicstat 10
Time Int rKB/s wKB/s rPk/s wPk/s rAvs wAvs %Util Sat
00:06:01 aggr0/1 1314.9 1570.7 6524.8 6022.9 206.4 267.1 2.36 0.00
00:06:01 e1000g1/0 665.4 244.8 3432.7 673.9 198.5 372.0 0.75 0.00
00:06:01 aggr1 1314.9 1570.7 6524.8 6022.9 206.4 267.1 2.36 0.00
00:06:01 e1000g3/0 453.4 100.5 2396.0 259.3 193.8 396.8 0.45 0.00
00:06:01 e1000g0 0.06 0.28 0.91 0.85 66.59 332.0 0.00 0.00
00:06:01 e1000g2/0 196.1 1225.4 696.1 5089.7 288.4 246.5 1.16 0.00
00:06:01 e1000g0/0 0.06 0.28 0.91 0.85 66.59 332.0 0.00 0.00
Time Int rKB/s wKB/s rPk/s wPk/s rAvs wAvs %Util Sat
00:06:11 aggr0/1 1255.7 2369.5 5859.5 5996.3 219.4 404.6 2.97 0.00
00:06:11 e1000g1/0 778.3 1155.6 3210.3 1657.3 248.3 714.0 1.58 0.00
00:06:11 aggr1 1255.7 2369.6 5859.7 5996.7 219.4 404.6 2.97 0.00
00:06:11 e1000g3/0 351.7 61.27 2056.8 278.8 175.1 225.0 0.34 0.00
00:06:11 e1000g0 0.14 0.61 2.29 3.18 64.00 196.5 0.00 0.00
00:06:11 e1000g2/0 125.8 1153.7 593.1 4061.4 217.1 290.9 1.05 0.00
00:06:11 e1000g0/0 0.14 0.61 2.29 3.18 64.00 196.5 0.00 0.00
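The %Util figures in the nicstat sample above are consistent with adding the read and write rates and dividing by a 1 Gbps link speed. The sketch below reproduces that arithmetic; it is inferred from the numbers shown here rather than taken from the nicstat source, and the function name is hypothetical.

```python
def utilisation_percent(read_kb_s: float, write_kb_s: float, link_mbps: int = 1000) -> float:
    """Estimate %Util from rKB/s and wKB/s, assuming 1 KB = 1024 bytes."""
    bits_per_second = (read_kb_s + write_kb_s) * 1024 * 8
    return 100.0 * bits_per_second / (link_mbps * 1_000_000)

# Rows taken from the first nicstat sample (00:06:01).
print(round(utilisation_percent(1314.9, 1570.7), 2))  # aggr1
print(round(utilisation_percent(665.4, 244.8), 2))    # e1000g1/0
print(round(utilisation_percent(196.1, 1225.4), 2))   # e1000g2/0
```

Note that the aggr1 row also matches only when computed against a single 1 Gbps link, so its %Util appears to understate the headroom of a multi-port aggregate.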
LACP
Several arguments to dladm are used to support LACP:
-l mode
--lacp-mode=mode
Specifies whether LACP should be used and, if used, the
mode in which it should operate. Legal values are off,
active or passive.
-T time
--lacp-timer=time
Specifies the LACP timer value. The legal values are
short or long.
-L
--lacp
Specifies whether detailed LACP information should be
displayed.
Examples:
[private:/tmp] root# dladm show-aggr -L
key: 1 (0x0001) policy: L4 address: 0:14:4f:20:dc:1 (auto)
LACP mode: off LACP timer: short
device activity timeout aggregatable sync coll dist defaulted expired
e1000g1 passive short yes no no no no no
e1000g2 passive short yes no no no no no
e1000g3 passive short yes no no no no no
Enabling LACP
Example of temporarily enabling LACP in passive mode:
[atlantis:/] root# dladm show-aggr -L
key: 1 (0x0001) policy: L4 address: 0:14:4f:3f:b7:42 (auto)
LACP mode: off LACP timer: short
device activity timeout aggregatable sync coll dist defaulted expired
e1000g2 passive short yes no no no no no
e1000g3 passive short yes no no no no no
[atlantis:/] root# dladm modify-aggr -t -l passive 1
[atlantis:/] root# dladm show-aggr -L
key: 1 (0x0001) policy: L4 address: 0:14:4f:3f:b7:42 (auto)
LACP mode: passive LACP timer: short
device activity timeout aggregatable sync coll dist defaulted expired
e1000g2 passive short yes no no no yes no
e1000g3 passive short yes no no no yes no
Note: this BREAKS THE STORAGE NETWORK! Pings go unanswered until LACP is switched off again:
[atlantis:/] root# ping -s 172.16.165.6
PING 172.16.165.6: 56 data bytes
^C
[atlantis:/] root# dladm modify-aggr -t -l off 1
[atlantis:/] root# dladm show-aggr -L
key: 1 (0x0001) policy: L4 address: 0:14:4f:3f:b7:42 (auto)
LACP mode: off LACP timer: short
device activity timeout aggregatable sync coll dist defaulted expired
e1000g2 passive short yes no no no yes no
e1000g3 passive short yes no no no yes no
[atlantis:/] root# ping -s 172.16.165.6
PING 172.16.165.6: 56 data bytes
64 bytes from 172.16.165.6: icmp_seq=0. time=0.385 ms
64 bytes from 172.16.165.6: icmp_seq=1. time=0.253 ms
64 bytes from 172.16.165.6: icmp_seq=2. time=0.247 ms
^C
----172.16.165.6 PING Statistics----
3 packets transmitted, 3 packets received, 0% packet loss
round-trip (ms) min/avg/max/stddev = 0.247/0.295/0.385/0.078
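One plausible reading of the failed ping above is that the peer was not sending LACPDUs, so the passive host never completed negotiation (the ports show defaulted "yes" and sync "no"). The switch configuration is not shown here, so this is an assumption. The sketch below models the standard rule in simplified form: an active end initiates, a passive end only answers, and an end with LACP off does not take part. The function name is hypothetical.

```python
VALID_MODES = {"off", "active", "passive"}

def lacp_negotiates(local_mode: str, remote_mode: str) -> bool:
    """Return True if the two ends can be expected to complete LACP negotiation."""
    for mode in (local_mode, remote_mode):
        if mode not in VALID_MODES:
            raise ValueError(f"unknown LACP mode: {mode!r}")
    # An end with LACP off never sends or answers LACPDUs.
    if "off" in (local_mode, remote_mode):
        return False
    # Passive ends only respond, so at least one end has to be active.
    return "active" in (local_mode, remote_mode)

for local in ("active", "passive"):
    for remote in ("off", "active", "passive"):
        print(f"{local:8} <-> {remote:8} {lacp_negotiates(local, remote)}")
```

Under this model, a host in passive mode only works against a switch whose ports are in active mode.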
Policies
From the dladm(1M) man page:
-P policy
--policy=policy
Specifies the port selection policy to use for load
spreading of outbound traffic. The policy specifies
which dev object is used to send packets. A policy con-
sists of a list of one or more layers specifiers
separated by commas. A layer specifier is one of the
following:
L2 Select outbound device according to source and
destination MAC addresses of the packet.
L3 Select outbound device according to source and
destination IP addresses of the packet.
L4 Select outbound device according to the upper
layer protocol information contained in the
packet. For TCP and UDP, this includes source
and destination ports. For IPsec, this includes
the SPI (Security Parameters Index.)
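The effect of the layer specifiers can be sketched as follows. This is an illustration only: the Packet type, the field values, and the XOR-then-modulo hash are assumptions made for the example, not the actual dladm implementation.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Packet:
    src_mac: int
    dst_mac: int
    src_ip: int
    dst_ip: int
    src_port: int
    dst_port: int

def select_port(pkt: Packet, policy: str, n_ports: int) -> int:
    """Pick an outbound port index for a policy string such as 'L2' or 'L3,L4'."""
    key = 0
    for layer in policy.split(","):
        layer = layer.strip().upper()
        if layer == "L2":
            key ^= pkt.src_mac ^ pkt.dst_mac
        elif layer == "L3":
            key ^= pkt.src_ip ^ pkt.dst_ip
        elif layer == "L4":
            key ^= pkt.src_port ^ pkt.dst_port
        else:
            raise ValueError(f"unknown layer specifier: {layer!r}")
    return key % n_ports

# Two TCP connections between the same pair of hosts (made-up values),
# differing only in the source port.
conn_a = Packet(0x00144F20DC01, 0x00144F3FB742, 0xAC10A506, 0xAC10A507, 40000, 2049)
conn_b = replace(conn_a, src_port=40001)

for policy in ("L2", "L3", "L4"):
    print(policy, select_port(conn_a, policy, 2), select_port(conn_b, policy, 2))
```

With L2 or L3 both connections use the same port because the hosts are the same; only L4 can separate them, which is why L4 spreads traffic best between a small number of hosts.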
LACP Mode
LACP can be put into one of 3 modes: off, active, or passive.
-l mode, --lacp-mode=mode
Specifies whether LACP should be used and, if used,
the mode in which it should operate. Legal values
are "off", "active" or "passive".
LACP negotiates changes to the link's membership with the peer. Future testing should answer the following:
- How does removal of an aggregate member (simulated failure) affect the link in different LACP modes?
- Isn't the point of LACP to enable aggregation without hard-coding the target ports?
Benchmarks
Using the network benchmarking tools pathload and pathrate, the following results were measured on the improperly configured aggregates in Emeryville (sung-to-jennifer, e1000g2/3 aggr1 through a Dell PowerConnect):
- Capacity: 950 Mbps
- Available Bandwidth: 923.08 - 1090.91 Mbps (with one spike to 1714.29 Mbps)
Aggregations on the Wire
LACP advertisements are sent via multicast to the destination address 01:80:C2:00:00:02. The following is an example packet as viewed with snoop:
ETHER: ----- Ether Header -----
ETHER:
ETHER: Packet 112 arrived at 10:57:24.26997
ETHER: Packet size = 128 bytes
ETHER: Destination = 1:80:c2:0:0:2, (multicast)
ETHER: Source = 0:1:e8:d5:b6:4c,
ETHER: Ethertype = 8809 (Unknown)
ETHER:
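In the above packet the destination 1:80:c2:0:0:2 is the multicast destination for LACP advertisements [1]. 0:1:e8:d5:b6:4c is the Force10 MAC address. Ethertype 8809 (hex) marks an IEEE 802.3 [2] Slow Protocols frame, the type that carries LACP; snoop does not decode it and so reports it as Unknown.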
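The same check can be made programmatically on a captured frame. The sketch below assumes a raw Ethernet frame with no VLAN tag and tests the three fields that identify an LACPDU: the multicast destination, the ethertype 0x8809, and the Slow Protocols subtype 0x01. The function name and the sample frame are made up for illustration.

```python
import struct

SLOW_PROTOCOLS_MAC = bytes.fromhex("0180c2000002")
SLOW_PROTOCOLS_ETHERTYPE = 0x8809
LACP_SUBTYPE = 0x01  # 0x02 would be the Marker protocol

def is_lacp_frame(frame: bytes) -> bool:
    """Return True if an untagged Ethernet frame looks like an LACPDU."""
    if len(frame) < 15:
        return False
    # 6-byte destination, 6-byte source, 2-byte ethertype, 1-byte subtype.
    dst, _src, ethertype, subtype = struct.unpack("!6s6sHB", frame[:15])
    return (
        dst == SLOW_PROTOCOLS_MAC
        and ethertype == SLOW_PROTOCOLS_ETHERTYPE
        and subtype == LACP_SUBTYPE
    )

# Sample frame using the source MAC from the snoop output, padded to 128 bytes.
header = SLOW_PROTOCOLS_MAC + bytes.fromhex("0001e8d5b64c") + b"\x88\x09"
lacp_frame = header + bytes([LACP_SUBTYPE]) + bytes(113)
ip_frame = SLOW_PROTOCOLS_MAC + bytes.fromhex("0001e8d5b64c") + b"\x08\x00" + bytes(114)

print(is_lacp_frame(lacp_frame))
print(is_lacp_frame(ip_frame))
```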
See Also
- Blog of Nicolas Droux: Solaris Link Aggregations (1): The Architecture
- Blog of Nicolas Droux: Link Aggregation vs IP Multipathing
- Aggregating Ports from Dell PowerConnect 62xx Manual
- Dell PowerConnect 6248
- BugID 6538146: Native Link Aggregation should support round robin policy
- BugID 6491179: link aggregation with e1000g does not work unless snoop is running (Fixed in snv_58)
- BugID 6326664: aggr does not support jumbo frames (Fixed in snv_50)
- BugID 6373974: dladm aggregations don't provide HA
Attribution
This content was donated by Joyent.
