nanog mailing list archives

Re: Sites unreachable while traversing Dallas IXP


From: Mel Beckman via NANOG <nanog () lists nanog org>
Date: Fri, 26 Sep 2025 05:33:22 +0000

I’m assuming you’ve already done the obvious “it’s the cable stupidity” rule-outs, such as replacing the physical 
components involved (cables, SFPs). After that, the most likely culprit is the LACP configuration.


As you may know, LACP doesn't use a single "LACP algorithm" for distributing packets across links. Instead you 
configure one of the available hash-based distribution functions the two endpoints have in common. The hash uses 
packet header information to decide which member link of the LAG each outgoing packet takes. Common hash algorithms 
include options to balance traffic on combinations of Layer 2, 3 and 4 fields, such as source and destination MAC 
addresses, source and destination IP addresses, or source and destination TCP/UDP ports. The best choice depends on 
your traffic mix and on how evenly you need it spread.
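To make the idea concrete, here is a minimal Python sketch of hash-based member selection. It is only an illustration: the mode names, the `Flow` fields, the `pick_member` helper, and the use of CRC32 are all assumptions for the example, not how any particular vendor computes its hash.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Header fields a LAG hash might look at (hypothetical structure)."""
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


# Hypothetical mode names; real platforms use their own keywords and field sets.
HASH_FIELDS = {
    "layer2": ("src_mac", "dst_mac"),
    "layer3": ("src_ip", "dst_ip"),
    "layer3+4": ("src_ip", "dst_ip", "src_port", "dst_port"),
}


def pick_member(flow: Flow, mode: str, members: list[str]) -> str:
    """Map a flow to one member link using only the fields the mode selects.

    CRC32 stands in for the (usually proprietary) vendor hash function.
    """
    key = "|".join(str(getattr(flow, field)) for field in HASH_FIELDS[mode])
    return members[zlib.crc32(key.encode()) % len(members)]


if __name__ == "__main__":
    members = ["member-a", "member-b"]  # placeholder link names
    flow = Flow("aa:aa:aa:aa:aa:01", "bb:bb:bb:bb:bb:01",
                "192.0.2.10", "198.51.100.20", 40000, 443)
    for mode in HASH_FIELDS:
        print(mode, "->", pick_member(flow, mode, members))
```

The point of the sketch is that a given flow always lands on the same member. If one member is misbehaving, only the flows that hash onto it suffer, which can look like "certain sites unreachable" rather than a total outage.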


I’ve found that sometimes with LAGs between different equipment vendors one or more of these algorithms aren’t 
compatible, resulting in packets out of order or even dropped.


For example, Cisco and Juniper implement LACP hashing differently even though the modes have similar names. Under the 
covers, Juniper allows finer-grained control over the specific Layer 2, Layer 3, and Layer 4 fields used for hashing 
through the forwarding-options hash-key configuration, while Cisco offers just a few fixed hash modes (Layer 2, 
Layer 3, and Layer 4), with the specific details of the hashing algorithm being proprietary.


In my own experience, packet loss on Cisco-Juniper LACP links has arisen from inconsistent or incompatible 
configurations. You can troubleshoot by checking LACP status and interface counters on both sides, and by making sure 
settings such as the LACP rate match on both ends. I’ve even seen duplex flapping! Be sure to look at logs on both 
ends for hardware errors or weird messages. If the issue persists, try adjusting LACP parameters and testing with 
only a single member link active at a time.
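For the counter check, one approach is to take two snapshots of the per-member counters a few minutes apart and see which member's error counters are climbing. A rough Python sketch follows; the counter names, the `suspect_members` helper, and the sample numbers are all made up for illustration, so substitute whatever your platform actually reports.

```python
def suspect_members(before, after,
                    error_keys=("in_errors", "out_errors", "in_discards")):
    """Return the members whose error counters rose between two snapshots.

    `before` and `after` map a member link name to a dict of counters.
    The counter names here are hypothetical placeholders.
    """
    suspects = {}
    for member, now in after.items():
        prev = before.get(member, {})
        rising = {}
        for key in error_keys:
            delta = now.get(key, 0) - prev.get(key, 0)
            if delta > 0:
                rising[key] = delta
        if rising:
            suspects[member] = rising
    return suspects


if __name__ == "__main__":
    # Invented sample data: two snapshots of a two-member LAG.
    snap1 = {
        "member-a": {"in_errors": 0, "out_errors": 0, "in_discards": 5},
        "member-b": {"in_errors": 12, "out_errors": 0, "in_discards": 5},
    }
    snap2 = {
        "member-a": {"in_errors": 0, "out_errors": 0, "in_discards": 5},
        "member-b": {"in_errors": 340, "out_errors": 0, "in_discards": 5},
    }
    print(suspect_members(snap1, snap2))
```

Run the same comparison on both ends of the LAG. A member that shows rising errors on only one side is a good candidate for the single-member test.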


Have you tried switching to a different algorithm?


 -mel beckman

On Sep 25, 2025, at 8:22 PM, Andy Cole via NANOG <nanog () lists nanog org> wrote:

Group,
 I've been Peering with both Route Servers in the Dallas IX for over a
month using a single 10G link with no issues. Due to capacity concerns I
had to augment to a 20G LAG. In order to do this, I shut the existing link
down (which dropped both eBGP sessions), used the existing IP space to
create the LAG, and then added the 2nd 10G link. The eBGP sessions
reestablished over the LAG and traffic started flowing error free. No
configuration changes to routing policy at all.  After a few days we
started to get customer complaints for certain sites/domains being
unreachable. I worked around the issue by not announcing the customer
blocks to the route servers and changed the return path to traverse
transit. This solved the issue, but I'm perplexed as to what could've
caused the issue, and where to look to resolve it.  If you guys could
provide feedback and point me in the right direction I'd appreciate it. TIA.

~Andy
_______________________________________________
NANOG mailing list
https://lists.nanog.org/archives/list/nanog () lists nanog org/message/VQJ37BWPQRQYQB6QMWG6E6SVUDHNYDTO/
_______________________________________________
NANOG mailing list 
https://lists.nanog.org/archives/list/nanog () lists nanog org/message/WL2A4FVEA36XP52LRYKQEXRWB55HL37S/
