nanog mailing list archives

Re: Cisco ASR9902 SNMP polling ... is interesting


From: "ab.nanog--- via NANOG" <nanog () lists nanog org>
Date: Tue, 5 Aug 2025 11:37:00 -0700

On Tue, 5 Aug 2025 15:45:57 +0000
"LJ Wobker (lwobker) via NANOG" <nanog () lists nanog org> wrote:

Ah, a breath of fresh air.  Thank you for your response.  I
definitely agree with "No one uses the same terms for anything" having
worked at F5 for a while.  Trunks... what are trunks?  Depends on who
you ask and where they work.
 
Wow, what a food fight this became. 
No one uses the same terms for anything, so some terminology...  We
(cisco) broadly call the infrastructure "LPTS":  Local Packet
Transport Service.  The act of identifying that a packet needs to go
up to the control plane we call "punting".  Every modern system from
every vendor has SOME form or fashion for this, otherwise it's
trivial to melt the system with traffic pointed at the control CPU.
But no one uses the same words.

Drew - I'm sorry you don't like the way my router works.  This hurts
my feelings, because he's really a pretty good little router.  Let's
see if we can figure out why.  In this case, there's lots of possible
places things can behave in ways you don't like.

First question... when you say "we poll SNMP on any interface" -- do
you mean you're changing the target IP address for where you point
the SNMP manager, where sometimes it's the management ethernet
address and sometimes a regular interface address?  This matters
because IN GENERAL (yes, I know...) the system behaves differently
here.  Packets pointed at the management ethernet are run through a
different set of policers than if you're pointed at a data plane
interface.  IN GENERAL the "best" way to do something like this is
with a loopback interface, as the defaults are "better" tuned for
that config compared to a direct zap at the actual interface IP.
This also has the benefit of virtualizing the loopback so you aren't
tied to a single point of failure, but that's a separate thing.

I'm not remotely surprised that the behavior is different from the
9901 to the 9902.  At risk of being an apologist for my
implementation, even within a product family there are always
(sometimes stupid) differences in the implementations.  

I can ABSOLUTELY ASSURE you that there is nowhere in the code that
says "make 62% of the SNMP polls fail because we hate Drew".  This is
not how our system works... somewhere in the path there's a policer
or a meter that is either dropping some of the inbound requests, or
the SNMP process is choking on something and timing out, or something
like that.  But there is no such thing on the router side as an SNMP
polling timeout - that is a client side thing.  The SNMP process on
the router gets a request, and it sends a response, that's all.  If
something (either external or within the labyrinth of internal
protections) drops the request on the way in, SNMP never sees it, so
it can't respond.  Then the client has to figure out what to do,
which often is throw a timeout and/or retry -- but this is dependent
on the implementation of the SNMP client, and there's nothing that
the router OS can do about it.
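
To make the "timeout is a client-side thing" point concrete, here is a minimal Python sketch. Everything in it is hypothetical (the `agent`, `make_lossy_path`, and `poll` names, the retry count, and the drop pattern are illustration only, not anything from XR or a real SNMP library). The agent answers every request it receives; a request dropped on the way in simply never reaches it, and the client alone decides that silence means "timeout, retry".

```python
from typing import Callable, List, Optional, Tuple


def agent(request: str) -> str:
    """Hypothetical agent: answers every request it actually receives."""
    return f"response to {request}"


def make_lossy_path(drop_pattern: List[bool]) -> Callable[[str], Optional[str]]:
    """Build a send function that drops requests per drop_pattern (True = drop).

    Once the pattern is exhausted, nothing more is dropped.
    """
    drops = iter(drop_pattern)

    def send(request: str) -> Optional[str]:
        if next(drops, False):
            return None  # dropped before the agent; the agent never sees it
        return agent(request)

    return send


def poll(send: Callable[[str], Optional[str]], request: str,
         retries: int = 2) -> Tuple[Optional[str], int]:
    """Client-side logic: a missing reply counts as a timeout, then retry."""
    timeouts = 0
    for _ in range(retries + 1):
        reply = send(request)
        if reply is not None:
            return reply, timeouts
        timeouts += 1
    return None, timeouts


if __name__ == "__main__":
    # First attempt is dropped in the path, the retry gets through.
    print(poll(make_lossy_path([True, False]), "get ifInOctets.1"))
```

Note that `agent` has no notion of a timeout at all; the only place a timeout is counted is inside `poll`, which is the client.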

As someone mentioned along the way, the right way to troubleshoot
this is to find the commands in XR that will show you the counters
and potential drops between "the packet arrives at the box" and "SNMP
did its thing with the packet".  I have to sadly admit that here I'm
one of those old-ass Air Force Colonels who USED to be a hot-shit
pilot, but now I fly a desk.  12 years ago I could have told you
chapter and verse what the commands are and where all the drop/meter
counters live, but father time is undefeated and now I spend time
apologizing on NANOG lists instead of having an actual lab to work
on.  That said, your expectation that someone in TAC can figure out
what's happening and explain it to you is totally reasonable, and if
you're not getting those answers then escalating is correct.  We
might not be able (or willing) to change the behavior to do things
the way you like them, but we absolutely owe you an explanation of
what's actually happening.  If you can't get this from TAC, let me know
and I will attempt to shake that tree.

At LEAST the following things would need to be chased down, some of
which we'd have to get from the customer side...
* which interface(s) are being polled?  MgmtEth, loopback, physical?  
* at what rate does the SNMP station generate and send request
packets?  (Time windows matter here.  A short but very fast burst of
requests might trip the meter, stuff like that)
* can this rate be changed?
* how much stuff (i.e. MIBs) are you polling? 
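
As a rough illustration of why the time window matters, here is a small Python sketch of a generic token-bucket policer. This is an assumption about how such a meter typically behaves, not the actual LPTS implementation, and the `police` function, the 100 pps rate, and the burst size of 10 are made-up numbers rather than platform defaults. It feeds the same 100 requests through the policer twice: once evenly paced, once as a short burst.

```python
from typing import Sequence


def police(arrival_times: Sequence[float], rate_pps: float, burst: int) -> int:
    """Return how many packets a token-bucket policer would drop.

    arrival_times: packet arrival times in seconds, in ascending order.
    rate_pps:      token refill rate (packets per second).
    burst:         bucket depth (maximum number of stored tokens).
    """
    tokens = float(burst)  # bucket starts full
    last = arrival_times[0] if arrival_times else 0.0
    dropped = 0
    for t in arrival_times:
        # Refill for the elapsed time, capped at the bucket depth.
        tokens = min(float(burst), tokens + (t - last) * rate_pps)
        last = t
        if tokens >= 1.0:
            tokens -= 1.0  # conforming packet consumes one token
        else:
            dropped += 1   # no token available: packet is dropped
    return dropped


# Same number of requests, different shape (hypothetical values).
paced = [i * 0.01 for i in range(100)]     # one request every 10 ms
bursty = [i * 0.0001 for i in range(100)]  # all 100 within about 10 ms

if __name__ == "__main__":
    print("paced drops: ", police(paced, rate_pps=100, burst=10))
    print("bursty drops:", police(bursty, rate_pps=100, burst=10))
```

The paced stream stays within the meter, while the burst exhausts the bucket after the first few packets and loses most of the rest, even though both send 100 requests.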

Anyway... hopefully that points you at least somewhat in the right
direction.

--lj

-----Original Message-----
From: Mel Beckman via NANOG <nanog () lists nanog org> 
Sent: Monday, August 4, 2025 10:42 AM
To: Tom Beecher <beecher () beecher cc>
Cc: nanog () lists nanog org; Mel Beckman <mel () beckman org>
Subject: Re: Cisco ASR9902 SNMP polling ... is interesting

Sorry, Tom. I’m not taking the bait.

-mel via cell

On Aug 4, 2025, at 7:02 AM, Tom Beecher <beecher () beecher cc> wrote:


Mel-

You have made multiple technical assertions in this thread that are
demonstrably false. Quoting your earlier messages :

  1.  Also, non-management interfaces do packet processing in silicon
at the ASIC level and don’t have the capacity to do anything more
than statistical sampling of packets that require CPU-level
processing to retrieve counters and generate SNMP responses. 62% is
as good a sampling rate as any other.
  2.  Cisco is likely to say that the control plane is only fully
supported on the management port.
  3.  In-band SNMP to data forwarding interfaces violates that
separation.

 You have attempted to frame these comments as :

honest and sincere attempts by other members to help identify the
possible problem.

While your comments may have been honest and sincere attempts to help
the OP, they achieved the opposite effect. Your incorrect technical
assertions, if anything, only hindered the OP's attempt to understand
and identify their issue. Comment #1 is especially egregious; you're
telling Drew that his observations are *normal*.

Saku made 2 comments that addressed these falsehoods :

It might be easier to contribute, if there is familiarity to the
subject matter.

some community member piled on with what can only be described as a
bizarre drivel.

The first was a polite way of calling out the technical inaccuracies.
The second was a more forceful way of stating "what you said was
wrong". Most people, when they are corrected on a factual point, tend
to reply with "Oh hey, I got that wrong, thanks for setting me
straight" and move on. You seem to have just ignored it.

There is a massive difference between the following statements :

  1.  You are an idiot. [ Attacking the person ]
  2.  What you said was idiotic. [ Attacking the statements ]

It seems to me that you may be struggling to identify that
difference, and taking *any* criticism as a personal attack.

Nobody is bullying you, or anybody else, in this conversation.





On Mon, Aug 4, 2025 at 9:42 AM Mel Beckman via NANOG
<nanog () lists nanog org> wrote:

Thanks. I knew we were not so out to lunch! If you don’t push back on
bullies, they take over the community. It crops up on nanog
periodically. :(

-mel via cell

On Aug 4, 2025, at 5:54 AM, Joe Loiacono via NANOG
<nanog () lists nanog org> wrote:

Hi Mel, for what it's worth, I could not figure out what they were 
referring to by Saku's comments. I saw no justification for their 
complaint. A bit out of character for Saku, also,

Joe
 
On 8/2/2025 7:23 PM, Mel Beckman via NANOG wrote:
I’ll just let the incivility of you both stand.

-mel

On Aug 2, 2025, at 3:52 PM, Tom Beecher
<beecher () beecher cc> wrote:


Mel-

Saku did not call *you* any names. He called your *incorrect
statements* in this thread 'bizarre drivel'. Which he is
absolutely correct about. While your intentions may certainly have
been to help, your statements here have been frankly dead wrong
and did not accomplish that.

Probably just want to take the L here.


On Sat, Aug 2, 2025 at 5:34 PM Mel Beckman via NANOG
<nanog () lists nanog org> wrote:

Saku,

What is actually appalling is that a member of NANOG calls
“bizarre drivel” the honest and sincere attempts by other members
to help identify the possible problem. There’s no cause to be
uncivil, people can disagree without stooping to name-calling.

 -mel
 
On Aug 2, 2025, at 11:46 AM, Saku Ytti via NANOG
<nanog () lists nanog org> wrote:

On Sat, 2 Aug 2025 at 21:02, Tom Beecher via NANOG
<nanog () lists nanog org> wrote:

I don't have in depth knowledge of Cisco's SNMP implementations,
or even the ASR platform specifically, but if Cisco TAC is
telling you this is 'normal', they are completely full of shit,
and you should click any and every 'escalate' button you can
find.

This almost sounds like a default control plane DDOS policer /
LPTS, something like that.

There are various complicated reasons for this. An LPTS policer is an
unlikely culprit, but possible. A bug search will show various DDTS
with poor SNMP performance outcomes; most of them are unrelated to
LPTS.

But absolutely correct, the right solution is to escalate. In the
common case this would be the SE from your account team, who would
fight for you internally.


It is appalling that OP came to nanog after correctly suspecting
TAC is gaslighting them, some community member piled on with what
can only be described as a bizarre drivel.
--
 ++ytti
_______________________________________________
NANOG mailing list
https://lists.nanog.org/archives/list/nanog () lists nanog org/message/7KXUNRGFI5OEVSDEDU2OL5VMY5NBGQCV/
