nanog mailing list archives

Re: Cisco ASR9902 SNMP polling ... is interesting


From: James Bensley via NANOG <nanog () lists nanog org>
Date: Sun, 03 Aug 2025 07:11:12 +0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On Friday, 1 August 2025 at 15:10, Drew Weaver via NANOG <nanog () lists nanog org> wrote:


> Hello,

Hi Drew.

I haven't worked with IOS-XR for a few years, but I have had problems with SNMP on it in the past.

A few years ago I was deploying 9904 chassis with a modest number of services on them (not thousands of services per 
chassis but hundreds, so they weren't idle, but they were certainly not under any significant control-plane load).

We noticed that SNMP polling was returning nothing for some of the services, and it turned out to be a couple of 
problems compounding each other. At that time we had virtually every 9xxx and 99xx chassis in the network. The problem 
only occurred on these boxes, but they were also the only routers in the network with this exact combination of 
services on them, so I don't believe it was anything chassis-specific. For reference, this was on IOS-XR 6.something.

When the SNMP process received a poll request, it in turn fired off internal requests to other processes to get the 
stats being asked for. There is/was (I'm out of touch now) a maximum amount of time SNMP would wait for those other 
processes to respond. If they didn't respond in time, either the SNMP response was sent without those details, or the 
query that was pending an answer was simply dropped and no response was sent. So problem number one was those other 
processes taking too long to respond.
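To make the first failure mode concrete, here is a minimal Python sketch of a poll handler that fans a request out to 
several internal processes and waits a fixed time for their answers. It is a toy model under stated assumptions, not 
how IOS-XR is actually implemented: the backend names, delays and the 0.1 s deadline are all made up for illustration. 
Answers that arrive late are discarded, so the reply is partial; if nothing arrives at all, the handler returns None to 
model the query being dropped.

```python
import asyncio
from typing import Optional

# Toy model only: backend names, delays and the timeout are illustrative
# assumptions, not IOS-XR internals.


async def backend(name: str, delay: float, value: int) -> tuple[str, int]:
    """Simulate an internal process that needs `delay` seconds to return a counter."""
    await asyncio.sleep(delay)
    return name, value


async def handle_poll(backends: list[tuple[str, float, int]],
                      timeout: float) -> Optional[dict[str, int]]:
    """Fan a poll out to every backend; keep only the answers that arrive in time."""
    tasks = [asyncio.create_task(backend(n, d, v)) for n, d, v in backends]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()  # late answers are abandoned, never merged into the reply
    if not done:
        return None  # nothing answered in time: model the query being dropped
    return dict(task.result() for task in done)


async def main() -> Optional[dict[str, int]]:
    backends = [
        ("ifHCInOctets.10", 0.01, 1234),  # hypothetical interface indexes
        ("ifHCInOctets.20", 0.01, 5678),
        ("ifHCInOctets.30", 0.50, 9999),  # slow process: misses the deadline
    ]
    return await handle_poll(backends, timeout=0.1)


if __name__ == "__main__":
    print(asyncio.run(main()))  # the slow entry is missing from the reply
```

The point of the model is that a short internal deadline turns a slow process into silently missing data rather than an 
explicit error, which is why the symptom on the poller side can look like "SNMP returning nothing".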

Problem number two was that those other processes had a bug: after new services were provisioned, they hadn't picked up 
on the changes. When the SNMP process asked them for stats relating to service X, they had no knowledge of X.

TAC provided us with a short-term workaround, which was to restart some processes after provisioning new services, to 
ensure those processes were aware of the new services and would respond to the SNMP process with the requested stats. 
For the long term, they created a DDTS and an SMU to fix both the inter-process timeout issue and the missing-stats issue.
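The second problem and the TAC workaround can be modelled the same way. In this hypothetical Python sketch (the class 
and service names are invented for illustration), a stats process copies the list of provisioned services when it 
starts and never refreshes it, so a query for a service added later returns nothing until the process is restarted 
and re-reads the configuration.

```python
from typing import Optional

# Hypothetical model of the stale-state bug: a stats process snapshots the
# provisioned services at start-up and never re-reads them afterwards.


class StatsProcess:
    def __init__(self, config: dict[str, int]) -> None:
        self._config = config        # live configuration, shared with the "router"
        self._known = dict(config)   # private snapshot taken at process start

    def query(self, service: str) -> Optional[int]:
        """Return the counter for `service`, or None if this process never heard of it."""
        return self._known.get(service)

    def restart(self) -> None:
        """Workaround: re-read the current configuration, as a process restart would."""
        self._known = dict(self._config)


config = {"svc-A": 100}
proc = StatsProcess(config)
config["svc-X"] = 42  # new service provisioned after the process started

print(proc.query("svc-X"))  # None: the process has no knowledge of X
proc.restart()
print(proc.query("svc-X"))  # 42 once the process has re-read the config
```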

I don't know exactly what you're polling, and like I said, I'm a bit out of touch here, but I can say that it took 
quite a lot of digging and working with TAC to bottom out the problem. We could replicate the issue in the lab, which 
always helps. So if you can replicate the issue in the lab and turn all the debugging settings up to 11, you might be 
able to find something like we did (TAC sent some debug commands and we could trace the issue in the lab; IPC debugging 
is hard on these boxes!). Even if TAC try to fob you off by saying "oh yeah, this is dropped by LPTS as expected", get 
them to prove it to you: replicate the issue in the lab and gather the debug info that shows how/where the request is 
being dropped. If they can't find the drop in LPTS, then LPTS isn't the problem and you need to look elsewhere, such as 
IPC/EOBC.


Cheers,
James.

-----BEGIN PGP SIGNATURE-----
Version: ProtonMail

wsG5BAEBCgBtBYJojwuDCZCoEx+igX+A+0UUAAAAAAAcACBzYWx0QG5vdGF0
aW9ucy5vcGVucGdwanMub3Jne6/4gXRiD1B/oyx0cm03xe+bPfK4lh4ErWip
GQvWH9oWIQQ+k2NZBObfK8Tl7sKoEx+igX+A+wAAlZAP/3DFVyR1e2DiJ7bv
4udRjmX0xLtEpkZM7UJGwhihiIiqW/JV+TyqEq75Ko4Hu9xOiOURkz+VkBx6
XfgbrFuXxPT/i4NhcMZ8qygSBwoAQK4Z6CIeXf9msWnly259hA5F88SB/oCc
LKOjcH6hNHVI2+5jSIMJFqNVkD/3b2eSIF3ZHbdWsZ+uq6QRMMvM7gOHuJAm
0mCiOBTUbN4oIziQdN0u3tbWVgIWulC2TyM8wy2FGyN+r5ks/jqmZQhlTASo
u+9kPtBZ4SQc0p9GwvYZN4XHXQtcftx7xrPymmXhwU+3UaE70YoSZuJVULE+
eGipYUDUiQ9OA9pj39BWZe6fpRLqgoeEl6GDiavHYLcfw3CVkMwThPUGDRFX
RDNxKpebdPEZHzsJyvqORgM+/RHYIAgqOOQIQdiZGbaiIxa8ooT06WJRkNWO
iKL2jOkXndbbxWenyw4RNZwVX50H1Y79eqUxhU24yiA0Wfs6qVCRZWP3M//g
a+BJwOBqb8gFmuJErvezWUPUNIt94UhEv8aFpVtPZ7R4IIpPzFBFlLUV4HEK
F5IU9JgqvyBagubAPeIOoUk0+DboE4gGBPTz9RGWSfdxM+D5pX/HWBh8qIwB
prO6hDk3PkkGAk4/fhd5jNmGk0hE0yKyTubE711vIJ9vXD1dJbqKgoOjSA18
t315dumB
=LkYJ
-----END PGP SIGNATURE-----

Attachment: publickey - lists+nanog@bensley.me - 0x3E936359.asc.sig

_______________________________________________
NANOG mailing list 
https://lists.nanog.org/archives/list/nanog () lists nanog org/message/LFEK3EROE2TNHT7KOSM5WMW5HXGR4LQL/
