nanog mailing list archives

Re: Pinging a Device Every Second


From: Olav Kvittem <olav.kvittem () uninett no>
Date: Mon, 17 Dec 2018 07:44:12 +0000

Hi,

The link is not the only component that can fail; routers and routing protocols contribute at least as much.
If your customers have redundant connections,
you would also want to look at convergence times.
So an end-to-end measurement by a probe in the customer's network could give
you a truer picture.
Given that even sub-second outages can disrupt a video meeting,
you might want to poll more often than once a second.

Once you realize that your "internet service" depends on the behaviour and quality
of all the other service providers, and you start monitoring that as well, you
understand that you are "in deep shit" ;-)


I did a small-scale global inter-domain measurement and discovered that the sheer number of small outages is way too high.

Many of them might be routing changeovers in multi-redundant networks.
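
As a rough illustration of how such small outages can be pulled out of probe data, here is a minimal Python sketch. It assumes a probe that sends at a fixed interval and logs the arrival time of each reply in milliseconds; the function name and the log format are hypothetical and not taken from any particular tool.

```python
def find_outages(reply_times_ms, interval_ms, min_missed=1):
    """Return (start_ms, duration_ms) for each gap in the reply log.

    reply_times_ms: sorted arrival times of received replies (hypothetical log format).
    interval_ms:    the fixed spacing between probes.
    min_missed:     how many consecutive lost probes count as an outage.
    """
    outages = []
    for prev, cur in zip(reply_times_ms, reply_times_ms[1:]):
        # Round so that normal jitter around the interval is not counted as loss.
        missed = round((cur - prev) / interval_ms) - 1
        if missed >= min_missed:
            outages.append((prev + interval_ms, missed * interval_ms))
    return outages


if __name__ == "__main__":
    # Probes every 100 ms; the replies for 300, 400 and 500 ms never arrived.
    print(find_outages([0, 100, 200, 600, 700], 100))
```

With a 100 ms interval, an outage of a few hundred milliseconds shows up as a distinct gap, whereas a once-a-second poll would most likely miss it.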

cheers
Olav

On 15.12.2018 18:55, Tim Pozar wrote:
> At one of my clients' companies, we use LibreNMS. It is normally used
> to get SNMP data, but we also have it configured to ping the routers
> of our more "high touch" clients. In that case we can record
> performance such as latency and packet loss. It will generate graphs
> that we can pass on to the client. It can also be set to alert us if
> a client's router is not pingable.
>
> LibreNMS can also integrate Smokeping if you want Smokeping-style
> graphs showing standard deviation, etc.
>
> Currently I am running LibreNMS on a VM on a Proxmox cluster with a
> couple of cores. It is probing 385 devices every 5 minutes and
> keeping up with that. In polling, SNMP is the real time and CPU hog,
> whereas ping is pretty low impact.
>
> Tim
>
> On 12/15/18 9:37 AM, Baldur Norddahl wrote:
>> You could configure BFD to send out an SNMP alert when three packets
>> have been missed on a 50 ms cycle, or instantly if the interface
>> changes state to down. This way you would know that they are down
>> within 150 ms.
>>
>> BFD is the hardware solution. A Linux box that has to ping 1000
>> addresses per second will be very taxed and likely unable to do
>> that in a stable way. You will have seconds where it fails to do
>> them all, followed by seconds where it attempts to do them more than
>> once. The result is that the statistics gathered are worthless. If
>> you do something like this, it is much better to have a less
>> ambitious 1 minute cycle.
>>
>> Take a look at Smokeping. If you want a graph to show the quality
>> of the line, Smokeping makes some very good graphs for that.
>>
>> Regards Baldur
>>
>> On 15 Dec 2018 at 16:49, "Colton Conor" <colton.conor () gmail com> wrote:
>>
>> How much compute and network resources does it take for an NMS to:
>>
>> 1. ICMP ping a device every second
>> 2. Record these results.
>> 3. Report an alarm after so many seconds of missed pings.
>>
>> We are looking for a system to monitor in near real time whether an
>> end customer's router is up or down. SNMP I assume would be too
>> resource intensive, so ICMP pings seem like the only logical
>> solution.
>>
>> The question is: are once-a-second pings too much polling for an NMS
>> and a consumer-grade router? Does it take much network bandwidth and
>> CPU resources from both the NMS and CPE side?
>>
>> Let's say this is for a 1,000 customer ISP.
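
For a sense of scale on the numbers in the quoted thread, here is a back-of-the-envelope Python sketch. It assumes IPv4 echo requests with the 56-byte payload that Linux ping uses by default, counts Ethernet header and FCS but not preamble or inter-frame gap, and ignores CPU cost entirely. The function names are hypothetical. It also covers the "three packets on a 50 ms cycle" arithmetic and a simple count of consecutive missed pings for the alarm step.

```python
ICMP_HEADER = 8         # bytes
IPV4_HEADER = 20        # bytes, no IP options
ETHERNET_OVERHEAD = 18  # 14-byte header + 4-byte FCS (preamble and gap not counted)


def ping_load(targets, interval_s, payload_bytes=56):
    """Return (packets per second, bits per second) in ONE direction."""
    pps = targets / interval_s
    frame_bytes = IPV4_HEADER + ICMP_HEADER + payload_bytes + ETHERNET_OVERHEAD
    return pps, pps * frame_bytes * 8


def detection_time_ms(tx_interval_ms, missed_packets):
    """Time until a failure is declared after N consecutive missed packets."""
    return tx_interval_ms * missed_packets


def consecutive_misses(results):
    """Count trailing failed pings; results is a list of booleans (True = reply)."""
    count = 0
    for ok in reversed(results):
        if ok:
            break
        count += 1
    return count


if __name__ == "__main__":
    pps, bps = ping_load(targets=1000, interval_s=1)
    print(f"NMS side: {pps:.0f} pps, {bps / 1000:.0f} kbit/s each way")
    _, cpe_bps = ping_load(targets=1, interval_s=1)
    print(f"CPE side: {cpe_bps:.0f} bit/s each way")
    print(f"Detection time: {detection_time_ms(50, 3)} ms")
    # Alarm once the last three pings all went unanswered.
    print("alarm" if consecutive_misses([True, False, False, False]) >= 3 else "ok")
```

Under these assumptions the bandwidth is under 1 Mbit/s in each direction for 1,000 customers and under 1 kbit/s per CPE, so the harder problem is the scheduling stability raised in the quoted reply rather than the traffic volume.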


