nanog mailing list archives
Seeking operator feedback on low-footprint network metadata probe with the goal of open-sourcing
From: Fanch FRANCIS via NANOG <nanog () lists nanog org>
Date: Thu, 4 Dec 2025 18:35:32 +0000
Hi all,
I’m part of a small team that has been working on network visibility and security for some time now, and we ended up
developing a software network probe from scratch that we are considering open-sourcing. Our question is not really
whether we should, but how much of it — and whether any of you would use it, and how.
Now, for a bit of context.
For years we used the usual stack: Zeek plus homegrown glue for databases, dashboards, and so on.
That works well enough in many environments (it’s what Meta uses for its own data centers, so it clearly works),
but in our own environment we repeatedly hit limits in a few places: small edge boxes, noisy OT/telco environments,
MSP-style multi-tenant deployments, and links where bandwidth drops are painful. Six years ago we stopped trying to
patch more on top of Zeek and started building our own internal network probe instead (time flies when you code fun
stuff).
We are now trying to decide how much of it (if any) should be open-sourced, and I’d like to sanity-check that with
people who run similar tooling in production.
What the probe does (high level)
This is NANOG, so a DPI-based network probe shouldn’t be strange to a lot of you ;-)
For those who are not familiar, it’s a tool that captures packets and turns them into enriched metadata / DB-ready
records (flows, protocols, selected network metadata, assets, etc.).
Some operational characteristics (these numbers are still being tuned, but this is the order of magnitude):
At around 1 Gbit/s sustained line rate, we are currently in the ballpark of:
* 2 CPU cores and 8 GB of RAM
* ~10 Mbit/s of metadata sent to the database
* ~1 Mbit/s stored on disk after compression
That’s the lean configuration; now for the carrier-grade one:
* Internal testing, plus commercial test gear (we had it certified by Spirent Communications), shows no false
detections at full line rate at 100 Gbit/s (145,000,000 pps, with around 10 to 12 million new sessions per second) for
135+ network protocols (L2–L7) and 4,000+ applications, on commodity hardware (no FPGA/ASIC; just good old CPUs and RAM).
We are not using DPDK. For higher-speed use cases we ended up writing our own NIC drivers in Rust, too. For small links
at full line rate with full protocol analysis, resource usage is roughly an order of magnitude lower than what we
observed with Zeek or Suricata in equivalent scenarios (happy to share benchmark details if useful).
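For readers who like sanity-checking ratios, here is a back-of-the-envelope calculation of what the lean-mode figures above imply (the numbers come straight from the bullets; the script is purely illustrative, not measured output from the probe):

```python
# Sanity check of the data-reduction ratios implied by the lean-mode
# figures above (illustrative arithmetic only, not measured values).

link_rate_mbps = 1_000   # ~1 Gbit/s sustained capture
metadata_mbps = 10       # ~10 Mbit/s of metadata shipped to the database
on_disk_mbps = 1         # ~1 Mbit/s stored on disk after compression

wire_to_db = link_rate_mbps / metadata_mbps    # ~100:1 reduction on the wire
wire_to_disk = link_rate_mbps / on_disk_mbps   # ~1000:1 reduction at rest

# Daily storage at that rate: 1 Mbit/s is ~10.8 GB/day before retention.
gb_per_day = on_disk_mbps * 1e6 / 8 * 86_400 / 1e9

print(wire_to_db, wire_to_disk, round(gb_per_day, 1))
```

In other words, a probe watching a saturated 1 Gbit/s link would accumulate on the order of 10 GB of compressed metadata per day, which is the kind of predictability we were after.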
Our design goals, in brief:
* Low footprint: able to run “where Zeek/Suricata hurt”: on-prem systems, VMs, small Kubernetes worker nodes, cloud
workloads, and small x86 or ARM edge boxes (thanks to Rust).
* Simple deployment: a single static Rust binary with no dynamic dependencies. Drop it on a recent Linux host,
point it at an interface, and it starts capturing. There is an installer with a CLI mode for use with Ansible/other
automation. Optionally, a dockerized DB pipeline for ClickHouse/Postgres.
* Fleet-oriented: usable at the scale of hundreds or thousands of probes in an MSP / distributed environment.
* Outputs: JSON over HTTP / REST API, plus structured schemas for ClickHouse/Postgres so operators can plug in
their own analytics, detections, or reporting.
* Implementation: full Rust codebase, with a focus on predictability and safety rather than ad-hoc packet tricks
that reduce visibility or telemetry quality.
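As a concrete (and entirely hypothetical) illustration of the “JSON over HTTP plus DB-ready schemas” model above — the field names below are invented for this email, not the probe’s actual schema — an operator-side consumer could be as simple as:

```python
import json

# Hypothetical flow-metadata record as a probe might emit it over HTTP.
# Field names are invented for illustration, not the probe's real schema.
raw = '''{
  "ts": "2025-12-04T18:35:32Z",
  "src_ip": "192.0.2.10",
  "dst_ip": "198.51.100.7",
  "dst_port": 8443,
  "l7_protocol": "tls",
  "application": "example-saas",
  "bytes_in": 48213,
  "bytes_out": 9120
}'''

record = json.loads(raw)

# An operator could route records into ClickHouse/Postgres, or apply
# local detections before storage, e.g. flag TLS on non-standard ports:
if record["l7_protocol"] == "tls" and record["dst_port"] != 443:
    print("TLS on a non-standard port:", record["dst_ip"])

total_bytes = record["bytes_in"] + record["bytes_out"]
```

The point is that the probe stays a dumb, predictable metadata source, and all analytics/detections live on the operator’s side of the pipe.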
Why we didn’t just stick with Zeek
This is not “Zeek bad, our code good”. We simply had a different set of constraints.
The main drivers were:
* Resource footprint when deploying probes directly on Kubernetes worker nodes, small cloud instances, or ARM edge
devices. We also wanted to reuse the same probe design when monitoring much higher-speed links on commodity hardware.
* Fleet and multi-tenant operation: the need to deploy, manage, and upgrade a large number of probes in an MSP /
MSSP context, with clear separation between tenants.
* Tighter control over metadata shape and volume: so that DB / storage does not explode in noisy environments. Our
past Zeek deployments filled an Elastic cluster in a couple of days and often forced us to rebuild that instance; we
wanted more predictable volume control.
The result is a probe that overlaps with Zeek/Suricata functionally, but with different trade-offs.
Open source, open core, or something else?
Internally we are debating what would actually be useful to open-source for the operator community, versus what (if
anything) should remain “product”.
The rough options we see:
* Open-source the probe engine and protocol parsers, so operators can run and extend it themselves and build their
own services / UX on top.
* Open-source primarily the DB schemas, ingestion pipeline, and operational tooling, while keeping the probe itself
closed.
* Keep the entire stack closed and offer it only as a self-hosted / appliance / cloud solution.
Before we spend months going down any of these routes, I would really value operator feedback.
Specific questions for the readers courageous enough to have reached this point in the post ;-)
1. Does this actually fill a gap for you, or is your current setup “good enough”?
If you have deployed Zeek / Suricata / nProbe / NTOP / similar in anger, would you even look at something like this?
2. If some part of it were open-sourced, what would be most useful to you in practice?
* Core probe and parsers?
* Schemas / ingestion pipeline / deployment tooling?
* SDKs / libraries to embed in your own systems?
* Something else entirely?
3. Licensing / model concerns:
Are there licenses that are an immediate “no” (e.g. AGPL)?
Would “core open-source with additional commercial features” be acceptable, or is that a non-starter in your
environment?
4. How would you realistically consume it? In your networks, would you be more likely to:
* run it as a self-hosted binary on your own infrastructure,
* deploy it as some kind of appliance,
* or consume it as a managed service that delivers metadata or alerts?
5. What would make you discard it immediately?
Examples: excessive resource usage, awkward integration model, unclear security story, problematic license, unclear
long-term maintenance, etc.
This is not a product announcement, beta signup, or marketing exercise. There are no links in this message. I am trying
to avoid spending time open-sourcing the wrong components, or doing it in a way that doesn’t match how operators would
actually use such a tool.
If you have fought with network telemetry in production, I would appreciate hearing “this would be useful if X/Y/Z” or
“we wouldn’t bother, because…”.
I am happy to answer technical questions and take blunt feedback, on- or off-list, if this is of interest.
Best regards,
Fanch
Fanch FRANCIS, PhD
CEO
+33 6 14 60 05 47
https://calendly.com/fanch-nanocorp/visio
https://www.nanocorp.ai/
_______________________________________________
NANOG mailing list
https://lists.nanog.org/archives/list/nanog () lists nanog org/message/IEP7YJUOWCTXHD6GDFUK5JZSPY5SM4WN/
