Educause Security Discussion mailing list archives

Re: How much host data collected?

From: "Bridges, Robert A." <bridgesra () ORNL GOV>
Date: Mon, 30 Apr 2018 15:51:59 +0000

Alan,

* What's "host security data?"

I think we are interested in any data that can be used to diagnose both security-related incidents (intrusions,
breaches) and misconfigurations. That includes system logs, host-based IPS logs, etc.

* Are you breaking things out by service?

No, but our general understanding is that workstation monitoring differs from server monitoring (higher number of
IPs, less data per IP, different data).

* Are you considering the differences in OSes?

This has been a necessity. Folks we talk to generally collect larger sets of system logs for Windows than for other
OSes on workstations. Is that true in your operations (everyone out there)?

* Is compressibility a factor?

I haven’t considered this. We’ve been defaulting to whatever the resource impact is (memory, HD, and disk IO on the
host device, plus HD space to store logs if you store them), in whatever format the data is in.

* Are you interested in event counts or raw byte counts for data?

We are interested in bytes per IP per day, and the number of IPs. Ideally we can estimate the cost of security using
cloud costs (for each host we need x amount of memory to run the AV, y amount of disk space, ...), which translates
directly to $.
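
As a rough illustration of that estimate, here is a minimal sketch that turns bytes per IP per day and per-host agent
memory into a monthly dollar figure. Everything named here (HostProfile, monthly_cost, the retention period, and both
prices) is a hypothetical placeholder, not a figure from any provider or from our data.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes in one GiB


@dataclass
class HostProfile:
    """Hypothetical description of one class of hosts (e.g., workstations)."""
    hosts: int                 # number of IPs in this class
    log_bytes_per_day: float   # host data collected, in bytes per IP per day
    agent_memory_gib: float    # memory used by security agents on each host


def monthly_cost(profile, storage_price_gib_month, memory_price_gib_month,
                 retention_days=30):
    """Estimate monthly cost in dollars for one host class.

    Assumes logs are kept for `retention_days`, so the steady-state stored
    volume is hosts * bytes/day * retention_days. Prices are placeholders
    in dollars per GiB per month.
    """
    stored_gib = profile.hosts * profile.log_bytes_per_day * retention_days / GIB
    storage_cost = stored_gib * storage_price_gib_month
    memory_cost = profile.hosts * profile.agent_memory_gib * memory_price_gib_month
    return storage_cost + memory_cost


if __name__ == "__main__":
    # Made-up inputs purely to show the call; substitute measured values.
    workstations = HostProfile(hosts=100, log_bytes_per_day=GIB, agent_memory_gib=0.5)
    print(monthly_cost(workstations, storage_price_gib_month=0.02,
                       memory_price_gib_month=4.0))
```

Disk IO and CPU are left out to keep the sketch short; they could be added as further per-host terms in the same way.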

Overall, our goals are to understand what host data is collected and how much (in terms of bytes per host per day).
We're informing future research efforts. We'd like to know the cost of security (e.g., how much memory on the host is
used? How much HD space is used to store data?) and then whether we can find ways to lower the cost but increase the
signal (e.g., only collect high-fidelity data after some alert has tripped).

If anyone has more information about what and how much data you collect, we’d be interested. The same goes for ideas
for next-generation tools that research can pursue, e.g., turning on audit logs only after some event.
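
To make the "collect only after an alert" idea concrete, here is a minimal sketch of one way it might work. It is not
an existing tool: TriggeredCollector, its window length, and the small pre-alert buffer are all assumptions made for
illustration. Detailed events are discarded by default (apart from a short rolling buffer) and are only stored for a
fixed window after an alert trips.

```python
from collections import deque


class TriggeredCollector:
    """Hypothetical collector that stores detailed events only around alerts."""

    def __init__(self, window_seconds=600, prebuffer=100):
        self.window_seconds = window_seconds
        self.high_fidelity_until = 0.0            # time at which the window closes
        self.prebuffer = deque(maxlen=prebuffer)  # last few events before an alert
        self.stored = []                          # events actually kept

    def on_alert(self, now):
        """Open (or extend) the high-fidelity window and keep recent context."""
        self.high_fidelity_until = now + self.window_seconds
        self.stored.extend(self.prebuffer)
        self.prebuffer.clear()

    def on_event(self, now, event):
        """Store the event if a window is open; otherwise only buffer it."""
        if now < self.high_fidelity_until:
            self.stored.append(event)
        else:
            self.prebuffer.append(event)


if __name__ == "__main__":
    collector = TriggeredCollector(window_seconds=10, prebuffer=2)
    for t, name in enumerate(["a", "b", "c"]):
        collector.on_event(t, name)   # no alert yet, so nothing is stored
    collector.on_alert(3)             # keeps the two buffered events
    collector.on_event(5, "d")        # inside the window, stored
    collector.on_event(20, "e")       # window has closed, buffered only
    print(collector.stored)
```

The pre-alert buffer is a design choice rather than part of the original idea: without it, whatever led up to the
alert would be lost, at the price of a small fixed amount of memory per host.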

Similarly, if anyone can give costs of an intrusion, that’d be interesting for estimating the opposite side of the
coin, i.e., when security is insufficient.

Thanks

Bobby

Robert A. Bridges, PhD, Research Mathematician, Cyber & Information Science Research Group, Oak Ridge National
Laboratory

On 4/26/18, 3:37 PM, "Alan Amesbury" <amesbury () oitsec umn edu> wrote:

On Apr 19, 2018, at 20:32 , Bridges, Robert A. <bridgesra () ORNL GOV> wrote:

> What is the average amount of host security data your SOC collects per host, per day?

[snip]

It's hard to say without knowing the full extent of what "security data" entails. Some questions that come to mind
include:

* What's "host security data?" There's a great deal of overlap between "security" and
  "operations" as far as I'm concerned. For example, log data generated by the latter
  domain will almost certainly contain information the former domain finds interesting.
  However, others might consider system logs to not be "security data."

* Are you breaking things out by service? I'm also not sure whether "average" will
  suffice as a reasonable measure, given that a web server's logs are going to likely
  be very different than logs from another kind of server, e.g., mail, DHCP, LDAP,
  domain controller, etc. Workstations (i.e., users' hosts) are also an entirely
  different category (maybe multiple ones?), too.

* Are you considering the differences in OSes? Different OSes also log at
  significantly different levels depending on their settings. Windows hosts, for
  example, can produce MASSIVE amounts of data when compared to a Unix host.

* Is compressibility a factor? Some log formats are binary, which may not compress
  very well. Text formatted logs may compress *extremely* well, at better than 10:1.

* Are you interested in event counts or raw byte counts for data? There's a vast
  difference between storing 1000 events and storing 1000 bytes of event data.
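
The last two questions can be checked against a sample of your own logs. The following is a minimal sketch using
Python's standard zlib module; log_stats and the synthetic sample lines are made up for illustration, and the ratio you
get on real logs will depend on their format and content.

```python
import zlib


def log_stats(lines):
    """Return event count, raw bytes, compressed bytes, and compression ratio."""
    raw = "\n".join(lines).encode("utf-8")
    compressed = zlib.compress(raw, 6)  # level 6 is zlib's usual default trade-off
    return {
        "events": len(lines),
        "raw_bytes": len(raw),
        "compressed_bytes": len(compressed),
        "ratio": len(raw) / len(compressed),
    }


if __name__ == "__main__":
    # Synthetic, highly repetitive text lines; real logs will behave differently.
    sample = [
        f"Apr 30 15:51:{i % 60:02d} host{i % 5} sshd[1234]: "
        f"Accepted publickey for user{i % 7}"
        for i in range(1000)
    ]
    print(log_stats(sample))
```

Reporting both the event count and the byte counts for the same sample shows how far apart the two measures can be.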

Data can generally be stored pretty cheaply. Filesystems like ZFS can provide transparent data compression and
scale to pretty large sizes while maintaining data integrity (it checksums the data, checksums the metadata, and then
checksums the checksums, if I recall correctly, and can use distributed parity to reconstruct corrupted data). If
you're talking about being able to *use* the data, then costs tend to go up. Tools available can range from about zero
software costs to thousands or millions of dollars depending on scale, ease of use, and a host of other factors.

That said, I might be able to give you a rough idea of what we see in terms of event counts by several different
sources, although it might make more sense to discuss those specifics off list.

Alan Amesbury

University Information Security

http://umn.edu/lookup/amesbury
