ITL Bulletin for July 2003


From: InfoSec News <isn () c4i org>
Date: Tue, 22 Jul 2003 02:19:44 -0500 (CDT)

Forwarded from: Elizabeth Lennon <elizabeth.lennon () nist gov>

TESTING INTRUSION DETECTION SYSTEMS
Elizabeth B. Lennon, Editor
Information Technology Laboratory
National Institute of Standards and Technology

Introduction

In government and industry, intrusion detection systems (IDSs) are now
standard equipment for large networks. IDSs are software or hardware
systems that automate the process of monitoring the events occurring
in a computer system or network and analyzing them for signs of security
problems.  Despite the expansion of IDS technology in recent years,
the accuracy, performance, and effectiveness of these systems are
largely untested, due to the lack of a comprehensive and
scientifically rigorous testing methodology. This ITL Bulletin
summarizes NISTIR 7007, An Overview of Issues in Testing Intrusion
Detection Systems, by Peter Mell and Vincent Hu of NIST's Information
Technology Laboratory, and Richard Lippmann, Josh Haines, and Marc
Zissman of the Massachusetts Institute of Technology Lincoln
Laboratory. The Defense Advanced Research Projects Agency (DARPA)
sponsored the work.

The lack of quantitative IDS performance measurements can be
attributed to some challenging research barriers that must be overcome
before the necessary tests can be created.  NISTIR 7007 outlines the
quantitative measurements that are needed, discusses the obstacles to
the development of these measurements, and presents ideas for research
in IDS performance measurement methodology to overcome the obstacles.
NISTIR 7007 is available online at
http://csrc.nist.gov/publications/nistir/index.html.

Who Needs Quantitative Evaluations?
The results of quantitative evaluations of IDS performance and
effectiveness would benefit many potential customers.  Acquisition
managers need this information to improve the process of system
selection, which is often based only on the claims of the vendors and
limited-scope reviews in trade magazines. Security analysts who review
the output of IDSs would like to know the likelihood that alerts will
result when particular kinds of attacks are initiated.  Finally, R&D
program managers need to understand the strengths and weaknesses of
currently available systems so that they can effectively focus
research efforts on improving systems and measure their progress.

Measurable IDS Characteristics
Listed below is a partial set of measurements that can be made on
IDSs. These measurements are quantitative and relate to performance
and accuracy.

* Coverage. This measurement determines which attacks an IDS can
detect under ideal conditions. For signature-based systems, this would
simply consist of counting the number of signatures and mapping them
to a standard naming scheme.  For non-signature-based systems, one
would need to determine which attacks out of the set of all known
attacks could be detected by a particular methodology. The number of
dimensions that make up each attack makes this measurement difficult.
Another problem with assessing the coverage of attacks is determining
the importance of different attack types. In addition, most sites are
unable to detect failed attacks seeking vulnerabilities that no longer
exist on a site.
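
To make the signature-counting idea concrete, the sketch below (in
Python) maps a vendor signature list onto CVE identifiers and reports
what fraction of a chosen set of known attacks it covers. The file
formats and column names are assumptions for illustration, not any
vendor's actual format:

    # Sketch: estimate signature coverage against a known-attack list.
    # The input formats and column names here are hypothetical.
    import csv

    def load_cve_ids(path):
        """One CVE identifier per line, e.g. CVE-2002-0392."""
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}

    def signature_cves(path):
        """Collect the CVE references claimed by each IDS signature."""
        cves = set()
        with open(path) as f:
            for row in csv.DictReader(f):   # columns: sig_id, cve_refs
                for ref in row["cve_refs"].split(";"):
                    if ref.strip().startswith("CVE-"):
                        cves.add(ref.strip())
        return cves

    known = load_cve_ids("known_attacks.txt")   # attacks of interest
    covered = signature_cves("signatures.csv") & known
    print(f"coverage: {len(covered)}/{len(known)} known attacks")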

* Probability of False Alarms. This measurement determines the rate of
false positives produced by an IDS in a given environment during a
particular time frame. A false positive or false alarm is an alert
caused by normal, non-malicious background traffic. Common causes of
false alarms in a network IDS (NIDS) include weak signatures that
alert on all traffic to a high-numbered port used by a backdoor;
signatures that search for a common word such as "help" in the first
100 bytes of SMTP or other TCP connections; and detection of common
violations of the TCP protocol. False alarms can also be caused by
normal network monitoring and
maintenance traffic generated by network management tools.  It is
difficult to measure false alarms because an IDS may have a different
false positive rate in each network environment, and there is no such
thing as a standard network. Also important to IDS testing is the
receiver operating characteristic (ROC) curve, which is an aggregate
of the probability of false alarms and the probability of detection
measurements. This curve summarizes the relationship between two of
the most important IDS characteristics: false positive and detection
probability.

* Probability of Detection. This measurement determines the rate of
attacks detected correctly by an IDS in a given environment during a
particular time frame. The difficulty in measuring the detection rate
is that the success of an IDS is largely dependent upon the set of
attacks used during the test. Also, the probability of detection
varies with the false positive rate, and an IDS can be configured or
tuned to favor either the ability to detect attacks or to minimize
false positives. One must be careful to use the same configuration
during testing for false positives and hit rates.
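
Both rates, and the ROC points built from them, reduce to simple
counting once every event in a test run is labeled with ground truth.
A minimal sketch in Python, assuming a test harness has produced
(is_attack, alerted) pairs:

    # Sketch: detection rate and false-alarm rate from labeled events.
    # `events` pairs ground truth (is_attack) with IDS output (alerted).

    def rates(events):
        attacks = [alerted for is_attack, alerted in events if is_attack]
        normal  = [alerted for is_attack, alerted in events if not is_attack]
        p_detect = sum(attacks) / len(attacks)   # probability of detection
        p_false  = sum(normal) / len(normal)     # probability of false alarm
        return p_detect, p_false                 # one point on the ROC curve

    events = [(True, True), (True, False), (False, False), (False, True)]
    print("P(detect)=%.2f  P(false alarm)=%.2f" % rates(events))

Repeating the measurement at different sensitivity settings, with the
configuration held fixed within each run as noted above, traces out
successive points of the ROC curve.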

* Resistance to Attacks Directed at the IDS. This measurement
demonstrates how resistant an IDS is to an attacker's attempt to
disrupt the correct operation of the IDS. One example is sending a
large amount of non-attack traffic with volume exceeding the
processing capability of the IDS. With too much traffic to process, an
IDS may drop packets and be unable to detect attacks. Another example
is sending to the IDS non-attack packets that are specially crafted to
trigger many signatures within the IDS, thereby overwhelming the human
operator of the IDS with false positives or crashing alert processing
or display tools.

* Ability to Handle High Bandwidth Traffic. This measurement
demonstrates how well an IDS will function when presented with a large
volume of traffic.  Most network-based IDSs will begin to drop packets
as the traffic volume increases, thereby causing the IDS to miss a
percentage of the attacks. At a certain threshold, most IDSs will stop
detecting any attacks.
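
One way to quantify this degradation is to replay a fixed attack set
at increasing background-traffic rates and record the fraction of
attacks still detected at each rate. In the sketch below, replay_mix()
and count_alerts() are hypothetical stand-ins for whatever traffic
generator and alert collector a real test harness provides:

    # Sketch: detection rate vs. offered load (hypothetical harness hooks).

    LOADS_MBPS = [10, 50, 100, 250, 500, 1000]
    N_ATTACKS = 40                      # attacks embedded in each replay

    def load_sweep(replay_mix, count_alerts):
        """replay_mix(mbps) plays background traffic plus the fixed
        attack set; count_alerts() returns true detections observed."""
        for mbps in LOADS_MBPS:
            replay_mix(mbps)
            detected = count_alerts()
            print("%5d Mb/s: %2d/%d detected" % (mbps, detected, N_ATTACKS))

    # Dummy stand-ins so the sketch runs end to end:
    load_sweep(lambda mbps: None, lambda: N_ATTACKS)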

* Ability to Correlate Events. This measurement demonstrates how well
an IDS correlates attack events.  These events may be gathered from
IDSs, routers, firewalls, application logs, or a wide variety of other
devices. One of the primary goals of this correlation is to identify
staged penetration attacks. Currently, IDSs have only limited
capabilities in this area.
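
A very simple form of such correlation groups alerts from multiple
sensors by attacker address within a time window and flags sources
that trigger several distinct signatures, a possible sign of a staged
attack. The alert fields in this Python sketch are assumptions, not
any particular product's schema:

    # Sketch: flag sources that trigger several distinct signatures
    # within a time window (possible staged attack). Fields assumed.
    from collections import defaultdict

    WINDOW = 3600                       # seconds

    def correlate(alerts, min_stages=3):
        by_src = defaultdict(list)
        for a in sorted(alerts, key=lambda a: a["time"]):
            by_src[a["src"]].append(a)
        incidents = []
        for src, seq in by_src.items():
            stages = {a["sig"] for a in seq
                      if a["time"] - seq[0]["time"] <= WINDOW}
            if len(stages) >= min_stages:
                incidents.append((src, sorted(stages)))
        return incidents

    sample = [{"src": "10.0.0.9", "time": t, "sig": s}
              for t, s in [(0, "portscan"), (120, "ftp-exploit"),
                           (300, "new-admin-account")]]
    print(correlate(sample))    # one candidate staged attack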

* Ability to Detect Never-Before-Seen Attacks. This measurement
demonstrates how well an IDS can detect attacks that have not occurred
before. For commercial systems, it is generally not useful to take
this measurement since their signature-based technology can only
detect attacks that have occurred previously (with a few exceptions).
However, research systems based on anomaly detection or
specification-based approaches may be suitable for this type of
measurement.

* Ability to Identify an Attack. This measurement demonstrates how
well an IDS can identify the attack that it has detected by labeling
each attack with a common name or vulnerability name or by assigning
the attack to a category.

* Ability to Determine Attack Success. This measurement demonstrates
if the IDS can determine the success of attacks from remote sites that
give the attacker higher-level privileges on the attacked system. In
current network environments, many remote privilege-gaining attacks
(or probes) fail and do not damage the system attacked.  Many IDSs,
however, do not distinguish the failed from the successful attacks.

* Capacity Verification for NIDS. A NIDS demands greater protocol
awareness than other network devices such as switches and routers,
because it must inspect deeper into network packets.  Therefore, it is
important to measure the ability of
a NIDS to capture, process, and perform at the same level of accuracy
under a given network load as it does on a quiescent network.

* Other Measurements. There are other measurements, such as ease of
use, ease of maintenance, deployment issues, resource requirements,
availability and quality of support, etc. These measurements are not
directly related to the IDS performance but may be more significant in
many commercial situations.

IDS Testing Efforts to Date
IDS testing efforts vary significantly in their depth, scope,
methodology, and focus.  Evaluations have increased in complexity over
time to include more IDSs and more attack types, such as stealthy and
denial of service (DoS) attacks. Only research evaluations have
included novel attacks designed specifically for the evaluation and
evaluated the performance of anomaly detection systems.  Evaluations
of commercial systems have included measurements of performance under
high-traffic loads.  Traffic loads were generated using real
high-volume background traffic mirrored from a live network and also
with commercial load-testing tools.

Academic institutions, research laboratories, and commercial
organizations have all been active in IDS testing efforts.  The
University of California at
Davis and IBM Zurich developed prototype IDS testing platforms. MIT
Lincoln Laboratory performed the most extensive quantitative IDS
testing to date, developing an intrusion detection corpus that is used
extensively by researchers. The Air Force Research Laboratory focused
on testing IDSs in real time in a more complex hierarchical network
environment. The MITRE Corporation investigated the characteristics
and capabilities of network-based IDSs. The Neohapsis
Laboratories/Network Computing magazine collaboration involved the
evaluation of commercial systems. The NSS Group evaluated 15
commercial IDSs and one open-source IDS in 2000 and 2001, and issued a
detailed report and analysis. Lastly, Network World Fusion magazine
reported a more limited review of five commercial IDSs. See NISTIR
7007 for a complete description of these testing efforts.

IDS Testing Issues

* Difficulties in Collecting Attack Scripts and Victim Software. The
difficulty of collecting attack scripts and victim software hinders
progress in developing tests. It is difficult and expensive to collect
a large number of attack scripts. While such scripts are widely
available on the Internet, it takes time to find scripts relevant to a
particular testing environment. Once a script is identified, our
experience is that it takes roughly one person-week to review the
code, test the exploit, determine where the attack leaves evidence,
automate the attack, and integrate it into a testing environment.

* Differing Requirements for Testing Signature-Based vs.  
Anomaly-Based IDSs. Although most commercial IDSs are signature-based,
many research systems are anomaly-based, and it would be ideal if an
IDS testing methodology would work for both of them. This is
especially important for comparison of the performance of upcoming
research systems to existing commercial ones. However, creating a
single test to cover both types of systems presents some problems.

* Differing Requirements for Testing Network-Based vs.  Host-Based
IDSs. Testing host-based IDSs presents some difficulties not present
when testing network-based IDSs.  In particular, network-based IDSs
can be tested in an off-line manner by creating a log file containing
TCP traffic and then replaying that traffic to IDSs. Since it is
difficult to test a host-based IDS in an off-line manner, researchers
must explore more difficult real-time testing. Real-time testing
presents problems of repeatability and consistency between runs.
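
As an illustration of the off-line approach, a captured packet log can
be replayed onto the interface a NIDS is monitoring. Dedicated tools
such as tcpreplay are the usual choice; the Python sketch below uses
the scapy library, with placeholder file and interface names:

    # Sketch: replay a recorded capture past a NIDS sensor (scapy).
    # Note: this does not preserve the original inter-packet timing,
    # which matters for repeatable tests of time-sensitive signatures.
    from scapy.all import rdpcap, sendp

    packets = rdpcap("test_traffic.pcap")   # background + known attacks
    sendp(packets, iface="eth1", verbose=False)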

* Four Approaches to Using Background Traffic in IDS Tests.  Most IDS
testing approaches can be classified in one of four categories with
regard to their use of background traffic: testing using no background
traffic/logs, testing using real traffic/logs, testing using sanitized
traffic/logs, and testing using simulated traffic/logs.  While there
may be other valid approaches, most researchers find it necessary to
choose among these categories when designing their experiments.  
Furthermore, it is unclear which approach is the most effective for
testing IDSs since each has unique advantages and disadvantages.

See NISTIR 7007 for a complete discussion of these issues.

Recommendations for IDS Testing Research
Research recommendations for IDS testing focus on two areas: improving
datasets and enhancing metrics.

* Shared Datasets. There is a great need for IDS testing datasets that
can be shared openly between multiple organizations. Few datasets
exist that have even semi-realistic data or have the attacks within
the background traffic labeled. Without shareable datasets, IDS
researchers must either expend enormous resources creating proprietary
datasets or use fairly simplistic data for their testing.

* Attack Traces. Since it is difficult and expensive to collect a
large set of attack scripts for the purposes of IDS testing, a
possible alternative is to use attack "traces" instead of real
attacks. Attack traces are the log files that are produced when an
attack is launched and that specify exactly what happened during the
attack. Such traces usually consist of files containing network
packets or system logs that correspond to an instance of an attack.
Researchers need a better understanding of the advantages and
disadvantages of replaying such traces as a part of an IDS test. In
addition, there is a great need to provide the security community with
a large set of attack traces. Such information could be easily added
to and would greatly augment existing vulnerability databases. The
resulting vulnerability/attack trace databases would aid IDS testing
researchers and would provide valuable data for IDS developers.
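
As an illustration only, a shared trace database entry might pair each
capture or log file with the vulnerability it exercises and the
evidence the attack leaves behind; the fields below are hypothetical,
not a proposed standard:

    # Sketch: one possible record format for an attack-trace database.
    import json
    from dataclasses import dataclass, asdict

    @dataclass
    class AttackTrace:
        cve_id: str        # vulnerability exercised
        trace_file: str    # packet capture or host audit log
        succeeded: bool    # did this instance compromise the victim?
        evidence: list     # where the attack leaves marks

    entry = AttackTrace("CVE-2001-0144", "ssh_attempt1.pcap",
                        True, ["tcp/22", "/var/log/auth.log"])
    print(json.dumps(asdict(entry), indent=2))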

* Cleansing Real Data. Real data generally cannot be distributed due
to privacy and sensitivity issues. Research into methods to remove the
confidential data within background traffic while preserving the
essential features of the traffic could enable the use of such data
within IDS tests. Such an advance would alleviate the need for
researchers to expend additional effort creating expensive simulated
environments. Another problem with real background data is that it may
contain attacks about which nothing is known. It is possible, however,
that such attacks could be automatically removed. One idea is to
collect a trace of events in the real world and use a simulation
system to produce data similar to those in the collected trace.
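
One small, tractable piece of such cleansing is removing application
payloads while preserving packet headers, sizes, and timing. A Python
sketch using the scapy library (file names are placeholders); note
that this does nothing about unknown attacks hidden in the traffic,
which remains the harder problem:

    # Sketch: blank application payloads in a capture so traffic
    # structure survives but confidential content does not (scapy).
    from scapy.all import rdpcap, wrpcap, Raw, IP, TCP

    packets = rdpcap("real_traffic.pcap")
    for pkt in packets:
        if pkt.haslayer(Raw):
            pkt[Raw].load = b"X" * len(pkt[Raw].load)  # keep length
            if pkt.haslayer(IP):
                del pkt[IP].chksum    # let scapy recompute checksums
            if pkt.haslayer(TCP):
                del pkt[TCP].chksum
    wrpcap("cleansed_traffic.pcap", packets)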

* Sensor and Detector Alert Datasets. Some intrusion correlation
systems do not use a raw data stream (like network or audit data) as
input, but instead rely upon alerts and aggregated information reports
from IDSs and other sensors. Researchers need to develop systems that
can generate realistic alert log files for testing correlation
systems. A solution is to deploy real sensors and to sanitize the
resulting alert stream by replacing IP addresses. Sanitization in
general is difficult for network activity traces, but it is relatively
easy in this special case since alert streams use well-defined formats
and generally contain little sensitive data (the exception being IP
addresses and possibly passwords).
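
Because alert streams are well structured, this kind of sanitization
can be as simple as a consistent address-remapping pass. The CSV alert
format in this sketch is an assumption; real sensors emit their own
formats:

    # Sketch: sanitize an alert log by consistently remapping IPs.
    import csv, itertools

    fake = ("10.0.%d.%d" % (n // 256, n % 256)
            for n in itertools.count(1))
    mapping = {}

    def remap(ip):
        if ip not in mapping:          # same real IP -> same fake IP
            mapping[ip] = next(fake)
        return mapping[ip]

    with open("alerts.csv") as src, \
         open("alerts_clean.csv", "w", newline="") as dst:
        reader = csv.DictReader(src)   # columns: time, sig, src_ip, dst_ip
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            row["src_ip"] = remap(row["src_ip"])
            row["dst_ip"] = remap(row["dst_ip"])
            writer.writerow(row)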

* Real-Life Performance Metrics. Receiver operating characteristic
(ROC) curves are created by stepping through alerts emitted by the
detector in order of confidence or severity. The goal is to show how
many alerts must be analyzed to achieve a certain level of performance
and, by applying costs, to determine an optimal point of operation.  
The confidence or severity-based ROC curve, however, is not a good
indicator of how the IDS will perform with an intelligent human
administrator sitting at the console. The human administrator does not
consider the IDS alerts alone, but makes use of additional information
such as network maps, user trouble reports, and learned knowledge of
common false alarms when considering which alerts to analyze first.
Thus the alert ordering used as a basis of the ROC is often not
realistic. A further problem is that few current detection systems
output a continuous range of scores but instead output only a few
priorities (low/medium/high). Thus the ROC consists of only a few very
coarse points. It might be useful to use alert type, source, and/or
destination IP address along with severity or confidence to order a
set of IDS alerts for the purpose of estimating cost and performance
of a detector. This new technique could produce a curve that could
provide a much more realistic basis for comparing attack detection and
false alarm performance, and for estimating the cost of using the
intrusion detection product at various levels of performance.
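
The standard construction described above is straightforward: sort
alerts by score, sweep a threshold through them, and emit one
(false-alarm, detection) point per step; the grouped ordering proposed
here would simply replace the sort key. A minimal Python sketch:

    # Sketch: ROC points from scored alerts. Each alert is
    # (score, is_true_attack); ground-truth counts are known.

    def roc_points(alerts, n_attacks, n_normal):
        points, tp, fp = [], 0, 0
        for score, is_attack in sorted(alerts, reverse=True):
            if is_attack:
                tp += 1
            else:
                fp += 1
            points.append((fp / n_normal, tp / n_attacks))
        return points   # (false-alarm rate, detection rate) pairs

    alerts = [(0.9, True), (0.8, False), (0.7, True), (0.4, False)]
    print(roc_points(alerts, n_attacks=2, n_normal=2))

With only a few discrete priorities in place of continuous scores, the
sort produces exactly the coarse, few-point curve noted above.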

* New Technologies. Newly evolving IDS technologies include meta-IDS
technologies that attempt to ease the burden of cross-vendor data
management; IDS appliances that promise increased processing power and
more robust remote management capabilities; and application-layer
technologies that filter potential attack traffic to downstream
scanners on dedicated network segments. These new directions focus on
new technologies for enterprises or service providers and represent
examples of research efforts to solve the difficulties of false
positives, traffic bottlenecks, and distinguishing serious attacks
from nuisance alarms.

Conclusion
While IDS testing efforts to date vary significantly and have become
increasingly complex, the lack of a comprehensive and scientifically
rigorous testing methodology to quantify IDS performance has hindered
the development of needed tests. NIST believes that a periodic,
comprehensive evaluation of IDSs could be valuable for acquisition
managers, security analysts, and R&D program managers. However,
because both normal and attack traffic vary widely from site to site,
and because normal and attack traffic evolve over time, these
evaluations will likely be complex and expensive. To enable
evaluations to be conducted more efficiently, NIST recommends that the
community find ways to create, label, share, and update relevant data
sets containing normal and attack activity.

Disclaimer
Any mention of commercial products or reference to commercial
organizations is for information only; it does not imply
recommendation or endorsement by NIST nor does it imply that the
products mentioned are necessarily the best available for the purpose.
