OS fingerprint classification using a support vector machine

David Fifield

January 16, 2011

Abstract

An evaluation of using a support vector machine (SVM) to classify operating system fingerprints in the Nmap security scanner. In solving a simplified version of operating system classification, the SVM got marginally more accurate results than Nmap's built-in classifier.

1 Introduction

Remote operating system detection (OS detection) is any way of finding out the operating system of a network host through analysis of network traffic. It has applications in network security, including network inventory, vulnerability detection, and exploit tailoring. OS detection is a feature of various network security programs including Nmap, p0f, and SinFP.

There are many ways of doing OS detection. This paper is concerned only with active, low-level detection. "Active" means that the scanner is free to send packets and not just sniff passively. "Low-level" means that it operates at the level of IP, TCP, and ICMP, and not, for example, by reading banners from application services.

OS detection is possible because of implementation differences in TCP/IP stacks. OS programmers are free to set some values within certain limits. Systems differ in how they handle unexpected or uncommon conditions. Implementation errors are sometimes visible in network traffic. OS detection seeks to elicit as many of these distinguishing characteristics as possible. Nmap sends over a dozen TCP, UDP, and ICMP probes, with various valid and invalid combinations of fields, and collects the responses into a data structure called an OS fingerprint. This is a fingerprint of Linux 2.6.34:

    SEQ(SP=CC%GCD=1%ISR=CD%TI=Z%CI=Z%II=I%TS=8)
    OPS(O1=M5B4ST11NW7%O2=M5B4ST11NW7%O3=M5B4NNT11NW7%O4=M5B4ST11NW7%O5=M5B4ST11NW7%O6=M5B4ST11)
    WIN(W1=16A0%W2=16A0%W3=16A0%W4=16A0%W5=16A0%W6=16A0)
    ECN(R=Y%DF=Y%T=40%W=16D0%O=M5B4NNSNW7%CC=Y%Q=)
    T1(R=Y%DF=Y%T=40%S=O%A=S+%F=AS%RD=0%Q=)
    T2(R=N)
    T3(R=Y%DF=Y%T=40%W=16A0%S=O%A=S+%F=AS%O=M5B4ST11NW7%RD=0%Q=)
    T4(R=Y%DF=Y%T=40%W=0%S=A%A=Z%F=R%O=%RD=0%Q=)
    T5(R=Y%DF=Y%T=40%W=0%S=Z%A=S+%F=AR%O=%RD=0%Q=)
    T6(R=Y%DF=Y%T=40%W=0%S=A%A=Z%F=R%O=%RD=0%Q=)
    T7(R=Y%DF=Y%T=40%W=0%S=Z%A=S+%F=AR%O=%RD=0%Q=)
    U1(R=Y%DF=N%T=40%IPL=164%UN=0%RIPL=G%RID=G%RIPCK=G%RUCK=G%RUD=G)
    IE(R=Y%DFI=N%T=40%CD=S)

The format of the fingerprint and the meaning of its fields are described fully in [1]. A dotted notation is used to refer to tests within each line: in the above fingerprint it is the case that ECN.W=16D0, meaning that the TCP window size in the response to the Explicit Congestion Notification probe was 0x16D0. Compare with a fingerprint of Windows XP SP3:

    SEQ(SP=105%GCD=1%ISR=10B%TI=I%II=I%SS=S%TS=0)
    OPS(O1=M5B4NW0NNT00NNS%O2=M5B4NW0NNT00NNS%O3=M5B4NW0NNT00%O4=M5B4NW0NNT00NNS%O5=M5B4NW0NNT00NNS%O6=M5B4NNT00NNS)
    WIN(W1=FFFF%W2=FFFF%W3=FFFF%W4=FFFF%W5=FFFF%W6=FFFF)
    ECN(R=Y%DF=Y%TG=80%W=FFFF%O=M5B4NW0NNS%CC=N%Q=)
    T1(R=Y%DF=Y%TG=80%S=O%A=S+%F=AS%RD=0%Q=)
    T2(R=Y%DF=N%TG=80%W=0%S=Z%A=S%F=AR%O=%RD=0%Q=)
    T3(R=Y%DF=Y%TG=80%W=FFFF%S=O%A=S+%F=AS%O=M5B4NW0NNT00NNS%RD=0%Q=)
    T4(R=Y%DF=N%TG=80%W=0%S=A%A=O%F=R%O=%RD=0%Q=)
    U1(R=N)
    IE(R=Y%DFI=S%TG=80%CD=Z)

See that in this case ECN.W=FFFF. A few other differences are in the TCP options (OPS.O1 through OPS.O6 and T3.O), the don't fragment flags (*.DF), and the explicit congestion notification status (ECN.CC). Windows responded to the T2 probe while Linux did not; also, the Windows fingerprint is missing lines T5, T6, and T7 because the host lacked a closed port.
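The fingerprint syntax is simple enough to illustrate: each line names a probe, and the tests inside the parentheses are percent-separated name=value pairs, which is what gives rise to the dotted notation used throughout this paper. The following Python sketch is illustrative only (it is not Nmap's own parser) and extracts the tests from a fingerprint string:

    import re

    def parse_fingerprint(text):
        """Parse a second-generation fingerprint into a dict keyed by
        dotted test names, e.g. tests["ECN.W"] == "16D0"."""
        tests = {}
        # Each record looks like LINE(name=value%name=value%...).
        for line, body in re.findall(r"(\w+)\(([^)]*)\)", text):
            for item in body.split("%"):
                if not item:
                    continue
                name, _, value = item.partition("=")
                tests[line + "." + name] = value
        return tests

    fp = "ECN(R=Y%DF=Y%T=40%W=16D0%O=M5B4NNSNW7%CC=Y%Q=)"
    print(parse_fingerprint(fp)["ECN.W"])  # prints 16D0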
This paper is not about the technical details of OS detection probes and the assembly of fingerprints. Rather, it is assumed that the fingerprints are given and the only remaining task is to classify them. The focus is on converting a fingerprint to a feature vector and then classifying it using a support vector machine.

OS identification is harder than it sounds. In an ideal world, every operating system would have just one fingerprint, distinguishable from those of all others. In actuality, one OS can have dozens of fingerprints, even at the granularity of a Linux kernel micro revision, let alone a Windows service pack. There are at least two reasons for this: some tests are sensitive to network conditions and (especially) packet-mangling routers, and fingerprints are affected by OS configuration. Nmap's current second-generation OS database has a simple pattern language that allows for some variation in fingerprints [1], but even so it contains over 2,600 fingerprints for 106 OS families from AIX to ZyNOS. Of these, 468 are Linux, 471 are Windows, and 1,110 are miscellaneous "embedded" operating systems of hardware devices. The size of this database and its maintenance costs are one reason for investigating whether machine learning can do the same job.

2 Previous work

Sarraute and Burroni [4] used a hierarchy of neural networks and Nmap's older first-generation OS database to do first coarse, then fine OS classification. Zaid Aiman [5] outlined a plan to implement something similar for Nmap's second-generation database. João Medeiros et al. [6] used a neural network and the second-generation Nmap database to do classification and build a map of operating system similarity. They used Euclidean distance of feature vectors as a measure of fingerprint similarity.

3 Data sources

The Nmap OS database is built from submissions by users. When Nmap fails to identify an operating system it prints a fingerprint and asks the user, if the OS is known, to submit it to an online form along with a classification. The submissions are integrated into the database a few times per year. The raw data for this experiment were 12,531 submissions that had already been integrated into the Nmap database between December 2006 and April 2010 (the training set), and 1,500 that were received between April 2010 and October 2010 but had not been integrated (the testing set). Nmap's second-generation database was introduced in June 2006, so these are almost all the submissions available. It is necessary to go far into the past to counteract the bias that otherwise exists with the submission system: once an operating system is reliably matched, it becomes less likely to be submitted in the future.

Many submissions have to be discarded because they are of low quality or have uncertain identification. All submissions marked by Nmap as being suspect (using the pseudo-test SCAN.G=N) were removed. During manual database integration, submissions for common operating systems must include the output of winver.exe or uname or equivalent, to check that the submitter actually has access to the operating system ostensibly being identified. A robotic classification program did pattern matching against this uname output to put submissions into one of seven OS families; those that could not be positively identified were discarded. A human doing database integration will discard on the order of 40% of submissions, and this process is intended to be even more conservative.
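The robotic classification step is essentially a set of pattern matches against the submitted uname (or winver.exe) output. A minimal Python sketch of the idea follows; the patterns are hypothetical simplifications for illustration, not the rules actually used, which were more careful about ambiguous or mangled submissions:

    import re

    # Hypothetical uname/winver patterns for the seven families used here.
    FAMILY_PATTERNS = [
        ("Linux", r"\bLinux\b"),
        ("Windows", r"\bWindows\b"),
        ("FreeBSD", r"\bFreeBSD\b"),
        ("Darwin", r"\bDarwin\b"),
        ("SunOS", r"\bSunOS\b"),
        ("OpenBSD", r"\bOpenBSD\b"),
        ("AIX", r"\bAIX\b"),
    ]

    def classify_uname(output):
        """Return an OS family name, or None to discard the submission."""
        for family, pattern in FAMILY_PATTERNS:
            if re.search(pattern, output, re.IGNORECASE):
                return family
        return None  # not positively identified; discard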
These seven OS families were chosen because (1) they are common, and (2) they are easily identified by their uname strings: Linux, Windows, FreeBSD, Darwin (Mac OS X), SunOS (Solaris), OpenBSD, and AIX. As will be discussed in the section called "Procedure," errors (i.e., mistaken submissions) in the training set were only removed as they became apparent, but every submission in the testing set was carefully checked by hand (by cross-checking against the Nmap database) to remove all errors. After this winnowing process, the data broke down as follows:

    OS        # training  # testing
    Linux          3,246        570
    Windows        1,100         71
    FreeBSD          325         15
    Darwin           293          9
    SunOS            143          4
    OpenBSD           79          9
    AIX               74          9
    total          5,260        687

4 Feature selection

An Nmap fingerprint consists of up to 119 tests [1]. They can be of five different classes:

1. Numeric, encoded in hexadecimal. These are mostly fixed-size positive integers taken directly from packet contents, like T1.T (T1 probe TTL). Members of this class: T[1234567].T, ECN.T, U1.T, IE.T, T[1234567].TG, ECN.TG, U1.TG, IE.TG, T[234567].W, WIN.W[123456], ECN.W, T[1234567].RD, SEQ.SP, SEQ.GCD, SEQ.ISR, U1.IPL, U1.UN, U1.RIPL, U1.RID.

2. Enumeration, one of a finite set of symbolic values. Examples are T1.R (responsiveness to the T1 probe), which may be either Y or N, and U1.RIPCK (integrity of the IP checksum), which may be G, Z, or I. Members of this class: T[1234567].R, ECN.R, U1.R, IE.R, T[1234567].DF, ECN.DF, U1.DF, T[1234567].A, T[1234567].S, SEQ.SS, ECN.CC, IE.DFI, U1.RIPCK, U1.RUD.

3. Enumeration or numeric. These may be like either class 1 or class 2. An example is IE.CD (ICMP response code), which may be any of Z, S, or O, or a number like 38 if none of the enumeration values fit. U1.RIPL and U1.RID would appear to be in this class because they may be numeric or have the value G, but G only stands for a specific number so it is more convenient to handle them as class 1. Members of this class: SEQ.TI, SEQ.CI, SEQ.II, SEQ.TS, U1.RUCK, IE.CD.

4. Set. These are marked by the presence or absence of letters from a finite set. T1.F through T7.F (TCP flags) are examples using the set EUAPRSF. Members of this class: T[1234567].F, T[1234567].Q, ECN.Q.

5. TCP options. OPS.O1 through OPS.O6 as well as T2.O through T7.O encode TCP options as strings like M5B4ST11NW7. Members of this class: OPS.O[123456], T[234567].O, ECN.O.

Each test is converted to a fixed-size numeric sub-vector as follows; the concatenation of all the sub-vectors becomes the overall feature vector.

Numeric values map directly to integers.

    T1.T=40 → [64]

Enumerations are converted to a sequence of 0/1 indicators, one for each possible value, according to the advice in [3]. The only exceptions are the Y/N enumerations, which become a single indicator.

    U1.RIPCK=G → [1,0,0]
    U1.RIPCK=Z → [0,1,0]
    U1.RIPCK=I → [0,0,1]
    T1.R=N → [0]
    T1.R=Y → [1]

Enumeration/numeric tests get categorical indicators followed by one element for the numeric value. When an enumerated value is used, the spot for the numeric data is -1. When a numeric value is used, all the categorical indicators are -1.

    IE.CD=Z → [1,0,0,-1]
    IE.CD=S → [0,1,0,-1]
    IE.CD=O → [0,0,1,-1]
    IE.CD=38 → [-1,-1,-1,56]

TCP flags are represented by an indicator for each flag.

    T1.F=AS → [0,0,1,0,0,1,0]
    T1.F=R → [0,0,0,0,1,0,0]

TCP options, because of their more complex structure, were ignored. (See "Continuing work.") Whenever a test was missing, all its elements were assigned -1. The concatenation of all these sub-vectors leads to a feature vector of 240 elements.
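To make the encoding concrete, the following Python sketch implements the per-test conversions described above for a few representative tests. The function names and the MISSING sentinel are illustrative, not taken from the project's actual tools.

    # Illustrative encoders for representative tests. The real tool covers
    # all 119 tests; a missing test maps every element of its sub-vector to -1.

    MISSING = None

    def encode_numeric(value):
        """Class 1: hexadecimal numeric test, e.g. T1.T=40 -> [64]."""
        return [-1] if value is MISSING else [int(value, 16)]

    def encode_enum(value, choices):
        """Class 2: one 0/1 indicator per possible value, e.g.
        U1.RIPCK=Z with choices ("G", "Z", "I") -> [0, 1, 0]."""
        if value is MISSING:
            return [-1] * len(choices)
        return [1 if value == c else 0 for c in choices]

    def encode_enum_or_numeric(value, choices):
        """Class 3: categorical indicators plus one numeric slot, e.g.
        IE.CD=38 with choices ("Z", "S", "O") -> [-1, -1, -1, 56]."""
        if value is MISSING:
            return [-1] * (len(choices) + 1)
        if value in choices:
            return encode_enum(value, choices) + [-1]
        return [-1] * len(choices) + [int(value, 16)]

    def encode_set(value, letters="EUAPRSF"):
        """Class 4: one indicator per letter, e.g. T1.F=AS -> [0,0,1,0,0,1,0]."""
        if value is MISSING:
            return [-1] * len(letters)
        return [1 if c in value else 0 for c in letters]

    # Example: a fragment of the overall feature vector.
    vector = (encode_numeric("40")
              + encode_enum("Y", ("Y",))  # Y/N tests get a single indicator
              + encode_enum_or_numeric("38", ("Z", "S", "O"))
              + encode_set("AS"))
    print(vector)  # [64, 1, -1, -1, -1, 56, 0, 0, 1, 0, 0, 1, 0]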
5 Procedure

LIBSVM 3.0 [2] is the support vector machine implementation used. The training of the SVM followed the recommendations in [3]. Specifically, the training set was first converted to a set of feature vectors. The elements of each vector were scaled to lie in the range [-1,1]. Using the RBF kernel K(x1, x2) = exp(-γ||x1 - x2||^2), grid search found good values of C and γ using five-fold cross validation. All this is encapsulated in the easy.py and grid.py scripts that come with LIBSVM.

The feature vector and training set were refined over eight iterations. After each of the first few iterations, each incorrectly classified submission from the training set was checked for plausibility by matching it against the Nmap database. The matching program shows the ten closest matches, and for common operating systems all ten are usually variations of the same OS family. It would be surprising, for example, for a Linux result to be in the top ten for a Windows fingerprint. Submissions that appeared to be wrong were excluded from later iterations. 43 submissions were excluded in this way. All 1,500 testing submissions were individually checked for correctness; 11 of these were excluded. In a typical example of a submission that was manually excluded, one submitter claimed that the operating system was "Linux 2.6.34." However, when cross-checked using Nmap, the nearest matches were

    96% Microsoft Windows XP Professional SP2
    96% Microsoft Windows XP SP2 - SP3
    96% Microsoft Windows XP SP3
    95% Microsoft Windows XP SP2
    94% Microsoft Windows Server 2003 SP2

The submitter likely scanned the wrong machine by mistake, or sent the classification of the scanning host instead of the one being scanned.

After the final iteration, the SVM correctly classified 5,237 of 5,260 training examples (99.5%). Training and cross-validation of the final iteration took about 90 minutes with easy.py. Experiments during the earlier iterations included using integers for enumeration tests instead of categorical indicators, using two-dimensional vectors for binary features instead of one-dimensional, (accidentally) leaving out a couple of tests, and artificially adjusting fingerprints to correct a byte-ordering bug present in older versions of Nmap. The training and testing accuracy never dropped below 97%, so these changes are likely not of much consequence. (The byte order fix was kept.)
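For concreteness, the workflow above can be reproduced with the Python interface (svmutil) that ships with LIBSVM. The following is a minimal sketch, not the project's actual driver: the experiment used the easy.py and grid.py scripts, and the file names and the C and γ values shown here are placeholders rather than the values the grid search selected.

    # Sketch of the training procedure using LIBSVM's Python interface.
    from svmutil import svm_read_problem, svm_train, svm_predict

    # Feature vectors, already scaled to [-1, 1], in LIBSVM's sparse
    # text format: "<label> <index>:<value> <index>:<value> ...".
    y_train, x_train = svm_read_problem("train.scaled")
    y_test, x_test = svm_read_problem("test.scaled")

    # RBF kernel (-t 2); -v 5 reports five-fold cross-validation
    # accuracy for a candidate (C, gamma) pair during grid search.
    svm_train(y_train, x_train, "-t 2 -c 8 -g 0.125 -v 5")

    # Train the final model with the chosen parameters and classify
    # the testing set.
    model = svm_train(y_train, x_train, "-t 2 -c 8 -g 0.125")
    labels, stats, values = svm_predict(y_test, x_test, model)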
6 Results

The SVM classified 685 of 687 testing examples correctly (99.7%). Both errors were Darwin misclassified as FreeBSD. Nmap's detection engine, taking the best match and accepting any match at 80% confidence or more, classified 683 of 687 correctly (99.4%). The errors were AIX misclassified as an embedded messaging server, AIX as Darwin, Darwin as FreeBSD, and Windows as a wireless access point. (There were also 10 instances of Linux classified as "embedded," but on inspection all of these embedded devices do appear to run Linux.) If 90% or greater confidence is required of Nmap, there is only one mistaken classification, AIX as Darwin, but 26 submissions have no result at all, for a total accuracy of 660 of 687 (96.1%).

7 Continuing work

These results are promising, but it must be kept in mind that this particular SVM will do well only when the operating system is known a priori to be in one of the seven selected families. Unlike Nmap's matching algorithm, it will always return a result in one of these families, even if the fingerprint is unlike anything seen before. A practical matching algorithm will have to be aware of novel fingerprints and at least avoid returning a very uncertain answer.

It is limiting to have to declare possible OS matches in advance as is done here. "Windows" is unsatisfying when the technology can reliably distinguish between "Windows 2000 SP4" and "Windows XP SP1." The authors of [4] used a hierarchy of classifiers, so once "Windows" was known they could get a more specific answer by turning to a specialized Windows classifier. Even this does not account for the many undocumented "embedded" operating systems known only by the hardware they run on, such as

    Polycom SoundPoint 500 or 601 IP phone
    W&T Web-IO Thermometer model 57101
    Xerox WorkCentre Pro C2128 printer

One possibility, as is done in [4], is to put all such miscellaneous fingerprints into a single category.

TCP options are a great source of variation and should be part of the feature vector. The references show some ways of doing this.

The source code of the tools developed for this project is at http://www.bamsoftware.com/stanford/cs229/.

References

[1] Gordon "Fyodor" Lyon. Nmap Network Scanning. Chapter 8, "Remote OS Detection" (2009). http://nmap.org/book/osdetect.html.

[2] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[3] Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin. "A Practical Guide to Support Vector Classification." http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf.

[4] Carlos Sarraute and Javier Burroni. "Using Neural Networks to improve classical Operating System Fingerprinting techniques." Electronic Journal of SADIO, vol. 8, no. 1, pp. 35–47 (2008). http://www.coresecurity.com/files/attachments/Sarraute_EJS.pdf.

[5] Zaid Aiman. "Ideas: Nmap Fingerprint Analyzer." nmap-dev mailing list (2008). http://seclists.org/nmap-dev/2008/q2/2.

[6] João Medeiros et al. "Automating Security Tests for Industrial Automation Devices Using Neural Networks." IEEE Conference on Emerging Technologies and Factory Automation (2007). "Nmap OS Database and Artificial Neural Networks." nmap-dev mailing list (2008). http://seclists.org/nmap-dev/2008/q1/325.