Security Incidents mailing list archives

Code Red, Virus Growth, and some misunderstandings


From: Thomas Roessler <roessler () does-not-exist org>
Date: Tue, 7 Aug 2001 20:21:08 +0200

[Note: None of the material in this message is new, but there still seem to be various common misunderstandings about it, so an explanation may be in order.]


I have seen various fascinating predictions and theories on the growth of the Code Red worm(s), including claims that the infected base of computers could grow by a factor of two when the worm was close to saturation. Another claim (by an anonymous Network Associates employee, as quoted by heise online) was that the August outbreak infected more computers than the July incidents. [1]

Finally, on incidents.org, Pastor-Satorras' and Vespignani's paper on "Epidemic Spreading in Scale-Free Networks" has been cited as explaining "that [the] scale-free property of the Internet allows even slowly spreading worms to proliferate quickly". [2]


I believe that all three statements are misunderstandings and misrepresent the actual behaviour of the worms we are currently seeing.


Let's start with the incidents.org claim that worm spread on the Internet can be described by the model of Pastor-Satorras and Vespignani. This model is based on a number of rather simple assumptions which are indeed plausible for traditional computer viruses (and possibly even some worms such as the 1988 Internet worm), but which do not apply to Code Red v2, which was responsible for the July and August outbreaks (as far as we know).

The assumptions are these (quoted almost verbatim from the paper):

1. The probability that a node of the network has  k  connections
  follows a scale-free distribution   P(k) ~ k^(-gamma) .

2. At each time step, each susceptible (healthy) node is infected with
  rate nu if it is connected to one or more infected nodes.

3. At the same time, infected nodes are cured and become susceptible
  again with rate delta.

While the Internet may indeed look as described under 1., and while it may even be perceived like this by computer viruses (which is a very interesting finding!), it looks entirely different from Code Red's point of view.


Indeed, the "infection network" as seen by Code Red is extremely simple: Every susceptible node is connected to every other susceptible node. Every instance of IIS may infect every other instance of IIS.

This model for the "infection network" leads to a rather simple mathematical model which was introduced to the Code Red discussions by Stuart Staniford[3], and which is known as the logistic differential equation. (It was initially introduced in the 19th century as a model for population growth, and has been applied to rats' body weight and the growth of sunflowers, among other things. Just look it up in just about any textbook on ordinary differential equations.)

The model roughly goes like this: Let the number of infected computers be denoted by I, and let the total number of susceptible computers be denoted by N. Assume that, in the initial state, there is a single infected computer, and let K be the number of computers this single infected machine can infect in a time unit. Then, at some later point in time, I infected computers can infect K * I * (N-I)/N computers per time unit which weren't infected before, since only the fraction (N-I)/N of their targets is still uninfected. Thus,

        dI/dt = K * I * (N-I)/N

Rescaling, we can set a = I/N, and arrive at this:

        da/dt = K * a * (1-a)
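
As a minimal sketch of what this equation implies, the rescaled form has the closed-form solution a(t) = 1 / (1 + ((1-a0)/a0) * exp(-K*t)), where a0 is the infected fraction at t = 0. The snippet below evaluates that formula and cross-checks it against a simple forward-Euler integration. The function names and the starting fraction a0 = 1e-5 are assumptions chosen for illustration; only K = 1.6 is a figure quoted in this message.

```python
import math


def logistic_fraction(t, k, a0):
    """Closed-form solution of da/dt = k*a*(1-a) with a(0) = a0."""
    return 1.0 / (1.0 + (1.0 - a0) / a0 * math.exp(-k * t))


def euler_fraction(t, k, a0, steps=100_000):
    """Forward-Euler integration of da/dt = k*a*(1-a), as a cross-check."""
    a = a0
    dt = t / steps
    for _ in range(steps):
        a += k * a * (1.0 - a) * dt
    return a


k = 1.6      # growth constant quoted for July (per time unit)
a0 = 1e-5    # hypothetical initial infected fraction, for illustration only

for t in (0, 4, 8, 12, 16):
    print(t, logistic_fraction(t, k, a0), euler_fraction(t, k, a0))
```

The printed fractions start near a0, grow roughly exponentially while a is small, and then flatten out towards 1.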

Now, the important thing to note is that it's entirely sufficient to look at a rather small network's hits by Code Red v2 in order to determine K (which led Stuart to his results, K = 1.6 for July and K = 0.7 for August). You do not need to know N in order to determine K!
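
Why N drops out can be sketched as follows. While a is still small, da/dt is approximately K * a, so the hits seen by any observer grow like c * exp(K*t), where the unknown constant c absorbs N, the initial infection, and the size of the observed network. The slope of log(hits) against time is then K, whatever c is. The snippet below fits that slope by least squares on made-up data; the function names, the synthetic hit counts, and the two scale factors are hypothetical, and this is not necessarily the fitting procedure used in [3].

```python
import math


def estimate_k(times, hits):
    """Least-squares slope of log(hits) against time."""
    logs = [math.log(h) for h in hits]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def synthetic_hits(times, k, scale):
    """Made-up early-phase hit counts of the form scale * exp(k*t)."""
    return [scale * math.exp(k * t) for t in times]


times = [0, 1, 2, 3, 4, 5]
small_net = synthetic_hits(times, 0.7, 2.0)     # hypothetical small observer
large_net = synthetic_hits(times, 0.7, 500.0)   # hypothetical large observer

# Both fits recover the same K although the absolute counts differ widely.
print(estimate_k(times, small_net), estimate_k(times, large_net))
```

Real hit counts are noisy and leave the exponential regime as a approaches 1, so a fit like this is only meaningful on the early part of the curve.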

However, K can safely be assumed to be proportional to the portion of IIS servers on the net which are susceptible to the worm, which directly leads to the conclusion that the number of susceptible machines was 2.3 times as high in July as it was in August (1.6 / 0.7 is roughly 2.3).

Of course, the question remains how large a became in July: was it actually close to 1, or would the curve have kept growing for days? The model teaches us that we had reached saturation in July. It also teaches us that we reached saturation again last weekend.
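
As a rough illustration of why the model implies saturation within a bounded time, the solution of da/dt = K * a * (1-a) can be inverted for the time at which a given fraction is reached: t = ln( a * (1-a0) / (a0 * (1-a)) ) / K. The snippet below evaluates this for the two values of K quoted above. The starting fraction a0 = 1e-5 and the 99% threshold are assumptions for illustration, not measured figures, and the function name is hypothetical.

```python
import math


def time_to_fraction(a, k, a0):
    """Time at which the logistic curve starting at a0 reaches fraction a."""
    return math.log(a * (1.0 - a0) / (a0 * (1.0 - a))) / k


a0 = 1e-5       # hypothetical initial infected fraction
target = 0.99   # hypothetical threshold taken to mean "saturated"

for label, k in (("July", 1.6), ("August", 0.7)):
    print(label, time_to_fraction(target, k, a0))
```

Because a0 enters only through a logarithm, even a much smaller starting fraction delays saturation by just a few multiples of 1/K.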

Thus, assuming that the worms in July and early August actually were the same, we can derive that the number of infected machines was 2.3 times as high in July as it was in August.

The "contradiction" that the number of infected computers which have been identified in August is considerably higher than in July is readily explained by the fact that the worm is still in its infection phase, while it stopped spreading in July. That is, we are now being hit by a larger portion of infected computers than we were in July, and are able to derive more precise July figures, based on just the timing behaviour of the August infection.


Finally, please note that the August outbreak reached saturation last weekend. Thus, even the more efficient Code Red II which tortures servers right now will NOT infect any computers which weren't infected by one of the other worms before. However, I'd expect Code Red II probes to once again follow a logistic curve.

URLs:

[1] http://www.heise.de/newsticker/data/lab-07.08.01-001/
[2] http://www.incidents.org/diary/diary.php#605
[3] http://www.silicondefense.com/cr/

--
Thomas Roessler                        http://log.does-not-exist.org/
