Re: Intradomain Traffic Engineering
From: Deepak Jain <deepak () ai net>
Date: Wed, 18 Jan 2006 09:41:26 -0500
1. In the traces I have, there exist several intervals with a huge,
sudden increase of traffic on some links. The prediction model I use
cannot predict those 'big spikes'. Do these 'big spikes' really happen
in operational networks? Or are they merely measurement errors? If they
really happen, is there a gradual ramp up of traffic in smaller time
scale, say, on the order of tens of seconds? Or do these 'big spikes'
really occur very quickly, say, in a few seconds?
2. I have the option to make a tradeoff between average case
performance and worst case performance guarantee, but I don't know which
one is deemed more important by you. Are ISP networks currently
optimized for worst case or average case performance? Is the trade-off
between these two an appealing idea, or may the ISP networks are already
This email covers a lot of issues; perhaps it'll start a discussion.
I think the question depends on how big a core you are talking about.
Excluding local effects (the operator of the network bounces a link or
loses a router, etc.), I doubt that, on a significantly large network,
you have many effects that shift traffic faster than tens of seconds
(the upper bound on this statement is ~30 seconds).
For example, if you "lose" a BGP session, it may take more than 30
seconds for the router to notice it. Once it realizes that it's gone, it
may re-route traffic very rapidly. But it would still take a while (at
least a few seconds for a local link, more for a backbone link) before
that traffic really renormalizes. This has more to do with TCP noticing
packet loss, backing off [only for the traffic that has been affected]
and starting back up. It takes up to half a second to *establish* a
single TCP session on an average latency link.
So, the trick would be to discover the traffic has gone or gone wonky
before the BGP session is dropped. This would allow your algorithm to
back off until a new /normal/ has been established.
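As a rough illustration of the "notice it before the BGP session drops" idea, here is a minimal Python sketch. It is not taken from any real TE system: the function name, the EWMA smoothing factor, the 50% deviation threshold, and the "three consecutive odd samples means a new normal" rule are all assumptions chosen for the example.

```python
def detect_shifts(samples, alpha=0.2, threshold=0.5, settle=3):
    """Flag samples that deviate sharply from a smoothed baseline.

    samples   -- per-interval link load readings (any consistent unit)
    alpha     -- EWMA smoothing factor (assumed value)
    threshold -- fractional deviation from the baseline that counts as a shift
    settle    -- consecutive flagged samples after which the current level
                 is accepted as the new normal (assumed rule)
    Returns the indices of the flagged samples.
    """
    baseline = None
    streak = 0
    flagged = []
    for i, x in enumerate(samples):
        if baseline is None:
            baseline = float(x)  # first reading seeds the baseline
            continue
        deviation = abs(x - baseline) / max(baseline, 1e-9)
        if deviation > threshold:
            flagged.append(i)
            streak += 1
            if streak >= settle:
                # Traffic has stayed at the new level: adopt it as normal.
                baseline = float(x)
                streak = 0
        else:
            streak = 0
            # Ordinary reading: fold it into the running baseline.
            baseline = alpha * x + (1 - alpha) * baseline
    return flagged


if __name__ == "__main__":
    load = [100, 102, 98, 101, 10, 12, 11, 10, 11]
    print(detect_shifts(load))
```

While samples are being flagged, the optimizer would hold off on re-optimizing; once the baseline resets, it can resume against the new level.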
However, talk of traffic engineering and maximum utilization always
comes into vogue when folks want to squeeze more utilization out of their
networks without really spending more money. IMO, the best time to use
TE is when customer-links to your network approach your maximum core
speed [relative here... there is /core/ in your datacenter/pop and there
is /core/ that is your network to the point the packets get handed off
on average]. Often this limit on the operator's core is technology
imposed (though budgetary concerns get in there too).
I think the technology doesn't really exist at a scalable level to
operate for the worst-case scenario, despite what some people may say.
Our traffic measurement/link measurement tools are almost all average...
and "spot" checks are of only marginal value. I would suggest that this
is because of the nature of TCP. If the Internet were UDP based, there
would be a *lot* more flash traffic problems. So those who have a
high amount of UDP traffic (media streamers, DNS hosts, etc.) would have
a very different experience.
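To make the point about averaged measurements concrete, here is a small Python sketch that compares the average and the peak inside one polling window. The 300-second window and the traffic figures are invented for illustration, not taken from any real trace.

```python
def window_stats(per_second_mbps, window):
    """Return (average, peak) for each consecutive window of samples."""
    stats = []
    for start in range(0, len(per_second_mbps), window):
        chunk = per_second_mbps[start:start + window]
        stats.append((sum(chunk) / len(chunk), max(chunk)))
    return stats


if __name__ == "__main__":
    # Hypothetical link: 290 s at 400 Mb/s, then a 10 s burst at 1000 Mb/s.
    trace = [400] * 290 + [1000] * 10
    for avg, peak in window_stats(trace, 300):
        print(f"average {avg:.0f} Mb/s, peak {peak} Mb/s")
```

A tool that reports only the window average would show this link at roughly 420 Mb/s, with no hint that it briefly ran at 1000 Mb/s.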
I'm not the first person to say it, and I can't remember the first place
I heard it... but I'd suggest that the core is not where TE has the best
benefits. Cores by their nature need to be overengineered. You have
very little flexibility because the demands on them are wide [they need
to handle UDP and TCP, low latency and high latency acceptable
applications with aplomb].
TE belongs to the Customer or non-backbone operating ISP. If I were to
start an ISP where all residential customer connections were 1Gb/s, I
could conceivably have thousands of customers operating without needing
200Gb/s of uplink [assuming that were really feasible for a network with
very little traffic terminating on the network]. By using TE I could
shape my peak traffic needs (MLU) to approach my average. This would
make me a much more desirable customer to sell transit to.
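The arithmetic behind that can be sketched in a few lines of Python. The customer count, uplink size, and load samples below are hypothetical, and the two helper functions are names made up for this example.

```python
def oversubscription_ratio(customers, access_gbps, uplink_gbps):
    """Total sold access capacity divided by uplink capacity."""
    return customers * access_gbps / uplink_gbps


def peak_to_average(samples):
    """Peak load divided by mean load; 1.0 means perfectly flat traffic."""
    return max(samples) / (sum(samples) / len(samples))


if __name__ == "__main__":
    # Hypothetical ISP: 2000 customers at 1 Gb/s behind a 100 Gb/s uplink.
    print(oversubscription_ratio(2000, 1, 100))
    # Hypothetical uplink load samples in Gb/s over a day.
    print(peak_to_average([40, 50, 90, 60]))
```

The closer shaping pushes the peak-to-average ratio toward 1.0, the less uplink has to be bought purely to cover the peak.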
TE, MLU, and other concerns, while well understood by core operators,
aren't well understood by customers. Core operators may eventually need
to push these concerns to customers if backbone link speeds do not stay
far above end-user connection speeds. [on an ICB basis, they are --
whenever you want to buy a few OC-48s in a single POP or an OC-192
customer connection, someone is always going to ask you what your
traffic looks like and when]. This would be easiest to push onto
customers through differential pricing. Enforcement and Analysis of *what* is a
desirable traffic pattern and what financial value that provides is
where we are largely lacking today. Since a customer knows their traffic
and their needs better than a core operator, they would be much better
at enforcing traffic flows/engineering. This is better than a core that
optimizes for its own link utilization instead of one that just tries to
stay as empty as possible for as long as possible.
This is way early in the day for me, so this may not make any sense.