Bugtraq mailing list archives

Re: Sun M-class hardware denial of service


From: "B 650" <dunc.on.usenet () googlemail com>
Date: Tue, 9 Sep 2008 22:04:14 +0100

On Tue, Sep 9, 2008 at 8:42 PM, Theo de Raadt <deraadt () cvs openbsd org> wrote:
While having to power cycle the remainder of the frame may be a pain, the
fact it isolates the fault to only power off the affected domain suggests to
me that it is working as designed (the relative virtue of the design not up
for debate).  The power cycle of the remainder of the frame can be done at
your leisure.

Didn't you read the advisory?

You don't get any crashed domain back until you power cycle the entire
machine.  If you need that domain back, you have to make a very nasty
choice.  It is a denial of service.

It is for this reason I would not class this as a DoS attack,
as the "attacker" could not affect the availability of the other domains,
only the admin could.

The admin is forced to choose between "bring the crashed domain back
now by calling Sun and then powering the whole machine down" and
"accept that the crashed domain is down until you call Sun and power
the whole machine down".  How is that not a denial of service?  Do you
work for Sun?

If that is not a denial of service, I don't know what is.

Firstly, I don't work for Sun.

I apologise if I'm misunderstanding you, but it seems to me that this
issue can only be initiated by a privileged user on a domain.  The
only system immediately affected is that particular domain.  The
removal of service from the other domains is a system/service
management decision, rather than an exploit of some kind.   That's why
I don't view it as a DoS vulnerability.  If you exploit this on your
own domain, which then becomes unavailable, then, frankly, tough.  You
wait until the frame administrators choose to power cycle the other
domains to bring you back.

You stated in your original message that this is a high-end frame, of
the kind generally used by financial institutions etc.  I would
imagine any system which warrants this kind of hardware would have
some level of redundancy or DR.


You don't state what privileges are required on the affected domain to
initiate the fault.

This was very obvious from the advisory.

If this is executable by unprivileged users, then I
would agree with you that this represents a DoS issue for *that domain*.

Have you ever used vmware?  I don't see how Sun domains are supposed
to be any different from vmware in that case.  Obviously you are
handing sub-admins control over a domain so that they can run any OS
they need to.  Their hardware isolation is not supposed to be a joke.  It
is serious stuff.  It has to work.

Obviously you expect that what a sub-admin does in his domain should
not affect the rest of your machine; i.e. force you to power it off.
But that is exactly what is required -- a power off of the whole chassis
and all the existing domains.  You cannot even do the equivalent of VMotion.

The admin eventually MUST take all his other domains down.  That is a
denial of service.

I view hardware isolation as not being able to do something in one
domain which can affect processes running in another domain.  I see
nothing here which will do that.  As I said above,  the removal of
service from other domains is a management decision.


It sounds like the XSCF is monitoring the domain for certain events, and
mistaking legitimate operation for one of these events which leads it to
disable a component in the domain.  While I haven't worked with the M-class
systems, I have some experience with the F15K/E25K range, and it sounds like
the XSCF is blacklisting some component (likely a system board).  Requiring
a power cycle of the whole frame to clear a fault with a single (or even
multiple) components is fairly poor, the most I would expect is to power
cycle the domain components.

That's what you expect, but that is not what happens.  It requires a
service call out to Sun to repair the machine, followed by a power
down of the entire machine.  That is just "fairly poor"??  What the
hell do you think people are paying so much money for?  A complete
illusion of reliability??


I don't disagree that this appears to be a bug of some kind in the
error handling by the system controller, what I'm arguing is that it
is not a DoS vulnerability, as the attacker cannot immediately
precipitate a lack of service for any system other than the one over
which they have administrative control.

I'm not surprised you didn't get any interest from Fujitsu/Sun security
people, for the reasons stated above.  As for engineering, I would expect
they will only address the issue if they see a commercial or reputational
benefit in doing so (i.e. someone wants to spend a *lot* of money on
hardware to run OpenBSD, and this issue is a show-stopper).

As the advisory made clear, we are certain that someone could write a
Solaris kernel module that would trigger this same behaviour.

Yet you don't know what it is that causes the issue?  What's Sun's
support arrangement for OpenBSD on SPARC?  If it is reproduced in
Solaris, then I'm sure Sun would address it, but where is the benefit
for them to do so at present?


In other words, please learn to read.  You'll get further in life.


Thanks for the tip, I'd never thought of that.....

