Re: Sun single-CPU DOS

Doug Hughes <doug@xxxxxxxxxxxxxx> · Mon, 22 May 2006 17:07:10 -0500 (CDT)

On Fri, 19 May 2006, Mike O'Connor wrote:

> :single CPU Sun microsystems system running solaris7, 8, or 9
> :(haven't tested on 10). E.g. netra.
> :
> :if you telnet to a local router, disable nagle (on purpose
> :or by accident or whatever - if nagle is turned off), and then
>
> TCP_NODELAY by any other name, I assume.
>
> :ping another device with interpacket delay of 0 and a count
>
> Define what you mean by "interpacket delay".  Are you referring to an
> Ethernet-specific setting, perhaps?  Ethernet's "interpacket gap" is
> really about the gap between Ethernet frames, not IP packets.  Having
> "packet" in the terminology leads people to think it's an IP thing,
> and ranks up their with "collisions" as far as misleading Ethernet
> terminology goes.  Think of it as "interframe gap", or IFG.

cisco router. extending ping. 0 delay.
I was speaking of cisco ping.
I should have said 'timeout'. mea culpa.

>
> For that manner, define "ping".  You're certainly not talking about
> /usr/sbin/ping, but something that spews out TCP, correct?  It sounds
> like you're hitting the Sun system with a TCP ping stream -from- your
> router, correct?

running ping on the cisco to another device (preferably a fast
cisco as the source and a nice fast interface like a gige or
a IP/sonet)

>
> :of somewhere above 100,000 pings, it will effectively
> :DOS the machine you are telneting from.
> :
> :The machine becomes unusable, will not accept break on console.
> :totally hung.
> :
> :After opening a case with Sun on this issue and going back and
> :forth for 9 months, they have decided that I am manufacturing
> :jabber and the appropriate course of action is to remove the
> :offending device (the router in this case) from the network.
>
> If you're talking IFG...
>
> Having an IFG < 96 "bittimes (where the wall-clock units for bittimes
> varies as a function of specific ethernet speed) leads to out-of-spec
> Ethernet frames, which could reasonably be parsed as "jabber".  The
> too-short IFG could lead the other node(s) in the ethernet not knowing
> when you've stopped sending any given frame.  In a shared ethernet,
> you can also end up with fun conditions like the "capture effect".

dedicated, switched Ethernet here.
it seems to mostly overwhelm the sun's interupt processing, but
that's just a theory since Sun has decided that the solution is to
unplug the machine on the other end.

We're only talking about 14000 packets per second to kill a netra
T1. I've been able to drive one faster than that via other means
without causing a 'jabber effect'.

>
> There's no requirement for the networking to that particular interface
> on the Sun to actually work in the face of a too-short IFG or any other
> physical out-of-spec condition.  Now, that doesn't mean the -console-
> should go out to lunch (sounds like you're getting a little too much
> "The Network Is The Computer" :) ), but it's perfectly ok to simply not
> listen or xmit on an ethernet that's chronically out-of-spec.
>
indeed. that's my issue, the console should not be hung. The machine
should not require a hard reset. And, I do not believe there is
an electrical problem. I'm not doing anything down that low, It's
just a TCP/IP stream, and, a not outrageous one at at that.

> If Sun were to tweak things so it could detect and log the out-of-spec
> network and react to it by downing the interface, rather than just keep
> listening and accumulating a ton of bogusly-spaced interrupts that bog
> it down, that would seem to be reasonable.  Some Unixes have userspace
> routing daemons that periodically look for network brokenness and will
> ifconfig the interface down  But, if the system is bogged down quickly
> enough where that those processes never get a chance to run, such forms
> of mitigation won't work.
>
> Oh as an important side note -- your Sun is set up where it won't hang
> owing to network dependencies if its interface is ifconfig'ed up, but
> the actual network it talks to is offline, right?  Otherwise, you are
> making yourself DoS-prone in a whole lot of ways besides pfutzing with
> out-of-spec ethernets.
>
correct. standalone mechine. (even if it were not, there would still
be response on console to, e.g. break)

> :In other words, they refuse to fix the DOS issue under the assertion
> :that it is a physical issue rather than an issue of the OS
> :improperly handling a stream of small TCP packets.
>
> My -suspicion- here is that it's the interrupts that the "stream of
> small TCP packets" generates that is leading to the system hang, but
> it'd take some kernel profiling to understand the specific impact.
> If the only way to generate the particular concentration of network
> interrupts along that ethernet interface involves outright breaking
> the ethernet spec, I can see where Sun rejects this as bogus from a
> -security- perspective.
>
See, that's where I have trouble. From a Security perspective, you'd
want to avoid the DOS via some kind of drop or disable mechanism
in the first place... IMHO.

> Have you tried with, say, tiny UDP packets, without messing around
> with the IFG (and no need to mess with the TCP-only Nagle algorithm)?
> That might hit the interface hard in a way that will show the problem
> without an out-of-spec ethernet (or it might not -- interrupt timing
> attacks can be very "fussy").  Have you tried doing a back-to-back
> configuration with another Sun?  It -could- be the case that only a
> very particular flavor of interrupt load triggers this, that it's
> not a terribly generic problem.
>
neither.
What is triggering it is the stream of !...!..!!..!..!....!!!
stuff from the cisco.

> FWIW, there's a number of old (and sometimes not-so-old) ethernet NICs
> that will "seize up" and need "kicking" (typically an ifconfig down/up
> at a Unix OS level) in the face of various flavors of out-of-spec events
> No, I don't have a laundry list of such NICs, though I'd imagine that
> the folks http://iol.unh.edu might.  As long as the OS drivers for such
> NICs don't take the rest of the OS down for the ride when the NIC hangs
> up in the face of the out-of-spec event, it's not a big deal DoS in my
> mind.  If you have some ethernet where it's way too easy to propagate
> out-of-spec ethernet events, fix the ethernet.
>
I've found the Netra interface to be pretty resilient and robust (previously).

> :They have closed the escalation, so I am left with no recourse but
> :to report it as a bug to the rest of you.
> :
> :For machines with more than 1 CPU, one cpu becomes bogged down but
> :the other CPU continues to handle OS tasks ok.
>