On Fri, 19 May 2006, Mike O'Connor wrote:

> :single CPU Sun microsystems system running solaris7, 8, or 9
> :(haven't tested on 10).  E.g. netra.
> :
> :if you telnet to a local router, disable nagle (on purpose
> :or by accident or whatever - if nagle is turned off), and then
>
> TCP_NODELAY by any other name, I assume.
>
> :ping another device with interpacket delay of 0 and a count
>
> Define what you mean by "interpacket delay".  Are you referring to an
> Ethernet-specific setting, perhaps?  Ethernet's "interpacket gap" is
> really about the gap between Ethernet frames, not IP packets.  Having
> "packet" in the terminology leads people to think it's an IP thing,
> and ranks up there with "collisions" as far as misleading Ethernet
> terminology goes.  Think of it as "interframe gap", or IFG.

Cisco router, extended ping, 0 delay. I was speaking of the Cisco ping.
I should have said 'timeout'. Mea culpa.

> For that matter, define "ping".  You're certainly not talking about
> /usr/sbin/ping, but something that spews out TCP, correct?  It sounds
> like you're hitting the Sun system with a TCP ping stream -from- your
> router, correct?

Running ping on the Cisco to another device (preferably a fast Cisco as
the source and a nice fast interface like a GigE or IP/SONET).

> :of somewhere above 100,000 pings, it will effectively
> :DOS the machine you are telneting from.
> :
> :The machine becomes unusable, will not accept break on console.
> :totally hung.
> :
> :After opening a case with Sun on this issue and going back and
> :forth for 9 months, they have decided that I am manufacturing
> :jabber and the appropriate course of action is to remove the
> :offending device (the router in this case) from the network.
>
> If you're talking IFG...
>
> Having an IFG < 96 "bit times" (where the wall-clock units for bit
> times vary as a function of specific ethernet speed) leads to
> out-of-spec Ethernet frames, which could reasonably be parsed as
> "jabber".
> The too-short IFG could lead to the other node(s) on the ethernet not
> knowing when you've stopped sending any given frame.  In a shared
> ethernet, you can also end up with fun conditions like the "capture
> effect".

Dedicated, switched Ethernet here. It seems to mostly overwhelm the
Sun's interrupt processing, but that's just a theory, since Sun has
decided that the solution is to unplug the machine on the other end.
We're only talking about 14000 packets per second to kill a Netra T1.
I've been able to drive one faster than that via other means without
causing a 'jabber effect'.

> There's no requirement for the networking to that particular interface
> on the Sun to actually work in the face of a too-short IFG or any other
> physical out-of-spec condition.  Now, that doesn't mean the -console-
> should go out to lunch (sounds like you're getting a little too much
> "The Network Is The Computer" :) ), but it's perfectly ok to simply not
> listen or xmit on an ethernet that's chronically out-of-spec.

Indeed. That's my issue: the console should not be hung. The machine
should not require a hard reset. And I do not believe there is an
electrical problem. I'm not doing anything down that low. It's just a
TCP/IP stream, and not an outrageous one at that.

> If Sun were to tweak things so it could detect and log the out-of-spec
> network and react to it by downing the interface, rather than just keep
> listening and accumulating a ton of bogusly-spaced interrupts that bog
> it down, that would seem to be reasonable.  Some Unixes have userspace
> routing daemons that periodically look for network brokenness and will
> ifconfig the interface down.  But, if the system is bogged down quickly
> enough that those processes never get a chance to run, such forms
> of mitigation won't work.
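
For scale, the 96-bit-time IFG and the 14000 packets-per-second figure
can be put in wall-clock terms. The sketch below is only a rough check:
it assumes minimum-size 64-byte frames plus 8 bytes of preamble/SFD
(the usual IEEE 802.3 values), and the link speeds are illustrative,
since the thread does not say what speed the Netra port was running at.
The function names are made up for this example.

```python
# Rough Ethernet timing arithmetic (assumed IEEE 802.3 values).
PREAMBLE_SFD_BYTES = 8   # preamble + start-of-frame delimiter
MIN_FRAME_BYTES = 64     # minimum frame, including FCS
IFG_BITS = 96            # minimum interframe gap, in bit times


def ifg_seconds(link_bps):
    """Wall-clock length of the 96-bit-time gap at a given link speed."""
    return IFG_BITS / link_bps


def max_frames_per_second(link_bps, frame_bytes=MIN_FRAME_BYTES):
    """Highest in-spec frame rate for back-to-back frames of one size."""
    bits_per_frame = (PREAMBLE_SFD_BYTES + frame_bytes) * 8 + IFG_BITS
    return link_bps / bits_per_frame


if __name__ == "__main__":
    observed_pps = 14000  # figure quoted in the thread
    for name, bps in [("10 Mb/s", 10e6), ("100 Mb/s", 100e6), ("1 Gb/s", 1e9)]:
        limit = max_frames_per_second(bps)
        print(f"{name}: IFG = {ifg_seconds(bps) * 1e9:.0f} ns, "
              f"max {limit:.0f} frames/s, "
              f"14000 pps = {100 * observed_pps / limit:.1f}% of that")
```

Under these assumptions, 14000 minimum-size frames per second is below
the in-spec ceiling at every listed speed, so that rate alone does not
require a shortened IFG.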

> Oh as an important side note -- your Sun is set up where it won't hang
> owing to network dependencies if its interface is ifconfig'ed up, but
> the actual network it talks to is offline, right?  Otherwise, you are
> making yourself DoS-prone in a whole lot of ways besides pfutzing with
> out-of-spec ethernets.

Correct. Standalone machine. (Even if it were not, there would still be
a response on console to, e.g., break.)

> :In other words, they refuse to fix the DOS issue under the assertion
> :that it is a physical issue rather than an issue of the OS
> :improperly handling a stream of small TCP packets.
>
> My -suspicion- here is that it's the interrupts that the "stream of
> small TCP packets" generates that is leading to the system hang, but
> it'd take some kernel profiling to understand the specific impact.
> If the only way to generate the particular concentration of network
> interrupts along that ethernet interface involves outright breaking
> the ethernet spec, I can see where Sun rejects this as bogus from a
> -security- perspective.

See, that's where I have trouble. From a security perspective, you'd
want to avoid the DOS via some kind of drop or disable mechanism in the
first place... IMHO.

> Have you tried with, say, tiny UDP packets, without messing around
> with the IFG (and no need to mess with the TCP-only Nagle algorithm)?
> That might hit the interface hard in a way that will show the problem
> without an out-of-spec ethernet (or it might not -- interrupt timing
> attacks can be very "fussy").  Have you tried doing a back-to-back
> configuration with another Sun?  It -could- be the case that only a
> very particular flavor of interrupt load triggers this, that it's
> not a terribly generic problem.

Neither. What is triggering it is the stream of !...!..!!..!..!....!!!
stuff from the Cisco.

> FWIW, there's a number of old (and sometimes not-so-old) ethernet NICs
> that will "seize up" and need "kicking" (typically an ifconfig down/up
> at a Unix OS level) in the face of various flavors of out-of-spec
> events.  No, I don't have a laundry list of such NICs, though I'd
> imagine that the folks at http://iol.unh.edu might.  As long as the OS
> drivers for such NICs don't take the rest of the OS down for the ride
> when the NIC hangs up in the face of the out-of-spec event, it's not a
> big deal DoS in my mind.  If you have some ethernet where it's way too
> easy to propagate out-of-spec ethernet events, fix the ethernet.

I've found the Netra interface to be pretty resilient and robust
(previously).

> :They have closed the escalation, so I am left with no recourse but
> :to report it as a bug to the rest of you.
> :
> :For machines with more than 1 CPU, one cpu becomes bogged down but
> :the other CPU continues to handle OS tasks ok.