Hi Stan,

Thanks for your detailed reply.

On Sun, Jun 03, 2012 at 01:49:23AM -0500, Stan Hoeppner wrote:
> On 6/2/2012 10:30 PM, Andy Smith wrote:
> > md0 is mounted as /boot
> > md1 is used as swap
> > md2 is mounted as /
>
> What's the RAID level of each of these and how many partitions in each?

$ cat /proc/mdstat
Personalities : [raid1] [raid10]
md3 : active raid10 sdd5[0] sdb5[3] sdc5[2] sda5[1]
      581022592 blocks 64K chunks 2 near-copies [4/4] [UUUU]

md2 : active raid10 sdd3[0] sdb3[3] sdc3[2] sda3[1]
      1959680 blocks 64K chunks 2 near-copies [4/4] [UUUU]

md1 : active raid10 sdd2[0] sdb2[3] sdc2[2] sda2[1]
      1959680 blocks 64K chunks 2 near-copies [4/4] [UUUU]

md0 : active raid1 sdd1[0] sdb1[3] sdc1[2] sda1[1]
      489856 blocks [4/4] [UUUU]

unused devices: <none>

> > Even though it was a guest that was attacked, the hypervisor still
> > had to route the traffic through its userspace and its CPU got
> > overwhelmed by the high packets-per-second.
>
> How many cores in this machine and what frequency?

It is a single quad-core Xeon L5420 @ 2.50GHz.

> > would I not just have got I/O errors on the single md device that
> > everything was running from, causing instant crash?
>
> Can you post the SCSI/ATA and md errors?

Sure. I posted an excerpt in the first email, but here is a fuller
example. Actually it's pretty big, so I've put it at
http://paste.ubuntu.com/1022219/

At this point the logs stop because /var is an LV out of md3. I'm
moving remote syslog servers further up my priority list...

If there hadn't been a DDoS attack at the exact same time then I'd
have considered this purely a hardware failure, given that
"mptscsih: ioc0: attempting task abort! (sc=ffff8800352d80c0)" is the
very first thing of interest. But the timing is too coincidental.

It's also been fine since, including a fairly I/O-intensive backup job
and yesterday's "first Sunday of the month" sync_action.
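The /proc/mdstat output above packs each array's health into two fields: `[n/m]` (member slots the array should have / members currently working) and a per-slot status string where `U` is an in-sync member and `_` a missing or failed one. A minimal sketch of reading those fields, assuming the active-array layout shown above (header line, then a line ending in `[n/m] [UU..]`); inactive arrays and other layouts are not handled, and the class and function names are hypothetical:

```python
import re
from dataclasses import dataclass


@dataclass
class MdArray:
    name: str
    level: str
    members: list[str]
    expected: int  # first number in "[4/4]": member slots the array should have
    working: int   # second number in "[4/4]": members currently working
    status: str    # e.g. "UUUU"; "_" marks a missing or failed slot

    @property
    def degraded(self) -> bool:
        return self.working < self.expected or "_" in self.status


# Header line of an *active* array, e.g. "md3 : active raid10 sdd5[0] sdb5[3] ..."
HEADER_RE = re.compile(r"^(md\d+)\s*:\s*(\S+)\s+(\S+)\s+(.*)$")
# Health fields on the following line, e.g. "... [4/4] [UUUU]"
HEALTH_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_mdstat(text: str) -> list[MdArray]:
    lines = text.splitlines()
    arrays = []
    for i, line in enumerate(lines):
        header = HEADER_RE.match(line)
        if not header:
            continue  # skips "Personalities", blank lines, "unused devices"
        name, _state, level, rest = header.groups()
        # "sdb5[3](F)" -> "sdb5": strip the slot number and any (F)/(S) flag
        members = [tok.split("[")[0] for tok in rest.split()]
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        health = HEALTH_RE.search(next_line)
        if not health:
            continue  # no "[n/m] [UU..]" field on the next line; not handled here
        expected, working, status = health.groups()
        arrays.append(MdArray(name, level, members, int(expected), int(working), status))
    return arrays


# Two of the arrays from the output above.
SAMPLE = """\
Personalities : [raid1] [raid10]
md3 : active raid10 sdd5[0] sdb5[3] sdc5[2] sda5[1]
      581022592 blocks 64K chunks 2 near-copies [4/4] [UUUU]

md0 : active raid1 sdd1[0] sdb1[3] sdc1[2] sda1[1]
      489856 blocks [4/4] [UUUU]

unused devices: <none>
"""

for a in parse_mdstat(SAMPLE):
    print(a.name, a.level, f"{a.working}/{a.expected}", "DEGRADED" if a.degraded else "ok")
```

In practice `mdadm --monitor` (with MAILADDR set in mdadm.conf) is the usual way to be alerted when a member is kicked; the sketch only shows what the fields mean.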
> > Unfortunately my monitoring of IOPS for this host cut out during the
> > attack and later problems so all I have for that period is a blank
> > graph, but I don't think the IOPS requirement would actually have
> > been that high.
>
> What was actually being written to md3 during this attack? Just
> logging, or something else?

All the VMs would have been doing their normal writing of course, but
on the hypervisor host /usr and /var come from md3. From the logs, the
main thing it seems to be having problems with is dm-1, which is the
/usr LV.

> What was the exact nature of the DDOS attack? What service was it
> targeting? I assume this wasn't simply a ping flood.

It was a flood of short (78-byte) UDP packets from multiple sources to
a single destination: ~300Mbit/s, but the killer was the ~600kpps.
Less than 10Mbit/s made it through to the VM it was targeting.

> > The CPU was certainly overwhelmed, but my main concern is that I
> > am never going to be able to design a system that will cope with
> > routing DDoS traffic in userspace.
>
> Assuming the network data rate of the attack was less than 1000 Mb/s,
> most any machine with two or more 2GHz+ cores and sufficient RAM should
> easily be able to handle this type of thing without falling over.

I really don't think it is easy to spec a decent VM host that can also
route hundreds of thousands of packets per second to guests without a
large budget.

I am OK with the host giving up; I just don't want it to corrupt its
storage. I'm sure it can be done, but the budget probably doesn't
allow it, and temporary problems for all VMs on the host are
acceptable in this case; filesystem corruption isn't.

> Can you provide system hardware details?
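The distinction between bit rate and packet rate above is just arithmetic: bandwidth = packets/s × bytes per packet × 8. A small sketch, assuming the 78 bytes is the whole packet as counted by the monitoring and ignoring the 20 bytes per frame that Ethernet adds on the wire (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap) unless passed explicitly:

```python
ETHERNET_OVERHEAD_BYTES = 20  # 7-byte preamble + 1-byte SFD + 12-byte inter-frame gap


def bits_per_second(pps: float, packet_bytes: int, wire_overhead: int = 0) -> float:
    """Bandwidth consumed by `pps` packets per second of `packet_bytes` each."""
    return pps * (packet_bytes + wire_overhead) * 8


def packets_per_second(bps: float, packet_bytes: int, wire_overhead: int = 0) -> float:
    """Packet rate needed to fill `bps` with packets of `packet_bytes` each."""
    return bps / ((packet_bytes + wire_overhead) * 8)


# The flood described above: ~600kpps of 78-byte packets.
flood_bps = bits_per_second(600_000, 78)
print(f"{flood_bps / 1e6:.1f} Mbit/s")  # 374.4 Mbit/s

# Worst case on a gigabit link: minimum-size 64-byte frames plus wire overhead.
max_gige_pps = packets_per_second(1e9, 64, ETHERNET_OVERHEAD_BYTES)
print(f"{max_gige_pps:,.0f} pps")  # 1,488,095 pps
```

The first figure is the same order as the ~300Mbit/s quoted; the second shows why "less than 1000 Mb/s" says little about load on its own: a gigabit link can carry roughly 2.5 times the flood's packet rate in minimum-size frames, and it is per-packet work that the host has to absorb.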
My original email:

> > Controller: LSISAS1068E B3, FwRev=011a0000h
> > Motherboard: Supermicro X7DCL-3
> > Disks: 4x SEAGATE ST9300603SS Version: 0006

Network: e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k2

The hypervisor itself has access to only 1GB RAM (the rest is
dedicated to guest VMs), which may be rather low; I could look at
boosting that.

The other thing is that the hypervisor and all VM guests share the
same four CPU cores. It may be prudent to dedicate one CPU core to the
hypervisor and let the guests share the other three.

Any other hardware details that might be relevant?

> > I am OK with the hypervisor machine being completely hammered and
> > keeling over until the traffic is blocked upstream on real routers.
> > Not so happy about the hypervisor machine kicking devices out of
> > arrays and ending up with corrupted filesystems though. I haven't
> > experienced that before.
>
> Well, you also hadn't experienced an attack like this before. Correct?

No, they happen from time to time, and I find that a pretty big one
will cripple the host, but I haven't yet seen that cause storage
problems. But as I say, all the other servers are 3ware+BBU.

To be honest I do find I get better price/performance out of a 3ware
setup. This host was an experiment with 10kRPM SAS drives and software
RAID a few years ago, and whilst the performance is decent, the much
higher cost per GB of the SAS drives ultimately makes this uneconomical
for me, as I also have to provide a certain storage capacity. So I
haven't used md RAID for this use case for a couple of years now and
am unlikely to do so in the future anyway.

I do still need to work out what to do with this particular server
though.

> Consider this.
> If the hypervisor was writing heavily to logs, and if
> the hypervisor went into heavy swap during the attack, and the
> partitions in md3 that were kicked reside on disks where the swap array
> and/or / arrays exist, this would tend to bolster my theory regarding
> seek starvation causing the timeouts and kicks.

I have a feeling it was trying to swap a lot, from the repeated
mentions of "swapper" in the logs. The swap partition is md1, which
doesn't feature in any of the logs, but perhaps what is being logged
there is the swapper kernel thread being unable to do anything because
of extreme CPU starvation.

> Or, if this is truly a single CPU/core machine, the core was pegged, and
> the hypervisor kernel scheduler wasn't giving enough time to md threads,
> this may also explain the timeouts, though with any remotely recent
> kernel this 'shouldn't' happen under load.

Admittedly this is an old Debian lenny server running the 2.6.26-2-xen
kernel. Bringing forward the timescale for clearing VMs off it and
doing an upgrade would probably be a good idea.

Although it isn't exactly the hardware setup I would like, it's been a
pretty good machine for a few years now, so I would rather not junk it
if I can reassure myself that this won't happen again.

> > Also still wondering if what I did to recover was the best way to go
> > or if I could have made it easier on myself.
>
> I don't really have any input on this aspect, except to say that if you
> got all your data recovered that's the important part. If you spent
> twice as long as you needed to I wouldn't sweat that at all. I'd put
> all my concentration on the root cause analysis.

Sure, though this was a completely new experience for me, so if anyone
has any tips for better recovery, they would help should I ever face
anything like it again. I do use md RAID in quite a few places (just
not much for this use).

Notably, the server was only recoverable because I had backups of /usr
and /var.
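"swapper" in kernel log lines is also the name of the per-CPU idle task (PID 0), so its appearance is not by itself evidence of swapping. A more direct check is the kernel's cumulative counters `pswpin` and `pswpout` in /proc/vmstat (pages swapped in and out since boot). A minimal sketch of turning two snapshots into a swap rate, assuming a 4 KiB page size and snapshots taken a known interval apart (ideally by something that ships them off-host, since local logging died with md3); the function names and sample numbers are hypothetical:

```python
def parse_vmstat(text: str) -> dict[str, int]:
    """Parse /proc/vmstat-style "name value" lines into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rate(before: str, after: str, interval_s: float,
              page_size: int = 4096) -> dict[str, float]:
    """Swap-in/swap-out throughput in KiB/s between two /proc/vmstat snapshots."""
    a, b = parse_vmstat(before), parse_vmstat(after)
    return {
        key: (b[key] - a[key]) * page_size / 1024 / interval_s
        for key in ("pswpin", "pswpout")
    }


# Illustrative snapshots taken 10 seconds apart (made-up numbers).
before = "pgpgin 1000\npswpin 200\npswpout 300\n"
after = "pgpgin 1500\npswpin 250\npswpout 1300\n"
print(swap_rate(before, after, interval_s=10))
# {'pswpin': 20.0, 'pswpout': 400.0}
```

On a live host the two strings would simply be the contents of /proc/vmstat read N seconds apart; a sustained non-zero `pswpout` during an attack would support the heavy-swap theory, while flat counters would point back towards CPU starvation.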
Some essential files were corrupted, but being able to get them from
backups saved having to do a rebuild.

Actually, corruption of VM block devices was quite minimal -- many of
them just had to replay the journal and clean up a handful of orphaned
inodes. Apparently a couple of them did lose a small number of files,
but these VMs are all run by different admins and I don't have great
insight into it. Anything I could have done to lessen that would have
been useful. They're meant to have backups but meh, human nature...

Cheers,
Andy

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html