Re: Can extremely high load cause disks to be kicked?

On 5/31/2012 3:31 AM, Andy Smith wrote:

> Now, is this sort of behaviour expected when under incredible load?
> Or is it indicative of a bug somewhere in kernel, mpt driver, or
> even flaky SAS controller/disks?

It is expected that people know what RAID is and how it is supposed to
be used.  RAID exists to protect data in the event of a disk failure
and, secondarily, to increase performance.  That is not how you seem to
be using it.  BTW, I can't fully discern from your log snippets: are
you running md RAID inside the virtual machines, or only on the host
hypervisor?  If the former, problems like this are expected and normal,
which is why it is recommended to NEVER run md RAID inside a VM.

> Controller: LSISAS1068E B3, FwRev=011a0000h
> Motherboard: Supermicro X7DCL-3
> Disks: 4x SEAGATE  ST9300603SS      Version: 0006
> 
> While I'm familiar with the occasional big DDoS causing extreme CPU
> load, hung tasks, CPU soft lockups etc., I've never had it kick
> disks before. 

The md RAID driver didn't kick disks.  It kicked partitions, as this is
what you built your many arrays with.

> But I only have this one server with SAS and mdadm
> whereas all the others are SATA and 3ware with BBU.

Fancy that.

> Root cause of failure aside, could I have made recovery easier? Was
> there a better way than --create --assume-clean?
> 
> If I had done a --create with sdc5 (the device that stayed in the
> array) and the other device with the closest event count, plus two
> "missing", could I have expected less corruption when on 'repair'?
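
For the record, before any forced --create, the first step is to record
what the existing superblocks say.  Something like this, where the
partition names are my guess from your snippets:

  # Dump level, chunk size, data offset, device role, and event count
  # for every member of the affected array
  mdadm --examine /dev/sd[abcd]5

A --create --assume-clean only leaves the data intact if the level,
chunk size, metadata version, data offset, and device order all match
the original, so capture those values before touching anything.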

You could probably expect it to be more reliable if you used RAID as
it's meant to be used, which in this case would be a single RAID10
array built on whole disks, or at most one partition per disk, instead
of 4 or 5 different md RAID arrays built from 4-5 partitions on each
disk.  That is simply silly, and it's dangerous if done inside VMs.
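
Something like this, assuming the four Seagates show up as sdb through
sde on your box (adjust names to match reality):

  # One RAID10 array across the whole disks, no partitions needed
  mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde

Then carve that one array up with a volume manager rather than with
more md arrays.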

md RAID is not meant to be used as a thin provisioning tool, which is
what you seem to have attempted here, and that is almost certainly the
root cause of your problem.

I highly recommend creating a single md RAID array and doing the thin
provisioning on top of it with proper tools/methods.  Or slap a real
RAID card into this host, as you have in the others.  The LSI (3ware)
cards allow the creation of multiple virtual drives per array, each
exposed as a separate SCSI LUN in Linux, which provides simple and
effective thin provisioning.  That is much simpler and more reliable
than doing it all with kernel drivers, daemons, and filesystem tricks
(sparse files mounted as filesystems and the like).
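
If you go the md route, one straightforward way to do that carving is
LVM on top of the single array (thin pools if your lvm2 supports them).
The names below are just examples:

  pvcreate /dev/md0
  vgcreate vg0 /dev/md0
  # Per-guest volumes carved from the one array, instead of a
  # separate md array for each
  lvcreate -L 100G -n guest1 vg0
  lvcreate -L 100G -n guest2 vg0

Each LV shows up as its own block device, much like the per-LUN virtual
drives the 3ware firmware gives you.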

There are a number of scenarios where md RAID is better than hardware
RAID, and vice versa.  Yours is a case where hardware RAID is superior:
no matter how high the host CPU load gets, drives won't be kicked
offline because of it, since the array is managed by a dedicated IO
processor (the same goes for SAN RAID).

-- 
Stan

