On Mon, 23 Apr 2012 23:47:12 +0200 Christoph Nelles <evilazrael@xxxxxxxxxxxxx>
wrote:

> Hello Neil,
>
> First, thanks for the answer. I will happily provide any data or logs if
> it helps you to investigate this problem.
>
> On 23.04.2012 23:00, NeilBrown wrote:
> > This is really worrying. It's about the 3rd or 4th report recently which
> > contains:
> >
> >> Raid Level : -unknown-
> >> Raid Devices : 0
> >
> > and that should not be possible. There must be some recent bug that causes
> > the array to be "cleared" *before* writing out the metadata - and that should
> > be impossible.
> > What kernel are you running?
>
> I switched kernel versions during that server rebuild. The last running
> system was on 3.2.5; then I rebuilt and switched to 3.3.1, and with that it
> crashed. The kernel is vanilla, self-compiled, x86_64.
> mdadm is 3.1.5, self-compiled, too.

Thanks. This suggests that it is a very recently introduced bug, and your
earlier observation that the "update time" correlated with the machine being
rebooted was very helpful.

I believe I have found the problem and have reproduced the symptom. The
sequence I used to reproduce it was a bit forced and probably isn't exactly
what happened in your case. Maybe there is a race condition that can trigger
it as well.

In any case, the following patch should fix the issue, and is strongly
recommended for any kernel to which it applies. I'll send this upstream
shortly.

Of course this doesn't help with your current problem, though at least it
suggests that it won't happen again.

I recall that you said you would be re-creating the array with a chunk size
of 64k. The default has been 512K since mdadm-3.1 in late 2009. Did you
explicitly pass "-c 64" when you created the array? If not, you may need to
use "-c 512".
NeilBrown


diff --git a/drivers/md/md.c b/drivers/md/md.c
index 333190f..4a7002d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8402,7 +8402,8 @@ static int md_notify_reboot(struct notifier_block *this,
 
 	for_each_mddev(mddev, tmp) {
 		if (mddev_trylock(mddev)) {
-			__md_stop_writes(mddev);
+			if (mddev->pers)
+				__md_stop_writes(mddev);
 			mddev->safemode = 2;
 			mddev_unlock(mddev);
 		}
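The effect of the `if (mddev->pers)` guard in the patch can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: `struct array`, `stop_writes()`, `reboot_unpatched()` and `reboot_patched()` are hypothetical stand-ins, and it assumes that an array without a personality may already have its in-memory metadata "cleared" (modelled here as level -1 and 0 devices), so that flushing it at reboot would overwrite good on-disk metadata.

```c
#include <assert.h>  /* used by the example checks that exercise this model */
#include <stddef.h>  /* NULL, size_t */

/* Hypothetical, heavily simplified stand-ins for md's structures.
 * These are NOT the kernel's definitions; they only model the fields
 * relevant to the reboot path. */
struct personality {
	const char *name;
};

struct array {
	const struct personality *pers; /* NULL: no personality, not running */
	int level;          /* in-memory RAID level (-1 models "unknown") */
	int raid_disks;     /* in-memory member count (0 once cleared) */
	int sb_level;       /* what the on-disk superblock records */
	int sb_raid_disks;
};

/* Models a superblock update: copy the in-memory state to "disk". */
static void stop_writes(struct array *a)
{
	a->sb_level = a->level;
	a->sb_raid_disks = a->raid_disks;
}

/* Pre-patch behaviour in this model: every array is flushed,
 * whether or not it is running. */
static void reboot_unpatched(struct array *arrays, size_t n)
{
	for (size_t i = 0; i < n; i++)
		stop_writes(&arrays[i]);
}

/* Patched behaviour in this model: only arrays with a personality
 * are flushed; others keep their existing on-disk metadata. */
static void reboot_patched(struct array *arrays, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (arrays[i].pers)
			stop_writes(&arrays[i]);
}
```

In this model, for an array with no personality whose in-memory fields are already cleared, `reboot_unpatched()` overwrites the stored level and device count with -1 and 0, which has the same shape as the "-unknown-" / "Raid Devices : 0" reports, while `reboot_patched()` leaves them untouched. A running array is flushed either way.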