Re: Kernel BUG

"Dan Williams" <dan.j.williams@xxxxxxxxx> · Thu, 11 Dec 2008 14:23:54 -0700

On Tue, Dec 2, 2008 at 4:38 PM, Neil Brown <neilb@xxxxxxx> wrote:
> Thanks for reporting this.  And sorry for not responding when you
> posted it over a week ago to linux-kernel.  I did see it....
>
> It seems that the in-kernel shutdown process is stopping the md arrays
> before all dirty data is flushed.  I guess that is reasonable as the
> '-n' means "don't sync".  However the kernel keeps flushing out dirty
> data after the shutdown has started and that seems to be the problem.
>
> The fact that it has only recently started happening is a useful clue.
> It would be really helpful to use 'git bisect' to find out which
> change introduced the problem.  That should make it a lot easier to
> understand the cause.
>
> I might try to give this a try, but if you are able to try that too it
> would be very helpful.
>

So I took a look at this and, if I am not mistaken, it appears to be a
problem with how md handles the readonly flag.  It was not a problem
before because md would not force the switch to readonly prior to
2.6.27.  Another test case which reproduces this, without the shutdown
interaction, is:

1/ create a dirty xfs filesystem such that on next mount it will
require recovery
mdadm --create /dev/md0 /dev/sd[a-d] -n 4 -l 5
mkfs.xfs /dev/md0
mount /dev/md0 /mnt/tmp
dd if=/dev/zero of=/mnt/tmp/test &
reboot -fn

2/ Assemble the array, mark it readonly with mdadm, then mark it rw
with blockdev
mdadm -A /dev/md0 /dev/sd[a-d]
mdadm --readonly /dev/md0
blockdev --getro /dev/md0
1
blockdev --setrw /dev/md0
blockdev --getro /dev/md0
0
cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active (read-only) raid5 sda[0] sdd[3] sdc[2] sdb[1]
      234451968 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
        resync=PENDING
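
The mismatch in step 2 (blockdev reports 0 while mdstat still shows
"(read-only)") can be checked from a script.  The following is only a
sketch: md_state is a hypothetical helper name, not part of mdadm, and
it assumes the mdstat line format shown above.

```shell
#!/bin/sh
# md_state NAME: read mdstat-formatted text on stdin and print
#   ro      if the array line carries "(read-only)"
#   rw      if the array is listed without that marker
#   absent  if no line for NAME is found
# Hypothetical helper; assumes lines of the form "md0 : active ...".
md_state() {
    awk -v dev="$1" '
        $1 == dev && $2 == ":" {
            if ($0 ~ /\(read-only\)/) print "ro"; else print "rw"
            found = 1
        }
        END { if (!found) print "absent" }'
}
```

On a live system this would be fed from /proc/mdstat, e.g.
'md_state md0 < /proc/mdstat'.  If that prints "ro" while
'blockdev --getro /dev/md0' prints 0, the block layer and md disagree
about the array, which is the state reached at the end of step 2.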

3/ Mounting the filesystem triggers the bug in md_write_start
mount /dev/md0 /mnt/tmp
Filesystem "md0": Disabling barriers, not supported by the underlying device
XFS mounting filesystem md0
kernel BUG at drivers/md/md.c:5358!

Is there a reason md does not catch the BLKROSET in md_ioctl?  That
seems like a straightforward fix.  What I am not sure how to properly
handle is the window between setting the block device readonly and
marking the mddev readonly.  Seems like ->quiesce() could be put to
use here, but I don't think that completely closes the door on
in-flight writes:

Thread0         Thread1
write1
write2          setro
write3          quiesce
write4          do_md_stop

write1 succeeds, write3 and write4 get EPERM, but write2 can be in
flight after the check but before ->make_request()?
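
To make the suspected window concrete, here is a toy replay of that
interleaving as a plain shell script.  It is a model, not md code: the
variable ro stands in for the readonly state, check_ro for the check
made when a write enters md, and submit for ->make_request().  All of
these names are made up for illustration, and the steps are run in a
fixed order rather than on real threads.

```shell
#!/bin/sh
# Toy model of a check-then-act window.  Nothing here touches md.
ro=0        # stand-in for the readonly state (0 = rw, 1 = ro)
log=""      # records what happened to each write

check_ro() { [ "$ro" -eq 0 ]; }          # passes only while still rw
submit()   { log="$log $1(ro=$ro)"; }    # stand-in for ->make_request()
set_ro()   { ro=1; }                     # Thread1 switching to readonly

# write1: check and submit, nothing interferes.
check_ro && submit write1

# write2: the check passes, Thread1 flips the flag inside the window,
# and the write is submitted anyway.
if check_ro; then
    set_ro
    submit write2
fi

# write3: the check now fails, so the write is rejected.
check_ro && submit write3 || log="$log write3(rejected)"

echo "$log"
```

In this model write2 is recorded with ro=1, i.e. it reaches the submit
step after the flag has changed.  Whether the real code has the same
window depends on what ->quiesce() waits for, which the model does not
capture.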

Thanks,
Dan