Bug in mdadm?

Hi,

I have run into something that seems to be a bug in mdadm and/or in the kernel (2.4.20).

I wanted to create a RAID 5 with 6 disks with mdadm:

# mdadm --create /dev/md0 --level=5 --chunk=256 --raid-devices=6 --spare-devices=0 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/hdg /dev/hde

Despite explicitly specifying SIX disks and NO spares, it created an array of
SEVEN devices, with ONE spare and ONE missing/failed+removed?!

The output from 'mdadm -D' can be found at the end of this message. This is very
likely a bug in mdadm.
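
In case it helps with reproducing or inspecting this, the superblock as recorded
on each member disk can presumably be dumped with something along these lines
(output not included here; device names as in the create command above):

# mdadm --examine /dev/sdc
# mdadm --examine /dev/hde

--examine reads the md superblock straight off the listed device, so it should
show how each disk was classified (active, spare, ...) at creation time.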

Now, in case this was just a quirk of some other kind, I went ahead anyway:
created a reiserfs, mounted it, fiddled around a bit, and then tested the
redundancy by marking a drive as failed (a rough sketch of the steps follows).
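
Something along these lines should reproduce it (the mkreiserfs/mount
invocations and the mount point are only illustrative, not the literal
commands):

# mkreiserfs /dev/md0
# mount /dev/md0 /mnt/test
# cp -a /usr/src /mnt/test

followed by marking one of the drives as failed: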

# mdadm /dev/md0 -f /dev/sdf

After this, all processes accessing the filesystem go into disk sleep.
(Had things been working, I would have expected the array to go into degraded
mode, with the filesystem still accessible.)
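
A quick way to check the array state and spot the stuck processes would be
something like the following; the awk pattern simply picks out processes in
uninterruptible 'D' sleep:

# cat /proc/mdstat
# mdadm -D /dev/md0 | grep -i state
# ps axl | awk '$10 ~ /D/'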

In the logs there is an indication of a kernel BUG (see the dump at the very
end of this message).

However, I am no software RAID expert, and this might just be the result of
severe misuse or misunderstanding of the tools...

//Tapani



* * * * *  MISCONFIGURED ARRAY ?  * * * * *




# mdadm --create /dev/md0 --level=5 --chunk=256 --raid-devices=6 --spare-devices=0 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/hdg /dev/hde

# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.00
  Creation Time : Mon Jul  7 13:45:47 2003
     Raid Level : raid5
     Array Size : 603136000 (575.20 GiB 617.61 GB)
    Device Size : 120627200 (115.04 GiB 123.52 GB)
   Raid Devices : 6
  Total Devices : 7
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Jul  7 13:45:47 2003
          State : dirty, no-errors
 Active Devices : 5
Working Devices : 6
 Failed Devices : 1
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 256K

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       48        1      active sync   /dev/sdd
       2       8       64        2      active sync   /dev/sde
       3       8       80        3      active sync   /dev/sdf
       4      34        0        4      active sync   /dev/hdg
       5       0        0        5      faulty
       6      33        0        6        /dev/hde
           UUID : faa0f80f:a5bacb7f:1caf43d5:d0147d81
         Events : 0.1
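
Side note: if this spare-plus-failed layout is actually mdadm's initial-resync
optimisation for RAID5 rather than a bug, then adding --force to the create
command is supposed (at least according to the man page of later mdadm
versions) to make all six devices active from the start; I have not verified
that this applies to the version used here:

# mdadm --create /dev/md0 --level=5 --chunk=256 --raid-devices=6 --spare-devices=0 --force /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/hdg /dev/hde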





* * * * *  KERNEL BUG ?  * * * * *




# mdadm /dev/md0 -f /dev/sdf
mdadm: set /dev/sdf faulty in /dev/md0

From the logs:

# dmesg

...

raid5: Disk failure on sdf, disabling device. Operation continuing on 4 devices
md: updating md0 RAID superblock on device
md: hde [events: 00000002]<6>(write) hde's sb offset: 120627264
md: hdg [events: 00000002]<6>(write) hdg's sb offset: 120627264
md: md_do_sync() got signal ... exiting
RAID5 conf printout:
 --- rd:6 wd:4 fd:2
 disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdc
 disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdd
 disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sde
 disk 3, s:0, o:0, n:3 rd:3 us:1 dev:sdf
 disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdg
 disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
RAID5 conf printout:
 --- rd:6 wd:4 fd:2
 disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdc
 disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdd
 disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sde
 disk 3, s:0, o:0, n:3 rd:3 us:1 dev:sdf
 disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdg
 disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
md: recovery thread finished ...
md: recovery thread got woken up ...
md0: resyncing spare disk hde to replace failed disk
RAID5 conf printout:
 --- rd:6 wd:4 fd:2
 disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdc
 disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdd
 disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sde
 disk 3, s:0, o:0, n:3 rd:3 us:1 dev:sdf
 disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdg
 disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
RAID5 conf printout:
 --- rd:6 wd:4 fd:2
 disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdc
 disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdd
 disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sde
 disk 3, s:0, o:0, n:3 rd:3 us:1 dev:sdf
 disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdg
 disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 100 KB/sec/disc.
md: using maximum available idle IO bandwith (but not more than 100000 KB/sec) for reconstruction.
md: using 124k window, over a total of 120627200 blocks.
md: md_do_sync() got signal ... exiting
RAID5 conf printout:
 --- rd:6 wd:4 fd:2
 disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdc
 disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdd
 disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sde
 disk 3, s:0, o:0, n:3 rd:3 us:1 dev:sdf
 disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdg
 disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
RAID5 conf printout:
 --- rd:6 wd:4 fd:2
 disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdc
 disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdd
 disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sde
 disk 3, s:0, o:0, n:3 rd:3 us:1 dev:sdf
 disk 4, s:0, o:1, n:4 rd:4 us:1 dev:hdg
 disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
md: recovery thread finished ...
md: (skipping faulty sdf )
md: sde [events: 00000002]<6>(write) sde's sb offset: 120627264
md: sdd [events: 00000002]<6>(write) sdd's sb offset: 120627264
md: sdc [events: 00000002]<6>(write) sdc's sb offset: 120627264
journal-601, buffer write failed
kernel BUG at prints.c:334!

invalid operand: 0000
CPU:    0
EIP:    0010:[<c01aa4b8>]    Not tainted
EFLAGS: 00010282
eax: 00000024   ebx: f7b3b000   ecx: 00000012   edx: ef66ff7c
esi: 00000000   edi: f7b3b000   ebp: 00000003   esp: f7bd3ec0
ds: 0018   es: 0018   ss: 0018
Process kupdated (pid: 7, stackpage=f7bd3000)
Stack: c02bade6 c0355ce0 f7b3b000 f8d264ec c01b584a f7b3b000 c02bc900 00001000
       eecbef80 00000006 00000004 00000000 ee621e40 00000000 00000008 ecb5c000
       00000004 c01b9991 f7b3b000 f8d264ec 00000001 00000006 f8d2f58c 00000004
Call Trace:    [<c01b584a>] [<c01b9991>] [<c01b8ba4>] [<c01a7240>] [<c0141d0a>]
  [<c0140e14>] [<c014118d>] [<c0105000>] [<c0105000>] [<c01058ce>] [<c0141090>]

Code: 0f 0b 4e 01 ec ad 2b c0 85 db 74 0e 0f b7 43 08 89 04 24 e8
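
For what it's worth, on a 2.4 kernel the EIP and call trace above can
presumably be resolved to symbol names with ksymoops and the matching
System.map (paths below are only illustrative):

# dmesg > /tmp/oops.txt
# ksymoops -m /boot/System.map < /tmp/oops.txt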


