On Oct 26,  7:25am, Neil Brown wrote:
} Subject: Re: RAID5 refuses to accept replacement drive.

Hi Neil, hope your week is going well, thanks for the reply.

> > Environment:
> >     Kernel: 2.4.33.3
> >     MDADM:  2.4.1/2.5.3
> >     MD:     Three drive RAID5 (md3)

> Old kernel, new mdadm.  Not a tested combination unfortunately.  I
> guess I should try booting 2.4 somewhere and try it out...

Based on what I found, it's probably an old library issue as much as
anything.  More below.

> > Drives were shuffled to get the machine operational.  The machine came
> > up with md3 degraded.  The md3 device refuses to accept a replacement
> > partition using the following syntax:
> >
> > mdadm --manage /dev/md3 -a /dev/sde1
> >
> > No output from mdadm, nothing in the logfiles.  Tail end of strace is
> > as follows:
> >
> > open("/dev/md3", O_RDWR)                = 3
> > fstat64(0x3, 0xbffff8fc)                = 0
> > ioctl(3, 0x800c0910, 0xbffff9f8)        = 0

> Those last to lines are a called to md_get_version.
> Probably the one in open_mddev

> > _exit(0)                                = ?

> But I can see no way that it would exit...
>
> Are you comfortable with gdb?
> Would you be interested in single stepping around and seeing what path
> leads to the exit?

My apologies for not being quicker on the draw, I should have gone
grovelling with gdb first.

The problem appears to be due to what must be a broken implementation
of getopt_long in the version of the installed C library.  Either that
or the reasonably complex.... :-) option parsing in mdadm is tripping
it up.

As I noted before, the following syntax fails:

    mdadm --manage /dev/md3 -a /dev/sde1

After poking around a bit and watching the option parsing in gdb I
noticed that the following syntax should work:

    mdadm /dev/md3 -a /dev/sde1

I tried the latter command outside of GDB and things worked perfectly.
The drive was added to the RAID5 array and synchronization proceeded
properly.

I then failed out a drive element on one of the other MD devices on
the machine and was able to repeat the problem.
The following refused to work:

    mdadm --manage /dev/md1 -a /dev/sdb2

While the following worked:

    mdadm /dev/md1 -a /dev/sdb2

The getopt_long function is not picking up on the fact that -a should
have optarg set to /dev/sdb2 when the option is recognized.  Instead
optarg is set to NULL and devs_found is left at 1 rather than 2.  That
results in mdadm simply exiting without saying anything.

I know the 1.x version of mdadm we were using before processed the
'mdadm --manage' syntax properly.  This must have been the first time
we had to add a drive element back into an MD device since we upgraded
mdadm.

I would be happy to chase this a bit more or send you a statically
linked binary if you want to see what it is up to.  At the very least
it may be worthwhile to issue a warning message on exit if mdadm has
an MD device specification, a mode specification and no devices.

I remember trying to build a statically linked copy of mdadm with
dietlibc and ran into option parsing problems.  The resultant binary
would always exit complaining that a device had not been specified.  I
remember the dietlibc documentation noting that the GNU folks had an
inconsistent world view when it came to getopt processing
semantics... :-)

I suspect there is a common thread involved in both cases.

> NeilBrown

Hope the above is useful.  Let me know if you have any
questions/issues.

Happy Halloween.

Greg

}-- End of excerpt from Neil Brown

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@xxxxxxxxxxxx
------------------------------------------------------------------------------
"Fools ignore complexity.  Pragmatists suffer it.  Some can avoid it.
 Geniuses remove it."
                                -- Perliss' Programming Proverb #58
                                   SIGPLAN National, Sept. 1982
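For anyone who wants to poke at the getopt_long behaviour described above without a full mdadm build, here is a minimal sketch in C.  It is not mdadm's code.  The option table, the option string "-Ma::", the parse_result structure and both function names are hypothetical and were made up for this illustration.  It assumes a getopt_long with GNU semantics, where a leading '-' in the option string returns each non-option argument as code 1, and where an optional argument is only taken when it is attached to the option (-afoo), so "-a /dev/sde1" leaves optarg NULL and hands the device back as a separate non-option.  Whether that is what happened on the 2.4 system is not established here.  The second function shows the kind of warning suggested above for the case where a mode and an MD device are given but no member device.

```c
#include <assert.h>
#include <getopt.h>
#include <stdio.h>

/* Hypothetical result record for this sketch; not an mdadm structure. */
struct parse_result {
    int manage_seen;      /* 1 if --manage / -M was recognized         */
    int add_seen;         /* 1 if --add / -a was recognized            */
    const char *add_arg;  /* optarg delivered with -a, NULL if none    */
    int devs_found;       /* number of non-option arguments (devices)  */
};

/* Hypothetical option table; -a is declared with an OPTIONAL argument. */
static const struct option long_opts[] = {
    {"manage", no_argument,       NULL, 'M'},
    {"add",    optional_argument, NULL, 'a'},
    {NULL,     0,                 NULL, 0}
};

/* Parse argv and record what getopt_long reported for each element. */
struct parse_result parse_args(int argc, char **argv)
{
    struct parse_result r = {0, 0, NULL, 0};
    int opt;

    assert(argc >= 1);
    optind = 0;   /* assumption: glibc/musl treat 0 as "rescan from start" */
    opterr = 0;   /* keep getopt itself quiet */

    /* Leading '-' asks for non-options to be returned in order as code 1. */
    while ((opt = getopt_long(argc, argv, "-Ma::", long_opts, NULL)) != -1) {
        switch (opt) {
        case 1:                    /* a plain argument, e.g. /dev/md3 */
            r.devs_found++;
            break;
        case 'M':
            r.manage_seen = 1;
            break;
        case 'a':
            r.add_seen = 1;
            r.add_arg = optarg;    /* NULL unless written as -aARG */
            break;
        default:                   /* unknown option: ignored in this sketch */
            break;
        }
    }
    return r;
}

/* Return 1 and print a warning when an add was requested but fewer than
 * two devices (the array plus at least one member) were seen. */
int warn_if_no_devices(const struct parse_result *r, FILE *out)
{
    if (r->add_seen && r->devs_found < 2) {
        fprintf(out, "warning: array and action given, "
                     "but no device to act on\n");
        return 1;
    }
    return 0;
}
```

Calling parse_args on the failing command line and printing add_arg and devs_found on the affected machine would show directly whether the library drops the trailing device.  A silent exit could then be replaced by a call along the lines of warn_if_no_devices before returning.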