Hi Doug,

That is a lot of information; let me try to summarize it, and please let me know if I've missed anything:

1) The default chunk size for raid4/5/6 is changing. This should not be a problem, as we do not specify a chunk size when creating new arrays.

2) The default bitmap chunk size changed. Again not a problem, as we don't use bitmaps in anaconda at the moment.

3) We need to stop creating arrays without a bitmap: we should use a bitmap by default, except when the array will be used for /boot or swap.

   Questions:
   1) What command-line option should we pass to "mdadm --create" to achieve this?

4) We need to start specifying a superblock version, preferably version 1.1.

5) Specifying a superblock version of 1.1 will render systems non-bootable. I assume this only applies to systems which have a raid1 /boot, so I guess that we need to specify a superblock version of 1.1, except when the raid set will be used for /boot, where we should keep using 0.90.

   Questions:
   1) Is the above correct?

6) When creating 1.1 superblock sets we need to pass in:

   --homehost=<hostname>
   --name=<devicename>
   -e{1.0,1.1,1.2}

   Questions:
   1) Currently when creating a set, we do for example:

      mdadm --create /dev/md0 --run --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

      What would this look like with the new mdadm? In particular, what would happen to the /dev/md0 argument?

If we can still specify which minor to use when creating a new array, even though that minor may change after the first reboot, then the amount of changes needed to the installer is minimal and we can likely do this without problems for RHEL-6.

Regards,

Hans

On 11/26/2009 03:59 AM, Doug Ledford wrote:
Please keep me on the Cc: as I'm not on this list.

Upstream recently released mdadm-3.1.1, which I intend to include in Fedora soon. It finally updates three default settings that should have been updated a long time ago.

The default chunk size for raid4/5/6 is now 512K. Anaconda needs to be updated to either leave the default alone or use 512K itself. In the past it has passed in 256K, but extensive performance testing shows that 512K is indeed the sweet spot on pretty much any SATA device. Since SATA is the overwhelming majority of the disks we run on today, its sweet spot should be our default.

It updates the default bitmap chunk to be at least 65536K when using an internal bitmap. Performance tests showed as much as a 10% performance penalty for the old default bitmap chunk (8192K). The new bitmap chunk reduces that performance penalty (although we don't have solid numbers on how much; I'll work on that). However, we've never used a bitmap by default on any arrays we create. That needs to change. The simple logic is this: no bitmap on /boot or any swap partitions, use a bitmap on anything else. If we need a bitmap chunk other than the default, I'll follow up here.

It updates the default superblock format from the old, antiquated, deprecated version 0.90 superblock that we should have quit using years ago to version 1.1. This is the real kicker. Since anaconda has never actively set the superblock metadata version (even though we should have been using 1.1 long ago), it's now going to have to start. The reason is that unless you upgrade machines to use an md raid aware boot loader (such as grub2 for x86, although I have no idea what would work on non-x86 arches), version 1.1 superblocks will render all installs unbootable.
More importantly though, unless the anaconda team decides to blindly set all superblocks back to the old version 0.90 format, this change necessitates not just controlling which version of 1.x superblock we use on any given array, but also changing how we create and name arrays in general.

Version 0.90 superblocks are from back in the day when we thought it was smart/reasonable to name arrays by number and to mount scsi devices in fstab by their /dev/ entry. That day is long gone, dead and buried. We switched filesystems to mount by label so they are immune to device number changes, and similarly version 1.x superblocks totally do away with the preferred-minor field in the superblock. Instead, they have a homehost and a name field that are used to control device *naming*, not numbering, and in a properly running version 1.x superblock system, the device numbers are not guaranteed to be static from boot to boot (although they usually are). This doesn't appear to be much of a problem for dracut, but as an example, I'm attaching the mkinitrd patch I have to apply to an F11 system after every mkinitrd update in order to get initrd images that mount by name properly.

So, those are the major differences. Switching to any of the version 1.x superblocks necessitates that anaconda pass a few arguments that it hasn't in the past. Right now, these are the things anaconda is going to need to start passing in on any mdadm create commands (I don't currently believe it does, but I haven't checked and could be wrong):

--homehost=<hostname>
--name=<devicename>
-e{1.0,1.1,1.2}

In addition, we should start passing the bitmap option as I outlined above. We will also likely need to set the HOMEHOST entry in mdadm.conf and possibly the AUTO entry in mdadm.conf as well.

And this brings me to a different point. Hans asked me to comment on bz537329.
I would suggest people look at my comments there for some additional explanation of why ideas like trying to make things work without mdadm.conf are probably a bad idea. So here are a few additional things that I think are worth taking into consideration.

If an array is listed in mdadm.conf, then *every* item on the array line must match the array or else it will fail to start. This means that ARRAY lines that list attributes which can be changed with mdadm --grow can result in the array failing to be found on the next reboot. Therefore, it would be best if each new ARRAY line we write includes nothing besides the name of the array, the metadata version, and the UUID.

If an array is listed in mdadm.conf, then both the --homehost and --name settings will be overridden by the name in the mdadm.conf file, so do not depend on either having an effect for arrays listed in mdadm.conf.

However, homehost and name are both used heavily any time the array is not listed in mdadm.conf, so setting them correctly is still important. There are a number of common scenarios that make this important: when you are carrying an array from machine to machine (like an external drive tower, or raid1 usb flash drive, etc.), when an array is visible to multiple hosts (like arrays built over SAN devices), or when you've built a machine to replace an existing machine and you temporarily install the drives from the machine being replaced in the new machine to copy data across, in which case you are starting both your new array and the old array on the same machine. They are also relied upon heavily in order to attempt to satisfy those people who think the md raid stack should work without any mdadm.conf file at all. And there is a special case exception in the name field that is used to attempt to preserve backward compatibility. The intersection of all these attempts to satisfy various needs is tricky.
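As a rough illustration of the points above, here is a minimal Python sketch of how an installer might assemble the create command and a minimal ARRAY line. The helper names (mdadm_create_argv, array_line) and their parameters are hypothetical and are not anaconda code. The mdadm options used (--metadata, the long form of -e, plus --homehost, --name and --bitmap=internal) are existing mdadm options; which metadata version to use for /boot is left to the caller, since that question is still open in this thread.

```python
def mdadm_create_argv(device, level, members, hostname, name,
                      mountpoint=None, fstype=None, metadata="1.1"):
    """Build an argv list for 'mdadm --create' (hypothetical helper)."""
    argv = [
        "mdadm", "--create", device, "--run",
        "--level=%s" % level,
        "--raid-devices=%d" % len(members),
        "--metadata=%s" % metadata,      # long form of -e
        "--homehost=%s" % hostname,
        "--name=%s" % name,
    ]
    # Bitmap rule from the message: none on /boot or swap, one elsewhere.
    if mountpoint != "/boot" and fstype != "swap":
        argv.append("--bitmap=internal")
    return argv + list(members)


def array_line(device, metadata, uuid):
    """Minimal mdadm.conf ARRAY line: only name, metadata version and UUID."""
    return "ARRAY %s metadata=%s UUID=%s" % (device, metadata, uuid)


if __name__ == "__main__":
    # Device paths, host name and UUID below are placeholders.
    print(" ".join(mdadm_create_argv(
        "/dev/md/root", 1, ["/dev/sda2", "/dev/sdb2"],
        hostname="workstation1", name="root", mountpoint="/")))
    print(array_line("/dev/md/root", "1.1", "<uuid-of-the-array>"))
```

Keeping the ARRAY line down to these three items means a later mdadm --grow cannot make the line stop matching the array.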
Here's how names are determined:

1) If the array is identified in mdadm.conf, the name from the ARRAY line is used.

2) If HOMEHOST has been set in the config:
   a) If the array uses a version 0.90 superblock, check to see if the HOMEHOST has been encoded in the UUID via hash. If not, treat as foreign; if so, treat as local.
   b) For version 1.x superblocks, check the homehost in the superblock against the set homehost. If they match, treat as local; else if the homehost in the superblock is not empty, treat as named foreign; else treat as foreign.

3) Else:
   a) For version 0.90 superblocks, treat the array as foreign.
   b) For 1.x, if homehost is set then named foreign, else foreign.

In case #1, the name as it appears in the file is used. In the remainder of cases, local means to attempt to create the array with the requested number (in the case of 0.90 superblocks) or requested name (in the case of version 1.x superblocks). Foreign means that the array will be started with the requested name + a suffix. For example, a version 0.90 superblock with a preferred-minor of 0 would get created with a random *actual* minor number and the name /dev/md0_0, or md0_1 if md0_0 already exists, etc. A version 1.x superblock with the name root would get created as /dev/md/root_0. Named foreign is used whenever a version 1.x superblock can't be identified as local but it has a valid homehost entry in the superblock. The format attempted is /dev/md/homehost:name, so that if you were to mount an array from workstation2:root on workstation1, it would be /dev/md/workstation2:root.

There is a special exception for version 1.x superblock arrays. If the name field of the superblock contains a specially formatted name, then it will be treated as a request to create the device with a given minor number and name identical to an old version 0.90 superblock array.
Those special case names are:

a) a bare number (e.g. 0)
b) a bare name using standard number format (e.g. md0 or md_d0)
c) a full name using standard number format (e.g. /dev/md0 or /dev/md_d0)

If an array uses a name instead of a number, then the named entry created in /dev/md/ will be a symlink to a random numeric md device in /dev/. For example, /dev/md/root, since it's the first device started and since we start grabbing md devices at 127 and counting backwards when starting named devices, will almost always point to /dev/md127. The /dev/md127 file will be the real device file, while the entries in /dev/md/ are always symlinks. This is in order to be consistent with the fact that our /sys/block entry will be md127 and our entry in /proc/mdstat will also be md127. This is because the current /sys/block setup does not allow /sys/block/md/root, only md<number>.
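The naming rules above can be condensed into a short Python sketch. This is only an illustration of the rules as described in this message, not mdadm's actual implementation; the function names and parameters (classify, device_path, numeric_device, uuid_has_homehost, suffix) are hypothetical. The check for whether a suffixed name already exists is left out, and the sketch applies the special-case names to local arrays only, since the message does not say how they interact with foreign arrays.

```python
import re

# Special-case names: "0", "md0", "md_d0", "/dev/md0", "/dev/md_d0".
_NUMERIC_NAME = re.compile(r"^(?:(?:/dev/)?(md(?:_d)?))?(\d+)$")


def numeric_device(name):
    """Return the old-style device path a special-case name asks for, or None."""
    m = _NUMERIC_NAME.match(name)
    if not m:
        return None
    return "/dev/%s%d" % (m.group(1) or "md", int(m.group(2)))


def classify(in_conf, version, conf_homehost, sb_homehost="",
             uuid_has_homehost=False):
    """Return 'conf', 'local', 'named-foreign' or 'foreign'."""
    if in_conf:                               # rule 1
        return "conf"
    if conf_homehost:                         # rule 2
        if version == "0.90":
            return "local" if uuid_has_homehost else "foreign"
        if sb_homehost == conf_homehost:
            return "local"
        return "named-foreign" if sb_homehost else "foreign"
    if version == "0.90":                     # rule 3a
        return "foreign"
    return "named-foreign" if sb_homehost else "foreign"   # rule 3b


def device_path(kind, version, name, sb_homehost="", suffix=0):
    """Path the array would be started under, for kinds other than 'conf'.

    For 0.90 superblocks, 'name' is the preferred minor as a string.
    """
    if kind == "named-foreign":
        return "/dev/md/%s:%s" % (sb_homehost, name)
    if version == "0.90":
        base = "/dev/md%d" % int(name)
    else:
        base = "/dev/md/%s" % name
        if kind == "local":
            base = numeric_device(name) or base
    if kind == "local":
        return base
    return "%s_%d" % (base, suffix)           # foreign: name + suffix


if __name__ == "__main__":
    kind = classify(False, "1.1", "workstation1", sb_homehost="workstation2")
    print(kind, device_path(kind, "1.1", "root", sb_homehost="workstation2"))
```

For arrays classified as 'conf', the caller would simply use the name from the ARRAY line.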
_______________________________________________ Anaconda-devel-list mailing list Anaconda-devel-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/anaconda-devel-list