On Sun, 2007-10-28 at 22:59 -0700, Daniel L. Miller wrote:
> Doug Ledford wrote:
> > Anyway, I happen to *like* the idea of using full disk devices, but the
> > reality is that the md subsystem doesn't have exclusive ownership of the
> > disks at all times, and without that it really needs to stake a claim on
> > the space instead of leaving things to chance IMO.
> >
> I've been re-reading this post numerous times - trying to ignore the
> burgeoning flame war :) - and this last sentence finally clicked with me.
>
> As I'm a novice Linux user - and not involved in development at all -
> bear with me if I'm stating something obvious. And if I'm wrong -
> please be gentle!
>
> 1. md devices are not "native" to the kernel - they are
> created/assembled/activated/whatever by a userspace program.

My real point was that md doesn't own the disks, meaning that during
startup, and at other points in time, software other than the md stack
can attempt to use the disk directly. That software may be the linux
file system code, linux lvm code, or in some cases entirely different
OS software. Given that these situations can arise, using a partition
table to mark the space as in use by linux is what I meant by staking a
claim. It doesn't keep the linux kernel itself from using the space,
since the kernel thinks it owns it, but it does stop other software
from attempting to use it.

> 2. Because md devices are "non-native" devices, and are composed of
> "native" devices, the kernel may try to use those components directly
> without going through md.

In the case of superblocks at the end of the devices, yes. The kernel
may see the underlying file system or lvm disk label even if the md
device is not started.

> 3. Creating a partition table somehow (I'm still not clear how/why)
> reduces the chance the kernel will access the drive directly without md.

The partition table is more to tell other software that linux owns the
space, and to avoid mistakes where someone accidentally runs fdisk on a
disk and wipes out your array because they added a partition table to
what they thought was a new disk (more likely when you have large
arrays of disks attached via fiber channel or such than in a single
system). Putting the superblock at the beginning of the md device is
the main thing that guarantees the kernel will never try to use what's
inside the md device without the md device running.

> These concepts suddenly have me terrified over my data integrity. Is
> the md system so delicate that BOOT sequence can corrupt it?

If you have your superblocks at the end of the devices, then there are
certain failure modes that can cause data inconsistencies. Generally
speaking they won't harm the array itself; it's just that the different
disks in a raid1 array might contain different data. If you don't use
partitions, then the majority of failure scenarios involve things like
accidental use of fdisk on the unpartitioned device, access of the
device by other OSes, that sort of thing.

> How is it
> more reliable AFTER the completed boot sequence?

Once the array is up and running, the constituent disks are marked as
busy in the operating system, which prevents other portions of the
linux kernel, and other software in general, from getting at the
md-owned disks.
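
If you want to see for yourself where your superblocks live, or what
creating an array with the superblock at the beginning looks like,
here's a rough sketch. The device names (/dev/md0, /dev/sd[bcde]1) and
the level/device counts are just placeholders, so substitute your own:

  # Show the superblock (metadata) version on an existing member.
  # Versions 0.90 and 1.0 sit at the end of the device, 1.1 at the
  # very start, 1.2 a few KB in from the start.
  mdadm --examine /dev/sdb1

  # Creating a new array with the superblock at the beginning of the
  # members.  This is a create-time choice; as far as I know you can't
  # simply flip it on an existing array.
  mdadm --create /dev/md0 --metadata=1.1 --level=10 --raid-devices=4 /dev/sd[bcde]1

  # Once the array is running, the members show up as in use here:
  cat /proc/mdstat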

> Nothing in the documentation (that I read - granted I don't always read
> everything) stated that partitioning prior to md creation was necessary
> - in fact references were provided on how to use complete disks. Is
> there an "official" position on, "To Partition, or Not To Partition"?
> Particularly for my application - dedicated Linux server, RAID-10
> configuration, identical drives.
>
> And if partitioning is the answer - what do I need to do with my live
> dataset? Drop one drive, partition, then add the partition as a new
> drive to the set - and repeat for each drive after the rebuild finishes?

You *probably*, and I emphasize probably, don't need to do anything. I
emphasize it because I don't know enough about your situation to say so
with 100% certainty. If I'm wrong, it's not my fault.

Now, that said, here's the gist of the situation. There are specific
failure cases that can corrupt data in an md raid1 array, mainly
related to superblocks at the end of devices. There are specific
failure cases where an unpartitioned device can be accidentally
partitioned, or where a partitioned md array (in combination with
superblocks at the end and a whole-disk device) can be misrecognized as
a normally partitioned drive. There are, on the other hand, cases where
it's perfectly safe to use unpartitioned devices, or superblocks at the
end of devices.

My recommendation when someone asks what to do is to use partitions,
and to use superblocks at the beginning of the devices (except for
/boot, since that isn't supported at the moment). The reason I give
that advice is that I assume that if a person knows enough to know when
it's safe to use unpartitioned devices, like Luca, then they wouldn't
be asking me for advice. So since they *are* asking my advice, and
since a lot of the failure cases have as much to do with human error as
they do with software error, and since human error always seems to find
new ways to err, it's impossible to list all the error cases, so it's
best just to give the known safe advice.

Just because you heard the advice after creating your arrays is no
reason to panic, though. Since the disks are local to your linux server
and not attached via a fiber channel network or something similar,
about 2/3rds of the failure cases drop away immediately. And given that
you are using raid10 instead of raid1, the possible silent
inconsistency issue drops away. All in all, you're pretty safe.
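
If you did decide to move a live array over to partitions anyway, the
drive-at-a-time sequence you describe is basically how you'd do it.
Roughly, and only as a sketch: the device names are placeholders, it
assumes the array can survive losing one member at a time, and you'd
want good backups before starting:

  # Fail and remove one whole-disk member from the array
  mdadm /dev/md0 --fail /dev/sdb --remove /dev/sdb

  # Partition the disk (e.g. with fdisk), creating one partition of
  # type fd (Linux raid autodetect) spanning the disk.  Note a
  # partition is slightly smaller than the raw disk, so md may refuse
  # it if the array was sized to the full device.
  fdisk /dev/sdb

  # Add the new partition back in and let it resync
  mdadm /dev/md0 --add /dev/sdb1

  # Wait for the rebuild to finish before touching the next disk
  cat /proc/mdstat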

--
Doug Ledford <dledford@xxxxxxxxxx>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband