Re: let md auto-detect 128+ raid members, fix potential race condition

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[linux-raid added to cc.
 Background: patch was submitted to remove the current hard limit
 of 127 partitions that can be auto-detected - limit set by 
 'detected_devices array in md.c.
]

My first inclination is not to fix this problem.

I consider md auto-detect to be a legacy feature.
I don't use it and I recommend that other people don't use it.
However I cannot justify removing it, so it stays there.
Having this limitation could be seen as a good motivation for some
more users to stop using it.

Why not use auto-detect?
I have three issues with it.

 1/
    It just isn't "right".  We don't mount filesystems from partitions
    just because they have type 'Linux'.  We don't enable swap on
    partitions just because they have type 'Linux swap'.  So why do we
    assemble md/raid from partitions that have type 'Linux raid
    autodetect'? 

 2/ 
    It can cause problems when moving devices.  If you have two
    machines, both with an 'md0' array and you move the drives from one
    on to the other - say because the first lost a powersupply - and
    then reboot the machine that received the drives, which array gets
    assembled as 'md0' ?? You might be lucky, you might not. This
    isn't purely theoretical - there have been pleas for help on
    linux-raid resulting from exactly this - though they have been
    few. 

 3/ 
    The information redundancy can cause a problem when it gets out of
    sync.  i.e. you add a partition to a raid array without setting
    the partition type to 'fd'.  This works, but on the next reboot
    the partition doesn't get added back into the array and you have
    to manually add it yourself.
    This too is not purely theory - it has been reported slightly more
    often than '2'.

So my preferred solution to the problem is to tell people not to use
autodetect.  Quite possibly this should be documented in the code, and
maybe even have a KERN_INFO message if more than 64 devices are
autodetected. 


Now one doesn't always go with one's first inclination so I should
discuss the approaches to fixing the problem, should a fix really be
the right thing to do.

The idea of having a generic notifier is, I think, missing the point -
so I'd better explain what the point is.....

The kernel already has the hotplug mechanism for alerting user-space
about new partitions, and I'm sure kernel clients could hook into this
some how.  But even if they could, md wouldn't want to.
md doesn't really want to know about new partitions.  It simply wants a
list of all partitions which are of type 'autodetect'.   Getting a
list of all partitions is quite easy (/proc/partitions does it, so it
cannot be hard).  But 'struct hd_struct' doesn't record what the
partition type is at all.  It spare bits, so it could. (->policy is
currently a 'ro' flag.  A little work would allow multiple flags there).

The point of the current detected_devices array is precisely to find
partitions which have this flag set.

If we were to 'fix' this problem, I think the cleanest approach (which
I haven't actually coded, so it might not work...) would be to define
a new flag to go in hd_struct->policy to say if the partition type
suggested auto-detect, and get partitions/check.c to set this.
Then have md iterate all partitions looking for this flag.

That could be considered to be intrusive to bits of the kernel which
don't need to be intruded in to, and could run the risk of someone
wanting to expose that flag to user-space (with in /proc/partitions or
/sys) and I would be against that.  But otherwise it would be a
fairly clean approach.

The minimal (non-empty) approach of replacing the array by a linked
list as the original patch did would also be quite reasonable.  Maybe
not as clean as a flag in hd_struct, but maybe we don't need this
code to be particularly clean(?).

So:  Do you *really* need to *fix* this, or can you just use 'mdadm'
to assemble you arrays instead?

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux