Re: [PATCH 2 of 9] MD: should_read_superblock

Jonathan Brassow <jbrassow@xxxxxxxxxx> · Thu, 26 May 2011 09:50:05 -0500

On May 25, 2011, at 7:32 PM, NeilBrown wrote:

> On Wed, 25 May 2011 09:00:19 -0500 Jonathan Brassow <jbrassow@xxxxxxxxxx>
> wrote:
> 
>> 
>> On May 24, 2011, at 11:01 PM, NeilBrown wrote:
>> 
>>> On Mon, 23 May 2011 22:06:09 -0500 Jonathan Brassow <jbrassow@xxxxxxxxxxxxxx>
>>> wrote:
>>> 
>>>> Patch name: md-should_read_superblock.patch
>>>> 
>>>> Add new function to determine whether MD superblocks should be read.
>>>> 
>>>> It used to be sufficient to check if mddev->raid_disks was set to determine
>>>> whether to read the superblock or not.  However, device-mapper (dm-raid.c)
>>>> sets this value before calling md_run().  Thus, we need additional mechanisms
>>>> for determining whether to read the superblock.  This patch adds the condition
>>>> that if rdev->meta_bdev is set, the superblock should be read - something that
>>>> only device-mapper does (and only when there are superblocks to be read/used).
>>>> 
>>>> Signed-off-by: Jonathan Brassow <jbrassow@xxxxxxxxxx>
>>> 
>>> I've been feeling uncomfortable about this and have spent a while trying to
>>> see if my discomfort is at all justified.  It seems that maybe it is.
>>> 
>>> The discomfort is really at analyze_sbs being used for dm arrays.  It is
>>> really for arrays where md completely controls the metadata.  dm array are in
>>> a strange intermediate situation where some metadata is controlled by
>>> user-space (so md is told about some details of the array) and other metadata
>>> is managed by the kernel - so md finds those bits out by itself.
>>> 
>>> It isn't yet entirely clear to me how to handle the half-way state best.
>>> 
>>> But the particular problem is that analyse_sbs can call kick_rdev_from_array.
>>> This will call export_rdev which will call kobject_put(&rdev->kboj) which is
>>> bad because dm-based rdevs do not get their kobj initialised.
>>> 
>>> So I think analyse_sbs should not be used for dm arrays.
>>> Rather the code in dm-raid.c which parses the metadata_device info from the
>>> constructor line should load_super.  Then before md_run is called it should
>>> do the 'validate_super' step and record any failures.
>>> 
>>> So the only super_types method that md code would call on a dm-raid array
>>> would be sync_super.
>>> 
>>> Does that work for you?
>> 
>> That seems sensible.  It changes things up a bit though...
>> 
>> 1) the load_super and validate_super functions would go into dm-raid.c, but stubs (returning EINVAL) would remain in md.c in order to fill-out the super_types pointers.
>> 2) the device-mapper superblock would have to move to a common place because it would need to be shared by the super functions in dm-raid.c and sync_super in md.c.  I'd rather not put the new superblock in md_p.h... perhaps a new file, dm-raid.h?  (You could hide the superblock entirely in dm-raid.c, but you'd have to export a function from dm-raid.c that would be called by sync_super in md.c - necessitating a dm-raid.h again.  Is this a better solution?)
>> 
>> brassow
> 
> How about we put a 'sync_super' or possibly a 'struct super_type' pointer in
> mddev_t, and use it instead of mddev->major_version for finding operations.
> Then all knowledge of the dm metadata  can live in dm-raid.c??

I was just thinking that - yes, that sounds good.

I haven't thought about it too deeply yet, so I'm not sure which I like better:
1) just sync_super ptr in mddev_t
2) super_types in mddev_t

My first impression is just sync_super, after all, the load and validate can be done within device-mapper and never need to be called by MD outside analyze_sbs and routines that add devices, right?  Perhaps we would just remove sync_super from super_types or check for mddev->sync_super before calling super_types[x].sync_super?  I'll think more about it.

thanks,
 brassow

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html