Re: potentially lost largeish raid5 array..

Alexander Lyakas <alex.bolshoy@xxxxxxxxx> · Sat, 5 Nov 2011 14:17:50 +0200

Hi Neil,
I wanted to ask how the mentoring you're willing to offer would work,
on your opinion?
I had much fun developing for that *other* operating system kernel,
and I am looking forward to do things for Linux kernel too some day.

My pain at present, is the lack of kernel printouts during various
config sequences. I have been trying to follow, for example, the
ADD_NEW_DISK sequence and the slot_store() sequence, to understand how
it detects whether a full-sync or not will be required. And there are
many possibilities before you figure out the right "if"
(non-persistent arrays, containers, external metadata....).

I am used to heavily print-around all config (non-IO-path) sequences
(such as ADD_NEW_DISK). This also helps if a particular step in the
sequence takes a long time, due to a slow IO or something like that.
What will be the pros/cons of doing that in md on your opinion?

Thanks!
Alex.

On Sun, Sep 25, 2011 at 12:10 PM, NeilBrown <neilb@xxxxxxx> wrote:
> On Sat, 24 Sep 2011 23:57:58 +0200 Aapo Laine <aapo.laine@xxxxxxxxxxxxx>
> wrote:
>
>> On 09/23/11 11:15, NeilBrown wrote:
>> > On Fri, 23 Sep 2011 02:09:36 -0600 Thomas Fjellstrom<thomas@xxxxxxxxxxxxx>
>> > wrote:
>> >> I forgot to say, but: Thank you very much :) for the help, and your tireless
>> >> work on md.
>> >>
>> >
>> > You've very welcome .... but I felt I needed to respond to that word
>> > "tireless".
>> > The truth is that I am getting rather tired of md .... if anyone knows anyone
>> > who wants to get into kernel development and is wondering where to start -
>> > please consider whispering 'the md driver' in their ear.  Plenty to do, great
>> > mentoring possibilities, and competent linux kernel engineers with good
>> > experience are unlikely to have much trouble finding a job ;-)
>> >
>> > NeilBrown
>>
>> Whoa this is shocking news!
>
> Hopefully not too shocking...  I'm not planning on leaving md any time soon.
> I do still enjoy working on it.
> But it certainly isn't as fresh and new as it was 10 years again.  It would
> probably do both me and md a lot of good to have someone with new enthusiasm
> and new perspectives...
>
>
>>
>> Firstly then, thank you so much for your excellent work up to now. Linux
>> has what I believe to be the best software raid of all operating systems
>> thanks to you. Excellent in both features, and reliability i.e. quality
>> of code.
>>
>> And also the support through the list was great. I found so many
>> problems solved already just by looking at the archives... so many
>> people with their arse saved by you.
>>
>> I think everybody here is sorry to see you willing to go.
>>
>> Now the bad news... Regarding the MD takeover, there I think I see a
>> problem...
>> The MD code is very tersely commented, compared to its complexity!
>
> That is certainly true, but seems to be true across much of the kernel, and
> probably most code in general (though I'm sure such a comment will lead to
> people wanting to tell me their favourite exceptions ... so I'll start with
> "TeX").
>
> This is one of the reasons I offered "mentoring" to any likely candidate.
>
>
>
>>
>> - there is not much explanation of overall strategies, or the structure
>> of code. Also the flow of data between the functions is not much
>> explained most of the times.
>>
>> - It's not obvious to me what is the entry point for kernel processes
>> related to MD arrays, how are they triggered and where do they run...
>> E.g. in the past I tried to understand how did resync work, but I
>> couldn't. I thought there was a kernel process controlling resync
>> advancement, but I couldn't really find the boundaries of code inside
>> which it was executing.
>
> md_do_sync() is the heart of the resync process.  it calls into the
> personality's sync_request() function.
>
> The kernel thread is started by md_check_recovery() if it appears to be
> needed.  md_check_recovery() is regularly run by each personality's main
> controlling thread.
>
>>
>> - it's not clear what the various functions do or in what occasion they
>> are called. Except from their own name, most of them have no comments
>> just before the definition.
>
> How about this:
>  - you identify some functions for which the purpose or use isn't clear
>  - I'll explain to you when/how/why they are used
>  - You create a patch which adds comments which explains it all
>  - I'll apply that patch.
>
> deal??
>
>>
>> - the algoritms within the functions are very long and complex, but only
>> rarely they are explained by comments. I am now seeing pieces having 5
>> levels of if's nested one inside the other, and there are almost no
>> comments.
>
> I feel your pain.   I really should factor out the deeply nested levels into
> separate functions.  Sometimes I have done that but there is plenty more do
> to.  Again, I would be much more motivated to do this if I were working with
> someone who would be directly helped by it.  So if you identify specific
> problems, it'll be a lot easier for me to help fix them.
>
>
>>
>> - last but not least, variables have very short names, and for most of
>> them, it is not explained what they mean. This is mostly for local
>> variables, but sometimes even for the structs which go into metadata
>> e.g. in struct r1_private_data_s most members do not have an
>> explanation. This is pretty serious, to me at least, for understanding
>> the code.
>
> Does this help?
>
> diff --git a/drivers/md/raid1.h b/drivers/md/raid1.h
> index e0d676b..feb44ad 100644
> --- a/drivers/md/raid1.h
> +++ b/drivers/md/raid1.h
> @@ -28,42 +28,67 @@ struct r1_private_data_s {
>        mddev_t                 *mddev;
>        mirror_info_t           *mirrors;
>        int                     raid_disks;
> +
> +       /* When choose the best device for a read (read_balance())
> +        * we try to keep sequential reads one the same device
> +        * using 'last_used' and 'next_seq_sect'
> +        */
>        int                     last_used;
>        sector_t                next_seq_sect;
> +       /* During resync, read_balancing is only allowed on the part
> +        * of the array that has been resynced.  'next_resync' tells us
> +        * where that is.
> +        */
> +       sector_t                next_resync;
> +
>        spinlock_t              device_lock;
>
> +       /* list of 'r1bio_t' that need to be processed by raid1d, whether
> +        * to retry a read, writeout a resync or recovery block, or
> +        * anything else.
> +        */
>        struct list_head        retry_list;
> -       /* queue pending writes and submit them on unplug */
> -       struct bio_list         pending_bio_list;
>
> -       /* for use when syncing mirrors: */
> +       /* queue pending writes to be submitted on unplug */
> +       struct bio_list         pending_bio_list;
>
> +       /* for use when syncing mirrors:
> +        * We don't allow both normal IO and resync/recovery IO at
> +        * the same time - resync/recovery can only happen when there
> +        * is no other IO.  So when either is active, the other has to wait.
> +        * See more details description in raid1.c near raise_barrier().
> +        */
> +       wait_queue_head_t       wait_barrier;
>        spinlock_t              resync_lock;
>        int                     nr_pending;
>        int                     nr_waiting;
>        int                     nr_queued;
>        int                     barrier;
> -       sector_t                next_resync;
> -       int                     fullsync;  /* set to 1 if a full sync is needed,
> -                                           * (fresh device added).
> -                                           * Cleared when a sync completes.
> -                                           */
> -       int                     recovery_disabled; /* when the same as
> -                                                   * mddev->recovery_disabled
> -                                                   * we don't allow recovery
> -                                                   * to be attempted as we
> -                                                   * expect a read error
> -                                                   */
>
> -       wait_queue_head_t       wait_barrier;
> +       /* Set to 1 if a full sync is needed, (fresh device added).
> +        * Cleared when a sync completes.
> +        */
> +       int                     fullsync
>
> -       struct pool_info        *poolinfo;
> +       /* When the same as mddev->recovery_disabled we don't allow
> +        * recovery to be attempted as we expect a read error.
> +        */
> +       int                     recovery_disabled;
>
> -       struct page             *tmppage;
>
> +       /* poolinfo contains information about the content of the
> +        * mempools - it changes when the array grows or shrinks
> +        */
> +       struct pool_info        *poolinfo;
>        mempool_t *r1bio_pool;
>        mempool_t *r1buf_pool;
>
> +       /* temporary buffer to synchronous IO when attempting to repair
> +        * a read error.
> +        */
> +       struct page             *tmppage;
> +
> +
>        /* When taking over an array from a different personality, we store
>         * the new thread here until we fully activate the array.
>         */
>
>>
>> - ... maybe more I can't think of right now ...
>>
>> Your code is of excellent quality, as I wrote, I wish there were more
>> programmers like you, but if you now want to leave, THEN I start to be
>> worried! Would you please comment it (much) more before leaving? Fully
>> understanding your code I think is going to take other people a lot of
>> time otherwise, and you might not find a replacement easily and/or s/he
>> might do mistakes.
>
> I'm not planning on leaving - not for quite some time anyway.
> But I know the code so well that it is hard to see which bits need
> documenting, and what sort of documentation would really help.
> I would love it if you (or anyone) would review the code and point to parts
> that particularly need improvement.
>
>
>>
>> There were times in the past when I had ideas and I wanted to contribute
>> code, but when I looked inside MD and tried to understand where should I
>> put my changes, I realized I wasn't able to understand what current code
>> was doing. Maybe I am not a good enough C programmer, but I was able to
>> change things in other occasions.
>
> Don't be afraid to ask... But sometimes you do need a bit of persistence
> though. :-)  Not always easy to find time for that.
>
>
>>
>>
>> I hope you won't get these critiques bad...
>
> Not at all.
>
>> and thanks for all your efforts, really, in the name of, I think, everybody.
>> Aapo L.
>
> Thanks for your valuable feedback.
> Being able to see problems is of significant value.  One of the reasons that
> I pay close attention to this list is because it shows me where the problems
> with md and mdadm are.  People often try things that I would never even dream
> of trying (because I know they won't work).  See this helps me know where the
> code and be improved - either so what they try does work, or so it fails more
> gracefully and helpfully.
>
> Thanks,
> NeilBrown
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html