Re: [RFC PATCH 1/3] md/isrt: base infrastructure and metadata loading

NeilBrown <neilb@xxxxxxx> · Fri, 25 Apr 2014 09:44:54 +1000

On Thu, 24 Apr 2014 10:33:31 -0700 Dan Williams <dan.j.williams@xxxxxxxxx>
wrote:

> On Thu, Apr 24, 2014 at 1:02 AM, NeilBrown <neilb@xxxxxxx> wrote:
> > On Thu, 24 Apr 2014 00:38:01 -0700 Dan Williams <dan.j.williams@xxxxxxxxx>
> > wrote:
> >
> >> On Thu, Apr 24, 2014 at 12:24 AM, NeilBrown <neilb@xxxxxxx> wrote:
> >> > On Wed, 23 Apr 2014 23:18:49 -0700 Dan Williams <dan.j.williams@xxxxxxxxx>
> >> > wrote:
> >> >
> >> >> Initial md / block boilerplate for the Intel (R) Smart Response
> >> >> Technology compatibility driver.  Supports reading the packed  metadata
> >> >> and parsing it into a cache lookup tree.
> >> >>
> >> >> Cc: Dave Jiang <dave.jiang@xxxxxxxxx>
> >> >> Cc: Artur Paszkiewicz <artur.paszkiewicz@xxxxxxxxx>
> >> >> Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> >> >
> >> > It would really help in reviewing this to have a glossary.
> >> >
> >> > There are frames and segments and sectors and pages.
> >> >
> >> > I hope sectors are 512 bytes and pages are PAGE_SIZE, which may or may not be
> >> > 4096.
> >> >
> >> > And there are 16 sectors per frame, so I guess space is allocated in the
> >> > cache in 8K aligned frames ??
> >> >
> >> > There are 64 segments per page so if pages did happen to be 4096 bytes, that
> >> > makes 64 bytes per segment.  What are they?
> >> >
> >> > There is a list somewhere of 32byte frame descriptors which is read into a
> >> > single vmalloced region (why? you keep page pointers, so why not read it into
> >> > separate pages?)
> >> > How is this organised?  I might be able to work that out from the code, but
> >> > I'd rather not.
> >> >
> >> > Please don't make me guess, I'm not good at it.
> >> >
> >> > I guess it didn't help that diff out the header after the code.  I got bored
> >> > before I got there and didn't read all to words, so maybe some answers are in
> >> > there.  They don't really stand out though.
> >>
> >> No, they don't.  Let me throw together a proper cheat sheet.
> >
> > Thanks.
> >
> >
> >>
> >> > You've chosen '8' for the 'level' number.
> >>
> >> Hmm, ok.  I use -12 in the mdadm bits, I neglected to go back and fix
> >> up the kernel.
> >>
> >> > As this is an array which doesn't have redundancy, I'd rather a number <= 0.
> >> > I think there are places where I assume >=1 has redundancy and understands
> >> > spares etc.
> >> >
> >> > Should conf->count be a kref??? Just a thought, not a requirement.
> >>
> >> Doesn't kref == 0 imply object destroyed?  It's a count of pending
> >> metadata events.
> >
> > kref means that in a kobject.  Elsewhere it means whatever you want.
> >
> > mpb_read_endio would kref_put(&conf->ref, release)
> >
> > where release would get the conf and wake_up(&conf->eventq);
> >
> > It probably isn't a big win..
> 
> Yes, but I should train my brain that kref != object lifetime, it's
> just a ref...
> 
> ...or is it?
> static inline void kref_get(struct kref *kref)
> {
>         /* If refcount was 0 before incrementing then we have a race
>          * condition when this kref is freeing by some other thread right now.
>          * In this case one should use kref_get_unless_zero()
>          */
>         WARN_ON_ONCE(atomic_inc_return(&kref->refcount) < 2);
> }
> 
> Seems we would need kref_get_zero_ok(), right?

That doesn't sound like a good idea, the only value in using kref would be if
it made the code more obvious.  That wouldn't :-(

So I looked more closely at the code, and now wonder why 'count' is in 'conf'
at all.
You don't really need a global count of requests at all.
There two places where it is used.

One is in isrt_mpb_read() where a single bio is submitted and waited for.
That could use submit_bio_wait() (not 'should', just 'could').

The other is in isrt_read_packed_md() where multiple bios are submitted and
then waited for.

In both cases the counter (and wait queue) could be on the stack (like the
completion is in submit_bio_wait().

struct multi_complete { atomic_t count; wait_queue_head_t wait; unsigned long
state; };

static void multi_read_endio(struct bio *bio, int error)
{
   struct multi_complete *mc = bio->private;
   if (error || !test_bit(BIO_UPTODATE(&bio->bi_flags)))
      set_bit(ISRT_ERROR, &mc->state);
   if (atomic_dec_and_test(&mc->count))
      wake_up(&mc->wait);
   bio_put(bio);
}

isrt_read_packed_md(struct mddev *mdev)
{
   ...
   struct multi_complete mc = { ATOMIC_INIT(0),
           __WAIT_QUEUE_HEAD_INITIALIZER(mc.wait), 0};
   ....
      atomic_inc(&mc,count);
      bio->bi_private = &mc;
      submit_bio(READ, bio);
   .....

   wait_event(&mc.wait, atomic_read(&mc.count)==0);

   ....
}

It isn't really a big improvement, so maybe it isn't worth it.
But it does make it obvious that we are only waiting for the reads that we
just submitted.  With the current code I see conf->count and wonder what else
we could be waiting for.

Up to you - maybe just leave it as it is.
(But please use 'READ', not '0' as the first arg to submit_bio).

NeilBrown
Attachment:
signature.asc

Description: PGP signature