Hi Andrew,

On Fri, 19 Sep 2008 14:33:44 -0700, Andrew Morton wrote:
> On Fri, 19 Sep 2008 10:48:54 -0400 (EDT)
> Kiyoshi Ueda <k-ueda@xxxxxxxxxxxxx> wrote:
> 
> > This patch adds an interface to check lld's busy status
> > from the block layer.
> > This resolves a performance problem on request stacking devices below.
> > 
> > 
> > Some drivers like scsi mid layer stop dispatching request when
> > they detect busy state on its low-level device like host/bus/device.
> > It allows other requests to stay in the I/O scheduler's queue
> > for a chance of merging.
> > 
> > Request stacking drivers like request-based dm should follow
> > the same logic.
> > However, there is no generic interface for the stacked device
> > to check if the underlying device(s) are busy.
> > If the request stacking driver dispatches and submits requests to
> > the busy underlying device, the requests will stay in
> > the underlying device's queue without a chance of merging.
> > This causes performance problem on burst I/O load.
> > 
> > With this patch, busy state of the underlying device is exported
> > via the state flag of queue's backing_dev_info. So the request
> > stacking driver can check it and stop dispatching requests if busy.
> > 
> > The underlying device driver must set/clear the flag appropriately:
> >    ON:  when the device driver can't process requests immediately.
> >    OFF: when the device driver can process requests immediately,
> >         including abnormal situations where the device driver needs
> >         to kill all requests.
> > 
> > 
> > Signed-off-by: Kiyoshi Ueda <k-ueda@xxxxxxxxxxxxx>
> > Signed-off-by: Jun'ichi Nomura <j-nomura@xxxxxxxxxxxxx>
> > Cc: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
> > ---
> >  include/linux/backing-dev.h |    8 ++++++++
> >  mm/backing-dev.c            |   12 ++++++++++++
> >  2 files changed, 20 insertions(+)
> > 
> > Index: scsi-misc-2.6/include/linux/backing-dev.h
> > ===================================================================
> > --- scsi-misc-2.6.orig/include/linux/backing-dev.h
> > +++ scsi-misc-2.6/include/linux/backing-dev.h
> > @@ -26,6 +26,7 @@ enum bdi_state {
> >  	BDI_pdflush,		/* A pdflush thread is working this device */
> >  	BDI_write_congested,	/* The write queue is getting full */
> >  	BDI_read_congested,	/* The read queue is getting full */
> > +	BDI_lld_congested,	/* The device/host is busy */
> >  	BDI_unused,		/* Available bits start here */
> >  };
> > 
> > @@ -226,8 +227,15 @@ static inline int bdi_rw_congested(struc
> >  				  (1 << BDI_write_congested));
> >  }
> > 
> > +static inline int bdi_lld_congested(struct backing_dev_info *bdi)
> > +{
> > +	return bdi_congested(bdi, 1 << BDI_lld_congested);
> > +}
> > +
> >  void clear_bdi_congested(struct backing_dev_info *bdi, int rw);
> >  void set_bdi_congested(struct backing_dev_info *bdi, int rw);
> > +void clear_bdi_lld_congested(struct backing_dev_info *bdi);
> > +void set_bdi_lld_congested(struct backing_dev_info *bdi);
> >  long congestion_wait(int rw, long timeout);
> > 
> > 
> > Index: scsi-misc-2.6/mm/backing-dev.c
> > ===================================================================
> > --- scsi-misc-2.6.orig/mm/backing-dev.c
> > +++ scsi-misc-2.6/mm/backing-dev.c
> > @@ -279,6 +279,18 @@ void set_bdi_congested(struct backing_de
> >  }
> >  EXPORT_SYMBOL(set_bdi_congested);
> > 
> > +void clear_bdi_lld_congested(struct backing_dev_info *bdi)
> > +{
> > +	clear_bit(BDI_lld_congested, &bdi->state);
> > +}
> > +EXPORT_SYMBOL_GPL(clear_bdi_lld_congested);
> > +
> > +void set_bdi_lld_congested(struct backing_dev_info *bdi)
> > +{
> > +	set_bit(BDI_lld_congested, &bdi->state);
> > +}
> > +EXPORT_SYMBOL_GPL(set_bdi_lld_congested);
> > +
> >  /**
> >   * congestion_wait - wait for a backing_dev to become uncongested
> >   * @rw: READ or WRITE
> 
> Is this really the right way to do it?

I think so, but I may not have understood your concern correctly,
so please elaborate if my explanation below doesn't answer what you
want to know.

> Back in the days when we first did the backing_dev_info.congested_fn()
> logic it was decided that there basically was no single place in which
> the congested state could be stored.
> 
> So we ended up deciding that whenever a caller wants to know a
> backing_dev's congested status, it has to call in to the
> ->congested_fn() and that congested_fn would then call down into all
> the constituent low-level drivers/queues/etc asking each one if it is
> congested.

bdi_lld_congested() works the same way: it uses bdi_congested(), which
calls ->congested_fn().  Only real device drivers (e.g. scsi, ide)
set/clear the flag; stacking drivers like request-based dm don't.
So a stacking driver always ends up checking the BDI_lld_congested
flags of the bottom devices of the device stack.

The existing BDI_[write|read]_congested flags are used for the queue's
congestion, and I/O queueing/merging should still be able to proceed
even when the lld is congested, so I added a new flag instead of
reusing them.

> I mean, as a simple example which is probably wrong - what happens if a
> single backing_dev is implemented via two different disks and
> controllers, and they both become congested and then one of them comes
> uncongested.  Is there no way in which the above implemention can
> incorrectly flag the backing_dev as being uncongested?

Do you mean that "a single backing_dev via two disks/controllers" is
a dm device (e.g. a dm-multipath device with 2 paths, or a dm-mirror
device with 2 disks)?
If so, dm itself doesn't set/clear the flag; whether the dm device is
congested or not is decided by dm's target driver.

For example, dm-multipath would:
    o call bdi_lld_congested() for each path.
    o if any path is uncongested, decide the dm device is uncongested
      and dispatch incoming I/Os to an uncongested path.

And dm-mirror would:
    o call bdi_lld_congested() for each disk.
    o for READ, use the same logic as dm-multipath above.
      For WRITE, decide the dm device is uncongested only when all
      disks are uncongested.

(I've appended a rough sketch of these two cases at the end of this
mail.)

Thanks,
Kiyoshi Ueda
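
P.S.
To make the two examples above more concrete, here is a rough,
untested sketch of that decision logic.  Only bdi_lld_congested()
comes from the patch; "struct example_leg" and the two helper
functions are made-up names for illustration, not real dm code.

/*
 * Illustration only -- not part of the patch and not real dm code.
 * "struct example_leg" and the helper names are hypothetical;
 * bdi_lld_congested() is the interface added by the patch above.
 */
#include <linux/backing-dev.h>
#include <linux/fs.h>
#include <linux/list.h>

struct example_leg {
	struct list_head	list;
	struct backing_dev_info	*bdi;	/* bottom device's backing_dev_info */
};

/*
 * dm-multipath-like decision: the stacked device is uncongested if
 * at least one path is uncongested; dispatch I/Os to that path.
 */
static struct example_leg *example_choose_path(struct list_head *paths)
{
	struct example_leg *p;

	list_for_each_entry(p, paths, list)
		if (!bdi_lld_congested(p->bdi))
			return p;	/* dispatch I/Os to this path */

	return NULL;			/* all paths busy: keep I/Os queued */
}

/*
 * dm-mirror-like decision: a READ is uncongested if any disk is
 * uncongested; a WRITE is uncongested only if every disk is.
 */
static int example_mirror_congested(struct list_head *disks, int rw)
{
	struct example_leg *d;

	list_for_each_entry(d, disks, list) {
		if (rw == READ && !bdi_lld_congested(d->bdi))
			return 0;	/* one free disk is enough for READ */
		if (rw == WRITE && bdi_lld_congested(d->bdi))
			return 1;	/* any busy disk congests a WRITE */
	}

	return rw == READ;	/* READ: all disks busy; WRITE: none busy */
}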