Re: [PATCHSET block#for-2.6.36-post] block: replace barrier with sequenced flush

Mike Snitzer <snitzer@xxxxxxxxxx> · Tue, 24 Aug 2010 13:52:16 -0400

On Tue, Aug 24 2010 at 12:59pm -0400,
Tejun Heo <tj@xxxxxxxxxx> wrote:

> Hello,
> 
> On 08/24/2010 12:24 PM, Kiyoshi Ueda wrote:
> > Yes, checking whether it's a transport error in lower layer is
> > the right solution.
> > (Since I know it's not available yet, I just hoped if upper layers
> >  had some other options.)
> > 
> > Anyway, only reporting errors for REQ_FLUSH to upper layer without
> > such a solution would make dm-multipath almost unusable in real world,
> > although it's better than implicit data loss.
> 
> I see.
> 
> >>> Maybe just turn off barrier support in mpath for now?
> > 
> > If it's possible, it could be a workaround for a short term.
> > But how can you do that?
> > 
> > I think it's not enough to just drop REQ_FLUSH flag from q->flush_flags.
> > Underlying devices of a mpath device may have write-back cache and
> > it may be enabled.
> > So if a mpath device doesn't set REQ_FLUSH flag in q->flush_flags, it
> > becomes a device which has write-back cache but doesn't support flush.
> > Then, upper layer can do nothing to ensure cache flush?
> 
> Yeah, I was basically suggesting to forget about cache flush w/ mpath
> until it can be fixed.  You're saying that if mpath just passes
> REQ_FLUSH upwards without retrying, it will be almost unusable,
> right?  I'm not sure how to proceed here.

Seems clear that we must fix mpath to receive the SCSI errors, in some
form, so it can decide if a retry is required/valid or not.

Such error processing was a big selling point for the transition from
bio-based to request-based multipath; so it's unfortunate that this
piece has been left until now.

> How much work would discerning between transport and IO errors take?

Hannes already proposed some patches:
https://patchwork.kernel.org/patch/61282/
https://patchwork.kernel.org/patch/61283/
https://patchwork.kernel.org/patch/61596/

This work was discussed at LSF, see "Error Handling - Hannes Reinecke"
here: http://lwn.net/Articles/400589/

I thought James, Alasdair and others offered some guidance on what
they'd like to see...

Unfortunately, even though I was at this LSF session, I can't recall any
specific consensus on how Hannes' work should be refactored (to avoid
adding SCSI sense processing code directly in dm-mpath).  Maybe James,
Hannes or others remember?

Was it enough to just have the SCSI sense processing code split out in a
new sub-section of the SCSI midlayer -- and then DM calls that code?
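
To make that question concrete, here is a rough user-space sketch of the
kind of split I mean. It is not kernel code and not what Hannes' patches
do: the enum, both function names, and the mapping from sense key to
error class are all hypothetical, chosen only for illustration. Only the
sense key values themselves are the standard SCSI ones.

```c
#include <stdbool.h>

/* Standard SCSI sense key values. */
#define SK_MEDIUM_ERROR    0x03
#define SK_HARDWARE_ERROR  0x04
#define SK_ILLEGAL_REQUEST 0x05

/* Hypothetical error classes; these are not existing kernel definitions. */
enum io_err_class {
	IO_OK,
	IO_ERR_TRANSPORT,	/* path problem: another path may succeed */
	IO_ERR_TARGET,		/* device reported the failure itself */
};

/*
 * Hypothetical helper that would live in the SCSI midlayer.
 * have_sense is false when the command got no response from the
 * device at all (for example, the link went down).
 */
enum io_err_class scsi_classify_error(bool failed, bool have_sense,
				      unsigned char sense_key)
{
	if (!failed)
		return IO_OK;
	if (!have_sense)
		return IO_ERR_TRANSPORT;

	switch (sense_key) {
	case SK_MEDIUM_ERROR:
	case SK_HARDWARE_ERROR:
	case SK_ILLEGAL_REQUEST:
		return IO_ERR_TARGET;
	default:
		/* Assumption for this sketch: treat the rest as path-related. */
		return IO_ERR_TRANSPORT;
	}
}

/*
 * Hypothetical dm-mpath side: it never looks at sense data, only at
 * the class handed up by the midlayer helper above.
 */
bool mpath_should_retry(enum io_err_class cls, unsigned int paths_left)
{
	return cls == IO_ERR_TRANSPORT && paths_left > 0;
}
```

With something like this, a FLUSH that fails because a path died would
be retried on a remaining path, while one that fails with MEDIUM ERROR
would be reported upwards immediately, and dm-mpath would contain no
sense parsing of its own.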

> If it can't be done quickly enough the retry logic can be kept around
> to keep the old behavior but that already was a broken behavior, so...
> :-(

I'll have to review this thread again to understand why mpath's existing
retry logic is broken behavior.  mpath is used with more capable SCSI
devices so I'm missing why a failed FLUSH implies data loss.

Mike