Re: [PATCH 03/11] block: add rq->resid_len

Tejun Heo <tj@xxxxxxxxxx> · Wed, 13 May 2009 15:30:10 +0900

Hello, James.

James Bottomley wrote:
>> Shouldn't those be request successful w/ sense data?  Please note that
>> the term "error" in this context means failure of block layer request
>> not SCSI layer CHECK SENSE.
> 
> Heh, well, this is where we get into interpretations.  For SG_IO
> requests, we have three separate ways of returning error.  The error
> return to the block layer, the results return and the sense code.  The
> error to block is a somewhat later addition to the layer, so not all
> cases are handled right or agreed (for instance we just altered BLOCK_PC
> with recovered error from -EIO to no error).  So hopefully we've just
> decided that deferred and current but recovered all fall into the no
> error to block, but results and sense to the user.
> 
> Note that the error to block is basically discarded from SG_IO before we
> return to the user, so the user *only* has results and sense to go by,
> thus the concept of residual not valid on error to block is something
> the user can't check.  That's why a consistent definition in all cases
> (i.e. the amount of data the HBA transferred) is the correct one and
> allows userspace to make the determination of what it should do based on
> the returns it gets.

Okay, I was thinking SG_IO will return error for rq failures and I
remember pretty clearly following the failure path recently while
debugging eject problem but my memory is pretty unreliable.
Checking... oops, yeap, you're right.

>> I'm still reluctant to do it because...
>>
>> * Its definition still isn't clear (well, at least to me) and if it's
>>   defined as the number of valid bytes on request success and the
>>   number of bytes HBA transferred on request failure, I don't think
>>   it's all that useful.
> 
> It's not valid bytes in either case ... it's number transferred.  One
> can infer from a successful SCSI status code that number transferred ==
> valid bytes, but I'd rather we didn't say that.
> 
>> * Seen from userland, residue count on request failure has never been
>>   guaranteed and there doesn't seem to be any valid in kernel user.
> 
> But that's the point ... we don't define for userland what request
> failure is very well.
> 
>> * It would be extra code explicitly setting the residue count to full
>>   on failure path.  If it's something necessary, full residue count on
>>   failure needs to be made default.  If not, it will only add more
>>   confusion.
> 
> OK, so if what you're asking is that we can regard the residue as
> invalid if SG_IO itself returns an error, then I can agree ... but not
> if blk_end_request() returns error, because that error gets ignored by
> SG_IO.

I was confused that rq failure would cause error return from SG_IO.
Sorry about that.  There still is a problem tho.  Buffer for a bounced
SG_IO request is copied back on failure but when a bounced kernel PC
request fails, the data is not copied back in bio_copy_kern_endio().
This is what would break Boaz's code.

So, it seems what we should do is

1. Always copy back bounced buffer whether the request failed or not.
   Whether resid_len should be considered while copying back, I'm not
   sure about given that resid_len isn't properly implemented in some
   drivers.

2. Revert the original behavior of setting resid_len to full on
   request issue and audit the affected code paths.

How does it sound?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html