On Tue, 2008-02-05 at 09:43 +0900, Tejun Heo wrote: > Hello, > > James Bottomley wrote: > >> No, it doesn't. Drain needs to extend the sg table too > > > > The patches do extend the sg table. > > Hmm... Where? It's the block component; it's already in git head with this patch: commit fa0ccd837e3dddb44c7db2f128a8bb7e4eabc21a Author: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> Date: Thu Jan 10 11:30:36 2008 -0600 block: implement drain buffers The description is pretty reasonable. The way it works is by making block add the drain buffer to the sg table as the last element. The way it works is if you simply use the length block gives you, you find yourself doing DMA to every element in the sg table except the last one. If you expand the length to anything up to blk_rq_drain_size() you then use the last element for the drain. > >> and it makes controllers lax about buffer overruns for commands they shouldn't be. > > > > So adjust the qc->nbytes to show no drain element in libata ... although > > the extra element gets mapped, the HBA will be none the wiser if there's > > zero length allocated to it. Thus the entire system behaves exactly > > like the drain element isn't present. The block layer is really just > > making the drain element available ... you don't have to use it. > > Adjusting qc->nbytes isn't enough. libata will have to trim at the end > of sglist. With both draining and padding, it means libata will have to > trim one sg entry and a few more bytes. Block layer is behaving > differently and in a very noticeable way, it's sending inconsistent > values in data_len and sg list. No, that's the point. With all the space available, you don't need to trim at all ... you simply don't use it. The rationale is the way the sg element stuff works. If you program a DMA table (PRD table in IDE speak) into a controller, it doesn't care if you don't use the last few pieces. With the default length set, the DMA controller won't use the last programmed element, but it's their if you expand the length or tell it to for an overrun. > >> Gee.. What's the deal with extra four bytes? If you're worried about > >> the size of struct request, chopping off PC request part into a separate > >> struct would be far better strategy than making what's already confusing > >> more confusing. > > > > Well, we are trying to slim down the request structure and scsi_cmnd > > structure. However, there's no inherent advantage to counting this all > > in the block layer. Think of it just as a transparent passthrough. > > You've told the block layer to make sure you have room for a drain > > element when it constructs the SG list, which it does. Whether you use > > it (by adding blk_rq_drain_size() on to the actual length you hand down) > > or not is entirely up to you ... there's no need to push the decision on > > that one into the block layer, since use or not is pretty much a cost > > free thing. > > Okay, I see your point. The biggest problem is that by doing that blk > layer breaks a pretty important rule - keeping data len consistent with > sg list and the bugs it's gonna cause will be very subtle and dangerous. > This is definitely where we can spend four extra bytes. The lengths don't have to be consistent (otherwise we wouldn't need to calculate them separately) the request reported length just has to be less than the sg table actual length. We even make use of this property in SCSI when we can't fit a request into a single command ... we are allowed to chunk up the request as we complete it. I'm not saying it's best practice, but it is acceptable behaviour. James - To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html