On Fri, 2008-11-07 at 07:14 -0500, Ric Wheeler wrote:
> Jens Axboe wrote:
> > On Thu, Nov 06 2008, David Woodhouse wrote:
> >
> >> On Thu, 6 Nov 2008, James Bottomley wrote:
> >>
> >>> The way to do this properly would be to run a chequerboard of partials,
> >>> but this would effectively have trim region tracking done in the block
> >>> layer ... is this worth it?
> >>>
> >>> By the way, the latest (from 2 days ago) version of the Thin
> >>> Provisioning proposal is here:
> >>>
> >>> http://www.t10.org/ftp/t10/document.08/08-149r4.pdf
> >>>
> >>> I skimmed it but don't see any update implying that trim might be
> >>> ineffective if we align wrongly ... where is this?
> >>>
> >> I think we should be content to declare such devices 'broken'.
> >>
> >> They have to keep track of individual sectors _anyway_, and dropping
> >> information for small discard requests is just careless.
> >>
> >
> > I agree, seems pretty pointless. Lets let evolution take care of this
> > issue. I have to say I'm surprised that it really IS an issue to begin
> > with, are array firmwares really that silly?
> >
> > It's not that it would be hard to support (and it would eliminate the
> > need to do discard merging in the block layer), but it seems like one of
> > those things that will be of little use in even in the near future.
> > Discard merging should be useful, I have no problem merging something
> > like that.
> >
> >
> I think that discard merging would be helpful (especially for devices
> with more reasonable sized unmap chunks).

One of the ways the unmap command is set up is with a disjoint
scatterlist, so we can send a large number of unmaps together. Whether
they're merged or not really doesn't matter.

The probable way a discard system would work if we wanted to endure the
complexity would be to have the discard system in the underlying device
driver (or possibly just above it in block, but different devices like
SCSI or ATA have different discard characteristics).
It would just accumulate block discard requests as ranges (and it would
have to poke holes in the ranges as it sees read/write requests) which
it flushes periodically.

The reason for doing it this way is that discards are "special": as long
as we don't discard a rewritten sector, the time at which they're sent
down is irrelevant to integrity, and thus we can potentially accumulate
over vastly different timescales than the regular block merging. If
we're really going to respect this discard block size, we could hold on
to the undersized discards, which the array would ignore anyway, for
virtually infinite time.

Note, I'm not saying we *should* do this ... I think something like this
would be much better done in the device ... but if we *are* going to do
it, then at least let's get it right.

James
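
For what it's worth, the accumulate / poke holes / flush scheme described
above can be sketched in a few lines of userspace Python. This is only an
illustration of the bookkeeping, not kernel code: the class and method
names (DiscardAccumulator, discard, io, flush) and the "chunk" parameter
standing in for the array's discard block size are all made up for the
example, and sector ranges are half-open [start, end).

```python
class DiscardAccumulator:
    """Pending discards as sorted, non-overlapping [start, end) sector ranges.

    Hypothetical illustration only; none of these names exist in the kernel.
    """

    def __init__(self):
        self.ranges = []

    def discard(self, start, count):
        """Record a discard, merging with overlapping or adjacent ranges."""
        end = start + count
        kept = []
        for s, e in self.ranges:
            if e < start or s > end:
                kept.append((s, e))  # disjoint and not adjacent: keep as is
            else:
                start, end = min(s, start), max(e, end)  # absorb into new range
        kept.append((start, end))
        self.ranges = sorted(kept)

    def io(self, start, count):
        """Poke a hole: sectors touched by a read/write must not be discarded."""
        end = start + count
        kept = []
        for s, e in self.ranges:
            if e <= start or s >= end:
                kept.append((s, e))  # untouched by this I/O
                continue
            if s < start:
                kept.append((s, start))  # part before the I/O survives
            if e > end:
                kept.append((end, e))  # part after the I/O survives
        self.ranges = kept

    def flush(self, chunk=1):
        """Return chunk-aligned parts of pending ranges; keep the leftovers."""
        out, left = [], []
        for s, e in self.ranges:
            a = -(-s // chunk) * chunk  # round start up to a chunk boundary
            b = (e // chunk) * chunk    # round end down to a chunk boundary
            if a < b:
                out.append((a, b))
                if s < a:
                    left.append((s, a))
                if b < e:
                    left.append((b, e))
            else:
                left.append((s, e))  # too small or misaligned: keep waiting
        self.ranges = left
        return out


acc = DiscardAccumulator()
acc.discard(0, 8)
acc.discard(8, 8)    # adjacent, merges into (0, 16)
acc.discard(40, 4)
acc.io(4, 2)         # a write to sectors 4-5 splits the first range
first = acc.flush(chunk=8)
acc.discard(44, 4)   # completes the chunk at 40
second = acc.flush(chunk=8)
```

With an assumed chunk of 8 sectors, only whole aligned chunks go down on
each flush; the fragments left behind just sit in the list until a later
discard completes their chunk or an I/O pokes them away.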