Matthew Wilcox wrote:
> On Tue, Aug 12, 2008 at 04:38:48PM -0400, Knight, Frederick wrote:
>> I don't see how it doesn't match T13 TRIM command?  Both can do single
>> ranges.  In both cases, you can have 1 LBA and 1 length.  There is
>> nothing requiring > 1 range to be sent via the SCSI proposal.  In both
>> cases, you pass the same values to the H/W driver.  In one H/W driver it
>> will load a bunch of values (including the LBA/length) into a set of
>> registers (PATA) or a memory structure (SATA).  In the other H/W driver,
>> it will load a bunch of values into memory structures (CDB/buffer), and
>> then tweak the H/W to send the memory structures.
>
> If you consider a SATL implemented in an array device, it can receive a
> PUNCH command with multiple ranges.  It must then send multiple TRIM
> commands, one for each range.
>
> The proposal is also suboptimal if the common case is just one range.
> The SCSI driver has to allocate a 20-byte block and do a DATA OUT command.
>
>> Most SCSI drivers I've seen that have tagged queuing enabled turn off
>> their elevator algorithms (since the drive itself is doing its own
>> optimizations)
>
> In Linux, we try not to have elevators in the device drivers themselves
> (though I believe there are still a few which have their own).  Instead we
> have an elevator in the block layer where typically we have much more
> information about which IOs can be merged and which IOs cannot pass
> each other, which OS process submitted the IO (and hence can do fair
> scheduling between different users) and so on.
>
> Each request queue (~= SCSI LUN) can choose which elevator controls its
> behaviour, so if it works out better to have the drive do the scheduling,
> it can be disabled by switching to the noop elevator.

This is not completely true: the generic elevator code still attempts some
merges, and the NOOP I/O scheduler also performs a primitive sort.
Recent kernels have the "nomerges" tunable added under /sys/block/*/queue,
which can turn off the more complicated merge attempts (for any scheduler).

>
>> There is no difference at the filesystem de-allocator level.  The only
>> difference is how the H/W sends the values to the other end of the wire,
>> and there will always be differences at that layer.
>
> I think Dave's point is that batching all the discards together into one
> list isn't a natural interface for a filesystem; they prefer an
> interface which is a single extent.

Is it expected that the file system code would emit PUNCH directives in
"specially marked" struct bios through the block I/O storage system?
Then the I/O schedulers would be responsible for discriminating between
PUNCH bios and "normal" read/write bios when they perform merging
(and sorting?).  In either case, would the block I/O layer then build
"specially marked" PUNCH requests for the underlying physical drivers?

Alan