On Fri, Nov 07, 2008 at 01:21:49PM -0700, Matthew Wilcox wrote:
>
> I think we would have a full-throated discussion about whether the
> right thing to do was to put the tracking in the block layer or in LVM.
> Rather similar to what we're doing now, in fact.

Agreed.  I'm just saying that what the array vendors are pushing for
is not totally unreasonable.

This problem can be separated into two issues.  One is whether or not
trim requests have to be 4 meg (or some other size substantially
bigger than the filesystem block size) aligned, and the other is
whether the provisioning chunk size is 4 meg.  Even if only the latter
is true, the array would ideally be paired with filesystems which are
aware of this fact, which try hard to keep as many 4 meg chunks as
possible completely unused, and which try very hard to allocate from
4 meg chunks that are already partially used.

Where the trim request coalescing happens is a more interesting
question.  You can do it in the filesystem, in the block device
layer, or in the storage array itself.

One interesting thought is that it may actually make more sense to do
it in the filesystem.  Since the filesystem has block allocation data
structures that already tell it which blocks are in use and which are
not, there's no point replicating that in the storage array --- the
filesystem can detect when the last 4k block in a 4 meg chunk has been
freed, and then issue a single 4 meg TRIM/UNMAP request to the array.

One advantage of doing it in the filesystem is that the block
allocation data structures are already journaled, and so by keying
this off the filesystem's block allocation structures, we won't lose
any potential TRIM requests even across a reboot.  (In contrast, if
the block device or the storage array is managing a list of trim
requests in hopes of merging enough pieces to cover a 4 meg aligned
TRIM request, the in-memory rbtree holding that list is transient and
would be lost if the machine reboots.)
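To make the idea concrete, here is a rough sketch in Python of what
"key the TRIM off the allocation structures" might look like.  It is a
toy model under stated assumptions, not code from any real filesystem:
the class name, the per-chunk counter, and the `trims` list standing in
for requests sent to the array are all made up for illustration, and
journaling is not modeled at all.  The 4k block size and 4 meg chunk
size are the figures used above.

```python
BLOCK_SIZE = 4 * 1024                         # filesystem block size (4k)
CHUNK_SIZE = 4 * 1024 * 1024                  # array provisioning chunk (4 meg)
BLOCKS_PER_CHUNK = CHUNK_SIZE // BLOCK_SIZE   # 1024 blocks per chunk


class ChunkAwareAllocator:
    """Toy block allocator; hypothetical, not modeled on a real filesystem."""

    def __init__(self, total_blocks):
        if total_blocks % BLOCKS_PER_CHUNK:
            raise ValueError("toy model: device must be a whole number of chunks")
        self.in_use = [False] * total_blocks  # stand-in for the block bitmap
        self.used = [0] * (total_blocks // BLOCKS_PER_CHUNK)  # in-use count per chunk
        self.trims = []  # (byte offset, byte length) requests "sent" to the array

    def allocate(self):
        """Allocate one block, preferring the fullest chunk that still has room.

        Filling partially used chunks first keeps as many chunks as
        possible completely unused.
        """
        candidates = [c for c, n in enumerate(self.used) if n < BLOCKS_PER_CHUNK]
        if not candidates:
            raise MemoryError("no free blocks")
        chunk = max(candidates, key=lambda c: self.used[c])
        start = chunk * BLOCKS_PER_CHUNK
        block = self.in_use.index(False, start, start + BLOCKS_PER_CHUNK)
        self.in_use[block] = True
        self.used[chunk] += 1
        return block

    def free(self, block):
        """Free one block; emit a chunk-sized TRIM when its chunk becomes empty."""
        if not self.in_use[block]:
            raise ValueError("double free of block %d" % block)
        self.in_use[block] = False
        chunk = block // BLOCKS_PER_CHUNK
        self.used[chunk] -= 1
        if self.used[chunk] == 0:
            # Last in-use block of the chunk is gone: one aligned request.
            self.trims.append((chunk * CHUNK_SIZE, CHUNK_SIZE))


fs = ChunkAwareAllocator(2 * BLOCKS_PER_CHUNK)
a = fs.allocate()
b = fs.allocate()          # lands in the same chunk as a
fs.free(a)                 # chunk still has b in use, so no TRIM yet
fs.free(b)                 # chunk now empty, one 4 meg TRIM is recorded
print(fs.trims)
```

In a real filesystem the per-chunk state would be derived from the
on-disk (journaled) allocation bitmaps rather than kept in a separate
in-memory list, which is the whole point of the argument above.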

Sure, no filesystems do this now, but it's just a Small Matter of
Programming --- and array vendors like EMC (cough, cough) could easily
pay for some filesystem hackers to implement this for some popular
Linux filesystem.  It could even be a directed funding program through
the Linux Foundation if EMC doesn't feel it has sufficient people who
have expertise in the upstream kernel development process.  :-)

						- Ted