>>>>> "Ted" == Theodore Tso <tytso@xxxxxxx> writes:

Ted> Let's be just a *little* bit fair here.  Suppose we wanted to
Ted> implement thin-provisioned disks using devicemapper and LVM;
Ted> consider that LVM uses a default PE size of 4M for some very good
Ted> reasons.  Asking filesystems to be a little smarter about
Ted> allocation policies so that we allocate in existing 4M chunks
Ted> before going onto the next, and asking the block layer to pool
Ted> trim requests to 4M chunks is not totally unreasonable.

It would also be much easier for the array folks if we never wrote
anything less than 768KB and always on a 768KB boundary.

Ted> Array vendors use chunk sizes > than typical filesystem chunk
Ted> sizes for the same reason that LVM does.  So to say that this is
Ted> due to purely a "broken firmware architecture" is a little
Ted> unfair.

Why?  What is the advantage of doing it in Linux as opposed to in the
array firmware?

The issue at hand here is that we'll be issuing discards/trims/unmaps
and if they don't end up being multiples of 768KB starting on a 768KB
boundary the array is just going to ignore the command.  They expect us
to keep track of what's used and what's unused within that single chunk
and let them know when we've completely cleared it out.

The alternative is to walk the fs metadata occasionally, look for
properly aligned, completely unused chunks and then submit UNMAPs to
the array.  That really seems like 1980s defrag technology to me.

I don't have a problem with arrays using bigger chunk sizes internally.
That's fine.  What I don't see is why we have to carry the burden of
keeping track of what's being used and what's not based upon some
quasi-random value.  Especially given that the array is going to
silently ignore any UNMAP requests that it doesn't like.

Array folks already have to keep track of their internal virtual to
physical mapping.  Why shouldn't they have to maintain a bitmap or an
extent list as part of their internal metadata?
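
To make the alignment constraint concrete, here is a minimal sketch of
what a host would have to compute for each discard. It assumes that
768KB means 768 * 1024 bytes and that the array drops anything that is
not a whole, aligned chunk, as described above. The name
`aligned_subrange` is hypothetical and is not part of any kernel or
array interface.

```python
# Assumption: "768KB" is 768 * 1024 bytes; the chunk size comes from the
# discussion above, not from any specification.
CHUNK = 768 * 1024


def aligned_subrange(offset, length, chunk=CHUNK):
    """Return (start, length) of the largest chunk-aligned range that lies
    inside [offset, offset + length), or None if no whole chunk is covered.

    Hypothetical helper for illustration only.
    """
    start = -(-offset // chunk) * chunk          # round start up to a boundary
    end = ((offset + length) // chunk) * chunk   # round end down to a boundary
    if end <= start:
        return None                              # nothing the array would accept
    return start, end - start


if __name__ == "__main__":
    # A 4 MiB discard at offset 0, and a one-chunk discard shifted by 4 KiB.
    print(aligned_subrange(0, 4 * 1024 * 1024))
    print(aligned_subrange(4096, CHUNK))
```

Under these assumptions, a 4 MiB discard starting at offset 0 covers
five whole 768KB chunks, and the trailing 256 KiB could only be
reclaimed if the host remembered it and merged it with a later discard.
That remembered remainder is the state being argued about here.
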
Why should we have to carry that burden?

And why would we want to go through all this hassle when it's not a
problem for disks or (so far) for mid-range storage devices that use
exactly the same command set?

What I'm objecting to is not coalescing of discard requests.  Or laying
out filesystems intelligently.  That's fine and I think we should do it
(heck, I'm working on that).

What I'm heavily against is having Linux carry the burden of keeping
state around for stuff that's really internal to the array firmware.

-- 
Martin K. Petersen	Oracle Linux Engineering