On Thu, 2009-08-13 at 14:15 -0400, Greg Freemyer wrote:
> On Thu, Aug 13, 2009 at 12:33 PM, <david@xxxxxxx> wrote:
> > On Thu, 13 Aug 2009, Markus Trippelsdorf wrote:
> >
> >> On Thu, Aug 13, 2009 at 08:13:12AM -0700, Matthew Wilcox wrote:
> >>>
> >>> I am planning a complete overhaul of the discard work. Users can send
> >>> down discard requests as frequently as they like. The block layer will
> >>> cache them, and invalidate them if writes come through. Periodically,
> >>> the block layer will send down a TRIM or an UNMAP (depending on the
> >>> underlying device) and get rid of the blocks that have remained unwanted
> >>> in the interim.
> >>
> >> That is a very good idea. I've tested your original TRIM implementation on
> >> my Vertex yesterday and it was awful ;-). The SSD needs hundreds of
> >> milliseconds to digest a single TRIM command. And since your
> >> implementation sends a TRIM for each extent of each deleted file, the
> >> whole system is unusable after a short while.
> >> An optimal solution would be to consolidate the discard requests, bundle
> >> them and send them to the drive as infrequent as possible.
> >
> > or queue them up and send them when the drive is idle (you would need to
> > keep track to make sure the space isn't re-used)
> >
> > as an example, if you would consider spinning down a drive you don't hurt
> > performance by sending accumulated trim commands.
> >
> > David Lang
>
> An alternate approach is the block layer maintain its own bitmap of
> used unused sectors / blocks. Unmap commands from the filesystem just
> cause the bitmap to be updated. No other effect.
>
> (Big unknown: Where will the bitmap live between reboots? Require DM
> volumes so we can have a dedicated bitmap volume in the mix to store
> the bitmap to? Maybe on mount, the filesystem has to be scanned to
> initially populate the bitmap? Other options?)

I wouldn't really have it live anywhere. Discard is best effort; it's
not required for fs integrity.
As long as we don't discard an in-use block we're free to do anything
else (including forget to discard, rediscard a discarded block etc).
It is theoretically possible to run all of this from user space using
the fs mappings, a bit like a defrag command. One other option would
just be to scan on mount, discard everything empty and redo on next
mount ... this might be just the thing for laptops.

> Assuming we have a persistent bitmap in place, have a background
> scanner that kicks in when the cpu / disk is idle. It just
> continuously scans the bitmap looking for contiguous blocks of unused
> sectors. Each time it finds one, it sends the largest possible unmap
> down the block stack and eventually to the device.
>
> When normal cpu / disk activity kicks in, this process goes to sleep.
>
> That way much of the smarts are concentrated in the block layer, not
> in the filesystem code. And it is being done when the disk is
> otherwise idle, so you don't have the ncq interference.
>
> Even laptop users should have enough idle cpu available to manage
> this. Enterprise would get the large discards it wants, and
> unmentioned in the previous discussion, mdraid gets the large discards
> it also wants.
>
> ie. If a mdraid raid5/raid6 volume is built of SSDs, it will only be
> able to discard a full stripe at a time. Otherwise the P=D1 ^ D2 logic
> is lost.
>
> Another benefit of the above is the code should be extremely safe and
> testable.

Actually, I think, if we go in-kernel, the discard might be better tied
into the block plugging mechanism. The real test might be no
outstanding commands and queue plugged, keep plugged and begin
discarding.

James
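
For illustration only, the caching scheme Matthew describes at the top
of the thread (cache discard requests, invalidate them when a write
comes through, flush the survivors periodically) might be sketched
roughly as below. This is not kernel code and not anyone's actual
implementation: the DiscardCache class and its method names are
hypothetical, ranges are assumed to be half-open sector intervals, and
a real version would live in the block layer with proper locking.

```python
class DiscardCache:
    """Hypothetical cache of pending discards (an assumption for illustration).

    Ranges are kept as sorted, non-overlapping, non-adjacent half-open
    intervals [start, end), measured in sectors.
    """

    def __init__(self):
        self.ranges = []  # list of (start, end) tuples

    def discard(self, start, length):
        """Record a discard request, merging overlapping or adjacent ranges."""
        if length <= 0:
            return
        new_start, new_end = start, start + length
        merged = []
        for s, e in self.ranges:
            if e < new_start or s > new_end:
                merged.append((s, e))          # neither overlaps nor touches
            else:
                new_start = min(new_start, s)  # absorb into the new range
                new_end = max(new_end, e)
        merged.append((new_start, new_end))
        merged.sort()
        self.ranges = merged

    def write(self, start, length):
        """A write puts sectors back in use, so drop them from the cache."""
        if length <= 0:
            return
        w_start, w_end = start, start + length
        kept = []
        for s, e in self.ranges:
            if e <= w_start or s >= w_end:
                kept.append((s, e))            # untouched by this write
                continue
            if s < w_start:
                kept.append((s, w_start))      # piece before the write
            if e > w_end:
                kept.append((w_end, e))        # piece after the write
        self.ranges = kept

    def flush(self):
        """Hand back the consolidated ranges (to send as TRIM/UNMAP) and reset."""
        pending, self.ranges = self.ranges, []
        return pending
```

Because discard is best effort, losing this cache (on a crash, say)
costs nothing in correctness. The only invariant that matters is that
write() always removes sectors before flush() can return them.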