On Thu, Aug 13, 2009 at 12:33 PM, <david@xxxxxxx> wrote:
> On Thu, 13 Aug 2009, Markus Trippelsdorf wrote:
>
>> On Thu, Aug 13, 2009 at 08:13:12AM -0700, Matthew Wilcox wrote:
>>>
>>> I am planning a complete overhaul of the discard work. Users can send
>>> down discard requests as frequently as they like. The block layer will
>>> cache them, and invalidate them if writes come through. Periodically,
>>> the block layer will send down a TRIM or an UNMAP (depending on the
>>> underlying device) and get rid of the blocks that have remained unwanted
>>> in the interim.
>>
>> That is a very good idea. I've tested your original TRIM implementation on
>> my Vertex yesterday and it was awful ;-). The SSD needs hundreds of
>> milliseconds to digest a single TRIM command. And since your implementation
>> sends a TRIM for each extent of each deleted file, the whole system is
>> unusable after a short while.
>> An optimal solution would be to consolidate the discard requests, bundle
>> them and send them to the drive as infrequent as possible.
>
> or queue them up and send them when the drive is idle (you would need to
> keep track to make sure the space isn't re-used)
>
> as an example, if you would consider spinning down a drive you don't hurt
> performance by sending accumulated trim commands.
>
> David Lang

An alternate approach is for the block layer to maintain its own bitmap
of used/unused sectors/blocks.

Unmap commands from the filesystem just cause the bitmap to be updated.
No other effect.

(Big unknown: Where will the bitmap live between reboots? Require DM
volumes so we can have a dedicated bitmap volume in the mix to store the
bitmap on? Maybe on mount, the filesystem has to be scanned to
initially populate the bitmap? Other options?)

Assuming we have a persistent bitmap in place, have a background scanner
that kicks in when the cpu / disk is idle. It just continuously scans
the bitmap looking for contiguous blocks of unused sectors.
Each time it finds one, it sends the largest possible unmap down the
block stack and eventually to the device. When normal cpu / disk
activity kicks in, this process goes to sleep.

That way much of the smarts are concentrated in the block layer, not in
the filesystem code. And it is being done when the disk is otherwise
idle, so you don't have the NCQ interference.

Even laptop users should have enough idle cpu available to manage this.
Enterprise would get the large discards it wants, and, unmentioned in
the previous discussion, mdraid gets the large discards it also wants.

i.e. if an mdraid raid5/raid6 volume is built of SSDs, it will only be
able to discard a full stripe at a time. Otherwise the parity
relationship (P = D1 ^ D2) no longer holds for the stripe.

Another benefit of the above is that the code should be extremely safe
and testable.

Greg
--
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
Preservation and Forensic processing of Exchange Repositories White Paper -
<http://www.norcrossgroup.com/forms/whitepapers/tng_whitepaper_fpe.html>

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com
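To make the bitmap-plus-scanner proposal above concrete, here is a
minimal userspace sketch in Python. It is only an illustration, not
kernel code and not anyone's actual implementation: the names
DiscardBitmap, unused_runs, align_to_stripe and scan are hypothetical,
the bitmap is a plain in-memory list (persistence across reboots, the
"big unknown", is not addressed), and the stripe size is assumed to be
given in blocks.

```python
from typing import Iterator, List, Optional, Tuple


class DiscardBitmap:
    """Hypothetical per-device map: one flag per block, True = unused."""

    def __init__(self, nblocks: int) -> None:
        self.unused: List[bool] = [False] * nblocks

    def discard(self, start: int, count: int) -> None:
        # A discard from the filesystem only updates the bitmap.
        for b in range(start, start + count):
            self.unused[b] = True

    def write(self, start: int, count: int) -> None:
        # A write makes the blocks live again, so they must not be unmapped.
        for b in range(start, start + count):
            self.unused[b] = False

    def unused_runs(self) -> Iterator[Tuple[int, int]]:
        """Yield (start, length) for each maximal run of unused blocks."""
        run_start: Optional[int] = None
        for b, free in enumerate(self.unused):
            if free and run_start is None:
                run_start = b
            elif not free and run_start is not None:
                yield run_start, b - run_start
                run_start = None
        if run_start is not None:
            yield run_start, len(self.unused) - run_start


def align_to_stripe(start: int, length: int, stripe: int) -> Tuple[int, int]:
    """Shrink a run to whole stripes; the returned length may be 0."""
    first = -(-start // stripe) * stripe        # round the start up
    last = (start + length) // stripe * stripe  # round the end down
    return first, max(0, last - first)


def scan(bitmap: DiscardBitmap, stripe: int = 1) -> List[Tuple[int, int]]:
    """One idle-time pass: the unmap requests the scanner would send."""
    requests = []
    for start, length in bitmap.unused_runs():
        s, n = align_to_stripe(start, length, stripe)
        if n > 0:
            requests.append((s, n))
    return requests
```

With stripe=1 the scanner emits every maximal unused run; with a larger
stripe it emits only the whole stripes inside each run, which is the
constraint described for raid5/raid6. The sketch does not record which
runs have already been sent, so a real scanner would need some extra
state to avoid resending them.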