On Wed, Jan 05, 2011 at 11:07:35PM +0100, Michael Monnerie wrote:
> On Wednesday, 5 January 2011 Lukas Czerner wrote:
> > If we notice that we are running out of space in advance (how much
> > in advance?), we can start trimming smaller chunks, until we reach
> > a reasonable pool of reclaimed space, or until we trim the whole
> > device.
>
> Would it be possible to log all blocks that have been in use since
> the last FITRIM run? That way, we would only need to clean those. If
> you have a 2TB volume, probably only 25% of it has been rewritten
> (=500GB) since the last run, and of that maybe 80% is still in use at
> the time we run FITRIM, so only 100GB would need the cleanup. Maybe
> each AG could store a bitmap of written blocks that is reset by a
> FITRIM run. That could be an asynchronously written bitmap and
> shouldn't disturb performance too much. Maybe it's even only needed
> to store a bit per sunit*swidth blocks, to keep that table small. A
> mount option could be used to enable that feature, so that only those
> who use thin provisioning or SSDs or similar devices enable it as
> they wish.

Not easily. It would need a second set of free space btrees for
tracking freed but untrimmed extents.

The idea of the background trim is that it doesn't need all that
complexity because all the status information on where the trim
process is up to can be kept in userspace. This is basically the same
mode of functioning as the periodic background xfs_fsr defragmentation
mode - run it for an hour every couple of nights, and it will slowly
work its way through the entire filesystem over a period of weeks. No
state or additional on-disk structures are needed for xfs_fsr to do
its work....

The background trim is intended to enable even the slowest of devices
to be trimmed over time, while introducing as little runtime overhead
and complexity as possible.
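To make the "state lives in userspace" point concrete, here is a rough
Python sketch of what such a background trimmer could look like. It is
only an illustration under stated assumptions, not an existing tool:
the names next_chunk, trim_range and run_once and the JSON state file
are all made up for this example. The only kernel interface used is the
FITRIM ioctl with struct fstrim_range from <linux/fs.h>, which needs
sufficient privileges and a filesystem that supports it.

```python
import fcntl
import json
import os
import struct

# FITRIM from <linux/fs.h>: _IOWR('X', 121, struct fstrim_range)
FITRIM = 0xC0185879


def next_chunk(cursor, fs_bytes, chunk_bytes):
    """Return (start, length, new_cursor) for the next range to trim.

    Wraps back to offset 0 once the whole filesystem has been covered.
    """
    if cursor >= fs_bytes:
        cursor = 0
    length = min(chunk_bytes, fs_bytes - cursor)
    return cursor, length, cursor + length


def load_cursor(state_path):
    """Read the saved byte offset; start from 0 if there is no usable state."""
    try:
        with open(state_path) as f:
            return int(json.load(f)["cursor"])
    except (FileNotFoundError, KeyError, TypeError, ValueError):
        return 0


def save_cursor(state_path, cursor):
    """Persist the byte offset in a plain userspace file (hypothetical format)."""
    with open(state_path, "w") as f:
        json.dump({"cursor": cursor}, f)


def trim_range(mountpoint, start, length, minlen=0):
    """Issue one FITRIM over [start, start + length); return bytes trimmed."""
    # struct fstrim_range { __u64 start; __u64 len; __u64 minlen; }
    buf = bytearray(struct.pack("=QQQ", start, length, minlen))
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        # The kernel writes the number of bytes trimmed back into 'len'.
        fcntl.ioctl(fd, FITRIM, buf)
    finally:
        os.close(fd)
    return struct.unpack("=QQQ", buf)[1]


def run_once(mountpoint, state_path, chunk_bytes, trim=trim_range):
    """Trim one chunk and advance the saved cursor; no on-disk fs state."""
    st = os.statvfs(mountpoint)
    fs_bytes = st.f_frsize * st.f_blocks
    start, length, cursor = next_chunk(
        load_cursor(state_path), fs_bytes, chunk_bytes)
    trimmed = trim(mountpoint, start, length)
    save_cursor(state_path, cursor)
    return start, length, trimmed
```

Run from cron in the same way as xfs_fsr, each invocation of run_once()
would trim one chunk and remember where it stopped, so a slow device is
covered over many runs. If the state file is lost, the walk simply
starts again from offset 0.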
Hence adding complexity and runtime overhead to optimise background
trimming tends to defeat the primary design goal....

> Especially for 100TB size devices that seems like something that
> should be thought of, as maybe if you run FITRIM once a week there,
> only <10TB have been rewritten, if at all, and such a table would
> boost a FITRIM run a lot.

If we want optimised, only-trim-what-we-free behaviour, we need to
hook into the transaction subsystem and issue TRIM commands at the
time extents are actually freed. That is much more complex to
implement but much easier to optimise because it doesn't require
persistent state on disk. However, most devices are simply not ready
to handle the flood of TRIM commands this generates, with performance
degrading by ~10-20% for the best of devices and _10-100x_ for the
worst...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx