On Wed, Jul 01, 2020 at 12:51:11PM -0400, Brian Foster wrote:
> Now that relog reservation is available and relog state tracking is
> in place, all that remains to automatically relog items is the relog
> mechanism itself. An item with relogging enabled is effectively
> pinned from writeback until relogging is disabled. Instead of being
> written back, the item must be periodically committed in a new
> transaction to move it forward in the physical log. The purpose of
> moving the item is to avoid long-term tail pinning and thus avoid
> log deadlocks for long running operations.
>
> The ideal time to relog an item is in response to tail pushing
> pressure. This accommodates the current workload at any given time,
> as opposed to a fixed time interval or log reservation heuristic,
> which risks performance regression. This is essentially the same
> heuristic that drives metadata writeback. XFS already implements
> various log tail pushing heuristics that attempt to keep the log
> progressing on an active filesystem under various workloads.
>
> The act of relogging an item simply requires adding it to a
> transaction and committing. This pushes the already dirty item into
> a subsequent log checkpoint and frees up its previous location in
> the on-disk log. Joining an item to a transaction of course
> requires locking the item first, which means we have to be aware of
> type-specific locks and lock ordering wherever the relog takes
> place.
>
> Fundamentally, this points to xfsaild as the ideal location to
> process relog enabled items. xfsaild already processes log resident
> items, is driven by log tail pushing pressure, processes arbitrary
> log item types through callbacks, and is sensitive to type-specific
> locking rules by design. The fact that automatic relogging
> essentially diverts items between writeback and relog also suggests
> xfsaild as an ideal location to process items one way or the other.
>
> Of course, we don't want xfsaild to process transactions as it is a
> critical component of the log subsystem for driving metadata
> writeback and freeing up log space. Therefore, similar to how
> xfsaild builds up a writeback queue of dirty items and queues
> writes asynchronously, make xfsaild responsible only for directing
> pending relog items into an appropriate queue and create an async
> (workqueue) context for processing the queue. The workqueue context
> utilizes the pre-reserved log reservation to drain the queue by
> rolling a permanent transaction.
>
> Update the AIL pushing infrastructure to support a new RELOG item
> state. If a log item push returns the relog state, queue the item
> for relog instead of writeback. On completion of a push cycle,
> schedule the relog task at the same point metadata buffer I/O is
> submitted. This allows items to be relogged automatically under the
> same locking rules and pressure heuristics that govern metadata
> writeback.
>
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
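
To make the mechanism above concrete for anyone reading along, the
dispatch it describes reduces to something like the sketch below.
This is illustrative only, not the patch itself: XFS_ITEM_RELOG,
li_relog, ail_relog_list, ail_relog_work, xfs_relog_wq and
xfs_ail_relog_trans_alloc() are names assumed from the description
above, not necessarily what the patch implements.

	/*
	 * Sketch only, assuming the names above. In xfsaild_push(),
	 * a new ->iop_push() return code diverts the item to a relog
	 * queue instead of the delwri buffer list (AIL locking
	 * elided for brevity):
	 */
	switch (lip->li_ops->iop_push(lip, &ailp->ail_buf_list)) {
	case XFS_ITEM_RELOG:
		/* pinned from writeback; hand off to the relog worker */
		list_add_tail(&lip->li_relog, &ailp->ail_relog_list);
		break;
	default:
		/* existing writeback/backoff handling */
		break;
	}

	/* on push cycle completion, alongside delwri buffer submission */
	if (!list_empty(&ailp->ail_relog_list))
		queue_work(xfs_relog_wq, &ailp->ail_relog_work);

	/*
	 * Workqueue context: drain the queue by rolling a permanent
	 * transaction backed by the pre-reserved relog reservation,
	 * so xfsaild itself never commits a transaction.
	 */
	static void
	xfs_ail_relog_worker(
		struct work_struct	*work)
	{
		struct xfs_ail		*ailp = container_of(work,
					struct xfs_ail, ail_relog_work);
		struct xfs_log_item	*lip, *n;
		struct xfs_trans	*tp;

		tp = xfs_ail_relog_trans_alloc(ailp); /* hypothetical */
		list_for_each_entry_safe(lip, n, &ailp->ail_relog_list,
					 li_relog) {
			list_del_init(&lip->li_relog);
			/* rejoin the dirty item in a new checkpoint */
			xfs_trans_add_item(tp, lip);
			set_bit(XFS_LI_DIRTY, &lip->li_flags);
			xfs_trans_roll(&tp); /* error handling elided */
		}
		xfs_trans_commit(tp);
	}

Each xfs_trans_roll() in the sketch commits the joined item and
regrants the log reservation, which is what frees up the item's old
location in the on-disk log without xfsaild ever blocking on a
transaction commit.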
A note while it's still fresh in my mind: memory reclaim is going to
force relogging of items whether they need it or not. The inode
shrinker pushes the AIL to its highest current LSN, which means the
first shrinker invocation will relog the items. Sustained memory
pressure will result in this sort of behaviour:

    AIL                                 AIL relog workqueue

cycle 1:
    relog item -> move to relog queue
    relog item -> move to relog queue
    ....
    relog item -> move to relog queue
    queue work to AIL relog workqueue
    <sleep 20ms>
                                        iterates relog items
                                        ->relog commit

cycle 2:
    relog item already queued
      marks AIL for log force
    relog item already queued
      marks AIL for log force
    ....
    relog item -> move to relog queue
    <sleep 20ms>

cycle 3:
    xfs_log_force(XFS_LOG_SYNC)
      -> CIL flush
      -> log io
    log IO completes
      relogged items reinserted in AIL
    ....
    relog item -> move to relog queue
    relog item -> move to relog queue
    ....
    relog item -> move to relog queue
    queue work to AIL relog workqueue
    <sleep 20ms>
                                        iterates relog items
                                        ->relog commit
<repeat>

So it looks like when there is memory pressure we are going to
trigger a relog every second AIL push cycle, and a synchronous log
force every other push cycle.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx