On Fri, Sep 17, 2010 at 08:21:40AM -0500, Alex Elder wrote:
> On Tue, 2010-09-14 at 20:55 +1000, Dave Chinner wrote:
> > This patchset has grown quite a bit - it started out as a "convert
> > the buffer cache to rbtrees" patch, and has gotten bigger as I
> > peeled the onion from one bottleneck to another.
>
> I know you're going to re-submit this series. I would
> like you to split it into several smaller series if you
> don't mind. Some of these are simpler than others,
> and there are some somewhat logical groupings (you
> even described them here as two sets). But beyond
> that it would be nice to get at least some of them
> committed before the full series is perfected.
>
> To be constructive, here's a grouping based on what
> seems to be a change of significance somehow. I'm
> not suggesting they all be separated, but I'm just
> trying to identify the many things you're doing with
> this series.
>
> [01/18] xfs: single thread inode cache shrinking.

This is separate, and I'm completely reworking this patch.

Fundamentally, single threading the shrinker is not robust enough to
prevent OOM situations, because reclaim relies on the shrinkers to
throttle reclaim sufficiently that some memory is reclaimed while they
are running. Hence every shrinker call needs to do something; otherwise
the direct reclaim priority will wind up very quickly and declare OOM
instead of waiting for IO completion to clean and free objects.

I've introduced per-AG reclaim walk locks to prevent multiple shrinkers
from walking the same AG, and this has prevented most of the lock
contention. Instead, the shrinkers burn CPU walking, so I've needed to
add batched lookups (new patch) and optimise the way we check whether an
inode is reclaimable by avoiding locking the inode first. I've also
added a scan cursor that tracks where the last reclaim walk got up to,
and restarts the next shrinker reclaim walk from that inode.
This means it is not walking over the same unreclaimable inodes over
and over again.

Then I've changed shrinker reclaim to be sort-of-blocking. That is, it
doesn't block on locks (because that would cause deadlocks), but it
does SYNC_WAIT inode cluster IO when it finds a dirty inode. This
provides sufficient throttling for reclaim not to OOM.

To make matters even more complex, the inode cache shrinker was being
overloaded by the VM because the buffer cache LRU shrinker was not
being asked to do enough work. As a result, the buffer cache would
grow to 1.5GB, and the VM would spend all its time trying to shrink
the inode cache! The cause of this was leaving stale buffers on the
LRU, so avoiding leaving them on the LRU has prevented buffer cache
windup on parallel unlink workloads.

The reclaim code is not as efficient as the "just do a full, single
threaded non-blocking reclaim pass" code, but it is still ~4x faster
than the existing code and does not randomly go OOM on parallel unlink
workloads.

> [02/18] xfs: reduce the number of CIL lock round trips during commit

Stand-alone fix.

> [05/18] xfs: convert inode cache lookups to use RCU locking
> [06/18] xfs: convert pag_ici_lock to a spin lock

So up to this point I now have:

d10d7d6 xfs: reduce the number of CIL lock round trips during commit
e9a1a2a xfs: remove debug assert for per-ag reference counting
0a055a3 xfs: lockless per-ag lookups
3403f75 xfs: convert inode cache lookups to use RCU locking
84c5b79 xfs: convert pag_ici_lock to a spin lock
033bc9b xfs: batch per-ag inode lookup walks (NEW)
c5bbc30 xfs: rework inode cache shrinking. (NEW)

> [07/18] xfs: don't use vfs writeback for pure metadata modifications

Standalone.
> [08/18] xfs: rename xfs_buf_get_nodaddr to be more appropriate
> [09/18] xfs: introduced uncached buffer read primitve
> [10/18] xfs: store xfs_mount in the buftarg instead of in the xfs_buf
> [11/18] xfs: kill XBF_FS_MANAGED buffers
> [12/18] xfs: use unhashed buffers for size checks
> [13/18] xfs: remove buftarg hash for external devices

Cleanups needed for the rbtree conversion.

> [03/18] xfs: remove debug assert for per-ag reference counting
> [04/18] xfs: lockless per-ag lookups

Stand-alone.

> [14/18] xfs: convert buffer cache hash to rbtree

Goes with the cleanups.

> [15/18] xfs; pack xfs_buf structure more tightly
> [16/18] xfs: convert xfsbud shrinker to a per-buftarg shrinker.
> [17/18] xfs: add a lru to the XFS buffer cache
> [18/18] xfs: stop using the page cache to back the buffer cache

All together, with the LRU code being reworked a bit w.r.t. stale
buffers and shrinker behaviour.

In reality, though, I don't think that separating them into separate
series makes much sense. The order they are in right now is
bisectable and fairly logical....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs