This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "XFS development tree".

The branch, master has been updated
d0eb2f3 xfs: convert grant head manipulations to lockless algorithm
3f16b98 xfs: introduce new locks for the log grant ticket wait queues
c8a09ff xfs: convert log grant heads to atomic variables
1c3cb9e xfs: convert l_tail_lsn to an atomic variable.
84f3c68 xfs: convert l_last_sync_lsn to an atomic variable
2ced19c xfs: make AIL tail pushing independent of the grant lock
eb40a87 xfs: use wait queues directly for the log wait queues
a69ed03 xfs: combine grant heads into a single 64 bit integer
663e496 xfs: rework log grant space calculations
3f336c6 xfs: fact out common grant head/log tail verification code
1054794 xfs: convert log grant ticket queues to list heads
9552e7f xfs: use AIL bulk delete function to implement single delete
e605994 xfs: use AIL bulk update function to implement single updates
3013683 xfs: remove all the inodes on a buffer from the AIL in bulk
c90821a xfs: consume iodone callback items on buffers as they are processed
e677d0f xfs: reduce the number of AIL push wakeups
0e57f6a xfs: bulk AIL insertion during transaction commit
eb3efa1 xfs: clean up xfs_ail_delete()
b199c8a xfs: Pull EFI/EFD handling out from under the AIL lock
9c5f841 xfs: fix EFI transaction cancellation.
821eb21 xfs: connect up buffer reclaim priority hooks
430cbeb xfs: add a lru to the XFS buffer cache
ff57ab2 xfs: convert xfsbud shrinker to a per-buftarg shrinker.
1a427ab xfs: convert pag_ici_lock to a spin lock
1a3e8f3 xfs: convert inode cache lookups to use RCU locking
d95b7aa xfs: rcu free inodes
6e85756 xfs: don't truncate prealloc from frequently accessed inodes
055388a xfs: dynamic speculative EOF preallocation
622d814 xfs: use KM_NOFS for allocations during attribute list operations
dcfcf20 xfs: provide a inode iolock lockdep class
from 489a150f6454e2cd93d9e0ee6d7c5a361844f62a (commit)

Those revisions listed above that are new to this repository have not appeared on any other notification email, so we list those revisions in full below.

- Log -----------------------------------------------------------------
commit d0eb2f38b250b7d6c993adf81b0e4ded0565497e
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Dec 21 12:29:14 2010 +1100

xfs: convert grant head manipulations to lockless algorithm

The only thing that the grant lock remains to protect is the grant head manipulations when adding or removing space from the log. These calculations are already based on atomic variables, so single updates are already safe without locks. However, the grant head manipulations require atomic multi-step calculations to be executed, which the algorithms currently don't allow.

To make these multi-step calculations atomic, convert the algorithms to compare-and-exchange loops on the atomic variables. That is, we sample the old value, perform the calculation and use atomic64_cmpxchg() to attempt to update the head with the new value. If the head has not changed since we sampled it, the update succeeds and we are done. Otherwise, we rerun the calculation from a fresh sample of the head.

This allows us to remove the grant lock from around all the grant head space manipulations. As that was the last thing the grant lock protected, we can now remove the grant lock from the log completely.
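A minimal user-space sketch of the compare-and-exchange loop described above, written with C11 stdatomic.h rather than the kernel's atomic64_cmpxchg(). It assumes the grant head packs the log cycle number into the high 32 bits and a byte offset into the low 32 bits, and that adding space past the end of the log wraps the offset and bumps the cycle; the demo_* names are hypothetical and this is not the XFS implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical packing: log cycle in the high 32 bits, byte offset in the low 32. */
static inline uint64_t demo_assign_head(uint32_t cycle, uint32_t bytes)
{
    return ((uint64_t)cycle << 32) | bytes;
}

static inline void demo_crack_head(uint64_t head, uint32_t *cycle, uint32_t *bytes)
{
    *cycle = (uint32_t)(head >> 32);
    *bytes = (uint32_t)head;
}

/*
 * Add 'bytes' of space to a grant head for a log of 'log_size' bytes.
 * If the new offset reaches the end of the log it wraps and the cycle
 * number is incremented. The whole multi-step calculation is retried
 * until the compare-and-exchange observes an unchanged head.
 */
static void demo_grant_add_space(_Atomic uint64_t *head, uint32_t log_size,
                                 uint32_t bytes)
{
    uint64_t old = atomic_load(head);
    uint64_t next;

    assert(bytes <= log_size);
    do {
        uint32_t cycle, space;

        demo_crack_head(old, &cycle, &space);
        uint32_t room = log_size - space;   /* bytes left before the wrap */
        if (room > bytes) {
            space += bytes;
        } else {
            space = bytes - room;
            cycle++;
        }
        next = demo_assign_head(cycle, space);
        /* On failure, 'old' is refreshed with the current head value. */
    } while (!atomic_compare_exchange_weak(head, &old, next));
}
```

If another CPU changes the head between the load and the exchange, the exchange fails, old is refreshed with the value now in memory, and the calculation is redone from that new sample. The weak form may also fail spuriously, which simply costs one more pass through the loop.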

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 3f16b9850743b702380f098ab5e0308cd6af1792
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Dec 21 12:29:01 2010 +1100

xfs: introduce new locks for the log grant ticket wait queues

The log grant ticket wait queues are currently protected by the log grant lock. However, the queues are functionally independent of each other, and operations on them only require serialisation against other queue operations now that all of the other log variables they use are atomic values.

Hence, we can make them independent of the grant lock by introducing new locks just to protect the list operations. Because the lists are independent, we can use a lock per list and ensure that reserve and write head queuing do not contend.

To ensure forced shutdowns work correctly in conjunction with the new fast paths, check whether the log has been shut down in the grant functions once we hold the relevant spin lock but before we go to sleep. This is needed to co-ordinate correctly with the wakeups that are issued on the ticket queues so we don't leave any processes sleeping on the queues during a shutdown.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit c8a09ff8ca2235bccdaea8a52fbd5349646a8ba4
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Sat Dec 4 00:02:40 2010 +1100

xfs: convert log grant heads to atomic variables

Convert the log grant heads to atomic64_t types in preparation for converting the accounting algorithms to atomic operations. This patch just converts the variables; the algorithmic changes are in a separate patch for clarity.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 1c3cb9ec07fabf0c0970adc46fd2a1f09c1186dd
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Dec 21 12:28:39 2010 +1100

xfs: convert l_tail_lsn to an atomic variable.

log->l_tail_lsn is currently protected by the log grant lock. The lock is only needed for serialising readers against writers, so we don't really need the lock if we make the l_tail_lsn variable an atomic. Converting the l_tail_lsn variable to an atomic64_t means we can start to peel back the grant lock from various operations.

Also, provide functions to safely crack an atomic LSN variable into its component pieces and to recombine the components into an atomic variable. Use them where appropriate.

This also removes the need for explicitly holding a spinlock to read the l_tail_lsn on 32 bit platforms.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>

commit 84f3c683c4d3f36d3c3ed320babd960a332ac458
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Fri Dec 3 22:11:29 2010 +1100

xfs: convert l_last_sync_lsn to an atomic variable

log->l_last_sync_lsn is updated in only one critical spot (log buffer IO completion) and is protected by the grant lock there. This requires the grant lock to be taken for every log buffer IO completion. Converting the l_last_sync_lsn variable to an atomic64_t means that we do not need to take the grant lock in log buffer IO completion to update it.

This also removes the need for explicitly holding a spinlock to read the l_last_sync_lsn on 32 bit platforms.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 2ced19cbae5448b720919a494606c62095d4f4db
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Dec 21 12:09:20 2010 +1100

xfs: make AIL tail pushing independent of the grant lock

xlog_grant_push_ail() currently takes the grant lock internally to sample the tail lsn, last sync lsn and the reserve grant head. Most of the callers already hold the grant lock but have to drop it before calling xlog_grant_push_ail(). This is a leftover from when the AIL tail pushing was done inline and hence xlog_grant_push_ail() had to drop the grant lock.

AIL push is now done in another thread and hence we can safely hold the grant lock over the entire xlog_grant_push_ail() call. Push the grant lock outside of xlog_grant_push_ail() to simplify the locking and synchronisation needed for tail pushing. This will reduce traffic on the grant lock by itself, but this is only one step in preparing for the complete removal of the grant lock.

While there, clean up the formatting of xlog_grant_push_ail() to match the rest of the XFS code.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit eb40a87500ac2f6be7eaf8ebb35610e6d0e60e9a
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Dec 21 12:09:01 2010 +1100

xfs: use wait queues directly for the log wait queues

The log grant queues are one of the few places left using sv_t constructs for waiting. Given we are touching this code, we should convert them to plain wait queues. While there, convert all the other sv_t users in the log code as well.

Seeing as this removes the last users of the sv_t type, remove the header file defining the wrapper and the fragments that still reference it.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit a69ed03c24d4a336c23b7116127713d5a8c5ac4d
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Dec 21 12:08:20 2010 +1100

xfs: combine grant heads into a single 64 bit integer

Prepare for switching the grant heads to atomic variables by combining the two 32 bit values that make up the grant head into a single 64 bit variable. Provide wrapper functions to combine and split the grant heads appropriately for calculations and use them as necessary.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 663e496a720a3a9fc08ea70b29724e8906b34e43
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Dec 21 12:06:05 2010 +1100

xfs: rework log grant space calculations

The log grant space calculations are repeated for both the write and reserve grant heads. To make it simpler to convert the calculations to a different algorithm, factor them so both grant heads use the same calculation functions. Once this is done we can drop the wrappers that are used in only a couple of places to update both grant heads at once, as they don't provide any particular value.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 3f336c6fa17c2b3d14b3dd1bd6e64e9cc97b6359
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Dec 21 12:02:52 2010 +1100

xfs: fact out common grant head/log tail verification code

Factor repeated debug code out of the grant head manipulation functions into a separate function. This removes #ifdef DEBUG spaghetti from the code and makes the code easier to follow.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 1054794198e39103cb986618c4c10ec2252b7089
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Dec 21 12:02:25 2010 +1100

xfs: convert log grant ticket queues to list heads

The grant write and reserve queues use a roll-your-own doubly linked list, so convert them to a standard list_head structure and convert all the list traversals to use list_for_each_entry(). We can also get rid of the XLOG_TIC_IN_Q flag, as we can use the list_empty() check to tell if the ticket is in a list or not.
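The following standalone sketch reimplements a small subset of the kernel's intrusive list_head idiom (INIT_LIST_HEAD, list_add_tail, list_del_init, list_empty, container_of) in user space to show why a separate "in queue" flag becomes redundant: a node that is re-initialised on removal points at itself, so list_empty() on the node answers that question directly. struct demo_ticket and the demo_* helpers are hypothetical, and the traversal is an open-coded loop rather than the kernel's list_for_each_entry() macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal user-space version of an intrusive doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Append 'n' before 'head'; the node must be initialised and not queued. */
static void list_add_tail(struct list_head *n, struct list_head *head)
{
    assert(list_empty(n));
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink and re-initialise, so list_empty() on the node means "not queued". */
static void list_del_init(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    INIT_LIST_HEAD(n);
}

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical ticket: the embedded node replaces a separate IN_Q flag. */
struct demo_ticket {
    int unit_res;
    struct list_head t_queue;
};

static bool demo_ticket_queued(const struct demo_ticket *t)
{
    return !list_empty(&t->t_queue);
}

/* Walk the queue and sum the reservations of the queued tickets. */
static int demo_sum_reservations(struct list_head *queue)
{
    int total = 0;

    for (struct list_head *p = queue->next; p != queue; p = p->next)
        total += container_of(p, struct demo_ticket, t_queue)->unit_res;
    return total;
}
```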

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 9552e7f2f3dd13a7580e488a7a3582332daad4f5
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Mon Dec 20 12:36:15 2010 +1100

xfs: use AIL bulk delete function to implement single delete

We now have two copies of AIL delete operations that are mostly duplicate functionality. The single log item deletes can be implemented via the bulk delete operation by turning xfs_trans_ail_delete() into a simple wrapper. This removes all the duplicate delete functionality and associated helpers.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit e60599492990d1b52c70e9ed2f8e062fe11ca937
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Mon Dec 20 12:34:26 2010 +1100

xfs: use AIL bulk update function to implement single updates

We now have two copies of AIL insert operations that are mostly duplicate functionality. The single log item updates can be implemented via the bulk updates by turning xfs_trans_ail_update() into a simple wrapper. This removes all the duplicate insert functionality and associated helpers.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 3013683253ad04f67d8cfaa25be708353686b90a
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Mon Dec 20 12:03:17 2010 +1100

xfs: remove all the inodes on a buffer from the AIL in bulk

When inode buffer IO completes, usually all of the inodes are removed from the AIL. This involves processing them one at a time and taking the AIL lock once for every inode. When all CPUs are processing inode IO completions, this causes excessive amounts of contention on the AIL lock.

Instead, change the way we process inode IO completion in the buffer IO done callback. Allow the inode IO done callback to walk the list of IO done callbacks and pull all the inodes off the buffer in one go and then process them as a batch.

Once all the inodes for removal are collected, take the AIL lock once and do a bulk removal operation to minimise traffic on the AIL lock.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit c90821a26a8c90ad1e3116393b8a8260ab46bffb
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Fri Dec 3 17:00:52 2010 +1100

xfs: consume iodone callback items on buffers as they are processed

To allow buffer iodone callbacks to consume multiple items off the callback list, first we need to convert xfs_buf_do_callbacks() to consume items and always pull the next item from the head of the list. This means the item list walk is never dependent on knowing the next item on the list and hence allows callbacks to remove items from the list as well. This allows callbacks to do bulk operations by scanning the list for identical callbacks, consuming them all and then processing them in bulk, negating the need for multiple callbacks of that type.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit e677d0f9548e2245ee3c2977661ca8ca165af188
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Fri Dec 17 20:08:04 2010 +1100

xfs: reduce the number of AIL push wakeups

The xfsaild often tries to rest while waiting for congestion to pass or for IO to complete, but is regularly woken in tail-pushing situations. In severe cases, the xfsaild is getting woken tens of thousands of times a second. Reduce the number of needless wakeups by only waking the xfsaild if the new target is larger than the old one. Further, make short sleeps uninterruptible, as they occur when the xfsaild has decided it needs to back off to allow some IO to complete and being woken early is counter-productive.
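A sketch of the wakeup filter described above, assuming push targets are LSNs packed as (cycle << 32 | block) so that a plain unsigned comparison orders them by cycle and then by block. struct demo_ail, demo_lsn() and demo_ail_push() are hypothetical; the thread wakeup is represented by a counter, and callers are assumed to serialise calls (for example by holding a lock).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical LSN encoding: cycle in the high 32 bits, block in the low 32. */
static inline uint64_t demo_lsn(uint32_t cycle, uint32_t block)
{
    return ((uint64_t)cycle << 32) | block;
}

struct demo_ail {
    uint64_t push_target;    /* how far the pusher has been asked to push */
    unsigned long wakeups;   /* stand-in for waking the pusher thread */
};

/*
 * Request a push up to 'threshold'. Only a target beyond the current one
 * wakes the pusher; anything at or below it is already covered by the
 * push in progress, so it returns without a wakeup.
 */
static bool demo_ail_push(struct demo_ail *ail, uint64_t threshold)
{
    if (threshold <= ail->push_target)
        return false;
    ail->push_target = threshold;
    ail->wakeups++;
    return true;
}
```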

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 0e57f6a36f9be03e5abb755f524ee91c4aebe854
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Mon Dec 20 12:02:19 2010 +1100

xfs: bulk AIL insertion during transaction commit

When inserting items into the AIL from the transaction committed callbacks, we take the AIL lock for every single item that is to be inserted. For a CIL checkpoint commit, this can be tens of thousands of individual inserts, yet almost all of the items will be inserted at the same point in the AIL because they have the same index.

To reduce the overhead and contention on the AIL lock for such operations, introduce a "bulk insert" operation which allows a list of log items with the same LSN to be inserted in a single operation via a list splice. To do this, we need to pre-sort the log items being committed into a temporary list for insertion.

The complexity is that not every log item will end up with the same LSN, and not every item is actually inserted into the AIL. Items that don't match the commit LSN will be inserted and unpinned as per the current one-at-a-time method (relatively rare), while items that are not to be inserted will be unpinned and freed immediately. Items that are to be inserted at the given commit LSN are placed in a temporary array and inserted into the AIL in bulk each time the array fills up.

As a result of this, we trade off AIL hold time for a significant reduction in traffic. lock_stat output shows that the worst case hold time is unchanged, but contention from AIL inserts drops by an order of magnitude and the number of lock traversals decreases significantly.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit eb3efa1249b6413be930bdf13d10b6238028a440
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Fri Dec 3 16:42:57 2010 +1100

xfs: clean up xfs_ail_delete()

xfs_ail_delete() has a needlessly complex interface.
It returns the log item that was passed in for deletion (which the callers then assert is identical to the one passed in), and callers of xfs_ail_delete() still need to invalidate current traversal cursors. Make xfs_ail_delete() return void, move the cursor invalidation inside it, and clean up the callers just to use the log item pointer they passed in.

While cleaning up, remove the messy and unnecessary "/* ARGSUSED */" comments around all these functions.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit b199c8a4ba11879df87daad496ceee41fdc6aa82
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Mon Dec 20 11:59:49 2010 +1100

xfs: Pull EFI/EFD handling out from under the AIL lock

EFI/EFD interactions are protected from races by the AIL lock. They are the only type of log items that require the AIL lock to serialise internal state, so they need to be separated from the AIL lock before we can do bulk insert operations on the AIL.

To achieve this, convert the counter of the number of extents in the EFI to an atomic so it can be safely manipulated by EFD processing without locks. Also, convert the EFI state flag manipulations to use atomic bit operations so no locks are needed to record state changes. Finally, use the state bits to determine when it is safe to free the EFI and clean up the code to do this neatly.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 9c5f8414efd5eeed9f498d4170337a3eb126341f
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Mon Dec 20 11:57:24 2010 +1100

xfs: fix EFI transaction cancellation.

XFS_EFI_CANCELED has not been set in the code base since xfs_efi_cancel() was removed back in 2006 by commit 065d312e15902976d256ddaf396a7950ec0350a8 ("[XFS] Remove unused iop_abort log item operation"), and even then xfs_efi_cancel() was never called.
I haven't tracked it back further than that (beyond git history), but it indicates that the handling of EFIs in cancelled transactions has been broken for a long time.

Basically, when we get an IOP_UNPIN(lip, 1); call from xfs_trans_uncommit() (i.e. remove == 1), if we don't free the log item descriptor we leak it. Fix the behaviour to be correct and kill the XFS_EFI_CANCELED flag.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 821eb21d97a8b686649c08b7284d0b9f34d0e138
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Thu Dec 2 16:31:13 2010 +1100

xfs: connect up buffer reclaim priority hooks

Now that the buffer reclaim infrastructure can handle different reclaim priorities for different types of buffers, reconnect the hooks in the XFS code that have been sitting dormant since it was ported to Linux. This should finally give us reclaim prioritisation that is on a par with the functionality that Irix provided XFS 15 years ago.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 430cbeb86fdcbbdabea7d4aa65307de8de425350
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Thu Dec 2 16:30:55 2010 +1100

xfs: add a lru to the XFS buffer cache

Introduce a per-buftarg LRU for memory reclaim to operate on. This is the last piece we need to put in place so that we can fully control the buffer lifecycle. This allows XFS to be responsible for maintaining the working set of buffers under memory pressure instead of relying on the VM reclaim not to take pages we need out from underneath us.

The implementation introduces a b_lru_ref counter into the buffer. This is currently set to 1 whenever the buffer is referenced and so is used to determine if the buffer should be added to the LRU or not when freed. Effectively it allows lazy LRU initialisation of the buffer so we do not need to touch the LRU list and locks in xfs_buf_find().

Instead, when the buffer is being released and we drop the last reference to it, we check the b_lru_ref count and if it is non-zero we re-add the buffer reference and add the buffer to the LRU. The b_lru_ref counter is decremented by the shrinker, and whenever the shrinker comes across a buffer with a zero b_lru_ref counter, it releases the LRU reference on the buffer. In the absence of a lookup race, this will result in the buffer being freed.

This counting mechanism is used instead of a reference flag so that it is simple to re-introduce buffer-type specific reclaim reference counts to prioritise reclaim more effectively. We still have all those hooks in the XFS code, so this will provide the infrastructure to re-implement that functionality.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit ff57ab21995a8636cfc72efeebb09cc6034d756f
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Nov 30 17:27:57 2010 +1100

xfs: convert xfsbud shrinker to a per-buftarg shrinker.

Before we introduce per-buftarg LRU lists, split the shrinker implementation into per-buftarg shrinker callbacks. At the moment we wake all the xfsbufds to run the delayed write queues to free the dirty buffers and make their pages available for reclaim. However, with an LRU, we want to be able to free clean, unused buffers as well, so we need to separate the xfsbufd from the shrinker callbacks.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Alex Elder <aelder@xxxxxxx>

commit 1a427ab0c1b205d1bda8da0b77ea9d295ac23c57
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Thu Dec 16 17:08:41 2010 +1100

xfs: convert pag_ici_lock to a spin lock

Now that we are using RCU protection for the inode cache lookups, the lock is only needed on the modification side. Hence it is not necessary for the lock to be a rwlock as there are no read side holders anymore.
Convert it to a spin lock to reflect its exclusive nature.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Alex Elder <aelder@xxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 1a3e8f3da09c7082d25b512a0ffe569391e4c09a
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Fri Dec 17 17:29:43 2010 +1100

xfs: convert inode cache lookups to use RCU locking

With delayed logging greatly increasing the sustained parallelism of inode operations, the inode cache locking is showing significant read vs write contention when inode reclaim runs at the same time as lookups. There are also a lot more write lock acquisitions than there are read locks (4:1 ratio), so the read locking is not really buying us much in the way of parallelism.

To avoid the read vs write contention, change the cache to use RCU locking on the read side. To avoid needing to RCU free every single inode, use the built in slab RCU freeing mechanism. This requires us to be able to detect lookups of freed inodes, so ensure that every freed inode has an inode number of zero and the XFS_IRECLAIM flag set. We already check the XFS_IRECLAIM flag in the cache hit lookup path, but also add a check for a zero inode number as well.

We can then convert all the read locking lookups to use RCU read side locking and hence remove all read side locking.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Alex Elder <aelder@xxxxxxx>

commit d95b7aaf9ab6738bef1ebcc52ab66563085e44ac
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Thu Dec 16 16:41:39 2010 +1100

xfs: rcu free inodes

Introduce RCU freeing of XFS inodes so that we can convert lookup traversals to use rcu_read_lock() protection. This patch only introduces the RCU freeing to minimise the potential conflicts with mainline if this is merged into mainline via a VFS patchset. It abuses the i_dentry list for the RCU callback structure because the VFS patches make this a union, so it is safe to use like this and simplifies any merge issues.
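For the lookup side of the RCU conversion in 1a3e8f3, a sketch of the validation a lockless cache hit needs: because the object returned by an RCU-protected lookup may already have been freed, a hit only counts if it still carries the requested inode number and is not flagged for reclaim. struct demo_inode, DEMO_IRECLAIM and demo_lookup_hit_valid() are hypothetical stand-ins for the XFS structures; a real implementation also has to read these fields safely against concurrent updates, which is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_IRECLAIM 0x1u   /* hypothetical stand-in for XFS_IRECLAIM */

struct demo_inode {
    uint64_t i_ino;          /* set to 0 when the inode is freed */
    unsigned int i_flags;
};

/*
 * Validate an inode returned by a lockless lookup for inode number 'ino'.
 * Anything that no longer matches, or is being reclaimed, is treated as a
 * miss so the caller can retry the lookup.
 */
static bool demo_lookup_hit_valid(const struct demo_inode *ip, uint64_t ino)
{
    if (ip == NULL)
        return false;                 /* not in the cache at all */
    if (ip->i_ino != ino)
        return false;                 /* freed (i_ino == 0) or a different inode */
    if (ip->i_flags & DEMO_IRECLAIM)
        return false;                 /* being reclaimed: retry */
    return true;
}
```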

This patch uses basic RCU freeing rather than SLAB_DESTROY_BY_RCU. The later lookup patches need the same "found free inode" protection regardless of the RCU freeing method used, so once again the RCU freeing method can be dealt with appropriately at merge time without affecting any other code.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>

commit 6e857567dbbfe14dd6cc3f7414671b047b1ff5c7
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Thu Dec 23 12:02:31 2010 +1100

xfs: don't truncate prealloc from frequently accessed inodes

A long standing problem for streaming writes through the NFS server has been that the NFS server opens and closes file descriptors on an inode for every write. The result of this behaviour is that the ->release() function is called on every close and that results in XFS truncating speculative preallocation beyond the EOF. This has an adverse effect on file layout when multiple files are being written at the same time: they interleave their extents and can result in severe fragmentation.

To avoid this problem, keep track of ->release calls made on a dirty inode. For most cases, an inode is only going to be opened once for writing and then closed again during its lifetime in cache. Hence if there are multiple ->release calls when the inode is dirty, there is a good chance that the inode is being accessed by the NFS server. Hence set a flag the first time ->release is called while there are delalloc blocks still outstanding on the inode.

If this flag is set when ->release is next called, then do not truncate away the speculative preallocation; leave it there so that subsequent writes do not need to reallocate the delalloc space. This will prevent interleaving of extents of different inodes written concurrently to the same AG.

If we get this wrong, it is not a big deal as we truncate speculative allocation beyond EOF anyway in xfs_inactive() when the inode is thrown out of the cache.
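One possible reading of the ->release heuristic above, as a small C sketch: the first release that finds delalloc blocks outstanding trims the preallocation as before but sets a flag, and any later release on the same cached inode leaves the preallocation alone. The structure, field and function names are hypothetical, not the XFS ones.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-inode state for the heuristic. */
struct demo_inode {
    unsigned long delalloc_blocks;   /* delayed-allocation blocks outstanding */
    bool dirty_release;              /* ->release already seen while dirty */
};

/*
 * Return true if this ->release should trim speculative preallocation
 * beyond EOF. Once a release has been seen with delalloc blocks
 * outstanding, later releases keep the preallocation, since repeated
 * open/write/close cycles suggest more writes are coming.
 */
static bool demo_release_should_trim(struct demo_inode *ip)
{
    if (ip->dirty_release)
        return false;
    if (ip->delalloc_blocks > 0)
        ip->dirty_release = true;
    return true;
}
```

An inode that is written once and closed behaves exactly as before, while one that is repeatedly opened and closed with dirty data keeps its preallocation until it is evicted from the cache.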

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit 055388a3188f56676c21e92962fc366ac8b5cb72
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Jan 4 11:35:03 2011 +1100

xfs: dynamic speculative EOF preallocation

Currently the size of the speculative preallocation during delayed allocation is fixed by either the allocsize mount option or a default size. We are seeing a lot of cases where we need to recommend using the allocsize mount option to prevent fragmentation when buffered writes land in the same AG.

Rather than using a fixed preallocation size by default (up to 64k), make it dynamic by basing it on the current inode size. That way the EOF preallocation will increase as the file size increases. Hence for streaming writes we are much more likely to get large preallocations exactly when we need them to reduce fragmentation.

For default settings, the size of the initial extents is determined by the number of parallel writers and the amount of memory in the machine.
For 4GB RAM and 4 concurrent 32GB file writes:

    EXT: FILE-OFFSET            BLOCK-RANGE           AG AG-OFFSET                  TOTAL
      0: [0..1048575]:          1048672..2097247       0 (1048672..2097247)       1048576
      1: [1048576..2097151]:    5242976..6291551       0 (5242976..6291551)       1048576
      2: [2097152..4194303]:    12583008..14680159     0 (12583008..14680159)     2097152
      3: [4194304..8388607]:    25165920..29360223     0 (25165920..29360223)     4194304
      4: [8388608..16777215]:   58720352..67108959     0 (58720352..67108959)     8388608
      5: [16777216..33554423]:  117440584..134217791   0 (117440584..134217791)  16777208
      6: [33554424..50331511]:  184549056..201326143   0 (184549056..201326143)  16777088
      7: [50331512..67108599]:  251657408..268434495   0 (251657408..268434495)  16777088

and for 16 concurrent 16GB file writes:

    EXT: FILE-OFFSET            BLOCK-RANGE           AG AG-OFFSET                  TOTAL
      0: [0..262143]:           2490472..2752615       0 (2490472..2752615)        262144
      1: [262144..524287]:      6291560..6553703       0 (6291560..6553703)        262144
      2: [524288..1048575]:     13631592..14155879     0 (13631592..14155879)      524288
      3: [1048576..2097151]:    30408808..31457383     0 (30408808..31457383)     1048576
      4: [2097152..4194303]:    52428904..54526055     0 (52428904..54526055)     2097152
      5: [4194304..8388607]:    104857704..109052007   0 (104857704..109052007)   4194304
      6: [8388608..16777215]:   209715304..218103911   0 (209715304..218103911)   8388608
      7: [16777216..33554423]:  452984848..469762055   0 (452984848..469762055)  16777208

Because it is hard to take back speculative preallocation, cases where there are large slow growing log files on a nearly full filesystem may cause premature ENOSPC. Hence as the filesystem nears full, the maximum dynamic prealloc size is reduced according to this table (based on 4k block size):

    freespace   max prealloc size
    >5%         full extent (8GB)
    4-5%        2GB (8GB >> 2)
    3-4%        1GB (8GB >> 3)
    2-3%        512MB (8GB >> 4)
    1-2%        256MB (8GB >> 5)
    <1%         128MB (8GB >> 6)

This should reduce the amount of space held in speculative preallocation for such cases.
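A sketch of how a preallocation size could be derived from the inode size and then throttled by the free space table above. Assumptions not stated in the commit message: the base size is the inode size in blocks rounded down to a power of two, it is capped at max_blocks (8GB is 1 << 21 blocks at a 4k block size), the result never drops below min_blocks, and free space is passed as a block count against the filesystem size. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round down to a power of two; 0 stays 0. */
static uint64_t demo_rounddown_pow2(uint64_t x)
{
    uint64_t p = 1;

    if (x == 0)
        return 0;
    while (p <= x / 2)
        p <<= 1;
    return p;
}

/*
 * Preallocation size in blocks: based on the inode size, capped at
 * max_blocks, then shifted down as free space drops below 5%:
 * >> 2 below 5%, plus one more shift for each of 4%, 3%, 2% and 1%.
 */
static uint64_t demo_prealloc_blocks(uint64_t isize_blocks, uint64_t free_blocks,
                                     uint64_t total_blocks, uint64_t max_blocks,
                                     uint64_t min_blocks)
{
    uint64_t alloc = demo_rounddown_pow2(isize_blocks);
    unsigned int shift = 0;

    assert(total_blocks > 0);
    if (alloc > max_blocks)
        alloc = max_blocks;

    if (free_blocks * 100 < total_blocks * 5) {
        shift = 2;
        for (unsigned int pct = 4; pct >= 1; pct--)
            if (free_blocks * 100 < total_blocks * pct)
                shift++;
    }
    alloc >>= shift;

    return alloc < min_blocks ? min_blocks : alloc;
}
```

With max_blocks = 1 << 21 and a large file, 4.5% free space gives 1 << 19 blocks (2GB at 4k) and 0.5% free gives 1 << 15 blocks (128MB), matching the 8GB >> 2 and 8GB >> 6 rows of the table.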

The allocsize mount option turns off the dynamic behaviour and fixes the prealloc size to whatever the mount option specifies, i.e. the behaviour is unchanged.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>

commit 622d81494fa32343a4b97b607619656c7a4a6d1a
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Thu Dec 23 11:57:37 2010 +1100

xfs: use KM_NOFS for allocations during attribute list operations

When listing attributes, we are doing memory allocations under the inode ilock using only KM_SLEEP. This allows memory allocation to recurse back into the filesystem and do writeback, which may need the ilock we already hold on the current inode. This will deadlock. Hence use KM_NOFS for such allocations outside of transaction context to ensure that reclaim recursion does not occur.

Reported-by: Nick Piggin <npiggin@xxxxxxxxx>
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>

commit dcfcf20512cb517ac18b9433b676183fa1257911
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Thu Dec 23 11:57:13 2010 +1100

xfs: provide a inode iolock lockdep class

The XFS iolock needs to be re-initialised to a new lock class before it enters reclaim to prevent lockdep false positives. Unfortunately, this is not sufficient protection as inodes in the XFS_IRECLAIMABLE state can be recycled and not re-initialised before being reused.

We need to re-initialise the lock state when transferring out of XFS_IRECLAIMABLE state to XFS_INEW, but we need to keep the same class as if the inode was just allocated. Hence we need a specific lockdep class variable for the iolock so that both initialisations use the same class.

While there, add a specific class for inodes in the reclaim state so that it is easy to tell from lockdep reports what state the inode was in that generated the report.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> Reviewed-by: Christoph Hellwig <hch@xxxxxx> ----------------------------------------------------------------------- Summary of changes: fs/xfs/linux-2.6/sv.h | 59 ---- fs/xfs/linux-2.6/xfs_buf.c | 235 +++++++++----- fs/xfs/linux-2.6/xfs_buf.h | 22 +- fs/xfs/linux-2.6/xfs_linux.h | 1 - fs/xfs/linux-2.6/xfs_super.c | 22 +- fs/xfs/linux-2.6/xfs_sync.c | 92 ++++-- fs/xfs/linux-2.6/xfs_trace.h | 30 +- fs/xfs/quota/xfs_dquot.c | 1 - fs/xfs/xfs_ag.h | 2 +- fs/xfs/xfs_attr_leaf.c | 4 +- fs/xfs/xfs_btree.c | 9 +- fs/xfs/xfs_buf_item.c | 32 ++- fs/xfs/xfs_extfree_item.c | 97 +++--- fs/xfs/xfs_extfree_item.h | 11 +- fs/xfs/xfs_fsops.c | 1 + fs/xfs/xfs_iget.c | 90 ++++- fs/xfs/xfs_inode.c | 54 +++- fs/xfs/xfs_inode.h | 15 +- fs/xfs/xfs_inode_item.c | 92 +++++- fs/xfs/xfs_iomap.c | 84 +++++- fs/xfs/xfs_log.c | 739 +++++++++++++++++++----------------------- fs/xfs/xfs_log_cil.c | 17 +- fs/xfs/xfs_log_priv.h | 121 ++++++-- fs/xfs/xfs_log_recover.c | 35 +-- fs/xfs/xfs_mount.c | 23 ++- fs/xfs/xfs_mount.h | 14 + fs/xfs/xfs_trans.c | 79 +++++- fs/xfs/xfs_trans.h | 2 +- fs/xfs/xfs_trans_ail.c | 232 +++++++------- fs/xfs/xfs_trans_extfree.c | 8 +- fs/xfs/xfs_trans_priv.h | 35 ++- fs/xfs/xfs_vnodeops.c | 61 +++-- 32 files changed, 1403 insertions(+), 916 deletions(-) delete mode 100644 fs/xfs/linux-2.6/sv.h hooks/post-receive -- XFS development tree
_______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs