This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing the
project "XFS development tree".

The branch, master-pre-3.1-rc9-rebase has been created
        at  3dda1f2ef7f90231dad44a9334ea58285699fea5 (commit)

- Log -----------------------------------------------------------------
commit 3dda1f2ef7f90231dad44a9334ea58285699fea5
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Mon Oct 10 16:52:53 2011 +0000

    xfs: do not flush data workqueues in xfs_flush_buftarg

    When we call xfs_flush_buftarg (generally from sync or umount) it is
    already too late to flush the data workqueues, as I/O completion has
    been signalled for them and we are thus already done with the data we
    would flush here.

    There are places where flushing them might be useful, but the current
    sync interface doesn't give us that opportunity.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 380f1bfe86522c48e0efde956caf82e99f3043ff
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Mon Oct 10 16:52:52 2011 +0000

    xfs: remove XFS_bflush

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 93a091c78c261bdac5873067a08e2fb8d55d7dd0
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Mon Oct 10 16:52:51 2011 +0000

    xfs: remove xfs_buf_target_name

    The calling convention that returns a pointer to a static buffer is
    fairly nasty, so just opencode it in the only caller that is left.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 73e215d722e349558f94bf6eae073516e425ddc1
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Mon Oct 10 16:52:50 2011 +0000

    xfs: use xfs_ioerror_alert in xfs_buf_iodone_callbacks

    Use xfs_ioerror_alert instead of opencoding a very similar error
    message.
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 7638214c9061016036ad3ee53e6f529a75430aa0
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Mon Oct 10 16:52:49 2011 +0000

    xfs: clean up xfs_ioerror_alert

    Instead of passing the block number and mount structure explicitly,
    get them from the bp and make the argument order more natural. Also
    move it to xfs_buf.c and stop printing the device name, given that we
    already get the fs name as part of xfs_alert, and we know which
    device it operates on from the caller that gets printed. Finally,
    rename it to xfs_buf_ioerror_alert and pass __func__ as argument
    where it makes sense.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit d54b997ac3e8eb4da692031a8becae81ce23f026
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Mon Oct 10 16:52:48 2011 +0000

    xfs: clean up buffer allocation

    Change _xfs_buf_initialize to allocate the buffer directly and rename
    it to xfs_buf_alloc, now that it is the only buffer allocation
    routine. Also remove the xfs_buf_deallocate wrapper around the
    kmem_zone_free calls for buffers.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 5ea31d2bf8827b8bfb9a2130c50916e7b687dabe
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Mon Oct 10 16:52:47 2011 +0000

    xfs: remove buffers from the delwri list in xfs_buf_stale

    For each call to xfs_buf_stale we call xfs_buf_delwri_dequeue either
    directly before or after it, or are guaranteed by the surrounding
    conditionals that we are never called on delwri buffers. Simplify
    this situation by moving the call to xfs_buf_delwri_dequeue into
    xfs_buf_stale.
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 49cc0fe89077077b9ac94f1248fb5c37841a0435
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Mon Oct 10 16:52:46 2011 +0000

    xfs: remove XFS_BUF_STALE and XFS_BUF_SUPER_STALE

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit eca13e85108acea424e24d7d8fa6d02d84e5bf23
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Mon Oct 10 16:52:45 2011 +0000

    xfs: remove XFS_BUF_SET_VTYPE and XFS_BUF_SET_VTYPE_REF

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 0ef5ca7eb86747229946ab9d0a588b894ff0bd1d
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Mon Oct 10 16:52:44 2011 +0000

    xfs: remove XFS_BUF_FINISH_IOWAIT

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit e484044847340fce73a6dce4ac53a50ec531aa88
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Mon Oct 10 16:52:43 2011 +0000

    xfs: remove xfs_get_buftarg_list

    The code is unused and under a config option that doesn't exist, so
    remove it.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit ab03e6ad834d81f95f24f66231bfab6b9a8ef82c
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Wed Sep 14 14:08:26 2011 +0000

    xfs: fix buffer flushing during unmount

    The code to flush buffers in the umount code is a bit iffy: we first
    flush all delwri buffers out, but then might be able to queue up a
    new one when logging the sb counts. On a normal shutdown that one
    would get flushed out when doing the synchronous superblock write in
    xfs_unmountfs_writesb, but we skip that one if the filesystem has
    been shut down.
    Fix this by moving the delwri list flushing to just before unmounting
    the log, and while we're at it also remove the superfluous delwri
    list and buffer LRU flushing for the rt and log devices, which can
    never have cached or delwri buffers.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Reported-by: Amit Sahrawat <amit.sahrawat83@xxxxxxxxx>
    Tested-by: Amit Sahrawat <amit.sahrawat83@xxxxxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 6f76e76852b85216d518d6163ff1e84bd73a624d
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Sun Oct 2 14:25:16 2011 +0000

    xfs: optimize fsync on directories

    Directories are only updated transactionally, which means fsync only
    needs to flush the log if the inode is currently dirty, but does not
    need to bother with checking for dirty data or non-transactional
    updates, and most importantly doesn't have to flush disk caches
    except as part of a transaction commit. While the first two
    optimizations can't easily be measured, the latter actually makes a
    difference when doing lots of fsyncs that do not actually have to
    commit the inode, e.g. because an earlier fsync already pushed the
    log far enough.

    The new xfs_dir_fsync is identical to xfs_nfs_commit_metadata except
    for the prototype, but I'm not sure creating a common helper for the
    two is worth it given how simple the functions are.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit edc3615f7fd97dc78ea2cd872f55c4b382c46bb5
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri Sep 30 04:45:03 2011 +0000

    xfs: reduce the number of log forces from tail pushing

    The AIL push code will issue a log force on every single push loop
    that it exits having encountered pinned items. It doesn't rescan
    these pinned items until it revisits the AIL from the start. Hence we
    only need to force the log once per walk from the start of the AIL to
    the target LSN. This results in numbers like this:

        xs_push_ail_flush.....      1456
        xs_log_force.........       1485

    for an 8-way 50M inode create workload - almost all the log forces
    are coming from the AIL pushing code.

    Reduce the number of log forces by only forcing the log if the
    previous walk found pinned buffers. This reduces the numbers to:

        xs_push_ail_flush.....       665
        xs_log_force.........        682

    for the same test.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit fcf219b77f2cb05bc22fc3d6cf490629e40ccc39
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Fri Sep 30 04:45:02 2011 +0000

    xfs: Don't allocate new buffers on every call to _xfs_buf_find

    Stats show that for an 8-way unlink @ ~80,000 unlinks/s we are doing
    ~1 million cache hit lookups to ~3000 buffer creates. That's almost
    3 orders of magnitude more cache hits than misses, so optimising for
    cache hits is quite important. In the cache hit case, we do not need
    to allocate a new buffer, so we are effectively hitting the allocator
    for no good reason for the vast majority of calls to _xfs_buf_find.
    8-way create workloads are showing similar cache hit/miss ratios.

    The result is profiles that look like this:

        samples  pcnt  function                    DSO
        _______  _____ ___________________________ _________________
        1036.00  10.0% _xfs_buf_find               [kernel.kallsyms]
         582.00   5.6% kmem_cache_alloc            [kernel.kallsyms]
         519.00   5.0% __memcpy                    [kernel.kallsyms]
         468.00   4.5% __ticket_spin_lock          [kernel.kallsyms]
         388.00   3.7% kmem_cache_free             [kernel.kallsyms]
         331.00   3.2% xfs_log_commit_cil          [kernel.kallsyms]

    Further, there is a fair bit of work involved in initialising a new
    buffer once a cache miss has occurred, and we currently do that under
    the rbtree spinlock. That increases spinlock hold time on what are
    heavily used trees.

    To fix this, remove the initialisation of the buffer from
    _xfs_buf_find() and only allocate the new buffer once we've had a
    cache miss.
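The allocate-only-on-miss structure described above can be sketched in userspace. This is a hypothetical miniature analogue, not XFS code: a fixed-size table plus mutex stands in for the per-AG rbtree and its spinlock, and `buf_get` allocates and initialises a new entry only after a miss, outside the lock, then re-checks for a racing insert before adding it.

```c
#include <pthread.h>
#include <stdlib.h>

#define CACHE_SLOTS 64

struct buf {
    unsigned long blkno;            /* lookup key */
    int refcount;
};

static struct buf *cache[CACHE_SLOTS];
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

static struct buf *cache_lookup_locked(unsigned long blkno)
{
    struct buf *bp = cache[blkno % CACHE_SLOTS];
    return (bp && bp->blkno == blkno) ? bp : NULL;
}

/* Hit the allocator only after a cache miss, and do all initialisation
 * outside the lock, mirroring the _xfs_buf_find rework above. */
struct buf *buf_get(unsigned long blkno)
{
    struct buf *bp, *new_bp;

    pthread_mutex_lock(&cache_lock);
    bp = cache_lookup_locked(blkno);
    if (bp) {                       /* cache hit: no allocation at all */
        bp->refcount++;
        pthread_mutex_unlock(&cache_lock);
        return bp;
    }
    pthread_mutex_unlock(&cache_lock);

    /* Miss: allocate and fully initialise with no lock held. */
    new_bp = calloc(1, sizeof(*new_bp));
    if (!new_bp)
        return NULL;
    new_bp->blkno = blkno;
    new_bp->refcount = 1;

    pthread_mutex_lock(&cache_lock);
    bp = cache_lookup_locked(blkno);    /* re-check: we may have raced */
    if (bp) {
        bp->refcount++;
        pthread_mutex_unlock(&cache_lock);
        free(new_bp);                   /* lost the race, discard ours */
        return bp;
    }
    cache[blkno % CACHE_SLOTS] = new_bp;
    pthread_mutex_unlock(&cache_lock);
    return new_bp;
}
```

The key property is that the lock is held only for lookup and insert, never across allocation or initialisation, which is exactly what keeps the hot cache-hit path away from the allocator.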
    Initialise the buffer immediately after allocating it in xfs_buf_get,
    too, so that it is ready for insert if we get another cache miss
    after allocation. This minimises lock hold time and avoids
    unnecessary allocator churn. The resulting profiles look like:

        samples  pcnt  function                    DSO
        _______  _____ ___________________________ _________________
        8111.00   9.1% _xfs_buf_find               [kernel.kallsyms]
        4380.00   4.9% __memcpy                    [kernel.kallsyms]
        4341.00   4.8% __ticket_spin_lock          [kernel.kallsyms]
        3401.00   3.8% kmem_cache_alloc            [kernel.kallsyms]
        2856.00   3.2% xfs_log_commit_cil          [kernel.kallsyms]
        2625.00   2.9% __kmalloc                   [kernel.kallsyms]
        2380.00   2.7% kfree                       [kernel.kallsyms]
        2016.00   2.3% kmem_cache_free             [kernel.kallsyms]

    showing a significant reduction in time spent doing allocation and
    freeing from slabs (kmem_cache_alloc and kmem_cache_free).

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 86671dafd1b90d73c9f8453ea8ec35fbfce0418b
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Mon Sep 19 15:00:54 2011 +0000

    xfs: simplify xfs_trans_ijoin* again

    There is no reason to keep a reference to the inode even if we unlock
    it during transaction commit, because we never drop a reference
    between the ijoin and the commit. Also use this fact to merge
    xfs_trans_ijoin_ref back into xfs_trans_ijoin - the third argument
    now decides if an unlock is needed.

    I'm actually starting to wonder if allowing inodes to be unlocked at
    transaction commit really is worth the effort. The only real benefit
    is that they can be unlocked earlier when committing a synchronous
    transaction, but that could be solved by doing the log force manually
    after the unlock, too.
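Several of the commits in this series apply the same concurrency pattern: read the LSN to push to while holding the ilock, drop the lock, and only then issue the potentially slow, synchronous log force. A minimal userspace sketch of that pattern, with hypothetical names (a mutex stands in for the ilock, a global counter for the log state):

```c
#include <pthread.h>

struct inode_like {
    pthread_mutex_t ilock;
    unsigned long long last_lsn;    /* LSN of last transaction on inode */
};

static unsigned long long forced_up_to;     /* stand-in for log state */

static void log_force_lsn(unsigned long long lsn)
{
    /* Potentially slow, synchronous work: runs with no locks held. */
    if (lsn > forced_up_to)
        forced_up_to = lsn;
}

void fsync_like(struct inode_like *ip)
{
    unsigned long long lsn;

    pthread_mutex_lock(&ip->ilock);
    lsn = ip->last_lsn;             /* only the sample needs the lock */
    pthread_mutex_unlock(&ip->ilock);

    if (lsn)
        log_force_lsn(lsn);         /* slow path runs unlocked */
}
```

Other threads can take the ilock while the log force is in flight, which is the concurrency win these commits are after.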
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 91409f1253ecdc9368bddd6674a71141bbb188d8
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Sun Sep 18 20:47:51 2011 +0000

    xfs: unlock the inode before log force in xfs_change_file_space

    Let the transaction commit unlock the inode before it potentially
    causes a synchronous log force.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 0b618fd2d100e82cef4e5f8ac56adabac9bcaabd
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Sun Sep 18 20:47:50 2011 +0000

    xfs: unlock the inode before log force in xfs_fs_nfs_commit_metadata

    Only read the LSN we need to push to with the ilock held, and then
    release it before we do the log force to improve concurrency.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit b35e4f2d235e0a2aa9fde7899d27552b5e59545e
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Mon Sep 19 14:55:51 2011 +0000

    xfs: unlock the inode before log force in xfs_fsync

    Only read the LSN we need to push to with the ilock held, and then
    release it before we do the log force to improve concurrency.

    This also removes the only direct caller of _xfs_trans_commit, thus
    allowing it to be merged into the plain xfs_trans_commit again.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 0093b1064a18f2e3b6408dda542769076fc7b233
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Mon Sep 26 09:14:34 2011 +0000

    xfs: XFS_TRANS_SWAPEXT is not a valid flag for xfs_trans_commit

    XFS_TRANS_SWAPEXT is a transaction type, not a flag for
    xfs_trans_commit, so don't pass it in xfs_swap_extents.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit e49f565973deb3408c0e1dd83d1f8dac5bcaa374
Author: Lukas Czerner <lczerner@xxxxxxxxxx>
Date:   Wed Sep 21 09:42:30 2011 +0000

    xfs: fix possible overflow in xfs_ioc_trim()

    In xfs_ioc_trim it is possible that computing the last allocation
    group to discard might overflow for big start & len values, because
    the result might be bigger than xfs_agnumber_t, which is 32 bits
    long. Fix this by not allowing the start and end block of the range
    to be beyond the end of the file system.

    Note that if the start is beyond the end of the file system we have
    to return -EINVAL, but in the "end" case we have to truncate it to
    the fs size.

    Also introduce an "end" variable, rather than using start+len, which
    might be more confusing to get right as this bug shows.

    Signed-off-by: Lukas Czerner <lczerner@xxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit ef49624be283c67c40dcdac94ca125e1ddda8ff6
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Sun Sep 18 20:41:07 2011 +0000

    xfs: cleanup xfs_bmap.h

    Convert all function prototypes to the short form used elsewhere, and
    remove duplicates of comments already placed at the function body.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit b32ccb3fa61f36ea07b370baf76f7020488d2364
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Sun Sep 18 20:41:06 2011 +0000

    xfs: don't ignore error code from xfs_bmbt_update

    Fix a case in xfs_bmap_add_extent_unwritten_real where we aren't
    passing the returned error on.
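The range validation described in the xfs_ioc_trim fix above can be sketched as a standalone helper. This is an illustrative sketch, not the XFS code: `fs_blocks`, the helper name, and the return convention are assumptions; the point is computing an explicit "end", rejecting a start past the end of the filesystem, and truncating the end without ever evaluating an overflowing `start + len`.

```c
#include <errno.h>
#include <stdint.h>

/* Clamp a trim request to the filesystem size. Returns 0 and sets
 * *end_out to the last block to trim, or -EINVAL for a bad range.
 * Hypothetical helper, mirroring the xfs_ioc_trim fix's logic. */
int trim_range_check(uint64_t start, uint64_t len, uint64_t fs_blocks,
                     uint64_t *end_out)
{
    uint64_t end;

    if (len == 0 || start >= fs_blocks)
        return -EINVAL;             /* range begins past end of fs */

    /* Compare against the space left instead of computing start + len,
     * so a huge start/len pair can never overflow. */
    if (len > fs_blocks - start)
        end = fs_blocks - 1;        /* truncate to the last block */
    else
        end = start + len - 1;      /* safe: no overflow possible here */

    *end_out = end;
    return 0;
}
```

Working with `end` rather than `start + len` keeps every intermediate value inside the valid block range, which is exactly what prevents the 32-bit allocation group number from overflowing downstream.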
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit e1e360817f91dc68a73c755a15ed9d84a21be46c
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Sun Sep 18 20:41:05 2011 +0000

    xfs: pass bmalloca to xfs_bmap_add_extent_hole_real

    All the parameters passed to xfs_bmap_add_extent_hole_real() are in
    the xfs_bmalloca structure now. Just pass the bmalloca parameter to
    the function instead of 8 separate parameters.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit c763ccc7d1ad6e8751d6d6d0fdd814ca2169dd67
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Sun Sep 18 20:41:04 2011 +0000

    xfs: pass bmalloca to xfs_bmap_add_extent_delay_real

    All the parameters passed to xfs_bmap_add_extent_delay_real() are in
    the xfs_bmalloca structure now. Just pass the bmalloca parameter to
    the function instead of 8 separate parameters.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit d8e079d401e675d73316b138f840e16ae37fa825
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Sun Sep 18 20:41:02 2011 +0000

    xfs: move logflags into bmalloca

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 459a52d68d721717b084c1a1957721072423cff9
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Sun Sep 18 20:41:01 2011 +0000

    xfs: move lastx and nallocs into bmalloca

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 2ac3d5909c0f4900cded43bfee65847783a976de
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Sun Sep 18 20:41:00 2011 +0000

    xfs: move btree cursor into bmalloca

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 5e54c9d694b56de4f827f69ef57f444e6c832c42
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Sun Sep 18 20:40:59 2011 +0000

    xfs: do not keep local copies of allocation ranges in
    xfs_bmapi_allocate

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 9eb095d2b1089a4105135241037191fdc6c1050e
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Sun Sep 18 20:40:58 2011 +0000

    xfs: rename allocation range fields in struct xfs_bmalloca

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 1631b20d18b7fccb8113b9a8e6a7d4a57207a6fa
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Sun Sep 18 20:40:57 2011 +0000

    xfs: move firstblock and bmap freelist cursor into bmalloca structure

    Rather than passing the firstblock and freelist structure around,
    embed it into the bmalloca structure and remove it from the function
    parameters. This also enables the minleft parameter to be set only
    once in xfs_bmapi_write(), and the freelist cursor to be directly
    queried in xfs_bmapi_allocate to clear it when the lowspace algorithm
    is activated.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 79c893656d59c13e2542ddfe7de1a22c8f15309c
Author: Dave Chinner <david@xxxxxxxxxxxxx>
Date:   Sun Sep 18 20:40:56 2011 +0000

    xfs: move extent records into bmalloca structure

    Rather than putting extent records on the stack and then pointing to
    them from the bmalloca structure, which is in the same stack frame,
    put the extent records directly in the bmalloca structure. This
    reduces the number of args that need to be passed around.
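The common theme of the bmalloca commits above is replacing long parameter lists, and pointers into the caller's stack frame, with one embedded context structure. A minimal sketch of the shape this takes; all names here are illustrative, not the real `struct xfs_bmalloca` layout:

```c
#include <stdint.h>

/* Hypothetical analogue of struct xfs_bmalloca: all working state for
 * one allocation, including the extent records embedded directly in the
 * structure rather than pointed at on some caller's stack. */
struct alloc_ctx {
    uint64_t offset;                /* requested file offset */
    uint64_t length;                /* requested length */
    uint64_t blkno;                 /* result: allocated start block */
    struct { uint64_t start, count; } prev, got;   /* embedded records */
};

/* Before the refactor a helper like this would have taken the offset,
 * length, both extent records, and an output pointer as separate
 * arguments; now everything travels in the one context. */
static void pick_block(struct alloc_ctx *a)
{
    /* Trivial stand-in policy: place right after the previous extent. */
    a->blkno = a->prev.start + a->prev.count;
    a->got.start = a->blkno;
    a->got.count = a->length;
}

uint64_t alloc_extent(struct alloc_ctx *a)
{
    pick_block(a);                  /* one argument, no stack pointers */
    return a->blkno;
}
```

Beyond shorter signatures, embedding the records means no helper can accidentally outlive or alias the caller's stack copies, which is what the "do not keep local copies" commit is eliminating.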
    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 73a8fde4f33a630c6d401acf88b7172e5525627c
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Sun Sep 18 20:40:55 2011 +0000

    xfs: pass bmalloca structure to xfs_bmap_isaeof

    All the variables xfs_bmap_isaeof() is passed are contained within
    the xfs_bmalloca structure. Pass that instead.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 7d3d6c30e25708d9ba78e8e1f36316ebeafce793
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Sun Sep 18 20:40:54 2011 +0000

    xfs: remove xfs_bmap_add_extent

    There is no real need for xfs_bmap_add_extent, as the callers know
    what kind of extents they need to add to it. Removing it means
    duplicating the extents-to-btree conversion logic in three places,
    but overall it's still much simpler and quite a bit less code.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit cb300d65eb4d4e2e96fbc9c08cc9d858464232a9
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Sun Sep 18 20:40:53 2011 +0000

    xfs: introduce xfs_bmap_last_extent

    Add a common helper for finding the last extent in a file. Largely
    based on a patch from Dave Chinner.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 5bfa7e311949b022d91f459200d56bb7a3dc7f3a
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Sun Sep 18 20:40:52 2011 +0000

    xfs: rename xfs_bmapi to xfs_bmapi_write

    Now that all the read-only users of xfs_bmapi have been converted to
    use xfs_bmapi_read(), we can remove all the read-only handling cases
    from xfs_bmapi(). Once this is done, rename xfs_bmapi to
    xfs_bmapi_write to reflect the fact that it is for allocation only.
    This enables us to kill the XFS_BMAPI_WRITE flag as well.

    Also clean up xfs_bmapi_write to the style used in the newly added
    xfs_bmapi_read/delay functions.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 0d1c5f2655cacc4d044871c940237168ca618e61
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Sun Sep 18 20:40:51 2011 +0000

    xfs: factor unwritten extent map manipulations out of xfs_bmapi

    To further improve the readability of xfs_bmapi(), factor the
    unwritten extent conversion out into a separate function. This
    removes a large block of logic from the xfs_bmapi() code loop and
    makes it easier to see the operational logic flow of xfs_bmapi().

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit c59a0b0bdac51b7f96f805f8c1eb8660a1a52b1d
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Sun Sep 18 20:40:50 2011 +0000

    xfs: factor extent allocation out of xfs_bmapi

    To further improve the readability of xfs_bmapi(), factor the extent
    allocation out into a separate function. This removes a large block
    of logic from the xfs_bmapi() code loop and makes it easier to see
    the operational logic flow of xfs_bmapi().

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 32855a9804b4a967e2230b82fcf6caba82c5525b
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Sun Sep 18 20:40:49 2011 +0000

    xfs: do not use xfs_bmap_add_extent for adding delalloc extents

    We can just call xfs_bmap_add_extent_hole_delay directly to add
    delayed allocation regions to the extent tree, instead of going
    through all the complexities of xfs_bmap_add_extent that aren't
    needed for this simple case.
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 00a1896173a9acff320c70cb4e40592d0344e428
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Sun Sep 18 20:40:48 2011 +0000

    xfs: introduce xfs_bmapi_delay()

    Delalloc reservations are much simpler than allocations, so give them
    a separate bmapi-level interface. Using the previously added
    xfs_bmapi_reserve_delalloc we get a function that is only minimally
    more complicated than xfs_bmapi_read, which is far from the
    complexity of xfs_bmapi. Also remove the XFS_BMAPI_DELAY code after
    switching over the only user to xfs_bmapi_delay.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 5003bdf58e0649cfca322eb554b6ab9dba201b30
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Sun Sep 18 20:40:47 2011 +0000

    xfs: factor delalloc reservations out of xfs_bmapi

    Move the reservation of delayed allocations, and the addition of
    delalloc regions to the extent trees, into a new helper function.
    For now this adds some twisted goto logic to xfs_bmapi, but that will
    be cleaned up in the following patches.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 5bc34846735f610ce06de0789c9287756c857160
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Sun Sep 18 20:40:46 2011 +0000

    xfs: remove xfs_bmapi_single()

    Now that we have xfs_bmapi_read, there is no need for
    xfs_bmapi_single(). Change the remaining caller over and kill the
    function.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit bd8c50cefff88bfd0700fad2be05045db4c61c1c
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Sun Sep 18 20:40:45 2011 +0000

    xfs: introduce xfs_bmapi_read()

    xfs_bmapi() currently handles both extent map reading and allocation.
    As a result, the code is littered with "if (wr)" branches to
    conditionally do allocation operations if required. This makes the
    code much harder to follow and causes significant indent issues with
    the code.

    Given that read mapping is much simpler than allocation, we can split
    out read mapping from xfs_bmapi() and reuse the logic that we have
    already factored out to do all the hard work of handling the extent
    map manipulations. This results in a much simpler function for the
    common extent read operations, and will allow the allocation code to
    be simplified in another commit.

    Once xfs_bmapi_read() is implemented, convert all the callers of
    xfs_bmapi() that are only reading extents to use the new function.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 2f7effbf70fe04560a1dc5f4fefc1bfa01595d74
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Sun Sep 18 20:40:44 2011 +0000

    xfs: factor extent map manipulations out of xfs_bmapi

    To further improve the readability of xfs_bmapi(), factor the pure
    extent map manipulations out into separate functions. This removes
    large blocks of logic from the xfs_bmapi() code loop and makes it
    easier to see the operational logic flow of xfs_bmapi().

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 86051fad1607f9c4c50c9f55853186df0c9ef992
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Sun Sep 18 20:40:43 2011 +0000

    xfs: remove the nextents variable in xfs_bmapi

    Instead of using a local variable that needs to be updated when we
    modify the extent map, just check ifp->if_bytes directly where we
    use it.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 1989208fa60ec04f80cceb8fc528c6a541803210
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Sun Sep 18 20:40:42 2011 +0000

    xfs: remove impossible to read code in xfs_bmap_add_extent_delay_real

    We already have the worst case blocks reserved, so
    xfs_icsb_modify_counters won't fail in
    xfs_bmap_add_extent_delay_real. In fact we've had an assert to catch
    this case since day one, and it has never triggered. So remove the
    code to try smaller reservations, and just return the error for that
    case, in addition to keeping the assert.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 1342c23e864e3de71c97dcd73fa9691606febee2
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Sun Sep 18 20:40:41 2011 +0000

    xfs: remove the first extent special case in xfs_bmap_add_extent

    Both xfs_bmap_add_extent_hole_delay and xfs_bmap_add_extent_hole_real
    already contain code to handle the case where there is no extent to
    merge with, which is effectively the same as the code duplicated
    here.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 3c9feb308b1802a8538860bba8926f3dfe255612
Author: Mitsuo Hayasaka <mitsuo.hayasaka.hu@xxxxxxxxxxx>
Date:   Sat Sep 17 13:38:38 2011 +0000

    xfs: Return -EIO when xfs_vn_getattr() failed

    An attribute of an inode can be fetched via xfs_vn_getattr() in XFS.
    Currently it returns EIO, not a negative value, when it fails. As a
    result, the system call does not return a negative value even though
    an error occurred. The stat(2), ls and mv commands cannot handle this
    error and do not work correctly. This patch fixes this bug, and
    returns -EIO, not EIO, when an error is detected in xfs_vn_getattr().
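The getattr fix above is entirely about the kernel's return convention: in-kernel functions report failure as a negative errno (`-EIO`), and callers test for `< 0`, so a bare positive `EIO` silently looks like success. A minimal sketch with hypothetical helper names (none of these are the real XFS functions):

```c
#include <errno.h>

struct attr { unsigned long size; };

static int device_read_ok;          /* stand-in for the I/O layer state */

/* Analogue of a getattr method: failure must be a NEGATIVE errno.
 * Returning bare EIO here was exactly the bug being fixed. */
int getattr_like(struct attr *out)
{
    if (!device_read_ok)
        return -EIO;                /* caller's `< 0` test catches this */
    out->size = 4096;
    return 0;
}

/* Caller side, as the syscall layer would check it. */
int stat_like(struct attr *out)
{
    int error = getattr_like(out);

    if (error < 0)                  /* a positive EIO would slip past */
        return error;
    return 0;
}
```

With the positive-`EIO` bug, `stat_like` would return 119 on some architectures' `EIO`-style values and userspace tools like `stat(2)`, `ls` and `mv` would treat the call as successful with garbage attributes, which matches the symptoms the commit describes.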
    Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@xxxxxxxxxxx>
    Reviewed-by: Christoph Hellwig <hch@xxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 3c6ea024e76ec47bf702e31b558a5b48c3daff2e
Author: Chandra Seetharaman <sekharan@xxxxxxxxxx>
Date:   Thu Sep 8 20:18:50 2011 +0000

    xfs: Fix the incorrect comment in the header of _xfs_buf_find

    Fix the incorrect comment in the header of the function
    _xfs_buf_find().

    Signed-off-by: Chandra Seetharaman <sekharan@xxxxxxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 649305012961fce689c0533082a5e841f36f82cb
Author: Chandra Seetharaman <sekharan@xxxxxxxxxx>
Date:   Tue Sep 20 13:56:55 2011 +0000

    xfs: Check the return value of xfs_trans_get_buf()

    Check the return value of xfs_trans_get_buf() and fail appropriately.

    Signed-off-by: Chandra Seetharaman <sekharan@xxxxxxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 1d53227f803773a50cb2c25756c6c68a3e120775
Author: Chandra Seetharaman <sekharan@xxxxxxxxxx>
Date:   Wed Sep 7 19:37:54 2011 +0000

    xfs: Check the return value of xfs_buf_get()

    Check the return value of xfs_buf_get() and fail appropriately.

    Signed-off-by: Chandra Seetharaman <sekharan@xxxxxxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 380f4f32878f67ce53c407b16c1deb6dff156731
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Wed Aug 24 05:59:25 2011 +0000

    xfs: improve ioend error handling

    Return unwritten extent conversion errors to aio_complete. Skip both
    unwritten extent conversion and size updates if we had an I/O error
    or the filesystem has been shut down. Return -EIO to the aio/buffer
    completion handlers in case of a forced shutdown.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 05d3202e28eb437a185d6c56fbf8fa8e1f638e6e
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Sat Aug 27 14:42:53 2011 +0000

    xfs: avoid direct I/O write vs buffered I/O race

    Currently a buffered reader or writer can add pages to the pagecache
    while we are waiting for the iolock in xfs_file_dio_aio_write.
    Prevent this by re-checking mapping->nrpages after we have got the
    iolock, and if necessary upgrading the lock to exclusive mode. To
    simplify this a bit, only take the ilock inside of
    xfs_file_aio_write_checks.

    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
    Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit b73d8f7667aa82ece204a9a2e5467b54a8ecd059
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date:   Sat Aug 27 14:45:11 2011 +0000

    xfs: avoid synchronous transactions when deleting attr blocks

    Currently xfs_attr_inactive causes a synchronous transaction if we
    are removing a file that has any extents allocated to the attribute
    fork, and thus makes XFS extremely slow at removing files with
    out-of-line extended attributes. The code looks like a relic from the
    days before the busy extent list, but with the busy extent list we
    avoid reusing data and attr extents that have been freed but not yet
    committed, so this code is just as superfluous as the synchronous
    transactions for data blocks.
Signed-off-by: Christoph Hellwig <hch@xxxxxx> Reported-by: Bernd Schubert <bernd.schubert@xxxxxxxxxxxxxxxxxx> Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx> Signed-off-by: Alex Elder <aelder@xxxxxxx> commit 1baaffdc386f83b85c1e91e7b25f0db02848ca59 Author: Christoph Hellwig <hch@xxxxxxxxxxxxx> Date: Tue Aug 23 08:28:13 2011 +0000 xfs: remove i_iocount We now have an i_dio_count filed and surrounding infrastructure to wait for direct I/O completion instead of i_icount, and we have never needed to iocount waits for buffered I/O given that we only set the page uptodate after finishing all required work. Thus remove i_iocount, and replace the actually needed waits with calls to inode_dio_wait. Signed-off-by: Christoph Hellwig <hch@xxxxxx> Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx> Signed-off-by: Alex Elder <aelder@xxxxxxx> commit 1e60cfa02430a9d1f0a051ca4bf521e71f562a33 Author: Christoph Hellwig <hch@xxxxxxxxxxxxx> Date: Tue Aug 23 08:28:12 2011 +0000 xfs: wait for I/O completion when writing out pages in xfs_setattr_size The current code relies on the xfs_ioend_wait call later on to make sure all I/O actually has completed. The xfs_ioend_wait call will go away soon, so prepare for that by using the waiting filemap function. Signed-off-by: Christoph Hellwig <hch@xxxxxx> Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx> Signed-off-by: Alex Elder <aelder@xxxxxxx> commit 6790d7b01fc5fb77952c1a96a12d594aab50cebc Author: Christoph Hellwig <hch@xxxxxxxxxxxxx> Date: Tue Aug 23 08:28:11 2011 +0000 xfs: reduce ioend latency There is no reason to queue up ioends for processing in user context unless we actually need it. Just complete ioends that do not convert unwritten extents or need a size update from the end_io context. 
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit a826361aa4afca6ee735e73f4f0c63c4c8439c51
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Tue Aug 23 08:28:10 2011 +0000

xfs: defer AIO/DIO completions

We really shouldn't complete AIO or DIO requests until we have finished the unwritten extent conversion and size update. This means fsync never has to pick up any ioends, as all work has been completed when signalling I/O completion.

Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 4e95434b5bb70f62fcdf11b98ef4aa5ff0ee1a24
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Tue Aug 23 08:28:09 2011 +0000

xfs: remove dead ENODEV handling in xfs_destroy_ioend

No driver returns ENODEV from its bio completion handler, nor has this ever been documented. Remove the dead code dealing with it.

Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 184e80f2a2075263db3eec6d7cee8fdb9f2d118a
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Tue Aug 23 08:28:08 2011 +0000

xfs: use the "delwri" terminology consistently

And also remove the strange local lock and delwri list pointers in a few functions.

Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit b57a4ed207854e6d722abf1ce26c2bd9113fd57b
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Tue Aug 23 08:28:07 2011 +0000

xfs: let xfs_bwrite callers handle the xfs_buf_relse

Remove the xfs_buf_relse from xfs_bwrite and let the caller handle it, to mirror the delwri and read paths. Also remove the mount pointer passed to xfs_bwrite, which is superfluous now that we have a mount pointer in the buftarg.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit ef216bdc80eb74c5d30cff8dc77df61ff071edc3
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Tue Aug 23 08:28:06 2011 +0000

xfs: call xfs_buf_delwri_queue directly

Unify the ways we add buffers to the delwri queue by always calling xfs_buf_delwri_queue directly. The xfs_bdwrite function is removed and opencoded in its callers, and the two places setting XBF_DELWRI while a buffer is locked and expecting xfs_buf_unlock to pick it up are converted to call xfs_buf_delwri_queue directly, too. Also replace the XFS_BUF_UNDELAYWRITE macro with direct calls to xfs_buf_delwri_dequeue to make the explicit queuing/dequeuing more obvious.

Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 9b28cfc60532bbd20e157f17c13dcd6ace27867b
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Tue Aug 23 08:28:05 2011 +0000

xfs: move more delwri setup into xfs_buf_delwri_queue

Do not transfer a reference held by the caller to the buffer on the list, or decrement it in xfs_buf_delwri_queue, but instead grab a new reference if needed, and let the caller drop its own reference. Also move setting of the XBF_DELWRI and XBF_ASYNC flags into xfs_buf_delwri_queue, and only do it if needed. Note that for now xfs_buf_unlock already has XBF_DELWRI handling, but that will change in the following patches.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 3724504d4abfcfd7d9e5892a9e5b1bf2d7c4a522
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Tue Aug 23 08:28:04 2011 +0000

xfs: remove the unlock argument to xfs_buf_delwri_queue

We can just unlock the buffer in the caller, and the decrement of b_hold would also be needed in the !unlock case; we just never hit that case currently, given that the caller handles it.

Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 4f9d18351358c0ad814f7507c75dcebce5cd9f54
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Tue Aug 23 08:28:03 2011 +0000

xfs: remove delwri buffer handling from xfs_buf_iorequest

We cannot ever reach xfs_buf_iorequest for a buffer with XBF_DELWRI set, given that all write handlers make sure that the buffer is removed from the delwri queue beforehand, and we never do reads with the XBF_DELWRI flag set (which the code would not handle correctly anyway).

Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 297db93bb74cf687510313eb235a7aec14d67e97
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Sat Aug 27 05:57:55 2011 +0000

xfs: fix ->write_inode return values

Currently we always redirty an inode that was attempted to be written out synchronously but has been cleaned by an AIL push internally, which is rather bogus. Fix that by doing the i_update_core check early on and returning 0 for it. Also include async calls in it, as doing any work for those is just as pointless. While we're at it, also fix the sign for the EIO return in case of a filesystem shutdown, and fix the completely nonsensical locking around xfs_log_inode.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit da6742a5a4cc844a9982fdd936ddb537c0747856
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Sat Aug 27 05:57:44 2011 +0000

xfs: fix xfs_mark_inode_dirty during umount

During umount we do not add a dirty inode to the LRU and wait for it to become clean first, but force writeback of data and metadata with I_WILL_FREE set. Currently there is no way for XFS to detect that the inode has been redirtied for metadata operations, as we skip the mark_inode_dirty call during teardown. Fix this by setting i_update_core manually in that case, so that the inode gets flushed during inode reclaim.

Alternatively we could enable calling mark_inode_dirty for inodes in I_WILL_FREE state, and let the VFS dirty tracking handle this. I decided against this as we will get better I/O patterns from reclaim compared to the synchronous writeout in write_inode_now, and always marking the inode dirty in some way from xfs_mark_inode_dirty is a better safety net in either case.

Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 57b5a91db28542a8d8a697b9e3da2bd0e062f7d3
Author: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu Aug 25 07:17:02 2011 +0000

xfs: don't serialise adjacent concurrent direct IO appending writes

For append write workloads, extending the file requires a certain amount of exclusive locking to be done up front to ensure sanity in things like ensuring that we've zeroed any allocated regions between the old EOF and the start of the new IO. For single threads, this typically isn't a problem, and for large IOs we don't serialise enough for it to be a problem for two threads on really fast block devices. However for smaller IO and larger thread counts we have a problem.

Take 4 concurrent sequential, single block sized and aligned IOs.
After the first IO is submitted but before it completes, we end up with this state:

        IO 1    IO 2    IO 3    IO 4
      +-------+-------+-------+-------+
      ^       ^
      |       |
      |       |
      |       |
      |       \- ip->i_new_size
      \- ip->i_size

And the IO is done without exclusive locking because offset <= ip->i_size. When we submit IO 2, we see offset > ip->i_size, and grab the IO lock exclusive, because there is a chance we need to do EOF zeroing. However, there is already an IO in progress that avoids the need for EOF zeroing because offset <= ip->i_new_size, hence we could avoid holding the IO lock exclusive for this. Hence after submission of the second IO, we'd end up in this state:

        IO 1    IO 2    IO 3    IO 4
      +-------+-------+-------+-------+
      ^               ^
      |               |
      |               |
      |               |
      |               \- ip->i_new_size
      \- ip->i_size

And so you can see that for the third concurrent IO, we'd avoid exclusive locking for the same reason we avoided the exclusive lock for the second IO.

Fixing this is a bit more complex than that, because we need to hold a write-submission local value of ip->i_new_size so that clearing the value is only done if no other thread has updated it before our IO completes.....

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>

commit 37b652ec6445be99d0193047d1eda129a1a315d3
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Thu Aug 25 07:17:01 2011 +0000

xfs: don't serialise direct IO reads on page cache checks

There is no need to grab the i_mutex or the IO lock in exclusive mode if we don't need to invalidate the page cache.
Taking these locks on every direct IO effectively serialises them, as taking the IO lock in exclusive mode has to wait for all shared holders to drop the lock. That only happens when IO is complete, so effectively it prevents dispatch of concurrent direct IO reads to the same inode.

Fix this by taking the IO lock shared to check the page cache state, and only then drop it and take the IO lock exclusively if there is work to be done. Hence for the normal direct IO case, no exclusive locking will occur.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Tested-by: Joern Engel <joern@xxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>

-----------------------------------------------------------------------

hooks/post-receive
--
XFS development tree

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs