From: Dave Chinner <dchinner@xxxxxxxxxx>

To get rid of bufferheads from the writepage path, we have to get rid
of the bufferhead chaining that is done in the ioends to keep track of
the blocks under IO. We also mark the page clean indirectly through
bufferhead IO completion callbacks. To move away from bufferheads, we
need to track bios rather than bufferheads, and on ioend completion we
need to mark pages clean directly.

This makes it "interesting" for filesystems with sub-page block size,
because the bufferheads are used to track sub-page dirty state. That
is, only when all the bufferheads on a page are clean is the page
marked clean. For now, we will ignore the sub-page block size problem
and address the block size = page size configuration first. Once the
bio/page handling infrastructure is in place, we can add support for
sub-page block sizes.

Right now an xfs_ioend tracks a sequential region via a bufferhead
chain that is, at IO submission, converted to bios and then submitted.
A single xfs_ioend may require multiple bios to be submitted, and so
the ioend keeps a reference count of the number of bios it needs
completions from before it can process the IO completion of the
bufferhead chain across that region. As such, we have a dual layer
IO submission/completion process.

Assuming block size = page size, what we have is this:

pages		+-+-+-+-+-+-+-+-+-+
bufferheads	+-+-+-+-+-+-+-+-+-+
xfs_ioend	+eeeeeee+eeeeeeeee+
bios		+bbb+bbb+bbbbbb+bb+

So IO submission looks like:

- .writepage is given a page
- XFS creates an ioend or pulls the existing one from the writepage
  context
- XFS walks the bufferheads on the page and adds them to the ioend
- XFS chains ioends together when some kind of IO discontiguity occurs
- When all the page walks are complete, XFS "submits" the ioend:
  - XFS walks the bufferheads, marking them as under async writeback
  - XFS walks the bufferheads again, building bios from the pages
    backing the bufferheads. When a bio is too large to have more
    pages added to it, or there is a discontinuity in the IO mapping,
    the bio is submitted and a new one is started.

On IO completion:

- XFS grabs the ioend from the bio, drops the bio and decrements the
  reference count on the ioend
- when the ioend reference count goes to zero, it runs the endio
  callbacks (e.g. size update, unwritten extent conversion)
- the ioend is destroyed:
  - destruction walks the bufferhead chain on the ioend, calling the
    bufferhead IO completion handler on each
  - bufferhead IO completion calls end_page_writeback() appropriately

IOWs, the xfs_ioend is really a mapping layer between bufferheads and
bios, and the bufferheads kind of hide the pages from us in the IO
submission path. To get rid of bufferheads, we have to get rid of the
dependency on bufferhead chaining for building bios and marking pages
clean on IO completion. What we really want is this:

pages		+-+-+-+-+-+-+-+-+-+
xfs_ioend	+eeeeeee+eeeeeeeee+
bios		+bbb+bbb+bbbbbb+bb+

And for us to be able to hold on to the bios being completed until
they are all done before we start ioend processing. It looks like we
can use chaining via the bi_private field (i.e. a singly linked list)
to attach all the bios to the ioend prior to submission, replace that
with a reference count and a pointer to the ioend during submission,
and then rebuild the chain during IO completion. We then don't drop
the bio references until we destroy the ioend, after we've walked all
the pages held by the bios and ended writeback on them.
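As a rough sketch of the submission side of that scheme (illustrative
only: io_bio_list, io_remaining and xfs_submit_ioend_bios are
hypothetical names, not the actual patch, and this assumes the current
submit_bio(rw, bio) API):

	/* Sketch only - field and function names are illustrative. */
	struct xfs_ioend {
		atomic_t	io_remaining;	/* bios still in flight */
		int		io_error;	/* first IO error seen */
		struct bio	*io_bio_list;	/* singly linked via bi_private */
		/* ... ioend type, offset, size, append transaction, etc ... */
	};

	/*
	 * Submission: unhook each bio from the chain, repoint bi_private
	 * at the ioend, take a reference for it, and send it down. The
	 * caller holds an initial reference (taken at ioend allocation)
	 * across the loop so the ioend cannot be freed underneath us, and
	 * drops it once everything is submitted.
	 */
	static void
	xfs_submit_ioend_bios(
		struct xfs_ioend	*ioend)
	{
		struct bio		*bio, *next;

		for (bio = ioend->io_bio_list; bio; bio = next) {
			next = bio->bi_private;
			atomic_inc(&ioend->io_remaining);
			bio->bi_private = ioend;
			bio->bi_end_io = xfs_end_bio;
			submit_bio(WRITE, bio);
		}
		ioend->io_bio_list = NULL;
	}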
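And the completion side, where the chain is rebuilt and finally torn
down. Again a sketch, assuming errors are reported in bio->bi_error;
the list insertion in the endio handler would need to be made safe
against concurrent bio completions (e.g. a cmpxchg loop), which is
elided here for clarity:

	/*
	 * Completion: stash the error, chain the bio back onto the ioend
	 * and drop the reference it held. The last completion to come
	 * home kicks off ioend processing.
	 */
	static void
	xfs_end_bio(
		struct bio		*bio)
	{
		struct xfs_ioend	*ioend = bio->bi_private;

		if (bio->bi_error && !ioend->io_error)
			ioend->io_error = bio->bi_error;

		/* NB: must be made atomic against concurrent completions */
		bio->bi_private = ioend->io_bio_list;
		ioend->io_bio_list = bio;

		if (atomic_dec_and_test(&ioend->io_remaining))
			xfs_finish_ioend(ioend);	/* run endio callbacks */
	}

	/*
	 * Destruction: only now do we walk the pages the bios map and
	 * end writeback on them, then free the bios themselves.
	 */
	static void
	xfs_destroy_ioend(
		struct xfs_ioend	*ioend)
	{
		struct bio		*bio, *next;
		struct bio_vec		*bvec;
		int			i;

		for (bio = ioend->io_bio_list; bio; bio = next) {
			next = bio->bi_private;
			bio_for_each_segment_all(bvec, bio, i) {
				if (ioend->io_error)
					SetPageError(bvec->bv_page);
				end_page_writeback(bvec->bv_page);
			}
			bio_put(bio);
		}
		mempool_free(ioend, xfs_ioend_pool);
	}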
This will also handle sub-page block sizes that may require multiple
bios to clean a page, as long as submission always creates page
granularity ioends.

Hence IO submission should look like:

- .writepage is given a page
- XFS creates an ioend or pulls the existing one from the writepage
  context
- XFS grabs the iomap from the wpc or gets a new one
- XFS checks whether the page is adjacent to the previous one and, if
  so, whether the mapping is still valid. If the answer to either is
  no, it grabs a new iomap, creates a new bio and chains the bio to
  the ioend. It then adds the page to the bio and marks the page as
  under IO.
- When all the page walks are complete, XFS "submits" the ioend:
  - XFS walks the bio chain, removing each bio from it, taking a
    reference to the ioend, setting bi_private to point at the ioend,
    and then submitting the bios in order

On IO completion:

- XFS grabs the ioend from the bio, chains the bio back onto the
  ioend, stashes any error in the ioend and drops the bio's reference
  to the ioend
- when the ioend reference count goes to zero, it runs the endio
  callbacks (e.g. size update, unwritten extent conversion)
- the ioend is destroyed:
  - destruction walks the bio chain, calling end_page_writeback() on
    the pages within and dropping the bio references to free them

Simples, yes? In a few patches' time, writepage will no longer have
any bufferheads in it. However, until we get rid of bufferheads
completely, we still need to make sure their state reflects the page
state. Hence, as a stop-gap measure, the ioend bio submission and
destruction paths will need to walk the buffers on the pages and
change their state appropriately. This will be a wart on the side that
will get removed when bufferheads are removed from the other buffered
IO paths in XFS.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
---
 fs/xfs/xfs_aops.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 08a0205..e52eb0e 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -36,6 +36,7 @@
 #include <linux/pagevec.h>
 #include <linux/writeback.h>
 
+
 /*
  * structure owned by writepages passed to individual writepage calls
  */
-- 
2.5.0