This patch series builds on top of the current patch queue I posted yesterday. It replaces struct xfs_dabuf with an xfs_buf that can serve the same purpose.

Directory buffers may be made up of multiple extents, but are currently formed by creating individual buffers and then copying the data out of them into a linear memory region in a dabuf structure. Every dabuf operation then has to walk all the underlying buffers to change their state, and once a dabuf is modified its contents need to be copied back to the underlying buffers before they are logged.

All of these operations can be done on a normal xfs_buf, but the normal xfs_buf does not support multiple disk block ranges or issuing multiple disjoint I/Os to read or write a buffer. Supporting multiple disk block ranges is not difficult - we simply need to attach an iovec-like array to the buffer rather than just using a single block number and length.

Splitting the buffer up into multiple I/Os for read and write is not difficult, either. We already track the number of I/Os remaining before a buffer's I/O is complete, so this can be used to wait for all the dispatched I/Os to complete (for both read and write).

The only interesting twist to this is logging the changes. We can treat the discontiguous buffer as a single buffer for most purposes except for formatting the changes into the log. When formatting, we need to split the changes into one format item per underlying region so that recovery does not need to know about compound buffers and can recover each segment of a directory block individually as it does now. The fact that recovery will replay all or none of the transaction ensures this process is still atomic from a change recovery point of view.
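To make the block map array, the I/O counting and the per-region log split above concrete, here is a minimal userspace sketch. It is not the code in this series: every name in it (buf_map, disc_buf, log_region, split_dirty_range and so on) is hypothetical, and it assumes extents are described in 512-byte units and that a logged change is a single inclusive byte range in the logical buffer.

```c
#include <assert.h>
#include <stdint.h>

/* All names here are hypothetical; this is not the XFS implementation. */
#define SECTOR_SHIFT 9          /* assumed: lengths are in 512-byte units */

/* One entry of the iovec-like array: a contiguous disk block range. */
struct buf_map {
    uint64_t bm_bn;             /* starting disk block */
    uint32_t bm_len;            /* length in 512-byte units */
};

/* A buffer backed by one or more disjoint disk ranges. */
struct disc_buf {
    const struct buf_map *maps;
    int nmaps;
    int io_remaining;           /* I/Os still in flight for this buffer */
};

/* One log format item: a change confined to a single underlying region. */
struct log_region {
    uint64_t bn;                /* disk block of the region it belongs to */
    uint32_t seg_offset;        /* byte offset within that region */
    uint32_t len;               /* number of bytes changed */
};

/* Dispatch one I/O per disk range and remember how many are outstanding. */
void buf_start_io(struct disc_buf *bp)
{
    assert(bp->nmaps > 0);
    bp->io_remaining = bp->nmaps;
}

/* Called once per completed I/O; returns 1 when the whole buffer is done. */
int buf_io_done(struct disc_buf *bp)
{
    assert(bp->io_remaining > 0);
    return --bp->io_remaining == 0;
}

/*
 * Split the dirty byte range [first, last] (inclusive, relative to the start
 * of the logical buffer) into one region per underlying disk range.
 * Returns the number of regions written to out, or -1 if the range is
 * invalid, runs past the end of the buffer, or out is too small.
 */
int split_dirty_range(const struct buf_map *maps, int nmaps,
                      uint32_t first, uint32_t last,
                      struct log_region *out, int max_out)
{
    uint32_t seg_start = 0;     /* logical byte offset of the current range */
    int n = 0;

    if (first > last)
        return -1;

    for (int i = 0; i < nmaps; i++) {
        uint32_t seg_end = seg_start + (maps[i].bm_len << SECTOR_SHIFT);

        /* Does the dirty range overlap [seg_start, seg_end)? */
        if (first < seg_end && last >= seg_start) {
            uint32_t lo = first > seg_start ? first : seg_start;
            uint32_t hi = last < seg_end - 1 ? last : seg_end - 1;

            if (n == max_out)
                return -1;
            out[n].bn = maps[i].bm_bn;
            out[n].seg_offset = lo - seg_start;
            out[n].len = hi - lo + 1;
            n++;
        }
        seg_start = seg_end;
    }

    /* seg_start is now the total buffer length in bytes. */
    if (last >= seg_start)
        return -1;
    return n;
}
```

For example, with two 4k regions at disk blocks 100 and 500, a change covering logical bytes 4000 to 4200 is split into a 96 byte item at offset 4000 of the first region and a 105 byte item at offset 0 of the second, which is the shape recovery would need to replay each segment on its own.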
Further, even though log recovery doesn't use discontiguous buffers, there will be no confusion between a short buffer written by recovery and a discontiguous buffer read by the directory code after mount, because the buffer lengths will be different. Hence we need no changes to mount or log recovery processing, as we already ensure that all log recovery changes have been written to disk before we finish the mount process.

The reason for making this change is that we can now use a buffer cache callback to do all the metadata CRC calculations and verifications across both contiguous and discontiguous directory blocks. It greatly simplifies the implementation of this code and makes it consistent with all other metadata buffers. It should also provide a performance improvement because it avoids double copying and reduces the number of cached buffers.

I've tested this on 4k/4k (FSB/DB sizes), 512b/64k, and 4k/64k combinations with xfstests and some dbench, fsmark and compilebench stress loads. More testing is welcome.