Hi folks,

As Ben and I discussed during the review of the initial CRC series, inode allocation needs to log the entire inode to ensure the replayed create transaction results in an inode with the correct CRC. This means that the logging overhead of inode create doubled for 256 byte inodes, and is close to 5x higher for 512 byte inodes.

Ben suggested that having a transaction to initialise buffers to zero without needing to log them physically might be a way to solve the problem. It would solve the problem, but I already have a patchset from a few years back that introduces a new inode create transaction that doesn't require any physical logging of inodes at all.

This patch series is a forward port of my original work from 2009 (hence the SOBs being from david@xxxxxxxxxxxxx) with a couple of more recent patches that will also help reduce inode buffer lookups and hence improve performance.

The first two patches are for reducing the number of inode buffer lookups. When we are allocating a new inode, the only reason we look up the inode buffer is to read the generation number so we can increment it. The first patch replaces the inode buffer read with randomly calculating a new generation number, resulting in an inode allocation being a purely in-memory operation requiring no IO. There is a caveat to that - for people using noikeep, we still need to ensure the generation number increments monotonically, so we only take the new path if that mount option is not set. This reduces buffer lookups under create heavy workloads by roughly 10%.

The second patch removes a buffer lookup and modification on unlink that was added for coherency with bulkstat back when bulkstat did non-coherent inode lookups. bulkstat is using coherent lookups again, so the code in unlink is not necessary any more.

The remaining 5 patches are the new icreate transaction series. The first patch introduces ordered buffers.
These are buffers that are modified in transactions but are not logged by the transaction. They have an identical lifecycle to a normal buffer, and so pin the tail of the log until they are written back. This enables us to log a logical change and have all the physical changes behave as though physical logging had been performed. This is used for the inode buffers by the new icreate transaction.

The rest of the patches are simply mechanical - introducing the inode create log item, the changes to transaction reservations (uses less space in the log), converting the code to selectively use the new logging method and adding recovery support for it.

Right now the code will use this transaction if the filesystem is CRC enabled. Given that CRC enabled filesystems are experimental at this point, adding a new log item type should not be a major problem for anyone using them - just make sure the log is clean before downgrading to an older kernel...

The patchset passes xfstests on non-CRC filesystems without new regressions, and the initial two patches result in a ~10% improvement in 8-way create speed and a ~15% improvement in 8-way unlink speed. I don't have any numbers on CRC enabled filesystems as I've been working on the userspace CRC patchset and getting that into shape rather than testing and benchmarking kernel CRC code...

Comments, thoughts, flames?

-Dave.

PS. I'm working on an equivalent patchset for unlink that logs the unlinked list as part of the inode core for CRC enabled filesystems. That's a little bit away from working yet, though...
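
For anyone who wants the generation number change in the first patch spelled out, here is a minimal userspace sketch of the selection logic described above. It is not the kernel code: new_inode_generation(), fixed_rng() and the need_monotonic flag are hypothetical names made up for illustration, and the random source the patch actually uses is not modelled. The only point is that the on-disk generation is needed solely on the path that must increment monotonically.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a random source; returns a fixed value so
 * the behaviour of the sketch is easy to check. */
uint32_t fixed_rng(void)
{
	return 0x12345678u;
}

/*
 * Pick the generation number for a newly allocated inode (sketch only).
 *
 * need_monotonic: true when the generation must increment monotonically,
 *                 which means the on-disk value had to be read.
 * ondisk_gen:     generation read from the inode buffer; only meaningful
 *                 when need_monotonic is true.
 * rng:            random source used on the in-memory path.
 */
uint32_t new_inode_generation(bool need_monotonic, uint32_t ondisk_gen,
			      uint32_t (*rng)(void))
{
	if (need_monotonic)
		return ondisk_gen + 1;	/* buffer was read; unsigned wrap is well defined */
	return rng();			/* no inode buffer lookup, no IO */
}
```

With the stub above, new_inode_generation(true, 41, fixed_rng) takes the increment path, while new_inode_generation(false, 41, fixed_rng) ignores the on-disk value entirely, which is why that path needs no buffer read.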