From: Dave Chinner <dchinner@xxxxxxxxxx>

One of the biggest problems with inode allocation performance right now
is that searching for a free inode requires an exhaustive scan of the
inode btree to find a record with a free inode in it. IOWs, the inode
btree indexes inode chunks, not free inodes.

To speed up the search for a free inode, introduce a new per-AG btree
rooted in the AGI that tracks records with free inodes in them. This
requires an inode chunk allocation to insert a record into two AGI
btrees - one for the allocated inode chunk, and one for the free inodes
record.

When we allocate a free inode, we will now need to modify two records -
one in each tree - and potentially remove a record from the free inode
btree. That is, if a record has no free inodes, then it is removed from
the btree. This means we have to ensure that the transaction reservation
for a free inode modification has enough space in it for an inode btree
merge.

Finally, it means that freeing an inode can insert a record into the
free inode btree. This can cause a split of the tree and hence we need
to ensure that the transaction reservation takes this into account.

This structure means that the free inode btree only tracks inode chunks
with free inodes in them and hence will always provide extremely fast
lookup of the closest free inode to the allocation target. When the free
inode btree exists, we will no longer use the allocated inode chunk
btree for allocation lookups - only the free inode btree will be used.

This functionality requires that we use a read-only compatible feature
flag - older kernels can still read the filesystem structure just fine,
but they aren't allowed to modify it as that will result in the new free
inode btree not being updated correctly.

Another advantage of the second btree is that we now have some redundant
metadata pointing to inode chunks.
It's not complete, but it will certainly help determine whether an
inode is supposed to be free when corruptions occur, i.e. it is no
longer a single bit of data in a single btree record.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
---
 fs/xfs/xfs_ag.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/xfs/xfs_ag.h b/fs/xfs/xfs_ag.h
index eb25689..1a97646 100644
--- a/fs/xfs/xfs_ag.h
+++ b/fs/xfs/xfs_ag.h
@@ -166,6 +166,9 @@ typedef struct xfs_agi {
 	__be32		agi_pad32;
 	__be64		agi_lsn;	/* last write sequence */
 
+	__be32		agi_free_root;
+	__be32		agi_free_level;
+
 	/* structure must be padded to 64 bit alignment */
 } xfs_agi_t;
 
-- 
1.8.3.2

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
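
For illustration only, the record lifecycle described in the commit
message can be modelled as a toy in userspace C. This is a sketch under
loud assumptions, not the kernel implementation: the two btrees are
replaced by small unsorted arrays, chunks are assumed to hold 64 inodes
and to start on 64-inode boundaries, and every name below (chunk_rec,
ag_model, chunk_alloc, inode_alloc, inode_free) is hypothetical and does
not exist in XFS. It shows only which index is touched on chunk
allocation, inode allocation and inode free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INODES_PER_CHUNK 64	/* assumption for this model */
#define MAX_CHUNKS 16

/* Hypothetical record: one inode chunk plus a bitmask of its free inodes. */
struct chunk_rec {
	uint32_t start_ino;	/* first inode number in the chunk */
	uint64_t free_mask;	/* bit i set => inode start_ino + i is free */
};

/* Hypothetical stand-in for one btree: an unsorted array of records. */
struct index {
	struct chunk_rec recs[MAX_CHUNKS];
	size_t nrecs;
};

struct ag_model {
	struct index inobt;	/* every allocated chunk */
	struct index finobt;	/* only chunks with at least one free inode */
};

static struct chunk_rec *index_find(struct index *ix, uint32_t start_ino)
{
	for (size_t i = 0; i < ix->nrecs; i++)
		if (ix->recs[i].start_ino == start_ino)
			return &ix->recs[i];
	return NULL;
}

static void index_insert(struct index *ix, struct chunk_rec rec)
{
	assert(ix->nrecs < MAX_CHUNKS);
	ix->recs[ix->nrecs++] = rec;
}

static void index_remove(struct index *ix, struct chunk_rec *rec)
{
	*rec = ix->recs[--ix->nrecs];	/* overwrite with the last record */
}

/* Chunk allocation inserts a record into both indexes. */
static void chunk_alloc(struct ag_model *ag, uint32_t start_ino)
{
	struct chunk_rec rec = { start_ino, UINT64_MAX };

	index_insert(&ag->inobt, rec);
	index_insert(&ag->finobt, rec);
}

/* Returns the allocated inode number, or -1 if no chunk has a free inode. */
static int64_t inode_alloc(struct ag_model *ag, uint32_t target)
{
	struct chunk_rec *best = NULL;
	uint32_t best_dist = 0;

	/* Only the free inode index is searched; pick the closest chunk. */
	for (size_t i = 0; i < ag->finobt.nrecs; i++) {
		struct chunk_rec *r = &ag->finobt.recs[i];
		uint32_t d = r->start_ino > target ? r->start_ino - target
						   : target - r->start_ino;
		if (!best || d < best_dist) {
			best = r;
			best_dist = d;
		}
	}
	if (!best)
		return -1;

	int bit = 0;
	while (!(best->free_mask & (1ULL << bit)))
		bit++;		/* terminates: finobt records are never full */

	uint32_t start = best->start_ino;
	best->free_mask &= ~(1ULL << bit);
	/* Modify the matching record in the allocated chunk index too. */
	index_find(&ag->inobt, start)->free_mask = best->free_mask;
	/* A record with no free inodes is removed from the free index. */
	if (best->free_mask == 0)
		index_remove(&ag->finobt, best);
	return (int64_t)start + bit;
}

/* Freeing an inode may insert a new record into the free inode index. */
static void inode_free(struct ag_model *ag, uint32_t ino)
{
	uint32_t start = ino - ino % INODES_PER_CHUNK;
	struct chunk_rec *irec = index_find(&ag->inobt, start);

	assert(irec != NULL);
	irec->free_mask |= 1ULL << (ino - start);

	struct chunk_rec *frec = index_find(&ag->finobt, start);
	if (frec)
		frec->free_mask = irec->free_mask;
	else
		index_insert(&ag->finobt, *irec);
}
```

In the real structure the removal and insertion steps are where a btree
merge or split can occur, which is why the commit message calls out the
transaction reservations; the arrays in this model have no equivalent of
that cost.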