From: Dave Chinner <dchinner@xxxxxxxxxx>

Now that physical inode allocation is done in the background and
separated from the high level free inode allocation operations, we
can start to optimise the way we allocate physical inode chunks based
on observation of inode chunk allocation requirements.

To start with, we need to determine the approximate rate at which we
are allocating inode chunks. This tells us how many inode chunks we
should allocate at a time to minimise the time free inode allocation
stalls waiting for chunk allocation to occur. Ideally we want to
allocate in large enough chunks that we rarely block free inode
allocation.

Assuming a typical inode allocation rate of approximately 20,000 per
second per CPU (~2GHz Xeon CPUs run at around this rate), we are
allocating roughly 300 inode chunks per second at 64 inodes per
chunk. We can assume this is the rate at which we can allocate from a
single AG, as inode allocation within an AG is single threaded. Hence
a "chunks allocated per second" measure probably has sufficient
resolution to provide a stable rate that we can use to allocate an
appropriate number of chunks ahead of time. It would also allow us to
determine a low watermark that the inode allocation ticket subsystem
can use to kick chunk allocation before we run out of free inodes and
force free inode allocation to block.

Once we have determined the rate, we can use it to allocate a number
of inode chunks in a single execution of the worker. Ideally, we want
the worker to allocate enough inode chunks that it only needs to run
a couple of times a second, and to do that allocation in a manner
that results in large contiguous regions of inode chunks.

For v4 superblocks, just iterate the existing inode chunk allocation
transaction to allocate one chunk at a time. For v5 superblocks, we
have the logical inode create transaction, which allows us to
initialise an arbitrary number of inode chunks at a time. The number
of chunks we can support right now with the current transaction
reservation is limited to the maximum number of sequential records we
can insert into the inode btree while guaranteeing only a single
leaf-to-root split will occur. This will probably require a special
new btree operation for bulk record insert with a single index path
update once the split and insert are done, and is probably
sufficiently complex that it will require a series of patches to do.

Once we can allocate multiple inode chunks in a single operation, we
can optimise inode chunk layout for stripe unit/width extremely well,
i.e. we should allocate a fully aligned stripe unit at a time, and
potentially more if the limits of a bulk record insert allow it.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
---
 fs/xfs/xfs_ag.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/xfs/xfs_ag.h b/fs/xfs/xfs_ag.h
index 317aa86..eb25689 100644
--- a/fs/xfs/xfs_ag.h
+++ b/fs/xfs/xfs_ag.h
@@ -249,6 +249,8 @@ typedef struct xfs_perag {
 	xfs_agino_t	pagi_freecount;	/* number of free inodes */
 	xfs_agino_t	pagi_count;	/* number of allocated inodes */
 
+	int		pagi_chunk_alloc_rate;
+
 	/*
 	 * Inode allocation search lookup optimisation.
 	 * If the pagino matches, the search for new inodes
--
1.8.3.2
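
Not part of the patch above, but as a rough illustration of how
pagi_chunk_alloc_rate might be maintained and consumed, here is a
minimal, standalone C model of the scheme described in the commit
message: a smoothed chunks-per-second estimate, a per-worker-run
allocation target, and a low watermark check. Everything in it apart
from the pagi_chunk_alloc_rate and pagi_freecount names is made up
for illustration; the smoothing factor, the half-second allocation
target and the 100ms watermark are arbitrary assumptions, not values
taken from this patch series.

/*
 * Standalone model of the chunk allocation rate tracking sketched in
 * the commit message.  All names except pagi_chunk_alloc_rate and
 * pagi_freecount are illustrative, not proposed kernel interfaces.
 */
#include <stdbool.h>

#define INODES_PER_CHUNK	64	/* inodes in one physical inode chunk */

struct perag_model {
	unsigned int	pagi_freecount;		/* free inodes in this AG */
	int		pagi_chunk_alloc_rate;	/* smoothed chunks/sec */
	int		chunks_this_second;	/* chunks allocated since last update */
};

/*
 * Fold the last second's chunk allocations into a smoothed per-second
 * rate.  A simple 3/4-old, 1/4-new exponential average damps bursts
 * while still tracking sustained demand; call this once per second.
 */
static void chunk_rate_update(struct perag_model *pag)
{
	pag->pagi_chunk_alloc_rate =
		(3 * pag->pagi_chunk_alloc_rate + pag->chunks_this_second) / 4;
	pag->chunks_this_second = 0;
}

/*
 * How many chunks should a single worker execution allocate?  Aim for
 * about half a second of demand so the worker only has to run a
 * couple of times a second, and always allocate at least one chunk.
 */
static int chunks_to_allocate(struct perag_model *pag)
{
	int want = pag->pagi_chunk_alloc_rate / 2;

	return want > 0 ? want : 1;
}

/*
 * Low watermark check: if the remaining free inodes cover less than
 * roughly 100ms of the observed demand, it is time to kick the chunk
 * allocation worker before free inode allocation is forced to block.
 */
static bool below_low_watermark(struct perag_model *pag)
{
	unsigned int demand = pag->pagi_chunk_alloc_rate * INODES_PER_CHUNK;

	return pag->pagi_freecount * 10 < demand;
}

With the ~300 chunks/sec figure from the commit message, this model
would have the worker allocate roughly 150 chunks per run and kick it
once fewer than about 1,900 free inodes remain in the AG.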