On Mon, Nov 30, 2020 at 02:35:21PM +0530, Chandan Babu R wrote: > The comment in xfs_rmapbt_compute_maxlevels() mentions that with > reflink enabled, XFS will run out of AG blocks before reaching maximum > levels of XFS_BTREE_MAXLEVELS (i.e. 9). This is easy to prove for 4k > block size case: > > Considering theoretical limits, maximum height of rmapbt can be, > max btree height = Log_(min_recs)(total recs) > max_rmapbt_height = Log_45(2^64) = 12. > > Detailed calculation: > nr-levels = 1; nr-leaf-blks = 2^64 / 84 = 2e17; > nr-levels = 2; nr-blks = 2e17 / 45 = 5e15; > nr-levels = 3; nr-blks = 5e15 / 45 = 1e14; > nr-levels = 4; nr-blks = 1e14 / 45 = 2e12; > nr-levels = 5; nr-blks = 2e12 / 45 = 5e10; > nr-levels = 6; nr-blks = 5e10 / 45 = 1e9; > nr-levels = 7; nr-blks = 1e9 / 45 = 3e7; > nr-levels = 8; nr-blks = 3e7 / 45 = 6e5; > nr-levels = 9; nr-blks = 6e5 / 45 = 1e4; > nr-levels = 10; nr-blks = 1e4 / 45 = 3e2; > nr-levels = 11; nr-blks = 3e2 / 45 = 6; > nr-levels = 12; nr-blks = 1; > > Total number of blocks = 2e17 > > Here, 84 is the minimum number of leaf records and 45 is the minimum > number of node records in the rmapbt when using 4k block size. 2^64 is > the maximum possible rmapbt records > (i.e. max_rmap_entries_per_disk_block (2^32) * max_nr_agblocks > (2^32)). > > i.e. theoretically rmapbt height can go upto 12. > > But as the comment in xfs_rmapbt_compute_maxlevels() suggests, we will > run out of per-ag blocks trying to build an rmapbt of height > XFS_BTREE_MAXLEVELS (i.e. 9). > > Since number of nodes grows as a geometric series, > nr_nodes (roughly) = (45^9 - 1) / (45 - 1) = 10e12 > > i.e. 10e12 blocks > max ag blocks (2^32 == 4e9) > > > However, with 1k block size we are not close to consuming all of 2^32 > AG blocks as shown by the below calculations, > > - rmapbt with maximum of 9 levels will have roughly (11^9 - 1) / (11 - > 1) = 2e8 blocks. > - 11 is the minimum number of recs in a non-leaf node with 1k block size. > - Also, Total number of records (roughly) = (nr_leaves * 11) = 11^8 * 11 > = 2e9 (this value will be used later). > > - refcountbt > - Maximum number of records theoretically = maximum number of blocks > in an AG = 2^32 > - Total (leaves and non-leaf nodes) blocks required to hold 2^32 records > Leaf min recs = 20; Node min recs = 60 (with 1k as the block size). > - Detailed calculation: > nr-levels = 1; nr-leaf-blks = 2^32 / 20 = 2e8; > nr-levels = 2; nr-blks = 2e8 / 60 = 4e6 > nr-levels = 3; nr-blks = 4e6 / 60 = 6e4 > nr-levels = 4; nr-blks = 6e4 / 60 = 1.0e3 > nr-levels = 5; nr-blks = 1.0e3 / 60 = 2e1 > nr-levels = 6; nr-blks = 1 > > - Total block count = 2e8 > > - Bmbt (assuming all the rmapbt records have the same inode as owner) > - Total (leaves and non-leaf nodes) blocks required to hold 2e9 records > Leaf min recs = 29; Node min recs = 29 (with 1k as the block size). > (2e9 is the maximum rmapbt records with rmapbt height 9 and 1k block size). > nr-levels = 1; nr-leaf-blks = 2e9 / 29 = 7e7 > nr-levels = 2; nr-blks = 7e7 / 29 = 2e6 > nr-levels = 3; nr-blks = 2e6 / 29 = 8e4 > nr-levels = 4; nr-blks = 8e4 / 29 = 3e3 > nr-levels = 5; nr-blks = 3e3 / 29 = 1e2 > nr-levels = 6; nr-blks = 1e2 / 29 = 3 > nr-levels = 7; nr-blks = 1 > > - Total block count = 7e7 > > Total blocks used across rmapbt, refcountbt and bmbt = 2e8 + 2e8 + 7e7 = 5e8. > > Since 5e8 < 4e9(i.e. 2^32), we have not run out of blocks trying to > build a rmapbt with XFS_BTREE_MAXLEVELS (i.e 9) levels. > > Please let me know if my understanding is incorrect. I have no idea what is the real upper limit on the number of rmap records on a reflink filesystem -- as the rmap btree gets bigger, the maximum number of records that one could need to store in that btree gets smaller because rmap btree blocks cannot be shared and all have the same owner (_OWN_AG). But your reasoning seems correct. > I have come across a "log reservation" calculation issue when > increasing XFS_BTREE_MAXLEVELS to 10 which is in turn required for Hmm. That will increase the size of the btree cursor structure even farther. It's already gotten pretty bad with the realtime rmap and reflink patchsets since the realtime volume can have 2^63 blocks, which implies a theoretical maximum rtrmapbt height of 21 levels and a maximum rtrefcountbt height of 13 levels. (These heights are absurd, since they imply a data device of 2^63 blocks...) I suspect that we need to split MAXLEVELS into two values -- one for per-AG btrees, and one for per-file btrees, and then refactor the btree cursor so that the level data are a single VLA at the end. I started a patchset to do all that[1], but it's incomplete. [1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=btree-dynamic-depth&id=692f761838dd821cd8cc5b3d1c66d6b1ac8ec05b > extending data fork extent count to 48 bits. To proceed further, I > need to have a correct understanding of problem I have described w.r.t > 1k filesystem block size. <nod> --D > > -- > chandan > > >