Re: [PATCH 2/2] xfs: Extend xattr extent counter to 32-bits

On Thursday, April 23, 2020 4:00 AM Dave Chinner wrote: 
> On Wed, Apr 22, 2020 at 03:08:00PM +0530, Chandan Rajendra wrote:
> > On Monday, April 20, 2020 10:08 AM Chandan Rajendra wrote: 
> > > On Tuesday, April 14, 2020 12:25 AM Darrick J. Wong wrote: 
> > > > That said, it was very helpful to point out that the current MAXEXTNUM /
> > > > MAXAEXTNUM symbols stop short of using all 32 (or 16) bits.
> > > > 
> > > > Can we use this new feature flag + inode flag to allow 4294967295
> > > > extents in either fork?
> > > 
> > > Sure.
> > > 
> > > I have already tested that having 4294967295 as the maximum data extent count
> > > does not cause any regressions.
> > > 
> > > Also, Dave was of the opinion that data extent counter be increased to
> > > 64-bit. I think I should include that change along with this feature flag
> > > rather than adding a new one in the near future.
> > > 
> > > 
> > 
> > Hello Dave & Darrick,
> > 
> > Can you please look into the following design decision w.r.t using 32-bit and
> > 64-bit unsigned counters for xattr and data extents.
> > 
> > Maximum extent counts.
> > |-----------------------+----------------------|
> > | Field width (in bits) |          Max extents |
> > |-----------------------+----------------------|
> > |                    32 |           4294967295 |
> > |                    48 |      281474976710655 |
> > |                    64 | 18446744073709551615 |
> > |-----------------------+----------------------|
> 
> These huge numbers are impossible to compare visually.  Once numbers
> go beyond 7-9 digits, you need to start condensing them in reports.
> Humans are, in general, unable to handle strings of digits longer
> than 7-9 digits at all well...
> 
> Can you condense them by using scientific representation i.e. XEy,
> which gives:
> 
> |-----------------------+-------------|
> | Field width (in bits) | Max extents |
> |-----------------------+-------------|
> |                    32 |      4.3E09 |
> |                    48 |      2.8E14 |
> |                    64 |      1.8E19 |
> |-----------------------+-------------|
> 
> It's much easier to compare differences visually because it's now
> only 4 digits, not 20. The other alternative is to use k,m,g,t,p,e
> suffixes to indicate magnitude (4.3g, 280t, 18e), but using
> exponentials make the numbers easier to do calculations on
> directly...
>

Sorry about that. I will use scientific notation for representing large
numbers.
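
A small Python sketch of that conversion, assuming the maximum count for an
N-bit counter is 2^N - 1. The helper names max_extents and sci are made up
for illustration:

```python
def max_extents(width_bits: int) -> int:
    """Largest value an unsigned counter of the given width can hold."""
    return (1 << width_bits) - 1


def sci(n: int) -> str:
    """Format n in the condensed XEy form, e.g. 4.3E09."""
    # Python prints the exponent as E+09; drop the '+' to get E09.
    return f"{n:.1E}".replace("E+", "E")


for width in (32, 48, 64):
    print(f"{width:>2} bits: {sci(max_extents(width))}")
```

This prints 4.3E09, 2.8E14 and 1.8E19 for 32, 48 and 64 bits respectively.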

> > |-------------------+-----|
> > | Minimum node recs | 125 |
> > | Minimum leaf recs | 125 |
> > |-------------------+-----|
> 
> Please show your working. I'm assuming this is 50% * 4kB /
> sizeof(bmbt_rec), so you are working out limits based on 4kB block
> size? Realistically, worst case behaviour will be with the minimum
> supported block size, which in this case will be 1kB....

Yes, your assumption of 4k block size is correct. I will include detailed
calculation steps in my future mails.

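
A rough sketch of the working behind the 125-record figure. The sizes used
here are assumptions, not taken from this mail: a 72-byte long-format btree
block header (CRC-enabled filesystem), 16 bytes per leaf record, and 16 bytes
per key/pointer pair in a node, with the minimum being half the maximum. The
function name bmbt_recs is hypothetical:

```python
BTREE_LBLOCK_HDR = 72   # assumed: long-format btree block header with CRC
BMBT_REC_SIZE = 16      # assumed: one extent record, or one key + one pointer


def bmbt_recs(block_size: int) -> tuple[int, int]:
    """Return (max_recs, min_recs) for a bmbt block of the given size."""
    max_recs = (block_size - BTREE_LBLOCK_HDR) // BMBT_REC_SIZE
    min_recs = max_recs // 2  # a block is kept at least half full
    return max_recs, min_recs


for bs in (1024, 4096):
    max_recs, min_recs = bmbt_recs(bs)
    print(f"{bs:>5} byte blocks: max {max_recs}, min {min_recs}")
```

Under these assumptions a 4kB block gives a minimum of 125 records, matching
the table above, while a 1kB block gives only 29, which is the worst case to
size the trees for.
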
> 
> > Data bmbt tree height (MINDBTPTRS == 3)
> > |-------+------------------------+-------------------------|
> > | Level | Number of nodes/leaves |           Total Nr recs |
> > |       |                        | (nr nodes/leaves * 125) |
> > |-------+------------------------+-------------------------|
> > |     0 |                      1 |                       3 |
> > |     1 |                      3 |                     375 |
> > |     2 |                    375 |                   46875 |
> > |     3 |                  46875 |                 5859375 |
> > |     4 |                5859375 |               732421875 |
> > |     5 |              732421875 |             91552734375 |
> > |     6 |            91552734375 |          11444091796875 |
> > |     7 |         11444091796875 |        1430511474609375 |
> > |     8 |       1430511474609375 |      178813934326171875 |
> > |     9 |     178813934326171875 |    22351741790771484375 |
> > |-------+------------------------+-------------------------|
> > 
> > For counting data extents, even though we theoretically have 64 bits at our
> > disposal, I think we should have (2 ** 48) - 1 as the maximum number of
> > extents. This gives 281474976710655 (i.e. ~281 trillion extents). With this,
> > bmbt tree's height grows by just two more levels (i.e. it grows from the
> > current maximum height of 5 to 7). Please let me know your opinion on this.
> 
> We shouldn't make up arbitrary limits when we can calculate them exactly.
> i.e. 2^63 max file size, 1kB block size (2^10), means max fragments
> is 2^53 entries. On a 64kB block size (2^16), we have a max extent
> count of 2^47....
> 
> i.e. 2^48 would be an acceptable limit for 1kB block size, but it is
> not correct for 64kB block size filesystems....

You are right about this. I will set the max data extent count to 2^47.
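
A sketch of that calculation, together with the bmbt height it implies. It
assumes a 2^63 byte maximum file size with one extent per block in the worst
case, and reuses the figures from the table above (3 root pointers, 125
records per block). The function names are hypothetical:

```python
MAX_FILE_SIZE_LOG2 = 63  # 2^63 byte maximum file size, as stated above


def max_extent_count_log2(block_size: int) -> int:
    """log2 of the worst-case extent count: one extent per filesystem block."""
    block_size_log2 = block_size.bit_length() - 1  # block_size is a power of 2
    return MAX_FILE_SIZE_LOG2 - block_size_log2


def bmbt_height(nr_extents: int, root_ptrs: int = 3, min_recs: int = 125) -> int:
    """Smallest level whose record capacity covers nr_extents.

    Level 0 (the root in the inode fork) holds root_ptrs records; each
    level below multiplies the capacity by min_recs.
    """
    level = 0
    capacity = root_ptrs
    while capacity < nr_extents:
        capacity *= min_recs
        level += 1
    return level


print(max_extent_count_log2(1024))    # 53
print(max_extent_count_log2(65536))   # 47
print(bmbt_height(2**32 - 1))         # 5
print(bmbt_height(2**47))             # 7
```

With minimally filled 4kB blocks, a 2^47 limit needs the same height (7) as
the 2^48 - 1 limit proposed earlier.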

> 
> > Attr bmbt tree height (MINABTPTRS == 2)
> > |-------+------------------------+-------------------------|
> > | Level | Number of nodes/leaves |           Total Nr recs |
> > |       |                        | (nr nodes/leaves * 125) |
> > |-------+------------------------+-------------------------|
> > |     0 |                      1 |                       2 |
> > |     1 |                      2 |                     250 |
> > |     2 |                    250 |                   31250 |
> > |     3 |                  31250 |                 3906250 |
> > |     4 |                3906250 |               488281250 |
> > |     5 |              488281250 |             61035156250 |
> > |-------+------------------------+-------------------------|
> > 
> > For xattr extents, (2 ** 32) - 1 = 4294967295 (~ 4 billion extents). So this
> > will cause the corresponding bmbt's maximum height to go from 3 to 5.
> > This probably won't cause any regression.
> 
> We already have the XFS_DA_NODE_MAXDEPTH set to 5, so changing the
> attr fork extent count makes no difference to the attribute fork
> bmbt reservations. i.e. the bmbt reservations are defined by the
> dabtree structure limits, not the maximum extent count the fork can
> hold.

I think the dabtree structure limits follow from a calculation like this ...

How many levels of dabtree would be needed to hold ~100 million xattrs?
- Name len = 16 bytes, i.e. the size of
        struct xfs_parent_name_rec {
                __be64  p_ino;
                __be32  p_gen;
                __be32  p_diroffset;
        };
  i.e. 64 + 32 + 32 = 128 bits = 16 bytes.
- Value len = file name length = assume ~40 bytes.
- Formula for the number of node entries (column 3 in the table given
  below) at any level of the dabtree:
  nr_blocks * ((block size - sizeof(struct xfs_da3_node_hdr)) /
               sizeof(struct xfs_da_node_entry))
  i.e. nr_blocks * ((block size - 64) / 8)
- Formula for the number of leaf entries (column 4 in the table given
  below):
  nr_blocks * ((block size - sizeof(xfs_attr_leaf_hdr_t)) /
               (sizeof(xfs_attr_leaf_entry_t) + sizeof(valuelen) +
                sizeof(namelen) + name len + value len))
  i.e. nr_blocks * ((block size - 32) / (8 + 2 + 1 + 16 + 40))

Here I have assumed a block size of 4k.
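
The two per-block formulas can be sketched in Python as follows. All sizes
are the ones quoted in the list above; the function names are hypothetical,
and integer division is used, so the leaf figure comes out as 60 rather than
the rounded 6.1e1:

```python
NODE_HDR = 64      # sizeof(struct xfs_da3_node_hdr), as used above
NODE_ENTRY = 8     # sizeof(struct xfs_da_node_entry)
LEAF_HDR = 32      # sizeof(xfs_attr_leaf_hdr_t), as used above
LEAF_ENTRY = 8     # sizeof(xfs_attr_leaf_entry_t)
VALUELEN_FIELD = 2
NAMELEN_FIELD = 1
NAME_LEN = 16      # sizeof(struct xfs_parent_name_rec)
VALUE_LEN = 40     # assumed file name length


def node_entries_per_block(block_size: int) -> int:
    """Number of node entries that fit in one dabtree node block."""
    return (block_size - NODE_HDR) // NODE_ENTRY


def leaf_entries_per_block(block_size: int) -> int:
    """Number of xattrs (entry + local name/value) per attr leaf block."""
    per_xattr = LEAF_ENTRY + VALUELEN_FIELD + NAMELEN_FIELD + NAME_LEN + VALUE_LEN
    return (block_size - LEAF_HDR) // per_xattr


print(node_entries_per_block(4096))  # 504
print(leaf_entries_per_block(4096))  # 60
```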

|-------+------------------+--------------------------+--------------------------|
| Level | Number of blocks | Number of entries (node) | Number of entries (leaf) |
|-------+------------------+--------------------------+--------------------------|
|     0 |              1.0 |                      5e2 |                    6.1e1 |
|     1 |              5e2 |                    2.5e5 |                    3.0e4 |
|     2 |            2.5e5 |                    1.3e8 |                    1.5e7 |
|     3 |            1.3e8 |                   6.6e10 |                    7.9e9 |
|-------+------------------+--------------------------+--------------------------|

Level 2 holds only ~1.5e7 leaf entries, which is short of 1e8, while level 3
holds ~7.9e9. Hence we would need a tree of height 3.
Total number of blocks = 1 + 5e2 + 2.5e5 + 1.3e8 = ~1.3e8
... which is < 2^32 (4.3e9)
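
A sketch that walks the levels the same way as the table, assuming 4k blocks
and the per-block sizes given earlier (504 node entries and 60 leaf entries
per block with integer division). The function name dabtree_levels is
hypothetical, and the block total assumes every level above the leaves is
completely full, so it is an upper bound:

```python
NODE_FANOUT = (4096 - 64) // 8                        # 504 entries per node block
LEAF_ENTRIES = (4096 - 32) // (8 + 2 + 1 + 16 + 40)   # 60 xattrs per leaf block


def dabtree_levels(nr_xattrs: int) -> tuple[int, int]:
    """Return (leaf level, total blocks in levels 0..leaf level)."""
    level = 0
    blocks_at_level = 1
    total_blocks = 1
    # Descend until treating this level's blocks as leaves holds all xattrs.
    while blocks_at_level * LEAF_ENTRIES < nr_xattrs:
        blocks_at_level *= NODE_FANOUT
        total_blocks += blocks_at_level
        level += 1
    return level, total_blocks


height, blocks = dabtree_levels(100_000_000)
print(height, blocks, blocks < 2**32)  # 3 128278585 True
```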

> 
> The data fork to 64 bits has no impact on the directory
> reservations, either, because the number of extents in the directory
> is bound by the directory segment size of 32GB. i.e. a directory can
> hold, at most, 32GB of dirent data, which means there's a hard limit
> on the number of dabtree entries somewhere in the order of a few
> hundred million. That's where XFS_DA_NODE_MAXDEPTH comes from - it's
> large enough to index a max sized directory, and the BMBT overhead
> is derived from that...

Ok. Thanks for explaining that.
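
For a feel of the order of magnitude, a quick sketch of how many dirents fit
in a 32GB directory data segment. The average on-disk dirent sizes used here
(64 and 128 bytes) are purely illustrative assumptions, and the function name
is hypothetical:

```python
DIR_DATA_SEGMENT = 32 * 2**30  # 32GB of dirent data, as stated above


def max_dirents(avg_dirent_bytes: int) -> int:
    """Upper bound on dirents, ignoring block headers and free space."""
    return DIR_DATA_SEGMENT // avg_dirent_bytes


for size in (64, 128):  # assumed average dirent sizes
    print(f"{size:>3} bytes/dirent: {max_dirents(size):.1e} dirents")
```

Both cases land in the range of a few hundred million entries.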

> 
> > Meanwhile, I will work on finding the impact of increasing the
> > height of these two trees on log reservation.
> 
> It should not change it substantially - 2 blocks per bmbt
> reservation per transaction is what I'd expect from the numbers
> presented...

I haven't gotten to this task yet; I will respond soon. I spent the time
figuring out how directories are organized in XFS and arriving at the
above calculations for the xattr extent counter.

-- 
chandan





