Re: [PATCH 2/2] xfs: Extend xattr extent counter to 32-bits

On Wed, Apr 22, 2020 at 03:08:00PM +0530, Chandan Rajendra wrote:
> On Monday, April 20, 2020 10:08 AM Chandan Rajendra wrote: 
> > On Tuesday, April 14, 2020 12:25 AM Darrick J. Wong wrote: 
> > > That said, it was very helpful to point out that the current MAXEXTNUM /
> > > MAXAEXTNUM symbols stop short of using all 32 (or 16) bits.
> > > 
> > > Can we use this new feature flag + inode flag to allow 4294967295
> > > extents in either fork?
> > 
> > Sure.
> > 
> > I have already tested that having 4294967295 as the maximum data extent count
> > does not cause any regressions.
> > 
> > Also, Dave was of the opinion that data extent counter be increased to
> > 64-bit. I think I should include that change along with this feature flag
> > rather than adding a new one in the near future.
> > 
> > 
> 
> Hello Dave & Darrick,
> 
> Can you please look into the following design decision w.r.t using 32-bit and
> 64-bit unsigned counters for xattr and data extents.
> 
> Maximum extent counts.
> |-----------------------+----------------------|
> | Field width (in bits) |          Max extents |
> |-----------------------+----------------------|
> |                    32 |           4294967295 |
> |                    48 |      281474976710655 |
> |                    64 | 18446744073709551615 |
> |-----------------------+----------------------|

These huge numbers are impossible to compare visually.  Humans are,
in general, unable to handle strings of more than 7-9 digits at all
well, so once numbers go beyond that you need to start condensing
them in reports...

Can you condense them by using scientific notation, i.e. XEy,
which gives:

|-----------------------+-------------|
| Field width (in bits) | Max extents |
|-----------------------+-------------|
|                    32 |      4.3E09 |
|                    48 |      2.8E14 |
|                    64 |      1.8E19 |
|-----------------------+-------------|

It's much easier to compare differences visually because it's only
4 digits, not 20. The other alternative is to use k,m,g,t,p,e
suffixes to indicate magnitude (4.3g, 280t, 18e), but using
exponentials makes the numbers easier to do calculations on
directly...
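
FWIW, printf's %e conversion will spit these out directly. A
throwaway userspace sketch (nothing XFS-specific, just the field
widths from your table) if you want to regenerate the numbers that
way:

	#include <stdio.h>

	int main(void)
	{
		/* field width in bits -> max extents = 2^bits - 1 */
		int widths[] = { 32, 48, 64 };

		for (int i = 0; i < 3; i++) {
			unsigned long long max = (widths[i] == 64) ?
					~0ULL : (1ULL << widths[i]) - 1;
			printf("%2d bits: %.1e extents\n",
					widths[i], (double)max);
		}
		return 0;
	}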

> |-------------------+-----|
> | Minimum node recs | 125 |
> | Minimum leaf recs | 125 |
> |-------------------+-----|

Please show your working. I'm assuming this is 50% * 4kB /
sizeof(bmbt_rec), so you are working out limits based on a 4kB block
size? Realistically, worst case behaviour will be with the minimum
supported block size, which in this case will be 1kB....
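
If my numbers are right, the 125 comes from a 4kB block with a v5
long-format btree block header. Please check this against
xfs_bmbt_maxrecs() - the 72 byte header and 16 byte record sizes
below are assumptions from memory, this is just a back-of-the-envelope
sketch:

	#include <stdio.h>

	int main(void)
	{
		int blocksize[] = { 1024, 4096, 65536 };
		int hdr = 72;	/* assumed long-format CRC btree block header */
		int recsz = 16;	/* assumed sizeof(struct xfs_bmbt_rec) */

		for (int i = 0; i < 3; i++) {
			int maxrecs = (blocksize[i] - hdr) / recsz;

			printf("%6d byte blocks: maxrecs %4d, minrecs %4d\n",
					blocksize[i], maxrecs, maxrecs / 2);
		}
		return 0;
	}

That reproduces your minrecs of 125 at 4kB, but gives only 29 at 1kB,
and the 1kB number is what the worst case height calculation needs to
be based on...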

> Data bmbt tree height (MINDBTPTRS == 3)
> |-------+------------------------+-------------------------|
> | Level | Number of nodes/leaves |           Total Nr recs |
> |       |                        | (nr nodes/leaves * 125) |
> |-------+------------------------+-------------------------|
> |     0 |                      1 |                       3 |
> |     1 |                      3 |                     375 |
> |     2 |                    375 |                   46875 |
> |     3 |                  46875 |                 5859375 |
> |     4 |                5859375 |               732421875 |
> |     5 |              732421875 |             91552734375 |
> |     6 |            91552734375 |          11444091796875 |
> |     7 |         11444091796875 |        1430511474609375 |
> |     8 |       1430511474609375 |      178813934326171875 |
> |     9 |     178813934326171875 |    22351741790771484375 |
> |-------+------------------------+-------------------------|
> 
> For counting data extents, even though we theoretically have 64 bits at our
> disposal, I think we should have (2 ** 48) - 1 as the maximum number of
> extents. This gives 281474976710655 (i.e. ~281 trillion extents). With this,
> bmbt tree's height grows by just two more levels (i.e. it grows from the
> current maximum height of 5 to 7). Please let me know your opinion on this.

We shouldn't make up arbitrary limits when we can calculate them exactly.
i.e. a 2^63 max file size and a 1kB block size (2^10) means the max
fragment count is 2^53 entries. On a 64kB block size (2^16), we have
a max extent count of 2^47....

i.e. 2^48 would be an acceptable limit for a 1kB block size, but it
is not correct for 64kB block size filesystems....
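
i.e. the bound is just the max file size divided by the block size -
every block being its own single-block extent. A quick sketch of
that arithmetic, nothing more:

	#include <stdio.h>

	int main(void)
	{
		/* 2^63 byte max file size, block sizes from 1kB to 64kB */
		for (int blocklog = 10; blocklog <= 16; blocklog++)
			printf("%2d kB blocks: max extents = 2^%d\n",
					1 << (blocklog - 10), 63 - blocklog);
		return 0;
	}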

> Attr bmbt tree height (MINABTPTRS == 2)
> |-------+------------------------+-------------------------|
> | Level | Number of nodes/leaves |           Total Nr recs |
> |       |                        | (nr nodes/leaves * 125) |
> |-------+------------------------+-------------------------|
> |     0 |                      1 |                       2 |
> |     1 |                      2 |                     250 |
> |     2 |                    250 |                   31250 |
> |     3 |                  31250 |                 3906250 |
> |     4 |                3906250 |               488281250 |
> |     5 |              488281250 |             61035156250 |
> |-------+------------------------+-------------------------|
> 
> For xattr extents, (2 ** 32) - 1 = 4294967295 (~ 4 billion extents). So this
> will cause the corresponding bmbt's maximum height to go from 3 to 5.
> This probably won't cause any regression.

We already have XFS_DA_NODE_MAXDEPTH set to 5, so changing the
attr fork extent count makes no difference to the attribute fork
bmbt reservations. i.e. the bmbt reservations are defined by the
dabtree structure limits, not the maximum extent count the fork can
hold.

Extending the data fork extent counter to 64 bits has no impact on
the directory reservations, either, because the number of extents in
the directory
is bound by the directory segment size of 32GB. i.e. a directory can
hold, at most, 32GB of dirent data, which means there's a hard limit
on the number of dabtree entries somewhere in the order of a few
hundred million. That's where XFS_DA_NODE_MAXDEPTH comes from - it's
large enough to index a max sized directory, and the BMBT overhead
is derived from that...
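
To put a rough number on that (the 64 byte average dirent size here
is just an assumption for illustration - real entries vary with name
length):

	#include <stdio.h>

	int main(void)
	{
		unsigned long long seg = 32ULL << 30;	/* 32GB of dirent data */
		unsigned int avg_dirent = 64;		/* assumed average entry size */

		printf("~%llu million dirents\n", (seg / avg_dirent) >> 20);
		return 0;
	}

which lands in the few-hundred-million range mentioned above.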

> Meanwhile, I will work on finding the impact of increasing the
> height of these two trees on log reservation.

It should not change it substantially - 2 blocks per bmbt
reservation per transaction is what I'd expect from the numbers
presented...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


