Re: xfs_alloc_ag_vextent_near() takes about 30ms to complete

Brian Foster <bfoster@xxxxxxxxxx> · Tue, 23 Oct 2018 10:53:40 -0400

On Tue, Oct 23, 2018 at 03:56:51PM +0800, Mao Cheng wrote:
> Sorry for the trouble again. I wrote the wrong function name in my previous
> message, so I am resending it.
> If you received the previous email, please ignore it. Thanks.
> 
> We have an XFS filesystem created by mkfs with "-k" and mounted with the
> default options (rw,relatime,attr2,inode64,noquota). The size is about
> 2.2TB, and it is exported via Samba.
> 
> [root@test1 home]# xfs_info /dev/sdk
> meta-data=/dev/sdk               isize=512    agcount=4, agsize=131072000 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=1        finobt=0 spinodes=0
> data     =                       bsize=4096   blocks=524288000, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal               bsize=4096   blocks=256000, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> Free space per allocation group:
>    from      to extents  blocks    pct
>       1       1       9       9   0.00
>       2       3   14291   29124   0.19
>       4       7    5689   22981   0.15
>       8      15     119    1422   0.01
>      16      31  754657 15093035  99.65
>      32      63       1      33   0.00
> total free extents 774766
> total free blocks 15146604
> average free extent size 19.5499
>    from      to extents  blocks    pct
>       1       1     253     253   0.00
>       2       3    7706   16266   0.21
>       4       7    7718   30882   0.39
>       8      15      24     296   0.00
>      16      31  381976 7638130  96.71
>      32      63     753   38744   0.49
>  131072  262143       1  173552   2.20
> total free extents 398431
> total free blocks 7898123
> average free extent size 19.8231
>    from      to extents  blocks    pct
>       1       1     370     370   0.00
>       2       3    2704    5775   0.01
>       4       7    1016    4070   0.01
>       8      15      24     254   0.00
>      16      31  546614 10931743  20.26
>      32      63   19191 1112600   2.06
>      64     127       2     184   0.00
>  131072  262143       1  163713   0.30
>  524288 1048575       2 1438626   2.67
> 1048576 2097151       4 5654463  10.48
> 2097152 4194303       1 3489060   6.47
> 4194304 8388607       2 12656637  23.46
> 16777216 33554431       1 18502975  34.29
> total free extents 569932
> total free blocks 53960470
> average free extent size 94.6788
>    from      to extents  blocks    pct
>       1       1       8       8   0.00
>       2       3    5566   11229   0.06
>       4       7    9622   38537   0.21
>       8      15      57     686   0.00
>      16      31  747242 14944852  80.31
>      32      63     570   32236   0.17
> 2097152 4194303       1 3582074  19.25
> total free extents 763066
> total free blocks 18609622
> average free extent size 24.38
> 

So it looks like free space in 3 out of 4 AGs is mostly fragmented into
16-31 block extents. Those same AGs also appear to have a much higher
number (~15k-20k) of even smaller extents than the remaining AG.
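
For anyone who wants to repeat that reading on their own histograms, here is
a minimal Python sketch. It assumes the text layout shown in the report
(from/to/extents/blocks/pct rows, one histogram per AG); parse_freesp() and
summarize() are hypothetical helper names, and SAMPLE is just the first
histogram from the report.

```python
SAMPLE = """\
   from      to extents  blocks    pct
      1       1       9       9   0.00
      2       3   14291   29124   0.19
      4       7    5689   22981   0.15
      8      15     119    1422   0.01
     16      31  754657 15093035  99.65
     32      63       1      33   0.00
total free extents 774766
total free blocks 15146604
"""


def parse_freesp(text):
    """Split freesp-style output into one list of rows per histogram.

    Each row is (from, to, extents, blocks). A new histogram starts at
    every "from ... to ..." header line. Leading "> " quoting is ignored.
    """
    histograms, rows = [], []
    for line in text.splitlines():
        fields = line.lstrip("> ").split()
        if fields[:1] == ["from"]:
            if rows:
                histograms.append(rows)
            rows = []
        elif len(fields) == 5 and fields[0].isdigit():
            rows.append(tuple(int(f) for f in fields[:4]))
    if rows:
        histograms.append(rows)
    return histograms


def summarize(rows, lo=16, hi=31):
    """Report how much free space sits in the lo-hi bucket and below it."""
    total = sum(blocks for _, _, _, blocks in rows)
    in_bucket = sum(b for f, t, _, b in rows if f == lo and t == hi)
    small = sum(e for _, t, e, _ in rows if t < lo)
    return {
        "total_blocks": total,
        "bucket_share": in_bucket / total,
        "small_extents": small,
    }


if __name__ == "__main__":
    for index, rows in enumerate(parse_freesp(SAMPLE)):
        s = summarize(rows)
        print("histogram %d: %.2f%% of free blocks in 16-31 block extents, "
              "%d extents smaller than 16 blocks"
              % (index, 100 * s["bucket_share"], s["small_extents"]))
```

Feeding it the full report instead of SAMPLE gives one summary per
histogram, which makes the odd AG out easy to spot.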

> We copy small files (about 150KB) from Windows to XFS via the SMB protocol.
> Sometimes a kworker process consumes 100% of one CPU, and "perf top"
> shows xfs_extent_busy_trim() and xfs_btree_increment() consuming too much
> CPU. ftrace also shows xfs_alloc_ag_vextent_near() taking about 30ms to
> complete.
> 

This is kind of a vague performance report. Some process consumes a full
CPU and this is a problem for some (??) reason given unknown CPU and
unknown storage (with unknown amount of RAM). I assume that kworker task
is writeback, but you haven't really specified that either.

xfs_alloc_ag_vextent_near() is one of several block allocation
algorithms in XFS. That function itself includes a couple of different
algorithms for the "near" allocation based on the state of the AG. One
looks like an intra-block search of the by-size free space btree (if not
many suitably sized extents are available) and the second looks like an
outward sweep of the by-block free space btree to find a suitably sized
extent. I could certainly see the latter taking some time for certain
sized allocation requests under fragmented free space conditions. If you
want more detail on what's going on here, I'd suggest capturing a
sample of the xfs_alloc* (and perhaps xfs_extent_busy*) tracepoints
during the workload.
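
One way to switch those tracepoints on is through the tracing filesystem. The
following is only a sketch: it assumes debugfs is mounted at
/sys/kernel/debug, that the script runs as root, and that the kernel exposes
the events under events/xfs/. set_events() is a hypothetical helper name.

```python
import glob
import os

# Assumption: debugfs is mounted here, as is typical on a 3.10-era kernel.
TRACING_ROOT = "/sys/kernel/debug/tracing"


def set_events(patterns, enable=True, root=TRACING_ROOT):
    """Enable or disable every xfs tracepoint matching the glob patterns.

    Writes "1" or "0" to events/xfs/<event>/enable under the tracing root
    and returns the sorted names of the events that were touched.
    """
    changed = []
    for pattern in patterns:
        wildcard = os.path.join(root, "events", "xfs", pattern, "enable")
        for path in glob.glob(wildcard):
            with open(path, "w") as handle:
                handle.write("1" if enable else "0")
            changed.append(os.path.basename(os.path.dirname(path)))
    return sorted(changed)


# Typical use, as root, around a run of the copy workload:
#   set_events(["xfs_alloc*", "xfs_extent_busy*"])
#   ... read <TRACING_ROOT>/trace_pipe while the files are copied ...
#   set_events(["xfs_alloc*", "xfs_extent_busy*"], enable=False)
```

The root parameter exists only so the helper can be pointed at a different
mount point (or a scratch directory when trying it out).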

That aside, it's probably best to step back and describe for the list
the overall environment, workload and performance problem you observed
that caused this level of digging in the first place. For example, has
throughput degraded over time? Latency increased? How many writers are
active at once? Is preallocation involved (I thought Samba/Windows
triggered it in certain cases, but I don't recall)?

Brian

> In addition, all tests were performed on CentOS 7.4 (3.10.0-693.el7.x86_64).
> 
> Any suggestions are welcome.