Hi Brian,

Thanks for your response.

Brian Foster <bfoster@xxxxxxxxxx> wrote on Tue, Oct 23, 2018 at 10:53 PM:
>
> On Tue, Oct 23, 2018 at 03:56:51PM +0800, Mao Cheng wrote:
> > Sorry for the trouble again. I wrote the wrong function name in the
> > previous mail, so I am resending it. If you have already received the
> > previous mail please ignore it, thanks.
> >
> > We have an XFS filesystem created by mkfs with "-k" and mounted with
> > the default options (rw,relatime,attr2,inode64,noquota). It is about
> > 2.2TB in size and exported via Samba.
> >
> > [root@test1 home]# xfs_info /dev/sdk
> > meta-data=/dev/sdk             isize=512    agcount=4, agsize=131072000 blks
> >          =                     sectsz=4096  attr=2, projid32bit=1
> >          =                     crc=1        finobt=0 spinodes=0
> > data     =                     bsize=4096   blocks=524288000, imaxpct=5
> >          =                     sunit=0      swidth=0 blks
> > naming   =version 2            bsize=4096   ascii-ci=0 ftype=1
> > log      =internal             bsize=4096   blocks=256000, version=2
> >          =                     sectsz=4096  sunit=1 blks, lazy-count=1
> > realtime =none                 extsz=4096   blocks=0, rtextents=0
> >
> > Free space of the allocation groups:
> >
> >     from       to  extents     blocks    pct
> >        1        1        9          9   0.00
> >        2        3    14291      29124   0.19
> >        4        7     5689      22981   0.15
> >        8       15      119       1422   0.01
> >       16       31   754657   15093035  99.65
> >       32       63        1         33   0.00
> > total free extents 774766
> > total free blocks 15146604
> > average free extent size 19.5499
> >
> >     from       to  extents     blocks    pct
> >        1        1      253        253   0.00
> >        2        3     7706      16266   0.21
> >        4        7     7718      30882   0.39
> >        8       15       24        296   0.00
> >       16       31   381976    7638130  96.71
> >       32       63      753      38744   0.49
> >   131072   262143        1     173552   2.20
> > total free extents 398431
> > total free blocks 7898123
> > average free extent size 19.8231
> >
> >     from       to  extents     blocks    pct
> >        1        1      370        370   0.00
> >        2        3     2704       5775   0.01
> >        4        7     1016       4070   0.01
> >        8       15       24        254   0.00
> >       16       31   546614   10931743  20.26
> >       32       63    19191    1112600   2.06
> >       64      127        2        184   0.00
> >   131072   262143        1     163713   0.30
> >   524288  1048575        2    1438626   2.67
> >  1048576  2097151        4    5654463  10.48
> >  2097152  4194303        1    3489060   6.47
> >  4194304  8388607        2   12656637  23.46
> > 16777216 33554431        1   18502975  34.29
> > total free extents 569932
> > total free blocks 53960470
> > average free extent size 94.6788
> >
> >     from       to  extents     blocks    pct
> >        1        1        8          8   0.00
> >        2        3     5566      11229   0.06
> >        4        7     9622      38537   0.21
> >        8       15       57        686   0.00
> >       16       31   747242   14944852  80.31
> >       32       63      570      32236   0.17
> >  2097152  4194303        1    3582074  19.25
> > total free extents 763066
> > total free blocks 18609622
> > average free extent size 24.38
> >
>
> So it looks like free space in 3 out of 4 AGs is mostly fragmented to
> 16-31 block extents. Those same AGs appear to have a much higher number
> (~15k-20k) of even smaller extents.
>
> > We copy small files (about 150KB) from Windows to XFS via the SMB
> > protocol. Sometimes a kworker process consumes 100% of one CPU,
> > "perf top" shows xfs_extent_busy_trim() and xfs_btree_increment()
> > consuming too much CPU, and ftrace also shows xfs_alloc_ag_vextent_near()
> > taking about 30ms to complete.
> >
>
> This is kind of a vague performance report. Some process consumes a full
> CPU and this is a problem for some (??) reason given unknown CPU and
> unknown storage (with unknown amount of RAM). I assume that kworker task
> is writeback, but you haven't really specified that either.

Yes, the kworker task is writeback, and the storage is the XFS-formatted
disk that is also the target we copy files to.

>
> xfs_alloc_ag_vextent_near() is one of the several block allocation
> algorithms in XFS. That function itself includes a couple different
> algorithms for the "near" allocation based on the state of the AG. One
> looks like an intra-block search of the by-size free space btree (if not
> many suitably sized extents are available) and the second looks like an
> outward sweep of the by-block free space btree to find a suitably sized
> extent. I could certainly see the latter taking some time for certain
> sized allocation requests under fragmented free space conditions. If you
> wanted more detail over what's going on here, I'd suggest to capture a
> sample of the xfs_alloc* (and perhaps xfs_extent_busy*) tracepoints
> during the workload.
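
Thanks for the pointer, we will try to capture those tracepoints the
next time the slowdown happens. A rough sketch of what we plan to run,
assuming trace-cmd is installed on this box (the 30 second window is
just a guess at how long one stall lasts; the raw ftrace interface
under /sys/kernel/debug/tracing should work as well):

  # record the allocation and busy-extent tracepoints while the copies run
  trace-cmd record -e 'xfs:xfs_alloc*' -e 'xfs:xfs_extent_busy*' sleep 30
  trace-cmd report > xfs_alloc_trace.txt

and then post the relevant part of the report here.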
>
> That aside, it's probably best to step back and describe for the list
> the overall environment, workload and performance problem you observed
> that caused this level of digging in the first place. For example, has
> throughput degraded over time? Latency increased? How many writers are
> active at once? Is preallocation involved (I thought Samba/Windows
> triggered it in certain cases, but I don't recall)?

We share an XFS filesystem to Windows via the SMB protocol. About 5
Windows clients copy small files to the Samba share at the same time.
The main problem is that throughput periodically degrades from 30MB/s to
around 10KB/s and recovers about 5 seconds later. The kworker consumes
100% of one CPU while throughput is degraded, and that kworker task is
writeback. /proc/vmstat shows nr_dirty is very close to nr_dirty_threshold
and nr_writeback is very small (does that mean there are too many dirty
pages in the page cache that cannot be written out to disk?).

Mao

>
> Brian
>
> > In addition, all tests were performed on CentOS 7.4
> > (3.10.0-693.el7.x86_64).
> >
> > Any suggestions are welcome.
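
Regarding the /proc/vmstat numbers mentioned above: we can keep sampling
them during the next run with something simple like the loop below (the
one second interval and the counters grepped for are just what we intend
to watch, nothing special):

  # sample the dirty and writeback page counters once per second
  while true; do
      grep -E 'nr_dirty|nr_writeback' /proc/vmstat
      echo '---'
      sleep 1
  done

so we can correlate the stalls with the dirty page counts.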