On Sat, Feb 24, 2024 at 04:12:31AM +0000, Matthew Wilcox wrote:
> On Fri, Feb 23, 2024 at 03:59:58PM -0800, Luis Chamberlain wrote:
> > ~86 GiB/s on pmem DIO on xfs with 64k block size, 1024 XFS agcount on x86_64
> > Vs
> > ~ 7,000 MiB/s with buffered IO
>
> Profile?  My guess is that you're bottlenecked on the xa_lock between
> memory reclaim removing folios from the page cache and the various
> threads adding folios to the page cache.

If it was lock contention I was hoping to use perf lock record on fio, then

perf lock report -F acquired,contended,avg_wait,wait_total

If the contention was on locking xa_lock, it would creep up here, no?

                Name   acquired  contended     avg wait   total wait

   cgroup_rstat_lock      90132      90132     26.41 us       2.38 s
         event_mutex      32538      32538      1.40 ms      45.61 s
                          23476      23476    123.48 us       2.90 s
                          20803      20803     47.58 us    989.73 ms
                          11921      11921     31.19 us    371.82 ms
                           9389       9389    102.65 us    963.80 ms
                           7763       7763     21.86 us    169.69 ms
                           1736       1736     15.49 us     26.89 ms
                            743        743    308.30 us    229.07 ms
                            667        667    269.69 us    179.88 ms
                            522        522     36.64 us     19.13 ms
                            335        335     19.38 us      6.49 ms
                            328        328    157.10 us     51.53 ms
                            296        296    278.22 us     82.35 ms
                            288        288    214.82 us     61.87 ms
                            282        282    314.38 us     88.65 ms
                            275        275    128.98 us     35.47 ms
                            269        269    141.99 us     38.19 ms
                            264        264    277.73 us     73.32 ms
                            260        260    160.02 us     41.61 ms
         event_mutex        251        251    242.03 us     60.75 ms
                            248        248     12.47 us      3.09 ms
                            246        246    328.33 us     80.77 ms
                            245        245    189.83 us     46.51 ms
                            245        245    275.17 us     67.42 ms
                            235        235    152.49 us     35.84 ms
                            235        235     38.55 us      9.06 ms
                            228        228    137.27 us     31.30 ms
                            224        224     94.65 us     21.20 ms
                            221        221    198.13 us     43.79 ms
                            220        220    411.64 us     90.56 ms
                            214        214    291.08 us     62.29 ms
                            209        209    132.94 us     27.79 ms
                            207        207    364.20 us     75.39 ms
                            204        204    346.68 us     70.72 ms
                            194        194    169.77 us     32.94 ms
                            181        181    137.87 us     24.95 ms
                            181        181    154.78 us     28.01 ms
                            172        172    145.11 us     24.96 ms
                            169        169    124.30 us     21.01 ms
                            168        168    378.92 us     63.66 ms
                            161        161     91.64 us     14.75 ms
                            161        161    264.51 us     42.59 ms
                            153        153     85.53 us     13.09 ms
                            150        150    383.28 us     57.49 ms
                            148        148     91.24 us     13.50 ms

I'll have to nose dive some more.. but for the life of me I can't see
the expected xa_lock contention.

  Luis
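
P.S. A minimal sketch of how a report like the one above could be checked
mechanically for a given lock name. It sums the total wait per name from
text in the same column layout. The function name parse_lock_report, the
"<unnamed>" label for rows with an empty Name column, and the SAMPLE
excerpt (a few rows copied from the report above) are assumptions for
illustration only, not part of perf, and the column layout may differ
between perf versions.

```python
from collections import defaultdict

# Unit suffixes as printed in the report columns, converted to seconds.
UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}

# A few rows copied from the report above (illustrative excerpt only).
SAMPLE = """\
                Name   acquired  contended     avg wait   total wait
   cgroup_rstat_lock      90132      90132     26.41 us       2.38 s
         event_mutex      32538      32538      1.40 ms      45.61 s
                          23476      23476    123.48 us       2.90 s
         event_mutex        251        251    242.03 us     60.75 ms
"""


def parse_lock_report(text):
    """Return {lock name: total wait in seconds} summed over all rows."""
    totals = defaultdict(float)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 7:
            # Row with a lock name in the first column.
            name, fields = fields[0], fields[1:]
        elif len(fields) == 6:
            # Row whose Name column is empty.
            name = "<unnamed>"
        else:
            continue
        acquired, contended, _avg, avg_unit, total, total_unit = fields
        # Skip the header and anything else that is not a data row.
        if not (acquired.isdigit() and contended.isdigit()):
            continue
        if avg_unit not in UNIT_SECONDS or total_unit not in UNIT_SECONDS:
            continue
        try:
            totals[name] += float(total) * UNIT_SECONDS[total_unit]
        except ValueError:
            continue
    return dict(totals)


if __name__ == "__main__":
    totals = parse_lock_report(SAMPLE)
    # Print locks sorted by total wait, largest first.
    for name, seconds in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{name:>20} {seconds:12.3f} s")
    print("xa_lock present:", "xa_lock" in totals)
```

Rows without a name are lumped together under one label here, so this
only tells you whether a name such as xa_lock appears at all; it cannot
say which lock the unnamed rows belong to.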