It's important, because fio shows 6.9us clat for reads and 13us for writes, long time..., access to the PMEM just a few hundred ns. So I need to understand where we spend time. read: IOPS=36.7k, BW=1865MiB/s (1956MB/s)(232GiB/127291msec) clat (nsec): min=163, max=16033k, avg=6827.63, stdev=9340.91 lat (nsec): min=188, max=16033k, avg=6860.92, stdev=9341.21 write: IOPS=697k, BW=34.6GiB/s (37.2GB/s)(4408GiB/127291msec) clat (nsec): min=82, max=48028k, avg=12479.56, stdev=13459.80 lat (nsec): min=106, max=48028k, avg=12524.94, stdev=13464.29 -----Original Message----- From: Gavriliuk, Anton (HPS Ukraine) Sent: Monday, December 04, 2017 7:12 PM To: 'Jens Axboe' <axboe@xxxxxxxxx>; Robert Elliott (Persistent Memory) <elliott@xxxxxxx>; 'Rebecca Cran' <rebecca@xxxxxxxxxxxx>; 'Sitsofe Wheeler' <sitsofe@xxxxxxxxx> Cc: 'fio@xxxxxxxxxxxxxxx' <fio@xxxxxxxxxxxxxxx>; Kani, Toshimitsu <toshi.kani@xxxxxxx> Subject: RE: fio 3.2 I have run next command several times, perf record /usr/local/bin/fio --filename=/dev/dax0.0 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --ioengine=mmap --bssplit=4k/4:8k/7:16k/7:32k/15:64k/65:128k/1:256k/1 --rwmixread=5 --iodepth=1 --numjobs=16 --runtime=1800 --group_reporting --name=4-rand-rw-3xx --size=290g --numa_cpu_nodes=0 each perf report shows me toppest clock_thread_fn(), Samples: 8M of event 'cycles:ppp', Event count (approx.): 23391179916235 Overhead Command Shared Object Symbol 75.27% fio fio [.] clock_thread_fn 13.68% fio libc-2.22.so [.] __memcpy_avx_unaligned 10.05% fio fio [.] fill_random_buf Samples: 8M of event 'cycles:ppp', Event count (approx.): 28889015047191 Overhead Command Shared Object Symbol 79.97% fio fio [.] clock_thread_fn 11.08% fio libc-2.22.so [.] __memcpy_avx_unaligned 8.14% fio fio [.] fill_random_buf 0.13% fio fio [.] get_io_u Samples: 8M of event 'cycles:ppp', Event count (approx.): 57473976919058 Overhead Command Shared Object Symbol 89.94% fio fio [.] clock_thread_fn 5.57% fio libc-2.22.so [.] __memcpy_avx_unaligned 4.09% fio fio [.] fill_random_buf 0.07% fio fio [.] get_io_u Samples: 8M of event 'cycles:ppp', Event count (approx.): 160828042961056 Overhead Command Shared Object Symbol ◆ 92.30% fio fio [.] clock_thread_fn ▒ 4.10% fio [kernel.kallsyms] [k] filemap_map_pages ▒ 1.99% fio libc-2.22.so [.] __memcpy_avx_unaligned ▒ 1.46% fio fio [.] fill_random_buf ▒ 0.02% fio fio [.] get_io_u ▒ 0.02% fio fio [.] io_u_sync_complete Samples: 8M of event 'cycles:ppp', Event count (approx.): 163028870650810 Overhead Command Shared Object Symbol 87.68% fio fio [.] clock_thread_fn 8.77% fio [kernel.kallsyms] [k] run_timer_softirq 1.96% fio libc-2.22.so [.] __memcpy_avx_unaligned 1.44% fio fio [.] fill_random_buf 0.02% fio fio [.] get_io_u 0.02% fio fio [.] io_u_sync_complete 89.62 │ lock cmpxchg %edi,(%rsi) ▒ │ clock_thread_fn(): ▒ │ seq = *t->seq; ▒ │ if (seq == UINT_MAX) ▒ │ break; ▒ │ __sync_synchronize(); ▒ │ tsc = get_cpu_clock(); ▒ │ } while (seq != atomic32_compare_and_swap(t->seq, seq, seq + 1)); ▒ │ cmp %eax,%ecx ▒ │ ↓ je 118 ◆ │ uint32_t seq; ▒ │ uint64_t tsc; ▒ │ ▒ │ c->cpu = t->cpu; ▒ │ do { ▒ │ seq = *t->seq; ▒ │ c9: mov 0x68(%rbx),%rcx ▒ 10.01 │ mov (%rcx),%ecx May you please clarify, is it ok ?, what is the lock contention... Anton ▒ -----Original Message----- From: Gavriliuk, Anton (HPS Ukraine) Sent: Sunday, December 03, 2017 5:24 AM To: 'Jens Axboe' <axboe@xxxxxxxxx>; Robert Elliott (Persistent Memory) <elliott@xxxxxxx>; Rebecca Cran <rebecca@xxxxxxxxxxxx>; Sitsofe Wheeler <sitsofe@xxxxxxxxx> Cc: fio@xxxxxxxxxxxxxxx; Kani, Toshimitsu <toshi.kani@xxxxxxx> Subject: RE: fio 3.2 Ok, thanks. Now I built latest fio available :-) dl560g10spmem01:~ # /usr/local/bin/fio --filename=/dev/dax0.0 --rw=randrw --norandommap --randrepeat=0 --ioengine=mmap --bssplit=4k/4:8k/7:16k/7:32k/15:64k/65:128k/1:256k/1 --rwmixread=5 --iodepth=1 --numjobs=16 --runtime=1800 --group_reporting --name=4-rand-rw-3xx --size=290g --numa_cpu_nodes=0 4-rand-rw-3xx: (g=0): rw=randrw, bs=(R) 4096B-256KiB, (W) 4096B-256KiB, (T) 4096B-256KiB, ioengine=mmap, iodepth=1 ... fio-3.2-64-gaf7fb Starting 16 processes Jobs: 16 (f=16): [m(16)][8.0%][r=2138MiB/s,w=39.7GiB/s][r=42.2k,w=799k IOPS][eta 01m:44s] From perf top, Samples: 1M of event 'cycles:ppp', Event count (approx.): 503978848613 Overhead Shared Object Symbol ◆ 93.34% libc-2.22.so [.] __memcpy_avx_unaligned Just can't drill it down, __memcpy_avx_unaligned /lib64/libc-2.22.so Huh? No selection. Report to linux-kernel@xxxxxxxxxxxxxxx While other functions can, get_io_u /usr/local/bin/fio 0.08 │ xor %eax,%esi 0.01 │ shl $0x6,%rsi 0.02 │ and $0x1c0,%esi 0.00 │ sub %rax,%rsi 0.12 │ add %rdx,%rsi 0.00 │ sub %rax,%rdx │ nop │ /* │ * Fill offset into start of cacheline, time into end │ * of cacheline │ */ │ iptr = (void *) p + (offset << 6); │ *iptr = boffset; 16.01 │ 230: mov %rax,(%rsi,%rax,1) │ │ iptr = (void *) p + 64 - 2 * sizeof(uint64_t); │ iptr[0] = io_u->start_time.tv_sec; 7.03 │ mov 0x0(%rbp),%rcx 13.52 │ mov %rcx,0x30(%rdx,%rax,1) -----Original Message----- From: Jens Axboe [mailto:axboe@xxxxxxxxx] Sent: Thursday, November 30, 2017 11:24 PM To: Gavriliuk, Anton (HPS Ukraine) <anton.gavriliuk@xxxxxxx>; Robert Elliott (Persistent Memory) <elliott@xxxxxxx>; Rebecca Cran <rebecca@xxxxxxxxxxxx>; Sitsofe Wheeler <sitsofe@xxxxxxxxx> Cc: fio@xxxxxxxxxxxxxxx; Kani, Toshimitsu <toshi.kani@xxxxxxx> Subject: Re: fio 3.2 On 11/30/2017 07:17 AM, Gavriliuk, Anton (HPS Ukraine) wrote: > It's there any chance to fix it ? > > dl560g10spmem01:/var/work # /usr/local/bin/fio --filename=/dev/dax0.0 > --rw=randrw --refill_buffers --randrepeat=0 --ioengine=mmap > --bssplit=4k/4:8k/7:16k/7:32k/15:64k/65:128k/1:256k/1 --rwmixread=5 > --iodepth=1 --numjobs=16 --runtime=1800 --group_reporting > --name=4-rand-rw-3xx --size=290g > 4-rand-rw-3xx: (g=0): rw=randrw, bs=(R) 4096B-256KiB, (W) > 4096B-256KiB, (T) 4096B-256KiB, ioengine=mmap, iodepth=1 ... > fio-2.99 > Starting 16 processes > 4-rand-rw-3xx: failed to get file size of /dev/dax0.0 > > 4-rand-rw-3xx: failed to get file size of /dev/dax0.0 > > 4-rand-rw-3xx: failed to get file size of /dev/dax0.0 > > 4-rand-rw-3xx: failed to get file size of /dev/dax0.0 > > 4-rand-rw-3xx: failed to get file size of /dev/dax0.0 > > 4-rand-rw-3xx: failed to get file size of /dev/dax0.0 > > 4-rand-rw-3xx: failed to get file size of /dev/dax0.0 Should already be fixed as of yesterday morning. -- Jens Axboe ��.n��������+%������w��{.n�������^n�r������&��z�ޗ�zf���h���~����������_��+v���)ߣ�