On Tue, May 4, 2021 at 3:04 PM Gioh Kim <gi-oh.kim@xxxxxxxxx> wrote:
>
> On Thu, Apr 29, 2021 at 9:14 AM Gioh Kim <gi-oh.kim@xxxxxxxxx> wrote:
> >
> > On Wed, Apr 28, 2021 at 8:33 PM Chaitanya Kulkarni
> > <Chaitanya.Kulkarni@xxxxxxx> wrote:
> > >
> > > On 4/27/21 23:14, Gioh Kim wrote:
> > > > The IO performance test with fio after removing the likely and
> > > > unlikely macros in all if-statements shows no performance drop.
> > > > They do not help the performance of rnbd.
> > > >
> > > > The fio test did random reads on 32 rnbd devices with 64 processes.
> > > > Test environment:
> > > > - AMD Opteron(tm) Processor 6386 SE
> > > > - 125G memory
> > > > - kernel version: 5.4.86
> > >
> > > why 5.4 and not linux-block/for-next ?
> >
> > We have only ported 5.4 to the server machine so far.
> >
> > > > - gcc version: gcc (Debian 8.3.0-6) 8.3.0
> > > > - Infiniband controller: InfiniBand: Mellanox Technologies MT26428
> > > >   [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
> > > >
> > > > before
> > > > read: IOPS=549k, BW=2146MiB/s
> > > > read: IOPS=544k, BW=2125MiB/s
> > > > read: IOPS=553k, BW=2158MiB/s
> > > > read: IOPS=535k, BW=2089MiB/s
> > > > read: IOPS=543k, BW=2122MiB/s
> > > > read: IOPS=552k, BW=2154MiB/s
> > > > -----------
> > > > average: IOPS=546k, BW=2132MiB/s
> > > >
> > > > after
> > > > read: IOPS=556k, BW=2172MiB/s
> > > > read: IOPS=561k, BW=2191MiB/s
> > > > read: IOPS=552k, BW=2156MiB/s
> > > > read: IOPS=551k, BW=2154MiB/s
> > > > read: IOPS=562k, BW=2194MiB/s
> > > > -----------
> > > > average: IOPS=556k, BW=2173MiB/s
> > > >
> > > > The IOPS and bandwidth got slightly better after removing
> > > > likely/unlikely (IOPS +1.8%, BW +1.9%), but we cannot conclude
> > > > that removing likely/unlikely helps performance, because that
> > > > depends on the situation. We can only confirm that removing
> > > > likely/unlikely does not hurt performance.
> > >
> > > Did you get a chance to collect perf numbers to see which functions are
> > > getting faster ?
>
> Hi Chaitanya,
>
> I ran the perf tool to find out which functions are getting faster,
> but I was not able to tell. Could you please suggest a tool or
> anything else I could check with?
>
> For your information, below is what I got with 'perf record fio
> <options: 8-device, 64-job, 60-second>'.
> The results before/after removing likely/unlikely look the same.
>
>    4.15%  fio  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
>    3.19%  fio  [kernel.kallsyms]  [k] x86_pmu_disable_all
>    2.98%  fio  [rnbd_client]      [k] rnbd_put_permit
>    2.77%  fio  [kernel.kallsyms]  [k] find_first_zero_bit
>    2.49%  fio  [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
>    2.21%  fio  [kernel.kallsyms]  [k] psi_task_change
>    2.00%  fio  [kernel.kallsyms]  [k] gup_pgd_range
>    1.83%  fio  fio                [.] 0x0000000000029048
>    1.78%  fio  [rnbd_client]      [k] rnbd_get_permit
>    1.78%  fio  fio                [.] axmap_isset
>    1.63%  fio  [kernel.kallsyms]  [k] _raw_spin_lock
>    1.58%  fio  fio                [.] fio_gettime
>    1.53%  fio  [rtrs_client]      [k] __rtrs_get_permit
>    1.51%  fio  [rnbd_client]      [k] rnbd_queue_rq
>    1.51%  fio  [rtrs_client]      [k] rtrs_clt_put_permit
>    1.47%  fio  [kernel.kallsyms]  [k] try_to_wake_up
>    1.31%  fio  [kernel.kallsyms]  [k] kmem_cache_alloc
>    1.22%  fio  libc-2.28.so       [.] 0x00000000000a2547
>    1.17%  fio  [mlx4_ib]          [k] _mlx4_ib_post_send
>    1.14%  fio  [kernel.kallsyms]  [k] blkdev_direct_IO
>    1.14%  fio  [kernel.kallsyms]  [k] read_tsc
>    1.02%  fio  [rtrs_client]      [k] rtrs_clt_read_req
>    0.92%  fio  [rtrs_client]      [k] get_next_path_min_inflight
>    0.92%  fio  [kernel.kallsyms]  [k] sched_clock
>    0.91%  fio  [kernel.kallsyms]  [k] blk_mq_get_request
>    0.87%  fio  [kernel.kallsyms]  [k] x86_pmu_enable_all
>    0.87%  fio  [kernel.kallsyms]  [k] __sched_text_start
>    0.84%  fio  [kernel.kallsyms]  [k] insert_work
>    0.82%  fio  [kernel.kallsyms]  [k] copy_user_generic_string
>    0.80%  fio  [kernel.kallsyms]  [k] blk_attempt_plug_merge
>    0.73%  fio  [rtrs_client]      [k] rtrs_clt_update_all_stats

Hi Chaitanya,

I think the likely/unlikely macros are related to caching and branch
prediction, so I checked cache and branch misses with the perf tool.
The results are the same before and after removing likely/unlikely:
- cache misses: after 5,452%, before 5,443%
- branch misses: after 2.08%, before 2.09%

I would appreciate it if you would suggest something else for me to check.
Below is the raw data that I got from the perf tool.
after removing likely:

 Performance counter stats for 'fio --direct=1 --rw=randread --time_based=1
 --group_reporting --ioengine=libaio --iodepth=128 --name=fiotest
 --fadvise_hint=0 --iodepth_batch_submit=128 --iodepth_batch_complete=128
 --invalidate=0 --runtime=180 --numjobs=64 --filename=/dev/rnbd0
 --filename=/dev/rnbd1 --filename=/dev/rnbd2 --filename=/dev/rnbd3
 --filename=/dev/rnbd4 --filename=/dev/rnbd5 --filename=/dev/rnbd6
 --filename=/dev/rnbd7 --filename=/dev/rnbd8 --filename=/dev/rnbd9
 --filename=/dev/rnbd10 --filename=/dev/rnbd11 --filename=/dev/rnbd12
 --filename=/dev/rnbd13 --filename=/dev/rnbd14 --filename=/dev/rnbd15
 --filename=/dev/rnbd16 --filename=/dev/rnbd17 --filename=/dev/rnbd18
 --filename=/dev/rnbd19 --filename=/dev/rnbd20 --filename=/dev/rnbd21
 --filename=/dev/rnbd22 --filename=/dev/rnbd23 --filename=/dev/rnbd24
 --filename=/dev/rnbd25 --filename=/dev/rnbd26 --filename=/dev/rnbd27
 --filename=/dev/rnbd28 --filename=/dev/rnbd29 --filename=/dev/rnbd30
 --filename=/dev/rnbd31':

      1.834.487,82 msec task-clock          #    9,986 CPUs utilized
 3.128.339.845.336      cycles              #    1,705 GHz                  (66,53%)
 1.110.316.024.909      instructions        #    0,35  insn per cycle       (83,27%)
    76.626.760.535      cache-references    #   41,770 M/sec                (83,26%)
     4.177.366.104      cache-misses        #    5,452 % of all cache refs  (50,21%)
   224.055.600.184      branches            #  122,135 M/sec                (66,85%)
     4.669.404.288      branch-misses       #    2,08% of all branches      (83,38%)

     183,707988693 seconds time elapsed
     185,630125000 seconds user
    1590,286666000 seconds sys

before removing:

 Performance counter stats for 'fio --direct=1 --rw=randread --time_based=1
 --group_reporting --ioengine=libaio --iodepth=128 --name=fiotest
 --fadvise_hint=0 --iodepth_batch_submit=128 --iodepth_batch_complete=128
 --invalidate=0 --runtime=180 --numjobs=64 --filename=/dev/rnbd0
 --filename=/dev/rnbd1 --filename=/dev/rnbd2 --filename=/dev/rnbd3
 --filename=/dev/rnbd4 --filename=/dev/rnbd5 --filename=/dev/rnbd6
 --filename=/dev/rnbd7 --filename=/dev/rnbd8 --filename=/dev/rnbd9
 --filename=/dev/rnbd10 --filename=/dev/rnbd11 --filename=/dev/rnbd12
 --filename=/dev/rnbd13 --filename=/dev/rnbd14 --filename=/dev/rnbd15
 --filename=/dev/rnbd16 --filename=/dev/rnbd17 --filename=/dev/rnbd18
 --filename=/dev/rnbd19 --filename=/dev/rnbd20 --filename=/dev/rnbd21
 --filename=/dev/rnbd22 --filename=/dev/rnbd23 --filename=/dev/rnbd24
 --filename=/dev/rnbd25 --filename=/dev/rnbd26 --filename=/dev/rnbd27
 --filename=/dev/rnbd28 --filename=/dev/rnbd29 --filename=/dev/rnbd30
 --filename=/dev/rnbd31':

      1.841.874,78 msec task-clock          #   10,039 CPUs utilized
 3.157.131.978.349      cycles              #    1,714 GHz                  (66,48%)
 1.115.369.402.018      instructions        #    0,35  insn per cycle       (83,27%)
    77.060.091.803      cache-references    #   41,838 M/sec                (83,39%)
     4.194.110.754      cache-misses        #    5,443 % of all cache refs  (50,13%)
   225.304.135.864      branches            #  122,323 M/sec                (66,83%)
     4.716.162.562      branch-misses       #    2,09% of all branches      (83,42%)

     183,476417386 seconds time elapsed
     185,356439000 seconds user
    1596,787284000 seconds sys

> >
> > > I knew somebody would ask for it ;-)
> >
> > No, I didn't, because I have been occupied with another task,
> > but I will check it soon, within a few weeks.
> >
> > Thank you for the review.