On Sun, Feb 11, 2024 at 07:22:55PM -0500, Joel Fernandes wrote: > > > On 2/8/2024 9:51 AM, Uladzislau Rezki wrote: > > On Thu, Feb 08, 2024 at 01:53:58PM +0100, Uladzislau Rezki wrote: > >> On Thu, Feb 08, 2024 at 07:55:37AM +0100, Andrea Righi wrote: > >>> On Thu, Feb 08, 2024 at 12:54:58AM -0500, Joel Fernandes wrote: > >>> ... > >>>>>> Slightly related, but one of the things we are wondering also is how > >>>>>> much of the overhead for your nohz-full and lazy-RCU test (on top of > >>>>>> baseline - that is just CONFIG_HZ=1000 without nohz-full or nocbs) is > >>>>>> because of just using NOCB. Uladsizlau mentioned he might run a test > >>>>>> for comparing along those lines as well. > >>>>> > >>>>> Just to clarify, "lazy rcu on" results are just with rcu_nocb=all and > >>>>> lazy RCUs enabled (and HZ=1000), so without nohz_full. > >>>>> > >>>>> If I enable only nohz_full=all (without rcu_nocb) I see something like > >>>>> this: > >>>> > >>>> Ok. I did want to mention nohz_full implies rcu_nocb on the same CPUs as well. > >>>> > >>>> Its also mentioned in the boot param docs on the last line of the description: > >>>> > >>>> nohz_full= [KNL,BOOT,SMP,ISOL] > >>>> The argument is a cpu list, as described above. > >>>> In kernels built with CONFIG_NO_HZ_FULL=y, set > >>>> the specified list of CPUs whose tick will be stopped > >>>> whenever possible. The boot CPU will be forced outside > >>>> the range to maintain the timekeeping. Any CPUs > >>>> in this list will have their RCU callbacks offloaded, > >>>> just as if they had also been called out in the > >>>> rcu_nocbs= boot parameter. > >>> > >>> Ah I didn't realize that, it definitely makes sense, thanks for > >>> clarifying it. > >>> > >>> Then basically in the results that I posted the difference is > >>> "nohz_full=all+rcu_nocb=all" vs "rcu_nocb=all+lazy_RCU=on". > >>> > >> So, you say that a hrtimer_interrupt() handler takes more time in case > >> of lazy + nocb + rcu_nocb=all and for nohz_full + rcu_nocb=all it faster? > >> Could you please clarify this? I will try to measure from my side! > >> > >> I have done some basic research about hrtimer_interrupt() latency on my > >> HW with latest Linux kernel. I have compared below cases: > >> > >> case a: 1000HZ + lazy + nocb_all_cpus > >> case b: 1000HZ + nocb_all_cpus > >> > >> I used "ftrace" to measure time(in microseconds). Steps: > >> > >> echo 0 > tracing_on > >> echo function_graph > current_tracer > >> echo funcgraph-proc > trace_options > >> echo funcgraph-abstime > trace_options > >> echo hrtimer_interrupt > set_ftrace_filter > >> > >> fio --rw=write --bs=1M --size=1G --numjobs=8 --name=worker --time_based --runtime=50& > >> > >> echo 1 > tracing_on; sleep 10; echo 0 > tracing_on > >> > >> data is based on 10 seconds collection: > >> > >> <case a> > >> 6 2102 ############################################################ > >> 8 2079 ############################################################ > >> 10 1464 ########################################## > >> 7 897 ########################## > > So first column is microseconds and second one is count? > > >> 9 625 ################## > >> 12 490 ############## > >> 13 479 ############## > >> 11 289 ######### > >> 5 249 ######## > >> 14 124 #### > >> 15 72 ### > >> 16 41 ## > >> 17 24 # > >> 4 22 # > >> 18 12 # > >> 22 2 # > >> 19 1 # > >> <case a> > >> > >> <case b> > >> 9 1658 ############################################################ > >> 13 1308 ################################################ > >> 12 1224 ############################################# > > Assuming that, it does seem the "best" case is off by 3 microseconds (9 vs 6), > still would not warrant being regarded a bug and possibly just in the noise. > > >> 10 972 #################################### > >> 8 703 ########################## > >> 14 595 ###################### > >> 15 571 ##################### > >> 11 525 ################### > >> 17 350 ############# > >> 16 235 ######### > >> 7 214 ######## > >> 4 73 ### > >> 5 68 ### > >> 6 54 ## > >> 20 9 # > >> 18 9 # > >> 19 6 # > >> 33 1 # > >> 3 1 # > >> 28 1 # > >> 27 1 # > >> 25 1 # > >> 22 1 # > >> 21 1 # > >> <case b> > >> > >> I do not see the difference, there is a nose of 1/2/3 microseconds diff. > >> > > Let me further have a look at what we use for lazy in terms on hrtimer though. > > Thanks for tracing it. Yeah it would be nice to count how many counts of > do_nocb_deferred_wakeup() does the fio test trigger. If it is few, then maybe > the problem with hrtimer_interrupt() is something else. > urezki@pc638:~$ fio --rw=write --bs=1M --size=1G --numjobs=8 --name=worker --time_based --runtime=1000 > /dev/null 2>&1& [1] 697 urezki@pc638:~$ sudo cat /sys/kernel/debug/tracing/trace_pipe | grep Timer <idle>-0 [005] ..s1. 175.762689: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [002] ..s1. 186.002386: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [005] ..s1. 186.002391: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [001] ..s1. 228.497061: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [001] ..s1. 242.831956: rcu_nocb_wake: rcu_preempt 4 Timer fio-705 [004] ..s.. 272.009997: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [002] ..s1. 282.253447: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [004] ..s1. 347.786758: rcu_nocb_wake: rcu_preempt 4 Timer cron-714 [004] ..s1. 362.429287: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [007] .Ns1. 377.993814: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [002] ..s1. 393.353371: rcu_nocb_wake: rcu_preempt 0 Timer kworker/6:1H-172 [006] ..s1. 410.760933: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [007] ..s1. 427.148456: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [004] ..s1. 438.220173: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [007] .Ns1. 451.719831: rcu_nocb_wake: rcu_preempt 4 Timer kworker/6:1H-172 [006] ..s1. 472.199389: rcu_nocb_wake: rcu_preempt 4 Timer <...>-717 [000] ..s.. 482.434074: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [004] ..s1. 484.969012: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [007] ..s1. 495.750151: rcu_nocb_wake: rcu_preempt 4 Timer fio-702 [000] ..s1. 516.439595: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [004] ..s1. 578.212713: rcu_nocb_wake: rcu_preempt 4 Timer kworker/6:1H-172 [006] ..s1. 595.583863: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [004] ..s1. 597.634693: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [005] ..s1. 617.598664: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [004] ..s1. 655.996890: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [004] ..s1. 660.606691: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [004] ..s1. 680.059867: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [004] ..s1. 688.227533: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [004] ..s1. 698.491126: rcu_nocb_wake: rcu_preempt 4 Timer fio-703 [000] ..s1. 705.925823: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [004] ..s1. 721.018256: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [007] ..s1. 732.793797: rcu_nocb_wake: rcu_preempt 4 Timer kworker/6:1H-172 [006] ..s1. 747.129268: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [007] ..s1. 766.075583: rcu_nocb_wake: rcu_preempt 4 Timer fio-701 [004] ..s2. 767.047547: rcu_nocb_wake: rcu_preempt 4 Timer kworker/6:1H-172 [006] ..s1. 782.456032: rcu_nocb_wake: rcu_preempt 4 Timer kworker/6:1H-172 [006] ..s1. 797.303567: rcu_nocb_wake: rcu_preempt 4 Timer kworker/6:1H-172 [006] ..s1. 813.686996: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [005] ..s1. 828.534430: rcu_nocb_wake: rcu_preempt 4 Timer fio-700 [004] ..s2. 850.536698: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [007] ..s1. 860.789367: rcu_nocb_wake: rcu_preempt 4 Timer ksoftirqd/6-63 [006] ..s.. 888.948515: rcu_nocb_wake: rcu_preempt 4 Timer fio-699 [000] ..s.. 905.854912: rcu_nocb_wake: rcu_preempt 0 Timer kworker/6:1H-172 [006] ..s1. 911.987781: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [005] ..s1. 924.275337: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [004] ..s1. 938.208899: rcu_nocb_wake: rcu_preempt 4 Timer kworker/0:1H-100 [000] ..s1. 938.236053: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [004] ..s1. 948.338615: rcu_nocb_wake: rcu_preempt 4 Timer fio-702 [000] ..s2. 958.216274: rcu_nocb_wake: rcu_preempt 0 Timer <...>-728 [004] ..s.. 962.432149: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [002] ..s1. 978.545645: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [003] ..s1. 1002.608935: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [004] ..s1. 1002.614904: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [001] ..s1. 1012.848592: rcu_nocb_wake: rcu_preempt 0 Timer kworker/6:1H-172 [006] ..s1. 1021.552408: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [002] ..s1. 1042.543682: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [001] ..s1. 1068.654892: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [002] ..s1. 1080.430541: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [000] ..s1. 1098.349992: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [002] ..s1. 1115.757464: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [005] ..s1. 1170.539822: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [000] ..s1. 1178.191592: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [000] ..s1. 1188.459281: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [004] ..s1. 1196.139077: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [000] ..s1. 1216.762430: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [004] ..s1. 1238.374788: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [004] ..s1. 1241.194246: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [003] ..s1. 1251.433991: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [004] ..s1. 1261.673634: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [001] ..s1. 1261.673637: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [007] ..s1. 1271.913347: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [005] ..s1. 1308.263711: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [000] ..s1. 1308.263738: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [005] ..s1. 1308.270707: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [000] ..s1. 1326.183705: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [004] ..s1. 1326.183709: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [004] ..s1. 1326.193714: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [005] ..s1. 1336.423403: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [003] ..s1. 1336.423404: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [000] ..s1. 1368.678480: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [007] ..s1. 1398.373573: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [005] ..s1. 1428.580662: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [001] ..s1. 1428.580666: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [007] ..s1. 1468.515358: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [000] ..s1. 1488.482412: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [000] ..s1. 1548.385082: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [005] ..s1. 1548.385083: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [003] ..s1. 1572.448367: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [004] ..s1. 1572.448367: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [004] ..s1. 1572.453972: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [005] ..s1. 1582.688060: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [001] ..s1. 1582.688062: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [004] ..s1. 1592.927895: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [007] ..s1. 1608.287294: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [001] ..s1. 1608.799312: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [002] ..s1. 1621.086909: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [005] ..s1. 1631.326609: rcu_nocb_wake: rcu_preempt 4 Timer <idle>-0 [003] .Ns1. 1653.854052: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [001] ..s1. 1668.189531: rcu_nocb_wake: rcu_preempt 0 Timer <idle>-0 [007] ..s1. 1678.429219: rcu_nocb_wake: rcu_preempt 4 Timer <snip> 1000 seconds runtime of "fio" test triggers the do_nocb_deferred_wakeup_timer() around 100 times. Most of them are handled from the soft-irq context. I have looked at the perf figures vs hrtimer_interrupt(). I do not see any obvious performance issues with it. 8xCPUs KVM system; 1000HZ timer; lazy + all offloading. -- Uladzislau Rezki