On Wed, Jan 10, 2024 at 1:22 AM Uladzislau Rezki <urezki@xxxxxxxxx> wrote: > > Hello, Kalesh! > > > > > Hi Uladzislau, > > > > I've tried your patches (v3) on Android with 6.1.43 kernel. > > > > The test cycles 10 apps (including camera) sequentially for 100 > > iterations. > > > > I've set rcu_normal to override the rcu_expedited in the boot > > parameters: > > > > adb shell cat /proc/cmdline | tr ' ' '\n' | grep rcu > > > > rcupdate.rcu_normal=1 > > rcupdate.rcu_expedited=1 > > rcu_nocbs=0-7 > > > > > > The configurations are: > > > > A - echo 0 >/sys/module/rcutree/parameters/rcu_normal_wake_from_gp > > B - echo 1 >/sys/module/rcutree/parameters/rcu_normal_wake_from_gp > > > > Results: > > > > = APP LAUNCH TIME = > > delta (B-A) ratio(%) > > overall_app_launch_time(ms) -11399.00 -6.65 > > > > > > == camera_launch_time > > type delta(B-A %) A_count B_count > > HOT -7.05 99 99 > > COLD -6.33 1 1 > > > > Hi Uladzislau, > If i interpret it correctly you also see that this series reduces > a launch time by 6/7% on your app set. Is that correct? Yes your understanding is correct. > > > === Function Latencies === > > > > Tracing synchronize_rcu_expedited. Hit Ctrl-C to exit Tracing synchronize_rcu_expedited. Hit Ctrl-C to exit > > > > nsec : count distribution nsec : count distribution > > 0 -> 1 : 0 | | 0 -> 1 : 0 | | > > 2 -> 3 : 0 | | 2 -> 3 : 0 | | > > 4 -> 7 : 0 | | 4 -> 7 : 0 | | > > 8 -> 15 : 0 | | 8 -> 15 : 0 | | > > 16 -> 31 : 0 | | 16 -> 31 : 0 | | > > 32 -> 63 : 0 | | 32 -> 63 : 0 | | > > 64 -> 127 : 0 | | 64 -> 127 : 0 | | > > 128 -> 255 : 0 | | 128 -> 255 : 0 | | > > 256 -> 511 : 0 | | 256 -> 511 : 0 | | > > 512 -> 1023 : 0 | | 512 -> 1023 : 0 | | > > 1024 -> 2047 : 0 | | 1024 -> 2047 : 0 | | > > 2048 -> 4095 : 0 | | 2048 -> 4095 : 0 | | > > 4096 -> 8191 : 0 | | 4096 -> 8191 : 0 | | > > 8192 -> 16383 : 0 | | 8192 -> 16383 : 0 | | > > 16384 -> 32767 : 0 | | 16384 -> 32767 : 0 | | > > 32768 -> 65535 : 0 | | 32768 -> 65535 : 0 | | > > 65536 -> 131071 : 0 | | 65536 -> 131071 : 0 | | > > 131072 -> 262143 : 0 | | 131072 -> 262143 : 0 | | > > 262144 -> 524287 : 0 | | 262144 -> 524287 : 0 | | > > 524288 -> 1048575 : 0 | | 524288 -> 1048575 : 0 | | > > 1048576 -> 2097151 : 0 | | 1048576 -> 2097151 : 0 | | > > 2097152 -> 4194303 : 0 | | 2097152 -> 4194303 : 0 | | > > 4194304 -> 8388607 : 871 |** | 4194304 -> 8388607 : 1180 |**** | > > 8388608 -> 16777215 : 3204 |******** | 8388608 -> 16777215 : 7020 |************************* | > > 16777216 -> 33554431 : 15013 |****************************************| 16777216 -> 33554431 : 10952 |****************************************| > > Exiting trace of synchronize_rcu_expedited Exiting trace of synchronize_rcu_expedited > > > > > > Tracing synchronize_rcu. Hit Ctrl-C to exit Tracing synchronize_rcu. Hit Ctrl-C to exit > > > > nsec : count distribution nsec : count distribution > > 0 -> 1 : 0 | | 0 -> 1 : 0 | | > > 2 -> 3 : 0 | | 2 -> 3 : 0 | | > > 4 -> 7 : 0 | | 4 -> 7 : 0 | | > > 8 -> 15 : 0 | | 8 -> 15 : 0 | | > > 16 -> 31 : 0 | | 16 -> 31 : 0 | | > > 32 -> 63 : 0 | | 32 -> 63 : 0 | | > > 64 -> 127 : 0 | | 64 -> 127 : 0 | | > > 128 -> 255 : 0 | | 128 -> 255 : 0 | | > > 256 -> 511 : 0 | | 256 -> 511 : 0 | | > > 512 -> 1023 : 0 | | 512 -> 1023 : 0 | | > > 1024 -> 2047 : 0 | | 1024 -> 2047 : 0 | | > > 2048 -> 4095 : 0 | | 2048 -> 4095 : 0 | | > > 4096 -> 8191 : 0 | | 4096 -> 8191 : 0 | | > > 8192 -> 16383 : 0 | | 8192 -> 16383 : 0 | | > > 16384 -> 32767 : 0 | | 16384 -> 32767 : 0 | | > > 32768 -> 65535 : 0 | | 32768 -> 65535 : 0 | | > > 65536 -> 131071 : 0 | | 65536 -> 131071 : 0 | | > > 131072 -> 262143 : 0 | | 131072 -> 262143 : 0 | | > > 262144 -> 524287 : 0 | | 262144 -> 524287 : 0 | | > > 524288 -> 1048575 : 0 | | 524288 -> 1048575 : 0 | | > > 1048576 -> 2097151 : 0 | | 1048576 -> 2097151 : 0 | | > > 2097152 -> 4194303 : 0 | | 2097152 -> 4194303 : 0 | | > > 4194304 -> 8388607 : 861 |** | 4194304 -> 8388607 : 1136 |**** | > > 8388608 -> 16777215 : 3111 |******** | 8388608 -> 16777215 : 6320 |************************ | > > 16777216 -> 33554431 : 13901 |****************************************| 16777216 -> 33554431 : 10484 |****************************************| > > Exiting trace of synchronize_rcu Exiting trace of synchronize_rcu > > > Who is B and who is A? Left is A (rcu_normal_wake_from_gp=0) and right is B (rcu_normal_wake_from_gp=1) > > > > > Interestingly I tried the same experiment without rcu_normal=1 (leaving rcu_expedited=1): > > > > adb shell cat /proc/cmdline | tr ' ' '\n' | grep rcu > > rcupdate.rcu_expedited=1 > > rcu_nocbs=0-7 > > > > In this case I also saw the -6 to -7% decrease in the app launch times > > but I don't have a good explanation why that would be? (The fucntion > > latency histograms in this case didn't show any significant difference). > > Do you have any insight why this may happen? > > > When rcu_expedited=1 is set and rcu_normal=0 is disabled. The > synchronize_rcu() call is converted into synchronize_rcu_expidited(): > > <snip> > void synchronize_rcu(void) > { > unsigned long flags; > struct rcu_node *rnp; > > RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) || > lock_is_held(&rcu_lock_map) || > lock_is_held(&rcu_sched_lock_map), > "Illegal synchronize_rcu() in RCU read-side critical section"); > if (!rcu_blocking_is_gp()) { > if (rcu_gp_is_expedited()) > synchronize_rcu_expedited(); > else > synchronize_rcu_normal(); > return; > } > ... > <snip> > > rcu_gp_is_expidited() is true, so invoke "expedited" version. > > I see some concerns in preferring an expedited version as a global > replacement. First of all it is related to latency sensitive workloads > because in order to expedite a grace period it sends out IPIs on all > online CPUs to force them to report a quiescent-state asap. I have not > investigated yet how it affects such workloads. > > Therefore, in your case, you also see a performance boost of your app sets. IIUC the patch shouldn't affect the case? The only difference in A vs B is rcu_normal_wake_from_gp (both have rcu_expedited=1). Thanks, Kalesh > > Thank you for looking at it! > > -- > Uladzislau Rezki