Re: [PATCH 1/1] mm/vmalloc: Combine all TLB flush operations of KASAN shadow virtual address into one operation

On Wed, Jul 31, 2024 at 12:27:27AM +0800, Huang Adrian wrote:
> On Tue, Jul 30, 2024 at 7:38 PM Uladzislau Rezki <urezki@xxxxxxxxx> wrote:
> >
> > > On Mon, Jul 29, 2024 at 7:29 PM Uladzislau Rezki <urezki@xxxxxxxxx> wrote:
> > > > It would be really good if Adrian could run the "compiling workload" on
> > > > his big system and post the statistics here.
> > > >
> > > > For example:
> > > >   a) v6.11-rc1 + KASAN.
> > > >   b) v6.11-rc1 + KASAN + patch.
> > >
> > > Sure, please see the statistics below.
> > >
> > > Test Result (based on 6.11-rc1)
> > > ===============================
> > >
> > > 1. Profile purge_vmap_node()
> > >
> > >    A. Command: trace-cmd record -p function_graph -l purge_vmap_node make -j $(nproc)
> > >
> > >    B. Average execution time of purge_vmap_node():
> > >
> > >       no patch (us)           patched (us)    saved
> > >       -------------           ------------    -----
> > >                147885.02                3692.51        97%
> > >
> > >    C. Total execution time of purge_vmap_node():
> > >
> > >       no patch (us)           patched (us)    saved
> > >       -------------           ------------    -----
> > >         194173036               5114138        97%
> > >
> > >    [ftrace log] Without patch: https://gist.github.com/AdrianHuang/a5bec861f67434e1024bbf43cea85959
> > >    [ftrace log] With patch: https://gist.github.com/AdrianHuang/a200215955ee377288377425dbaa04e3
> > >
> > > 2. Use `time` utility to measure execution time
> > >
> > >    A. Command: make clean && time make -j $(nproc)
> > >
> > >    B. The following result is the average kernel execution time over
> > >       five measurements ('sys' field of the `time` output):
> > >
> > >       no patch (seconds)      patched (seconds)       saved
> > >       ------------------      ----------------        -----
> > >           36932.904              31403.478             15%
> > >
> > >    [`time` log] Without patch: https://gist.github.com/AdrianHuang/987b20fd0bd2bb616b3524aa6ee43112
> > >    [`time` log] With patch: https://gist.github.com/AdrianHuang/da2ea4e6aa0b4dcc207b4e40b202f694
> > >
> > I meant different statistics. As noted here https://lore.kernel.org/linux-mm/ZogS_04dP5LlRlXN@pc636/T/#m5d57f11d9f69aef5313f4efbe25415b3bae4c818
> > I came to the conclusion that the place and lock below:
> >
> > <snip>
> > static void exit_notify(struct task_struct *tsk, int group_dead)
> > {
> >         bool autoreap;
> >         struct task_struct *p, *n;
> >         LIST_HEAD(dead);
> >
> >         write_lock_irq(&tasklist_lock);
> > ...
> > <snip>
> >
> > keep IRQs disabled, which means that purge_vmap_node() does make progress,
> > but it can be slow.
> >
> > CPU_1:
> > disables IRQs
> > trying to grab the tasklist_lock
> >
> > CPU_2:
> > Sends an IPI to CPU_1
> > waits until the specified callback is executed on CPU_1
> >
> > Since CPU_1 has disabled IRQs, servicing the IPI and completing the
> > callback are delayed until CPU_1 re-enables IRQs.
> >
> > Could you please post lock statistics for the kernel compile use case?
> > KASAN + patch is enough, IMO. This is just to double-check whether
> > tasklist_lock is a problem or not.
> 
> Sorry for the misunderstanding.
> 
> The two experiments are shown below. I saw that you think KASAN + patch
> is enough, but I am including the other one in case you need it. ;-)
> 
> a) v6.11-rc1 + KASAN
> 
> The result is different from yours, so I ran two tests (making sure the
> soft lockup warning was triggered).
> 
> Test #1: waittime-max = 5.4ms
> <snip>
> ...
> class name          con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
> ...
> tasklist_lock-W:         118762         120090           0.44        5443.22    24807413.37         206.57         429757         569051           2.27        3222.00    69914505.87         122.86
> tasklist_lock-R:         108262         108300           0.41        5381.34    23613372.10         218.04         489132         541541           0.20        5543.40    10095470.68          18.64
>     ---------------
>     tasklist_lock          44594          [<0000000099d3ea35>] exit_notify+0x82/0x900
>     tasklist_lock          32041          [<0000000058f753d8>] release_task+0x104/0x3f0
>     tasklist_lock          99240          [<000000008524ff80>] __do_wait+0xd8/0x710
>     tasklist_lock          43435          [<00000000f6e82dcf>] copy_process+0x2a46/0x50f0
>     ---------------
>     tasklist_lock          98334          [<0000000099d3ea35>] exit_notify+0x82/0x900
>     tasklist_lock          82649          [<0000000058f753d8>] release_task+0x104/0x3f0
>     tasklist_lock              2          [<00000000da5a7972>] mm_update_next_owner+0xc0/0x430
>     tasklist_lock          26708          [<00000000f6e82dcf>] copy_process+0x2a46/0x50f0
> ...
> <snip>
> 
> Test #2: waittime-max = 5.7ms
> <snip>
> ...
> class name          con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
> ...
> tasklist_lock-W:         121742         123167           0.43        5713.02    25252257.61         205.02         432111         569762           2.25        3083.08    70711022.74         124.11
> tasklist_lock-R:         111479         111523           0.39        5050.50    24557264.88         220.20         491404         542221           0.20        5611.81    10007782.09          18.46
>     ---------------
>     tasklist_lock         102317          [<000000008524ff80>] __do_wait+0xd8/0x710
>     tasklist_lock          44606          [<00000000f6e82dcf>] copy_process+0x2a46/0x50f0
>     tasklist_lock          45584          [<0000000099d3ea35>] exit_notify+0x82/0x900
>     tasklist_lock          32969          [<0000000058f753d8>] release_task+0x104/0x3f0
>     ---------------
>     tasklist_lock         100498          [<0000000099d3ea35>] exit_notify+0x82/0x900
>     tasklist_lock          27401          [<00000000f6e82dcf>] copy_process+0x2a46/0x50f0
>     tasklist_lock          85473          [<0000000058f753d8>] release_task+0x104/0x3f0
>     tasklist_lock            650          [<000000004d0b9f6b>] tty_open_proc_set_tty+0x23/0x210
> ...
> <snip>
> 
> 
> b) v6.11-rc1 + KASAN + patch: waittime-max = 5.7ms
> <snip>
> ...
> class name          con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
> ...
> tasklist_lock-W:         108876         110087           0.33        5688.64    18622460.43         169.16         426740         568715           1.94        2930.76    62560515.48         110.00
> tasklist_lock-R:          99864          99909           0.43        5868.69    17849478.20         178.66         487654         541328           0.20        5709.98     9207504.90          17.01
>     ---------------
>     tasklist_lock          91655          [<00000000a622e532>] __do_wait+0xd8/0x710
>     tasklist_lock          41100          [<00000000ccf53925>] exit_notify+0x82/0x900
>     tasklist_lock           8254          [<00000000093ccded>] tty_open_proc_set_tty+0x23/0x210
>     tasklist_lock          39542          [<00000000a0e6bf4d>] copy_process+0x2a46/0x50f0
>     ---------------
>     tasklist_lock          90525          [<00000000ccf53925>] exit_notify+0x82/0x900
>     tasklist_lock          76934          [<00000000cb7ca00c>] release_task+0x104/0x3f0
>     tasklist_lock          23723          [<00000000a0e6bf4d>] copy_process+0x2a46/0x50f0
>     tasklist_lock          18223          [<00000000a622e532>] __do_wait+0xd8/0x710
> ...
> <snip>
>
Thank you for posting this! So tasklist_lock is not a problem.
I assume you have the full lock_stat output. Could you please
paste it for v6.11-rc1 + KASAN?
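
When looking through a full lock_stat dump, it can help to rank the lock
classes by waittime-max instead of reading the rows by eye. Below is a
minimal sketch in Python, not part of the patch. It assumes unwrapped rows
as printed by /proc/lock_stat (a class name ending in ':' followed by the
twelve numeric columns listed in the header quoted above); the function
names parse_lock_stat() and worst_waiters() are made up for illustration.

```python
# Column names as they appear in the lock_stat header quoted in this thread.
FIELDS = (
    "con-bounces", "contentions", "waittime-min", "waittime-max",
    "waittime-total", "waittime-avg", "acq-bounces", "acquisitions",
    "holdtime-min", "holdtime-max", "holdtime-total", "holdtime-avg",
)


def parse_lock_stat(text):
    """Return {class name: {column: value}} for every per-class row.

    Assumption: each row is on a single line (not wrapped by a mail
    client). Header, separator and call-site lines are skipped.
    """
    rows = {}
    for line in text.splitlines():
        # Split off the last 12 whitespace-separated tokens; whatever is
        # left on the left-hand side is treated as the class name.
        parts = line.rsplit(None, len(FIELDS))
        if len(parts) != len(FIELDS) + 1:
            continue
        name = parts[0].strip()
        if not name.endswith(":"):
            continue
        try:
            values = [float(v) for v in parts[1:]]
        except ValueError:
            continue  # not a numeric row
        rows[name[:-1]] = dict(zip(FIELDS, values))
    return rows


def worst_waiters(rows, count=10):
    """Return up to `count` (class name, waittime-max) pairs, largest first."""
    ranked = sorted(
        ((name, cols["waittime-max"]) for name, cols in rows.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:count]
```

On a kernel built with CONFIG_LOCK_STAT=y this could be fed the contents of
/proc/lock_stat (reading it typically needs root), for example
worst_waiters(parse_lock_stat(open("/proc/lock_stat").read())).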

Thank you!

--
Uladzislau Rezki



