Re: [PATCH] mm: ksm: fix data-race in __ksm_enter / run_store

Hi Matthew,

I don't believe it is guaranteed that unmerge_and_remove_all_rmap_items() runs after an mm has been misplaced.

Consider the following interleaving:
Thread A executes __ksm_enter() while KSM_RUN_MERGE is set and gets through the check at https://elixir.bootlin.com/linux/v5.18-rc5/source/mm/ksm.c#L2501
Thread B executes run_store(), sets KSM_RUN_UNMERGE, and then runs unmerge_and_remove_all_rmap_items() at https://elixir.bootlin.com/linux/v5.18-rc5/source/mm/ksm.c#L2900
Thread A completes __ksm_enter() and, because it is still on the KSM_RUN_MERGE path, misplaces the mm behind the scanning cursor at https://elixir.bootlin.com/linux/v5.18-rc5/source/mm/ksm.c#L2504

Through manual inspection, I also noticed another check of the KSM_RUN_UNMERGE flag that appears racy, at https://elixir.bootlin.com/linux/v5.18-rc5/source/mm/ksm.c#L2563

Best,

Gabe



On Tue, Aug 2, 2022 at 11:45 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
On Tue, Aug 02, 2022 at 11:15:50PM +0800, Kefeng Wang wrote:
> The ksm_run is already protected by ksm_thread_mutex in run_store, we
> could add this lock in __ksm_enter() to avoid the above issue.

I don't think this is a great fix.  Why not protect the store with
ksm_mmlist_lock?  ie:

        mutex_lock(&ksm_thread_mutex);
        wait_while_offlining();
        if (ksm_run != flags) {
+               spin_lock(&ksm_mmlist_lock);
                ksm_run = flags;
+               spin_unlock(&ksm_mmlist_lock);
                if (flags & KSM_RUN_UNMERGE) {
                        set_current_oom_origin();
                        err = unmerge_and_remove_all_rmap_items();
                        clear_current_oom_origin();
                        if (err) {
+                               spin_lock(&ksm_mmlist_lock);
                                ksm_run = KSM_RUN_STOP;
+                               spin_unlock(&ksm_mmlist_lock);
...

(I also don't think this is a real bug, because the call to
unmerge_and_remove_all_rmap_items() will "cure" the misplacement of
items in the list, but there's value in shutting up the tools, I suppose)
