Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> writes: > On Fri, 18 Nov 2022 21:38:50 +0800 Chen Wandun <chenwandun@xxxxxxxxxx> wrote: > >> A soft lockup occur in scan free swap slot by constructing >> huge memory pressure. >> The test scenario is: 64 CPU cores, 64GB memory, and 28 >> zram devices, the disksize of each zram device is 50MB. >> >> LATENCY_LIMIT is used to prevent soft lockup in function >> scan_swap_map_slots, but the real loop number would more >> than LATENCY_LIMIT because of "goto checks and goto scan" >> repeatly without decrease of latency limit. >> >> In order to fix it, move decrease latency_ration code in advance. >> >> There is also a suspicious place that will cause soft lockup in >> function get_swap_pages, in this function, the "goto start_over" >> may result in continuous scanning of swap partition, if there is >> no cond_sched in scan_swap_map_slots, it would cause soft lockup >> (I am not sure about this). >> >> ... >> > > Looks sensible. Yes. LGTM. Reviewed-by: "Huang, Ying" <ying.huang@xxxxxxxxx> >> --- a/mm/swapfile.c >> +++ b/mm/swapfile.c >> @@ -972,23 +972,23 @@ static int scan_swap_map_slots(struct swap_info_struct *si, >> scan: >> spin_unlock(&si->lock); >> while (++offset <= READ_ONCE(si->highest_bit)) { >> - if (swap_offset_available_and_locked(si, offset)) >> - goto checks; >> if (unlikely(--latency_ration < 0)) { >> cond_resched(); >> latency_ration = LATENCY_LIMIT; >> scanned_many = true; >> } >> + if (swap_offset_available_and_locked(si, offset)) >> + goto checks; >> } >> offset = si->lowest_bit; >> while (offset < scan_base) { >> - if (swap_offset_available_and_locked(si, offset)) >> - goto checks; >> if (unlikely(--latency_ration < 0)) { >> cond_resched(); >> latency_ration = LATENCY_LIMIT; >> scanned_many = true; >> } >> + if (swap_offset_available_and_locked(si, offset)) >> + goto checks; >> offset++; >> } >> spin_lock(&si->lock); > > But this does somewhat alter the `scanned_many' logic. We'll now set > 'scanned_many` earlier. What are the effects of this? > > The ed43af10975eef7e changelog outlines tests which could be performed > to ensure we aren't regressing from this. Per my understanding, this will not influence `scanned_many` logic much. Because `scanned_many` flag will be set just a little earlier (one less slot). Best Regards, Huang, Ying