On Fri, 18 Nov 2022 21:38:50 +0800 Chen Wandun <chenwandun@xxxxxxxxxx> wrote: > A soft lockup occur in scan free swap slot by constructing > huge memory pressure. > The test scenario is: 64 CPU cores, 64GB memory, and 28 > zram devices, the disksize of each zram device is 50MB. > > LATENCY_LIMIT is used to prevent soft lockup in function > scan_swap_map_slots, but the real loop number would more > than LATENCY_LIMIT because of "goto checks and goto scan" > repeatly without decrease of latency limit. > > In order to fix it, move decrease latency_ration code in advance. > > There is also a suspicious place that will cause soft lockup in > function get_swap_pages, in this function, the "goto start_over" > may result in continuous scanning of swap partition, if there is > no cond_sched in scan_swap_map_slots, it would cause soft lockup > (I am not sure about this). > > ... > Looks sensible. > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -972,23 +972,23 @@ static int scan_swap_map_slots(struct swap_info_struct *si, > scan: > spin_unlock(&si->lock); > while (++offset <= READ_ONCE(si->highest_bit)) { > - if (swap_offset_available_and_locked(si, offset)) > - goto checks; > if (unlikely(--latency_ration < 0)) { > cond_resched(); > latency_ration = LATENCY_LIMIT; > scanned_many = true; > } > + if (swap_offset_available_and_locked(si, offset)) > + goto checks; > } > offset = si->lowest_bit; > while (offset < scan_base) { > - if (swap_offset_available_and_locked(si, offset)) > - goto checks; > if (unlikely(--latency_ration < 0)) { > cond_resched(); > latency_ration = LATENCY_LIMIT; > scanned_many = true; > } > + if (swap_offset_available_and_locked(si, offset)) > + goto checks; > offset++; > } > spin_lock(&si->lock); But this does somewhat alter the `scanned_many' logic. We'll now set 'scanned_many` earlier. What are the effects of this? The ed43af10975eef7e changelog outlines tests which could be performed to ensure we aren't regressing from this.