On Mon, Jul 8, 2024 at 10:31 PM Bharata B Rao <bharata@xxxxxxx> wrote: > > On 08-Jul-24 9:47 PM, Yu Zhao wrote: > > On Mon, Jul 8, 2024 at 8:34 AM Bharata B Rao <bharata@xxxxxxx> wrote: > >> > >> Hi Yu Zhao, > >> > >> Thanks for your patches. See below... > >> > >> On 07-Jul-24 4:12 AM, Yu Zhao wrote: > >>> Hi Bharata, > >>> > >>> On Wed, Jul 3, 2024 at 9:11 AM Bharata B Rao <bharata@xxxxxxx> wrote: > >>>> > >> <snip> > >>>> > >>>> Some experiments tried > >>>> ====================== > >>>> 1) When MGLRU was enabled many soft lockups were observed, no hard > >>>> lockups were seen for 48 hours run. Below is once such soft lockup. > >>> > >>> This is not really an MGLRU issue -- can you please try one of the > >>> attached patches? It (truncate.patch) should help with or without > >>> MGLRU. > >> > >> With truncate.patch and default LRU scheme, a few hard lockups are seen. > > > > Thanks. > > > > In your original report, you said: > > > > Most of the times the two contended locks are lruvec and > > inode->i_lock spinlocks. > > ... > > Often times, the perf output at the time of the problem shows > > heavy contention on lruvec spin lock. Similar contention is > > also observed with inode i_lock (in clear_shadow_entry path) > > > > Based on this new report, does it mean the i_lock is not as contended, > > for the same path (truncation) you tested? If so, I'll post > > truncate.patch and add reported-by and tested-by you, unless you have > > objections. > > truncate.patch has been tested on two systems with default LRU scheme > and the lockup due to inode->i_lock hasn't been seen yet after 24 hours run. Thanks. > > > > The two paths below were contended on the LRU lock, but they already > > batch their operations. So I don't know what else we can do surgically > > to improve them. > > What has been seen with this workload is that the lruvec spinlock is > held for a long time from shrink_[active/inactive]_list path. In this > path, there is a case in isolate_lru_folios() where scanning of LRU > lists can become unbounded. To isolate a page from ZONE_DMA, sometimes > scanning/skipping of more than 150 million folios were seen. There is > already a comment in there which explains why nr_skipped shouldn't be > counted, but is there any possibility of re-looking at this condition? For this specific case, probably this can help: @@ -1659,8 +1659,15 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan, if (folio_zonenum(folio) > sc->reclaim_idx || skip_cma(folio, sc)) { nr_skipped[folio_zonenum(folio)] += nr_pages; - move_to = &folios_skipped; - goto move; + list_move(&folio->lru, &folios_skipped); + if (spin_is_contended(&lruvec->lru_lock)) { + if (!list_empty(dst)) + break; + spin_unlock_irq(&lruvec->lru_lock); + cond_resched(); + spin_lock_irq(&lruvec->lru_lock); + } + continue; }