On Tue, Aug 13, 2024 at 5:04 AM Usama Arif <usamaarif642@xxxxxxxxx> wrote:
>
>
> On 09/07/2024 06:58, Yu Zhao wrote:
> > On Mon, Jul 8, 2024 at 10:31 PM Bharata B Rao <bharata@xxxxxxx> wrote:
> >>
> >> On 08-Jul-24 9:47 PM, Yu Zhao wrote:
> >>> On Mon, Jul 8, 2024 at 8:34 AM Bharata B Rao <bharata@xxxxxxx> wrote:
> >>>>
> >>>> Hi Yu Zhao,
> >>>>
> >>>> Thanks for your patches. See below...
> >>>>
> >>>> On 07-Jul-24 4:12 AM, Yu Zhao wrote:
> >>>>> Hi Bharata,
> >>>>>
> >>>>> On Wed, Jul 3, 2024 at 9:11 AM Bharata B Rao <bharata@xxxxxxx> wrote:
> >>>>>>
> >>>> <snip>
> >>>>>>
> >>>>>> Some experiments tried
> >>>>>> ======================
> >>>>>> 1) When MGLRU was enabled, many soft lockups were observed; no hard
> >>>>>> lockups were seen during a 48-hour run. Below is one such soft lockup.
> >>>>>
> >>>>> This is not really an MGLRU issue -- can you please try one of the
> >>>>> attached patches? It (truncate.patch) should help with or without
> >>>>> MGLRU.
> >>>>
> >>>> With truncate.patch and the default LRU scheme, a few hard lockups are seen.
> >>>
> >>> Thanks.
> >>>
> >>> In your original report, you said:
> >>>
> >>>   Most of the time, the two contended locks are the lruvec and
> >>>   inode->i_lock spinlocks.
> >>>   ...
> >>>   Often, the perf output at the time of the problem shows
> >>>   heavy contention on the lruvec spin lock. Similar contention is
> >>>   also observed with the inode i_lock (in the clear_shadow_entry path).
> >>>
> >>> Based on this new report, does it mean the i_lock is not as contended
> >>> for the same path (truncation) you tested? If so, I'll post
> >>> truncate.patch and add your reported-by and tested-by, unless you have
> >>> objections.
> >>
> >> truncate.patch has been tested on two systems with the default LRU scheme,
> >> and the lockup due to inode->i_lock hasn't been seen yet after a 24-hour run.
> >
> > Thanks.
> >
> >>>
> >>> The two paths below were contended on the LRU lock, but they already
> >>> batch their operations. So I don't know what else we can do surgically
> >>> to improve them.
> >>
> >> What has been seen with this workload is that the lruvec spinlock is
> >> held for a long time from the shrink_[active/inactive]_list path. In this
> >> path, there is a case in isolate_lru_folios() where scanning of LRU
> >> lists can become unbounded. To isolate a page from ZONE_DMA, sometimes
> >> scanning/skipping of more than 150 million folios was seen. There is
> >> already a comment in there which explains why nr_skipped shouldn't be
> >> counted, but is there any possibility of re-looking at this condition?
> >
> > For this specific case, probably this can help:
> >
> > @@ -1659,8 +1659,15 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
> >                 if (folio_zonenum(folio) > sc->reclaim_idx ||
> >                     skip_cma(folio, sc)) {
> >                         nr_skipped[folio_zonenum(folio)] += nr_pages;
> > -                       move_to = &folios_skipped;
> > -                       goto move;
> > +                       list_move(&folio->lru, &folios_skipped);
> > +                       if (spin_is_contended(&lruvec->lru_lock)) {
> > +                               if (!list_empty(dst))
> > +                                       break;
> > +                               spin_unlock_irq(&lruvec->lru_lock);
> > +                               cond_resched();
> > +                               spin_lock_irq(&lruvec->lru_lock);
> > +                       }
> > +                       continue;

Nitpick:

        if () {
                ...
                if (!spin_is_contended(&lruvec->lru_lock))
                        continue;
                if (!list_empty(dst))
                        break;
                spin_unlock_irq(&lruvec->lru_lock);
                cond_resched();
                spin_lock_irq(&lruvec->lru_lock);
        }
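For reference, the skipped-folio branch with that restructuring applied might read roughly as follows. This is an untested sketch only: it assumes the same isolate_lru_folios() context as the hunk above (folios_skipped, dst, sc and lruvec->lru_lock), and it keeps a trailing continue so the skip path still bypasses the rest of the loop body:

        if (folio_zonenum(folio) > sc->reclaim_idx ||
            skip_cma(folio, sc)) {
                nr_skipped[folio_zonenum(folio)] += nr_pages;
                /* Park the folio so it is not rescanned on later iterations. */
                list_move(&folio->lru, &folios_skipped);

                /* Keep scanning while nobody else wants the LRU lock. */
                if (!spin_is_contended(&lruvec->lru_lock))
                        continue;

                /* Contended: return whatever has been isolated so far. */
                if (!list_empty(dst))
                        break;

                /* Nothing isolated yet: yield the lock and the CPU, then retry. */
                spin_unlock_irq(&lruvec->lru_lock);
                cond_resched();
                spin_lock_irq(&lruvec->lru_lock);
                continue;
        }

Either way, the effect is the same: the skip path no longer holds the LRU lock across an unbounded scan, and it gives up after the first contended batch instead of skipping hundreds of millions of folios under the lock.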
> Hi Yu,
>
> We are seeing lockups and high memory pressure in Meta production due to
> this lock contention as well. My colleague highlighted it in
> https://lore.kernel.org/all/ZrssOrcJIDy8hacI@xxxxxxxxx/ and was pointed
> to this fix.
>
> We removed the skip_cma check as a temporary measure, but this is a proper
> fix. I might have missed it, but I didn't see this as a patch on the
> mailing list. Just wanted to check whether you were planning to send it as
> a patch? Happy to send it on your behalf as well.

Please. Thank you.