On Wed, Apr 3, 2024 at 4:01 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 03.04.24 07:50, Zhaoyang Huang wrote:
> > On Tue, Apr 2, 2024 at 8:58 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> On 01.04.24 10:17, zhaoyang.huang wrote:
> >>> From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> >>>
> >>> A VM_BUG_ON in step 9 of [1] could happen because the refcnt is dropped
> >>> improperly during the procedure of read_pages()->readahead_folio->folio_put.
> >>> This was introduced by commit 9fd472af84ab ("mm: improve cleanup when
> >>> ->readpages doesn't process all pages").
> >>>
> >>> Key steps of [1] in brief:
> >>> 2'. Thread_truncate gets the folio into its local fbatch via find_get_entries in step 2
> >>> 7'. The last remaining refcnt is not, as expected, the one from alloc_pages,
> >>> but the one from thread_truncate's local fbatch in step 7
> >>> 8'. Thread_reclaim succeeds in isolating the folio because of the wrong refcnt
> >>> (wrong in meaning, not in value) in step 8
> >>> 9'. Thread_truncate hits the VM_BUG_ON in step 9
> >>>
> >>> [1]
> >>> Thread_readahead:
> >>> 0. folio = filemap_alloc_folio(gfp_mask, 0);
> >>> (refcount 1: alloc_pages)
> >>> 1. ret = filemap_add_folio(mapping, folio, index + i, gfp_mask);
> >>> (refcount 2: alloc_pages, page_cache)
>
> [not going into all details, just a high-level remark]
>
> page_cache_ra_unbounded() does a filemap_invalidate_lock_shared(), which
> is a down_read_trylock(&mapping->invalidate_lock).
>
> That is, all read_pages() calls in mm/readahead.c happen under
> mapping->invalidate_lock in read mode.
>
> ... and ...
>
> >>>
> >>> Thread_truncate:
> >>> 2. folio = find_get_entries(&fbatch_truncate);
> >>> (refcount 3: alloc_pages, page_cache, fbatch_truncate)
>
> Truncation, such as truncate_inode_pages(), must be called under
> mapping->invalidate_lock held in write mode. So naive me would have
> thought that readahead and truncate cannot race in that way.
>
> [...]
>
Thanks for the reminder. But I can't find the spot where
"mapping->invalidate_lock" is held when checking the call stack
'kill_bdev()->truncate_inode_pages()->truncate_inode_pages_range()'.
Or is the lock held further up this path?

>
> >>
> >> Something that would help here is an actual reproducer that triggers
> >> this issue.
> >>
> >> To me, it's unclear at this point if we are talking about an actual
> >> issue or a theoretical issue.
> > Thanks for the feedback. The above callstack is a theoretical issue so far,
> > which arose from an ongoing analysis of a practical livelock issue caused
> > by folio_try_get_rcu(), which is related to an abnormal folio refcnt state.
> > So do you think this callstack makes sense?
>
> I'm not an expert on that code, and only spent 5 minutes looking into it.
> So my reasoning about invalidate_lock above might be completely wrong.
>
> It would be a very rare race that has not been reported in practice so far.
> And it certainly wouldn't be the easiest one to explain, because the
> call chain above is a bit elaborate and does not explain which locks are
> involved and how they fail to protect us from any such race.
>
> For this case in particular, I think we really need a real reproducer to
> convince people that the actual issue does exist and the fix actually
> resolves the issue.

Sorry, it is still theoretical according to my understanding.

>
> --
> Cheers,
>
> David / dhildenb
>
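
To make the locking relationship being discussed concrete, here is a minimal
userspace sketch (not kernel code; all names and the simplified structure are
illustrative assumptions) of the reader/writer exclusion that
mapping->invalidate_lock is expected to provide: readahead-style work holds
the lock in read (shared) mode, truncate-style work holds it in write
(exclusive) mode, so the two critical sections cannot overlap on the same
mapping.

/* Userspace analogue of the invalidate_lock protocol, for illustration only. */
#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t invalidate_lock = PTHREAD_RWLOCK_INITIALIZER;

static void *readahead_thread(void *arg)
{
	(void)arg;
	/* models filemap_invalidate_lock_shared(mapping) in readahead */
	pthread_rwlock_rdlock(&invalidate_lock);
	puts("readahead: adding folios to the page cache under the shared lock");
	/* models filemap_invalidate_unlock_shared(mapping) */
	pthread_rwlock_unlock(&invalidate_lock);
	return NULL;
}

static void *truncate_thread(void *arg)
{
	(void)arg;
	/* models taking the lock exclusively before truncate_inode_pages() */
	pthread_rwlock_wrlock(&invalidate_lock);
	puts("truncate: removing folios under the exclusive lock");
	pthread_rwlock_unlock(&invalidate_lock);
	return NULL;
}

int main(void)
{
	pthread_t ra, tr;

	pthread_create(&ra, NULL, readahead_thread, NULL);
	pthread_create(&tr, NULL, truncate_thread, NULL);
	pthread_join(ra, NULL);
	pthread_join(tr, NULL);
	return 0;
}

Whether the real truncate path in question (kill_bdev() and friends) actually
takes the lock in write mode is exactly the open question in the thread above;
the sketch only shows why, if both sides do take it, the race could not happen.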