On Thu, 23 Mar 2023, Song Liu wrote:
> On Thu, Mar 23, 2023 at 2:56 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > On Thu, Mar 23, 2023 at 12:07:46PM -0700, Hugh Dickins wrote:
> > > On an earlier audit, for different reasons, I did also run across
> > > lib/buildid.c build_id_parse() using find_get_page() without checking
> > > PageUptodate() - looks as if it might do the wrong thing if it races
> > > with khugepaged collapsing text to huge, and should probably have a
> > > similar fix.
> >
> > That shouldn't be using find_get_page().  It should probably use
> > read_cache_folio() which will actually read in the data if it's not
> > present in the page cache, and return an ERR_PTR if the data couldn't
> > be read.
>
> build_id_parse() can be called from NMI, so I don't think we can let
> read_cache_folio() read-in the data.

Interesting.

This being the same Layering_Violation_ID which is asking for a home
in everyone's struct file?  (Okay, I'm being disagreeable, no need
to answer!)

I think even the current find_get_page() is unsafe from NMI: imagine
that NMI interrupting a sequence (maybe THP collapse or splitting,
maybe page migration, maybe others) when the page refcount has been
frozen to 0: you'll just have to reboot the machine?

I guess the RCU-safety of find_get_page() implies that its XArray
basics are safe in NMI; but you need a low-level variant (xas_find()?)
which does none of the "goto retry"s, and fails immediately if anything
is wrong - including !PageUptodate.

Hugh
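
To make the frozen-refcount hazard above concrete, here is a rough
user-space model in C11. It is not kernel code: struct fake_page,
fake_get_unless_zero(), fake_lookup_once() and fake_lookup_retry() are
hypothetical names invented for illustration, and the retry loop is
bounded only so that the demonstration terminates (the "goto retry"
being modelled has no such bound).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; NOT the kernel's struct page. */
struct fake_page {
	atomic_int refcount;	/* 0 models "frozen" (collapse, split, migration) */
	bool uptodate;		/* models PageUptodate() */
};

/* Take a reference only if the count is currently nonzero. */
static bool fake_get_unless_zero(struct fake_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Retrying lookup: keeps trying while the page is frozen.  If the code
 * that froze the refcount is the code we interrupted, it can never make
 * progress, so an unbounded version of this loop would never exit.
 * 'max_tries' exists only to keep this model finite.
 */
static struct fake_page *fake_lookup_retry(struct fake_page *slot,
					   int max_tries, int *tries)
{
	for (*tries = 0; *tries < max_tries; (*tries)++) {
		if (!slot)
			return NULL;
		if (fake_get_unless_zero(slot))
			return slot;
		/* frozen: the modelled code would "goto retry" here */
	}
	return NULL;
}

/*
 * One-shot lookup: fails immediately if anything is wrong, including a
 * frozen refcount or a page that is not uptodate.
 */
static struct fake_page *fake_lookup_once(struct fake_page *slot)
{
	if (!slot)
		return NULL;
	if (!fake_get_unless_zero(slot))
		return NULL;		/* frozen: give up, do not spin */
	if (!slot->uptodate) {
		atomic_fetch_sub(&slot->refcount, 1);	/* drop our reference */
		return NULL;
	}
	return slot;
}
```

With refcount set to 0, fake_lookup_once() returns NULL on the first
attempt, while fake_lookup_retry() burns through every attempt it is
allowed. The model ignores what dropping the reference would involve in
a real NMI context; it only shows the difference between retrying and
failing fast.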