On Wed, Feb 1, 2023 at 12:07 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Mon 30-01-23 11:30:47, Yang Shi wrote:
> > On Mon, Jan 30, 2023 at 4:20 AM Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote:
> > >
> > >
> > >
> > > On 2023/1/30 16:48, Michal Hocko wrote:
> > > > On Mon 30-01-23 09:16:13, Kefeng Wang wrote:
> > > >>
> > > >>
> > > >> On 2023/1/30 5:48, Andrew Morton wrote:
> > > >>> On Sun, 29 Jan 2023 10:44:51 +0800 Kefeng Wang <wangkefeng.wang@xxxxxxxxxx> wrote:
> > > >>>
> > > >>>> Since commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"),
> > > >>>
> > > >>> Merged in 2017.
> > > >>>
> > > >>>> hwpoison will forcibly uncharge an LRU hwpoisoned page, so folio_memcg
> > > >>>> could be NULL, and mem_cgroup_track_foreign_dirty_slowpath() could then
> > > >>>> hit a NULL pointer dereference. Fix it by not recording foreign
> > > >>>> writebacks in mem_cgroup_track_foreign() when the folio's memcg is
> > > >>>> NULL.
> > > >>>>
> > > >>>> Reported-by: Ma Wupeng <mawupeng1@xxxxxxxxxx>
> > > >>>> Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")
> > > >>>
> > > >>> Merged in 2019.
> > > >>>
> > > ...
> > > >
> > > > Just to make sure I understand. The page has been hwpoisoned, uncharged
> > > > but stayed in the page cache, so the next page fault on the address has
> > > > blown up?
> > > >
> > > > Say we address the NULL memcg case. What is the resulting behavior?
> > > > Doesn't userspace access a poisoned page and get silent memory
> > > > corruption?
> > >
> > > + Yang Shi
> > >
> > > Check the previous link[1]; it seems to be a known issue, and there is a
> > > TODO list for storage-backed filesystems from Yang.
> >
> > For tmpfs and hugetlbfs, the page still stays in the page cache, and a
> > later page fault will handle the case gracefully. Other, real
> > storage-backed filesystems will have the page cache truncated.
> >
> > The page cache will be uncharged before truncation. If the truncation
> > fails, we may end up in this case.
>
> This would be a good addendum to the changelog. What would be a typical
> failure in the truncation path?

For the memory failure path, there may be a couple of cases, for example,
the page does not belong to a regular file (maybe it is a directory),
releasing buffers fails, etc.

>
> > >
> > >
> > > [1]
> > > https://lore.kernel.org/all/20211020210755.23964-6-shy828301@xxxxxxxxx/T/#m1d40559ca2dcf94396df5369214288f69dec379b
> --
> Michal Hocko
> SUSE Labs
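The guard described in the quoted changelog can be illustrated with a small userspace model. This is a hypothetical, heavily simplified sketch, not the kernel code: the struct layouts and the names track_foreign_dirty() and track_foreign_dirty_slowpath() are stand-ins invented for illustration. It assumes, as the thread describes, that a forcibly uncharged hwpoisoned folio ends up with a NULL memcg, and that the unguarded fast path compares the folio memcg's css against the writeback's css. Because css is the first member, a NULL memcg would never match, so the slow path would run and dereference NULL. Checking for NULL first keeps such a folio off the slow path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct css { int id; };
struct mem_cgroup { struct css css; };           /* css is the first member */
struct bdi_writeback { struct css *memcg_css; }; /* css owning this writeback */
struct folio { struct mem_cgroup *memcg; };      /* NULL after forcible uncharge */

static int slowpath_calls;      /* how often the slow path ran */
static int last_foreign_css_id; /* css id the slow path last recorded */

/* Hypothetical slow path: it dereferences the folio's memcg, so it must
 * never be reached with a NULL folio->memcg. */
static void track_foreign_dirty_slowpath(struct folio *folio,
                                         struct bdi_writeback *wb)
{
    (void)wb;
    last_foreign_css_id = folio->memcg->css.id;
    slowpath_calls++;
}

/* Hypothetical fast path with the guard: a folio without a memcg
 * (e.g. an uncharged hwpoisoned page) is simply not recorded. */
static void track_foreign_dirty(struct folio *folio, struct bdi_writeback *wb)
{
    struct mem_cgroup *memcg = folio->memcg;

    if (memcg != NULL && &memcg->css != wb->memcg_css)
        track_foreign_dirty_slowpath(folio, wb);
}
```

With a writeback owned by memcg `a`, a folio charged to `a` takes no slow path, a folio charged to another memcg is recorded, and a folio whose memcg is NULL is skipped instead of crashing.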