On Mon, Mar 30, 2020 at 09:44:08PM -0700, Sean Christopherson wrote:
> On Mon, Mar 30, 2020 at 08:35:29PM -0700, Mike Kravetz wrote:
> > On 3/28/20 3:10 PM, akpm@xxxxxxxxxxxxxxxxxxxx wrote:
> > > The patch titled
> > >      Subject: mm/hugetlb: fix a addressing exception caused by huge_pte_offset
> > > has been added to the -mm tree.  Its filename is
> > >      mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch
> > >
> > > This patch should soon appear at
> > >     http://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch
> > > and later at
> > >     http://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch
> > >
> > > Before you just go and hit "reply", please:
> > >    a) Consider who else should be cc'ed
> > >    b) Prefer to cc a suitable mailing list as well
> > >    c) Ideally: find the original patch on the mailing list and do a
> > >       reply-to-all to that, adding suitable additional cc's
> > >
> > > *** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
> > >
> > > The -mm tree is included into linux-next and is updated
> > > there every 3-4 working days
> > >
> > > From: Longpeng <longpeng2@xxxxxxxxxx>
> > > Subject: mm/hugetlb: fix a addressing exception caused by huge_pte_offset
> >
> > This patch is what caused the BUG reported on i386 non-PAE kernel here:
> >
> > https://lore.kernel.org/linux-mm/CA+G9fYsJgZhhWLMzUxu_ZQ+THdCcJmFbHQ2ETA_YPP8M6yxOYA@xxxxxxxxxxxxxx/
> >
> > As a clue, when building in this environment I get:
> >
> >   CC      mm/hugetlb.o
> > mm/hugetlb.c: In function ‘huge_pte_offset’:
> > cc1: warning: function may return address of local variable [-Wreturn-local-addr]
> > mm/hugetlb.c:5361:14: note: declared here
> >   pud_t *pud, pud_entry;
> >               ^~~~~~~~~
> > cc1: warning: function may return address of local variable [-Wreturn-local-addr]
> > mm/hugetlb.c:5361:14: note: declared here
> > cc1: warning: function may return address of local variable [-Wreturn-local-addr]
> > mm/hugetlb.c:5360:14: note: declared here
> >   p4d_t *p4d, p4d_entry;
> >               ^~~~~~~~~

Yes, this is certainly very bad.

> Non-PAE uses ModeB / PSE paging, which only has 2-level page tables.  The
> non-existent levels get folded in and pmd_offset/pud_offset() return the
> passed in pointer instead of accessing a table, e.g.:
>
>   static inline pmd_t * pmd_offset(pud_t * pud, unsigned long address)
>   {
>           return (pmd_t *)pud;
>   }

> The bug probably only manifests with PSE paging because it can have huge
> pages in the top-level table, i.e. is the only mode that can get a false
> positive.

> This is arguably a bug in pmd_huge/pud_huge(), seems like they should
> unconditionally return false if the relevant level doesn't exist.

The issue is that, to get READ_ONCE semantics for a lockless flow, the
patch hackily defeats the de-reference inside pXX_offset() by passing in
a pointer to a stack variable.

This is fine unless you actually care about the *address* of the result
of pXX_offset(), which huge_pte_offset() does.

I can't think of an easy fix here. Andrew, I think this patch has to be
dropped :(

Longpeng can fix the direct bug he saw by not changing the pXX_offset()
calls, but the extra de-reference will then remain a theoretical/rare bug
according to the memory model.

Maybe we need to change pXX_offset() to take in both the pointer and the
de-referenced value?

Jason
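
For anyone following along, here is a minimal user-space sketch of the failure mode, not kernel code. The pud_t/pmd_t structs are simplified stand-ins, the two *_points_into_table() helpers are hypothetical names made up for illustration, and a plain struct copy stands in for READ_ONCE(). Only the folded pmd_offset() mirrors what Sean quoted. The comparison is done while the local is still live, so the sketch itself never uses a dangling pointer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel types (assumption, not the real ones). */
typedef struct { uintptr_t val; } pud_t;
typedef struct { uintptr_t val; } pmd_t;

/*
 * Folded level, as in the quoted non-PAE pmd_offset(): no table is
 * accessed, the pointer that was passed in is simply handed back.
 */
static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
{
	(void)address;
	return (pmd_t *)pud;
}

/* Hypothetical helper: the pre-patch pattern, passing the table pointer. */
bool direct_lookup_points_into_table(pud_t *pud, unsigned long address)
{
	pmd_t *pmd = pmd_offset(pud, address);

	/* With a folded level the result aliases the table entry itself. */
	return (void *)pmd == (void *)pud;
}

/* Hypothetical helper: the patched pattern, passing a stack snapshot. */
bool snapshot_lookup_points_into_table(pud_t *pud, unsigned long address)
{
	pud_t pud_entry = *pud;	/* plain copy standing in for READ_ONCE() */
	pmd_t *pmd = pmd_offset(&pud_entry, address);

	/*
	 * With a folded level, pmd is now &pud_entry, i.e. a stack address.
	 * Returning it to the caller is what -Wreturn-local-addr warns about.
	 */
	return (void *)pmd == (void *)pud;
}
```

With any valid entry pointer, the first helper reports true and the second reports false: the snapshot approach yields the right *value* but the wrong *address*, which matters because huge_pte_offset() returns that address to its caller.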