On Tue, Oct 27, 2020 at 10:38:16AM +0200, Mike Rapoport wrote:
> On Mon, Oct 26, 2020 at 06:05:30PM +0000, Edgecombe, Rick P wrote:
> > On Mon, 2020-10-26 at 11:05 +0200, Mike Rapoport wrote:
> > > On Mon, Oct 26, 2020 at 01:13:52AM +0000, Edgecombe, Rick P wrote:
> > > > On Sun, 2020-10-25 at 12:15 +0200, Mike Rapoport wrote:
> > > > > Indeed, for architectures that define
> > > > > CONFIG_ARCH_HAS_SET_DIRECT_MAP it is possible that
> > > > > __kernel_map_pages() would fail, but since this function is
> > > > > void, the failure will go unnoticed.
> > > >
> > > > Could you elaborate on how this could happen? Do you mean during
> > > > runtime today or if something new was introduced?
> > >
> > > A failure in __kernel_map_pages() may happen today. For instance,
> > > on x86 if the kernel is built with DEBUG_PAGEALLOC,
> > >
> > > 	__kernel_map_pages(page, 1, 0);
> > >
> > > will need to split, say, a 2M page, and during the split an
> > > allocation of a page table could fail.
> >
> > On x86 at least, DEBUG_PAGEALLOC expects to never have to break a
> > page on the direct map and even disables locking in cpa because it
> > assumes this. If this is happening somehow anyway then we should
> > probably fix that. Even if it's a debug feature, it will not be as
> > useful if it is causing its own crashes.
> >
> > I'm still wondering if there is something I'm missing here. It seems
> > like you are saying there is a bug in some arches, so let's add a
> > WARN in cross-arch code to log it as it crashes. A WARN and making
> > things clearer seem like good ideas, but if there is a bug we should
> > fix it. The code around the callers still functionally assumes that
> > re-mapping can't fail.
>
> Oh, I meant an x86 kernel *without* DEBUG_PAGEALLOC, and indeed the
> call that unmaps pages back in safe_copy_page() will just reset a 4K
> page to NP, because whatever made it NP in the first place already did
> the split.
>
> Still, on arm64 with DEBUG_PAGEALLOC=n there is a possibility of a
> race between the map/unmap dance in __vunmap() and safe_copy_page()
> that may cause access to unmapped memory:
>
> 	__vunmap()
> 	    vm_remove_mappings()
> 	        set_direct_map_invalid()
> 					safe_copy_page()
> 					    __kernel_map_pages()
> 					    return
> 					    do_copy_page() -> fault
>
> This is a theoretical bug, but it is still not nice :)

Just to clarify: this patch series fixes this problem, right?

Will