On Wed, Oct 28, 2020 at 09:03:31PM +0000, Edgecombe, Rick P wrote: > > On Wed, Oct 28, 2020 at 11:20:12AM +0000, Will Deacon wrote: > > > On Tue, Oct 27, 2020 at 10:38:16AM +0200, Mike Rapoport wrote: > > > > > > > > This is a theoretical bug, but it is still not nice :) > > > > > > > > > > Just to clarify: this patch series fixes this problem, right? > > > > Yes. > > > > Well, now I'm confused again. > > As David pointed, __vunmap() should not be executing simultaneously > with the hibernate operation because hibernate can't snapshot while > data it needs to save is still updating. If a thread was paused when a > page was in an "invalid" state, it should be remapped by hibernate > before the copy. > > To level set, before reading this mail, my takeaways from the > discussions on potential hibernate/debug page alloc problems were: > > Potential RISC-V issue: > Doesn't have hibernate support > > Potential ARM issue: > The logic around when it's cpa determines pages might be unmapped looks > correct for current callers. > > Potential x86 page break issue: > Seems to be ok for now, but a new set_memory_np() caller could violate > assumptions in hibernate. > > Non-obvious thorny logic: > General agreement it would be good to separate dependencies. > > Behavior of V1 of this patchset: > No functional change other than addition of a warn in hibernate. There is a change that adds explicit use of set_direct_map() to hibernate. Currently, in case of arm64 with DEBUG_PAGEALLOC=n if a thread was paused when a page was in an "invalid" state hibernate will access an unmapped data because __kernel_map_pages() will bail out. After the change set_direct_map_default_noflush() would be used and the page will get mapped before copy. > So "does this fix the problem", "yes" leaves me a bit confused... Not > saying there couldn't be any problems, especially due to the thorniness > and cross arch stride, but what is it exactly and how does this series > fix it? This series goal was primarily to separate dependincies and make it clearer what DEBUG_PAGEALLOC and what SET_DIRECT_MAP are. As it turned out, there is also some lack of consistency between architectures that implement either of this so I tried to improve this as well. Honestly, I don't know if a thread can be paused at the time __vunmap() left invalid pages, but it could, there is an issue on arm64 with DEBUG_PAGEALLOC=n and this set fixes it. __vunmap() vm_remove_mappings() set_direct_map_invalid() /* thread is frozen */ safe_copy_page() __kernel_map_pages() if (!debug_pagealloc()) return do_copy_page() -> fault -- Sincerely yours, Mike.