On Mon, Oct 19, 2020 at 10:50:39AM +0200, David Hildenbrand wrote: >On 19.10.20 09:54, Wei Yang wrote: >> On Mon, Oct 12, 2020 at 02:53:23PM +0200, David Hildenbrand wrote: >>> Let's add a safe mechanism to unplug memory, avoiding long/endless loops >>> when trying to offline memory - similar to in SBM. >>> >>> Fake-offline all memory (via alloc_contig_range()) before trying to >>> offline+remove it. Use this mode as default, but allow to enable the other >>> mode explicitly (which could give better memory hotunplug guarantees in >> >> I don't get the point how unsafe mode would have a better guarantees? > >It's primarily only relevant when there is a lot of concurrent action >going on while unplugging memory. Using alloc_contig_range() on >ZONE_MOVABLE can fail more easily than memory offlining. > >alloc_contig_range() doesn't try as hard as memory offlining code to >isolate memory. There are known issues with temporary page pinning >(e.g., when a process dies) and the PCP. (mostly discovered via CMA >allocations) > >See the TODO I add in patch #14. > >[...] >>> >>> + if (bbm_safe_unplug) { >>> + /* >>> + * Start by fake-offlining all memory. Once we marked the device >>> + * block as fake-offline, all newly onlined memory will >>> + * automatically be kept fake-offline. Protect from concurrent >>> + * onlining/offlining until we have a consistent state. >>> + */ >>> + mutex_lock(&vm->hotplug_mutex); >>> + virtio_mem_bbm_set_bb_state(vm, bb_id, >>> + VIRTIO_MEM_BBM_BB_FAKE_OFFLINE); >>> + >> >> State is set here. >> >>> + for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) { >>> + page = pfn_to_online_page(pfn); >>> + if (!page) >>> + continue; >>> + >>> + rc = virtio_mem_fake_offline(pfn, PAGES_PER_SECTION); >>> + if (rc) { >>> + end_pfn = pfn; >>> + goto rollback_safe_unplug; >>> + } >>> + } >>> + mutex_unlock(&vm->hotplug_mutex); >>> + } >>> + >>> rc = virtio_mem_bbm_offline_and_remove_bb(vm, bb_id); >>> - if (rc) >>> + if (rc) { >>> + if (bbm_safe_unplug) { >>> + mutex_lock(&vm->hotplug_mutex); >>> + goto rollback_safe_unplug; >>> + } >>> return rc; >>> + } >>> >>> rc = virtio_mem_bbm_unplug_bb(vm, bb_id); >>> if (rc) >> >> And changed to PLUGGED or UNUSED based on rc. > >Right, after offlining+remove succeeded. So no longer added to Linux. > >The final state depends on the success of the unplug request towards the >hypervisor. > >> >>> @@ -1987,6 +2069,17 @@ static int virtio_mem_bbm_offline_remove_and_unplug_bb(struct virtio_mem *vm, >>> virtio_mem_bbm_set_bb_state(vm, bb_id, >>> VIRTIO_MEM_BBM_BB_UNUSED); >>> return rc; >>> + >>> +rollback_safe_unplug: >>> + for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) { >>> + page = pfn_to_online_page(pfn); >>> + if (!page) >>> + continue; >>> + virtio_mem_fake_online(pfn, PAGES_PER_SECTION); >>> + } >>> + virtio_mem_bbm_set_bb_state(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED); >> >> And changed to ADDED if failed. > >Right, back to the initial state when entering this function. > >> >>> + mutex_unlock(&vm->hotplug_mutex); >>> + return rc; >>> } >> >> So in which case, the bbm state is FAKE_OFFLINE during >> virtio_mem_bbm_notify_going_offline() and >> virtio_mem_bbm_notify_cancel_offline() ? > >Exactly, so we can do our magic with fake-offline pages and our >virtio_mem_bbm_offline_and_remove_bb() can actually succeed. Ah, my fault. The exact code flow is this: virtio_mem_bbm_offline_remove_and_unplug_bb() virtio_mem_bbm_set_bb_state(vm, bb_id, VIRTIO_MEM_BBM_BB_FAKE_OFFLINE) virtio_mem_fake_offline(pfn, PAGES_PER_SECTION) virtio_mem_bbm_offline_and_remove_bb(vm, bb_id) offline and trigger memory notification --- 1) The notification is necessary at 1) to release the refcount, which is grabbed during fake offline. > > >-- >Thanks, > >David / dhildenb -- Wei Yang Help you, Help me