On Mon, Oct 19, 2020 at 11:04:40AM +0200, David Hildenbrand wrote: >On 18.10.20 05:57, Wei Yang wrote: >> On Fri, Oct 16, 2020 at 11:18:39AM +0200, David Hildenbrand wrote: >>> On 16.10.20 06:03, Wei Yang wrote: >>>> On Mon, Oct 12, 2020 at 02:53:03PM +0200, David Hildenbrand wrote: >>>>> Let's trigger from offlining code when we're not allowed to touch online >> >> Here "touch" means "unplug"? If so, maybe s/touch/unplug/ would be more easy >> to understand. > >Yes, much better. > >[...] > >> I am trying to get more understanding about the logic of virtio_mem_retry(). >> >> Current logic seems clear to me. There are four places to trigger it: >> >> * notify_offline >> * notify_online >> * timer_expired >> * config_changed >> >> In this patch, we try to optimize the first case, notify_offline. > >Yes. > >> >> Now, we would always trigger retry when one of our memory block get offlined. >> Per my understanding, this logic is correct while missed one case (or be more >> precise, not handle one case timely). The case this patch wants to improve is >> virtio_mem_mb_remove(). If my understanding is correct. >> > >Yes, that's one part of it. Read below. > >> virtio_mem_run_wq() >> virtio_mem_unplug_request() >> virtio_mem_mb_unplug_any_sb_offline() >> virtio_mem_mb_remove() --- 1 >> virtio_mem_mb_unplug_any_sb_online() >> virtio_mem_mb_offline_and_remove() --- 2 >> >> The above is two functions this patch adjusts. For 2), it will offline the >> memory block, thus will trigger virtio_mem_retry() originally. But for 1), the >> memory block is already offlined, so virtio_mem_retry() will not be triggered >> originally. This is the case we want to improve in this patch. Instead of wait >> for timer expire, we trigger retry immediately after unplug/remove an offlined >> memory block. >> >> And after this change, this patch still adjust the original >> virtio_mem_notify_offline() path to just trigger virtio_mem_retry() when >> unplug_online is false. (This means the offline event is notified from user >> space instead of from unplug event). >> >> If my above analysis is correct, I got one small suggestion for this patch. >> Instead of adjust current notify_offline handling, how about just trigger >> retry during virtio_mem_mb_remove()? Since per my understanding, we just want >> to do immediate trigger retry when unplug an offlined memory block. > >I probably should have added the following to the patch description: > >"This is a preparation for Big Block Mode (BBM), whereby we can see some >temporary offlining of memory blocks without actually making progress" > >Imagine you have a Big Block that spans to Linux memory blocks. Assume >the first Linux memory blocks has no unmovable data on it. > >Assume you call offline_and_remove_memory() > >1. Try to offline the first block. Works, notifiers triggered. >virtio_mem_retry(). After this patch, the virtio_mem_retry() is remove here. >2. Try to offline the second block. Does not work. >3. Re-online first block. >4. Exit to main loop, exit workqueue. Since offline_and_remove_memory() doesn't succeed, virtio_mem_retry() is not triggered. >5. Retry immediately (due to virtio_mem_retry()), go to 1. So we won't have endless loop. > >So, you'll keep retrying forever. Found while debugging that exact issue :) > If this is the case, my suggestion is to record it in the changelog. Otherwise, we may lose this corner case which is important to this change. > >-- >Thanks, > >David / dhildenb -- Wei Yang Help you, Help me