On 08.03.19 03:24, Michael S. Tsirkin wrote: > On Thu, Mar 07, 2019 at 08:27:32PM +0100, David Hildenbrand wrote: >> On 07.03.19 19:53, Michael S. Tsirkin wrote: >>> On Thu, Mar 07, 2019 at 10:45:58AM -0800, Alexander Duyck wrote: >>>> To that end what I think w may want to do is instead just walk the LRU >>>> list for a given zone/order in reverse order so that we can try to >>>> identify the pages that are most likely to be cold and unused and >>>> those are the first ones we want to be hinting on rather than the ones >>>> that were just freed. If we can look at doing something like adding a >>>> jiffies value to the page indicating when it was last freed we could >>>> even have a good point for determining when we should stop processing >>>> pages in a given zone/order list. >>>> >>>> In reality the approach wouldn't be too different from what you are >>>> doing now, the only real difference would be that we would just want >>>> to walk the LRU list for the given zone/order rather then pulling >>>> hints on what to free from the calls to free_one_page. In addition we >>>> would need to add a couple bits to indicate if the page has been >>>> hinted on, is in the middle of getting hinted on, and something such >>>> as the jiffies value I mentioned which we could use to determine how >>>> old the page is. >>> >>> Do we really need bits in the page? >>> Would it be bad to just have a separate hint list? >>> >>> If you run out of free memory you can check the hint >>> list, if you find stuff there you can spin >>> or kick the hypervisor to hurry up. >>> >>> Core mm/ changes, so nothing's easy, I know. >> >> We evaluated the idea of busy spinning on some bit/list entry a while >> ago. While it sounds interesting, it is usually not what we want and has >> other negative performance impacts. >> >> Talking about "marking" pages, what we actually would want is to rework >> the buddy to skip over these "marked" pages and only really spin in case >> there are no other pages left. Allocation paths should only ever be >> blocked if OOM, not if just some hinting activity is going on on another >> VCPU. >> >> However as you correctly say: "core mm changes". New page flag? >> Basically impossible. > > Well not exactly. page bits are at a premium but only for > *allocated* pages. pages in the buddy are free and there are > some unused bits for these. > As I said, we have to be very careful here. Most parts of struct page can me modified by *the owner* of the page. In case the page is online but not allocated, buddy is the owner. Not some kvm/virtio thingy that hooks into some callback. Manipulating random page bits of buddy pages in *some* kernel module I consider problematic and will most probably not be accepted upstream. What could work is, factoring out these parts e.g. into mm/page_hinting.c, then it gets part of the core mm in some way. Which would actually be a nice thing to do either way we go. -- Thanks, David / dhildenb