On 07.03.19 22:14, Alexander Duyck wrote: > On Thu, Mar 7, 2019 at 10:53 AM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote: >> >> On Thu, Mar 07, 2019 at 10:45:58AM -0800, Alexander Duyck wrote: >>> To that end what I think w may want to do is instead just walk the LRU >>> list for a given zone/order in reverse order so that we can try to >>> identify the pages that are most likely to be cold and unused and >>> those are the first ones we want to be hinting on rather than the ones >>> that were just freed. If we can look at doing something like adding a >>> jiffies value to the page indicating when it was last freed we could >>> even have a good point for determining when we should stop processing >>> pages in a given zone/order list. >>> >>> In reality the approach wouldn't be too different from what you are >>> doing now, the only real difference would be that we would just want >>> to walk the LRU list for the given zone/order rather then pulling >>> hints on what to free from the calls to free_one_page. In addition we >>> would need to add a couple bits to indicate if the page has been >>> hinted on, is in the middle of getting hinted on, and something such >>> as the jiffies value I mentioned which we could use to determine how >>> old the page is. >> >> Do we really need bits in the page? >> Would it be bad to just have a separate hint list? > > The issue is lists are expensive to search. If we have a single bit in > the page we can check it as soon as we have the page. > >> If you run out of free memory you can check the hint >> list, if you find stuff there you can spin >> or kick the hypervisor to hurry up. > > This implies you are keeping a separate list of pages for what has > been hinted on. If we are pulling pages out of the LRU list for that > it will require the zone lock to move the pages back and forth and for > higher core counts that isn't going to scale very well, and if you are > trying to pull out a page that is currently being hinted on you will > run into the same issue of having to wait for the hint to be completed > before proceeding. > >> Core mm/ changes, so nothing's easy, I know. > > We might be able to reuse some existing page flags. For example, there > is the PG_young and PG_idle flags that would actually be a pretty good > fit in terms of what we are looking for in behavior. We could set > PG_young when the page is initially freed, then clear it when we start > to perform the hint, and set PG_idle once the hint has been completed. Just noting that when hinting, we have to set all affected sub-page bits as far as I see. > > The check for if we could use a page would be pretty fast as a result > as well since if PG_young or PG_idle are set it means the page is free > to use so the check in arch_alloc_page would be pretty cheap since we > could probably test for both bits in one read. > I still dislike spinning on ordinary allocation paths. If we want to go that way, core mm has to consider these bits and try other pages first. -- Thanks, David / dhildenb