On Thu, Mar 7, 2019 at 1:28 PM David Hildenbrand <david@xxxxxxxxxx> wrote: > > On 07.03.19 22:14, Alexander Duyck wrote: > > On Thu, Mar 7, 2019 at 10:53 AM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote: > >> > >> On Thu, Mar 07, 2019 at 10:45:58AM -0800, Alexander Duyck wrote: > >>> To that end what I think w may want to do is instead just walk the LRU > >>> list for a given zone/order in reverse order so that we can try to > >>> identify the pages that are most likely to be cold and unused and > >>> those are the first ones we want to be hinting on rather than the ones > >>> that were just freed. If we can look at doing something like adding a > >>> jiffies value to the page indicating when it was last freed we could > >>> even have a good point for determining when we should stop processing > >>> pages in a given zone/order list. > >>> > >>> In reality the approach wouldn't be too different from what you are > >>> doing now, the only real difference would be that we would just want > >>> to walk the LRU list for the given zone/order rather then pulling > >>> hints on what to free from the calls to free_one_page. In addition we > >>> would need to add a couple bits to indicate if the page has been > >>> hinted on, is in the middle of getting hinted on, and something such > >>> as the jiffies value I mentioned which we could use to determine how > >>> old the page is. > >> > >> Do we really need bits in the page? > >> Would it be bad to just have a separate hint list? > > > > The issue is lists are expensive to search. If we have a single bit in > > the page we can check it as soon as we have the page. > > > >> If you run out of free memory you can check the hint > >> list, if you find stuff there you can spin > >> or kick the hypervisor to hurry up. > > > > This implies you are keeping a separate list of pages for what has > > been hinted on. If we are pulling pages out of the LRU list for that > > it will require the zone lock to move the pages back and forth and for > > higher core counts that isn't going to scale very well, and if you are > > trying to pull out a page that is currently being hinted on you will > > run into the same issue of having to wait for the hint to be completed > > before proceeding. > > > >> Core mm/ changes, so nothing's easy, I know. > > > > We might be able to reuse some existing page flags. For example, there > > is the PG_young and PG_idle flags that would actually be a pretty good > > fit in terms of what we are looking for in behavior. We could set > > PG_young when the page is initially freed, then clear it when we start > > to perform the hint, and set PG_idle once the hint has been completed. > > Just noting that when hinting, we have to set all affected sub-page bits > as far as I see. You may be correct there. One thing I hadn't thought about is what happens if the page is split or merged up to a higher order. I guess I could be talked into being okay with a side list that we maintain a few pages in that are isolated from the rest. > > > > The check for if we could use a page would be pretty fast as a result > > as well since if PG_young or PG_idle are set it means the page is free > > to use so the check in arch_alloc_page would be pretty cheap since we > > could probably test for both bits in one read. > > > > I still dislike spinning on ordinary allocation paths. If we want to go > that way, core mm has to consider these bits and try other pages first. Agreed. I was just thinking that would be follow-on work since in my mind the collision rate for these should be low.