>>>
>>> In addition we will need some way to identify which pages have been
>>> hinted on and which have not. The way I believe easiest to do this
>>> would be to overload the PageType value so that we could essentially
>>> have two values for "Buddy" pages. We would have our standard "Buddy"
>>> pages, and "Buddy" pages that also have the "Offline" value set in the
>>> PageType field. Tracking the Online vs Offline pages this way would
>>> actually allow us to do this with almost no overhead as the mapcount
>>> value is already being reset to clear the "Buddy" flag so adding an
>>> "Offline" flag to this clearing should come at no additional cost.
>>
>> Just noting here that this will require modifications to kdump
>> (makedumpfile to be precise, and the vmcore information exposed from
>> the kernel), as kdump only checks for the actual mapcount value to
>> detect buddy and offline pages (to exclude them from dumps); they are
>> not treated as flags.
>>
>> For now, any mapcount values are really only separate values, meaning
>> the individual bits are not of interest, as they would be for flags.
>> Reusing other flags would make our life a lot easier. E.g. PG_young or
>> so. But clearing of these is then the problematic part.
>>
>> Of course we could use two values in the kernel, Buddy and BuddyOffline.
>> But then we have to check for two different values whenever we want to
>> identify a buddy page in the kernel.
>
> Actually this may not be working the way you think it is working.

Trust me, I know how it works. That's why I was giving you the notice.

Read the first paragraph again and ignore the others. I am only
concerned about makedumpfile, which has to be changed.

PAGE_OFFLINE_MAPCOUNT_VALUE
PAGE_BUDDY_MAPCOUNT_VALUE

Once you find out how these values are used, you should understand what
has to be changed and where.

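To make the makedumpfile concern concrete, here is a minimal sketch of
why a whole-value comparison stops matching once a page carries both
types at once. It is not code from the kernel or from makedumpfile: the
bit values, the macro names and the helpers excluded_by_value() and
excluded_by_flags() are all assumptions made up for this illustration,
and the real definitions should be checked in include/linux/page-flags.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout (assumed; check include/linux/page-flags.h). */
#define PAGE_TYPE_BASE 0xf0000000u
#define PG_BUDDY       0x00000080u
#define PG_OFFLINE     0x00000100u

/*
 * A type is "set" by clearing its bit in an all-ones field, so the value
 * seen for a page with exactly one type is the inversion of that bit.
 */
#define BUDDY_VALUE    (~PG_BUDDY)
#define OFFLINE_VALUE  (~PG_OFFLINE)

/* Hypothetical dump-tool filter: compares the whole value. */
static bool excluded_by_value(uint32_t page_type)
{
	return page_type == BUDDY_VALUE || page_type == OFFLINE_VALUE;
}

/* Flag-style test: upper bits intact and the bit for this type cleared. */
static bool has_type(uint32_t page_type, uint32_t flag)
{
	return (page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE;
}

/* What a filter would need to do if the types can be combined. */
static bool excluded_by_flags(uint32_t page_type)
{
	return has_type(page_type, PG_BUDDY) || has_type(page_type, PG_OFFLINE);
}

static void check_examples(void)
{
	uint32_t buddy = ~PG_BUDDY;
	uint32_t buddy_offline = ~(PG_BUDDY | PG_OFFLINE);

	assert(excluded_by_value(buddy));          /* plain buddy: matches */
	assert(!excluded_by_value(buddy_offline)); /* combined: slips through */
	assert(excluded_by_flags(buddy_offline));  /* flag test still matches */
}
```

Under these assumptions, a page marked as both "Buddy" and "Offline"
matches neither exported value, so a tool that only compares values
would no longer exclude it from the dump.
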
>>>
>>> Lastly we would need to create a specialized function for allocating
>>> the non-"Offline" pages, and to tweak __free_one_page to tail enqueue
>>> "Offline" pages. I'm thinking the alloc function would look
>>> something like __rmqueue_smallest but without the "expand" and needing
>>> to modify the !page check to also include a check to verify the page
>>> is not "Offline". As far as the changes to __free_one_page, it would be
>>> a 2 line change to test for the PageType being offline, and if it is,
>>> to call add_to_free_area_tail instead of add_to_free_area.
>>
>> As already mentioned, there might be scenarios where the additional
>> hinting thread might consume too many CPU cycles, especially if there is
>> little guest activity and you mostly spend time scanning a handful of
>> free pages and reporting them. I wonder if we can somehow limit the
>> amount of wakeups/scans for a given period to mitigate this issue.
>
> That is why I was talking about breaking nr_free into nr_freed and
> nr_bound. By doing that I can record the nr_free value to a
> virtio-balloon specific location at the start of any walk and should
> know exactly how many pages were freed between that call and the next
> one. By ordering things such that we place the "Offline" pages on the
> tail of the list it should make the search quite fast since we would
> just be always allocating off of the head of the queue until we have
> hinted everything in the queue. So when we hit the last call to alloc
> the non-"Offline" pages and shut down our thread we can use the
> nr_freed value that we recorded to know exactly how many pages have
> been added that haven't been hinted.
>
>> One main issue I see with your approach is that we need quite a lot of
>> core memory management changes. This is a problem. I wonder if we can
>> factor out most parts into callbacks.
>
> I think that is something we can't get away from.
> However if we make this generic enough there would likely be others
> beyond just the virtualization drivers that could make use of the
> infrastructure. For example being able to track the rate at which the
> free areas are cycling in and out pages seems like something that would
> be useful outside of just the virtualization areas.

Might be, but it might also be the other extreme: people not wanting
such special cases in core mm. I assume the latter until I see a very
clear design where such stuff has been properly factored out.

>
>> E.g. in order to detect where to queue a certain page (front/tail), call
>> a callback if one is registered, mark/check pages in a core-mm unknown
>> way as offline etc.
>>
>> I still wonder if there could be an easier way to combine recording of
>> hints and one hinting thread, essentially avoiding scanning and some of
>> the required core-mm changes.
>
> The concern I have with trying to avoid the scanning by tracking is
> that if you fall behind it becomes something where just tracking the
> metadata for the page hints would start to become expensive.

Depends, if it is mostly only marking a bit in a bitmap, it should in
general not be too much of an issue. As usual, the data structure used
is the important bit.

-- 

Thanks,

David / dhildenb