On 8/12/19 2:47 PM, Alexander Duyck wrote: > On Mon, Aug 12, 2019 at 6:13 AM Nitesh Narayan Lal <nitesh@xxxxxxxxxx> wrote: >> This patch introduces the core infrastructure for free page reporting in >> virtual environments. It enables the kernel to track the free pages which >> can be reported to its hypervisor so that the hypervisor could >> free and reuse that memory as per its requirement. >> >> While the pages are getting processed in the hypervisor (e.g., >> via MADV_DONTNEED), the guest must not use them, otherwise, data loss >> would be possible. To avoid such a situation, these pages are >> temporarily removed from the buddy. The amount of pages removed >> temporarily from the buddy is governed by the backend(virtio-balloon >> in our case). >> >> To efficiently identify free pages that can to be reported to the >> hypervisor, bitmaps in a coarse granularity are used. Only fairly big >> chunks are reported to the hypervisor - especially, to not break up THP >> in the hypervisor - "MAX_ORDER - 2" on x86, and to save space. The bits >> in the bitmap are an indication whether a page *might* be free, not a >> guarantee. A new hook after buddy merging sets the bits. >> >> Bitmaps are stored per zone, protected by the zone lock. A workqueue >> asynchronously processes the bitmaps, trying to isolate and report pages >> that are still free. The backend (virtio-balloon) is responsible for >> reporting these batched pages to the host synchronously. Once reporting/ >> freeing is complete, isolated pages are returned back to the buddy. >> >> Signed-off-by: Nitesh Narayan Lal <nitesh@xxxxxxxxxx> > [...] >> +} >> + >> +/** >> + * __page_reporting_enqueue - tracks the freed page in the respective zone's >> + * bitmap and enqueues a new page reporting job to the workqueue if possible. >> + */ >> +void __page_reporting_enqueue(struct page *page) >> +{ >> + struct page_reporting_config *phconf; >> + struct zone *zone; >> + >> + rcu_read_lock(); >> + /* >> + * We should not process this page if either page reporting is not >> + * yet completely enabled or it has been disabled by the backend. >> + */ >> + phconf = rcu_dereference(page_reporting_conf); >> + if (!phconf) >> + return; >> + >> + zone = page_zone(page); >> + bitmap_set_bit(page, zone); >> + >> + /* >> + * We should not enqueue a job if a previously enqueued reporting work >> + * is in progress or we don't have enough free pages in the zone. >> + */ >> + if (atomic_read(&zone->free_pages) >= phconf->max_pages && >> + !atomic_cmpxchg(&phconf->refcnt, 0, 1)) > This doesn't make any sense to me. Why are you only incrementing the > refcount if it is zero? Combining this with the assignment above, this > isn't really a refcnt. It is just an oversized bitflag. The intent for having an extra variable was to ensure that at a time only one reporting job is enqueued. I do agree that for that purpose I really don't need a reference counter and I should have used something like bool 'page_hinting_active'. But with bool, I think there could be a possible chance of race. Maybe I should rename this variable and keep it as atomic. Any thoughts? > > Also I am pretty sure this results in the opportunity to miss pages > because there is nothing to prevent you from possibly missing a ton of > pages you could hint on if a large number of pages are pushed out all > at once and then the system goes idle in terms of memory allocation > and freeing. I was looking at how you are enqueuing/processing reporting jobs for each zone. I am wondering if I should also consider something on similar lines as having that I might be able to address the concern which you have raised above. But it would also mean that I have to add an additional flag in the zone_flags. :) > [...] > >> +EXPORT_SYMBOL_GPL(page_reporting_enable); >> -- >> 2.21.0 >> -- Thanks Nitesh