On 12/5/19 11:22 AM, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@xxxxxxxxxxxxxxx>
>
> In order to pave the way for free page reporting in virtualized
> environments we will need a way to get pages out of the free lists and
> identify those pages after they have been returned. To accomplish this,
> this patch adds the concept of a Reported Buddy, which is essentially
> meant to just be the Uptodate flag used in conjunction with the Buddy
> page type.
>
> To prevent the reported pages from leaking outside of the buddy lists I
> added a check to clear the PageReported bit in the del_page_from_free_list
> function. As a result any reported page that is split, merged, or
> allocated will have the flag cleared prior to the PageBuddy value being
> cleared.
>
> The process for reporting pages is fairly simple. Once we free a page that
> meets the minimum order for page reporting we will schedule a worker thread
> to start 2s or more in the future. That worker thread will begin working
> from the lowest supported page reporting order up to MAX_ORDER - 1 pulling
> unreported pages from the free list and storing them in the scatterlist.
>
> When processing each individual free list it is necessary for the worker
> thread to release the zone lock when it needs to stop and report the full
> scatterlist of pages. To reduce the work of the next iteration the worker
> thread will rotate the free list so that the first unreported page in the
> free list becomes the first entry in the list.

[...]
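
The batching described above can be sketched in userspace roughly as follows. This is only an illustration under stated assumptions, not the patch's code: `struct batch`, `batch_add()`, `batch_report()` and `batch_flush_leftover()` are hypothetical names, a plain array stands in for the scatterlist, and a capacity of 32 is assumed. Filling from the tail downward is inferred from the offset arithmetic in the quoted page_reporting_process_zone() (offset starts at the capacity, leftover = capacity - offset); page_reporting_cycle() itself is not shown in this hunk.

```c
#include <assert.h>
#include <stddef.h>

#define CAPACITY 32	/* assumed stand-in for PAGE_REPORTING_CAPACITY */

/* Hypothetical stand-in for the scatterlist plus some counters. */
struct batch {
	unsigned long pages[CAPACITY];
	unsigned int offset;	/* CAPACITY means empty, 0 means full */
	unsigned int reports;	/* number of report calls issued */
	unsigned long reported;	/* total number of pages reported */
};

static void batch_init(struct batch *b)
{
	b->offset = CAPACITY;
	b->reports = 0;
	b->reported = 0;
}

/* Stand-in for prdev->report(): only counts what it was handed. */
static void batch_report(struct batch *b, unsigned int start,
			 unsigned int count)
{
	assert(start + count <= CAPACITY);
	b->reports++;
	b->reported += count;
}

/* Store one page at the tail; report the whole batch once it is full. */
static void batch_add(struct batch *b, unsigned long page)
{
	assert(b->offset > 0);
	b->pages[--b->offset] = page;

	if (b->offset == 0) {
		/* The patch drops the zone lock around this step. */
		batch_report(b, 0, CAPACITY);
		b->offset = CAPACITY;
	}
}

/* Report whatever is left before going idle; returns the page count. */
static unsigned int batch_flush_leftover(struct batch *b)
{
	unsigned int leftover = CAPACITY - b->offset;

	if (leftover) {
		batch_report(b, b->offset, leftover);
		b->offset = CAPACITY;
	}
	return leftover;
}
```

With 70 pages added, this sketch issues two full reports of 32 and leaves 6 pages for the final leftover report.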
> k);
> +
> +	return err;
> +}
> +
> +static int
> +page_reporting_process_zone(struct page_reporting_dev_info *prdev,
> +			    struct scatterlist *sgl, struct zone *zone)
> +{
> +	unsigned int order, mt, leftover, offset = PAGE_REPORTING_CAPACITY;
> +	unsigned long watermark;
> +	int err = 0;
> +
> +	/* Generate minimum watermark to be able to guarantee progress */
> +	watermark = low_wmark_pages(zone) +
> +		    (PAGE_REPORTING_CAPACITY << PAGE_REPORTING_MIN_ORDER);
> +
> +	/*
> +	 * Cancel request if insufficient free memory or if we failed
> +	 * to allocate page reporting statistics for the zone.
> +	 */
> +	if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
> +		return err;
> +

Would it make more sense to check the low watermark condition before
every reporting request, i.e. before each batch of 32 isolated pages is
reported, rather than once per zone? Or would that be too costly?

> +	/* Process each free list starting from lowest order/mt */
> +	for (order = PAGE_REPORTING_MIN_ORDER; order < MAX_ORDER; order++) {
> +		for (mt = 0; mt < MIGRATE_TYPES; mt++) {
> +			/* We do not pull pages from the isolate free list */
> +			if (is_migrate_isolate(mt))
> +				continue;
> +
> +			err = page_reporting_cycle(prdev, zone, order, mt,
> +						   sgl, &offset);
> +			if (err)
> +				return err;
> +		}
> +	}
> +
> +	/* report the leftover pages before going idle */
> +	leftover = PAGE_REPORTING_CAPACITY - offset;
> +	if (leftover) {
> +		sgl = &sgl[offset];
> +		err = prdev->report(prdev, sgl, leftover);
> +
> +		/* flush any remaining pages out from the last report */
> +		spin_lock_irq(&zone->lock);
> +		page_reporting_drain(prdev, sgl, leftover, !err);
> +		spin_unlock_irq(&zone->lock);
> +	}
> +
> +	return err;
> +}

--
Nitesh
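
P.S. The watermark question above can be made concrete with a small userspace sketch of the arithmetic. It is illustrative only: `reporting_watermark()` and `can_report_batch()` are hypothetical helpers, the capacity of 32 is taken from the batch size mentioned in the question, and a minimum order of 9 is purely an assumption since the value of PAGE_REPORTING_MIN_ORDER is not visible in the quoted hunk. The comparison is a simplified stand-in for zone_watermark_ok(), which takes more into account than a single free-page count.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define CAPACITY	32	/* assumed stand-in for PAGE_REPORTING_CAPACITY */
#define MIN_ORDER	9	/* assumed stand-in for PAGE_REPORTING_MIN_ORDER */

/* Mirrors: low_wmark_pages(zone) + (CAPACITY << MIN_ORDER) */
static unsigned long reporting_watermark(unsigned long low_wmark)
{
	unsigned long batch_pages = (unsigned long)CAPACITY << MIN_ORDER;

	/* Guard against wrap-around in this toy model. */
	assert(low_wmark <= ULONG_MAX - batch_pages);
	return low_wmark + batch_pages;
}

/*
 * Simplified check that could run before each batch instead of once
 * per zone: report only while free memory stays above the low
 * watermark by more than one minimum-order batch.
 */
static bool can_report_batch(unsigned long free_pages,
			     unsigned long low_wmark)
{
	return free_pages > reporting_watermark(low_wmark);
}
```

In this model the per-batch cost is one addition and one comparison; whether the real zone_watermark_ok() call is cheap enough to repeat for every batch is exactly the open question.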