On Friday 08 May 2009, Wu Fengguang wrote:
> On Fri, May 08, 2009 at 07:11:30AM +0800, Rafael J. Wysocki wrote:
> > On Friday 08 May 2009, David Rientjes wrote:
> > > On Thu, 7 May 2009, Andrew Morton wrote:
> > > >
> > > > The setting and clearing of that thing looks gruesomely racy..
> > > >
> > > It's not racy currently, because zone_scan_lock ensures ZONE_OOM_LOCKED
> > > gets test/set and cleared atomically for the entire zonelist (the clear
> > > happens for the same zonelist that was test/set).
> > >
> > > Using it for hibernation in the way I've proposed will open it up to the
> > > race I described earlier: a kthread is in the oom killer and subsequently
> > > clears its zonelist of ZONE_OOM_LOCKED (all other tasks are frozen, so
> > > they can't be in the oom killer).  That's perfectly acceptable, however,
> > > since the system is by definition already oom if kthreads can't get
> > > memory, so the oom killer will end up killing a user task even though it
> > > is stuck in D state and will exit on thaw; we aren't concerned about
> > > killing needlessly, because the oom killer becomes a no-op when it finds
> > > a task that has already been killed but hasn't exited, by way of
> > > TIF_MEMDIE.
> >
> > OK there.
> >
> > So everyone seems to agree we can do something like in the patch below?
> >
> > ---
> > From: Rafael J. Wysocki <rjw@xxxxxxx>
> > Subject: PM/Hibernate: Rework shrinking of memory
> >
> > Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
> > just once to make some room for the image and then allocates memory
> > to apply more pressure to the memory management subsystem, if
> > necessary.
>
> Thanks!  Reducing this to a single pass helps memory-abundant laptops
> considerably :)
>
> > Unfortunately, we don't seem to be able to drop shrink_all_memory()
> > entirely just yet, because that would lead to huge performance
> > regressions in some test cases.
>
> Yes, but that's not the fault of this patch.  In fact, some of those
> regressions may even be useful pressure on the page allocation/reclaim
> code ;)
>
> > Signed-off-by: Rafael J. Wysocki <rjw@xxxxxxx>
> > ---
> >  kernel/power/snapshot.c |  151 +++++++++++++++++++++++++++++++++---------------
> >  1 file changed, 104 insertions(+), 47 deletions(-)
> >
> > Index: linux-2.6/kernel/power/snapshot.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/power/snapshot.c
> > +++ linux-2.6/kernel/power/snapshot.c
> > @@ -1066,69 +1066,126 @@ void swsusp_free(void)
> >  	buffer = NULL;
> >  }
> >
> > +/* Helper functions used for the shrinking of memory. */
> > +
> >  /**
> > - * swsusp_shrink_memory - Try to free as much memory as needed
> > - *
> > - * ... but do not OOM-kill anyone
> > + * preallocate_image_memory - Allocate given number of page frames
> > + * @nr_pages: Number of page frames to allocate
> >  *
> > - * Notice: all userland should be stopped before it is called, or
> > - * livelock is possible.
> > + * Return value: Number of page frames actually allocated
> >  */
> > -
> > -#define SHRINK_BITE 10000
> > -static inline unsigned long __shrink_memory(long tmp)
> > +static unsigned long preallocate_image_memory(unsigned long nr_pages)
> >  {
> > -	if (tmp > SHRINK_BITE)
> > -		tmp = SHRINK_BITE;
> > -	return shrink_all_memory(tmp);
> > +	unsigned long nr_alloc = 0;
> > +
> > +	while (nr_pages > 0) {
> > +		if (!alloc_image_page(GFP_KERNEL | __GFP_NOWARN))
> > +			break;
> > +		nr_pages--;
> > +		nr_alloc++;
> > +	}
> > +
> > +	return nr_alloc;
> >  }
> >
> > +/**
> > + * swsusp_shrink_memory - Make the kernel release as much memory as needed
> > + *
> > + * To create a hibernation image it is necessary to make a copy of every page
> > + * frame in use.  We also need a number of page frames to be free during
> > + * hibernation for allocations made while saving the image and for device
> > + * drivers, in case they need to allocate memory from their hibernation
> > + * callbacks (these two numbers are given by PAGES_FOR_IO and SPARE_PAGES,
> > + * respectively, both of which are rough estimates).  To make this happen, we
> > + * compute the total number of available page frames and allocate at least
> > + *
> > + * ([page frames total] + PAGES_FOR_IO + [metadata pages]) / 2 + 2 * SPARE_PAGES
> > + *
> > + * of them, which corresponds to the maximum size of a hibernation image.
> > + *
> > + * If image_size is set below the number following from the above formula,
> > + * the preallocation of memory is continued until the total number of page
> > + * frames in use is below the requested image size or it is impossible to
> > + * allocate more memory, whichever happens first.
> > + */
> >  int swsusp_shrink_memory(void)
> >  {
> > -	long tmp;
> >  	struct zone *zone;
> > -	unsigned long pages = 0;
> > -	unsigned int i = 0;
> > -	char *p = "-\\|/";
> > +	unsigned long saveable, size, max_size, count, pages = 0;
> >  	struct timeval start, stop;
> > +	int error = 0;
> >
> > -	printk(KERN_INFO "PM: Shrinking memory... ");
> > +	printk(KERN_INFO "PM: Shrinking memory ... ");
> >  	do_gettimeofday(&start);
> > -	do {
> > -		long size, highmem_size;
> > -
> > -		highmem_size = count_highmem_pages();
> > -		size = count_data_pages() + PAGES_FOR_IO + SPARE_PAGES;
> > -		tmp = size;
> > -		size += highmem_size;
> > -		for_each_populated_zone(zone) {
> > -			tmp += snapshot_additional_pages(zone);
> > -			if (is_highmem(zone)) {
> > -				highmem_size -=
> > -					zone_page_state(zone, NR_FREE_PAGES);
> > -			} else {
> > -				tmp -= zone_page_state(zone, NR_FREE_PAGES);
> > -				tmp += zone->lowmem_reserve[ZONE_NORMAL];
> > -			}
> > -		}
> > -
> > -		if (highmem_size < 0)
> > -			highmem_size = 0;
> > +	/* Count the number of saveable data pages. */
> > +	saveable = count_data_pages() + count_highmem_pages();
> > +
> > +	/*
> > +	 * Compute the total number of page frames we can use (count) and the
> > +	 * number of pages needed for image metadata (size).
> > +	 */
> > +	count = saveable;
> > +	size = 0;
> > +	for_each_populated_zone(zone) {
> > +		size += snapshot_additional_pages(zone);
> > +		count += zone_page_state(zone, NR_FREE_PAGES);
> > +		count -= zone->pages_min;
>
> I'd prefer to be safer, by removing the above line...
>
> > +	}
>
> ...and add another line here:
>
> 	count -= totalreserve_pages;

OK

> But hey, that 'count' counts "saveable + free" memory.
> We don't have a counter for an estimate of "free + freeable" memory,
> i.e. the threshold above which we can be sure preallocation must fail.
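For illustration, the zone loop above with Wu's suggested change folded
in would read as below.  This is only a sketch against the patch's
context; totalreserve_pages is the existing global maintained in
mm/page_alloc.c (declared in include/linux/swap.h):

	/*
	 * Sketch: count saveable plus free page frames, then subtract
	 * the zone watermarks and lowmem reserves in one step via the
	 * global totalreserve_pages, instead of zone->pages_min per zone.
	 */
	count = saveable;
	size = 0;
	for_each_populated_zone(zone) {
		size += snapshot_additional_pages(zone);
		count += zone_page_state(zone, NR_FREE_PAGES);
	}
	count -= totalreserve_pages;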
> One applicable situation: there are 800 MB of anonymous memory, but
> only 500 MB of image_size and no swap space.
>
> In that case we would otherwise go down the OOM code path.  Sure, OOM
> is (and shall remain) reliably disabled during hibernation, but we
> should still be cautious enough not to create a low-memory situation,
> which hurts:
>
> - hibernation speed
>   (vmscan goes mad trying to squeeze out the last free page)
> - the user experience after resume
>   (all *active* file data and metadata have to be reloaded)

Strangely enough, my recent testing with this patch doesn't confirm that
theory. :-)  Namely, I set image_size too low on purpose, and that only
caused preallocate_image_memory() to come up short (alloc_image_page()
returning NULL) at one point, and that was it.  It didn't even take too
much time.

I'll carry out more testing to verify this observation.

> The current code simply tries *too hard* to meet image_size.
> I'd rather take that as mild advice, and only free
> "free + freeable - margin" pages when image_size is not approachable.
>
> The safety margin could be totalreserve_pages, plus enough pages to
> retain the "hard core working set".

How do we compute the size of that "hard core working set", then?  (One
rough possibility is sketched below.)

Rafael
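Purely as an illustration of one possible answer, not something settled
in the thread: the active LRU lists can serve as a crude stand-in for
the working set, making the margin the global reserves plus the active
pages.  The helper name below is hypothetical; global_page_state() and
the NR_ACTIVE_* counters are the ones introduced by the split-LRU work:

	/*
	 * Hypothetical sketch: estimate the preallocation safety margin
	 * as totalreserve_pages plus all active LRU pages, treating
	 * "currently active" as a rough proxy for the hard core working
	 * set.  Whether this matches Wu's intent is an open question.
	 */
	static unsigned long preallocation_margin(void)
	{
		return totalreserve_pages
			+ global_page_state(NR_ACTIVE_ANON)
			+ global_page_state(NR_ACTIVE_FILE);
	}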