Re: s2disk hang update

On 2/23/10, Rafael J. Wysocki <rjw@xxxxxxx> wrote:
> On Tuesday 23 February 2010, Alan Jenkins wrote:
>> On 2/22/10, Rafael J. Wysocki <rjw@xxxxxxx> wrote:
>> > On Monday 22 February 2010, Alan Jenkins wrote:
>> >> Rafael J. Wysocki wrote:
>> >> > On Friday 19 February 2010, Alan Jenkins wrote:
>> >> >
>> >> >> On 2/18/10, Rafael J. Wysocki <rjw@xxxxxxx> wrote:
>> >> >>
>> >> >>> On Thursday 18 February 2010, Alan Jenkins wrote:
>> >> >>>
>> >> >>>> On 2/17/10, Rafael J. Wysocki <rjw@xxxxxxx> wrote:
>> >> >>>>
>> >> >>>>> On Wednesday 17 February 2010, Alan Jenkins wrote:
>> >> >>>>>
>> >> >>>>>> On 2/16/10, Rafael J. Wysocki <rjw@xxxxxxx> wrote:
>> >> >>>>>>
>> >> >>>>>>> On Tuesday 16 February 2010, Alan Jenkins wrote:
>> >> >>>>>>>
>> >> >>>>>>>> On 2/16/10, Alan Jenkins <sourcejedi.lkml@xxxxxxxxxxxxxx> wrote:
>> >> >>>>>>>>
>> >> >>>>>>>>> On 2/15/10, Rafael J. Wysocki <rjw@xxxxxxx> wrote:
>> >> >>>>>>>>>
>> >> >>>>>>>>>> On Tuesday 09 February 2010, Alan Jenkins wrote:
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>> Perhaps I spoke too soon.  I see the same hang if I run too
>> >> >>>>>>>>>>> many applications.  The first hibernation fails with "not
>> >> >>>>>>>>>>> enough swap" as expected, but the second or third attempt
>> >> >>>>>>>>>>> hangs (with the same backtrace as before).
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>> The patch definitely helps though.  Without the patch, I
>> >> >>>>>>>>>>> see a hang the first time I try to hibernate with too many
>> >> >>>>>>>>>>> applications running.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>> Well, I have an idea.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> Can you try to apply the appended patch in addition and see
>> >> >>>>>>>>>> if that helps?
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> Rafael
>> >> >>>>>>>>>>
>> >> >>>>>>>>> It doesn't seem to help.
>> >> >>>>>>>>>
>> >> >>>>>>>> To be clear: It doesn't stop the hang when I hibernate with
>> >> >>>>>>>> too many applications.
>> >> >>>>>>>>
>> >> >>>>>>>> It does stop the same hang in a different case though.
>> >> >>>>>>>>
>> >> >>>>>>>> 1. boot with init=/bin/bash
>> >> >>>>>>>> 2. run s2disk
>> >> >>>>>>>> 3. cancel the s2disk
>> >> >>>>>>>> 4. repeat steps 2&3
>> >> >>>>>>>>
>> >> >>>>>>>> With the patch, I can run tens of iterations with no hang.
>> >> >>>>>>>> Without the patch, it soon hangs (in disable_nonboot_cpus(),
>> >> >>>>>>>> as always).
>> >> >>>>>>>>
>> >> >>>>>>>> That's what happens on 2.6.33-rc7.  On 2.6.30, there is no
>> >> >>>>>>>> problem.  On 2.6.31 and 2.6.32 I don't get a hang, but dmesg
>> >> >>>>>>>> shows an allocation failure after a couple of iterations
>> >> >>>>>>>> ("kthreadd: page allocation failure. order:1, mode:0xd0").
>> >> >>>>>>>> It looks like it might be the same stop_machine thread
>> >> >>>>>>>> allocation failure that causes the hang.
>> >> >>>>>>>>
>> >> >>>>>>> Have you tested it alone or on top of the previous one?  If
>> >> >>>>>>> you've tested it alone, please apply the appended one in
>> >> >>>>>>> addition to it and retest.
>> >> >>>>>>>
>> >> >>>>>>> Rafael
>> >> >>>>>>>
>> >> >>>>>> I did test with both patches applied together -
>> >> >>>>>>
>> >> >>>>>> 1. [Update] MM / PM: Force GFP_NOIO during suspend/hibernation
>> >> >>>>>>    and resume
>> >> >>>>>> 2. "reducing the number of pages that we're going to keep
>> >> >>>>>>    preallocated by 20%"
>> >> >>>>>>
>> >> >>>>> In that case you can try to reduce the number of preallocated
>> >> >>>>> pages even more, i.e. change "/ 5" to "/ 2" (for example) in
>> >> >>>>> the second patch.
>> >> >>>>>
>> >> >>>> It still hangs if I try to hibernate a couple of times with too
>> >> >>>> many applications.
>> >> >>>>
>> >> >>> Hmm.  I guess I asked that before, but is this a 32-bit or 64-bit
>> >> >>> system and how much RAM is there in the box?
>> >> >>>
>> >> >>> Rafael
>> >> >>>
>> >> >> EeePC 701.  32-bit.  512MB RAM.  350MB swap file, on a "first-gen"
>> >> >> SSD.
>> >> >>
>> >> >
>> >> > Hmm.  I'd try to make free_unnecessary_pages() free all of the
>> >> > preallocated pages and see what happens.
>> >> >
>> >>
>> >> It still hangs in hibernation_snapshot() / disable_nonboot_cpus(),
>> >> after apparently freeing over 400MB / 100,000 pages of preallocated
>> >> RAM.
>> >>
>> >>
>> >>
>> >> There is a change which I missed before.  When I applied your first
>> >> patch ("Force GFP_NOIO during suspend" etc.), it did change the hung
>> >> task backtraces a bit.  I don't know if it tells us anything.
>> >>
>> >> Without the patch, there were two backtraces.  The first backtrace
>> >> suggested a problem allocating pages for a kernel thread (at
>> >> copy_process() / try_to_free_pages()).  The second showed that this
>> >> problem was blocking s2disk (at hibernation_snapshot() /
>> >> disable_nonboot_cpus() / stop_machine_create()).
>> >>
>> >> With the GFP_NOIO patch, I see only the s2disk backtrace.
>> >
>> > Can you please post this backtrace?
>>
>> Sure.  It's rather like the one I posted before, except
>>
>> a) it only shows the one hung task (s2disk)
>> b) this time I had lockdep enabled
>> c) this time most of the lines don't have question marks.
>
> Well, it still looks like we're waiting for create_workqueue_thread() to
> return, which probably is trying to allocate memory for the thread
> structure.
>
> My guess is that the preallocated memory pages freed by
> free_unnecessary_pages() go into a place from where they cannot be taken for
> subsequent NOIO allocations.  I have no idea why that happens though.
>
> To test that theory you can try to change GFP_IOFS to GFP_KERNEL in the
> calls to clear_gfp_allowed_mask() in kernel/power/hibernate.c (and in
> kernel/power/suspend.c for completeness).

So that effectively forces GFP_NOWAIT (clearing GFP_KERNEL from the
allowed mask also clears __GFP_WAIT), and the allocation should fail
instead of hanging?

It seems to stop the hang, but I don't see any other difference - the
hibernation process isn't stopped earlier, and I don't get any new
kernel messages about allocation failures.  I wonder if it's because
GFP_NOWAIT triggers ALLOC_HARDER.
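
To make the mask arithmetic concrete, here is a minimal userspace sketch
of the two cases.  It is a model, not kernel code: the MODEL_* names and
model_allowed_mask are hypothetical stand-ins, and the bit values are
assumed to match include/linux/gfp.h in 2.6.33 (they are at least
consistent with the "mode:0xd0" GFP_KERNEL failure quoted earlier in the
thread).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Userspace model only.  Bit values are ASSUMED to match 2.6.33's
 * include/linux/gfp.h; check your own tree before relying on them.
 */
#define MODEL_GFP_WAIT   0x10u	/* allocation may sleep */
#define MODEL_GFP_IO     0x40u	/* may start physical I/O */
#define MODEL_GFP_FS     0x80u	/* may call into the filesystem */

#define MODEL_GFP_NOWAIT 0x00u
#define MODEL_GFP_NOIO   (MODEL_GFP_WAIT)
#define MODEL_GFP_IOFS   (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)

/* Hypothetical stand-in for the kernel's gfp_allowed_mask. */
static unsigned int model_allowed_mask = ~0u;

/* Models the effect of clear_gfp_allowed_mask(): forbid these bits. */
static void model_clear_allowed(unsigned int mask)
{
	model_allowed_mask &= ~mask;
}

/* Flags the page allocator would actually act on for a request. */
static unsigned int model_effective(unsigned int requested)
{
	return requested & model_allowed_mask;
}

static int model_may_sleep(unsigned int flags)
{
	return (flags & MODEL_GFP_WAIT) != 0;
}

/* Print how a GFP_KERNEL request is degraded after clearing "mask". */
static void model_report(const char *label, unsigned int mask)
{
	unsigned int eff;

	assert(label != NULL);
	model_allowed_mask = ~0u;
	model_clear_allowed(mask);
	eff = model_effective(MODEL_GFP_KERNEL);
	printf("clear %-10s -> GFP_KERNEL becomes 0x%02x (%s)\n",
	       label, eff, model_may_sleep(eff) ? "may sleep" : "cannot sleep");
}
```

Calling model_report("GFP_IOFS", MODEL_GFP_IOFS) and then
model_report("GFP_KERNEL", MODEL_GFP_KERNEL) from a main() shows the
difference: in the first case a GFP_KERNEL request is reduced to GFP_NOIO
and can still sleep in reclaim; in the second it is reduced to GFP_NOWAIT
and has to fail immediately if nothing is free.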

I have other evidence which argues for your theory:

[ successful s2disk, with forced NOIO (but not NOWAIT), and test code
as attached ]

 Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
 1280 GFP_NOWAIT allocations of order 0 are possible
 640 GFP_NOWAIT allocations of order 1 are possible
 320 GFP_NOWAIT allocations of order 2 are possible

[ note - 1280 pages is the maximum test allocation used here, so a
result of 1280 (or 640 / 320 at orders 1 / 2) only means "at least
that many".  The test code is only accurate for smaller numbers of
free pages ]

 1280 GFP_KERNEL allocations of order 0 are possible
 640 GFP_KERNEL allocations of order 1 are possible
 320 GFP_KERNEL allocations of order 2 are possible

 PM: Preallocating image memory...
 212 GFP_NOWAIT allocations of order 0 are possible
 102 GFP_NOWAIT allocations of order 1 are possible
 50 GFP_NOWAIT allocations of order 2 are possible

 Freeing all 90083 preallocated pages
 (and 0 highmem pages, out of 0)
 190 GFP_NOWAIT allocations of order 0 are possible
 102 GFP_NOWAIT allocations of order 1 are possible
 50 GFP_NOWAIT allocations of order 2 are possible
 1280 GFP_KERNEL allocations of order 0 are possible
 640 GFP_KERNEL allocations of order 1 are possible
 320 GFP_KERNEL allocations of order 2 are possible
 done (allocated 90083 pages)

It looks like you're right and the freed pages are not accessible with
GFP_NOWAIT for some reason: after freeing 90083 pages, the order-0
GFP_NOWAIT count went from 212 to 190 rather than up.
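
For anyone who wants to see how the counting works without reading the
attached patch, the following is a rough userspace sketch of the same
technique: grab blocks until the allocator refuses or the cap is hit,
chain them together through their own first word, report the count, then
free everything.  The sketch_* names, the fixed 4096-byte page size and
the "budget" allocator are assumptions made for illustration; the real
test code calls __get_free_pages() and is in the diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */
#define SKETCH_CAP_PAGES 1280u	/* cap reported in the log above */

/* Hypothetical pool: pages the fake allocator may still hand out. */
static size_t sketch_budget_pages;

/* Fake allocator: fails once the budget cannot cover 2^order pages. */
static void *sketch_alloc(unsigned int order)
{
	size_t pages = (size_t)1 << order;
	void *block;

	if (pages > sketch_budget_pages)
		return NULL;
	block = malloc(pages * SKETCH_PAGE_SIZE);
	if (block)
		sketch_budget_pages -= pages;
	return block;
}

static void sketch_free(void *block, unsigned int order)
{
	free(block);
	sketch_budget_pages += (size_t)1 << order;
}

/* Count how many blocks of the given order can be taken, then free them. */
static unsigned int sketch_probe(unsigned int order)
{
	unsigned int limit = SKETCH_CAP_PAGES >> order;
	unsigned int count = 0, freed = 0;
	void *first = NULL;
	void **link = &first;	/* where the next block's address is stored */

	while (count < limit) {
		void *block = sketch_alloc(order);

		if (!block)
			break;
		*link = block;		/* chain it onto the list */
		link = (void **)block;	/* next pointer lives in the block */
		count++;
	}
	*link = NULL;

	while (first) {
		void *next = *(void **)first;

		sketch_free(first, order);
		first = next;
		freed++;
	}
	assert(freed == count);	/* same idea as the BUG_ON() in the patch */
	return count;
}
```

With sketch_budget_pages set to 212, sketch_probe() returns 212, 106 and
53 for orders 0, 1 and 2; with a large budget it returns the capped 1280,
640 and 320.  The sketch does not model fragmentation, which is presumably
why the real log shows 102 and 50 rather than 106 and 53 at the higher
orders.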

I also tried a number of test runs with too many applications, and saw this:

Freeing all 104006 preallocated pages ...
65 GFP_NOWAIT allocations of order 0 ...
18 GFP_NOWAIT allocations of order 1 ...
9 GFP_NOWAIT allocations of order 2 ...
0 GFP_KERNEL allocations of order 0 are possible
...
Disabling nonboot cpus ...
...
PM: Hibernation image created
Force enabled HPET at resume
PM: early thaw of devices complete after ... msecs

<hang, no backtrace visible even after 120 seconds>

I'm not bothered by the new hang; the test code will inevitably have
some side effects.  I'm not sure why GFP_KERNEL allocations would fail
in this scenario though...  perhaps the difference is that we've
swapped out the entire userspace so GFP_IO doesn't help.

Regards
Alan
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index da5288e..2e245d9 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -262,6 +262,7 @@ static int create_image(int platform_mode)
 	if (error || hibernation_test(TEST_PLATFORM))
 		goto Platform_finish;
 
+	check_free(GFP_NOWAIT);
 	error = disable_nonboot_cpus();
 	if (error || hibernation_test(TEST_CPUS)
 	    || hibernation_testmode(HIBERNATION_TEST))
diff --git a/kernel/power/power.h b/kernel/power/power.h
index 46c5a26..d2178dc 100644
--- a/kernel/power/power.h
+++ b/kernel/power/power.h
@@ -236,3 +236,53 @@ static inline void suspend_thaw_processes(void)
 {
 }
 #endif
+
+/* An empirical check on the number of free pages */
+static inline int check_free_pages(gfp_t gfp_flags, const char *gfp_name, int order)
+{
+	int ret;
+	int count = 0;
+	void *first = NULL;
+	void **p = &first;
+	unsigned long page;
+
+	/* Allocate free pages into a linked list, headed by "first" */
+	while (count < ((PAGES_FOR_IO + SPARE_PAGES) >> order)) {
+		page = __get_free_pages(gfp_flags|__GFP_NOWARN, order);
+		if (!page)
+			break;
+		*p = (void *)page;
+		p = (void **)page;
+		count++;
+	}
+	*p = NULL;
+
+	ret = count;
+	printk(KERN_INFO
+		"%d %s allocations of order %d are possible\n",
+		count, gfp_name, order);
+
+	/* Free the pages again */
+	p = first;
+	while (p) {
+		page = (unsigned long) p;
+		p = *p;
+		free_pages(page, order);
+		count--;
+	}
+	BUG_ON(count != 0);
+
+	return ret;
+}
+
+static inline void __check_free(gfp_t gfp_flags, const char *gfp_name)
+{
+	int order;
+
+	for (order = 0; order < 3; order++)
+		if (check_free_pages(gfp_flags, gfp_name, order) <= 0)
+			break;
+}
+
+#define check_free(flags) __check_free(flags, #flags)
+
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 36cb168..605b7b7 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1261,6 +1261,8 @@ int hibernate_preallocate_memory(void)
 	struct timeval start, stop;
 	int error;
 
+	check_free(GFP_NOWAIT);
+	check_free(GFP_KERNEL);
 	printk(KERN_INFO "PM: Preallocating image memory... ");
 	do_gettimeofday(&start);
 
@@ -1350,7 +1352,10 @@ int hibernate_preallocate_memory(void)
 	 * pages in memory, but we have allocated more.  Release the excessive
 	 * ones now.
 	 */
+	check_free(GFP_NOWAIT);
 	free_unnecessary_pages();
+	check_free(GFP_NOWAIT);
+	check_free(GFP_KERNEL);
 
  out:
 	do_gettimeofday(&stop);
_______________________________________________
linux-pm mailing list
linux-pm@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/linux-pm
