[linux-pm] Re: [RFC][PATCH] swsusp: support creating bigger images

rjw at sisk.pl (Rafael J. Wysocki) · Tue Apr 25 01:27:25 2006

Hi,

On Tuesday 25 April 2006 00:16, Pavel Machek wrote:
> > The appended patch allows swsusp to break the "50% of the normal zone" limit.
> > This is achieved by using the observation that pages mapped by frozen
> > userland tasks need not be copied before saving.
> 
> I've not followed the patch too carefully, but it is not as bad as I
> expected...
> 
> > With this patch applied I was able to save (and restore ;-)) ~800 MB suspend
> > images on a box with 1 GB of RAM.
> 
> And did it also work second time? ;-).

Yes, somethinkg like 10 times in a row. :-)

> Okay, so it can be done, and patch does not look too bad. It still
> scares me. Is 800MB image more responsive than 500MB after resume?

Yes, it is, slightly, but I think 800 meg images are impractical for
performance reasons (like IMO everything above 500 meg with the current
hardware).  However this means we can save 80% of RAM with the patch
and that should be 400 meg instead of 250 on a 500 meg machine, or
200 meg instead of 125 on a 250 meg machine.

Apart from this we save some memory allocations and copyings during
suspend.

> I guess we can revert to old behaviour by simply returning 1 from
> need_to_copy, right?

Yes.

> I assume that need_to_copy returns 1 in case page is shared by current
> and some other process?

Yes.  [If the page is shared with the current task it has to mapped by it, in
which case it will be copied.]

> > [Please don't beat me very hard, just couldn't resist. ;-)]
> 
> Well, you are about to force me to learn about mm internals. Plus you
> force everyone who tries to modify swsusp. ... it may be okay if
> benefit is great enough and if it gets proper testing. Not 2.6.17
> material.

Obviously not, and yes it requires a _lot_ of testing especially on boxes
with 512 meg of RAM or less.

> Is benefit worth it?

Well, that depends.  I think for boxes with 1 GB of RAM or more it's just
unnecessary (as of today, but this may change if faster disks are available).
On boxes with 512 MB of RAM or less it may change a lot as far as the
responsiveness after resume is concerned.

Anyway do you think it may go into -mm (unless Andrew shoots it down,
that is ;-))?

> > --- linux-2.6.17-rc1-mm3.orig/include/linux/rmap.h	2006-04-22 10:34:33.000000000 +0200
> > +++ linux-2.6.17-rc1-mm3/include/linux/rmap.h	2006-04-22 10:34:45.000000000 +0200
> > @@ -104,6 +104,12 @@ pte_t *page_check_address(struct page *,
> >   */
> >  unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
> >  
> > +#ifdef CONFIG_SOFTWARE_SUSPEND
> > +int page_mapped_by_current(struct page *);
> > +#else
> > +static inline int page_mapped_by_current(struct page *page) { return 0; }
> > +#endif /* CONFIG_SOFTWARE_SUSPEND */
> 
> I'd leave it undefined. That will prevent nasty surprise when someone
> tries to use it w/o CONFIG_SOFTWARE_SUSPEND set.

OK

> > @@ -251,6 +253,14 @@ int restore_special_mem(void)
> >  	return ret;
> >  }
> >  
> > +/* Represents a stacked allocated page to be used in the future */
> > +struct res_page {
> > +	struct res_page *next;
> > +	char padding[PAGE_SIZE - sizeof(void *)];
> > +};
> > +
> > +static struct res_page *page_list;
> > +
> >  static int pfn_is_nosave(unsigned long pfn)
> >  {
> >  	unsigned long nosave_begin_pfn = __pa(&__nosave_begin) >>
> > PAGE_SHIFT;
> 
> Is this part of some other patch?

No.  That's because I use a linked list of allocated pages that has been
defined in a slightly different way in
swsusp-use-less-memory-during-resume.patch,
so I just redefine and reuse it.

> > @@ -490,6 +613,11 @@ void swsusp_free(void)
> >  	buffer = NULL;
> >  }
> >  
> > +void swsusp_free(void)
> > +{
> > +	free_image();
> > +	restore_active_inactive_lists();
> > +}
> 
> This still scares me. Nice test would be to
> save/restore_active_inactive_lists repeatedly in a loop ... on running
> system ... aha, but it probably can't work outside of refrigerator?

restore_active_inactive_lists() only does something if suspend fails
(otherwise the lists active_pages and inactive_pages are empty).

The suspend failure may be simulated by running swapoff and suspendig
with "echo disk > /sys/power/state".  However it wouldn't be interesting to
do this in a loop because it will make the same set of pages go out of the
LRU lists and back in a round robin fashion.

I did the the following test:
- ran swapoff
- tried to suspend with "echo disk > /sys/power/state" (which failed,
obviously)
- checked the numbers of active/inactive pages in /proc/meminfo
- ran swapon
- set image_size to 0
- suspended with "echo disk > /sys/power/state" to verify if the right number
of pages would be reclaimed
- resumed

at it worked, apparently.

> >  	case SNAPSHOT_UNFREEZE:
> >  		if (!data->frozen)
> >  			break;
> > +		if (data->ready) {
> > +			error = -EPERM;
> > +			break;
> > +		}
> >  		down(&pm_sem);
> >  		thaw_processes();
> >  		enable_nonboot_cpus();
> 
> Error from UNFREEZE is not nice:
> 
> Unfreeze:
>         unfreeze(snapshot_fd);
>         return error;
> }
> ... we don't handle it.

It will only occur if we don't call free_snapshot() before unfreeze() which is
a bug anyway.  We prevent it from occuring by always calling free_snapshot()
before unfreeze(). :-)

> OTOH I guess it will be all fixed on exit()?

Well, sort of, but I think the LRU lists always have to be restored before
thawing processes (eg. kswapd).

Greetings,
Rafael