Re: [PATCH 1/3] power, vfs: move away from PF_KTHREAD freezing in favor of fs freezing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On Oct 30, 2015, at 21:47, Jiri Kosina <jikos@xxxxxxxxxx> wrote:
> 
> From: Jiri Kosina <jkosina@xxxxxxx>
> 
> Freeze all filesystems during hibernation in favor of dropping kthread
> freezing completely.
> 
> Kthread freezing has a history of not very well defined semantics.
> Historically, it has been established to make sure that kthreads which are
> helpers to writing out data to disk are frozen during hibernation at
> well-defined state, so that it's guaranteed that they freeze only after all the
> outstanding I/O made it to disk (userspace has already been frozen by that
> time, so there is no new I/O being issued).
> 
> One of the issues with kthread freezer is that the places where try_to_freeze()
> is called in kthreads is rather random / arbitrary. This stems mostly from
> the fact that there is actually no good definition / list of requirements
> that should be enforced on a frozen kthread (it's important to mention that
> frozen kthread for example doesn't automatically imply that everything that
> has been spawned asynchronously from it (such as timers) is stopped as well).
> 
> Basically the main argument why kthread freezer is not needed boils down to
> this: the only facility that is needed during suspend: "no persistent fs
> changes are allowed from now on".
> 
> 	- this is something we get from fs freezing for free
> 	- Why do we issue all those try_to_freeze() calls in kthreads that
> 	  don't generate any I/O themselves at all?
> 	- Why do we issue all those try_to_freeze() calls in kthreads that
> 	  are actual I/O helpers? (if there is outstanding I/O, we *want*
> 	  it to happen before hibernating).
> 
> This patch removes freezing of kthreads during suspend, and issues a global
> filesystem freeze instead.
> 
> We awoid freezing in-memory filesystems, because (a) they end up in the
> image in a consistent state anyway (b) we will deadlock, as write to
> /sys/power/state would never succeed.
> 
> The patch could have been made a bit nicer if generic iterate_supers()
> could be used, but the locking (especially s_umount semantics) would have to
> be redone, so that'd be something to postpone for later.
> Also, the 'skip_virtual' flag is superfluous for now (as hibernation is the
> only user of this facility for the time being), but I'd like to keep it
> there to avoid further confusion regarding the fact that freeze_all_supers()
> actually skips in-memory filesystems.
> 
> Based on prior work done by Rafael Wysocki.
> 
> Signed-off-by: Jiri Kosina <jkosina@xxxxxxx>
> ---
> drivers/xen/manage.c     |  6 ----
> fs/super.c               | 84 ++++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/freezer.h  |  2 --
> include/linux/fs.h       |  2 ++
> kernel/power/hibernate.c |  5 ---
> kernel/power/power.h     | 20 +-----------
> kernel/power/process.c   | 63 +++++++++---------------------------
> kernel/power/user.c      |  9 ------
> 8 files changed, 103 insertions(+), 88 deletions(-)
> 
> diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
> index e12bd36..d47716a 100644
> --- a/drivers/xen/manage.c
> +++ b/drivers/xen/manage.c
> @@ -109,12 +109,6 @@ static void do_suspend(void)
> 		goto out;
> 	}
> 
> -	err = freeze_kernel_threads();
> -	if (err) {
> -		pr_err("%s: freeze kernel threads failed %d\n", __func__, err);
> -		goto out_thaw;
> -	}
> -
> 	err = dpm_suspend_start(PMSG_FREEZE);
> 	if (err) {
> 		pr_err("%s: dpm_suspend_start %d\n", __func__, err);
> diff --git a/fs/super.c b/fs/super.c
> index 954aeb8..b7cb50f 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1373,3 +1373,87 @@ out:
> 	return 0;
> }
> EXPORT_SYMBOL(thaw_super);
> +
> +static bool super_should_freeze(struct super_block *sb, bool skip_virtual)
> +{
> +	if (!sb->s_root)
> +		return false;
> +	if (!(sb->s_flags & MS_BORN))
> +		return false;
> +	/* Should we freeze virtual filesystems? */
> +	if (sb->s_bdi == &noop_backing_dev_info && skip_virtual)
> +		return false;
> +	/* No need to freeze read-only filesystems */
> +	if (sb->s_flags & MS_RDONLY)
> +		return false;
> +	return true;
> +}
> +
> +/**
> + * freeze_all_supers -- iterate through all filesystems and freeze them
> + * @skip_virtual: should those with no backing device be skipped?
> + *
> + * Iterate over all superblocks and call freeze_super() for them. Some
> + * use-cases (such as freezer) might want to have to skip those which
> + * don't have any backing bdev.
> + *
> + */
> +int freeze_all_supers(bool skip_virtual)
> +{
> +	struct super_block *sb, *p = NULL;
> +	int error = 0;
> +
> +	spin_lock(&sb_lock);
> +	/*
> +	 * The list of super-blocks is iterated in a reverse order so that
> +	 * inter-dependencies (such as loopback devices) are handled in
> +	 * a non-deadlocking order
> +	 */
> +	list_for_each_entry_reverse(sb, &super_blocks, s_list) {
> +		if (hlist_unhashed(&sb->s_instances))
> +			continue;
> +		sb->s_count++;
> +
> +		spin_unlock(&sb_lock);
> +		if (super_should_freeze(sb, skip_virtual)) {
> +			error = freeze_super(sb);
> +			if (error) {
> +				spin_lock(&sb_lock);
> +				break;
> +			}
> +		}
> +		spin_lock(&sb_lock);
> +
> +		if (p)
> +			__put_super(p);
> +		p = sb;
> +	}
> +	if (p)
> +		__put_super(p);
> +	spin_unlock(&sb_lock);
> +
> +	return error;
> +}
> +
> +void thaw_all_supers(void)
> +{
> +	struct super_block *sb, *p = NULL;
> +
> +	spin_lock(&sb_lock);
> +	list_for_each_entry(sb, &super_blocks, s_list) {
> +		if (hlist_unhashed(&sb->s_instances))
> +			continue;
> +		sb->s_count++;
> +
> +		spin_unlock(&sb_lock);
> +		thaw_super(sb);
> +		spin_lock(&sb_lock);
> +
> +		if (p)
> +			__put_super(p);
> +		p = sb;
> +	}
> +	if (p)
> +		__put_super(p);
> +	spin_unlock(&sb_lock);
> +}
> diff --git a/include/linux/freezer.h b/include/linux/freezer.h
> index 6b7fd9c..81335f6 100644
> --- a/include/linux/freezer.h
> +++ b/include/linux/freezer.h
> @@ -43,9 +43,7 @@ extern void __thaw_task(struct task_struct *t);
> 
> extern bool __refrigerator(bool check_kthr_stop);
> extern int freeze_processes(void);
> -extern int freeze_kernel_threads(void);
> extern void thaw_processes(void);
> -extern void thaw_kernel_threads(void);
> 
> /*
>  * DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 72d8a84..8ef2605 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2029,6 +2029,8 @@ extern int fd_statfs(int, struct kstatfs *);
> extern int vfs_ustat(dev_t, struct kstatfs *);
> extern int freeze_super(struct super_block *super);
> extern int thaw_super(struct super_block *super);
> +extern int freeze_all_supers(bool);
> +extern void thaw_all_supers(void);
> extern bool our_mnt(struct vfsmount *mnt);
> 
> extern int current_umask(void);
> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index 690f78f..d5c36bb 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -348,10 +348,6 @@ int hibernation_snapshot(int platform_mode)
> 	if (error)
> 		goto Close;
> 
> -	error = freeze_kernel_threads();
> -	if (error)
> -		goto Cleanup;
> -
> 	if (hibernation_test(TEST_FREEZER)) {
> 
> 		/*
> @@ -402,7 +398,6 @@ int hibernation_snapshot(int platform_mode)
> 	return error;
> 
>  Thaw:
> -	thaw_kernel_threads();
>  Cleanup:
> 	swsusp_free();
> 	goto Close;
> diff --git a/kernel/power/power.h b/kernel/power/power.h
> index caadb56..4c3267f 100644
> --- a/kernel/power/power.h
> +++ b/kernel/power/power.h
> @@ -224,25 +224,7 @@ extern int pm_test_level;
> #ifdef CONFIG_SUSPEND_FREEZER
> static inline int suspend_freeze_processes(void)
> {
> -	int error;
> -
> -	error = freeze_processes();
> -	/*
> -	 * freeze_processes() automatically thaws every task if freezing
> -	 * fails. So we need not do anything extra upon error.
> -	 */
> -	if (error)
> -		return error;
> -
> -	error = freeze_kernel_threads();
> -	/*
> -	 * freeze_kernel_threads() thaws only kernel threads upon freezing
> -	 * failure. So we have to thaw the userspace tasks ourselves.
> -	 */
> -	if (error)
> -		thaw_processes();
> -
> -	return error;
> +	return freeze_processes();
> }
> 
> static inline void suspend_thaw_processes(void)
> diff --git a/kernel/power/process.c b/kernel/power/process.c
> index 564f786..b1ad215 100644
> --- a/kernel/power/process.c
> +++ b/kernel/power/process.c
> @@ -17,6 +17,7 @@
> #include <linux/delay.h>
> #include <linux/workqueue.h>
> #include <linux/kmod.h>
> +#include <linux/fs.h>
> #include <trace/events/power.h>
> 
> /* 
> @@ -140,6 +141,17 @@ int freeze_processes(void)
> 	pr_cont("\n");
> 	BUG_ON(in_atomic());
> 
> +	pr_info("Freezing filesystems ... ");
> +
> +	error = freeze_all_supers(true);
> +	if (error) {
> +		pr_cont("failed.\n");
> +		thaw_processes();
> +		return error;
> +	}
> +
> +	pr_cont("done.\n");
> +
> 	/*
> 	 * Now that the whole userspace is frozen we need to disbale
> 	 * the OOM killer to disallow any further interference with
> @@ -148,35 +160,10 @@ int freeze_processes(void)
> 	if (!error && !oom_killer_disable())
> 		error = -EBUSY;
> 
> -	if (error)
> +	if (error) {
> 		thaw_processes();
> -	return error;
> -}
> -
> -/**
> - * freeze_kernel_threads - Make freezable kernel threads go to the refrigerator.
> - *
> - * On success, returns 0.  On failure, -errno and only the kernel threads are
> - * thawed, so as to give a chance to the caller to do additional cleanups
> - * (if any) before thawing the userspace tasks. So, it is the responsibility
> - * of the caller to thaw the userspace tasks, when the time is right.
> - */
> -int freeze_kernel_threads(void)
> -{
> -	int error;
> -
> -	pr_info("Freezing remaining freezable tasks ... ");
> -
> -	pm_nosig_freezing = true;
> -	error = try_to_freeze_tasks(false);
> -	if (!error)
> -		pr_cont("done.");
> -
> -	pr_cont("\n");
> -	BUG_ON(in_atomic());
> -
> -	if (error)
> -		thaw_kernel_threads();
> +		thaw_all_supers();
> +	}
> 	return error;
> }
> 
> @@ -197,6 +184,7 @@ void thaw_processes(void)
> 
> 	__usermodehelper_set_disable_depth(UMH_FREEZING);
> 	thaw_workqueues();
> +	thaw_all_supers();
> 
> 	read_lock(&tasklist_lock);
> 	for_each_process_thread(g, p) {
> @@ -216,22 +204,3 @@ void thaw_processes(void)
> 	trace_suspend_resume(TPS("thaw_processes"), 0, false);
> }
> 
> -void thaw_kernel_threads(void)
> -{
> -	struct task_struct *g, *p;
> -
> -	pm_nosig_freezing = false;
> -	pr_info("Restarting kernel threads ... ");
> -
> -	thaw_workqueues();
> -
> -	read_lock(&tasklist_lock);
> -	for_each_process_thread(g, p) {
> -		if (p->flags & (PF_KTHREAD | PF_WQ_WORKER))
> -			__thaw_task(p);
> -	}
> -	read_unlock(&tasklist_lock);
> -
> -	schedule();
> -	pr_cont("done.\n");
> -}
> diff --git a/kernel/power/user.c b/kernel/power/user.c
> index 526e891..75771c0 100644
> --- a/kernel/power/user.c
> +++ b/kernel/power/user.c
> @@ -275,15 +275,6 @@ static long snapshot_ioctl(struct file *filp, unsigned int cmd,
> 		swsusp_free();
> 		memset(&data->handle, 0, sizeof(struct snapshot_handle));
> 		data->ready = false;
> -		/*
> -		 * It is necessary to thaw kernel threads here, because
> -		 * SNAPSHOT_CREATE_IMAGE may be invoked directly after
> -		 * SNAPSHOT_FREE.  In that case, if kernel threads were not
> -		 * thawed, the preallocation of memory carried out by
> -		 * hibernation_snapshot() might run into problems (i.e. it
> -		 * might fail or even deadlock).
> -		 */
> -		thaw_kernel_threads();
> 		break;
> 
> 	case SNAPSHOT_PREF_IMAGE_SIZE:
> -- 
> 1.9.2
do you test your patch on kthread_bind()  kernel thread ?
if you remove freeze_kernel_threads() function,
this means lots of pthread will be running status during suspend .
will have problem for bind thread , these thread will be migrate to other 
CPU , will have problem running code on other CPU, another problems is 
that the cpu_allowed_ptr is changed and will never be restore to original CPU mask .
this is not unsafe .
for example, if there is a thread like this :

kthread_bind()
while (!pthread_should_stop())
{
	do_something();

	try_to_freeze();
}

if remove try_to_freeze() , this thread maybe migrate to other CPU if the thread is running .
freeze kernel thread can make sure lots of kernel thread are not in runq during suspend ,
then we can avoid task migration.

Thanks
















--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux