Re: [RFC PATCH v11.1 2/2] mm, pmem, xfs: Introduce MF_MEM_REMOVE for unbind

Shiyang Ruan <ruansy.fnst@xxxxxxxxxxx> · Wed, 26 Apr 2023 10:27:43 +0800

在 2023/4/25 23:18, Darrick J. Wong 写道:
On Tue, Apr 25, 2023 at 03:23:15PM +0200, Jan Kara wrote:
On Tue 25-04-23 20:47:35, Shiyang Ruan wrote:

在 2023/4/20 20:09, Jan Kara 写道:
On Thu 20-04-23 10:07:39, Shiyang Ruan wrote:
在 2023/4/12 18:52, Shiyang Ruan 写道:
This is a RFC HOTFIX.

This hotfix adds a exclusive forzen state to make sure any others won't
thaw the fs during xfs_dax_notify_failure():

     #define SB_FREEZE_EXCLUSIVE	(SB_FREEZE_COMPLETE + 2)
Using +2 here is because Darrick's patch[0] is using +1.  So, should we
make these definitions global?

Another thing I can't make up my mind is: when another freezer has freeze
the fs, should we wait unitl it finish, or print a warning in dmesg and
return -EBUSY?

Since there are at least 2 places needs exclusive forzen state, I think
we can refactor helper functions of freeze/thaw for them.  e.g.
     int freeze_super_exclusive(struct super_block *sb, int frozen);
     int thaw_super_exclusive(struct super_block *sb, int frozen);

[0] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/commit/?h=repair-fscounters&id=c3a0d1de4d54ffb565dbc7092dfe1fb851940669

I'm OK with the idea of new freeze state that does not allow userspace to
thaw the filesystem. But I don't really like the guts of filesystem
freezing being replicated inside XFS. It is bad enough that they are
replicated in [0], replicating them *once more* in another XFS file shows
we are definitely doing something wrong. And Luis will need yet another
incantation of the exlusive freeze for suspend-to-disk. So please guys get
together and reorganize the generic freezing code so that it supports
exclusive freeze (for in-kernel users) and works for your usecases instead
of replicating it inside XFS...

I agree that too much replicating code is not good.  It's necessary to
create a generic exclusive freeze/thaw for all users.  But for me, I don't
have the confidence to do it well, because it requires good design and code
changes will involve other filesystems.  It's diffcult.

However, I hope to be able to make progress on this unbind feature. Thus, I
tend to refactor a common helper function for xfs first, and update the code
later when the generic freeze is done.

I think Darrick was thinking about working on a proper generic interface.
So please coordinate with him.

I'll post a vfs generic kernelfreeze series later today.

One thing I haven't figured out yet is what's supposed to happen when
PREREMOVE is called on a frozen filesystem.

call PREREMOVE when:
1. freezed by kernel:    we wait unitl kernel thaws -> not sure
2. freezed by userspace: we take over the control of freeze state:
     a. userspace can't thaw before PREREMOVE is done
     b. kernel keeps freeze state after PREREMOVE is done and before 
userspace thaws

Since the unbind interface doesn't return any other errcode except 
-ENODEV, the only thing I can think of to do is wait for the other one 
done?  If another one doesn't thaw after a long time waitting, we print 
a "waitting too long" warning in dmesg.  But I'm not sure if this is good.

We don't want userspace to
be able to thaw the fs while PREREMOVE is running, so I /guess/ that
means we need some method for the kernel to take over a userspace
freeze and then put it back when we're done?

As is designed by Luis, we can add sb->s_writers.frozen_by_user flag to 
distinguish whether current freeze state is initiated by kernel or 
userspace.  In his patch, userspace can take over kernel's freeze.  We 
just need to switch the order.

--
Thanks,
Ruan.

--D

								Honza

--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR