On Thu, 2019-09-26 at 07:14 -0400, Brian Foster wrote: > On Thu, Sep 26, 2019 at 11:27:40AM +0800, Ian Kent wrote: > > On Wed, 2019-09-25 at 10:34 -0400, Brian Foster wrote: > > > On Wed, Sep 25, 2019 at 04:07:08PM +0800, Ian Kent wrote: > > > > On Wed, 2019-09-25 at 15:42 +0800, Ian Kent wrote: > > > > > On Tue, 2019-09-24 at 10:38 -0400, Brian Foster wrote: > > > > > > On Tue, Sep 24, 2019 at 09:22:49PM +0800, Ian Kent wrote: > > > > > > > Add the fs_context_operations method .get_tree that > > > > > > > validates > > > > > > > mount options and fills the super block as previously > > > > > > > done > > > > > > > by the file_system_type .mount method. > > > > > > > > > > > > > > Signed-off-by: Ian Kent <raven@xxxxxxxxxx> > > > > > > > --- > > > > > > > fs/xfs/xfs_super.c | 50 > > > > > > > ++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > > > 1 file changed, 50 insertions(+) > > > > > > > > > > > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > > > > > > > index ea3640ffd8f5..6f9fe92b4e21 100644 > > > > > > > --- a/fs/xfs/xfs_super.c > > > > > > > +++ b/fs/xfs/xfs_super.c > > > > > > > @@ -1933,6 +1933,51 @@ xfs_fs_fill_super( > > > > > > > return error; > > > > > > > } > > > > > > > > > > > > > > +STATIC int > > > > > > > +xfs_fill_super( > > > > > > > + struct super_block *sb, > > > > > > > + struct fs_context *fc) > > > > > > > +{ > > > > > > > + struct xfs_fs_context *ctx = fc->fs_private; > > > > > > > + struct xfs_mount *mp = sb->s_fs_info; > > > > > > > + int silent = fc->sb_flags & > > > > > > > SB_SILENT; > > > > > > > + int error = -ENOMEM; > > > > > > > + > > > > > > > + mp->m_super = sb; > > > > > > > + > > > > > > > + /* > > > > > > > + * set up the mount name first so all the errors will > > > > > > > refer to > > > > > > > the > > > > > > > + * correct device. > > > > > > > + */ > > > > > > > + mp->m_fsname = kstrndup(sb->s_id, MAXNAMELEN, > > > > > > > GFP_KERNEL); > > > > > > > + if (!mp->m_fsname) > > > > > > > + return -ENOMEM; > > > > > > > + mp->m_fsname_len = strlen(mp->m_fsname) + 1; > > > > > > > + > > > > > > > + error = xfs_validate_params(mp, ctx, false); > > > > > > > + if (error) > > > > > > > + goto out_free_fsname; > > > > > > > + > > > > > > > + error = __xfs_fs_fill_super(mp, silent); > > > > > > > + if (error) > > > > > > > + goto out_free_fsname; > > > > > > > + > > > > > > > + return 0; > > > > > > > + > > > > > > > + out_free_fsname: > > > > > > > + sb->s_fs_info = NULL; > > > > > > > + xfs_free_fsname(mp); > > > > > > > + > > > > > > > > > > > > I'm still not following the (intended) lifecycle of mp > > > > > > here. > > > > > > Looking > > > > > > ahead in the series, we allocate mp in > > > > > > xfs_init_fs_context() > > > > > > and > > > > > > set > > > > > > some state. It looks like at some point we grow an > > > > > > xfs_fc_free() > > > > > > callback that frees mp, but that doesn't exist as of yet. > > > > > > So is > > > > > > that > > > > > > a > > > > > > memory leak as of this patch? > > > > > > > > > > > > We also call xfs_free_fsname() here (which doesn't reset > > > > > > pointers > > > > > > to > > > > > > NULL) and open-code kfree()'s of a couple of the same > > > > > > fields in > > > > > > xfs_fc_free(). Those look like double frees to me. > > > > > > > > > > > > Hmm.. I guess I'm kind of wondering why we lift the mp > > > > > > alloc > > > > > > out of > > > > > > the > > > > > > fill super call in the first place. At a glance, it doesn't > > > > > > look > > > > > > like > > > > > > we > > > > > > do anything in that xfs_init_fs_context() call that we > > > > > > couldn't > > > > > > do > > > > > > a > > > > > > bit > > > > > > later.. > > > > > > > > > > Umm ... yes ... > > > > > > > > > > I think I've got the active code path right ... > > > > > > > > > > At this point .mount == xfs_fs_mount() which will calls > > > > > xfs_fs_fill_super() to fill the super block. > > > > > > > > > > xfs_fs_fill_super() allocates the super block info struct and > > > > > sets > > > > > it in the super block private info field, then calls > > > > > xfs_parseargs() > > > > > which still allocates mp->m_fsname at this point, to > > > > > accomodate a > > > > > similar free pattern in xfs_test_remount_options(). > > > > > > > > > > It then calls __xfs_fs_fill_super() which doesn't touch those > > > > > fsname > > > > > fields or mp to fit in with what will be done later. > > > > > > > > > > If an error occurs both the fsname fields (xfs_free_fsname()) > > > > > and > > > > > mp > > > > > are freed by the main caller, xfs_fs_fill_super(). > > > > > > > > > > I think that process is ok. > > > > > > > > > > The mount api process that isn't active yet is a bit > > > > > different. > > > > > > > > > > The context (ctx), a temporary working space, is allocated > > > > > then > > > > > saved > > > > > in the mount context (fc) and the super block info is also > > > > > allocated > > > > > and saved in the mount context in it's field of the same name > > > > > as > > > > > the > > > > > private super block info field, s_fs_info. > > > > > > > > > > The function xfs_fill_super() is called as a result of the > > > > > .get_tree() > > > > > mount context operation to fill the super block. > > > > > > > > > > During this process, when the VFS successfully allocates the > > > > > super > > > > > block s_fs_info is set in the super block and the mount > > > > > context > > > > > field set to NULL. From this point freeing the private super > > > > > block > > > > > info becomes part of usual freeing of the super block with > > > > > the > > > > > super > > > > > operation .kill_sb(). > > > > > > > > > > But if the super block allocation fails then the mount > > > > > context > > > > > s_fs_info field remains set and is the responsibility of the > > > > > mount context operations .fc_free() method to clean up. > > > > > > > > > > Now the VFS calls to xfs_fill_super() after this. > > > > > > > > > > I should have been able to leave xfs_fill_super() it as it > > > > > was with: > > > > > sb->s_fs_info = NULL; > > > > > xfs_free_fsname(mp); > > > > > kfree(mp); > > > > > and that should have been ok but it wasn't, there was some > > > > > sort > > > > > of > > > > > allocation problem, possibly a double free, causing a crash. > > > > > > > > > > Strictly speaking this cleanup process should be carried out > > > > > by > > > > > either the mount context .fc_free() or super operation > > > > > .kill_sb() > > > > > and that's what I want to do. > > > > > > > > Umm ... but I can't actually do that ... > > > > > > > > Looking back at xfs I realize that the filling of the super > > > > block is meant to leave nothing allocated and set > > > > sb->s_fs_info = NULL on error so that ->put_super() won't try > > > > and cleanup a whole bunch of stuff that hasn't been done. > > > > > > > > Which brings me back to what I originally had above ... which > > > > we believe doesn't work ? > > > > > > > > > > It looks like perhaps the assignment of sb->s_fs_info was lost as > > > well? > > > Skipping to the end, I see xfs_init_fs_context() alloc mp and > > > assign > > > fc->s_fs_info. xfs_get_tree() leads to xfs_fill_super(), which > > > somehow > > > gets mp from sb->s_fs_info (not fc->...), but then resets sb- > > > > s_fs_info > > > on error and frees the names, leaving fs->s_fs_info so presumably > > > xfs_fc_free() can free mp along with a couple of the names > > > (again). I > > > can't really make heads or tails of what this is even attempting > > > to > > > do. > > > > Ha, it seems a bit mysterious, but it's actually much simpler > > than it appears. > > > > Feel free to explain any of the above..? Where do you currently > assign > sb->s_fs_info, for example? The VFS does the assignment, see below. > > > > That aside, it's not clear to me why the new code can't follow a > > > similar > > > pattern as the old code with regard to allocation. Allocate mp in > > > xfs_fill_super() and set up sb/fc pointers, reset pointers and > > > free > > > mp > > > on error return. Otherwise, xfs_fc_free() checks for fc- > > > >s_fs_info != > > > NULL and frees mp from there. Is there some reason we can't > > > continue > > > to > > > do that? > > > > I think not without a fairly significant re-design. > > > > The main difference is the mount-api will allocate the super > > block later than the old mount code. > > > > Basically, if file system parameter parsing fails the super > > block won't get allocated. > > > > So the super block isn't available during parameter parsing > > but the file system private data structure may be needed for > > it, so it comes from the file system context at that point. > > > > When the super block is successfully allocated the file system > > private data structure is set in the super block (and the field > > NULLed in the context) and things progress much the same as > > before from that point. > > > > That's the essential difference in the process AFAICS. > > > > I see. This is probably something that should be noted in the commit > log (that the ordering changes from before such that we need to > allocate > mp a bit earlier). This is reasonable because even though the current > code allocs mp in the fill_super callback, we parse arguments > immediately after the mp allocation and don't otherwise rely on the > sb > in that code. > > If I follow correctly, it sounds like perhaps we need to separate the > management of sb->s_fs_info from the "ownership" of mp. For example, > allocate mp, assign fc->s_fs_info and free via xfs_fc_free() as you > do > now. In the xfs_fill_super() callback, pull mp from fc->s_fs_info and > assign it to sb->s_fs_info. If we fail at this point, reset > sb->s_fs_info to NULL and let the fc infrastructure deal with freeing > mp > in its own callback. The mount api callbacks are for use during setup and don't really apply once that's done by the mount api. Lets describe the life cycle of s_fs_info by going through the call path from the xfs_get_tree() fs context operation. At this point the context has been setup by .init_fs_context ( aka. xfs_init_fs_context()), and fc->s_fs_info == mp and parameter parsing has been done. Then the .get_tree() method gets called (aka. fs_get_tree) and all this does (now) is call get_tree_bdev() as Al requested. Even though that's changed it still ends up calling similar VFS functions to what it did before. Now: int get_tree_bdev(struct fs_context *fc, int (*fill_super)(struct super_block *, struct fs_context *)) { skip ... (the usual stuff until we get to) fc->sb_flags |= SB_NOSEC; fc->sget_key = bdev; s = sget_fc(fc, test_bdev_super_fc, set_bdev_super_fc); mutex_unlock(&bdev->bd_fsfreeze_mutex); if (IS_ERR(s)) return PTR_ERR(s); if (s->s_root) { snip ... (mount root already exists) } else { s->s_mode = mode; snprintf(s->s_id, sizeof(s->s_id), "%pg", bdev); sb_set_blocksize(s, block_size(bdev)); error = fill_super(s, fc); if (error) { deactivate_locked_super(s); return error; } s->s_flags |= SB_ACTIVE; bdev->bd_super = s; } BUG_ON(fc->root); fc->root = dget(s->s_root); return 0; } This is to establish the order of what's done. The thing we want to look at is the sget_fc() function. struct super_block *sget_fc(struct fs_context *fc, int (*test)(struct super_block *, struct fs_context *), int (*set)(struct super_block *, struct fs_context *)) { snip ... (usual stuff that finds or allocates a super block) s->s_fs_info = fc->s_fs_info; err = set(s, fc); if (err) { s->s_fs_info = NULL; spin_unlock(&sb_lock); destroy_unused_super(s); return ERR_PTR(err); } fc->s_fs_info = NULL; snip ... (remaining setup stuff) return s; } This is where the s_fs_info changes from being the responsibility of the fs context to being the responsibility of the super block. And you can see that this is done before fill_super() is called. Finally, fill_super() in xfs will do all the fs specific setup and either return success or it will undo anything it has done when returning an error, just as it did previously. Specifically, it must set sb->s_fs_info == NULL on failure so that when the VFS puts the super block xfs doesn't try and release a whole bunch of stuff that wasn't setup. Consequently mp must also be freed on error return from fill_super(). What I'm saying is that by the time fill_super() is called everything is is a state where things need to be done pretty much as though the mount api context isn't present, the only thing that muddies the order of things is the calling of .fc_free() that happens a bit later but isn't relevant to this ordering of operations. So I'm not sure there needs to be (or can be) changes to the way s_fs_info is handled in the implementation. We can go into more detail if it's needed, I am trying to explain my motivations for doing what I've done, ;) > > What I'm not clear on is whether something like xfs_fs_put_super() > should still free mp as well. Once the filesystem successfully > mounts, > are we still going to see an xfs_fc_free() callback, or is this all > just > transient mount path stuff? If the former, perhaps put_super() should > also not free mp and just reset its own ->s_fs_info reference. If the > latter, then I guess we just need to understand at what point during > a > successful mount responsibility to free transfers from one place to > the > other. Thoughts? put_super() should continue to function the same way it does now provided fill_super() does the right thing, which I'm pretty sure it (the mount api version) does now . xfs_fs_put_super() won't need to free mp if an error occurs, s_fs_info needs to be NULL in that case as has always been the case. And realize xfs_fc_free() (aka. .fc_free()) is largely irrelevant from the POV of xfs once we get to fill_super(). > > Brian > > > By the time fill_super() is called everything is set and you > > should be able to proceed almost the same as before. > > > > Ian > > > > > Brian > > > > > > > > So I'm not sure the allocation time and the place this is > > > > > done > > > > > can (or should) be done differently. > > > > > > > > > > And that freeing on error exit from xfs_fill_super() is > > > > > definitely > > > > > wrong now! Ha, and I didn't see any crashes myself when I > > > > > tested > > > > > it ... maybe I need a reproducer ... > > > > > > > > > > Ian > > > > > > > > > > > Brian > > > > > > > > > > > > > + return error; > > > > > > > +} > > > > > > > + > > > > > > > +STATIC int > > > > > > > +xfs_get_tree( > > > > > > > + struct fs_context *fc) > > > > > > > +{ > > > > > > > + return vfs_get_block_super(fc, xfs_fill_super); > > > > > > > +} > > > > > > > + > > > > > > > STATIC void > > > > > > > xfs_fs_put_super( > > > > > > > struct super_block *sb) > > > > > > > @@ -2003,6 +2048,11 @@ static const struct > > > > > > > super_operations > > > > > > > xfs_super_operations = { > > > > > > > .free_cached_objects = xfs_fs_free_cached_objects, > > > > > > > }; > > > > > > > > > > > > > > +static const struct fs_context_operations > > > > > > > xfs_context_ops = > > > > > > > { > > > > > > > + .parse_param = xfs_parse_param, > > > > > > > + .get_tree = xfs_get_tree, > > > > > > > +}; > > > > > > > + > > > > > > > static struct file_system_type xfs_fs_type = { > > > > > > > .owner = THIS_MODULE, > > > > > > > .name = "xfs", > > > > > > >