On Fri, Sep 27, 2019 at 09:16:28AM +0800, Ian Kent wrote: > On Thu, 2019-09-26 at 07:14 -0400, Brian Foster wrote: > > On Thu, Sep 26, 2019 at 11:27:40AM +0800, Ian Kent wrote: > > > On Wed, 2019-09-25 at 10:34 -0400, Brian Foster wrote: > > > > On Wed, Sep 25, 2019 at 04:07:08PM +0800, Ian Kent wrote: > > > > > On Wed, 2019-09-25 at 15:42 +0800, Ian Kent wrote: > > > > > > On Tue, 2019-09-24 at 10:38 -0400, Brian Foster wrote: > > > > > > > On Tue, Sep 24, 2019 at 09:22:49PM +0800, Ian Kent wrote: > > > > > > > > Add the fs_context_operations method .get_tree that > > > > > > > > validates > > > > > > > > mount options and fills the super block as previously > > > > > > > > done > > > > > > > > by the file_system_type .mount method. > > > > > > > > > > > > > > > > Signed-off-by: Ian Kent <raven@xxxxxxxxxx> > > > > > > > > --- > > > > > > > > fs/xfs/xfs_super.c | 50 > > > > > > > > ++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > > > > 1 file changed, 50 insertions(+) > > > > > > > > > > > > > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > > > > > > > > index ea3640ffd8f5..6f9fe92b4e21 100644 > > > > > > > > --- a/fs/xfs/xfs_super.c > > > > > > > > +++ b/fs/xfs/xfs_super.c > > > > > > > > @@ -1933,6 +1933,51 @@ xfs_fs_fill_super( > > > > > > > > return error; > > > > > > > > } > > > > > > > > > > > > > > > > +STATIC int > > > > > > > > +xfs_fill_super( > > > > > > > > + struct super_block *sb, > > > > > > > > + struct fs_context *fc) > > > > > > > > +{ > > > > > > > > + struct xfs_fs_context *ctx = fc->fs_private; > > > > > > > > + struct xfs_mount *mp = sb->s_fs_info; > > > > > > > > + int silent = fc->sb_flags & > > > > > > > > SB_SILENT; > > > > > > > > + int error = -ENOMEM; > > > > > > > > + > > > > > > > > + mp->m_super = sb; > > > > > > > > + > > > > > > > > + /* > > > > > > > > + * set up the mount name first so all the errors will > > > > > > > > refer to > > > > > > > > the > > > > > > > > + * correct device. > > > > > > > > + */ > > > > > > > > + mp->m_fsname = kstrndup(sb->s_id, MAXNAMELEN, > > > > > > > > GFP_KERNEL); > > > > > > > > + if (!mp->m_fsname) > > > > > > > > + return -ENOMEM; > > > > > > > > + mp->m_fsname_len = strlen(mp->m_fsname) + 1; > > > > > > > > + > > > > > > > > + error = xfs_validate_params(mp, ctx, false); > > > > > > > > + if (error) > > > > > > > > + goto out_free_fsname; > > > > > > > > + > > > > > > > > + error = __xfs_fs_fill_super(mp, silent); > > > > > > > > + if (error) > > > > > > > > + goto out_free_fsname; > > > > > > > > + > > > > > > > > + return 0; > > > > > > > > + > > > > > > > > + out_free_fsname: > > > > > > > > + sb->s_fs_info = NULL; > > > > > > > > + xfs_free_fsname(mp); > > > > > > > > + > > > > > > > > > > > > > > I'm still not following the (intended) lifecycle of mp > > > > > > > here. > > > > > > > Looking > > > > > > > ahead in the series, we allocate mp in > > > > > > > xfs_init_fs_context() > > > > > > > and > > > > > > > set > > > > > > > some state. It looks like at some point we grow an > > > > > > > xfs_fc_free() > > > > > > > callback that frees mp, but that doesn't exist as of yet. > > > > > > > So is > > > > > > > that > > > > > > > a > > > > > > > memory leak as of this patch? > > > > > > > > > > > > > > We also call xfs_free_fsname() here (which doesn't reset > > > > > > > pointers > > > > > > > to > > > > > > > NULL) and open-code kfree()'s of a couple of the same > > > > > > > fields in > > > > > > > xfs_fc_free(). Those look like double frees to me. > > > > > > > > > > > > > > Hmm.. I guess I'm kind of wondering why we lift the mp > > > > > > > alloc > > > > > > > out of > > > > > > > the > > > > > > > fill super call in the first place. At a glance, it doesn't > > > > > > > look > > > > > > > like > > > > > > > we > > > > > > > do anything in that xfs_init_fs_context() call that we > > > > > > > couldn't > > > > > > > do > > > > > > > a > > > > > > > bit > > > > > > > later.. > > > > > > > > > > > > Umm ... yes ... > > > > > > > > > > > > I think I've got the active code path right ... > > > > > > > > > > > > At this point .mount == xfs_fs_mount() which will calls > > > > > > xfs_fs_fill_super() to fill the super block. > > > > > > > > > > > > xfs_fs_fill_super() allocates the super block info struct and > > > > > > sets > > > > > > it in the super block private info field, then calls > > > > > > xfs_parseargs() > > > > > > which still allocates mp->m_fsname at this point, to > > > > > > accomodate a > > > > > > similar free pattern in xfs_test_remount_options(). > > > > > > > > > > > > It then calls __xfs_fs_fill_super() which doesn't touch those > > > > > > fsname > > > > > > fields or mp to fit in with what will be done later. > > > > > > > > > > > > If an error occurs both the fsname fields (xfs_free_fsname()) > > > > > > and > > > > > > mp > > > > > > are freed by the main caller, xfs_fs_fill_super(). > > > > > > > > > > > > I think that process is ok. > > > > > > > > > > > > The mount api process that isn't active yet is a bit > > > > > > different. > > > > > > > > > > > > The context (ctx), a temporary working space, is allocated > > > > > > then > > > > > > saved > > > > > > in the mount context (fc) and the super block info is also > > > > > > allocated > > > > > > and saved in the mount context in it's field of the same name > > > > > > as > > > > > > the > > > > > > private super block info field, s_fs_info. > > > > > > > > > > > > The function xfs_fill_super() is called as a result of the > > > > > > .get_tree() > > > > > > mount context operation to fill the super block. > > > > > > > > > > > > During this process, when the VFS successfully allocates the > > > > > > super > > > > > > block s_fs_info is set in the super block and the mount > > > > > > context > > > > > > field set to NULL. From this point freeing the private super > > > > > > block > > > > > > info becomes part of usual freeing of the super block with > > > > > > the > > > > > > super > > > > > > operation .kill_sb(). > > > > > > > > > > > > But if the super block allocation fails then the mount > > > > > > context > > > > > > s_fs_info field remains set and is the responsibility of the > > > > > > mount context operations .fc_free() method to clean up. > > > > > > > > > > > > Now the VFS calls to xfs_fill_super() after this. > > > > > > > > > > > > I should have been able to leave xfs_fill_super() it as it > > > > > > was with: > > > > > > sb->s_fs_info = NULL; > > > > > > xfs_free_fsname(mp); > > > > > > kfree(mp); > > > > > > and that should have been ok but it wasn't, there was some > > > > > > sort > > > > > > of > > > > > > allocation problem, possibly a double free, causing a crash. > > > > > > > > > > > > Strictly speaking this cleanup process should be carried out > > > > > > by > > > > > > either the mount context .fc_free() or super operation > > > > > > .kill_sb() > > > > > > and that's what I want to do. > > > > > > > > > > Umm ... but I can't actually do that ... > > > > > > > > > > Looking back at xfs I realize that the filling of the super > > > > > block is meant to leave nothing allocated and set > > > > > sb->s_fs_info = NULL on error so that ->put_super() won't try > > > > > and cleanup a whole bunch of stuff that hasn't been done. > > > > > > > > > > Which brings me back to what I originally had above ... which > > > > > we believe doesn't work ? > > > > > > > > > > > > > It looks like perhaps the assignment of sb->s_fs_info was lost as > > > > well? > > > > Skipping to the end, I see xfs_init_fs_context() alloc mp and > > > > assign > > > > fc->s_fs_info. xfs_get_tree() leads to xfs_fill_super(), which > > > > somehow > > > > gets mp from sb->s_fs_info (not fc->...), but then resets sb- > > > > > s_fs_info > > > > on error and frees the names, leaving fs->s_fs_info so presumably > > > > xfs_fc_free() can free mp along with a couple of the names > > > > (again). I > > > > can't really make heads or tails of what this is even attempting > > > > to > > > > do. > > > > > > Ha, it seems a bit mysterious, but it's actually much simpler > > > than it appears. > > > > > > > Feel free to explain any of the above..? Where do you currently > > assign > > sb->s_fs_info, for example? > > The VFS does the assignment, see below. > > > > > > > That aside, it's not clear to me why the new code can't follow a > > > > similar > > > > pattern as the old code with regard to allocation. Allocate mp in > > > > xfs_fill_super() and set up sb/fc pointers, reset pointers and > > > > free > > > > mp > > > > on error return. Otherwise, xfs_fc_free() checks for fc- > > > > >s_fs_info != > > > > NULL and frees mp from there. Is there some reason we can't > > > > continue > > > > to > > > > do that? > > > > > > I think not without a fairly significant re-design. > > > > > > The main difference is the mount-api will allocate the super > > > block later than the old mount code. > > > > > > Basically, if file system parameter parsing fails the super > > > block won't get allocated. > > > > > > So the super block isn't available during parameter parsing > > > but the file system private data structure may be needed for > > > it, so it comes from the file system context at that point. > > > > > > When the super block is successfully allocated the file system > > > private data structure is set in the super block (and the field > > > NULLed in the context) and things progress much the same as > > > before from that point. > > > > > > That's the essential difference in the process AFAICS. > > > > > > > I see. This is probably something that should be noted in the commit > > log (that the ordering changes from before such that we need to > > allocate > > mp a bit earlier). This is reasonable because even though the current > > code allocs mp in the fill_super callback, we parse arguments > > immediately after the mp allocation and don't otherwise rely on the > > sb > > in that code. > > > > If I follow correctly, it sounds like perhaps we need to separate the > > management of sb->s_fs_info from the "ownership" of mp. For example, > > allocate mp, assign fc->s_fs_info and free via xfs_fc_free() as you > > do > > now. In the xfs_fill_super() callback, pull mp from fc->s_fs_info and > > assign it to sb->s_fs_info. If we fail at this point, reset > > sb->s_fs_info to NULL and let the fc infrastructure deal with freeing > > mp > > in its own callback. > > The mount api callbacks are for use during setup and don't really > apply once that's done by the mount api. > > Lets describe the life cycle of s_fs_info by going through the call > path from the xfs_get_tree() fs context operation. > > At this point the context has been setup by .init_fs_context ( aka. > xfs_init_fs_context()), and fc->s_fs_info == mp and parameter parsing > has been done. > > Then the .get_tree() method gets called (aka. fs_get_tree) and all > this does (now) is call get_tree_bdev() as Al requested. > > Even though that's changed it still ends up calling similar VFS > functions to what it did before. > > Now: > int get_tree_bdev(struct fs_context *fc, > int (*fill_super)(struct super_block *, > struct fs_context *)) > { > > skip ... (the usual stuff until we get to) > > fc->sb_flags |= SB_NOSEC; > fc->sget_key = bdev; > s = sget_fc(fc, test_bdev_super_fc, set_bdev_super_fc); > mutex_unlock(&bdev->bd_fsfreeze_mutex); > if (IS_ERR(s)) > return PTR_ERR(s); > > if (s->s_root) { > snip ... (mount root already exists) > } else { > s->s_mode = mode; > snprintf(s->s_id, sizeof(s->s_id), "%pg", bdev); > sb_set_blocksize(s, block_size(bdev)); > error = fill_super(s, fc); > if (error) { > deactivate_locked_super(s); > return error; > } > > s->s_flags |= SB_ACTIVE; > bdev->bd_super = s; > } > > BUG_ON(fc->root); > fc->root = dget(s->s_root); > return 0; > } > > This is to establish the order of what's done. > > The thing we want to look at is the sget_fc() function. > > struct super_block *sget_fc(struct fs_context *fc, > int (*test)(struct super_block *, struct fs_context *), > int (*set)(struct super_block *, struct fs_context *)) > { > snip ... (usual stuff that finds or allocates a super block) > > s->s_fs_info = fc->s_fs_info; > err = set(s, fc); > if (err) { > s->s_fs_info = NULL; > spin_unlock(&sb_lock); > destroy_unused_super(s); > return ERR_PTR(err); > } > fc->s_fs_info = NULL; > > snip ... (remaining setup stuff) > > return s; > } > > This is where the s_fs_info changes from being the responsibility > of the fs context to being the responsibility of the super block. > Ok, so basically we allocate the mp earlier so the mount api stuff can do mount option parsing and such, the vfs allocs the sb and transfers ->s_fs_info from the context to the sb and thus at this point the responsibility to free mp transfers from xfs_fc_free() to either xfs_fill_super() (on error) or xfs_fs_put_super(). Makes sense I think, thanks for the explanation (though again I think this is the kind of thing that should be documented in the commit log). > And you can see that this is done before fill_super() is called. > So what is the expected behavior if a failure occurs after sget_fc() does the ->s_fs_info transfer and before fill_super() is called? > Finally, fill_super() in xfs will do all the fs specific setup > and either return success or it will undo anything it has done > when returning an error, just as it did previously. > > Specifically, it must set sb->s_fs_info == NULL on failure so > that when the VFS puts the super block xfs doesn't try and > release a whole bunch of stuff that wasn't setup. Consequently > mp must also be freed on error return from fill_super(). > Ok. > What I'm saying is that by the time fill_super() is called > everything is is a state where things need to be done pretty > much as though the mount api context isn't present, the only > thing that muddies the order of things is the calling of > .fc_free() that happens a bit later but isn't relevant to > this ordering of operations. > So the intent is that in the event of a fill_super() failure as described above, the ->s_fs_info transfer would have already happened and fc->s_fs_info would be NULL, right? > So I'm not sure there needs to be (or can be) changes to > the way s_fs_info is handled in the implementation. > > We can go into more detail if it's needed, I am trying to > explain my motivations for doing what I've done, ;) > > > > > What I'm not clear on is whether something like xfs_fs_put_super() > > should still free mp as well. Once the filesystem successfully > > mounts, > > are we still going to see an xfs_fc_free() callback, or is this all > > just > > transient mount path stuff? If the former, perhaps put_super() should > > also not free mp and just reset its own ->s_fs_info reference. If the > > latter, then I guess we just need to understand at what point during > > a > > successful mount responsibility to free transfers from one place to > > the > > other. Thoughts? > > put_super() should continue to function the same way it does > now provided fill_super() does the right thing, which I'm > pretty sure it (the mount api version) does now . > > xfs_fs_put_super() won't need to free mp if an error occurs, > s_fs_info needs to be NULL in that case as has always been the > case. > > And realize xfs_fc_free() (aka. .fc_free()) is largely irrelevant > from the POV of xfs once we get to fill_super(). > Ok. Brian > > > > Brian > > > > > By the time fill_super() is called everything is set and you > > > should be able to proceed almost the same as before. > > > > > > Ian > > > > > > > Brian > > > > > > > > > > So I'm not sure the allocation time and the place this is > > > > > > done > > > > > > can (or should) be done differently. > > > > > > > > > > > > And that freeing on error exit from xfs_fill_super() is > > > > > > definitely > > > > > > wrong now! Ha, and I didn't see any crashes myself when I > > > > > > tested > > > > > > it ... maybe I need a reproducer ... > > > > > > > > > > > > Ian > > > > > > > > > > > > > Brian > > > > > > > > > > > > > > > + return error; > > > > > > > > +} > > > > > > > > + > > > > > > > > +STATIC int > > > > > > > > +xfs_get_tree( > > > > > > > > + struct fs_context *fc) > > > > > > > > +{ > > > > > > > > + return vfs_get_block_super(fc, xfs_fill_super); > > > > > > > > +} > > > > > > > > + > > > > > > > > STATIC void > > > > > > > > xfs_fs_put_super( > > > > > > > > struct super_block *sb) > > > > > > > > @@ -2003,6 +2048,11 @@ static const struct > > > > > > > > super_operations > > > > > > > > xfs_super_operations = { > > > > > > > > .free_cached_objects = xfs_fs_free_cached_objects, > > > > > > > > }; > > > > > > > > > > > > > > > > +static const struct fs_context_operations > > > > > > > > xfs_context_ops = > > > > > > > > { > > > > > > > > + .parse_param = xfs_parse_param, > > > > > > > > + .get_tree = xfs_get_tree, > > > > > > > > +}; > > > > > > > > + > > > > > > > > static struct file_system_type xfs_fs_type = { > > > > > > > > .owner = THIS_MODULE, > > > > > > > > .name = "xfs", > > > > > > > > >