Re: [RFC PATCH v3 02/61] e2fsck: copy context when using multi-thread fsck

On Fri, Dec 18, 2020 at 09:13:25AM +0800, Wang Shilong wrote:
> On Fri, Dec 18, 2020 at 8:01 AM Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
> >
> > On Wed, Nov 18, 2020 at 07:38:48AM -0800, Saranya Muruganandam wrote:
> > > From: Li Xi <lixi@xxxxxxx>
> > >
> > > This patch only copies the context to a new one when -m is
> > > enabled. It does not actually start any thread. When the pass1
> > > test finishes, the new context is copied back to the original
> > > context.
> > >
> > > Since the signal handler only changes the original context, add
> > > a global_ctx field to "struct e2fsck_struct" and use it to check
> > > whether cancellation has been signaled.
> > >
> > > This patch handles the long jump properly so that all the existing
> > > tests pass even after the context has been copied. Otherwise,
> > > test f_expisize_ea_del would fail when aborting.
> > >
> > > Signed-off-by: Li Xi <lixi@xxxxxxx>
> > > Signed-off-by: Wang Shilong <wshilong@xxxxxxx>
> > > Signed-off-by: Saranya Muruganandam <saranyamohan@xxxxxxxxxx>
> > > ---
> > >  e2fsck/pass1.c | 114 +++++++++++++++++++++++++++++++++++++++++++++----
> > >  e2fsck/unix.c  |   1 +
> > >  2 files changed, 107 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
> > > index 8eecd958..64d237d3 100644
> > > --- a/e2fsck/pass1.c
> > > +++ b/e2fsck/pass1.c
> > > @@ -1144,7 +1144,22 @@ static int quota_inum_is_reserved(ext2_filsys fs, ext2_ino_t ino)
> > >       return 0;
> > >  }
> > >
> > > -void e2fsck_pass1(e2fsck_t ctx)
> > > +static int e2fsck_should_abort(e2fsck_t ctx)
> > > +{
> > > +     e2fsck_t global_ctx;
> > > +
> > > +     if (ctx->flags & E2F_FLAG_SIGNAL_MASK)
> > > +             return 1;
> > > +
> > > +     if (ctx->global_ctx) {
> > > +             global_ctx = ctx->global_ctx;
> > > +             if (global_ctx->flags & E2F_FLAG_SIGNAL_MASK)
> > > +                     return 1;
> > > +     }
> > > +     return 0;
> > > +}
> > > +
> > > +void e2fsck_pass1_thread(e2fsck_t ctx)
> > >  {
> > >       int     i;
> > >       __u64   max_sizes;
> > > @@ -1360,7 +1375,7 @@ void e2fsck_pass1(e2fsck_t ctx)
> > >               if (ino > ino_threshold)
> > >                       pass1_readahead(ctx, &ra_group, &ino_threshold);
> > >               ehandler_operation(old_op);
> > > -             if (ctx->flags & E2F_FLAG_SIGNAL_MASK)
> > > +             if (e2fsck_should_abort(ctx))
> > >                       goto endit;
> > >               if (pctx.errcode == EXT2_ET_BAD_BLOCK_IN_INODE_TABLE) {
> > >                       /*
> > > @@ -1955,7 +1970,7 @@ void e2fsck_pass1(e2fsck_t ctx)
> > >               if (process_inode_count >= ctx->process_inode_size) {
> > >                       process_inodes(ctx, block_buf);
> > >
> > > -                     if (ctx->flags & E2F_FLAG_SIGNAL_MASK)
> > > +                     if (e2fsck_should_abort(ctx))
> > >                               goto endit;
> > >               }
> > >       }
> > > @@ -2068,6 +2083,89 @@ endit:
> > >       else
> > >               ctx->invalid_bitmaps++;
> > >  }
> > > +
> > > +static errcode_t e2fsck_pass1_thread_prepare(e2fsck_t global_ctx, e2fsck_t *thread_ctx)
> > > +{
> > > +     errcode_t       retval;
> > > +     e2fsck_t        thread_context;
> > > +
> > > +     retval = ext2fs_get_mem(sizeof(struct e2fsck_struct), &thread_context);
> >
> > Hm, so I guess the strategy here is that parallel e2fsck makes
> > per-thread copies of the ext2_filsys and e2fsck_t global contexts?
> > And then after the threaded parts complete, each thread merges its
> > per-thread contexts back into the global one, right?
> 
> Yes.
> 
> >
> > This means that we have to be careful to track which fields in those
> > cloned contexts have been updated by the thread so that we can copy them
> > back and not lose any data.
> >
> > I'm wondering if for future maintainability it would be better to track
> > the per-thread data in a separate structure to make it very explicit
> > which data (sub)structures are effectively per-thread and hence don't
> > require locking?
> 
> Maybe a per-thread structure would be more maintainable, but I am not
> sure we could remove locking completely.
> 
> Locking is mostly used for fixes, because fixing is serialized now,
> and for some global structures that are used seldom but simplify the
> code.

<nod> I was assuming that you'd still put a lock in the global structure
and use it for data fields that aren't so frequently accessed.

--D

> >
> > (I ask that mostly because I'm having a hard time figuring out which
> > fields are supposed to be shared and which ones aren't...)
> >
> > --D
> >
> > > +     if (retval) {
> > > +             com_err(global_ctx->program_name, retval, "while allocating memory");


