On Mon, May 18, 2020 at 08:03:09PM -0500, Eric Sandeen wrote: > On 5/18/20 7:58 PM, Darrick J. Wong wrote: > > On Mon, May 18, 2020 at 05:35:33PM -0500, Eric Sandeen wrote: > >> Long ago, a young developer tried to fix a segfault in xfs_repair where > >> a short progress reporting interval could cause a timer to go off and try > >> to print a progress mesage before any had been properly set up because > >> we were still busy zeroing the log, and a NULL pointer dereference > >> ensued. > >> > >> That young developer got it wrong, and completely broke progress > >> reporting, because the change caused us to exit early from the pthread > >> start routine, and not initialize the progress timer at all. > >> > >> That developer is now slightly older and wiser, and finally realizes that > >> the simple and correct solution here is to initialize the message format > >> to the first one in the list, so that we will be ready to go with a > >> progress message no matter when the first timer goes off. > >> > >> Reported-by: Leonardo Vaz <lvaz@xxxxxxxxxx> > >> Fixes: 7f2d6b811755 ("xfs_repair: avoid segfault if reporting progre...") > >> Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx> > >> --- > >> > >> It might be nice to add progress reporting for the log zeroing, but that > >> requires renumbering all these macros, and we don't/can't actually get > >> any fine-grained progress at all, so probably not worth it. > >> > >> diff --git a/repair/progress.c b/repair/progress.c > >> index 5ee08229..d7baa606 100644 > >> --- a/repair/progress.c > >> +++ b/repair/progress.c > >> @@ -125,7 +125,11 @@ init_progress_rpt (void) > >> */ > >> > >> pthread_mutex_init(&global_msgs.mutex, NULL); > >> - global_msgs.format = NULL; > >> + /* > >> + * Ensure the format string is not NULL in case the first timer > >> + * goes off before any stage calls set_progress_msg() to set it. > >> + */ > >> + global_msgs.format = &progress_rpt_reports[0]; > > > > Hmm so does that mean the first progress report could be for "scanning > > freespace"? > > Yes. But unless "zeroing the log" takes more than the report interval, > which by default is 15 minutes, it won't be wrong. > > > Or could you append a new entry to progress_rpt_reports for "getting my > > shit together and moving out of my parents basement" and initialize it > > to that? > > Oh I guess it could be appended and wouldn't have to be out of order but > honestly I don't think it's worth it, even a big slow log shouldn't take > long enough to need a progress report. Zeroing a 2048MB log in 900s = 2.28MB/s So I guess that's unlikely... but it still feels like leaving some kind of weird logic bomb lurking where if we decrease the interval or someone throws us a slow cloudy block store, we'll start issuing weird progress messsages about some other part of xfs_repair which hasn't even started yet. <shrug> --D > -Eric > > > --D > > > >> global_msgs.count = glob_agcount; > >> global_msgs.interval = report_interval; > >> global_msgs.done = prog_rpt_done; > >> @@ -171,10 +175,6 @@ progress_rpt_thread (void *p) > >> msg_block_t *msgp = (msg_block_t *)p; > >> uint64_t percent; > >> > >> - /* It's possible to get here very early w/ no progress msg set */ > >> - if (!msgp->format) > >> - return NULL; > >> - > >> if ((msgbuf = (char *)malloc(DURATION_BUF_SIZE)) == NULL) > >> do_error (_("progress_rpt: cannot malloc progress msg buffer\n")); > >> > >> > >