Re: [PATCH] xfs_repair: fix progress reporting

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Mon, 18 May 2020 18:13:35 -0700

On Mon, May 18, 2020 at 08:03:09PM -0500, Eric Sandeen wrote:
> On 5/18/20 7:58 PM, Darrick J. Wong wrote:
> > On Mon, May 18, 2020 at 05:35:33PM -0500, Eric Sandeen wrote:
> >> Long ago, a young developer tried to fix a segfault in xfs_repair where
> >> a short progress reporting interval could cause a timer to go off and try
> >> to print a progress message before any had been properly set up because
> >> we were still busy zeroing the log, and a NULL pointer dereference
> >> ensued.
> >>
> >> That young developer got it wrong, and completely broke progress
> >> reporting, because the change caused us to exit early from the pthread
> >> start routine, and not initialize the progress timer at all.
> >>
> >> That developer is now slightly older and wiser, and finally realizes that
> >> the simple and correct solution here is to initialize the message format
> >> to the first one in the list, so that we will be ready to go with a
> >> progress message no matter when the first timer goes off.
> >>
> >> Reported-by: Leonardo Vaz <lvaz@xxxxxxxxxx>
> >> Fixes: 7f2d6b811755 ("xfs_repair: avoid segfault if reporting progre...")
> >> Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx>
> >> ---
> >>
> >> It might be nice to add progress reporting for the log zeroing, but that
> >> requires renumbering all these macros, and we don't/can't actually get
> >> any fine-grained progress at all, so probably not worth it.
> >>
> >> diff --git a/repair/progress.c b/repair/progress.c
> >> index 5ee08229..d7baa606 100644
> >> --- a/repair/progress.c
> >> +++ b/repair/progress.c
> >> @@ -125,7 +125,11 @@ init_progress_rpt (void)
> >>  	 */
> >>  
> >>  	pthread_mutex_init(&global_msgs.mutex, NULL);
> >> -	global_msgs.format = NULL;
> >> +	/*
> >> +	 * Ensure the format string is not NULL in case the first timer
> >> +	 * goes off before any stage calls set_progress_msg() to set it.
> >> +	 */
> >> +	global_msgs.format = &progress_rpt_reports[0];
> > 
> > Hmm so does that mean the first progress report could be for "scanning
> > freespace"?
> 
> Yes.  But unless "zeroing the log" takes more than the report interval,
> which by default is 15 minutes, it won't be wrong.
> 
> > Or could you append a new entry to progress_rpt_reports for "getting my
> > shit together and moving out of my parents' basement" and initialize it
> > to that?
> 
> Oh, I guess it could be appended and wouldn't have to be out of order, but
> honestly I don't think it's worth it. Even a big slow log shouldn't take
> long enough to need a progress report.

Zeroing a 2048MB log within the default 900s interval = 2.28MB/s

So I guess that's unlikely... but it still feels like leaving some kind
of weird logic bomb lurking where if we decrease the interval or someone
throws us a slow cloudy block store, we'll start issuing weird progress
messages about some other part of xfs_repair which hasn't even started
yet.  <shrug>
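
For illustration, here is a rough sketch of the "append a placeholder entry"
idea. It is not the real repair/progress.c: struct report_entry, the RPT_*
constants, struct msg_state and the three helpers are all hypothetical
stand-ins, and the message strings are placeholder text. The only things
taken from this thread are that the first table entry is the freespace scan
and that each stage later sets its own message.

```c
#include <stddef.h>

/* Hypothetical stand-in for one row of the progress report table. */
struct report_entry {
	const char *fmt;	/* placeholder message text for this stage */
};

enum {
	RPT_SCAN_FREESPACE = 0,	/* first entry, as discussed in this thread */
	/* ... existing stages would stay here, numbering unchanged ... */
	RPT_ZERO_LOG,		/* hypothetical placeholder, appended last */
	RPT_COUNT
};

static const struct report_entry reports[RPT_COUNT] = {
	[RPT_SCAN_FREESPACE] = { "scanning freespace" },
	[RPT_ZERO_LOG]       = { "zeroing log" },
};

/* Hypothetical stand-in for the shared message state. */
struct msg_state {
	const struct report_entry *format;
};

/*
 * Start on the placeholder entry, so an early timer tick never sees
 * NULL and never names a stage that has not started yet.
 */
static void init_msg_state(struct msg_state *m)
{
	m->format = &reports[RPT_ZERO_LOG];
}

/* A stage calls this when it begins, replacing the placeholder. */
static void set_stage(struct msg_state *m, int stage)
{
	m->format = &reports[stage];
}

/* The text a timer tick would report at this moment. */
static const char *current_message(const struct msg_state *m)
{
	return m->format->fmt;
}
```

Because the placeholder is appended at the end of the table, none of the
existing indices would need renumbering, and a tick that fires during log
zeroing would say so instead of claiming the freespace scan is under way.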

--D

> -Eric
> 
> > --D
> > 
> >>  	global_msgs.count = glob_agcount;
> >>  	global_msgs.interval = report_interval;
> >>  	global_msgs.done   = prog_rpt_done;
> >> @@ -171,10 +175,6 @@ progress_rpt_thread (void *p)
> >>  	msg_block_t *msgp = (msg_block_t *)p;
> >>  	uint64_t percent;
> >>  
> >> -	/* It's possible to get here very early w/ no progress msg set */
> >> -	if (!msgp->format)
> >> -		return NULL;
> >> -
> >>  	if ((msgbuf = (char *)malloc(DURATION_BUF_SIZE)) == NULL)
> >>  		do_error (_("progress_rpt: cannot malloc progress msg buffer\n"));
> >>  
> >>
> >