On 5/18/20 8:13 PM, Darrick J. Wong wrote: > On Mon, May 18, 2020 at 08:03:09PM -0500, Eric Sandeen wrote: >> On 5/18/20 7:58 PM, Darrick J. Wong wrote: >>> On Mon, May 18, 2020 at 05:35:33PM -0500, Eric Sandeen wrote: >>>> Long ago, a young developer tried to fix a segfault in xfs_repair where >>>> a short progress reporting interval could cause a timer to go off and try >>>> to print a progress mesage before any had been properly set up because >>>> we were still busy zeroing the log, and a NULL pointer dereference >>>> ensued. >>>> >>>> That young developer got it wrong, and completely broke progress >>>> reporting, because the change caused us to exit early from the pthread >>>> start routine, and not initialize the progress timer at all. >>>> >>>> That developer is now slightly older and wiser, and finally realizes that >>>> the simple and correct solution here is to initialize the message format >>>> to the first one in the list, so that we will be ready to go with a >>>> progress message no matter when the first timer goes off. >>>> >>>> Reported-by: Leonardo Vaz <lvaz@xxxxxxxxxx> >>>> Fixes: 7f2d6b811755 ("xfs_repair: avoid segfault if reporting progre...") >>>> Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx> >>>> --- >>>> >>>> It might be nice to add progress reporting for the log zeroing, but that >>>> requires renumbering all these macros, and we don't/can't actually get >>>> any fine-grained progress at all, so probably not worth it. >>>> >>>> diff --git a/repair/progress.c b/repair/progress.c >>>> index 5ee08229..d7baa606 100644 >>>> --- a/repair/progress.c >>>> +++ b/repair/progress.c >>>> @@ -125,7 +125,11 @@ init_progress_rpt (void) >>>> */ >>>> >>>> pthread_mutex_init(&global_msgs.mutex, NULL); >>>> - global_msgs.format = NULL; >>>> + /* >>>> + * Ensure the format string is not NULL in case the first timer >>>> + * goes off before any stage calls set_progress_msg() to set it. >>>> + */ >>>> + global_msgs.format = &progress_rpt_reports[0]; >>> >>> Hmm so does that mean the first progress report could be for "scanning >>> freespace"? >> >> Yes. But unless "zeroing the log" takes more than the report interval, >> which by default is 15 minutes, it won't be wrong. >> >>> Or could you append a new entry to progress_rpt_reports for "getting my >>> shit together and moving out of my parents basement" and initialize it >>> to that? >> >> Oh I guess it could be appended and wouldn't have to be out of order but >> honestly I don't think it's worth it, even a big slow log shouldn't take >> long enough to need a progress report. > > Zeroing a 2048MB log in 900s = 2.28MB/s > > So I guess that's unlikely... but it still feels like leaving some kind > of weird logic bomb lurking where if we decrease the interval or someone > throws us a slow cloudy block store, we'll start issuing weird progress > messsages about some other part of xfs_repair which hasn't even started > yet. <shrug> welp, alright, I'll add a log status message.