On 10/17/13 12:50 PM, Eric Sandeen wrote: > For a very large filesystem, zeroing the log may take some time. > > If we ask for progress reports frequently enough that one fires > before we finish with log zeroing, we try to use a progress format > which has not yet been set up, and segfault: > > # mkfs.xfs -d size=60t,file,name=fsfile > # xfs_repair -m 9000 -o ag_stride=32 -t 1 fsfile > Phase 1 - find and verify superblock... > - reporting progress in intervals of 1 seconds > Phase 2 - using internal log > - zero log... > Segmentation fault > > (gdb) bt > #0 0x0000000000426962 in progress_rpt_thread (p=0x67ad20) at progress.c:234 > #1 0x0000003b98a07851 in start_thread (arg=0x7f19d8e47700) at pthread_create.c:301 > #2 0x0000003b982e767d in ?? () > #3 0x0000000000000000 in ?? () > (gdb) p msgp > $1 = (msg_block_t *) 0x67ad20 > (gdb) p msgp->format > $2 = (progress_rpt_t *) 0x0 > (gdb) > > I suppose we could rig up progress reports for log zeroing, but > that won't usually take terribly long; for now, be defensive > and init the message->format to NULL, and just return early > from the progress thread if we've not yet set up any message. > > (Sure, global_msgs is global, and ->format is already NULL, > but to me it's worth being explicit since we will test it). > > Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx> > --- > > diff --git a/repair/progress.c b/repair/progress.c > index ab320dc..45a412e 100644 > --- a/repair/progress.c > +++ b/repair/progress.c > @@ -124,6 +124,7 @@ init_progress_rpt (void) > */ > > pthread_mutex_init(&global_msgs.mutex, NULL); > + global_msgs.format = NULL; > global_msgs.count = glob_agcount; > global_msgs.interval = report_interval; > global_msgs.done = prog_rpt_done; > @@ -169,6 +170,10 @@ progress_rpt_thread (void *p) > msg_block_t *msgp = (msg_block_t *)p; > __uint64_t percent; > > + /* It's possible to get here very early w/ no progress msg set */ > + if (!msgp->format) > + return NULL; > + > if ((msgbuf = (char *)malloc(DURATION_BUF_SIZE)) == NULL) > do_error (_("progress_rpt: cannot malloc progress msg buffer\n")); Dammit: CID 1107596: Data race condition (MISSING_LOCK) /repair/progress.c: 127 ( missing_lock) 124 */ 125 126 pthread_mutex_init(&global_msgs.mutex, NULL); >>> CID 1107596: Data race condition (MISSING_LOCK) >>> Accessing "global_msgs.format" without holding lock "msg_block_s.mutex". Elsewhere, "global_msgs.format" is accessed with "msg_block_s.mutex" held 2 out of 2 times. 127 global_msgs.format = NULL; 128 global_msgs.count = glob_agcount; 129 global_msgs.interval = report_interval; 130 global_msgs.done = prog_rpt_done; 131 global_msgs.total = &prog_rpt_total; Probably best to just drop the new NULL assignment, since it's a global init'd to 0 anyway, to shut up coverity? -Eric _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs