Re: Commit 7eaceaccab5f40 causing boot hang.

On Mon, 2011-04-04 at 14:47 +0100, Richard Kennedy wrote:
> On Thu, 2011-03-31 at 15:49 +0100, Richard Kennedy wrote:
> > On Thu, 2011-03-31 at 15:33 +0200, Jens Axboe wrote:
> >[...]
> > > >>> Hi Jens,
> > > >>>
> > > >>> I'm seeing a problem with fio never completing when writing to 2 disks
> > > >>> simultaneously. In my test case I'm writing 2GB to both an LVM volume & a
> > > >>> PATA drive on x86_64 on an AMD X2. Could this be a related issue?
> > > >>>
> > > >>> I'm not getting anything reported in the log, lockup detection doesn't
> > > >>> report anything either. The write seems to have finished (the disk light
> > > >>> activity has stopped) and the cpu cores are both below 10% usage, but
> > > >>> fio never returns. The test does complete sometimes, but only in
> > > >>> about 1 run in 4.
> > > >>
> > > >> So when you say PATA, it's /dev/hdaX something as well?
> > > >>
> > > >>> I'm going to try tracing it and see if I can spot where it's stuck.
> > > >>
> > > >> Thanks, that would be nice.
> > > >>
> > > > The second drive is /dev/sdb1 mounted on /opt, both file systems are
> > > > ext4.
> > > 
> > > So probably not related. What does the fio job look like?
> > > 
> > fio job file --
> > [global]
> > pre_read=1
> > ioengine=mmap
> > 
> > [f1]
> > size=2g
> > rw=write
> > directory=/home/tests
> > 
> > [f2]
> > size=2g
> > rw=write
> > directory=/opt/tests
> > 
> > Fio gets run from a script that also collects stats but it's been
> > running without any problems up until 2.6.39-rc1.
> > 
> Hi Jens
> I've upgraded to the latest fio version in the git repo, 1.51, and I'm
> still seeing this problem. 
> 
> Fio gets stuck after it writes the 100% complete message and strace on
> the processes shows this.
> 
> the controlling fio process :- 
>  ...
> [pid  8439] wait4(8442, 0x7fff848203ac, WNOHANG, NULL) = 0
> [pid  8439] nanosleep({0, 10000000}, NULL) = 0
> [pid  8439] wait4(8441, 0x7fff848203ac, WNOHANG, NULL) = 0
> [pid  8439] wait4(8442, 0x7fff848203ac, WNOHANG, NULL) = 0
> [pid  8439] nanosleep({0, 10000000}
> 
> & the 2 workers are both stopped here, strace shows only the one line
> for each process.
> 
> Process 8441 attached - interrupt to quit
> futex(0x7f9db76a802c, FUTEX_WAIT_PRIVATE, 2, NULL
> 
> 
> Process 8442 attached - interrupt to quit
> futex(0x7f9db76a802c, FUTEX_WAIT_PRIVATE, 2, NULL
> 
> How do I find out which futex it's waiting for? 
> Any ideas where I should look next ?
> 
> I can run the same test successfully on 2.6.38 so is it worth trying to
> bisect this ? 
> 
> thanks 
> Richard
> 
My problem has gone away in v2.6.39-rc3.
I've just finished bisecting it down to 6de9843dab3f, & that got
reverted in rc3, so no problem ;)

(The data corruption caused by that faulty commit was zeroing out the
shared mutexes in fio, and the worker threads were getting stuck on the
writeout_mutex.)
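
For anyone bisecting a similar intermittent hang: since the test only
hangs some of the time, each bisect step needs several runs with a
timeout before a kernel can be called good. Below is a rough sketch of
the kind of helper that could do this. It is not the script I used; the
function names, the run count and the timeout are all assumptions, and
it only detects "did not exit in time", not I/O errors.

```python
import os
import signal
import subprocess


def run_once(cmd, timeout_s):
    """Return True if cmd exits within timeout_s seconds, False if it hangs.

    The exit status is ignored on purpose: only a hang counts as a failure.
    """
    # Put the command in its own session/process group so that any worker
    # processes it forks can be killed together with it.
    proc = subprocess.Popen(
        cmd,
        start_new_session=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        proc.wait(timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        try:
            # With start_new_session=True the group id equals the child pid.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group vanished between the timeout and the kill
        proc.wait()  # reap the child so no zombie is left behind
        return False


def looks_good(cmd, runs=5, timeout_s=600):
    """Call a kernel 'good' only if every one of `runs` attempts completes.

    Stops at the first hang, since one hang is enough to mark it 'bad'.
    """
    return all(run_once(cmd, timeout_s) for _ in range(runs))
```

On each kernel built during the bisect, something like
looks_good(["fio", "two-disks.fio"]) would give the verdict to pass to
"git bisect good" or "git bisect bad" (the job file name here is made
up). The timeout has to be comfortably longer than a normal run of the
job on the machine in question.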

regards
Richard



