Re: [patch, v3] add an aio test which closes the fd before destroying the ioctx

On Wed, Aug 27, 2014 at 06:49:22PM +1000, Dave Chinner wrote:
> On Tue, Aug 26, 2014 at 10:27:40AM -0700, Zach Brown wrote:
> > On Tue, Aug 26, 2014 at 12:05:11PM -0400, Jeff Moyer wrote:
> > > Benjamin LaHaise <bcrl@xxxxxxxxx> writes:
> > > 
> > > > Does someone already have a simple test case we can add to the libaio test 
> > > > suite to verify this behaviour?
> > > 
> > > I can't reproduce this problem using a loop device, which is what the
> > > libaio test suite uses.  Even when using real hardware, you have to have
> > > disks that are slow enough in order for this to trigger reliably (or
> > > at all).
> > 
> > I wonder if you could use something like dm suspend to abuse indefinite
> > latencies.
> > 
> > > I could write a more targeted test within xfstests, but I don't think
> > > that's strictly necessary (it would just make it more clear what the
> > > expectations are, and maybe bump the hit rate percentage up).
> > 
> > I think it'd be worth it (he says, not committing *his* time).  It would
> > have been nice if a targeted test helped Dave raise the alarm
> > immediately rather than gnaw away at his brain with inconsistent mostly
> > unrelated failures for months.
> 
> I'm not sure it's worth the effort. Now that we have two tests that
> have triggered the same problem, I've been easily able to reproduce it
> with 2 VMs with test/scratch image files sharing the same spindle.
> i.e. run xfstests in one VM, run generic/323 in the other VM, and
> it reproduces fairly easily.
> 
> I'm just running it in a loop now to measure how successfully I'm
> reproducing the problem, then I'll apply the fix and see if it gets
> better. If it does get better, then I'll keep the patch around
> locally until it is upstream, and then I'll shout whenever I see
> this problem occur again....

Ok, so of 32 executions in a tight loop of generic/323, only 5
executions passed while 27 failed.
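
For anyone who wants to measure their own hit rate the same way, here is
a minimal sketch of such a loop. It is not the script used for the
numbers above. The helper name tally_runs is made up for illustration,
and it assumes the command exits with a non-zero status when the test
fails.

```python
import subprocess


def tally_runs(cmd, runs):
    """Run cmd `runs` times and return (passed, failed).

    Assumption: a zero exit status means the run passed and any
    non-zero exit status means it failed.
    """
    passed = 0
    failed = 0
    for _ in range(runs):
        # Discard the command's output; only the exit status is counted.
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            passed += 1
        else:
            failed += 1
    return passed, failed
```

From the top of an xfstests checkout, something like
tally_runs(["./check", "generic/323"], 32) would give the pass/fail
split for 32 back-to-back runs, provided the exit-status assumption
holds for your setup.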

With the suggested patch applied, the first 5 executions all failed, so
I don't think it fixes the problem.

BTW, generic/323 is pulling 8,000 read IOPS and 500MB/s from my
single spindle. Methinks that the test file is resident in the BBWC
on the RAID controller, which may be why nobody else is reproducing
this problem....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx