Re: testcase 011 trips an ASSERT in x86_64 too

On Sun, 2011-03-13 at 11:48 +1100, Dave Chinner wrote:

Thanks for your response, Dave.

<snip>
> As I said before, the debug check is known to be racy. Having it
> trigger is not necessarily a sign of a problem. I have only ever
> tripped it once since the way the check operates was changed.
> There's no point in spending time trying to analyse it and explain
> it as we already know why and how it can trigger in a racy manner.

Oh, maybe I misunderstood. In your earlier reply you mentioned that you
wanted to know whether the problem is consistently reproducible. Since it
was, I went on to debug it.

If it is not a real issue, it would be a good idea to reduce that ASSERT
to WARN_ON_ONCE(), as you mentioned.

> 
> > Then I started comparing the behavioral difference between the two
> > ARCHs, and I found that in POWER I see more threads at a time (max of
> > 4 threads) in the function xlog_grant_log_space(), whereas in x86_64 I
> > see a max of only two, and mostly only one.
> > 
> > I also noted that in POWER test case 011 takes about 8 seconds whereas
> > in x86_64, it takes about 165 seconds.
> > 
> > So, I ventured into the core of test case 011, dirstress, and found that
> > simply creating 1000s of files under a directory takes a very long time
> > in x86_64 compared to POWER (1 min 15s vs 2s)
> 
> On my x86-64 boxes, test 011 takes 3s with CONFIG_XFS_DEBUG=y, all
> lock checking turned on, memory poisoning active, etc. With a
> production kernel, it usually takes 1s. Even on a single SATA drive.
> 
> So, without knowing anything about your x86-64 machine, I'd say
> there's something wrong with it or its configuration. Try turning
> off barriers and seeing if that makes it go faster....

The slowness occurred on two x86_64 blades.

On the blade where the storage is an SSD device, nobarrier helped
drastically.
==========
[root@test27 chandra]# mount -o nobarrier /dev/disk/by-id/wwn-0x5000a7203002f7e4-part1 /mnt/xfsMntPt/
[root@test27 chandra]# time ./b /mnt/xfsMntPt/d1/ 10000 1
i 0

real	0m1.983s
user	0m0.026s
sys	0m1.365s
===================

On the blade where the storage is a SAN disk, however, it didn't help
much. Note that I verified the disk is performing fine by using an ext4
filesystem.
===================
[root@test65 chandra]# mount /dev/sdb1 /mnt/xfs
[root@test65 chandra]# mount /dev/sdb2 /mnt/ext4
[root@test65 chandra]# tail -2 /proc/mounts 
/dev/sdb1 /mnt/xfs xfs rw,seclabel,relatime,attr2,noquota 0 0
/dev/sdb2 /mnt/ext4 ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
[root@test65 chandra]# time ./b /mnt/ext4/d1 10000 1
i 0

real    0m0.332s
user    0m0.006s
sys     0m0.264s
[root@test65 chandra]# time ./b /mnt/xfs/d1 10000 1
i 0

real    1m35.620s
user    0m0.012s
sys     0m0.735s
[root@test65 chandra]# mount -o nobarrier /dev/sdb1 /mnt/xfs
[root@test65 chandra]# tail -2 /proc/mounts
/dev/sdb2 /mnt/ext4 ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
/dev/sdb1 /mnt/xfs xfs rw,seclabel,relatime,attr2,nobarrier,noquota 0 0
[root@test65 chandra]# time ./b /mnt/xfs/d1 10000 1
i 0

real    1m6.772s
user    0m0.011s
sys     0m0.739s
========================

What else could affect the behavior like this?

Also, note that on POWER I get the fast performance with barriers on.
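
For anyone who wants to try the same kind of run without the ./b binary
(its source is not included in this mail), a minimal sketch of a
create-many-files timing test could look like the following. This is not
the actual ./b program: the function name create_files, the file-name
pattern, and the choice to create empty files are all assumptions, and
whatever the third ./b argument does is not reproduced here.

```python
import os
import sys
import time


def create_files(directory, count):
    """Create `count` empty files in `directory`; return elapsed seconds.

    Hypothetical stand-in for the ./b test above, not its real source.
    """
    os.makedirs(directory, exist_ok=True)
    start = time.monotonic()
    for i in range(count):
        # O_EXCL makes the run fail loudly if the directory is not fresh,
        # so every iteration is a real create rather than a reopen.
        path = os.path.join(directory, "f%06d" % i)
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        os.close(fd)
    return time.monotonic() - start


if __name__ == "__main__" and len(sys.argv) == 3:
    # Assumed usage: python3 create_files.py /mnt/xfs/d1 10000
    target, n = sys.argv[1], int(sys.argv[2])
    print("created %d files in %.3fs" % (n, create_files(target, n)))
```

Running it once against a fresh directory on each mount (with and without
nobarrier) should give numbers comparable in spirit to the "real" times
above, though not identical to what ./b measures.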

Thanks,

chandra

<snip>

