On Wed, Apr 29, 2015 at 06:49:25AM +1000, Dave Chinner wrote:
> On Wed, Apr 29, 2015 at 12:56:34AM +0800, Eryu Guan wrote:
> > Hi,
> >
> > I was testing v4.1-rc1 kernel and hit generic/204 failure on 512b block
> > size v4 xfs and 1k block size v5 xfs. And this seems to be a regression
> > since v4.0
>
> Firstly, knowing your exact test machine and xfstests configuration
> is important here, so:
>
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

Thanks, I'll follow it next time.

(I knew about this link, but I hit the issue on different hosts, both VM
and bare metal, so I thought it wasn't hardware-related; I still should
have included the test configs though.)

> > [root@dhcp-66-86-11 xfstests]# MKFS_OPTIONS="-b size=512" ./check generic/204
> > FSTYP         -- xfs (non-debug)
> > PLATFORM      -- Linux/x86_64 dhcp-66-86-11 4.0.0-rc1+
> > MKFS_OPTIONS  -- -f -b size=512 /dev/sda6
> > MOUNT_OPTIONS -- -o context=system_u:object_r:nfs_t:s0 /dev/sda6 /mnt/testarea/scratch
> >
> > generic/204 8s ... - output mismatch (see /root/xfstests/results//generic/204.out.bad)
> > --- tests/generic/204.out 2014-12-11 00:28:13.409000000 +0800
> > +++ /root/xfstests/results//generic/204.out.bad 2015-04-29 00:36:43.232000000 +0800
> > @@ -1,2 +1,37664 @@
> > QA output created by 204
> > +./tests/generic/204: line 83: /mnt/testarea/scratch/108670: No space left on device
> > +./tests/generic/204: line 84: /mnt/testarea/scratch/108670: No space left on device
> > ...
> >
> > I bisected to this commit
> >
> > e88b64e xfs: use generic percpu counters for free inode counter

Sorry, I pasted the wrong commit (again...); it should be

501ab32 xfs: use generic percpu counters for inode counter

> I don't think that this is the actual cause of the issue, because I
> have records of generic/204 failing on 1k v5 filesystems every so
> often going back to the start of the log file I have for my v5/1k
> test config:
>
> $ grep "Failures\|EST" results/check.log |grep -B 1 generic/204
> Wed Jun 19 11:26:35 EST 2013
> Failures: generic/204 generic/225 generic/231 generic/263 generic/306
> Wed Jun 19 12:49:08 EST 2013
> Failures: generic/204 generic/225 generic/231 generic/263 generic/270
> --
> Mon Jul 8 17:23:44 EST 2013
> Failures: generic/204
> Mon Jul 8 20:37:50 EST 2013
> Failures: generic/204 generic/225 generic/231 generic/263 generic/306
> --
> Thu Jul 18 16:55:26 EST 2013
> Failures: generic/015 generic/077 generic/193 generic/204
> --
> Mon Jul 29 19:42:49 EST 2013
> Failures: generic/193 generic/204 generic/225 generic/230 generic/231
> Mon Aug 12 19:40:53 EST 2013
> Failures: generic/193 generic/204 generic/225 generic/230 generic/23
> ....

I noticed that those failures are quite old; generic/204 was updated
several times in 2014 to make it pass, most notably this commit:

31a50c7 generic/204: tweak reserve pool size (Mon Apr 28 10:54:27 2014)

The commit log says 'This makes the test pass on a filesystem made with
MKFS_OPTIONS="-b size=1024 -m crc=1".'

So I think this is a new failure since v4.0.

> > Seems like the same issue this patch tries to fix, but test still fails
> > after applying this patch.
> >
> > [PATCH v2] xfs: use percpu_counter_read_positive for mp->m_icount
> > http://oss.sgi.com/archives/xfs/2015-04/msg00195.html
> >
> > Not sure if it's the expected behavior/a known issue, report it to the
> > list anyway.
>
> Repeating the test on v4/512b, I get the same result as you.
>
> $ cat results/generic/204.full
> files 127500, resvblks 1024
> reserved blocks = 1024
> available reserved blocks = 1024
> $
>
> Ok, those numbers add up to exactly 97,920,000 bytes, as per the
> test config.
>
> $ sudo mount /dev/vdb /mnt/scratch
> $ df -h /mnt/scratch
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/vdb         99M   87M   13M  88% /mnt/scratch
> $ df -i /mnt/scratch
> Filesystem     Inodes  IUsed IFree IUse% Mounted on
> /dev/vdb       108608 108608     0  100% /mnt/scratch
> $
>
> And for v5/1k:
>
> $ sudo mkfs.xfs -f -m crc=1,finobt=1 -b size=1k -d size=$((106 * 1024 * 1024)) -l size=7m /dev/vdb
> meta-data=/dev/vdb               isize=512    agcount=4, agsize=27136 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=1
> data     =                       bsize=1024   blocks=108544, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
> log      =internal log           bsize=1024   blocks=7168, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> $ sudo mount /dev/vdb /mnt/scratch
> $ df -i /mnt/scratch
> Filesystem     Inodes IUsed IFree IUse% Mounted on
> /dev/vdb        54272     3 54269    1% /mnt/scratch
> $
>
> Yup, it's clear *why* it is failing, too. There aren't enough free
> inodes configured by mkfs. That means it's the mkfs imaxpct config
> that is the issue here, not the commit that made the max inode
> threshold more accurate...

I did some comparison between the "good" kernel and the "bad" kernel
(output of xfs_info, df -i, df -h and 204.full after the test), here is
the diff:

[root@dhcp-66-86-11 xfstests]# diff -Nu 204.good 204.bad
--- 204.good	2015-04-29 22:00:13.274000000 +0800
+++ 204.bad	2015-04-29 19:51:15.195000000 +0800
@@ -10,10 +10,10 @@
 realtime =none                   extsz=4096   blocks=0, rtextents=0
 [root@dhcp-66-86-11 xfstests]# df -i /mnt/scratch
 Filesystem     Inodes IUsed IFree IUse% Mounted on
-/dev/sda6       63808 63753    55  100% /mnt/scratch
+/dev/sda6       54528 54528     0  100% /mnt/scratch
 [root@dhcp-66-86-11 xfstests]# df -h /mnt/scratch
 Filesystem      Size  Used Avail Use% Mounted on
-/dev/sda6        99M   99M     0  100% /mnt/scratch
+/dev/sda6        99M   88M   12M   89% /mnt/scratch
 [root@dhcp-66-86-11 xfstests]# cat results/generic/204.full
 files 63750, resvblks 1024
 reserved blocks = 1024

So the only difference is the max inode count; the "bad" kernel has a
lower upper limit on how many inodes it will allocate. More experiments
show that the inode count check is more accurate on the "bad" kernel:

fs/xfs/libxfs/xfs_ialloc.c:1343

	if (mp->m_maxicount &&
	    percpu_counter_read(&mp->m_icount) + mp->m_ialloc_inos
							> mp->m_maxicount) {
		noroom = 1;
		okalloc = 0;
	}

The "good" kernel uses mp->m_sb.sb_icount, which is not accurate during
the test (256), so it never hits the "noroom" condition. The "bad"
kernel uses the percpu counter, and mp->m_icount is a much more
accurate number (54000+), so it does hit "noroom" during the test.

>
> Adding "-i maxpct=50" to the mkfs command allows the test to pass on
> both v4/512 and v5/1k filesystems. IOWs, it does not appear to be
> code problem but is a test config problem...

I agree it's not a code problem; I think it's more or less expected
behavior (some back-of-the-envelope numbers in the P.S. below). And I
confirmed that adding "-i maxpct=50" makes the test pass again.

>
> Can you send a patch to fstests@xxxxxxxxxxxxxxx that fixes the test
> for these configs?

Sure, will do.

Thanks for the explanation!

Eryu
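
P.S. For anyone reading this in the archive later, here is a rough,
standalone sketch of the numbers involved. This is my own illustration,
not kernel code: the max-inode figure only approximates what the real
xfs_set_maxicount() computes for imaxpct=25 on Dave's v5/1k geometry
above (the kernel also rounds up to allocation group boundaries), the
inode chunk size of 64 is an assumption, and the "stale" value of 256
is the sb_icount reading mentioned earlier.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Geometry from the v5/1k mkfs output quoted above. */
	uint64_t dblocks   = 108544;	/* data blocks                  */
	uint64_t blocksize = 1024;	/* -b size=1k                   */
	uint64_t inodesize = 512;	/* isize=512                    */
	uint64_t imaxpct   = 25;	/* mkfs default                 */
	uint64_t chunk     = 64;	/* assumed XFS inode chunk size */

	/*
	 * Approximately what m_maxicount works out to: 25% of the data
	 * space, expressed in inodes (treat it as a ballpark figure).
	 */
	uint64_t maxicount = dblocks * blocksize * imaxpct / 100 / inodesize;
	printf("max inodes allowed by imaxpct=25: %llu\n",
	       (unsigned long long)maxicount);	/* 54272, matches df -i */

	/*
	 * With an accurate m_icount (the percpu counter), the "noroom"
	 * check quoted above stops new inode chunks near that limit.
	 */
	uint64_t icount = 0;
	while (!(icount + chunk > maxicount))
		icount += chunk;
	printf("accurate counter: allocation stops at %llu inodes\n",
	       (unsigned long long)icount);	/* 54272 */

	/*
	 * With the stale value the old sb_icount check saw (~256 during
	 * the test), the same condition never fires, so the "good" kernel
	 * keeps allocating until the filesystem really runs out of space.
	 */
	uint64_t stale = 256;
	printf("stale counter: check ever fires? %s\n",
	       stale + chunk > maxicount ? "yes" : "no");	/* no */

	return 0;
}

Doubling imaxpct to 50 roughly doubles that limit (~108544 inodes on
the same geometry), which lines up with the "-i maxpct=50" fix above.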