ext4 xfstests-bld results for 3.8-rc6 on Pandaboard ES

Eric Whitney <enwlinux@xxxxxxxxx> · Thu, 7 Feb 2013 17:05:36 -0500

These are the results of a run of Ted Ts'o's xfstests-bld test harness against
ext4 on a Pandaboard ES running a 3.8-rc6 kernel.  The harness contains
several test suites exercising various combinations of ext4 mount options.
Each segment of the text below describes the mount options used in each
scenario (although not noted, the block_validity option was also used in all
cases), a list of xfstests run, and notes discussing any failures reported by 
the suite or found in the syslog.

Code built from the tips of the e2fsprogs and xfstests master branches was
used for this testing rather than that found in the xfstests-bld tree.

With one exception noted below, the test results taken on the Pandaboard (ARM
processor) are the same as those observed from a run on an x86-64 KVM guest.

Thanks,
Eric


BEGIN TEST: Ext4 4k block Fri Feb  1 13:08:25 EST 2013
Ran: 001 002 005 006 007 011 013 014 015 020 053 062 069 070 075 076 079 088
 091 105 112 113 117 120 123 124 126 128 129 130 131 135 141 169 184 193 198
 207 210 211 212 213 214 215 219 221 223 225 228 230 235 236 237 240 243 245
 246 247 248 249 255 256 257 258 263 271 277 289 294
Passed all 69 tests
END TEST: Ext4 4k block Fri Feb  1 14:07:45 EST 2013


BEGIN TEST: Ext4 4k block w/nodelalloc, no flex_bg, and no extents Fri Feb  1
 14:08:14 EST 2013
Ran: 001 002 005 006 007 011 013 014 015 020 053 062 069 070 075 076 079 088
 091 105 112 113 117 120 123 124 126 128 129 130 131 135 141 169 184 193 198
 207 210 211 212 215 219 221 225 230 235 236 237 240 245 246 247 248 249 257
 258 263 271 277 289 294
Passed all 62 tests
END TEST: Ext4 4k block w/nodelalloc, no flex_bg, and no extents Fri Feb  1 
 14:38:05 EST 2013


BEGIN TEST: Ext4 4k block w/ no journal Fri Feb  1 14:38:10 EST 2013
Ran: 001 002 005 006 007 011 013 014 015 020 053 062 069 070 075 076 079 088
 091 105 112 113 117 120 123 124 126 128 129 130 131 135 141 169 184 193 198
 207 210 211 212 213 214 215 219 221 223 225 228 230 235 236 237 240 243 245
 246 247 248 249 255 256 257 258 263 271 277 289 294
Passed all 69 tests
END TEST: Ext4 4k block w/ no journal Fri Feb  1 15:33:29 EST 2013


BEGIN TEST: Ext4 1k block Fri Feb  1 15:33:52 EST 2013
Ran: 001 002 005 006 007 011 013 014 015 020 053 062 069 070 075 076 079 088
 091 105 112 113 117 120 123 124 126 128 129 130 131 135 141 169 184 193 198
 207 210 211 212 213 214 215 219 221 223 225 228 230 235 236 237 240 243 245
 246 247 248 249 255 256 257 258 263 271 277 289 294
Passed all 69 tests
END TEST: Ext4 1k block Fri Feb  1 16:05:38 EST 2013


BEGIN TEST: Ext4 4k block w/nodelalloc and no flex_bg Fri Feb  1 16:05:45 EST
 2013
Ran: 001 002 005 006 007 011 013 014 015 020 053 062 069 070 075 076 079 088
 091 105 112 113 117 120 123 124 126 128 129 130 131 135 141 169 184 193 198
 207 210 211 212 213 214 215 219 221 223 225 228 230 235 236 237 240 243 245
 246 247 248 249 255 256 257 258 263 271 277 289 294
Failures: 223
END TEST: Ext4 4k block w/nodelalloc and no flex_bg Fri Feb  1 16:36:47 EST
 2013

Notes:

1) This is a sample of the 223 failure output (also seen on x86-64);  the
complete test log contains numerous additional mismatches of the same sort:

223 54s ...	[15:33:09] [15:34:11] - output mismatch (see 223.out.bad)
    --- 223.out	2012-12-23 14:52:52.897392805 -0500
    +++ 223.out.bad	2013-02-06 15:34:11.750303000 -0500
    @@ -204,47 +204,47 @@
     SCRATCH_MNT/file-1-131072-falloc: well-aligned
     SCRATCH_MNT/file-1-131072-write: well-aligned
     SCRATCH_MNT/file-2-131072-falloc: well-aligned
    -SCRATCH_MNT/file-2-131072-write: well-aligned
    +SCRATCH_MNT/file-2-131072-write: Start block 33296 not multiple of sunit 32
     SCRATCH_MNT/file-3-131072-falloc: well-aligned
     SCRATCH_MNT/file-3-131072-write: well-aligned
     ...


BEGIN TEST: Ext4 4k block w/metadata_csum Fri Feb  1 16:36:54 EST 2013
Ran: 001 002 005 006 007 011 013 014 015 020 053 062 069 070 075 076 079 088
 091 105 112 113 117 120 123 124 126 128 129 130 131 135 141 169 184 193 198
 207 210 211 212 213 214 215 219 221 223 225 228 230 235 236 237 240 243 245
 246 247 248 249 255 256 257 258 263 271 277 289 294
Passed all 69 tests
END TEST: Ext4 4k block w/metadata_csum Fri Feb  1 17:09:01 EST 2013


BEGIN TEST: Ext4 4k block w/dioread_nolock Fri Feb  1 17:09:01 EST 2013
Ran: 001 002 005 006 007 011 013 014 015 020 053 062 069 070 075 076 079 088
 091 105 112 113 117 120 123 124 126 128 129 130 131 135 141 169 184 193 198
 207 210 211 212 213 214 215 219 221 223 225 228 230 235 236 237 240 243 245
 246 247 248 249 255 256 257 258 263 271 277 289 294
Passed all 69 tests
END TEST: Ext4 4k block w/dioread_nolock Fri Feb  1 18:05:12 EST 2013

Notes:

1) 113 produces lockdep warnings on 3.8-rc6 that I've only seen on the
Pandaboard so far (this is the only testing anomaly on -rc6 that appears to be
unique to it).  I've seen them on both the Pandaboard and x86-64 in past -rc's,
and the signature has varied over time with different rc's.  The lockdep
warning is lengthy, so I've placed it at the end of this report.


BEGIN TEST: Ext4 4k block w/data=journal Fri Feb  1 18:05:32 EST 2013
Ran: 001 002 005 006 007 011 013 014 015 020 053 062 069 070 075 076 079 088
 091 105 112 113 117 120 123 124 126 128 129 130 131 135 141 169 184 193 198
 207 210 211 212 213 214 215 219 221 223 225 228 230 235 236 237 240 243 245
 246 247 248 249 255 256 257 258 263 271 277 289 294
Failures: 223
END TEST: Ext4 4k block w/data=journal Fri Feb  1 19:01:44 EST 2013

Notes:

1) The following message appears in the syslog when xfstest 246 is run (also
seen on x86-64):

JBD2: Spotted dirty metadata buffer (dev = sda1, blocknr = 160256). There's a
risk of filesystem corruption in case of system crash.

2) This is a sample of the 223 failure output (also seen on x86-64);  the
complete test log contains numerous additional mismatches of the same sort:
223 61s ...	[15:48:55] [15:50:18] - output mismatch (see 223.out.bad)
    --- 223.out	2012-12-23 14:52:52.897392805 -0500
    +++ 223.out.bad	2013-02-06 15:50:18.437803000 -0500
    @@ -204,45 +204,45 @@
     SCRATCH_MNT/file-1-131072-falloc: well-aligned
     SCRATCH_MNT/file-1-131072-write: well-aligned
     SCRATCH_MNT/file-2-131072-falloc: well-aligned
    -SCRATCH_MNT/file-2-131072-write: well-aligned
    +SCRATCH_MNT/file-2-131072-write: Start block 33296 not multiple of sunit 32
     SCRATCH_MNT/file-3-131072-falloc: well-aligned
     SCRATCH_MNT/file-3-131072-write: well-aligned
     ...


BEGIN TEST: Ext4 4k block w/bigalloc Fri Feb  1 19:02:10 EST 2013
Ran: 001 002 005 006 007 011 013 014 015 020 053 062 069 070 075 076 079 088
 091 105 112 113 117 120 123 124 126 128 129 130 131 135 141 169 184 193 198
 207 210 211 212 213 214 215 219 221 223 225 228 230 235 236 237 240 243 245
 246 247 248 249 257 258 263 271 277 289 294
Failures: 015 219 235
END TEST: Ext4 4k block w/bigalloc Fri Feb  1 19:21:22 EST 2013

Notes:

1) This is all of the 015 failure output (also seen on x86-64):

015 6s ...	[17:23:07] [17:23:12] - output mismatch (see 015.out.bad)
    --- 015.out	2012-12-23 14:52:52.530205305 -0500
    +++ 015.out.bad	2013-02-06 17:23:12.390928000 -0500
    @@ -4,4 +4,5 @@
     check free space:
     delete fill:
     check free space:
    -   !!! free space is in range
    +   !!! free space has value of 0
    +free space is NOT in range 43655.04 .. 44536.96
     ...

Multiple bursts of messages similar to this sample appear in the syslog when
015 is run:

EXT4-fs (sda2): delayed block allocation failed for inode 12 at logical offset 11056 with max blocks 2048 with error -28
EXT4-fs (sda2): This should not happen!! Data will be lost
EXT4-fs (sda2): Total free blocks count 0
EXT4-fs (sda2): Free/Dirty block details
EXT4-fs (sda2): free_blocks=0
EXT4-fs (sda2): dirty_blocks=42992
EXT4-fs (sda2): Block reservation details
EXT4-fs (sda2): i_reserved_data_blocks=998
EXT4-fs (sda2): i_reserved_meta_blocks=1689
EXT4-fs (sda2): delayed block allocation failed for inode 12 at logical offset 13105 with max blocks 2048 with error -28
EXT4-fs (sda2): This should not happen!! Data will be lost

Occasionally (maybe 20% of the time), 015 will lead to a stall where the test
makes no further forward progress.  This happens at the end of the test when
a file used to consume available free space on the filesystem is deleted.
The rm is left in the D state, a flusher thread continues to run and
accumulate time, and I/O is continuously written to the storage device at a
slow rate.


2) This is all of the 219 failure output (also seen on x86-64):

--- 219.out	2013-02-01 12:46:06.672838718 -0500
+++ 219.out.bad	2013-02-07 12:36:14.734678000 -0500
@@ -16,7 +16,7 @@
   Size: 49152        Filetype: Regular File
   File: "SCRATCH_MNT/mmap"
   Size: 49152        Filetype: Regular File
-Usage OK (type=u)
+Too many blocks used (type=u)
 
 ### test group accounting
 
@@ -34,4 +34,4 @@
   Size: 49152        Filetype: Regular File
   File: "SCRATCH_MNT/mmap"
   Size: 49152        Filetype: Regular File
-Usage OK (type=g)
+Too many blocks used (type=g)


3) This is all of the 235 failure output (also seen on x86-64):

--- 235.out	2012-12-23 14:52:52.905205305 -0500
+++ 235.out.bad	2013-02-07 12:40:53.750303000 -0500
@@ -21,7 +21,7 @@
                         Block limits                File limits
 User            used    soft    hard  grace    used  soft  hard  grace
 ----------------------------------------------------------------------
-fsgqa     --       8       0       0              1     0     0       
+fsgqa     --      64       0       0              1     0     0       
 
 
 *** Report for group quotas on device SCRATCH_DEV
@@ -29,6 +29,6 @@
                         Block limits                File limits
 Group           used    soft    hard  grace    used  soft  hard  grace
 ----------------------------------------------------------------------
-fsgqa     --       8       0       0              1     0     0       
+fsgqa     --      64       0       0              1     0     0       


4) When 225 is run, one or more instances of the following warning will be
posted to the syslog (also on x86-64):

WARNING: at fs/ext4/inode.c:362 ext4_da_update_reserve_space+0x214/0x280()
Modules linked in:
[<c0016d18>] (unwind_backtrace+0x0/0x104) from [<c059e450>] (dump_stack+0x20/0x24)
[<c059e450>] (dump_stack+0x20/0x24) from [<c0037c68>] (warn_slowpath_common+0x5c/0x74)
[<c0037c68>] (warn_slowpath_common+0x5c/0x74) from [<c0037cac>] (warn_slowpath_null+0x2c/0x34)
[<c0037cac>] (warn_slowpath_null+0x2c/0x34) from [<c01af4f4>] (ext4_da_update_reserve_space+0x214/0x280)
[<c01af4f4>] (ext4_da_update_reserve_space+0x214/0x280) from [<c01dc928>] (ext4_ext_map_blocks+0xc10/0x153c)
[<c01dc928>] (ext4_ext_map_blocks+0xc10/0x153c) from [<c01af624>] (ext4_map_blocks+0xc4/0x230)
[<c01af624>] (ext4_map_blocks+0xc4/0x230) from [<c01b3c6c>] (mpage_da_map_and_submit+0xa0/0x598)
[<c01b3c6c>] (mpage_da_map_and_submit+0xa0/0x598) from [<c01b494c>] (ext4_da_writepages+0x2d8/0x564)
[<c01b494c>] (ext4_da_writepages+0x2d8/0x564) from [<c00e38cc>] (do_writepages+0x34/0x48)
[<c00e38cc>] (do_writepages+0x34/0x48) from [<c00d9b64>] (__filemap_fdatawrite_range+0x5c/0x64)
[<c00d9b64>] (__filemap_fdatawrite_range+0x5c/0x64) from [<c00d9ba4>] (filemap_fdatawrite+0x38/0x40)
[<c00d9ba4>] (filemap_fdatawrite+0x38/0x40) from [<c00d9bdc>] (filemap_write_and_wait+0x30/0x54)
[<c00d9bdc>] (filemap_write_and_wait+0x30/0x54) from [<c0128864>] (do_vfs_ioctl+0x498/0x5d8)
[<c0128864>] (do_vfs_ioctl+0x498/0x5d8) from [<c0128a18>] (sys_ioctl+0x74/0x94)
[<c0128a18>] (sys_ioctl+0x74/0x94) from [<c000e3c0>] (ret_fast_syscall+0x0/0x3c)



Here's the lockdep report for xfstest 113 run in the dioread_nolock scenario
as mentioned above (the taint is due to an unrelated warning from the clk
subsystem believed to be unique to 3.8 on Pandaboards/OMAP):

[ INFO: possible circular locking dependency detected ]
3.8.0-rc6 #1 Tainted: G        W   
-------------------------------------------------------
aio-stress/8597 is trying to acquire lock:
 (&mm->mmap_sem){++++++}, at: [<c00f1928>] get_user_pages_fast+0x48/0x88

but task is already holding lock:
 (&ei->i_data_sem){++++.+}, at: [<c01ae2a8>] ext4_direct_IO+0x3d4/0x450
 
which lock already depends on the new lock.
 
 
the existing dependency chain (in reverse order) is:
 
-> #1 (&ei->i_data_sem){++++.+}:
       [<c008c190>] __lock_acquire+0x162c/0x1b04
       [<c008cb70>] lock_acquire+0x9c/0x114
       [<c05a4f78>] down_read+0x40/0x50
       [<c01af6dc>] ext4_map_blocks+0x17c/0x230
       [<c01b1b6c>] _ext4_get_block+0x84/0x164
       [<c01b1d2c>] ext4_get_block+0x34/0x3c
       [<c0153bfc>] do_mpage_readpage+0x4d8/0x754
       [<c0153f50>] mpage_readpages+0xd8/0x134
       [<c01aedf4>] ext4_readpages+0x4c/0x54
       [<c00e3cbc>] __do_page_cache_readahead+0x1e8/0x2b0
       [<c00e408c>] ra_submit+0x38/0x40
       [<c00db0d4>] filemap_fault+0x3b4/0x490
       [<c00fb220>] __do_fault+0x78/0x4a8
       [<c00fe36c>] handle_pte_fault+0x80/0x6f8
       [<c00fea8c>] handle_mm_fault+0xa8/0xd8
       [<c05a91bc>] do_page_fault+0x294/0x3b4
       [<c05a9384>] do_translation_fault+0xa8/0xb0
       [<c00083f4>] do_DataAbort+0x44/0xa8
       [<c05a751c>] __dabt_svc+0x3c/0x60
       [<c02d17c0>] __clear_user_std+0x34/0x64
 
-> #0 (&mm->mmap_sem){++++++}:
       [<c059fa20>] print_circular_bug+0x78/0x2c8
       [<c008c59c>] __lock_acquire+0x1a38/0x1b04
       [<c008cb70>] lock_acquire+0x9c/0x114
       [<c05a4f78>] down_read+0x40/0x50
       [<c00f1928>] get_user_pages_fast+0x48/0x88
       [<c0151110>] __blockdev_direct_IO+0x1be4/0x3840
       [<c01ae1ac>] ext4_direct_IO+0x2d8/0x450
       [<c00d9db0>] generic_file_direct_write+0xf4/0x20c
       [<c00da23c>] __generic_file_aio_write+0x374/0x454
       [<c01aa49c>] ext4_file_write+0x2ac/0x4d0
       [<c015cf18>] aio_rw_vect_retry+0x78/0x1ac
       [<c015d858>] aio_run_iocb+0x64/0x14c
       [<c015ed10>] do_io_submit+0x574/0x8ec
       [<c015f0a4>] sys_io_submit+0x1c/0x20
       [<c000e3c0>] ret_fast_syscall+0x0/0x3c
 
other info that might help us debug this:
 
 Possible unsafe locking scenario:
 
       CPU0                    CPU1
       ----                    ----
  lock(&ei->i_data_sem);
                               lock(&mm->mmap_sem);
                               lock(&ei->i_data_sem);
  lock(&mm->mmap_sem);
 
 *** DEADLOCK ***
 
1 lock held by aio-stress/8597:
 #0:  (&ei->i_data_sem){++++.+}, at: [<c01ae2a8>] ext4_direct_IO+0x3d4/0x450
 
stack backtrace:
[<c0016d18>] (unwind_backtrace+0x0/0x104) from [<c059e450>] (dump_stack+0x20/0x24)
[<c059e450>] (dump_stack+0x20/0x24) from [<c059fc24>] (print_circular_bug+0x27c/0x2c8)
[<c059fc24>] (print_circular_bug+0x27c/0x2c8) from [<c008c59c>] (__lock_acquire+0x1a38/0x1b04)
[<c008c59c>] (__lock_acquire+0x1a38/0x1b04) from [<c008cb70>] (lock_acquire+0x9c/0x114)
[<c008cb70>] (lock_acquire+0x9c/0x114) from [<c05a4f78>] (down_read+0x40/0x50)
[<c05a4f78>] (down_read+0x40/0x50) from [<c00f1928>] (get_user_pages_fast+0x48/0x88)
[<c00f1928>] (get_user_pages_fast+0x48/0x88) from [<c0151110>] (__blockdev_direct_IO+0x1be4/0x3840)
[<c0151110>] (__blockdev_direct_IO+0x1be4/0x3840) from [<c01ae1ac>] (ext4_direct_IO+0x2d8/0x450)
[<c01ae1ac>] (ext4_direct_IO+0x2d8/0x450) from [<c00d9db0>] (generic_file_direct_write+0xf4/0x20c)
[<c00d9db0>] (generic_file_direct_write+0xf4/0x20c) from [<c00da23c>] (__generic_file_aio_write+0x374/0x454)
[<c00da23c>] (__generic_file_aio_write+0x374/0x454) from [<c01aa49c>] (ext4_file_write+0x2ac/0x4d0)
[<c01aa49c>] (ext4_file_write+0x2ac/0x4d0) from [<c015cf18>] (aio_rw_vect_retry+0x78/0x1ac)
[<c015cf18>] (aio_rw_vect_retry+0x78/0x1ac) from [<c015d858>] (aio_run_iocb+0x64/0x14c)
[<c015d858>] (aio_run_iocb+0x64/0x14c) from [<c015ed10>] (do_io_submit+0x574/0x8ec)
[<c015ed10>] (do_io_submit+0x574/0x8ec) from [<c015f0a4>] (sys_io_submit+0x1c/0x20)
[<c015f0a4>] (sys_io_submit+0x1c/0x20) from [<c000e3c0>] (ret_fast_syscall+0x0/0x3c)
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html