Re: XFS: Assertion failed: bp->b_bn != XFS_BUF_DADDR_NULL, file: fs/xfs/xfs_buf.c, line: 598

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2012-03-30 at 18:02 +0200, Rafał Kupka wrote:
> On Tue, 2012-03-27 at 21:17 +1100, Dave Chinner wrote:
Hi,

I've collected more data.

> I've applied this patch to linus/v3.3 kernel. There is no more I/O
> errors but xfsdump still hangs.
[other details in my previous e-mail]

This new code is executed but broken struct xfs_buf is never reclaimed
from the cache:

         xfsdump-1455  [000] .... 40888.509350: xfs_perag_get: dev 254:2 agno 1 refcount 441 caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.509355: xfs_perag_put: dev 254:2 agno 1 refcount 440 caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.509356: xfs_buf_trylock: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|ASYNC|BROKEN|TRYLOCK|PAGES caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.509357: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1455  [000] .... 40888.509358: xfs_buf_unlock: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 1 flags READ|ASYNC|BROKEN|TRYLOCK|PAGES caller xfs_buf_relse
         xfsdump-1455  [000] .... 40888.509359: xfs_buf_rele: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 1 flags READ|ASYNC|BROKEN|TRYLOCK|PAGES caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.513340: xfs_perag_get: dev 254:2 agno 1 refcount 441 caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.513345: xfs_perag_put: dev 254:2 agno 1 refcount 440 caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.513346: xfs_buf_trylock: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|ASYNC|BROKEN|TRYLOCK|PAGES caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.513347: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1455  [000] .... 40888.513348: xfs_buf_unlock: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 1 flags READ|ASYNC|BROKEN|TRYLOCK|PAGES caller xfs_buf_relse
         xfsdump-1455  [000] .... 40888.513349: xfs_buf_rele: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 1 flags READ|ASYNC|BROKEN|TRYLOCK|PAGES caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.517336: xfs_perag_get: dev 254:2 agno 1 refcount 441 caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.517341: xfs_perag_put: dev 254:2 agno 1 refcount 440 caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.517342: xfs_buf_trylock: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|ASYNC|BROKEN|TRYLOCK|PAGES caller _xfs_buf_find
         xfsdump-1455  [000] .... 40888.517343: xfs_buf_find_broken: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 0 flags READ|LOCK|DONT_BLOCK caller xfs_buf_get
         xfsdump-1455  [000] .... 40888.517344: xfs_buf_unlock: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 1 flags READ|ASYNC|BROKEN|TRYLOCK|PAGES caller xfs_buf_relse
         xfsdump-1455  [000] .... 40888.517345: xfs_buf_rele: dev 254:2 bno 0xffffffffffffffff len 0x2000 hold 2 pincount 0 lock 1 flags READ|ASYNC|BROKEN|TRYLOCK|PAGES caller _xfs_buf_find

As I lurked to XFS mailing list I see that better fix is to never put
half-initialized buf into rbtree.
http://article.gmane.org/gmane.comp.file-systems.xfs.general/44356

I've tried another simple patch. If these broken XBF_READ_AHEAD buffers
can be picked up by non read-ahead code (as long as it's in rbtree)
maybe it's not reasonable to treat them specially in
xfs_buf_allocate_memory().

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index e24b320..d18c070 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -341,11 +341,13 @@ use_alloc_page:
 retry:
                page = alloc_page(gfp_mask);
                if (unlikely(page == NULL)) {
+                       /*
                        if (flags & XBF_READ_AHEAD) {
                                bp->b_page_count = i;
                                error = ENOMEM;
                                goto out_free_pages;
                        }
+                        */
 
                        /*
                         * This could deadlock.

Patch is probably broken by Evolution but you can see what I've done.

XFS patched this way survived my test for a long(er) time without hangs.

Retries counter in /proc/fs/xfs/stat raises slowly as expected.
"buf 13869637 753868 13946555 17174 8848657 92215 555 146160 90802"

There is one "XFS: possible memory allocation deadlock" message as you
can see in kernel logs below.

Yet there is another (really?) assertion:
Apr  4 02:58:22 goscie kernel: [    5.617480] sshd (892): /proc/892/oom_adj is deprecated, please use /proc/892/oom_score_adj instead.
Apr  4 04:35:14 goscie kernel: [ 5817.096240] XFS: possible memory allocation deadlock in xfs_buf_allocate_memory (mode:0x1200)
Apr  4 04:41:24 goscie kernel: [ 6187.984118] XFS: Assertion failed: ((iflags & BMV_IF_DELALLOC) != 0) || (map[i].br_startblock != DELAYSTARTBLOCK), file: fs/xfs/xfs_bmap.c, line: 5612
Apr  4 04:41:24 goscie kernel: [ 6187.986603] ------------[ cut here ]------------
Apr  4 04:41:24 goscie kernel: [ 6187.987608] kernel BUG at fs/xfs/xfs_message.c:101!
Apr  4 04:41:24 goscie kernel: [ 6187.988050] invalid opcode: 0000 [#1] SMP
Apr  4 04:41:24 goscie kernel: [ 6187.988050] CPU 0
Apr  4 04:41:24 goscie kernel: [ 6187.988050] Modules linked in: loop tpm_tis psmouse i2c_piix4 tpm i2c_core evdev pcspkr tpm_bios serio_raw processor thermal_sys button xfs sg ata_generic ata_piix libata scsi_mod virtio_net [last unloaded: scsi_wait_scan]
Apr  4 04:41:24 goscie kernel: [ 6187.988050]
Apr  4 04:41:24 goscie kernel: [ 6187.988050] Pid: 3019, comm: xfsdump Not tainted 3.3.0xfsdump1-00001-g26e1ccc-dirty #4 Bochs Bochs
Apr  4 04:41:24 goscie kernel: [ 6187.988050] RIP: 0010:[<ffffffffa009043e>]  [<ffffffffa009043e>] assfail+0x1d/0x23 [xfs]
Apr  4 04:41:24 goscie kernel: [ 6187.988050] RSP: 0000:ffff880011c71af8  EFLAGS: 00010296
Apr  4 04:41:24 goscie kernel: [ 6187.988050] RAX: 00000000000000a0 RBX: ffff88000317b6a0 RCX: 0000000007b707b7
Apr  4 04:41:24 goscie kernel: [ 6187.988050] RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
Apr  4 04:41:24 goscie kernel: [ 6187.988050] RBP: ffff880011c71bb8 R08: 0000000000000000 R09: ffffffff8167d50a
Apr  4 04:41:24 goscie kernel: [ 6187.988050] R10: 0000000000000000 R11: ffff88001780d000 R12: 0000000000000000
Apr  4 04:41:24 goscie kernel: [ 6187.988050] R13: ffff8800153080f0 R14: ffff880003130800 R15: ffff8800031cd840
Apr  4 04:41:24 goscie kernel: [ 6187.988050] FS:  0000000000000000(0000) GS:ffff880017c00000(0063) knlGS:00000000f765c700
Apr  4 04:41:24 goscie kernel: [ 6187.988050] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
Apr  4 04:41:24 goscie kernel: [ 6187.988050] CR2: 00000000f586541c CR3: 0000000012155000 CR4: 00000000000006f0
Apr  4 04:41:24 goscie kernel: [ 6187.988050] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr  4 04:41:24 goscie kernel: [ 6187.988050] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr  4 04:41:24 goscie kernel: [ 6187.988050] Process xfsdump (pid: 3019, threadinfo ffff880011c70000, task ffff8800120dd750)
Apr  4 04:41:24 goscie kernel: [ 6187.988050] Stack:
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  ffff880011c71bb8 ffffffffa00a6fdc ffff8800031cd988 00000005115338c0
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  0000001000000005 0000000000000008 ffff88000317b600 000000000000c000
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  ffff880015308000 0000000800000000 ffffffffa008b2d2 000000008115ecf6
Apr  4 04:41:24 goscie kernel: [ 6187.988050] Call Trace:
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffffa00a6fdc>] ? xfs_getbmap+0x45b/0x5ae [xfs]
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffffa008b2d2>] ? xfs_open_by_handle+0x182/0x182 [xfs]
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffffa008b3f0>] ? xfs_ioc_getbmapx+0x4e/0x85 [xfs]
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffffa008c8f6>] ? xfs_file_ioctl+0x43f/0x751 [xfs]
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffff810be2f2>] ? __lock_page_killable+0x5d/0x63
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffff81106cc7>] ? __dentry_open+0x1d6/0x2c8
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffff81105ca6>] ? fd_install+0x27/0x4f
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffffa008b2ad>] ? xfs_open_by_handle+0x15d/0x182 [xfs]
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffffa00eb0f7>] ? xfs_file_compat_ioctl+0x355/0x486 [xfs]
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffffa008a3b7>] ? xfs_iunlock+0xea/0x146 [xfs]
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffffa009509f>] ? xfs_free_eofblocks+0xa5/0x217 [xfs]
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffff81142365>] ? compat_sys_ioctl+0x1a5/0xe28
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffff81071d92>] ? timekeeping_get_ns+0xd/0x2a
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffff810729fa>] ? getnstimeofday+0x4d/0x80
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffff81072a7a>] ? do_gettimeofday+0x10/0x2f
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  [<ffffffff81361a36>] ? cstar_dispatch+0x7/0x21
Apr  4 04:41:24 goscie kernel: [ 6187.988050] Code: 89 64 24 28 e8 86 fc ff ff 48 83 c4 68 c3 48 89 f1 41 89 d0 48 83 ec 08 48 89 fa 48 c7 c6 7b 6f 0f a0 31 ff 31 c0 e8 8d ff ff ff <0f> 0b eb fe 90 90 45 31 c0 31 c0 48 83 7f 38 00 75 67 e9 bd 00
Apr  4 04:41:24 goscie kernel: [ 6187.988050] RIP  [<ffffffffa009043e>] assfail+0x1d/0x23 [xfs]
Apr  4 04:41:24 goscie kernel: [ 6187.988050]  RSP <ffff880011c71af8>
Apr  4 04:41:24 goscie kernel: [ 6188.005126] ---[ end trace 00a3f5114a46a7b5 ]---

Should I investigate further or it is related to my (broken) patch?

Best Regards,
Rafal Kupka

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs



[Index of Archives]     [Linux XFS Devel]     [Linux Filesystem Development]     [Filesystem Testing]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux