Hi Brian,
Here's the meta_dump file: https://docs.google.com/file/d/0B7n2C4T5tfNCRGpoUWIzaTlvM0E/edit?usp=sharing
Thank you.
2013/4/15 符永涛 <yongtaofu@xxxxxxxxx>
Hi Eric,
Thank you, and I'm sorry for spamming, but I got some more info and I hope you're interested.
GlusterFS tries to unlink the rebalance pid file after the rebalance completes, and maybe this is where the issue happens. In glusterfs 3.3, glusterfsd/src/glusterfsd.c line 1332 has an unlink operation:
if (ctx->cmd_args.pid_file) {
        unlink (ctx->cmd_args.pid_file);
        ctx->cmd_args.pid_file = NULL;
}
So maybe unlinking the rebalance pid file is what leads to the xfs shutdown.
See the logs below:
1. /var/log/secure indicates I started the rebalance on Apr 15 11:58:11:
Apr 15 11:58:11 10 sudo: root : TTY=pts/2 ; PWD=/root ; USER=root ; COMMAND=/usr/sbin/gluster volume rebalance testbug start
2. After the xfs shutdown I got the following trace of xfs_iunlink_remove:
--- xfs_iunlink_remove -- module("xfs").function("xfs_iunlink_remove@fs/xfs/xfs_inode.c:1680").return -- return=0x16
vars: tp=0xffff881c81797c70 ip=0xffff881003c13c00 next_ino=? mp=? agi=? dip=? agibp=0xffff880109b47e20 ibp=? agno=? agino=? next_agino=? last_ibp=? last_dip=0xffff882000000000 bucket_index=? offset=? last_offset=0xffffffffffff8810 error=? __func__=[...]
ip: i_ino = 0x113, i_flags = 0x0
The inode that leads to the xfs shutdown is 0x113 (275 decimal).
3. I repaired the xfs filesystem, and in lost+found I found the inode:
[root@10.23.72.93 lost+found]# pwd
/mnt/xfsd/lost+found
[root@10.23.72.93 lost+found]# ls -l 275
---------T 1 root root 0 Apr 15 11:58 275
[root@10.23.72.93 lost+found]# stat 275
File: `275'
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 810h/2064d Inode: 275 Links: 1
Access: (1000/---------T) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2013-04-15 11:58:25.833443445 +0800
Modify: 2013-04-15 11:58:25.912461256 +0800
Change: 2013-04-15 11:58:25.915442091 +0800
This file was created around 2013-04-15 11:58. The other files in lost+found have extended attributes but this file doesn't, which means it is not part of the glusterfs backend files. It should be the rebalance pid file.
2013/4/15 Eric Sandeen <sandeen@xxxxxxxxxxx>
On 4/15/13 8:45 AM, 符永涛 wrote:
> And at the same time we got the following error log of glusterfs:
> [2013-04-15 20:43:03.851163] I [dht-rebalance.c:1611:gf_defrag_status_get] 0-glusterfs: Rebalance is completed
> [2013-04-15 20:43:03.851248] I [dht-rebalance.c:1614:gf_defrag_status_get] 0-glusterfs: Files migrated: 1629, size: 1582329065954, lookups: 11036, failures: 561
> [2013-04-15 20:43:03.887634] W [glusterfsd.c:831:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3bd16e767d] (-->/lib64/libpthread.so.0() [0x3bd1a07851] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xdd) [0x405c9d]))) 0-: received signum (15), shutting down
> [2013-04-15 20:43:03.887878] E [rpcsvc.c:1155:rpcsvc_program_unregister_portmap] 0-rpc-service: Could not unregister with portmap
>
We'll take a look, thanks.
Going forward, could I ask that you take a few minutes to batch up the information rather than sending several emails in a row? It is much harder to collect the information when it is spread across so many emails.
Thanks,
-Eric
--
符永涛
_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs