And glusterfs always uses a hardlink for self-heal too (each backend file has a hardlink under a hidden directory named .glusterfs). So, as you have mentioned, reducing di_nlink may also conflict.
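For context, a minimal C sketch of that mechanism (not the actual glusterfs code; the .glusterfs handle path layout is only assumed for illustration). It shows why every backend file carries a second hardlink, so both link-count increments and decrements go through the inode's nlink on the brick filesystem:

/*
 * Minimal sketch, not the glusterfs implementation.  The handle path
 * layout below is an assumption for illustration only.
 */
#include <stdio.h>
#include <unistd.h>

/* create the hidden handle hardlink for a data file, given its gfid string */
static int link_gfid_handle(const char *data_path, const char *gfid)
{
    char handle[4096];

    /* assumed layout: .glusterfs/<gfid[0:2]>/<gfid[2:4]>/<gfid> under the brick root */
    snprintf(handle, sizeof(handle), ".glusterfs/%.2s/%.2s/%s",
             gfid, gfid + 2, gfid);
    return link(data_path, handle);   /* second hardlink bumps the inode's nlink */
}

int main(void)
{
    /* example invocation; paths are purely illustrative */
    if (link_gfid_handle("dir/file.txt", "abcd1234-0000-0000-0000-000000000000") != 0)
        perror("link");
    return 0;
}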
2013/4/20 符永涛 <yongtaofu@xxxxxxxxx>
Hi Eric, thank you. I will enable them and run the test again. I can only reproduce it with glusterfs rebalance. Glusterfs uses a mechanism it calls syncop to unlink files; for rebalance it uses syncop_unlink (glusterfs/libglusterfs/src/syncop.c). The glusterfs synctask framework (glusterfs/libglusterfs/src/syncop.c) uses makecontext/swapcontext. Could that lead to racing unlinks from different CPU cores?
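For reference, here is a minimal, self-contained sketch of the makecontext/swapcontext pattern that a synctask-style framework is built on (this is not the glusterfs code itself, just the ucontext mechanics): a task runs on its own stack, yields back to a scheduler context, and is resumed by whatever later swaps back into it.

#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t sched_ctx, task_ctx;

static void task_fn(void)
{
    printf("task: start, yielding to scheduler\n");
    swapcontext(&task_ctx, &sched_ctx);      /* yield back to the scheduler */
    printf("task: resumed, the syscall work would happen here\n");
}

int main(void)
{
    char *stack = malloc(64 * 1024);

    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = stack;
    task_ctx.uc_stack.ss_size = 64 * 1024;
    task_ctx.uc_link = &sched_ctx;           /* return here when task_fn finishes */
    makecontext(&task_ctx, task_fn, 0);

    swapcontext(&sched_ctx, &task_ctx);      /* run the task until it yields */
    printf("scheduler: task yielded, resuming it\n");
    swapcontext(&sched_ctx, &task_ctx);      /* resume the task to completion */

    free(stack);
    return 0;
}

Note that swapcontext itself only switches stacks within the calling thread; any cross-CPU effect would come from which worker thread a framework chooses to resume the saved context on.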
2013/4/20 Eric Sandeen <sandeen@xxxxxxxxxxx>
On 4/19/13 7:51 PM, 符永涛 wrote:
> After changing the mount option to sync, the shutdown still happens, and I got a trace again; the inode 0x1c57d is abnormal.
> https://docs.google.com/file/d/0B7n2C4T5tfNCYW1jNWhBbXBYakE/edit?usp=sharing
> I have a question: if the problem is hard to reproduce, why have I hit it 8 times in a week on a test cluster of only 8 nodes?
> What's the problem?

Since this is a race on namespace operations, I wouldn't have expected sync to matter. You must have something unique in your environment, and we don't know what it is.
To gather more information, can you also turn on tracepoints for:
xfs_rename
xfs_create
xfs_link
xfs_remove
in addition to xfs_iunlink and xfs_iunlink_remove,
and we'll see what that tells us.
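In case it helps, a small sketch that enables those tracepoints programmatically; it assumes tracefs is mounted at the usual /sys/kernel/debug/tracing location, and echoing 1 into the same enable files from a shell does the same thing.

#include <stdio.h>

int main(void)
{
    const char *events[] = {
        "xfs_rename", "xfs_create", "xfs_link", "xfs_remove",
        "xfs_iunlink", "xfs_iunlink_remove",
    };
    char path[256];

    for (size_t i = 0; i < sizeof(events) / sizeof(events[0]); i++) {
        /* write "1" to each tracepoint's enable file (assumed mount point) */
        snprintf(path, sizeof(path),
                 "/sys/kernel/debug/tracing/events/xfs/%s/enable", events[i]);
        FILE *f = fopen(path, "w");
        if (!f) {
            perror(path);
            continue;
        }
        fputs("1", f);
        fclose(f);
    }
    return 0;
}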
There are many paths that manipulate the di_nlink count, and something is racing, but we don't yet know which two callchains are involved.
The above are all the callers that manipulate the link count, so they will yield more information about who is manipulating the counts.
Thanks,
-Eric
--
符永涛
_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs