And glusterfs always uses a hardlink for self-heal too (each backend file has a hardlink under a hidden directory named .glusterfs). So, as you have mentioned, reducing di_nlink may also conflict.
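For context, a minimal C sketch of that mechanism (not the actual glusterfs code; the .glusterfs handle path layout is only assumed for illustration). It shows why every backend file carries a second hardlink, so both link-count increments and decrements go through the inode's nlink on the brick filesystem:

/*
 * Minimal sketch, not the glusterfs implementation.  The handle path
 * layout below is an assumption for illustration only.
 */
#include <stdio.h>
#include <unistd.h>

/* create the hidden handle hardlink for a data file, given its gfid string */
static int link_gfid_handle(const char *data_path, const char *gfid)
{
    char handle[4096];

    /* assumed layout: .glusterfs/<gfid[0:2]>/<gfid[2:4]>/<gfid> under the brick root */
    snprintf(handle, sizeof(handle), ".glusterfs/%.2s/%.2s/%s",
             gfid, gfid + 2, gfid);
    return link(data_path, handle);   /* second hardlink bumps the inode's nlink */
}

int main(void)
{
    /* example invocation; paths are purely illustrative */
    if (link_gfid_handle("dir/file.txt", "abcd1234-0000-0000-0000-000000000000") != 0)
        perror("link");
    return 0;
}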
2013/4/20 符永涛 <yongtaofu@xxxxxxxxx>
Hi Eric, thank you. I will enable them and run the test again. I can only reproduce it with glusterfs rebalance. Glusterfs uses a mechanism it calls syncop to unlink files; for rebalance it uses syncop_unlink (glusterfs/libglusterfs/src/syncop.c). The glusterfs synctask framework (glusterfs/libglusterfs/src/syncop.c) uses makecontext/swapcontext. Could that lead to racing unlinks from different CPU cores?
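For reference, here is a minimal, self-contained sketch of the makecontext/swapcontext pattern that a synctask-style framework is built on (this is not the glusterfs code itself, just the ucontext mechanics): a task runs on its own stack, yields back to a scheduler context, and is resumed by whatever later swaps back into it.

#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t sched_ctx, task_ctx;

static void task_fn(void)
{
    printf("task: start, yielding to scheduler\n");
    swapcontext(&task_ctx, &sched_ctx);      /* yield back to the scheduler */
    printf("task: resumed, the syscall work would happen here\n");
}

int main(void)
{
    char *stack = malloc(64 * 1024);

    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = stack;
    task_ctx.uc_stack.ss_size = 64 * 1024;
    task_ctx.uc_link = &sched_ctx;           /* return here when task_fn finishes */
    makecontext(&task_ctx, task_fn, 0);

    swapcontext(&sched_ctx, &task_ctx);      /* run the task until it yields */
    printf("scheduler: task yielded, resuming it\n");
    swapcontext(&sched_ctx, &task_ctx);      /* resume the task to completion */

    free(stack);
    return 0;
}

Note that swapcontext itself only switches stacks within the calling thread; any cross-CPU effect would come from which worker thread a framework chooses to resume the saved context on.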
2013/4/20 Eric Sandeen <sandeen@xxxxxxxxxxx>
On 4/19/13 7:51 PM, 符永涛 wrote:
> After changing the mount option to sync, the shutdown still happens, and I got a trace again; the inode 0x1c57d is abnormal.
> https://docs.google.com/file/d/0B7n2C4T5tfNCYW1jNWhBbXBYakE/edit?usp=sharing
> I have a question: if the problem is hard to reproduce, why have I hit it 8 times in a week on a test cluster of only 8 nodes?
> What's the problem?

Since this is a race on namespace operations, I wouldn't have expected sync to matter. You must have something unique in your environment, and we don't know what it is.
To gather more information, can you also turn on tracepoints for:
xfs_rename
xfs_create
xfs_link
xfs_remove
in addition to xfs_iunlink and xfs_iunlink_remove,
and we'll see what that tells us.
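In case it helps, a small sketch that enables those tracepoints programmatically; it assumes tracefs is mounted at the usual /sys/kernel/debug/tracing location, and echoing 1 into the same enable files from a shell does the same thing.

#include <stdio.h>

int main(void)
{
    const char *events[] = {
        "xfs_rename", "xfs_create", "xfs_link", "xfs_remove",
        "xfs_iunlink", "xfs_iunlink_remove",
    };
    char path[256];

    for (size_t i = 0; i < sizeof(events) / sizeof(events[0]); i++) {
        /* write "1" to each tracepoint's enable file (assumed mount point) */
        snprintf(path, sizeof(path),
                 "/sys/kernel/debug/tracing/events/xfs/%s/enable", events[i]);
        FILE *f = fopen(path, "w");
        if (!f) {
            perror(path);
            continue;
        }
        fputs("1", f);
        fclose(f);
    }
    return 0;
}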
There are many paths that manipulate the di_nlink count, and something is racing, but we don't yet know which two callchains are involved.
The above are all the callers that manipulate the link count, so they will yield more information about who is manipulating the counts.
Thanks,
-Eric
--
符永涛
_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs