Possible reason for "meta-data data entry missing-entry gfid self-heal failed"?

Hi,

I found this thread while searching for a problem we're having. After a certain time, and under sustained write load, our GlusterFS system experiences what appears to be the same problem Marc Seeger described.

We don't know exactly how the event is triggered, but here are the last few lines of the fuse mount log (/var/log/mnt-gv0.log):

[2013-07-08 10:41:22.222438] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-gv0-client-3: remote operation failed: File exists. Path: /folder/ba/5d/ba5d2531274e683f8aa67494ebae8955 (00000000-0000-0000-0000-000000000000)
[2013-07-08 10:41:25.837297] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-gv0-client-3: remote operation failed: File exists. Path: /folder/8d/00/8d00b4d980cec0e2e6ebd800927e01de (00000000-0000-0000-0000-000000000000)
[2013-07-08 10:41:25.837408] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-gv0-client-2: remote operation failed: File exists. Path: /folder/8d/00/8d00b4d980cec0e2e6ebd800927e01de (00000000-0000-0000-0000-000000000000)
[2013-07-08 10:41:25.837724] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 99513056: OPEN() /folder/8d/00/8d00b4d980cec0e2e6ebd800927e01de => -1 (Invalid argument)
[2013-07-08 10:41:25.875407] I [afr-self-heal-common.c:1941:afr_sh_post_nb_entrylk_conflicting_sh_cbk] 0-gv0-replicate-1: Non blocking entrylks failed.
[2013-07-08 10:41:25.875456] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] 0-gv0-replicate-1: background  meta-data data entry missing-entry gfid self-heal failed on /folder/d3/4a/d34a2601138ea0b7d243d146e411c9a2
[2013-07-08 10:41:26.032067] W [inode.c:914:inode_lookup] (-->/usr/lib/glusterfs/3.3.1/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x104) [0x7fc0723553e4] (-->/usr/lib/glusterfs/3.3.1/xlator/mount/fuse.so(+0x14232) [0x7fc075ada232] (-->/usr/lib/glusterfs/3.3.1/xlator/mount/fuse.so(+0x140a7) [0x7fc075ada0a7]))) 0-fuse: inode not found
[2013-07-08 10:45:01.928054] I [fuse-bridge.c:4091:fuse_thread_proc] 0-fuse: unmounting /mnt/gv0
[2013-07-08 10:45:01.943226] W [glusterfsd.c:831:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fc0776e44bd] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7fc0779b6e9a] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7fc07849ca55]))) 0-: received signum (15), shutting down
[2013-07-08 10:45:01.943290] I [fuse-bridge.c:4648:fini] 0-fuse: Unmounting '/mnt/gv0'.

In the past few months, every node has had about 5 to 10 occurrences of this. The sequence of events seems to be (a rough log-scanning sketch follows the list):
- Warnings about other clients trying to MKNOD a file that already exists ("remote operation failed: File exists")?
- A failed OPEN on that existing file ("Invalid argument")
- Replication: "Non blocking entrylks failed"
- Replication: "background meta-data data entry missing-entry gfid self-heal failed"
- "inode not found", followed by an unmount, either immediately or a few minutes later
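
For anyone wanting to spot these in their own logs, here is a rough sketch of the kind of script we use to count the occurrences. It is a sketch only: it assumes the 3.3.x fuse log format shown above and our log path, so adjust as needed.

#!/usr/bin/env python3
# Count "self-heal failed" errors in the fuse mount log and flag the ones
# that are followed by an unmount within a few minutes.
import re
import sys
from datetime import datetime, timedelta

LOG = sys.argv[1] if len(sys.argv) > 1 else "/var/log/mnt-gv0.log"
TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\]")

def stamp(line):
    m = TS.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S") if m else None

failures, unmounts = [], []
with open(LOG) as f:
    for line in f:
        if "self-heal failed" in line:
            failures.append((stamp(line), line.strip()))
        elif "unmounting" in line and stamp(line):
            unmounts.append(stamp(line))

for when, line in failures:
    # Flag the failure if the mount went away within 5 minutes of it.
    fatal = when and any(when <= u <= when + timedelta(minutes=5)
                         for u in unmounts)
    print(("UNMOUNT-FOLLOWS " if fatal else "                ") + line)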

Here's another event:

[2013-07-08 10:57:29.762798] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-gv0-client-3: remote operation failed: File exists. Path: /assets/acu/79/95/79952dc84cc096d79dad886f16ef1c58 (00000000-0000-0000-0000-000000000000)
[2013-07-08 10:57:29.763110] W [fuse-bridge.c:713:fuse_fd_cbk] 0-glusterfs-fuse: 24903: OPEN() /assets/acu/79/95/79952dc84cc096d79dad886f16ef1c58 => -1 (Invalid argument)
[2013-07-08 10:57:29.779663] I [afr-self-heal-common.c:1941:afr_sh_post_nb_entrylk_conflicting_sh_cbk] 0-gv0-replicate-1: Non blocking entrylks failed. 
[2013-07-08 10:57:29.779713] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] 0-gv0-replicate-1: background  meta-data data entry missing-entry gfid self-heal failed on /assets/acu/cd/4f/cd4ffce6846aa50bfcf233b4b0ebbe7e
[2013-07-08 10:57:42.796401] I [afr-self-heal-entry.c:2333:afr_sh_entry_fix] 0-gv0-replicate-0: /assets/acu/b8/a0: Performing conservative merge
[2013-07-08 10:57:55.036157] I [afr-self-heal-common.c:1970:afr_sh_post_nb_entrylk_gfid_sh_cbk] 0-gv0-replicate-1: Non blocking entrylks failed.
[2013-07-08 10:57:55.036775] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-gv0-client-3: remote operation failed: File exists. Path: /assets/acu/4a/eb/4aeb9e8041f36e55f64e6761d4d2afde (00000000-0000-0000-0000-000000000000)
[2013-07-08 10:57:55.036855] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-gv0-client-2: remote operation failed: File exists. Path: /assets/acu/4a/eb/4aeb9e8041f36e55f64e6761d4d2afde (00000000-0000-0000-0000-000000000000)
[2013-07-08 10:57:55.037211] W [inode.c:914:inode_lookup] (-->/usr/lib/glusterfs/3.3.1/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x104) [0x7feec832c3e4] (-->/usr/lib/glusterfs/3.3.1/xlator/mount/fuse.so(+0x14232) [0x7feecbab1232] (-->/usr/lib/glusterfs/3.3.1/xlator/mount/fuse.so(+0x140a7) [0x7feecbab10a7]))) 0-fuse: inode not found
[2013-07-08 11:00:01.989980] I [fuse-bridge.c:4091:fuse_thread_proc] 0-fuse: unmounting /mnt/gv0
[2013-07-08 11:00:01.990196] W [glusterfsd.c:831:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7feecd6bb4bd] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7feecd98de9a] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xd5) [0x7feece473a55]))) 0-: received signum (15), shutting down
[2013-07-08 11:00:01.990222] I [fuse-bridge.c:4648:fini] 0-fuse: Unmounting '/mnt/gv0'.

I checked all the other log files and could not find anything relevant to, or indicative of, the problem around those timestamps.
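
For completeness, here is roughly how we scanned them; a quick sketch that assumes the daemon logs live under /var/log/glusterfs/ and use the timestamp format shown above:

#!/usr/bin/env python3
# Print every glusterfs log line within a time window around a given
# timestamp, e.g.:  ./around.py "2013-07-08 10:41:25" 120
import glob
import sys
from datetime import datetime, timedelta

center = datetime.strptime(sys.argv[1], "%Y-%m-%d %H:%M:%S")
window = timedelta(seconds=int(sys.argv[2]) if len(sys.argv) > 2 else 60)

# We assume all daemon logs live under /var/log/glusterfs/; adjust to taste.
for path in sorted(glob.glob("/var/log/glusterfs/**/*.log", recursive=True)):
    with open(path, errors="replace") as f:
        for line in f:
            if not line.startswith("["):
                continue
            try:
                when = datetime.strptime(line[1:20], "%Y-%m-%d %H:%M:%S")
            except ValueError:
                continue
            if abs(when - center) <= window:
                print(path + ": " + line.rstrip())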

Is it related to the bug that Marc Seeger indicated? (http://review.gluster.org/#/c/4689/)
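
In the meantime, one thing we still plan to check is whether the two replica copies of an affected file actually disagree on their gfid, since the error mentions gfid self-heal. A minimal sketch of what we would run as root on the brick servers; the brick root paths below are made up for illustration, and trusted.gfid is the xattr GlusterFS stores on the brick files:

#!/usr/bin/env python3
# Compare the trusted.gfid xattr of a file's copies on each replica brick.
# Run as root on a brick server; pass the path as seen from the mount point.
import os
import sys

# NOTE: these brick roots are hypothetical; substitute the actual brick
# directories reported by "gluster volume info gv0".
BRICKS = ["/export/brick1/gv0", "/export/brick2/gv0"]

rel = sys.argv[1].lstrip("/")
for brick in BRICKS:
    path = os.path.join(brick, rel)
    try:
        gfid = os.getxattr(path, "trusted.gfid")  # 16-byte uuid
        print(path + ": " + gfid.hex())
    except OSError as exc:
        print(path + ": " + str(exc))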

Thanks,

Laurent Chouinard


