When there is a dangling entry (without a gfid) in only one brick dir, gluster heal info keeps showing the entry and glustershd cannot actually remove the entry from the brick.


 



Hi glusterfs experts,

I am hitting a problem in my test bed (3 bricks on 3 SN nodes): "/" is permanently stuck in the `gluster v heal info` output. In my fstest + reboot-SN-nodes-randomly test, the heal info output keeps showing the entry "/" even after hours, and touching or listing files under /mnt/mstate does not clear it.

 

[root@sn-0:/mnt/bricks/mstate/brick]

# gluster v heal mstate info

Brick sn-0.local:/mnt/bricks/mstate/brick

/

Status: Connected

Number of entries: 1

 

Brick sn-2.local:/mnt/bricks/mstate/brick

/

Status: Connected

Number of entries: 1

 

Brick sn-1.local:/mnt/bricks/mstate/brick

/

Status: Connected

Number of entries: 1

 

 

In the glustershd.log on the SN nodes I see the following messages:

[2018-10-10 08:13:00.005250] I [MSGID: 108026] [afr-self-heald.c:432:afr_shd_index_heal] 0-mstate-replicate-0: got entry: 00000000-0000-0000-0000-000000000001 from mstate-client-0

[2018-10-10 08:13:00.006077] I [MSGID: 108026] [afr-self-heald.c:341:afr_shd_selfheal] 0-mstate-replicate-0: entry: path /, gfid: 00000000-0000-0000-0000-000000000001

[2018-10-10 08:13:00.011599] I [MSGID: 108026] [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-mstate-replicate-0: performing entry selfheal on 00000000-0000-0000-0000-000000000001

[2018-10-10 08:16:28.722059] W [MSGID: 108015] [afr-self-heal-entry.c:47:afr_selfheal_entry_delete] 0-mstate-replicate-0: expunging dir 00000000-0000-0000-0000-000000000001/fstest_76f272545249be5d71359f06962e069b (00000000-0000-0000-0000-000000000000) on mstate-client-0

[2018-10-10 08:16:28.722975] W [MSGID: 114031] [client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-mstate-client-0: remote operation failed [No such file or directory]

 

 

When I check the environment, I find that fstest_76f272545249be5d71359f06962e069b exists only on the sn-0 brick, and getfattr on it returns nothing!

 

[root@sn-0:/mnt/bricks/mstate/brick]

# getfattr -m . -d -e hex fstest_76f272545249be5d71359f06962e069b    # returns no output: the entry has no xattrs, not even trusted.gfid

[root@sn-0:/mnt/bricks/mstate/brick]

# getfattr -m . -d -e hex .

# file: .

trusted.afr.dirty=0x000000000000000000000000

trusted.afr.mstate-client-1=0x000000000000000000000000

trusted.afr.mstate-client-2=0x0000000000000000000002a7

trusted.gfid=0x00000000000000000000000000000001

trusted.glusterfs.dht=0x000000010000000000000000ffffffff

trusted.glusterfs.volume-id=0xbf7aad0e4ce44196aa9b0a33928ea2ff
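For reference, here is the small scan I used to spot such dangling entries (entries lacking the trusted.gfid xattr) at the top level of a brick. This is my own helper, not a GlusterFS tool; the xattr reader is injectable so the logic can be exercised without root or a real brick (reading trusted.* xattrs on a live brick requires root).

```python
import os

GFID_XATTR = "trusted.gfid"

def find_dangling_entries(brick_dir, get_xattr=os.getxattr):
    """Return entries directly under brick_dir that have no trusted.gfid xattr.

    get_xattr defaults to os.getxattr but can be replaced for testing;
    a missing xattr surfaces as OSError (ENODATA).
    """
    dangling = []
    for name in sorted(os.listdir(brick_dir)):
        if name == ".glusterfs":  # backend gfid store, not user data
            continue
        path = os.path.join(brick_dir, name)
        try:
            get_xattr(path, GFID_XATTR)
        except OSError:
            # No gfid assigned: this is a dangling entry like the one above.
            dangling.append(name)
    return dangling
```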

 

[root@sn-1:/root]

# stat /mnt/bricks/mstate/brick/fstest_76f272545249be5d71359f06962e069b

stat: cannot stat '/mnt/bricks/mstate/brick/fstest_76f272545249be5d71359f06962e069b': No such file or directory

[root@sn-1:/root]

[root@sn-1:/mnt/bricks/mstate/brick]

# getfattr -m . -d -e hex .

# file: .

trusted.afr.dirty=0x000000000000000000000000

trusted.afr.mstate-client-0=0x000000000000000000000006

trusted.afr.mstate-client-1=0x000000000000000000000000

trusted.gfid=0x00000000000000000000000000000001

trusted.glusterfs.dht=0x000000010000000000000000ffffffff

trusted.glusterfs.volume-id=0xbf7aad0e4ce44196aa9b0a33928ea2ff

 

[root@sn-2:/mnt/bricks/mstate/brick]

# stat /mnt/bricks/mstate/brick/fstest_76f272545249be5d71359f06962e069b

stat: cannot stat '/mnt/bricks/mstate/brick/fstest_76f272545249be5d71359f06962e069b': No such file or directory

[root@sn-2:/mnt/bricks/mstate/brick]

 

[root@sn-2:/mnt/bricks/mstate/brick]

# getfattr -m . -d -e hex .

# file: .

trusted.afr.dirty=0x000000000000000000000000

trusted.afr.mstate-client-0=0x000000000000000000000006

trusted.afr.mstate-client-2=0x000000000000000000000000

trusted.gfid=0x00000000000000000000000000000001

trusted.glusterfs.dht=0x000000010000000000000000ffffffff

trusted.glusterfs.volume-id=0xbf7aad0e4ce44196aa9b0a33928ea2ff

 

 

I think the entry fstest_76f272545249be5d71359f06962e069b should either be assigned a gfid or be removed. The glustershd.log above shows clearly that glustershd on sn-0 tries to expunge this dangling entry but hits an error. Stepping through with gdb, I find that in this case the entry cannot be assigned a gfid: __afr_selfheal_heal_dirent is called with source=1, and replies[source].op_ret == -1, so there is no healthy copy to take a gfid from. My question: since fstest_76f272545249be5d71359f06962e069b does not exist on sn-1 or sn-2, removing the dangling entry on sn-0 via syncop_rmdir does not work either (the rmdir fails with "No such file or directory", as in the log above). I would like your opinion on this issue, thanks!
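To make the branch I hit concrete, here is a toy model of the decision as I read __afr_selfheal_heal_dirent (my reading, not the actual GlusterFS implementation); the Reply objects mirror the op_ret / op_errno values from the gdb session below.

```python
ENOENT = 2  # replies[1].op_errno == 2 in the gdb session

class Reply:
    """Per-brick lookup result for one directory entry."""
    def __init__(self, op_ret, op_errno=0):
        self.op_ret = op_ret
        self.op_errno = op_errno

def heal_dirent_action(source, replies):
    """What the self-heal daemon can do with one directory entry."""
    if replies[source].op_ret == 0:
        # The source brick has the entry with a gfid: assign/recreate on sinks.
        return "assign-gfid"
    if replies[source].op_ret == -1 and replies[source].op_errno == ENOENT:
        # The entry is absent on the source: expunge it from the bricks that
        # still carry it -- this is where afr_selfheal_entry_delete is reached.
        return "expunge"
    return "skip"

# The state from my gdb session: source=1, entry present only on brick 0.
replies = [Reply(0), Reply(-1, ENOENT), Reply(-1, ENOENT)]
```

With source=1 this returns "expunge", which matches the "expunging dir" log line; the expunge then fails because the rmdir on mstate-client-0 returns ENOENT.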

 

Thread 12 "glustershdheal" hit Breakpoint 1, __afr_selfheal_heal_dirent (frame=0x7f5aec009350, this=0x7f5b1001d8d0, fd=0x7f5b0800c8a0,

    name=0x7f5b08059db0 "fstest_76f272545249be5d71359f06962e069b", inode=0x7f5aec001e80, source=1, sources=0x7f5af8fefb20 "",

    healed_sinks=0x7f5af8fefae0 "\001", locked_on=0x7f5af8fefac0 "\001\001\001\370Z\177", replies=0x7f5af8fef190) at afr-self-heal-entry.c:172

172 afr-self-heal-entry.c: No such file or directory.

(gdb) print name

$17 = 0x7f5b08059db0 "fstest_76f272545249be5d71359f06962e069b"

(gdb) print source

$18 = 1

(gdb) print replies[0].op_ret

$19 = 0

(gdb) print replies[1].op_ret

$20 = -1

(gdb) print replies[2].op_ret

$21 = -1

(gdb) print replies[1].op_errno

$22 = 2

 

 

When I set a breakpoint at afr_selfheal_entry_delete:

Thread 12 "glustershdheal" hit Breakpoint 1, afr_selfheal_entry_delete (this=0x7f5b1001d8d0, dir=0x7f5b100847f0,

    name=0x7f5b08059510 "fstest_76f272545249be5d71359f06962e069b", inode=0x7f5aec001650, child=0, replies=0x7f5af8fef190) at afr-self-heal-entry.c:24

24  afr-self-heal-entry.c: No such file or directory.

(gdb) print uuid_utoa(inode->gfid)

$1 = 0x7f5aec0022a0 "00000000-0000-0000-0000-", '0' <repeats 12 times>

(gdb) print name

$2 = 0x7f5b08059510 "fstest_76f272545249be5d71359f06962e069b"

(gdb) bt

#0  afr_selfheal_entry_delete (this=0x7f5b1001d8d0, dir=0x7f5b100847f0, name=0x7f5b08059510 "fstest_76f272545249be5d71359f06962e069b",

    inode=0x7f5aec001650, child=0, replies=0x7f5af8fef190) at afr-self-heal-entry.c:24

#1  0x00007f5b141e517c in __afr_selfheal_heal_dirent (frame=0x7f5aec21f470, this=0x7f5b1001d8d0, fd=0x7f5b10004510,

    name=0x7f5b08059510 "fstest_76f272545249be5d71359f06962e069b", inode=0x7f5aec001650, source=1, sources=0x7f5af8fefb20 "",

    healed_sinks=0x7f5af8fefae0 "\001", locked_on=0x7f5af8fefac0 "\001\001\001\370Z\177", replies=0x7f5af8fef190) at afr-self-heal-entry.c:201

#2  0x00007f5b141e59ab in __afr_selfheal_entry_dirent (frame=0x7f5aec21f470, this=0x7f5b1001d8d0, fd=0x7f5b10004510,

    name=0x7f5b08059510 "fstest_76f272545249be5d71359f06962e069b", inode=0x7f5aec001650, source=1, sources=0x7f5af8fefb20 "",

    healed_sinks=0x7f5af8fefae0 "\001", locked_on=0x7f5af8fefac0 "\001\001\001\370Z\177", replies=0x7f5af8fef190) at afr-self-heal-entry.c:383

#3  0x00007f5b141e63ec in afr_selfheal_entry_dirent (frame=0x7f5aec21f470, this=0x7f5b1001d8d0, fd=0x7f5b10004510,

    name=0x7f5b08059510 "fstest_76f272545249be5d71359f06962e069b", parent_idx_inode=0x0, subvol=0x7f5b10016f70, full_crawl=_gf_true)

    at afr-self-heal-entry.c:610

#4  0x00007f5b141e6a1a in afr_selfheal_entry_do_subvol (frame=0x7f5b100011c0, this=0x7f5b1001d8d0, fd=0x7f5b10004510, child=0) at afr-self-heal-entry.c:742

#5  0x00007f5b141e7207 in afr_selfheal_entry_do (frame=0x7f5b100011c0, this=0x7f5b1001d8d0, fd=0x7f5b10004510, source=1, sources=0x7f5af8ff07f0 "",

    healed_sinks=0x7f5af8ff07b0 "\001") at afr-self-heal-entry.c:908

#6  0x00007f5b141e7846 in __afr_selfheal_entry (frame=0x7f5b100011c0, this=0x7f5b1001d8d0, fd=0x7f5b10004510, locked_on=0x7f5af8ff0900 "\001\001\001[")

    at afr-self-heal-entry.c:1002

#7  0x00007f5b141e7d4a in afr_selfheal_entry (frame=0x7f5b100011c0, this=0x7f5b1001d8d0, inode=0x7f5b100847f0) at afr-self-heal-entry.c:1112

#8  0x00007f5b141df3aa in afr_selfheal_do (frame=0x7f5b100011c0, this=0x7f5b1001d8d0, gfid=0x7f5af8ff0b00 "") at afr-self-heal-common.c:2534

#9  0x00007f5b141df4a0 in afr_selfheal (this=0x7f5b1001d8d0, gfid=0x7f5af8ff0b00 "") at afr-self-heal-common.c:2575

#10 0x00007f5b141eadec in afr_shd_selfheal (healer=0x7f5b10084c30, child=0, gfid=0x7f5af8ff0b00 "") at afr-self-heald.c:343

#11 0x00007f5b141eb19b in afr_shd_index_heal (subvol=0x7f5b10016f70, entry=0x7f5b100012f0, parent=0x7f5af8ff0dc0, data="">

    at afr-self-heald.c:440

#12 0x00007f5b1a682ed3 in syncop_mt_dir_scan (frame=0x7f5b100b89e0, subvol=0x7f5b10016f70, loc=0x7f5af8ff0dc0, pid=-6, data=""

    fn=0x7f5b141eb04c <afr_shd_index_heal>, xdata=0x7f5b100b88d0, max_jobs=1, max_qlen=1024) at syncop-utils.c:407

#13 0x00007f5b141eb445 in afr_shd_index_sweep (healer=0x7f5b10084c30, vgfid=0x7f5b14213790 "glusterfs.xattrop_index_gfid") at afr-self-heald.c:494

#14 0x00007f5b141eb524 in afr_shd_index_sweep_all (healer=0x7f5b10084c30) at afr-self-heald.c:517

#15 0x00007f5b141eb827 in afr_shd_index_healer (data="" at afr-self-heald.c:597

#16 0x00007f5b193cd5da in start_thread () from /lib64/libpthread.so.0

#17 0x00007f5b18ca3cbf in clone () from /lib64/libc.so.6

(gdb) quit

A debugging session is active.

 

      Inferior 1 [process 2230] will be detached.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
