On 16/03/21 11:45 pm, Zenon Panoussis wrote:
Yes, if the dataset is small, you can try rm -rf of the dir
from the mount (assuming no other application is accessing
it on the volume), launch heal once so that the heal info
becomes zero, and then copy it over again.
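Roughly like this (just a sketch; the mount point, directory name and backup path here are hypothetical, and it assumes the volume is mounted via FUSE):

# from a client, remove the affected directory through the mount, never on the bricks
rm -rf /mnt/gv0/problem-dir
# trigger an index heal and watch the pending count drop to zero
gluster volume heal gv0
gluster volume heal gv0 statistics heal-count
# once everything is clean, copy the data back in through the mount
cp -a /backup/problem-dir /mnt/gv0/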
That is approximately what I did; the rm -rf took its sweet time and the
number of entries to be healed kept diminishing as the deletion
progressed. At the end I was left with
Mon Mar 15 22:57:09 CET 2021
Gathering count of entries to be healed on volume gv0 has been successful
Brick node01:/gfs/gv0
Number of entries: 3
Brick mikrivouli:/gfs/gv0
Number of entries: 2
Brick nanosaurus:/gfs/gv0
Number of entries: 3
--------------
and that's where I've been ever since, for the past 20 hours.
SHD has kept trying to heal them all along and the log brings
us back to square one:
[2021-03-16 14:51:35.059593 +0000] I [MSGID: 108026] [afr-self-heal-entry.c:1053:afr_selfheal_entry_do] 0-gv0-replicate-0: performing entry selfheal on 94aefa13-9828-49e5-9bac-6f70453c100f
Does this gfid correspond to the same directory path as last time?
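If you want to double-check, one way to map a directory gfid back to a path is to look at its .glusterfs entry directly on one of the bricks; for directories that entry is a symlink to the parent. A sketch, using the brick path from the heal output above:

ls -l /gfs/gv0/.glusterfs/94/ae/94aefa13-9828-49e5-9bac-6f70453c100f
# for a directory gfid this should be a symlink of the form
# ../../xx/xx/<parent-gfid>/<directory-name>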
[2021-03-16 15:39:43.680380 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gv0-client-0: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-03-16 15:39:43.769604 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gv0-client-2: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[2021-03-16 15:39:43.908425 +0000] E [MSGID: 114031] [client-rpc-fops_v2.c:214:client4_0_mkdir_cbk] 0-gv0-client-1: remote operation failed. [{path=(null)}, {errno=22}, {error=Invalid argument}]
[...]
I wonder if you can attach gdb to the glustershd process at the function
client4_0_mkdir() and try to print args->loc->path to see which path
the mkdir is being attempted on.
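Something along these lines (a rough sketch, assuming gdb and the glusterfs debug symbols are installed; the pgrep pattern is just one way to find the shd PID):

gdb -p $(pgrep -f glustershd)
(gdb) break client4_0_mkdir
(gdb) continue
# when the breakpoint is hit:
(gdb) print args->loc->path
# detach so that shd keeps running afterwards
(gdb) detach

If args has not been assigned yet right at function entry, stepping a line or two with 'next' before printing should do it.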
In other words, deleting and recreating the unhealable files
and directories was a workaround, but the underlying problem
persists and I can't even begin to look for it when I have no
clue what errno 22 means in plain English.
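For what it's worth, errno 22 is EINVAL, i.e. the same "Invalid argument" string the log lines above already print. If you want to check such mappings yourself, a quick one-liner (assuming python3 is available) is:

python3 -c 'import errno, os; print(errno.errorcode[22], "=", os.strerror(22))'
# prints: EINVAL = Invalid argument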
In any case, glusterd.log is full of messages like
[2021-03-16 15:37:03.398619 +0000] I [MSGID: 106533] [glusterd-volume-ops.c:717:__glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume gv0
[2021-03-16 15:37:03.791452 +0000] E [MSGID: 106061] [glusterd-server-quorum.c:260:glusterd_is_volume_in_server_quorum] 0-management: Dict get failed [{Key=cluster.server-quorum-type}]
The server-quorum messages in the glusterd log are unrelated; you can raise
a separate github issue for that. (And you can leave it at 'off'.)
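If you want to confirm what the option is currently set to, something like this should show it (a sketch; the exact output format varies by gluster version):

gluster volume get gv0 cluster.server-quorum-type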