Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

Hello,

We're having an issue where a geo-replication worker process is using unusually high CPU and repeatedly logging "Entry not present on master. Fixing gfid mismatch in slave". Can anyone help with this?

We have 3 GlusterFS replica nodes (which we'll call the master), which also push data to a remote server (the slave) using geo-replication. This has been running fine for a couple of months, but yesterday one of the master nodes started showing unusually high CPU use from this process:

root@cafs30:/var/log/glusterfs# ps aux | grep 32048
root     32048 68.7  0.6 1843140 845756 ?      Rl   02:51 493:51 python2 /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py worker gvol0 nvfs10::gvol0 --feedback-fd 15 --local-path /nodirectwritedata/gluster/gvol0 --local-node cafs30 --local-node-id b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 12,11,9,13 --subvol-num 1 --resource-remote nvfs30 --resource-remote-id 1e698ccd-aeec-4ec4-96fe-383da8fc3b78
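In case it's useful, this is roughly how we've been checking the session from that node (just the standard status command plus the process stats; gvol0, nvfs10 and the PID are the ones shown above):

# Per-worker status, crawl type and last-synced time for the session
gluster volume geo-replication gvol0 nvfs10::gvol0 status detail

# CPU and elapsed time of the busy gsyncd worker on cafs30
ps -o pid,pcpu,etime,cmd -p 32048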

Here's what is being logged in /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log:

[2020-05-29 21:57:18.843524] I [master(worker /nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's time      stime=(1590789408, 0)
[2020-05-29 21:57:30.626172] I [master(worker /nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures] _GMaster: Entry not present on master. Fixing gfid mismatch in slave. Deleting the entry    retry_count=1   entry=({u'uid': 108, u'gfid': u'7c0b75e5-d8b7-454f-8010-112d613c599e', u'gid': 117, u'mode': 33204, u'entry': u'.gfid/c5422396-1578-4b50-a29d-315be2a9c5d8/00a859f7xxxx.cfg', u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True, u'slave_name': None, u'slave_gfid': u'ec4b0ace-2ec4-4ea5-adbc-9f519b81917c', u'name_mismatch': False, u'dst': False})
[2020-05-29 21:57:30.627893] I [master(worker /nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures] _GMaster: Entry not present on master. Fixing gfid mismatch in slave. Deleting the entry    retry_count=1   entry=({u'uid': 108, u'gfid': u'a4d52e40-2e2f-4885-be5f-65fe95a8ebd7', u'gid': 117, u'mode': 33204, u'entry': u'.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/polycom_00a859f7xxxx.cfg', u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True, u'slave_name': None, u'slave_gfid': u'ece8da77-b5ea-45a7-9af7-7d4d8f55f74a', u'name_mismatch': False, u'dst': False})
[2020-05-29 21:57:30.629532] I [master(worker /nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures] _GMaster: Entry not present on master. Fixing gfid mismatch in slave. Deleting the entry    retry_count=1   entry=({u'uid': 108, u'gfid': u'3c525ad8-aeb2-46b6-9c41-7fb4987916f8', u'gid': 117, u'mode': 33204, u'entry': u'.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/00a859f7xxxx-directory.xml', u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True, u'slave_name': None, u'slave_gfid': u'06717b5a-d842-495d-bd25-aab9cd454490', u'name_mismatch': False, u'dst': False})
[2020-05-29 21:57:30.659123] I [master(worker /nodirectwritedata/gluster/gvol0):942:handle_entry_failures] _GMaster: Sucessfully fixed entry ops with gfid mismatch     retry_count=1
[2020-05-29 21:57:30.659343] I [master(worker /nodirectwritedata/gluster/gvol0):1194:process_change] _GMaster: Retry original entries. count = 1
[2020-05-29 21:57:30.725810] I [master(worker /nodirectwritedata/gluster/gvol0):1197:process_change] _GMaster: Sucessfully fixed all entry ops with gfid mismatch
[2020-05-29 21:57:31.747319] I [master(worker /nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken   duration=0.7409 num_files=18    job=1   return_code=0

We've verified that the files referred to in these messages, such as polycom_00a859f7xxxx.cfg, do exist on both the master nodes and the slave.
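The check we did was roughly the following, run against the bricks directly (the path below the brick is only an example, not the real file location, and the slave brick path will differ on nvfs10):

# On a master brick (cafs30): print the on-disk gfid xattr of the file
getfattr -n trusted.gfid -e hex \
    /nodirectwritedata/gluster/gvol0/some/dir/polycom_00a859f7xxxx.cfg

# Run the same getfattr against the corresponding file on the slave brick
# and compare the two hex values; if they differ, that's the gfid mismatch
# gsyncd keeps reporting.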

We found this bug fix:
https://bugzilla.redhat.com/show_bug.cgi?id=1642865

However, that fix went into 5.1, and we're running 5.12 on the master nodes and the slave. A couple of GlusterFS clients connected to the master nodes are running 5.13.
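For completeness, the versions were checked with the usual commands on each master node, the slave and the clients:

# Installed GlusterFS version on this node
glusterfs --version | head -n 1

# Cluster operating version (shows whether the cluster is still running
# at an older op-version)
gluster volume get all cluster.op-version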

Would anyone have any suggestions? Thank you in advance.

--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
