Hi David,

It looks like you are running a workload that involves a lot of renames, and geo-replication is trying to handle those. You can try the patches below, which should give you a performance benefit:

[1]. https://review.gluster.org/#/c/glusterfs/+/23570/
[2]. https://review.gluster.org/#/c/glusterfs/+/23459/
[3]. https://review.gluster.org/#/c/glusterfs/+/22720/
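As Strahil suggests below, it is also worth comparing the gfid of one of the affected files on a master brick and on the slave brick. A rough sketch with getfattr (the path under the brick root is a placeholder - substitute the real path of one of the files named in your log, e.g. polycom_00a859f7xxxx.cfg; the master brick root here is taken from the --local-path in your worker command line, and the slave brick root will differ):

# On one of the master bricks:
getfattr -n trusted.gfid -e hex /nodirectwritedata/gluster/gvol0/<path-to>/polycom_00a859f7xxxx.cfg
# On the slave brick, for the same file:
getfattr -n trusted.gfid -e hex <slave-brick-root>/<path-to>/polycom_00a859f7xxxx.cfg
# Or on a FUSE mount of either volume, using the virtual gfid xattr:
getfattr -n glusterfs.gfid.string <mount-point>/<path-to>/polycom_00a859f7xxxx.cfg

If the two values differ, the slave's copy really is a different file, which is consistent with the gfid_mismatch entries in your log.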
/sunny

On Sat, May 30, 2020 at 9:20 AM Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:
>
> Hey David,
>
> For me, a gfid mismatch means that the file was replaced/recreated - just like vim on Linux does (and that is expected for a config file).
>
> Have you checked the gfid of the file on both source and destination - do they really match, or are they different?
>
> What happens when you move the file away on the slave - does that fix the issue?
>
> Best Regards,
> Strahil Nikolov
>
> On 30 May 2020 at 1:10:56 GMT+03:00, David Cunningham <dcunningham@xxxxxxxxxxxxx> wrote:
> >Hello,
> >
> >We're having an issue with a geo-replication process that has unusually high CPU use and is logging "Entry not present on master. Fixing gfid mismatch in slave" errors. Can anyone help with this?
> >
> >We have 3 GlusterFS replica nodes (we'll call them the master), which also push data to a remote server (the slave) using geo-replication. This has been running fine for a couple of months, but yesterday one of the master nodes started showing unusually high CPU use. It's this process:
> >
> >root@cafs30:/var/log/glusterfs# ps aux | grep 32048
> >root 32048 68.7 0.6 1843140 845756 ? Rl 02:51 493:51 python2 /usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py worker gvol0 nvfs10::gvol0 --feedback-fd 15 --local-path /nodirectwritedata/gluster/gvol0 --local-node cafs30 --local-node-id b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd 12,11,9,13 --subvol-num 1 --resource-remote nvfs30 --resource-remote-id 1e698ccd-aeec-4ec4-96fe-383da8fc3b78
> >
> >Here's what is being logged in /var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log:
> >
> >[2020-05-29 21:57:18.843524] I [master(worker /nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster: slave's time stime=(1590789408, 0)
> >[2020-05-29 21:57:30.626172] I [master(worker /nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures] _GMaster: Entry not present on master. Fixing gfid mismatch in slave. Deleting the entry retry_count=1 entry=({u'uid': 108, u'gfid': u'7c0b75e5-d8b7-454f-8010-112d613c599e', u'gid': 117, u'mode': 33204, u'entry': u'.gfid/c5422396-1578-4b50-a29d-315be2a9c5d8/00a859f7xxxx.cfg', u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True, u'slave_name': None, u'slave_gfid': u'ec4b0ace-2ec4-4ea5-adbc-9f519b81917c', u'name_mismatch': False, u'dst': False})
> >[2020-05-29 21:57:30.627893] I [master(worker /nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures] _GMaster: Entry not present on master. Fixing gfid mismatch in slave. Deleting the entry retry_count=1 entry=({u'uid': 108, u'gfid': u'a4d52e40-2e2f-4885-be5f-65fe95a8ebd7', u'gid': 117, u'mode': 33204, u'entry': u'.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/polycom_00a859f7xxxx.cfg', u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True, u'slave_name': None, u'slave_gfid': u'ece8da77-b5ea-45a7-9af7-7d4d8f55f74a', u'name_mismatch': False, u'dst': False})
> >[2020-05-29 21:57:30.629532] I [master(worker /nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures] _GMaster: Entry not present on master. Fixing gfid mismatch in slave. Deleting the entry retry_count=1 entry=({u'uid': 108, u'gfid': u'3c525ad8-aeb2-46b6-9c41-7fb4987916f8', u'gid': 117, u'mode': 33204, u'entry': u'.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/00a859f7xxxx-directory.xml', u'op': u'CREATE'}, 17, {u'slave_isdir': False, u'gfid_mismatch': True, u'slave_name': None, u'slave_gfid': u'06717b5a-d842-495d-bd25-aab9cd454490', u'name_mismatch': False, u'dst': False})
> >[2020-05-29 21:57:30.659123] I [master(worker /nodirectwritedata/gluster/gvol0):942:handle_entry_failures] _GMaster: Sucessfully fixed entry ops with gfid mismatch retry_count=1
> >[2020-05-29 21:57:30.659343] I [master(worker /nodirectwritedata/gluster/gvol0):1194:process_change] _GMaster: Retry original entries. count = 1
> >[2020-05-29 21:57:30.725810] I [master(worker /nodirectwritedata/gluster/gvol0):1197:process_change] _GMaster: Sucessfully fixed all entry ops with gfid mismatch
> >[2020-05-29 21:57:31.747319] I [master(worker /nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync Time Taken duration=0.7409 num_files=18 job=1 return_code=0
> >
> >We've verified that files like polycom_00a859f7xxxx.cfg referred to in the errors do exist on the master nodes and on the slave.
> >
> >We found this bug fix:
> >https://bugzilla.redhat.com/show_bug.cgi?id=1642865
> >
> >However, that fix went into 5.1, and we're running 5.12 on the master nodes and the slave. A couple of GlusterFS clients connected to the master nodes are running 5.13.
> >
> >Would anyone have any suggestions? Thank you in advance.
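One further note on reading those log entries: on a brick, a gfid can usually be mapped back to an on-disk path, because regular files are hard-linked under <brick-root>/.glusterfs/<first two hex characters of the gfid>/<next two>/<full gfid>. A rough sketch using the gfid from the first mismatch entry and the master brick root from the worker command line above (run it on whichever side still holds that gfid; on the slave, use the slave_gfid value and the slave's brick root instead):

# Check that the gfid's hard link exists and note its inode:
ls -li /nodirectwritedata/gluster/gvol0/.glusterfs/7c/0b/7c0b75e5-d8b7-454f-8010-112d613c599e
# Find the named path that shares that inode, excluding the .glusterfs tree itself:
find /nodirectwritedata/gluster/gvol0 -samefile /nodirectwritedata/gluster/gvol0/.glusterfs/7c/0b/7c0b75e5-d8b7-454f-8010-112d613c599e -not -path '*/.glusterfs/*'

That should make it easier to see which file each "Fixing gfid mismatch in slave" entry is actually pointing at on each side.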