Re: Geo-replication: Entry not present on master. Fixing gfid mismatch in slave

Hello David,

The .gfid directory is there, but you cannot traverse (cd) into it - you need to specify the path directly, just like in the example. I had some cases where 'transport endpoint is not connected' was received, but usually this is due to a missing gfid.
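
For example, a rough sketch using the names from this thread (volume gvol0, the brick path from your ps output, and a gfid from the slave log - substitute whichever gfid you are chasing):

mount -t glusterfs -o aux-gfid-mount cafs30:/gvol0 /mnt/glusterfs
# .gfid does not show up in a directory listing and cannot be browsed,
# but a file can still be addressed by its gfid directly:
stat /mnt/glusterfs/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba
# on a brick you can also check whether the gfid hard link exists at all:
ls -l /nodirectwritedata/gluster/gvol0/.glusterfs/d4/26/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba

If the stat fails with 'No such file or directory' or 'Stale file handle', the gfid is most likely simply not present, which is the missing-gfid case I mentioned above.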

About the meetings, one of the topics is discussing open bugs and issues reported in the mailing list. It would be nice if you joined the meeting and discussed that over audio, as there could be other devs willing to join the 'fight'.

@Sankarshan,
any idea how to enable debug on the python script?
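
Log level DEBUG was already set on the geo-rep session earlier in the thread, so I am thinking of something lower level - for example a quick look at what the busy worker is actually reading. A rough sketch, reusing the PID and fd number from David's ps/strace mails (adjust for your node):

# which file is the worker reading on fd 6 from the strace output?
ls -l /proc/32048/fd/6
# how big is the backlog of changelogs it is working through?
ls -lh /var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/ | tail

Proper debug prints inside gsyncd.py itself would still have to be added by hand, hence the question.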


Best Regards,
Strahil Nikolov


On 12 June 2020 at 6:49:57 GMT+03:00, David Cunningham <dcunningham@xxxxxxxxxxxxx> wrote:
>Hi Strahil,
>
>Is there a trick to getting the .gfid directory to appear besides
>adding
>"-o aux-gfid-mount" to the mount? I mounted it using "mount -t
>glusterfs -o
>aux-gfid-mount cafs30:/gvol0 /mnt/glusterfs" and there's no .gfid
>directory
>under /mnt/glusterfs.
>
>I haven't tried joining a gluster meeting. Are bugs/problems usually
>discussed on such things? I usually find that people need to look into
>things and respond in their own time so email can be better.
>
>Thanks for your help.
>
>
>On Thu, 11 Jun 2020 at 15:16, Strahil Nikolov <hunter86_bg@xxxxxxxxx>
>wrote:
>
>> You can try the path of a file based on gfid (method 2) via:
>>
>> https://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/
>>
>> The gfids from the strace should be there, but if the file was
>> renamed/deleted - it is normal for it to be missing.
>>
>> Have you joined the last gluster meeting to discuss the problem?
>>
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> On 11 June 2020 at 3:15:36 GMT+03:00, David Cunningham <
>> dcunningham@xxxxxxxxxxxxx> wrote:
>> >Hi Strahil,
>> >
>> >Thanks for that. I did search for a file with the gfid in the name,
>on
>> >both
>> >the master nodes and geo-replication slave, but none of them had
>such a
>> >file. I guess maybe by the time I looked the file had been deleted?
>> >Either
>> >that or something is more seriously wrong with invalid gfids.
>> >
>> >BTW, I used strace to try and see what gsyncd was up to when using
>all
>> >that
>> >CPU. Running strace attached to gsyncd for 10 seconds gave 168,000
>> >lines of
>> >output, mostly like the following:
>> >
>> >read(6, "CHANGELOG.1585775398\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) =
>4096
>> >lseek(6, 1088012288, SEEK_SET)          = 1088012288
>> >lseek(6, 1088012288, SEEK_SET)          = 1088012288
>> >read(6, "est\nE da6ed6e8-2b49-4a56-b783-d0"..., 4096) = 4096
>> >lseek(6, 1088016384, SEEK_SET)          = 1088016384
>> >lseek(6, 1088016384, SEEK_SET)          = 1088016384
>> >read(6, "lock-05380315\nE cf5fe292-2ebf-43"..., 4096) = 4096
>> >lseek(6, 1088020480, SEEK_SET)          = 1088020480
>> >lseek(6, 1088020480, SEEK_SET)          = 1088020480
>> >read(6, "7 10a14313-4f92-4071-83cb-c900ef"..., 4096) = 4096
>> >lseek(6, 1088024576, SEEK_SET)          = 1088024576
>> >lseek(6, 1088024576, SEEK_SET)          = 1088024576
>> >read(6, "D b70ba2e8-d954-4fb2-b17a-77c8cc"..., 4096) = 4096
>> >lseek(6, 1088028672, SEEK_SET)          = 1088028672
>> >lseek(6, 1088028672, SEEK_SET)          = 1088028672
>> >read(6, "01681-e324-4f13-ab3a-0e8ae50ff95"..., 4096) = 4096
>> >lseek(6, 1088032768, SEEK_SET)          = 1088032768
>> >read(6, "09ef519/voicemail_1585336530_158"..., 4096) = 4096
>> >lseek(6, 1088036864, SEEK_SET)          = 1088036864
>> >read(6, "6-4539-8d7f-d17fb8f71d6d\nD 1236c"..., 4096) = 4096
>> >lseek(6, 1088040960, SEEK_SET)          = 1088040960
>> >lseek(6, 1088040960, SEEK_SET)          = 1088040960
>> >read(6, "6-4d54-8b9b-4146930b3a2d\nD 88287"..., 4096) = 4096
>> >
>> >I'm guessing those reads are mostly on the files under .glusterfs? I
>> >did
>> >check for files matching the names above and there aren't any
>matching
>> >"da6ed6e8-2b49-4a56-b783-d0", "cf5fe292-2ebf-43", or
>> >"10a14313-4f92-4071-83cb-c900ef" though.
>> >
>> >Any guidance would be appreciated.
>> >
>> >
>> >On Wed, 10 Jun 2020 at 16:06, Strahil Nikolov
><hunter86_bg@xxxxxxxxx>
>> >wrote:
>> >
>> >> Hey David,
>> >>
>> >> Sadly I just have a feeling that on one of the bricks there is a gfid
>> >> mismatch, but I could be wrong.
>> >>
>> >> As you have the gfid list, please check on all bricks (both master and
>> >> slave) that the file exists (not the one in .glusterfs, but the real
>> >> one) and that it has the same gfid.
>> >>
>> >> You can find the inode via ls and then run a find (don't forget
>the
>> >> ionice) against the brick and that inode number.
>> >>
>> >> Once you have the full path to the file , test:
>> >> - Mount with FUSE
>> >> - Check file exists ( no '??????' for permissions, size, etc) and
>can
>> >be
>> >> manipulated (maybe 'touch' can be used ?)
>> >> - Find (on all replica  sets ) the file and check the gfid
>> >> - Check for heals pending for that gfid
>> >>
>> >>
>> >> Best  Regards,
>> >> Strahil Nikolov
>> >>
>> >> On 10 June 2020 at 6:37:35 GMT+03:00, David Cunningham <
>> >> dcunningham@xxxxxxxxxxxxx> wrote:
>> >> >Hi Strahil,
>> >> >
>> >> >Thank you for that. Do you know if these "Stale file handle"
>errors
>> >on
>> >> >the
>> >> >geo-replication slave could be related?
>> >> >
>> >> >[2020-06-10 01:02:32.268989] E [MSGID: 109040]
>> >> >[dht-helper.c:1332:dht_migration_complete_check_task]
>0-gvol0-dht:
>> >> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba: failed to lookup the
>> >file
>> >> >on
>> >> >gvol0-dht [Stale file handle]
>> >> >[2020-06-10 01:02:32.269092] W [fuse-bridge.c:897:fuse_attr_cbk]
>> >> >0-glusterfs-fuse: 7434237: STAT()
>> >> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file
>> >handle)
>> >> >[2020-06-10 01:02:32.329280] W [fuse-bridge.c:897:fuse_attr_cbk]
>> >> >0-glusterfs-fuse: 7434251: STAT()
>> >> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file
>> >handle)
>> >> >[2020-06-10 01:02:32.387129] W [fuse-bridge.c:897:fuse_attr_cbk]
>> >> >0-glusterfs-fuse: 7434264: STAT()
>> >> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file
>> >handle)
>> >> >[2020-06-10 01:02:32.448838] W [fuse-bridge.c:897:fuse_attr_cbk]
>> >> >0-glusterfs-fuse: 7434277: STAT()
>> >> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file
>> >handle)
>> >> >[2020-06-10 01:02:32.507196] W [fuse-bridge.c:897:fuse_attr_cbk]
>> >> >0-glusterfs-fuse: 7434290: STAT()
>> >> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file
>> >handle)
>> >> >[2020-06-10 01:02:32.566033] W [fuse-bridge.c:897:fuse_attr_cbk]
>> >> >0-glusterfs-fuse: 7434303: STAT()
>> >> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file
>> >handle)
>> >> >[2020-06-10 01:02:32.625168] W [fuse-bridge.c:897:fuse_attr_cbk]
>> >> >0-glusterfs-fuse: 7434316: STAT()
>> >> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file
>> >handle)
>> >> >[2020-06-10 01:02:32.772442] W [fuse-bridge.c:897:fuse_attr_cbk]
>> >> >0-glusterfs-fuse: 7434329: STAT()
>> >> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file
>> >handle)
>> >> >[2020-06-10 01:02:32.832481] W [fuse-bridge.c:897:fuse_attr_cbk]
>> >> >0-glusterfs-fuse: 7434342: STAT()
>> >> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file
>> >handle)
>> >> >[2020-06-10 01:02:32.891835] W [fuse-bridge.c:897:fuse_attr_cbk]
>> >> >0-glusterfs-fuse: 7434403: STAT()
>> >> >/.gfid/d4265a0c-d881-48d8-8ca1-0920ab5ae9ba => -1 (Stale file
>> >handle)
>> >> >
>> >> >
>> >> >
>> >> >On Tue, 9 Jun 2020 at 16:31, Strahil Nikolov
><hunter86_bg@xxxxxxxxx>
>> >> >wrote:
>> >> >
>> >> >> Hey David,
>> >> >>
>> >> >> Can you check the CPU usage in sar on the rest of the cluster
>> >> >> (going backwards from the day you found the high CPU usage), so we
>> >> >> can know if this behaviour was observed on other nodes.
>> >> >>
>> >> >> Maybe that behaviour was "normal" for the push node (which could be
>> >> >> another one).
>> >> >>
>> >> >> As this script is python, I guess you can put some debug print
>> >> >> statements in it.
>> >> >>
>> >> >> Best Regards,
>> >> >> Strahil Nikolov
>> >> >>
>> >> >> On 9 June 2020 at 5:07:11 GMT+03:00, David Cunningham <
>> >> >> dcunningham@xxxxxxxxxxxxx> wrote:
>> >> >> >Hi Sankarshan,
>> >> >> >
>> >> >> >Thanks for that. So what should we look for to figure out what
>> >this
>> >> >> >process
>> >> >> >is doing? In
>> >> >>
>>/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log
>> >we
>> >> >see
>> >> >> >something like the following logged regularly:
>> >> >> >
>> >> >> >
>> >> >> >[[2020-06-09 02:01:19.670595] D [master(worker
>> >> >>
>>/nodirectwritedata/gluster/gvol0):1454:changelogs_batch_process]
>> >> >> >_GMaster:
>> >> >> >processing changes
>> >> >>
>> >> >>
>> >>
>> >>
>>
>>
>>>>batch=['/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040',
>> >> >>
>> >> >>
>> >>
>> >>
>>
>>
>>>>'/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668055']
>> >> >> >[2020-06-09 02:01:19.674927] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):1289:process] _GMaster:
>> >processing
>> >> >> >change
>> >> >> >
>> >> >>
>> >> >>
>> >>
>> >>
>>
>>
>>>>changelog=/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040
>> >> >> >[2020-06-09 02:01:19.683098] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):1170:process_change]
>_GMaster:
>> >> >> >entries: []
>> >> >> >[2020-06-09 02:01:19.695125] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):312:a_syncdata] _GMaster:
>files
>> >> >> >files=set(['.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17',
>> >> >> >'.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77',
>> >> >> >'.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0',
>> >> >> >'.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435'])
>> >> >> >[2020-06-09 02:01:19.695344] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster:
>> >> >candidate
>> >> >> >for
>> >> >> >syncing file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17
>> >> >> >[2020-06-09 02:01:19.695508] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster:
>> >> >candidate
>> >> >> >for
>> >> >> >syncing file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77
>> >> >> >[2020-06-09 02:01:19.695638] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster:
>> >> >candidate
>> >> >> >for
>> >> >> >syncing file=.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0
>> >> >> >[2020-06-09 02:01:19.695759] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster:
>> >> >candidate
>> >> >> >for
>> >> >> >syncing file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
>> >> >> >[2020-06-09 02:01:19.695883] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):1289:process] _GMaster:
>> >processing
>> >> >> >change
>> >> >> >
>> >> >>
>> >> >>
>> >>
>> >>
>>
>>
>>>>changelog=/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668055
>> >> >> >[2020-06-09 02:01:19.696170] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):1170:process_change]
>_GMaster:
>> >> >> >entries: []
>> >> >> >[2020-06-09 02:01:19.714097] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):312:a_syncdata] _GMaster:
>files
>> >> >> >files=set(['.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17',
>> >> >> >'.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77',
>> >> >> >'.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435'])
>> >> >> >[2020-06-09 02:01:19.714286] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster:
>> >> >candidate
>> >> >> >for
>> >> >> >syncing file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17
>> >> >> >[2020-06-09 02:01:19.714433] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster:
>> >> >candidate
>> >> >> >for
>> >> >> >syncing file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77
>> >> >> >[2020-06-09 02:01:19.714577] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):315:a_syncdata] _GMaster:
>> >> >candidate
>> >> >> >for
>> >> >> >syncing file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
>> >> >> >[2020-06-09 02:01:20.179656] D [resource(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):1419:rsync] SSH: files:
>> >> >> >.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17,
>> >> >> >.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77,
>> >> >> >.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0,
>> >> >> >.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435,
>> >> >> >.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17,
>> >> >> >.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77,
>> >> >> >.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
>> >> >> >[2020-06-09 02:01:20.738632] I [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):1954:syncjob] Syncer: Sync
>Time
>> >> >Taken
>> >> >> >duration=0.5588 num_files=7     job=2   return_code=0
>> >> >> >[2020-06-09 02:01:20.739650] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
>> >> >> > file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17
>> >> >> >[2020-06-09 02:01:20.740041] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
>> >> >> > file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77
>> >> >> >[2020-06-09 02:01:20.740200] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
>> >> >> > file=.gfid/779cd2b3-1571-446a-8903-48d6183d3dd0
>> >> >> >[2020-06-09 02:01:20.740343] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
>> >> >> > file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
>> >> >> >[2020-06-09 02:01:20.740482] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
>> >> >> > file=.gfid/0f98f9cd-1800-4c0f-b449-edcd7446bf17
>> >> >> >[2020-06-09 02:01:20.740616] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
>> >> >> > file=.gfid/512b4710-5af7-4e5a-8f3a-0a3dece42f77
>> >> >> >[2020-06-09 02:01:20.740741] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):321:regjob] _GMaster: synced
>> >> >> > file=.gfid/8ae32eec-f766-4cd9-a788-4561ba1fa435
>> >> >> >[2020-06-09 02:01:20.745385] D [repce(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call
>> >> >> >22499:140085349934848:1591668080.75
>> >> >>
>> >> >>
>> >>
>> >>
>>
>>
>>>>done('/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668040',)
>> >> >> >...
>> >> >> >[2020-06-09 02:01:20.746252] D [repce(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient:
>call
>> >> >> >22499:140085349934848:1591668080.75 done -> None
>> >> >> >[2020-06-09 02:01:20.746515] D [repce(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call
>> >> >> >22499:140085349934848:1591668080.75
>> >> >>
>> >> >>
>> >>
>> >>
>>
>>
>>>>done('/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668055',)
>> >> >> >...
>> >> >> >[2020-06-09 02:01:20.747045] D [repce(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient:
>call
>> >> >> >22499:140085349934848:1591668080.75 done -> None
>> >> >> >[2020-06-09 02:01:31.364780] I [gsyncd(config-get):308:main]
>> ><top>:
>> >> >> >Using
>> >> >> >session config file
>> >> >>
>> >>
>>
>>>>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
>> >> >> >[2020-06-09 02:01:31.458915] I [gsyncd(status):308:main]
><top>:
>> >> >Using
>> >> >> >session config file
>> >> >>
>> >>
>>
>>>>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
>> >> >> >[2020-06-09 02:01:32.156028] I [gsyncd(config-get):308:main]
>> ><top>:
>> >> >> >Using
>> >> >> >session config file
>> >> >>
>> >>
>>
>>>>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
>> >> >> >[2020-06-09 02:01:32.250082] I [gsyncd(status):308:main]
><top>:
>> >> >Using
>> >> >> >session config file
>> >> >>
>> >>
>>
>>>>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
>> >> >> >[2020-06-09 02:01:33.778866] I [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):1384:process] _GMaster:
>Entry
>> >Time
>> >> >> >Taken
>> >> >> >       MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0   CRE=0
>> >> >> >duration=0.0000 UNL=0
>> >> >> >[2020-06-09 02:01:33.779000] I [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):1394:process] _GMaster:
>> >> >Data/Metadata
>> >> >> >Time Taken        SETA=0  SETX=0  meta_duration=0.0000
>> >> >> > data_duration=14.0839   DATA=11 XATT=0
>> >> >> >[2020-06-09 02:01:33.779195] I [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):1404:process] _GMaster:
>Batch
>> >> >> >Completed
>> >> >> >changelog_end=1591668055        entry_stime=(1591668024, 0)
>> >> >> >changelog_start=1591668040      stime=(1591668054, 0)
>> >> >> >duration=14.1043
>> >> >> >     num_changelogs=2        mode=live_changelog
>> >> >> >[2020-06-09 02:01:35.337511] I [gsyncd(config-get):308:main]
>> ><top>:
>> >> >> >Using
>> >> >> >session config file
>> >> >>
>> >>
>>
>>>>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
>> >> >> >[2020-06-09 02:01:35.432674] I [gsyncd(status):308:main]
><top>:
>> >> >Using
>> >> >> >session config file
>> >> >>
>> >>
>>
>>>>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
>> >> >> >[2020-06-09 02:01:38.784449] D [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):551:crawlwrap] _GMaster: ...
>> >crawl
>> >> >> >#18195
>> >> >> >done, took 32.533401 seconds
>> >> >> >[2020-06-09 02:01:38.790709] D [repce(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call
>> >> >> >22499:140085349934848:1591668098.79 scan() ...
>> >> >> >[2020-06-09 02:01:38.791342] D [repce(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient:
>call
>> >> >> >22499:140085349934848:1591668098.79 scan -> None
>> >> >> >[2020-06-09 02:01:38.791479] D [repce(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):196:push] RepceClient: call
>> >> >> >22499:140085349934848:1591668098.79 getchanges() ...
>> >> >> >[2020-06-09 02:01:38.791989] D [repce(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):216:__call__] RepceClient:
>call
>> >> >> >22499:140085349934848:1591668098.79 getchanges ->
>> >> >>
>> >> >>
>> >>
>> >>
>>
>>
>>>>['/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668069',
>> >> >>
>> >> >>
>> >>
>> >>
>>
>>
>>>>'/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668083',
>> >> >>
>> >> >>
>> >>
>> >>
>>
>>
>>>>'/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591668097']
>> >> >> >[2020-06-09 02:01:38.792114] I [master(worker
>> >> >> >/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster:
>slave's
>> >time
>> >> >> > stime=(1591668054, 0)
>> >> >> >
>> >> >> >Something very similar is logged around every 32 seconds, and
>I
>> >> >notice
>> >> >> >the
>> >> >> >log above says the crawl took 32 seconds.
>> >> >> >
>> >> >> >
>> >> >> >On Mon, 8 Jun 2020 at 16:22, sankarshan <sankarshan@xxxxxxxxx>
>> >> >wrote:
>> >> >> >
>> >> >> >> Reading through the thread it occurs to me that it would be
>a
>> >> >> >stronger
>> >> >> >> approach to understand the workload (a general description
>of
>> >the
>> >> >> >> application) and in terms of the releases of GlusterFS
>running,
>> >> >> >assess
>> >> >> >> if there are new issues to be addressed or if existing sets
>of
>> >> >> >patches
>> >> >> >> tend to work. Thanks for setting up the debug level in the
>log.
>> >> >High
>> >> >> >> CPU usage by a geo-replication process would need to be
>traced
>> >> >back
>> >> >> >to
>> >> >> >> why it really requires that %-age of CPU if it was not doing
>so
>> >> >> >> previously.
>> >> >> >>
>> >> >> >> On Mon, 8 Jun 2020 at 05:29, David Cunningham
>> >> >> ><dcunningham@xxxxxxxxxxxxx>
>> >> >> >> wrote:
>> >> >> >> >
>> >> >> >> > Hi Strahil,
>> >> >> >> >
>> >> >> >> > The CPU is still quite high, with "top" regularly showing
>> >100%
>> >> >CPU
>> >> >> >usage
>> >> >> >> by that process. However it's not clear whether this is
>really
>> >a
>> >> >> >problem,
>> >> >> >> or if it's just normal geo-replication activity. While CPU
>> >usage
>> >> >was
>> >> >> >not
>> >> >> >> previously as high on this server, it's not clear whether
>> >> >GlusterFS
>> >> >> >might
>> >> >> >> have made this the "push" node for geo-replication when it
>used
>> >to
>> >> >be
>> >> >> >some
>> >> >> >> other server. Having said that, no other server shows a
>sudden
>> >> >drop
>> >> >> >in CPU
>> >> >> >> usage.
>> >> >> >> >
>> >> >> >> > Ideally we could find out what the gsyncd process is
>doing,
>> >and
>> >> >> >> therefore whether it's expected or not. Any ideas on that?
>We
>> >did
>> >> >set
>> >> >> >the
>> >> >> >> log level to DEBUG as Sunny suggested.
>> >> >> >> >
>> >> >> >> > Looking at
>> >> >> >>
>> >> >>
>> >> >>
>> >>
>> >>
>>
>>
>>>>/var/log/glusterfs/geo-replication-slaves/gvol0_nvfs10_gvol0/mnt-cafs30-nodirectwritedata-gluster-gvol0.log
>> >> >> >> on the geo-replication slave, we have a lot of lines like
>the
>> >> >> >following
>> >> >> >> logged.
>> >> >> >> >
>> >> >> >> > [2020-06-06 00:32:43.155856] W
>> >[fuse-bridge.c:897:fuse_attr_cbk]
>> >> >> >> 0-glusterfs-fuse: 875853: STAT()
>> >> >> >> /.gfid/25d78a6c-41a9-4364-84a8-31f5571223ac => -1 (Stale
>file
>> >> >handle)
>> >> >> >> > [2020-06-06 00:32:43.219759] W
>> >[fuse-bridge.c:897:fuse_attr_cbk]
>> >> >> >> 0-glusterfs-fuse: 875923: STAT()
>> >> >> >> /.gfid/25d78a6c-41a9-4364-84a8-31f5571223ac => -1 (Stale
>file
>> >> >handle)
>> >> >> >> > [2020-06-06 00:32:43.280357] W
>> >[fuse-bridge.c:897:fuse_attr_cbk]
>> >> >> >> 0-glusterfs-fuse: 876001: STAT()
>> >> >> >> /.gfid/25d78a6c-41a9-4364-84a8-31f5571223ac => -1 (Stale
>file
>> >> >handle)
>> >> >> >> > The message "E [MSGID: 109040]
>> >> >> >> [dht-helper.c:1332:dht_migration_complete_check_task]
>> >0-gvol0-dht:
>> >> >> >> /.gfid/25d78a6c-41a9-4364-84a8-31f5571223ac: failed to
>lookup
>> >the
>> >> >> >file on
>> >> >> >> gvol0-dht [Stale file handle]" repeated 9 times between
>> >> >[2020-06-06
>> >> >> >> 00:32:42.689780] and [2020-06-06 00:32:43.280322]
>> >> >> >> > [2020-06-06 09:03:03.660956] E [MSGID: 109040]
>> >> >> >> [dht-helper.c:1332:dht_migration_complete_check_task]
>> >0-gvol0-dht:
>> >> >> >> /.gfid/decdd552-d58b-4ddf-b27a-0da9a6fbc38b: failed to
>lookup
>> >the
>> >> >> >file on
>> >> >> >> gvol0-dht [Stale file handle]
>> >> >> >> > [2020-06-06 09:03:03.661057] W
>> >[fuse-bridge.c:897:fuse_attr_cbk]
>> >> >> >> 0-glusterfs-fuse: 965375: STAT()
>> >> >> >> /.gfid/decdd552-d58b-4ddf-b27a-0da9a6fbc38b => -1 (Stale
>file
>> >> >handle)
>> >> >> >> > [2020-06-06 09:03:10.258798] E [MSGID: 109040]
>> >> >> >> [dht-helper.c:1332:dht_migration_complete_check_task]
>> >0-gvol0-dht:
>> >> >> >> /.gfid/1d90e129-d40f-4ca4-bea8-0bb3c4c7985a: failed to
>lookup
>> >the
>> >> >> >file on
>> >> >> >> gvol0-dht [Stale file handle]
>> >> >> >> > [2020-06-06 09:03:10.258880] W
>> >[fuse-bridge.c:897:fuse_attr_cbk]
>> >> >> >> 0-glusterfs-fuse: 969455: STAT()
>> >> >> >> /.gfid/1d90e129-d40f-4ca4-bea8-0bb3c4c7985a => -1 (Stale
>file
>> >> >handle)
>> >> >> >> > [2020-06-06 09:09:41.259362] E [MSGID: 109040]
>> >> >> >> [dht-helper.c:1332:dht_migration_complete_check_task]
>> >0-gvol0-dht:
>> >> >> >> /.gfid/740efff6-74dd-45be-847f-66919d1179e0: failed to
>lookup
>> >the
>> >> >> >file on
>> >> >> >> gvol0-dht [Stale file handle]
>> >> >> >> > [2020-06-06 09:09:41.259458] W
>> >[fuse-bridge.c:897:fuse_attr_cbk]
>> >> >> >> 0-glusterfs-fuse: 1040904: STAT()
>> >> >> >> /.gfid/740efff6-74dd-45be-847f-66919d1179e0 => -1 (Stale
>file
>> >> >handle)
>> >> >> >> >
>> >> >> >> > Could these errors be part of the problem?
>> >> >> >> >
>> >> >> >> > Thanks again.
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > On Sat, 6 Jun 2020 at 21:21, Strahil Nikolov
>> >> >> ><hunter86_bg@xxxxxxxxx>
>> >> >> >> wrote:
>> >> >> >> >>
>> >> >> >> >> Hey David,
>> >> >> >> >>
>> >> >> >> >> can you check the old logs for gfid mismatch and get a list
>> >> >> >> >> of the files that were causing the high cpu.
>> >> >> >> >> Maybe they are related somehow (maybe created by the same
>> >> >> >> >> software, same client version or something else), which could
>> >> >> >> >> help with that.
>> >> >> >> >>
>> >> >> >> >> Also take a look in the geo-replication-slave logs.
>> >> >> >> >>
>> >> >> >> >> Does the issue still occur?
>> >> >> >> >>
>> >> >> >> >> Best Regards,
>> >> >> >> >> Strahil Nikolov
>> >> >> >> >>
>> >> >> >> >> On 6 June 2020 at 1:21:55 GMT+03:00, David Cunningham <
>> >> >> >> >> dcunningham@xxxxxxxxxxxxx> wrote:
>> >> >> >> >> >Hi Sunny and Strahil,
>> >> >> >> >> >
>> >> >> >> >> >Thanks again for your responses. We don't have a lot of
>> >> >renaming
>> >> >> >> >> >activity -
>> >> >> >> >> >maybe some, but not a lot. We do have files which are
>open
>> >for
>> >> >> >writing
>> >> >> >> >> >for
>> >> >> >> >> >quite a while - they're call recordings being written as
>> >the
>> >> >call
>> >> >> >> >> >happens.
>> >> >> >> >> >
>> >> >> >> >> >We've installed GlusterFS using the Ubuntu packages and
>I'd
>> >> >> >really like
>> >> >> >> >> >to
>> >> >> >> >> >avoid compiling and applying patches.
>> >> >> >> >> >
>> >> >> >> >> >After enabling DEBUG the log at
>> >> >> >> >>
>> >> >>/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log
>> >> >> >> >> >doesn't
>> >> >> >> >> >seem to show anything very unusual:
>> >> >> >> >> >
>> >> >> >> >> >[2020-06-03 02:49:01.992177] I [master(worker
>> >> >> >> >> >/nodirectwritedata/gluster/gvol0):1384:process]
>_GMaster:
>> >> >Entry
>> >> >> >Time
>> >> >> >> >> >Taken
>> >> >> >> >> >   MKD=0   MKN=0   LIN=0   SYM=0   REN=0   RMD=0CRE=0
>> >> >> >duration=0.0000
>> >> >> >> >> >UNL=0
>> >> >> >> >> >[2020-06-03 02:49:01.992465] I [master(worker
>> >> >> >> >> >/nodirectwritedata/gluster/gvol0):1394:process]
>_GMaster:
>> >> >> >Data/Metadata
>> >> >> >> >> >Time Taken        SETA=0  SETX=0
>> >> >> >> >> >meta_duration=0.0000data_duration=13.0954
>> >> >> >> >> >   DATA=8  XATT=0
>> >> >> >> >> >[2020-06-03 02:49:01.992863] I [master(worker
>> >> >> >> >> >/nodirectwritedata/gluster/gvol0):1404:process]
>_GMaster:
>> >> >Batch
>> >> >> >> >> >Completed
>> >> >> >> >> >changelog_end=1591152508        entry_stime=(1591152352,
>0)
>> >> >> >> >> >changelog_start=1591152494      stime=(1591152507, 0)
>> >> >> >> >> >duration=13.1077
>> >> >> >> >> >     num_changelogs=2        mode=live_changelog
>> >> >> >> >> >[2020-06-03 02:49:02.958687] D [repce(worker
>> >> >> >> >> >/nodirectwritedata/gluster/gvol0):196:push] RepceClient:
>> >call
>> >> >> >> >> >19017:139678812452608:1591152542.96
>keep_alive({'version':
>> >(1,
>> >> >> >0),
>> >> >> >> >> >'retval': 0, 'uuid':
>> >'8ee85fae-f3aa-4285-ad48-67a1dc17ed73',
>> >> >> >'timeout':
>> >> >> >> >> >1591152662, 'volume_mark': (1583043396, 161632)},) ...
>> >> >> >> >> >[2020-06-03 02:49:02.979139] D [repce(worker
>> >> >> >> >> >/nodirectwritedata/gluster/gvol0):216:__call__]
>> >RepceClient:
>> >> >call
>> >> >> >> >> >19017:139678812452608:1591152542.96 keep_alive -> 28
>> >> >> >> >> >[2020-06-03 02:49:06.998127] D [master(worker
>> >> >> >> >> >/nodirectwritedata/gluster/gvol0):551:crawlwrap]
>_GMaster:
>> >...
>> >> >> >crawl
>> >> >> >> >> >#114
>> >> >> >> >> >done, took 30.180089 seconds
>> >> >> >> >> >[2020-06-03 02:49:07.10132] D [repce(worker
>> >> >> >> >> >/nodirectwritedata/gluster/gvol0):196:push] RepceClient:
>> >call
>> >> >> >> >> >19017:139679441716992:1591152547.01 scan() ...
>> >> >> >> >> >[2020-06-03 02:49:07.10781] D [repce(worker
>> >> >> >> >> >/nodirectwritedata/gluster/gvol0):216:__call__]
>> >RepceClient:
>> >> >call
>> >> >> >> >> >19017:139679441716992:1591152547.01 scan -> None
>> >> >> >> >> >[2020-06-03 02:49:07.10935] D [repce(worker
>> >> >> >> >> >/nodirectwritedata/gluster/gvol0):196:push] RepceClient:
>> >call
>> >> >> >> >> >19017:139679441716992:1591152547.01 getchanges() ...
>> >> >> >> >> >[2020-06-03 02:49:07.11579] D [repce(worker
>> >> >> >> >> >/nodirectwritedata/gluster/gvol0):216:__call__]
>> >RepceClient:
>> >> >call
>> >> >> >> >> >19017:139679441716992:1591152547.01 getchanges ->
>> >> >> >> >>
>> >> >> >>
>> >> >>
>> >> >>
>> >>
>> >>
>>
>>
>>>>>['/var/lib/misc/gluster/gsyncd/gvol0_nvfs10_gvol0/nodirectwritedata-gluster-gvol0/.processing/CHANGELOG.1591152522']
>> >> >> >> >> >[2020-06-03 02:49:07.11720] I [master(worker
>> >> >> >> >> >/nodirectwritedata/gluster/gvol0):1470:crawl] _GMaster:
>> >> >slave's
>> >> >> >time
>> >> >> >> >> >stime=(1591152507, 0)
>> >> >> >> >> >
>> >> >> >> >> >Am I looking in the right place to find out what that
>> >> >gsyncd.py
>> >> >> >process
>> >> >> >> >> >is
>> >> >> >> >> >doing?
>> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >> >On Tue, 2 Jun 2020 at 21:58, Sunny Kumar
>> ><sunkumar@xxxxxxxxxx>
>> >> >> >wrote:
>> >> >> >> >> >
>> >> >> >> >> >> Hi David,
>> >> >> >> >> >>
>> >> >> >> >> >> You haven't answered my previous question regarding
>the
>> >type
>> >> >of
>> >> >> >your
>> >> >> >> >> >> workload.
>> >> >> >> >> >> ---
>> >> >> >> >> >> You can use the below command to enable debug log.
>> >> >> >> >> >>
>> >> >> >> >> >> `gluster vol geo-rep <mastervol>
><slavehost>::<slavevol>
>> >> >config
>> >> >> >> >> >log-level
>> >> >> >> >> >> DEBUG`
>> >> >> >> >> >>
>> >> >> >> >> >> and after capturing log again switch back to info
>mode:
>> >> >> >> >> >>
>> >> >> >> >> >> `gluster vol geo-rep <mastervol>
><slavehost>::<slavevol>
>> >> >config
>> >> >> >> >> >log-level
>> >> >> >> >> >> INFO`
>> >> >> >> >> >>
>> >> >> >> >> >> Please share the debug log and geo-rep config to debug
>> >> >further:
>> >> >> >> >> >> for config:
>> >> >> >> >> >>
>> >> >> >> >> >> `gluster vol geo-rep <mastervol>
><slavehost>::<slavevol>
>> >> >> >config`
>> >> >> >> >> >>
>> >> >> >> >> >> /sunny
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >> >> On Tue, Jun 2, 2020 at 10:18 AM Strahil Nikolov
>> >> >> >> >> ><hunter86_bg@xxxxxxxxx>
>> >> >> >> >> >> wrote:
>> >> >> >> >> >> >
>> >> >> >> >> >> > Hi David,
>> >> >> >> >> >> >
>> >> >> >> >> >> > in which log do you see the entries?
>> >> >> >> >> >> >
>> >> >> >> >> >> > I think I got an explanation why you see the process only
>> >> >> >> >> >> > on one of the master nodes - a geo-rep session is
>> >> >> >> >> >> > established from only 1 master node /I hope someone
>> >> >> >> >> >> > corrects me if I'm wrong/ to one slave node. Thus it will
>> >> >> >> >> >> > be natural to see the high CPU usage on only 1 master node
>> >> >> >> >> >> > in your situation.
>> >> >> >> >> >> >
>> >> >> >> >> >> > Do you see anything else in
>> >> >> >> >> >> > /var/log/glusterfs/geo-replication/<logs> (master nodes)
>> >> >> >> >> >> > or in /var/log/glusterfs/geo-replication-slaves (slaves)
>> >> >> >> >> >> > that could hint at the exact issue? I have a vague feeling
>> >> >> >> >> >> > that that python script is constantly looping over some
>> >> >> >> >> >> > data, causing the CPU hog.
>> >> >> >> >> >> >
>> >> >> >> >> >> > Sadly, I can't find an instruction for increasing the log
>> >> >> >> >> >> > level of the geo-rep log.
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >> >> >> >> > Best  Regards,
>> >> >> >> >> >> > Strahil  Nikolov
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >> >> >> >> > On 2 June 2020 at 6:14:46 GMT+03:00, David Cunningham <
>> >> >> >> >> >> dcunningham@xxxxxxxxxxxxx> wrote:
>> >> >> >> >> >> > >Hi Strahil and Sunny,
>> >> >> >> >> >> > >
>> >> >> >> >> >> > >Thank you for the replies. I checked the gfid on
>the
>> >> >master
>> >> >> >and
>> >> >> >> >> >slaves
>> >> >> >> >> >> > >and
>> >> >> >> >> >> > >they are the same. After moving the file away and
>back
>> >> >again
>> >> >> >it
>> >> >> >> >> >doesn't
>> >> >> >> >> >> > >seem to be having the issue with that file any
>more.
>> >> >> >> >> >> > >
>> >> >> >> >> >> > >We are still getting higher CPU usage on one of the
>> >> >master
>> >> >> >nodes
>> >> >> >> >> >than
>> >> >> >> >> >> > >the
>> >> >> >> >> >> > >others. It logs this every few seconds:
>> >> >> >> >> >> > >
>> >> >> >> >> >> > >[2020-06-02 03:10:15.637815] I [master(worker
>> >> >> >> >> >> > >/nodirectwritedata/gluster/gvol0):1384:process]
>> >_GMaster:
>> >> >> >Entry
>> >> >> >> >> >Time
>> >> >> >> >> >> > >Taken
>> >> >> >> >> >> > >   MKD=0   MKN=0   LIN=0   SYM=0   REN=0  
>RMD=0CRE=0
>> >> >> >> >> >duration=0.0000
>> >> >> >> >> >> > >UNL=0
>> >> >> >> >> >> > >[2020-06-02 03:10:15.638010] I [master(worker
>> >> >> >> >> >> > >/nodirectwritedata/gluster/gvol0):1394:process]
>> >_GMaster:
>> >> >> >> >> >Data/Metadata
>> >> >> >> >> >> > >Time Taken        SETA=0  SETX=0
>> >> >> >> >> >> > >meta_duration=0.0000data_duration=12.7878
>> >> >> >> >> >> > >   DATA=4  XATT=0
>> >> >> >> >> >> > >[2020-06-02 03:10:15.638286] I [master(worker
>> >> >> >> >> >> > >/nodirectwritedata/gluster/gvol0):1404:process]
>> >_GMaster:
>> >> >> >Batch
>> >> >> >> >> >> > >Completed
>> >> >> >> >> >> > >changelog_end=1591067378
>> >entry_stime=(1591067167,
>> >> >0)
>> >> >> >> >> >> > >changelog_start=1591067364      stime=(1591067377,
>0)
>> >> >> >> >> >> > >duration=12.8068
>> >> >> >> >> >> > >     num_changelogs=2        mode=live_changelog
>> >> >> >> >> >> > >[2020-06-02 03:10:20.658601] I [master(worker
>> >> >> >> >> >> > >/nodirectwritedata/gluster/gvol0):1470:crawl]
>> >_GMaster:
>> >> >> >slave's
>> >> >> >> >> >time
>> >> >> >> >> >> > > stime=(1591067377, 0)
>> >> >> >> >> >> > >[2020-06-02 03:10:34.21799] I [master(worker
>> >> >> >> >> >> > >/nodirectwritedata/gluster/gvol0):1954:syncjob]
>> >Syncer:
>> >> >Sync
>> >> >> >Time
>> >> >> >> >> >Taken
>> >> >> >> >> >> > > duration=0.3826 num_files=8     job=1  
>return_code=0
>> >> >> >> >> >> > >[2020-06-02 03:10:46.440535] I [master(worker
>> >> >> >> >> >> > >/nodirectwritedata/gluster/gvol0):1384:process]
>> >_GMaster:
>> >> >> >Entry
>> >> >> >> >> >Time
>> >> >> >> >> >> > >Taken
>> >> >> >> >> >> > >   MKD=0   MKN=0   LIN=0   SYM=0   REN=1  
>RMD=0CRE=2
>> >> >> >> >> >duration=0.1314
>> >> >> >> >> >> > >UNL=1
>> >> >> >> >> >> > >[2020-06-02 03:10:46.440809] I [master(worker
>> >> >> >> >> >> > >/nodirectwritedata/gluster/gvol0):1394:process]
>> >_GMaster:
>> >> >> >> >> >Data/Metadata
>> >> >> >> >> >> > >Time Taken        SETA=0  SETX=0
>> >> >> >> >> >> > >meta_duration=0.0000data_duration=13.0171
>> >> >> >> >> >> > >   DATA=14 XATT=0
>> >> >> >> >> >> > >[2020-06-02 03:10:46.441205] I [master(worker
>> >> >> >> >> >> > >/nodirectwritedata/gluster/gvol0):1404:process]
>> >_GMaster:
>> >> >> >Batch
>> >> >> >> >> >> > >Completed
>> >> >> >> >> >> > >changelog_end=1591067420
>> >entry_stime=(1591067419,
>> >> >0)
>> >> >> >> >> >> > >changelog_start=1591067392      stime=(1591067419,
>0)
>> >> >> >> >> >> > >duration=13.0322
>> >> >> >> >> >> > >     num_changelogs=3        mode=live_changelog
>> >> >> >> >> >> > >[2020-06-02 03:10:51.460925] I [master(worker
>> >> >> >> >> >> > >/nodirectwritedata/gluster/gvol0):1470:crawl]
>> >_GMaster:
>> >> >> >slave's
>> >> >> >> >> >time
>> >> >> >> >> >> > > stime=(1591067419, 0)
>> >> >> >> >> >> > >
>> >> >> >> >> >> > >[2020-06-02 03:11:04.448913] I [master(worker
>> >> >> >> >> >> > >/nodirectwritedata/gluster/gvol0):1954:syncjob]
>> >Syncer:
>> >> >Sync
>> >> >> >Time
>> >> >> >> >> >Taken
>> >> >> >> >> >> > >duration=0.3466 num_files=3     job=1  
>return_code=0
>> >> >> >> >> >> > >
>> >> >> >> >> >> > >Whereas the other master nodes only log this:
>> >> >> >> >> >> > >
>> >> >> >> >> >> > >[2020-06-02 03:11:33.886938] I
>> >> >[gsyncd(config-get):308:main]
>> >> >> >> >> ><top>:
>> >> >> >> >> >> > >Using
>> >> >> >> >> >> > >session config file
>> >> >> >> >> >> >
>> >> >> >> >>
>> >> >>
>> >>
>>
>>>>>>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
>> >> >> >> >> >> > >[2020-06-02 03:11:33.993175] I
>> >[gsyncd(status):308:main]
>> >> >> ><top>:
>> >> >> >> >> >Using
>> >> >> >> >> >> > >session config file
>> >> >> >> >> >> >
>> >> >> >> >>
>> >> >>
>> >>
>>
>>>>>>path=/var/lib/glusterd/geo-replication/gvol0_nvfs10_gvol0/gsyncd.conf
>> >> >> >> >> >> > >
>> >> >> >> >> >> > >Can anyone help with what might cause the high CPU
>> >usage
>> >> >on
>> >> >> >one
>> >> >> >> >> >master
>> >> >> >> >> >> > >node? The process is this one, and is using 70-100%
>of
>> >> >CPU:
>> >> >> >> >> >> > >
>> >> >> >> >> >> > >python2
>> >> >> >> >>
>> >> >>/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py
>> >> >> >> >> >> > >worker gvol0 nvfs10::gvol0 --feedback-fd 15
>> >--local-path
>> >> >> >> >> >> > >/nodirectwritedata/gluster/gvol0 --local-node
>cafs30
>> >> >> >> >> >--local-node-id
>> >> >> >> >> >> > >b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id
>> >> >> >> >> >> > >cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd
>> >12,11,9,13
>> >> >> >> >> >--subvol-num 1
>> >> >> >> >> >> > >--resource-remote nvfs30 --resource-remote-id
>> >> >> >> >> >> > >1e698ccd-aeec-4ec4-96fe-383da8fc3b78
>> >> >> >> >> >> > >
>> >> >> >> >> >> > >Thank you in advance!
>> >> >> >> >> >> > >
>> >> >> >> >> >> > >
>> >> >> >> >> >> > >
>> >> >> >> >> >> > >
>> >> >> >> >> >> > >On Sat, 30 May 2020 at 20:20, Strahil Nikolov
>> >> >> >> >> ><hunter86_bg@xxxxxxxxx>
>> >> >> >> >> >> > >wrote:
>> >> >> >> >> >> > >
>> >> >> >> >> >> > >> Hey David,
>> >> >> >> >> >> > >>
>> >> >> >> >> >> > >> for me a gfid mismatch means that the file was
>> >> >> >> >> >> > >> replaced/recreated - just like vim in linux does (and
>> >> >> >> >> >> > >> it is expected for a config file).
>> >> >> >> >> >> > >>
>> >> >> >> >> >> > >> Have you checked the gfid of the file on both source
>> >> >> >> >> >> > >> and destination - do they really match, or are they
>> >> >> >> >> >> > >> different?
>> >> >> >> >> >> > >>
>> >> >> >> >> >> > >> What happens when you move the file away from the
>> >> >> >> >> >> > >> slave - does it fix the issue?
>> >> >> >> >> >> > >>
>> >> >> >> >> >> > >> Best Regards,
>> >> >> >> >> >> > >> Strahil Nikolov
>> >> >> >> >> >> > >>
>> >> >> >> >> >> > >> On 30 May 2020 at 1:10:56 GMT+03:00, David Cunningham <
>> >> >> >> >> >> > >> dcunningham@xxxxxxxxxxxxx> wrote:
>> >> >> >> >> >> > >> >Hello,
>> >> >> >> >> >> > >> >
>> >> >> >> >> >> > >> >We're having an issue with a geo-replication
>> >process
>> >> >with
>> >> >> >> >> >unusually
>> >> >> >> >> >> > >> >high
>> >> >> >> >> >> > >> >CPU use and giving "Entry not present on master.
>> >> >Fixing
>> >> >> >gfid
>> >> >> >> >> >> > >mismatch
>> >> >> >> >> >> > >> >in
>> >> >> >> >> >> > >> >slave" errors. Can anyone help on this?
>> >> >> >> >> >> > >> >
>> >> >> >> >> >> > >> >We have 3 GlusterFS replica nodes (we'll call
>the
>> >> >> >master),
>> >> >> >> >> >which
>> >> >> >> >> >> > >also
>> >> >> >> >> >> > >> >push
>> >> >> >> >> >> > >> >data to a remote server (slave) using
>> >geo-replication.
>> >> >> >This has
>> >> >> >> >> >been
>> >> >> >> >> >> > >> >running fine for a couple of months, but
>yesterday
>> >one
>> >> >of
>> >> >> >the
>> >> >> >> >> >master
>> >> >> >> >> >> > >> >nodes
>> >> >> >> >> >> > >> >started having unusually high CPU use. It's this
>> >> >process:
>> >> >> >> >> >> > >> >
>> >> >> >> >> >> > >> >root@cafs30:/var/log/glusterfs# ps aux | grep
>32048
>> >> >> >> >> >> > >> >root     32048 68.7  0.6 1843140 845756 ?     
>Rl
>> >> >02:51
>> >> >> >> >> >493:51
>> >> >> >> >> >> > >> >python2
>> >> >> >> >> >> > >>
>> >> >>
>>>/usr/lib/x86_64-linux-gnu/glusterfs/python/syncdaemon/gsyncd.py
>> >> >> >> >> >> > >worker
>> >> >> >> >> >> > >> >gvol0 nvfs10::gvol0 --feedback-fd 15
>--local-path
>> >> >> >> >> >> > >> >/nodirectwritedata/gluster/gvol0 --local-node
>> >cafs30
>> >> >> >> >> >--local-node-id
>> >> >> >> >> >> > >> >b7521445-ee93-4fed-8ced-6a609fa8c7d4 --slave-id
>> >> >> >> >> >> > >> >cdcdb210-839c-4306-a4dc-e696b165ed17 --rpc-fd
>> >> >12,11,9,13
>> >> >> >> >> >> > >--subvol-num 1
>> >> >> >> >> >> > >> >--resource-remote nvfs30 --resource-remote-id
>> >> >> >> >> >> > >> >1e698ccd-aeec-4ec4-96fe-383da8fc3b78
>> >> >> >> >> >> > >> >
>> >> >> >> >> >> > >> >Here's what is being logged in
>> >> >> >> >> >> > >>
>> >> >> >> >>
>> >> >>
>> >>>>/var/log/glusterfs/geo-replication/gvol0_nvfs10_gvol0/gsyncd.log:
>> >> >> >> >> >> > >> >
>> >> >> >> >> >> > >> >[2020-05-29 21:57:18.843524] I [master(worker
>> >> >> >> >> >> > >> >/nodirectwritedata/gluster/gvol0):1470:crawl]
>> >> >_GMaster:
>> >> >> >slave's
>> >> >> >> >> >time
>> >> >> >> >> >> > >> > stime=(1590789408, 0)
>> >> >> >> >> >> > >> >[2020-05-29 21:57:30.626172] I [master(worker
>> >> >> >> >> >> > >>
>> >> >> >> >>
>> >> >>
>>
>>>>>/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]
>> >> >> >> >> >> > >> >_GMaster: Entry not present on master. Fixing
>gfid
>> >> >> >mismatch in
>> >> >> >> >> >> > >slave.
>> >> >> >> >> >> > >> >Deleting the entry    retry_count=1
>> >entry=({u'uid':
>> >> >> >108,
>> >> >> >> >> >u'gfid':
>> >> >> >> >> >> > >> >u'7c0b75e5-d8b7-454f-8010-112d613c599e', u'gid':
>> >117,
>> >> >> >u'mode':
>> >> >> >> >> >> > >33204,
>> >> >> >> >> >> > >> >u'entry':
>> >> >> >> >> >> > >>
>> >> >>
>>>u'.gfid/c5422396-1578-4b50-a29d-315be2a9c5d8/00a859f7xxxx.cfg',
>> >> >> >> >> >> > >> >u'op': u'CREATE'}, 17, {u'slave_isdir': False,
>> >> >> >> >> >u'gfid_mismatch':
>> >> >> >> >> >> > >True,
>> >> >> >> >> >> > >> >u'slave_name': None, u'slave_gfid':
>> >> >> >> >> >> > >> >u'ec4b0ace-2ec4-4ea5-adbc-9f519b81917c',
>> >> >> >u'name_mismatch':
>> >> >> >> >> >False,
>> >> >> >> >> >> > >> >u'dst':
>> >> >> >> >> >> > >> >False})
>> >> >> >> >> >> > >> >[2020-05-29 21:57:30.627893] I [master(worker
>> >> >> >> >> >> > >>
>> >> >> >> >>
>> >> >>
>>
>>>>>/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]
>> >> >> >> >> >> > >> >_GMaster: Entry not present on master. Fixing
>gfid
>> >> >> >mismatch in
>> >> >> >> >> >> > >slave.
>> >> >> >> >> >> > >> >Deleting the entry    retry_count=1
>> >entry=({u'uid':
>> >> >> >108,
>> >> >> >> >> >u'gfid':
>> >> >> >> >> >> > >> >u'a4d52e40-2e2f-4885-be5f-65fe95a8ebd7', u'gid':
>> >117,
>> >> >> >u'mode':
>> >> >> >> >> >> > >33204,
>> >> >> >> >> >> > >> >u'entry':
>> >> >> >> >> >> > >>
>> >> >> >> >> >> >
>> >> >> >> >>
>> >> >> >>
>> >> >>
>> >>
>> >>
>>
>>
>>>>>>>u'.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/polycom_00a859f7xxxx.cfg',
>> >> >> >> >> >> > >> >u'op': u'CREATE'}, 17, {u'slave_isdir': False,
>> >> >> >> >> >u'gfid_mismatch':
>> >> >> >> >> >> > >True,
>> >> >> >> >> >> > >> >u'slave_name': None, u'slave_gfid':
>> >> >> >> >> >> > >> >u'ece8da77-b5ea-45a7-9af7-7d4d8f55f74a',
>> >> >> >u'name_mismatch':
>> >> >> >> >> >False,
>> >> >> >> >> >> > >> >u'dst':
>> >> >> >> >> >> > >> >False})
>> >> >> >> >> >> > >> >[2020-05-29 21:57:30.629532] I [master(worker
>> >> >> >> >> >> > >>
>> >> >> >> >>
>> >> >>
>>
>>>>>/nodirectwritedata/gluster/gvol0):813:fix_possible_entry_failures]
>> >> >> >> >> >> > >> >_GMaster: Entry not present on master. Fixing
>gfid
>> >> >> >mismatch in
>> >> >> >> >> >> > >slave.
>> >> >> >> >> >> > >> >Deleting the entry    retry_count=1
>> >entry=({u'uid':
>> >> >> >108,
>> >> >> >> >> >u'gfid':
>> >> >> >> >> >> > >> >u'3c525ad8-aeb2-46b6-9c41-7fb4987916f8', u'gid':
>> >117,
>> >> >> >u'mode':
>> >> >> >> >> >> > >33204,
>> >> >> >> >> >> > >> >u'entry':
>> >> >> >> >> >> > >>
>> >> >> >> >> >> >
>> >> >> >> >> >>
>> >> >> >> >>
>> >> >> >>
>> >> >>
>> >> >>
>> >>
>> >>
>>
>>
>>>>>>>u'.gfid/f857c42e-22f1-4ce4-8f2e-13bdadedde45/00a859f7xxxx-directory.xml',
>> >> >> >> >> >> > >> >u'op': u'CREATE'}, 17, {u'slave_isdir': False,
>> >> >> >> >> >u'gfid_mismatch':
>> >> >> >> >> >> > >True,
>> >> >> >> >> >> > >> >u'slave_name': None, u'slave_gfid':
>> >> >> >> >> >> > >> >u'06717b5a-d842-495d-bd25-aab9cd454490',
>> >> >> >u'name_mismatch':
>> >> >> >> >> >False,
>> >> >> >> >> >> > >> >u'dst':
>> >> >> >> >> >> > >> >False})
>> >> >> >> >> >> > >> >[2020-05-29 21:57:30.659123] I [master(worker
>> >> >> >> >> >> > >>
>> >> >> >>/nodirectwritedata/gluster/gvol0):942:handle_entry_failures]
>> >> >> >> >> >> > >_GMaster:
>> >> >> >> >> >> > >> >Sucessfully fixed entry ops with gfid mismatch
>> >> >> >> >> >retry_count=1
>> >> >> >> >> >> > >> >[2020-05-29 21:57:30.659343] I [master(worker
>> >> >> >> >> >> > >>
>> >>/nodirectwritedata/gluster/gvol0):1194:process_change]
>> >> >> >> >> >_GMaster:
>> >> >> >> >> >> > >Retry
>> >> >> >> >> >> > >> >original entries. count = 1
>> >> >> >> >> >> > >> >[2020-05-29 21:57:30.725810] I [master(worker
>> >> >> >> >> >> > >>
>> >>/nodirectwritedata/gluster/gvol0):1197:process_change]
>> >> >> >> >> >_GMaster:
>> >> >> >> >> >> > >> >Sucessfully fixed all entry ops with gfid
>mismatch
>> >> >> >> >> >> > >> >[2020-05-29 21:57:31.747319] I [master(worker
>> >> >> >> >> >> > >> >/nodirectwritedata/gluster/gvol0):1954:syncjob]
>> >> >Syncer:
>> >> >> >Sync
>> >> >> >> >> >Time
>> >> >> >> >> >> > >Taken
>> >> >> >> >> >> > >> >duration=0.7409 num_files=18    job=1
>> >return_code=0
>> >> >> >> >> >> > >> >
>> >> >> >> >> >> > >> >We've verified that the files like
>> >> >> >polycom_00a859f7xxxx.cfg
>> >> >> >> >> >referred
>> >> >> >> >> >> > >to
>> >> >> >> >> >> > >> >in
>> >> >> >> >> >> > >> >the error do exist on the master nodes and
>slave.
>> >> >> >> >> >> > >> >
>> >> >> >> >> >> > >> >We found this bug fix:
>> >> >> >> >> >> > >>
>>https://bugzilla.redhat.com/show_bug.cgi?id=1642865
>> >> >> >> >> >> > >> >
>> >> >> >> >> >> > >> >However that fix went in 5.1, and we're running
>> >5.12
>> >> >on
>> >> >> >the
>> >> >> >> >> >master
>> >> >> >> >> >> > >> >nodes
>> >> >> >> >> >> > >> >and slave. A couple of GlusterFS clients
>connected
>> >to
>> >> >the
>> >> >> >> >> >master
>> >> >> >> >> >> > >nodes
>> >> >> >> >> >> > >> >are
>> >> >> >> >> >> > >> >running 5.13.
>> >> >> >> >> >> > >> >
>> >> >> >> >> >> > >> >Would anyone have any suggestions? Thank you in
>> >> >advance.
>> >> >> >> >> >> > >>
>> >> >> >> >> >> > ________
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >> >> >> >> > Community Meeting Calendar:
>> >> >> >> >> >> >
>> >> >> >> >> >> > Schedule -
>> >> >> >> >> >> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> >> >> >> >> >> > Bridge: https://bluejeans.com/441850968
>> >> >> >> >> >> >
>> >> >> >> >> >> > Gluster-users mailing list
>> >> >> >> >> >> > Gluster-users@xxxxxxxxxxx
>> >> >> >> >> >> >
>> >https://lists.gluster.org/mailman/listinfo/gluster-users
>> >> >> >> >> >>
>> >> >> >> >> >>
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > --
>> >> >> >> > David Cunningham, Voisonics Limited
>> >> >> >> > http://voisonics.com/
>> >> >> >> > USA: +1 213 221 1092
>> >> >> >> > New Zealand: +64 (0)28 2558 3782
>> >> >> >> > ________
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > Community Meeting Calendar:
>> >> >> >> >
>> >> >> >> > Schedule -
>> >> >> >> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> >> >> >> > Bridge: https://bluejeans.com/441850968
>> >> >> >> >
>> >> >> >> > Gluster-users mailing list
>> >> >> >> > Gluster-users@xxxxxxxxxxx
>> >> >> >> > https://lists.gluster.org/mailman/listinfo/gluster-users
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> --
>> >> >> >> sankarshan@xxxxxxxxx | TZ: UTC+0530 | +91 99606 03294
>> >> >> >> kadalu.io : Making it easy to provision storage in k8s!
>> >> >> >> ________
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> Community Meeting Calendar:
>> >> >> >>
>> >> >> >> Schedule -
>> >> >> >> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> >> >> >> Bridge: https://bluejeans.com/441850968
>> >> >> >>
>> >> >> >> Gluster-users mailing list
>> >> >> >> Gluster-users@xxxxxxxxxxx
>> >> >> >> https://lists.gluster.org/mailman/listinfo/gluster-users
>> >> >> >>
>> >> >>
>> >>
>>
________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users




