I caught one of the nodes transitioning into faulty mode; the log output is below.
On the master nodes, check the log messages (/var/log/glusterfs/geo-replication/). Let us know if you see any issues in them.
When one of the nodes drops into "faulty", which happens periodically, this is the type of output that appears in the log:
[root@gfs-a-1 ~]# tail /usr/local/var/log/glusterfs/geo-replication/shares/ssh%3A%2F%2Froot%4010.XX.XXX.X%3Agluster%3A%2F%2F127.0.0.1%3Abkpshares.log
[2015-05-05 09:22:58.140913] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/065c09f9-4502-4a2c-81fa-5e8fcaf22712 [errcode: 23]
[2015-05-05 09:22:58.152951] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/28a237a4-4346-48c5-bd1c-713273f591c7 [errcode: 23]
[2015-05-05 09:22:58.327603] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/5755db3e-e9d8-42d2-b415-890842b086ae [errcode: 23]
[2015-05-05 09:22:58.336714] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/0b7fc219-1e31-4e66-865f-5ae1c26d5e54 [errcode: 23]
[2015-05-05 09:22:58.360308] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/955cd0e4-dd06-4db6-9391-34dbf72c9b06 [errcode: 23]
[2015-05-05 09:22:58.367522] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/1d455725-c3e1-4111-92e5-335610d3f513 [errcode: 23]
[2015-05-05 09:22:58.368226] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/7ce881ae-3491-4e21-b38b-0a27fb620c74 [errcode: 23]
[2015-05-05 09:22:58.368959] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/056732c1-1537-4925-a30c-b905c110a5b2 [errcode: 23]
[2015-05-05 09:22:58.369635] W [master(/mnt/a-1-shares-brick-2/brick):250:regjob] <top>: Rsync: .gfid/8c58d6c5-9975-43c6-8f4c-2a92337f7350 [errcode: 23]
[2015-05-05 09:22:58.369790] W [master(/mnt/a-1-shares-brick-2/brick):877:process] _GMaster: incomplete sync, retrying changelogs: XSYNC-CHANGELOG.1430830891
When the node is in "active" mode, I get a lot of log output that resembles this:
[2015-05-05 09:23:54.735502] W [master(/mnt/a-1-shares-brick-3/brick):877:process] _GMaster: incomplete sync, retrying changelogs: XSYNC-CHANGELOG.1430832227
[2015-05-05 09:23:55.449265] W [master(/mnt/a-1-shares-brick-3/brick):250:regjob] <top>: Rsync: .gfid/0665be16-04e9-4cbe-a2c9-a633caa8c79d [errcode: 23]
[2015-05-05 09:23:55.449491] W [master(/mnt/a-1-shares-brick-3/brick):877:process] _GMaster: incomplete sync, retrying changelogs: XSYNC-CHANGELOG.1430832227
[2015-05-05 09:23:56.277033] W [master(/mnt/a-1-shares-brick-3/brick):250:regjob] <top>: Rsync: .gfid/0665be16-04e9-4cbe-a2c9-a633caa8c79d [errcode: 23]
[2015-05-05 09:23:56.277259] W [master(/mnt/a-1-shares-brick-3/brick):860:process] _GMaster: changelogs XSYNC-CHANGELOG.1430832227 could not be processed - moving on...
[2015-05-05 09:23:56.294038] W [master(/mnt/a-1-shares-brick-3/brick):862:process] _GMaster: SKIPPED GFID =
[2015-05-05 09:23:56.381592] I [master(/mnt/a-1-shares-brick-3/brick):1130:crawl] _GMaster: finished hybrid crawl syncing
[2015-05-05 09:24:24.404884] I [master(/mnt/a-1-shares-brick-4/brick):445:crawlwrap] _GMaster: 1 crawls, 1 turns
[2015-05-05 09:24:24.437452] I [master(/mnt/a-1-shares-brick-4/brick):1124:crawl] _GMaster: starting hybrid crawl...
[2015-05-05 09:24:24.588865] I [master(/mnt/a-1-shares-brick-1/brick):1133:crawl] _GMaster: processing xsync changelog /usr/local/var/run/gluster/shares/ssh%3A%2F%2Froot%4010.XX.XXX.X%3Agluster%3A%2F%2F127.0.0.1%3Abkpshares/9d9a72f468c582609e97e8929e58b9ff/xsync/XSYNC-CHANGELOG.1430832135
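To see whether the same GFIDs keep failing across retries, the warnings could be tallied with something like the sketch below. It is only a rough Python sketch: the function name tally_rsync_errors and the regular expression are my own, written against the line format in the excerpts above, and are not part of Gluster. For reference, the rsync man page lists exit code 23 as "Partial transfer due to error".

```python
import re
from collections import Counter

# Matches warning lines of the form seen in the log excerpts above:
# [timestamp] W [master(<brick>):250:regjob] <top>: Rsync: .gfid/<gfid> [errcode: <n>]
# The pattern is an assumption based on those excerpts only.
RSYNC_WARNING = re.compile(
    r"\[master\((?P<brick>[^)]+)\):\d+:regjob\].*?"
    r"Rsync: \.gfid/(?P<gfid>[0-9a-f-]{36}) \[errcode: (?P<code>\d+)\]"
)


def tally_rsync_errors(lines):
    """Count rsync warnings per (brick, gfid, errcode) triple."""
    counts = Counter()
    for line in lines:
        match = RSYNC_WARNING.search(line)
        if match:
            key = (match["brick"], match["gfid"], int(match["code"]))
            counts[key] += 1
    return counts
```

Passing an open log file (or any iterable of lines) to tally_rsync_errors and printing counts.most_common() would show which GFIDs are retried most often on each brick.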
This raises a couple of questions for me:
- Are these errcode: 23 warnings caused by files that were deleted or renamed after the changelog was created?
- Is it correct/expected for the node to drop into faulty and then recover itself to active periodically?
Thank you again for your assistance!
Dave