Hi,

This looks like a GFID conflict on the Slave: a file with the same name but a different GFID exists on the Slave, left undeleted possibly because of an unlink failure or some other failure. We need to identify the cause of the GFID conflict. Please share the workload details, or share the changelogs from the brick backend (/data/media/.glusterfs/changelogs).

"ENTRY FAILED" shows a file exists error but reports a different GFID:

[2015-11-20 11:40:14.93090] W [master(/data/media):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 33, 'gfid': '31d66429-c700-4a10-bb32-35e1b36a479f', 'gid': 33, 'mode': 33206, 'entry': '.gfid/b1dc6c6d-dac7-4da9-9577-4614942a72a0/official-nightmare-before-christmas-vampire-teddy-girls-dress-body-web.jpg', 'op': 'CREATE'}, 17, 'df0e67f5-f2ce-45c3-b4f1-224aa3059ec7')

There also appear to be split-brain issues on the Slave. Refer to this document to resolve them:
https://github.com/gluster/glusterfs-specs/blob/master/done/Features/heal-info-and-split-brain-resolution.md

regards
Aravinda

On 11/25/2015 03:08 AM, Audrius Butkevicius wrote:
So the version of rsync is 3.1.0, but the bug mentioned only applies to large files, whereas in my case the files are less than a MB.

I've started digging through the logs and found a bunch of these on the slave:

[2015-11-20 11:40:46.730805] W [fuse-bridge.c:1978:fuse_create_cbk] 0-glusterfs-fuse: 1882288: /.gfid/31d66429-c700-4a10-bb32-35e1b36a479f => -1 (Operation not permitted)
[2015-11-20 12:39:59.269844] W [fuse-bridge.c:1978:fuse_create_cbk] 0-glusterfs-fuse: 1918306: /.gfid/6802a0c6-1f62-4213-a70d-7b46d9ff8f3a => -1 (Operation not permitted)

So something funky was happening for an hour 4 days ago. Given the volume is on EBS, maybe there was some glitch there.

I can also find the corresponding failures on the master:

[2015-11-20 11:40:14.93090] W [master(/data/media):803:log_failures] _GMaster: ENTRY FAILED: ({'uid': 33, 'gfid': '31d66429-c700-4a10-bb32-35e1b36a479f', 'gid': 33, 'mode': 33206, 'entry': '.gfid/b1dc6c6d-dac7-4da9-9577-4614942a72a0/official-nightmare-before-christmas-vampire-teddy-girls-dress-body-web.jpg', 'op': 'CREATE'}, 17, 'df0e67f5-f2ce-45c3-b4f1-224aa3059ec7')
[2015-11-20 11:40:14.265054] W [master(/data/media):803:log_failures] _GMaster: META FAILED: ({'go': '.gfid/31d66429-c700-4a10-bb32-35e1b36a479f', 'stat': {'atime': 1448019600.232466, 'gid': 33, 'mtime': 1448019600.316466, 'mode': 33279, 'uid': 33}, 'op': 'META'}, 2)

If I grep for SKIPPED GFID I get the following:

[2015-11-20 11:40:40.704817] W [master(/data/media):1014:process] _GMaster: SKIPPED GFID = 192632af-28c5-4e03-a62d-458fe7f3b5f9,7ea8d7a8-524b-4dd0-b97a-dc7d3481f341,204f6112-0e8d-4f6d-855b-bf10f9c63b62,7e626e8f-edad-4f39-a6c6-547a1da34aa1,1f0d0208-1962-4eb1-91d4-cf7ed297d8e3,95d389c4-3258-4ca0-8fc4-26b8427b1eaf,425cedc6-6343-4326-8540-996d2d56dc9c,5955928b-2b8f-4cc9-a336-3eac4382789b,8932efcd-ba90-46ec-84c8-5e9e51cc84e9,2530275d-5f03-4143-9abf-d07cc79bf80a,73574466-86f3-4ab2-b5da-c31ac28c27c1,776e5e8f-5c6a-46b1-ad54-733e157d2097,008a69f3-217c-4dbc-a469-5a5bc8ecd589,dca8d8d9-03cf-4793-92e4-bfcfddd262f6,c85b7a29-73af-4f44-a07e-a44082d7a93a,6c1f56d6-4ea6-4910-9677-ea33edd35d28,0ea56588-87fa-4355-9403-e311525454fc,c8ce76c9-e21d-46ce-a2b5-14dfd0070f64,db9e6484-0e5e-4f6e-815b-3c2b273deee5,35d10752-43b5-4398-be5f-17cb9de73a6b,396e5faf-74a1-4849-97e3-009dbfb22836,d148e7d5-c2f3-4d06-8cd6-8588e6aac196,404d20c5-1c6c-4aad-98be-2c23930173b3,f1fae11c-db8e-4cd5-8e47-a3870316f89c,d8daa413-e57f-44fb-b907-b1a497f2dcfa,5f6ee8c2-84fb-432e-95cd-e428ab256e83,6bf54dcd-c3b4-4187-a390-eca841e46570,335c07ca-d339-4d3a-aa88-3b5753d24fbf,8fdbac00-6628-4f22-8fb4-b7a6524cae49,31d66429-c700-4a10-bb32-35e1b36a479f
[2015-11-20 11:41:35.907850] W [master(/data/media):1014:process] _GMaster: SKIPPED GFID = 03069c7f-8eaa-45b0-92ed-50cb648cd912,788f5ed1-923e-4b86-9696-2a6de07ebb2e,43d12b40-b6e2-43c4-8883-85e89dc81321
[2015-11-20 12:11:55.492068] W [master(/data/media):1014:process] _GMaster: SKIPPED GFID = eb02369f-7ca8-480a-b00c-768964410ed8,17045ac9-27dd-4bf9-9f90-d7b146070dd5,265e3d9c-1657-45cb-bbf6-db439eb18ccf,553c420f-b3cc-47f2-8d5f-cfc2ffdd1a92
[2015-11-20 12:12:53.372432] W [master(/data/media):1014:process] _GMaster: SKIPPED GFID = 66c5878e-8c00-4f7d-a3ad-4adec84a5e22,f4dc086d-9c2b-449c-9e31-bbae9ebcdea7,f99317b2-72e8-49e3-b676-647abad508b1
[2015-11-20 12:37:55.773813] W [master(/data/media):1014:process] _GMaster: SKIPPED GFID = 4af54f1c-e8e1-4915-9328-a458d5d35d5d,acbe1f12-87e8-4192-b864-d90030269bba,7d27a795-da63-4742-9e91-abd8fa543612,8d4e642d-fd40-44d6-8419-8d3459df7ce3
[2015-11-20 12:39:28.852575] W [master(/data/media):1014:process] _GMaster: SKIPPED GFID = d90dc121-02e7-4a79-bc03-1bd8fddd9f48,54bb563f-ab44-4e91-a46b-764a122ce7fa,088141de-7545-40f9-b776-751738a89740,2dab3faf-4a6c-407a-88cd-cddef6f55299,d887806f-23b4-4389-a4dc-f9027702a2df,fc5a9bc8-ea62-4677-baed-16510541373a,33136ad2-c5b4-448c-991d-1e72fefef021,cf3e2675-e41b-4782-9478-91773eb0a4aa,6412d878-e0f1-4700-84df-05f4af35962f,ec3cf6e1-7f27-4650-b978-8a5a7f620389,d3651bb9-cd2d-4c5f-93e6-fe4fb1cdf5db,ecb0415e-1524-40f4-870e-1fd0f8371b1d,a118aaae-bd3e-4b19-a0e0-891aa9edb09a,7642d3f3-f1e5-4aca-bcfe-bdb3c44779a9,2e29f3f8-c460-48eb-9db5-b281b67cc2bf,e61db54b-3979-488a-8789-a5d0615c5a97,4212d840-9c22-4d9e-b61b-5e35271dfe80,dad1c60b-9da6-4e57-b014-daa1aca73ce3,93699a3d-40b8-4bbd-b78f-aabf965df57f,4fad7468-91f2-4deb-aaf7-6401068c9e6d,c9738295-46cc-4fe7-b359-dc94f5815ce9,91853c5c-4877-4c9e-9481-c86368942f78,59deed8e-d3d0-4ab7-854e-53a8dd455de0,20b86c13-7df1-4d13-bac1-7d628a00d6ce,b7b86a2d-7963-41a4-a423-14e25d1e78c4,3c17d7fe-bb7f-489c-a525-5c8b7bb93c3e,e230d207-7c68-4983-a958-f2dcfc1ce694,fa8bf3c0-abae-446c-83c5-45ef8bcaa4b8,14089102-8106-45d9-a3f1-d1446b568f4e,6802a0c6-1f62-4213-a70d-7b46d9ff8f3a,0a253bbc-ef98-4da0-951f-e17c5a7f5858,ef054b76-986b-4a89-b8e6-b4988221aaa2,48c0a153-708c-44ee-b186-cf255936a02b,fa2646a6-807c-4e9d-8f2b-a9cdf2674e0c,1ed4a563-4f6a-4b5a-9866-89025fe7afd5,0f293cf7-bc32-4f8a-87d5-388a4bffb4af,f4126726-667b-451d-8214-a18bb3f468cd,e23dc8b3-da1c-4d18-aec9-22e0aa174d81,40b9f10d-7304-4c0b-8498-bef23b305d03,15c25d1e-2a62-495e-887f-14d0cb0527b1,67371804-9084-4801-b664-44e88bea8ac3,4750fa3f-d1a4-4472-b10d-3f75d0b451dc
[2015-11-23 09:18:10.43391] W [master(/data/media):1014:process] _GMaster: SKIPPED GFID = 228843f3-62f0-4687-b5eb-6d1e21257ad0,b0078359-fbf0-4709-8f40-8383a11d7875,60cff4d5-8b5d-4f7f-8bc1-27081a011458,bedb6ac4-208d-47e1-812c-5547c84ab841,da6810d9-4883-45e1-b73e-55a7ff17b5e7,e03b5c03-b25c-49ba-86f0-8a709a9c2658,053673a0-c1cc-4057-83fa-f97740cb5d4f,dbd6ea84-8f24-4a47-ac41-22c3fd788ecf,43caa3e7-ca04-47ab-b950-105606b313a4,62d8b1d0-fc89-4fb1-a41a-957dcb34d325,4e8fe1fa-60cd-47fa-bad6-f617c312f53b,6c3d6cf3-62ae-4ab8-9dc3-7815552401fe,f79be814-7e78-4985-bcdd-688da23d1808,c4186455-0f06-4b5d-89be-3c5ccbdeb6f0,f9c4ccdb-2337-479d-845d-ee4d85b69ece,bcd14726-1bab-4d97-8915-ec8bbe8faf8c,cca82341-a430-4a59-a900-1af66dcf7bb8,b7043a8e-4286-4831-91ec-c146e40bc6be,995ffeb6-a906-4078-88c6-404a2b38aad4,227f9987-5057-4133-848a-2b22aca5dde1,90b35242-32db-4570-8070-cf9dd49322a5,c6863c8f-1914-4a2d-814b-6e5853134faf,e2d19b1a-fc07-441c-b110-ca816b46fc40,9a3d0c0b-7d84-416f-9f3e-21b32a11ba1d,d8163f6b-8c40-418c-9c06-b3743af24e4e,522d7247-a75b-4af9-acb2-52a99eeced89,4b56ea9d-413a-4e24-b44e-433f7603ad6d

There are also the following lines on the master, which might have some impact:

E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-media-replicate-0: Failing READ on gfid abdc7d5e-9187-4916-ae83-a8b615e32a17: split-brain observed. [Input/output error]
E [MSGID: 108008] [afr-read-txn.c:89:afr_read_txn_refresh_done] 0-media-replicate-0: Failing GETXATTR on gfid abdc7d5e-9187-4916-ae83-a8b615e32a17: split-brain observed. [Input/output error]
E [mem-pool.c:417:mem_get0] (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x809a2) [0x7f79e436b9a2] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg+0x79f) [0x7f79e430cb1f] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get0+0x81) [0x7f79e433e4a1] ) 0-mem-pool: invalid argument [Invalid argument]
E [mem-pool.c:417:mem_get0] (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(recursive_rmdir+0x192) [0x7f79e4329b32] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_msg+0x79f) [0x7f79e430cb1f] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(mem_get0+0x81) [0x7f79e433e4a1] ) 0-mem-pool: invalid argument [Invalid argument]
E [resource(/data/media):222:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-dpY5cI/8216bb7da58a00926f369bb7ac8c7e03.sock root@xxxxxxxxxxxxxxxxxxxxxxxxxx /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd --session-owner 6922055e-49a1-4afd-a3a0-a47960d6ba54 -N --listen --timeout 120 gluster://localhost:media" returned with 143, saying:
E [resource(/data/media):226:logerr] Popen: ssh> [2015-11-18 21:57:19.772896] I [cli.c:721:main] 0-cli: Started running /usr/sbin/gluster with version 3.7.5
E [resource(/data/media):226:logerr] Popen: ssh> [2015-11-18 21:57:19.772955] I [cli.c:608:cli_rpc_init] 0-cli: Connecting to remote glusterd at localhost
E [resource(/data/media):226:logerr] Popen: ssh> [2015-11-18 21:57:19.871930] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
E [resource(/data/media):226:logerr] Popen: ssh> [2015-11-18 21:57:19.872018] I [socket.c:2355:socket_event_handler] 0-transport: disconnecting now
E [resource(/data/media):226:logerr] Popen: ssh> [2015-11-18 21:57:19.872898] I [cli-rpc-ops.c:6348:gf_cli_getwd_cbk] 0-cli: Received resp to getwd
E [resource(/data/media):226:logerr] Popen: ssh> [2015-11-18 21:57:19.872963] I [input.c:36:cli_batch] 0-: Exiting with: 0

Status detail shows the following (one block per master node):

root@eu-gluster-1:/var/log/glusterfs/geo-replication/media# gluster volume geo-replication media root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx::media status detail

MASTER NODE:                 eu-gluster-1.websitewebsitewebs.com
MASTER VOL:                  media
MASTER BRICK:                /data/media
SLAVE USER:                  root
SLAVE:                       us-west-gluster.websitewebsitewebs.com::media
SLAVE NODE:                  us-west-gluster.websitewebsitewebs.com
STATUS:                      Active
CRAWL STATUS:                Changelog Crawl
LAST_SYNCED:                 2015-11-24 20:59:25
ENTRY:                       0
DATA:                        0
META:                        0
FAILURES:                    633
CHECKPOINT TIME:             N/A
CHECKPOINT COMPLETED:        N/A
CHECKPOINT COMPLETION TIME:  N/A

MASTER NODE:                 eu-gluster-2.websitewebsitewebs.com
MASTER VOL:                  media
MASTER BRICK:                /data/media
SLAVE USER:                  root
SLAVE:                       us-west-gluster.websitewebsitewebs.com::media
SLAVE NODE:                  us-west-gluster.websitewebsitewebs.com
STATUS:                      Passive
CRAWL STATUS:                N/A
LAST_SYNCED:                 N/A
ENTRY:                       N/A
DATA:                        N/A
META:                        N/A
FAILURES:                    N/A
CHECKPOINT TIME:             N/A
CHECKPOINT COMPLETED:        N/A
CHECKPOINT COMPLETION TIME:  N/A

What is the right way to retry failed items? Can I get a list of them somehow, so that I could touch them in the hope of fixing this? I wonder why it does not retry the items automatically.

On Tue, Nov 24, 2015 at 6:11 AM, Venky Shankar <vshankar@xxxxxxxxxx> wrote:

On Tue, Nov 24, 2015 at 1:23 AM, Audrius Butkevicius <audrius.butkevicius@xxxxxxxxx> wrote:

Hi,

I've got a geo-replicated gluster volume with a few hundred thousand images, which get generated on demand. I started getting replication failures in the status detail view, but it's not obvious to me where to find the actual errors or how to actually fix them.

Chris here [1] mentioned a bug in rsync (thanks!). Could that be the issue here? Mind checking the rsync version used?

[1]: http://www.gluster.org/pipermail/gluster-users/2015-November/024423.html

The docs seem to be secretive about this as well. It seems that if I tear the geo-replication down and do a force create from scratch, it goes back in sync again, but as the files get generated, it starts getting failures again at some point.

Can someone provide me with information on how to check which files are causing failures, and what the actual failures are? Or point me to the relevant part in the docs?

Version 3.7.5-ubuntu1~trusty1

Related SO question: http://stackoverflow.com/questions/33839056/gluster-geo-replication-debugging-failures

Thanks,
Audrius.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
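On the question in this thread of how to get a list of the failed items: the following is a minimal sketch, not an official gluster tool. It assumes the master geo-replication log uses exactly the "ENTRY FAILED", "META FAILED" and "SKIPPED GFID" line formats quoted above. The function name summarize and the field name other_gfid are made up for illustration, and reading the third tuple element as the GFID already present on the slave follows Aravinda's reply rather than any documentation. The integer codes are decoded with the standard errno and stat modules (17 is EEXIST, 2 is ENOENT on Linux).

```python
import ast
import errno
import re
import stat
import sys

# Matches master log lines such as
# "... _GMaster: ENTRY FAILED: ({...}, 17, '<gfid>')".
FAILED_RE = re.compile(r"_GMaster: (ENTRY|META) FAILED: (\(.*\))\s*$")
# Matches "... _GMaster: SKIPPED GFID = gfid1,gfid2,...".
SKIPPED_RE = re.compile(r"_GMaster: SKIPPED GFID = (.*)$")


def summarize(log_lines):
    """Return (failures, skipped_gfids) collected from master log lines."""
    failures = []
    skipped = set()
    for line in log_lines:
        match = FAILED_RE.search(line)
        if match:
            # The payload is printed as a Python literal, so literal_eval
            # can parse it without executing anything.
            payload = ast.literal_eval(match.group(2))
            record, err = payload[0], payload[1]
            # CREATE records carry 'gfid'; META records carry 'go': '.gfid/<gfid>'.
            gfid = record.get("gfid") or record.get("go", "").rsplit("/", 1)[-1]
            mode = record.get("mode", record.get("stat", {}).get("mode"))
            failures.append({
                "kind": match.group(1),
                "op": record.get("op"),
                "gfid": gfid,
                "entry": record.get("entry"),
                "mode": stat.filemode(mode) if mode is not None else None,
                "errno": errno.errorcode.get(err, str(err)),
                # Assumption: the third element, when present, is the GFID
                # already found on the slave under the same name.
                "other_gfid": payload[2] if len(payload) > 2 else None,
            })
            continue
        match = SKIPPED_RE.search(line)
        if match:
            skipped.update(g.strip() for g in match.group(1).split(",") if g.strip())
    return failures, skipped


if __name__ == "__main__":
    # Usage: pass the path of a master geo-replication log as the only argument.
    with open(sys.argv[1], errors="replace") as handle:
        found, skipped_gfids = summarize(handle)
    for item in found:
        print(item["kind"], item["op"], item["gfid"], item["errno"], item["entry"])
    print(len(skipped_gfids), "distinct skipped GFIDs")
```

Run against the raw log file rather than a copy pasted through mail, since mail wrapping can split long GFID lists. The output only tells you which entries failed and with which error; it does not retry or repair anything.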