Thanks for your input, Anirban.
I ran the commands on both servers, with the following results:
root@web3:/var/www/site-images# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png
real 0m34.524s
user 0m0.004s
sys 0m0.000s
root@web4:/var/www/site-images# time getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png
getfattr: templates/assets/prod/temporary/13/user_1339200.png: Input/output error
real 0m11.315s
user 0m0.001s
sys 0m0.003s
root@web4:/var/www/site-images# ls templates/assets/prod/temporary/13/user_1339200.png
ls: cannot access templates/assets/prod/temporary/13/user_1339200.png: Input/output error
I'm not sure whether this elucidates the issue.
Also, I saw a zillion entries like these in /var/log/gluster.log:
[2015-01-26 17:35:39.973268] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] 0-site-images-client-1: remote operation failed: Transport endpoint is not connected. Path: /templates/apache/template/prod/facebook/9616964 (00000000-0000-0000-0000-000000000000)
[2015-01-26 17:35:39.973435] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] 0-site-images-client-1: remote operation failed: Transport endpoint is not connected. Path: /templates/apache/template/prod/facebook/9594915 (00000000-0000-0000-0000-000000000000)
[2015-01-26 17:35:39.973571] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] 0-site-images-client-1: remote operation failed: Transport endpoint is not connected. Path: /templates/apache/template/prod/facebook/9681971 (00000000-0000-0000-0000-000000000000)
[2015-01-26 17:35:39.973686] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] 0-site-images-client-1: remote operation failed: Transport endpoint is not connected. Path: /templates/apache/template/prod/facebook/19615 (00000000-0000-0000-0000-000000000000)
[2015-01-26 17:35:39.973802] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] 0-site-images-client-1: remote operation failed: Transport endpoint is not connected. Path: /templates/apache/template/prod/facebook/130392 (00000000-0000-0000-0000-000000000000)
I talked with some people on #gluster who suggested it could be a network issue. I'm still looking into it, but since the problem also happens locally (within the same server), would that still be a valid explanation?
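To see whether these warnings all point at the same connection, the log can be tallied per translator name and error reason. The following is only a rough sketch in Python: the regular expression is modelled on the five lines quoted above, and the names `WARNING_RE`, `tally_failures` and `SAMPLE` are made up for illustration. Other log formats or GlusterFS versions may need a different pattern.

```python
import re
from collections import Counter

# Pattern modelled on the warning lines quoted above (an assumption, not an
# official log grammar): "] W [source:line:function] <translator>: remote
# operation failed: <reason>. Path: ..."
WARNING_RE = re.compile(
    r"\] W \[[^\]]+\] (?P<xlator>\S+): "
    r"remote operation failed: (?P<reason>.+?)\. Path:"
)

def tally_failures(lines):
    """Count 'remote operation failed' warnings per (translator, reason)."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            counts[(match.group("xlator"), match.group("reason"))] += 1
    return counts

# Two of the lines from the log excerpt above, used as sample input.
SAMPLE = [
    "[2015-01-26 17:35:39.973268] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] "
    "0-site-images-client-1: remote operation failed: Transport endpoint is not "
    "connected. Path: /templates/apache/template/prod/facebook/9616964 "
    "(00000000-0000-0000-0000-000000000000)",
    "[2015-01-26 17:35:39.973435] W [client-rpc-fops.c:2779:client3_3_lookup_cbk] "
    "0-site-images-client-1: remote operation failed: Transport endpoint is not "
    "connected. Path: /templates/apache/template/prod/facebook/9594915 "
    "(00000000-0000-0000-0000-000000000000)",
]

for (xlator, reason), count in tally_failures(SAMPLE).items():
    print(f"{count:6d}  {xlator}  {reason}")
```

For the real log, the same function can be fed the open file, e.g. `tally_failures(open("/var/log/gluster.log"))`. If every warning names the same translator (here `0-site-images-client-1`), that would suggest the problem is with the connection to one particular brick rather than with the network in general, though that is an interpretation and not something the log states outright.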
Also, less often, I see entries like these:
[2015-01-26 17:41:25.956418] E [afr-self-heal-common.c:1615:afr_sh_common_lookup_cbk] 0-site-images-replicate-0: Conflicting entries for /webhost/sites/clipart/assets/apache/images/graphics/215126/image1.png
[2015-01-26 17:41:26.588753] E [afr-self-heal-common.c:1615:afr_sh_common_lookup_cbk] 0-site-images-replicate-0: Conflicting entries for /webhost/sites/clipart/assets/apache/images/graphics/215126/image1.png
Are those a definitive indication of split-brain, or just something normal until self-heal takes care of recently updated files?
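To get a list of which paths are affected and how often, the "Conflicting entries" errors can be collected from the log. This is a small Python sketch under the assumption that all such lines look like the two quoted above. `CONFLICT_RE`, `conflicting_paths` and `SAMPLE_ERRORS` are names made up for this example. It does not decide whether a file is in split-brain, it only lists the candidates to inspect on the bricks.

```python
import re
from collections import Counter

# Pattern modelled on the two error lines quoted above (an assumption).
CONFLICT_RE = re.compile(
    r"\] E \[afr-self-heal-common\.c:[^\]]+\] (?P<xlator>\S+): "
    r"Conflicting entries for (?P<path>.+)$"
)

def conflicting_paths(lines):
    """Count how many times each path is reported with conflicting entries."""
    counts = Counter()
    for line in lines:
        match = CONFLICT_RE.search(line.rstrip("\n"))
        if match:
            counts[match.group("path")] += 1
    return counts

# The two lines from the log excerpt above, used as sample input.
SAMPLE_ERRORS = [
    "[2015-01-26 17:41:25.956418] E [afr-self-heal-common.c:1615:"
    "afr_sh_common_lookup_cbk] 0-site-images-replicate-0: Conflicting entries "
    "for /webhost/sites/clipart/assets/apache/images/graphics/215126/image1.png",
    "[2015-01-26 17:41:26.588753] E [afr-self-heal-common.c:1615:"
    "afr_sh_common_lookup_cbk] 0-site-images-replicate-0: Conflicting entries "
    "for /webhost/sites/clipart/assets/apache/images/graphics/215126/image1.png",
]

for path, count in conflicting_paths(SAMPLE_ERRORS).most_common():
    print(f"{count:6d}  {path}")
```

A path that keeps reappearing over a long period, rather than showing up briefly after a write, would be the more suspicious case.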
On Mon, Jan 26, 2015 at 2:25 PM, A Ghoshal <a.ghoshal@xxxxxxx> wrote:
I am plagued with something of this sort, too!
What I mostly see when I explore these things is that
A) it's a split-brain.
B) the split-brain is because the gfid's on the two replicas are at odds.
You could check that out by
1. On each server, first 'cd' to where your brick is mounted.
2. getfattr -m . -d -e hex templates/assets/prod/temporary/13/user_1339200.png
You will see a trusted.gfid kind of extended attribute. If it's not the same on both servers, there's a problem.
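The comparison in the two steps above can also be scripted once the getfattr output from each server has been saved. Below is a minimal Python sketch, assuming the usual `name=value` layout that `getfattr -d -e hex` prints after its `# file:` header line. The function names and the two sample outputs (including their gfid values) are hypothetical and only there to show the idea.

```python
def parse_getfattr(output):
    """Parse `getfattr -d -e hex` text into a {attribute_name: hex_value} dict."""
    attrs = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "# file: ..." header.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            attrs[name] = value
    return attrs

def gfids_match(output_a, output_b):
    """Return True if both outputs carry the same trusted.gfid value."""
    gfid_a = parse_getfattr(output_a).get("trusted.gfid")
    gfid_b = parse_getfattr(output_b).get("trusted.gfid")
    if gfid_a is None or gfid_b is None:
        raise ValueError("trusted.gfid missing from one of the outputs")
    return gfid_a.lower() == gfid_b.lower()

# Hypothetical outputs from the two bricks; the gfid values are invented.
BRICK_A = """# file: templates/assets/prod/temporary/13/user_1339200.png
trusted.gfid=0x0123456789abcdef0123456789abcdef
"""
BRICK_B = """# file: templates/assets/prod/temporary/13/user_1339200.png
trusted.gfid=0xfedcba9876543210fedcba9876543210
"""

if gfids_match(BRICK_A, BRICK_B):
    print("gfid is the same on both bricks")
else:
    print("gfid differs between the bricks")
```

Note that this only makes sense for output collected on the brick directories themselves, as described in step 1, not on the client mount.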
Thanks,
Anirban
Regards,
-- Tiago Santos
MustHaveMenus.com
_______________________________________________ Gluster-users mailing list Gluster-users@xxxxxxxxxxx http://www.gluster.org/mailman/listinfo/gluster-users