Data consistency with Gluster 3.5


 



I have set up a replicated, four-node Gluster configuration for a web
farm. The idea is that each web node is its own server and holds its own
copy of the entire web root locally, so each node serves the volume to
itself. We're running it over dual bonded GigE NICs.

The problem I am having is that when we switch live traffic to nodes in
the cluster, they almost immediately get out of sync. The issue seems to
be with cache files that are read and written frequently. Here is a log
excerpt pointing to issues with our OpenX banner cache:

[2012-02-25 18:53:04.198326] E 
[afr-self-heal-common.c:2074:afr_self_heal_completion_cbk] 
0-web-pub-replicate-0: background  meta-data data missing-entry 
self-heal failed on 
/cust/site1/www/openx/var/cache/deliverycache_f8e7a8862cb80b4933c58acdf65aaef5.php
[2012-02-25 18:53:04.199191] W 
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0: 
/cust/site1/www/openx/var/cache/deliverycache_f8e7a8862cb80b4933c58acdf65aaef5.php: 
gfid differs on subvolume 0 (53fa373a-3830-4c5e-aa22-6ed35c947d97, 
c12e0cdd-9b6c-4988-b793-819db0472780)
[2012-02-25 18:53:04.199210] W 
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0: 
/cust/site1/www/openx/var/cache/deliverycache_f8e7a8862cb80b4933c58acdf65aaef5.php: 
gfid differs on subvolume 0 (53fa373a-3830-4c5e-aa22-6ed35c947d97, 
c12e0cdd-9b6c-4988-b793-819db0472780)
[2012-02-25 18:53:04.199219] W 
[afr-common.c:882:afr_detect_self_heal_by_iatt] 0-web-pub-replicate-0: 
/cust/site1/www/openx/var/cache/deliverycache_f8e7a8862cb80b4933c58acdf65aaef5.php: 
gfid different on subvolume
[2012-02-25 18:53:04.199236] I [afr-common.c:1038:afr_launch_self_heal] 
0-web-pub-replicate-0: background  meta-data data missing-entry 
self-heal triggered. path: 
/cust/site1/www/openx/var/cache/deliverycache_f8e7a8862cb80b4933c58acdf65aaef5.php
[2012-02-25 18:53:04.200752] W 
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0: 
/cust/site1/www/openx/var/cache/deliverycache_f8e7a8862cb80b4933c58acdf65aaef5.php: 
gfid differs on subvolume 0 (53fa373a-3830-4c5e-aa22-6ed35c947d97, 
c12e0cdd-9b6c-4988-b793-819db0472780)
[2012-02-25 18:53:04.200971] I 
[afr-self-heal-common.c:963:afr_sh_missing_entries_done] 
0-web-pub-replicate-0: split brain found, aborting selfheal of 
/cust/site1/www/openx/var/cache/deliverycache_f8e7a8862cb80b4933c58acdf65aaef5.php
[2012-02-25 18:53:04.200986] E 
[afr-self-heal-common.c:2074:afr_self_heal_completion_cbk] 
0-web-pub-replicate-0: background  meta-data data missing-entry 
self-heal failed on 
/cust/site1/www/openx/var/cache/deliverycache_f8e7a8862cb80b4933c58acdf65aaef5.php
[2012-02-25 18:53:04.202159] W 
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0: 
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php: 
gfid differs on subvolume 1 (375e1754-0420-4e26-9176-bb2128c6596b, 
3e9eca35-3351-450e-b8ab-c62785968953)
[2012-02-25 18:53:04.202178] W 
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0: 
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php: 
gfid differs on subvolume 1 (375e1754-0420-4e26-9176-bb2128c6596b, 
3e9eca35-3351-450e-b8ab-c62785968953)
[2012-02-25 18:53:04.202188] W 
[afr-common.c:882:afr_detect_self_heal_by_iatt] 0-web-pub-replicate-0: 
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php: 
gfid different on subvolume
[2012-02-25 18:53:04.202204] I [afr-common.c:1038:afr_launch_self_heal] 
0-web-pub-replicate-0: background  meta-data data missing-entry 
self-heal triggered. path: 
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php
[2012-02-25 18:53:04.203463] W 
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0: 
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php: 
gfid differs on subvolume 0 (375e1754-0420-4e26-9176-bb2128c6596b, 
3e9eca35-3351-450e-b8ab-c62785968953)
[2012-02-25 18:53:04.203678] I 
[afr-self-heal-common.c:963:afr_sh_missing_entries_done] 
0-web-pub-replicate-0: split brain found, aborting selfheal of 
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php
[2012-02-25 18:53:04.203693] E 
[afr-self-heal-common.c:2074:afr_self_heal_completion_cbk] 
0-web-pub-replicate-0: background  meta-data data missing-entry 
self-heal failed on 
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php
[2012-02-25 18:53:04.204759] W 
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0: 
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php: 
gfid differs on subvolume 0 (375e1754-0420-4e26-9176-bb2128c6596b, 
3e9eca35-3351-450e-b8ab-c62785968953)
[2012-02-25 18:53:04.204781] W 
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0: 
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php: 
gfid differs on subvolume 0 (375e1754-0420-4e26-9176-bb2128c6596b, 
3e9eca35-3351-450e-b8ab-c62785968953)
[2012-02-25 18:53:04.204800] W 
[afr-common.c:882:afr_detect_self_heal_by_iatt] 0-web-pub-replicate-0: 
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php: 
gfid different on subvolume
[2012-02-25 18:53:04.204818] I [afr-common.c:1038:afr_launch_self_heal] 
0-web-pub-replicate-0: background  meta-data data missing-entry 
self-heal triggered. path: 
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php
[2012-02-25 18:53:04.206150] W 
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0: 
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php: 
gfid differs on subvolume 0 (375e1754-0420-4e26-9176-bb2128c6596b, 
3e9eca35-3351-450e-b8ab-c62785968953)
[2012-02-25 18:53:04.206384] I 
[afr-self-heal-common.c:963:afr_sh_missing_entries_done] 
0-web-pub-replicate-0: split brain found, aborting selfheal of 
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php
[2012-02-25 18:53:04.206400] E 
[afr-self-heal-common.c:2074:afr_self_heal_completion_cbk] 
0-web-pub-replicate-0: background  meta-data data missing-entry 
self-heal failed on 
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php
[2012-02-25 18:53:04.207725] W 
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0: 
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php: 
gfid differs on subvolume 0 (375e1754-0420-4e26-9176-bb2128c6596b, 
3e9eca35-3351-450e-b8ab-c62785968953)
[2012-02-25 18:53:04.207746] W 
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0: 
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php: 
gfid differs on subvolume 0 (375e1754-0420-4e26-9176-bb2128c6596b, 
3e9eca35-3351-450e-b8ab-c62785968953)
[2012-02-25 18:53:04.207756] W 
[afr-common.c:882:afr_detect_self_heal_by_iatt] 0-web-pub-replicate-0: 
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php: 
gfid different on subvolume
[2012-02-25 18:53:04.207772] I [afr-common.c:1038:afr_launch_self_heal] 
0-web-pub-replicate-0: background  meta-data data missing-entry 
self-heal triggered. path: 
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php
[2012-02-25 18:53:04.209217] W 
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0: 
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php: 
gfid differs on subvolume 0 (375e1754-0420-4e26-9176-bb2128c6596b, 
3e9eca35-3351-450e-b8ab-c62785968953)
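
For anyone who wants to pull the affected files out of a log like this, here is a minimal Python sketch. It assumes the log has been wrapped by a mailer as above, so it first rejoins each record (a record is taken to start at a line beginning with a `[YYYY-MM-DD ` timestamp) and then collects the GFIDs reported for each path. The names `unwrap` and `gfid_mismatches` are made up for this example, and the regular expression is based only on the message format visible in the excerpt.

```python
import re
from collections import defaultdict

# A few records copied from the excerpt above, still wrapped by the mailer.
SAMPLE = """\
[2012-02-25 18:53:04.199191] W
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f8e7a8862cb80b4933c58acdf65aaef5.php:
gfid differs on subvolume 0 (53fa373a-3830-4c5e-aa22-6ed35c947d97,
c12e0cdd-9b6c-4988-b793-819db0472780)
[2012-02-25 18:53:04.202159] W
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php:
gfid differs on subvolume 1 (375e1754-0420-4e26-9176-bb2128c6596b,
3e9eca35-3351-450e-b8ab-c62785968953)
[2012-02-25 18:53:04.203463] W
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php:
gfid differs on subvolume 0 (375e1754-0420-4e26-9176-bb2128c6596b,
3e9eca35-3351-450e-b8ab-c62785968953)
"""

# Assumption: every log record begins with a "[YYYY-MM-DD " timestamp.
RECORD_START = re.compile(r"^\[\d{4}-\d{2}-\d{2} ")

# Matches "<path>: gfid differs on subvolume N (<gfid>, <gfid>)".
MISMATCH = re.compile(
    r"(\S+): gfid differs on subvolume (\d+) \(([0-9a-f-]+),\s*([0-9a-f-]+)\)"
)


def unwrap(text):
    """Rejoin mail-wrapped lines into one string per log record."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def gfid_mismatches(text):
    """Map each path with a reported gfid mismatch to the set of GFIDs seen."""
    found = defaultdict(set)
    for record in unwrap(text):
        match = MISMATCH.search(record)
        if match:
            path, _subvolume, gfid_a, gfid_b = match.groups()
            found[path].update((gfid_a, gfid_b))
    return dict(found)


if __name__ == "__main__":
    for path, gfids in sorted(gfid_mismatches(SAMPLE).items()):
        print(path)
        for gfid in sorted(gfids):
            print("    " + gfid)
```

Running it against the full client log instead of `SAMPLE` would give one entry per cache file, which is a quick way to see whether the mismatches are confined to the OpenX cache directory.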

The nodes and network are fine. I have tried mounting the volume with
both the Gluster native client and the Gluster NFS client, but I get the
same results either way. It's killing performance.

Here is the config:

volume web-pub-client-0
    type protocol/client
    option remote-host web-web1
    option remote-subvolume /glusterfs/pub
    option transport-type tcp
end-volume

volume web-pub-client-1
    type protocol/client
    option remote-host web-web2
    option remote-subvolume /glusterfs/pub
    option transport-type tcp
end-volume

volume web-pub-client-2
    type protocol/client
    option remote-host web-web3
    option remote-subvolume /glusterfs/pub
    option transport-type tcp
end-volume

volume web-pub-client-3
    type protocol/client
    option remote-host web-web4
    option remote-subvolume /glusterfs/pub
    option transport-type tcp
end-volume

volume web-pub-replicate-0
    type cluster/replicate
    subvolumes web-pub-client-0 web-pub-client-1 web-pub-client-2 web-pub-client-3
end-volume

volume web-pub-write-behind
    type performance/write-behind
    subvolumes web-pub-replicate-0
end-volume

volume web-pub-read-ahead
    type performance/read-ahead
    subvolumes web-pub-write-behind
end-volume

volume web-pub-io-cache
    type performance/io-cache
    option cache-size 256MB
    subvolumes web-pub-read-ahead
end-volume

volume web-pub-quick-read
    type performance/quick-read
    option cache-size 256MB
    subvolumes web-pub-io-cache
end-volume

volume web-pub
    type debug/io-stats
    option latency-measurement off
    option count-fop-hits off
    subvolumes web-pub-quick-read
end-volume

volume nfs-server
    type nfs/server
    option nfs.dynamic-volumes on
    option rpc-auth.addr.web-pub.allow *
    option nfs3.web-pub.volume-id ac556d2e-e8a9-4857-bd17-cab603820fcb
    subvolumes web-pub
end-volume
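
To make the layering in that config easier to see, here is a rough Python sketch that parses a volfile and lists the translator stack from the top down. It is only an illustration: `parse_volfile` and `stack_from` are names invented here, the sample is a trimmed copy of the config above with the wrapped `subvolumes` line rejoined and the four client volumes left out, and the parser only understands the `volume`, `type`, `subvolumes` and `end-volume` keywords (skipping a leading `N:` line-number prefix if one is present).

```python
import re

# Trimmed copy of the config above (client volumes omitted for brevity).
VOLFILE = """\
volume web-pub-replicate-0
    type cluster/replicate
    subvolumes web-pub-client-0 web-pub-client-1 web-pub-client-2 web-pub-client-3
end-volume

volume web-pub-write-behind
    type performance/write-behind
    subvolumes web-pub-replicate-0
end-volume

volume web-pub-read-ahead
    type performance/read-ahead
    subvolumes web-pub-write-behind
end-volume

volume web-pub-io-cache
    type performance/io-cache
    subvolumes web-pub-read-ahead
end-volume

volume web-pub-quick-read
    type performance/quick-read
    subvolumes web-pub-io-cache
end-volume

volume web-pub
    type debug/io-stats
    subvolumes web-pub-quick-read
end-volume
"""


def parse_volfile(text):
    """Return {volume name: {"type": str, "subvolumes": [names]}}."""
    volumes = {}
    current = None
    for raw in text.splitlines():
        # Drop an optional "N:" line-number prefix, then surrounding space.
        line = re.sub(r"^\s*\d+:\s*", "", raw).strip()
        if not line:
            continue
        words = line.split()
        if words[0] == "volume":
            current = {"type": None, "subvolumes": []}
            volumes[words[1]] = current
        elif words[0] == "end-volume":
            current = None
        elif current is not None and words[0] == "type":
            current["type"] = words[1]
        elif current is not None and words[0] == "subvolumes":
            current["subvolumes"] = words[1:]
    return volumes


def stack_from(volumes, top):
    """Follow the first subvolume downward from `top`, listing translator types."""
    stack = []
    name = top
    while name in volumes:
        stack.append(volumes[name]["type"])
        subvolumes = volumes[name]["subvolumes"]
        name = subvolumes[0] if subvolumes else None
    return stack


if __name__ == "__main__":
    volumes = parse_volfile(VOLFILE)
    for translator in stack_from(volumes, "web-pub"):
        print(translator)
    replicas = volumes["web-pub-replicate-0"]["subvolumes"]
    print("bricks under replicate:", len(replicas))
```

Read this way, the config is a single four-way `cluster/replicate` translator over the four bricks, with write-behind, read-ahead, io-cache and quick-read stacked above it on the client side.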


Any ideas or help would be greatly appreciated.

sean

-- 
Sean Fulton
GCN Publishing, Inc.
Internet Design, Development and Consulting For Today's Media Companies
http://www.gcnpublishing.com
(203) 665-6211, x203



