what action is required for this log entry?

"Khoi Mai" <KHOIMAI@xxxxxx> · Tue, 10 Dec 2013 22:49:40 -0600

Gluster community,

[2013-12-11 04:40:06.609091] W [server-resolve.c:419:resolve_anonfd_simple]
0-server: inode for the gfid (76240621-1362-494d-a70a-f5824c3ce56e) is
not found. anonymous fd creation failed
[2013-12-11 04:40:06.610588] W [server-resolve.c:419:resolve_anonfd_simple]
0-server: inode for the gfid (03ada1a2-ee51-4c85-a79f-a72aabde116d) is
not found. anonymous fd creation failed
[2013-12-11 04:40:06.616978] W [server-resolve.c:419:resolve_anonfd_simple]
0-server: inode for the gfid (64fbc834-e00b-4afd-800e-97d64a32de92) is
not found. anonymous fd creation failed
[2013-12-11 04:40:06.617069] W [server-resolve.c:419:resolve_anonfd_simple]
0-server: inode for the gfid (64fbc834-e00b-4afd-800e-97d64a32de92) is
not found. anonymous fd creation failed
[2013-12-11 04:40:06.624845] W [server-resolve.c:419:resolve_anonfd_simple]
0-server: inode for the gfid (27837527-5dea-4367-a050-248a6266b2db) is
not found. anonymous fd creation failed
followed by 
[2013-12-11 04:40:10.462202] W [marker-quota.c:2039:mq_inspect_directory_xattr]
0-devstatic-marker: cannot add a new contribution node
[2013-12-11 04:40:29.331476] W [marker-quota.c:2039:mq_inspect_directory_xattr]
0-devstatic-marker: cannot add a new contribution node
[2013-12-11 04:40:53.125088] W [marker-quota.c:2039:mq_inspect_directory_xattr]
0-devstatic-marker: cannot add a new contribution node
[2013-12-11 04:41:00.975222] W [marker-quota.c:2039:mq_inspect_directory_xattr]
0-devstatic-marker: cannot add a new contribution node
[2013-12-11 04:41:01.517990] W [marker-quota.c:2039:mq_inspect_directory_xattr]
0-devstatic-marker: cannot add a new contribution node
Tue Dec 10 22:41:01 CST 2013
[2013-12-11 04:41:05.874819] W [marker-quota.c:2039:mq_inspect_directory_xattr]
0-devstatic-marker: cannot add a new contribution node
[2013-12-11 04:41:05.878135] W [marker-quota.c:2039:mq_inspect_directory_xattr]
0-devstatic-marker: cannot add a new contribution node
Tue Dec 10 22:42:01 CST 2013
[2013-12-11 04:42:05.136054] W [marker-quota.c:2039:mq_inspect_directory_xattr]
0-devstatic-marker: cannot add a new contribution node
[2013-12-11 04:42:05.330591] W [marker-quota.c:2039:mq_inspect_directory_xattr]
0-devstatic-marker: cannot add a new contribution node
[2013-12-11 04:42:41.224927] W [marker-quota.c:2039:mq_inspect_directory_xattr]
0-devstatic-marker: cannot add a new contribution node

Please help me understand what is being
logged from the /var/log/glusterfs/bricks/static-content.log file

Here is my config for this particular
brick in a 4 node distr/rep design.

cat /var/lib/glusterd/vols/devstatic/devstatic.host2.static-content.vol

volume devstatic-posix
    type storage/posix
    option volume-id 75832afb-f20e-4018-8d74-8550a92233fc
    option directory /static/content
end-volume

volume devstatic-access-control
    type features/access-control
    subvolumes devstatic-posix
end-volume

volume devstatic-locks
    type features/locks
    subvolumes devstatic-access-control
end-volume

volume devstatic-io-threads
    type performance/io-threads
    subvolumes devstatic-locks
end-volume

volume devstatic-index
    type features/index
    option index-base /static/content/.glusterfs/indices
    subvolumes devstatic-io-threads
end-volume
volume devstatic-marker
    type features/marker
    option quota on
    option xtime off
    option timestamp-file
/var/lib/glusterd/vols/devstatic/marker.tstamp
    option volume-uuid 75832afb-f20e-4018-8d74-8550a92233fc
    subvolumes devstatic-index
end-volume

volume /static/content
    type debug/io-stats
    option count-fop-hits
off
    option latency-measurement
off
    subvolumes devstatic-marker
end-volume

volume devstatic-server
    type protocol/server
    option auth.addr./static/content.allow
*
    option auth.login.6173ce00-d694-4793-a755-cd1d80f5001f.password
13702989-510c-44c1-9bc4-8f1f21b65403
    option auth.login./static/content.allow
6173ce00-d694-4793-a755-cd1d80f5001f
    option transport-type
tcp
    subvolumes /static/content
end-volume

Khoi Mai

From:      
 gluster-users-request@xxxxxxxxxxx
To:      
 gluster-users@xxxxxxxxxxx
Date:      
 12/10/2013 05:58 AM
Subject:    
   Gluster-users
Digest, Vol 68, Issue 11
Sent by:    
   gluster-users-bounces@xxxxxxxxxxx

Send Gluster-users mailing list submissions to

gluster-users@xxxxxxxxxxx

To subscribe or unsubscribe via the World Wide Web, visit

http://supercolony.gluster.org/mailman/listinfo/gluster-users
or, via email, send a message with subject or body 'help' to

gluster-users-request@xxxxxxxxxxx

You can reach the person managing the list at

gluster-users-owner@xxxxxxxxxxx

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Gluster-users digest..."

Today's Topics:

   1. Re: Testing failover and recovery (Per Hallsmark)
   2. Gluster - replica - Unable to self-heal contents of    
            '/'
      (possible split-brain) (Alexandru Coseru)
   3. Gluster infrastructure question (Heiko Kr?mer)
   4. Re: How reliable is XFS under Gluster? (Kal Black)
   5. Re: Gluster infrastructure question (Nux!)
   6. Scalability - File system or Object Store (Randy Breunling)
   7. Re: Scalability - File system or Object Store (Jay Vyas)
   8. Re: Gluster infrastructure question (Joe Julian)
   9. Re: [Gluster-devel] GlusterFest Test Weekend - 3.5    
            Test #1
      (John Mark Walker)
  10. Re: Gluster infrastructure question (Nux!)
  11. compatibility between 3.3 and 3.4 (samuel)
  12. Re: Gluster infrastructure question (bernhard glomm)
  13. Re: Gluster infrastructure question (Ben Turner)
  14. Re: Gluster infrastructure question (Ben Turner)
  15. Re: Scalability - File system or Object Store (Jeff Darcy)
  16. Re: Gluster infrastructure question (Dan Mons)
  17. Re: Gluster infrastructure question (Joe Julian)
  18. Re: Gluster infrastructure question (Dan Mons)
  19. Re: [CentOS 6] Upgrade to the glusterfs version in base or in
      glusterfs-epel (Diep Pham Van)
  20. Where does the 'date' string in        
        '/var/log/glusterfs/gl.log'
      come from? (harry mangalam)
  21. Re: Where does the 'date' string in
      '/var/log/glusterfs/gl.log' come from? (Sharuzzaman
Ahmat Raslan)
  22. FW: Self Heal Issue GlusterFS 3.3.1 (Bobby Jacob)
  23. Re: Self Heal Issue GlusterFS 3.3.1 (Joe Julian)
  24. Pausing rebalance (Franco Broi)
  25. Re: Where does the 'date' string in
      '/var/log/glusterfs/gl.log' come from? (Vijay Bellur)
  26. Re: Pausing rebalance (shishir gowda)
  27. Re: replace-brick failing - transport.address-family not
      specified (Vijay Bellur)
  28. Re: [CentOS 6] Upgrade to the glusterfs version in base or in
      glusterfs-epel (Vijay Bellur)
  29. Re: Pausing rebalance (Franco Broi)
  30. Re: replace-brick failing - transport.address-family not
      specified (Vijay Bellur)
  31. Re: Pausing rebalance (Kaushal M)
  32. Re: Pausing rebalance (Franco Broi)
  33. Re: Self Heal Issue GlusterFS 3.3.1 (Bobby Jacob)
  34. Structure needs cleaning on some files (Johan Huysmans)
  35. Re: replace-brick failing -        
        transport.address-family not
      specified (Bernhard Glomm)
  36. Re: Structure needs cleaning on some files (Johan Huysmans)
  37. Re: Gluster infrastructure question (Heiko Kr?mer)
  38. Re: Errors from PHP stat() on files and directories in a
      glusterfs mount (Johan Huysmans)
  39. Re: Gluster infrastructure question (Andrew Lau)
  40. Re: replace-brick failing - transport.address-family not
      specified (Vijay Bellur)
  41. Re: Gluster - replica - Unable to self-heal contents of '/'
      (possible split-brain) (Vijay Bellur)
  42. Error after crash of Virtual Machine during      
          migration
      (Mariusz Sobisiak)
  43. Re: Structure needs cleaning on some files (Johan Huysmans)

----------------------------------------------------------------------

Message: 1
Date: Mon, 9 Dec 2013 14:12:22 +0100
From: Per Hallsmark <per@xxxxxxxxxxxx>
To: gluster-users@xxxxxxxxxxx
Subject: Re:  Testing failover and recovery
Message-ID:

<CAPaVuL-DL8R3GBNzv9fMJq-rTOYCs=NufTf-B5V7xKpoNML+7Q@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"

Hello,

Interesting, we seems to be several users with issues regarding recovery
but there is no to little replies... ;-)

I did some more testing over the weekend. Same initial workload (two
glusterfs servers, one client that continuesly
updates a file with timestamps) and then two easy testcases:

1. one of the glusterfs servers is constantly rebooting (just a initscript
that sleeps for 60 seconds before issuing "reboot")

2. similar to 1 but instead of rebooting itself, it is rebooting the other
glusterfs server so that the result is that they a server
    comes up, wait for a bit and then rebooting the other server.

During the whole weekend this has progressed nicely. The client is running
all the time without issues and the glusterfs
that comes back (either only one or one of the servers, depending on the
testcase shown above) is actively getting into
sync and updates it's copy of the file.

So it seems to me that we need to look deeper in the recovery case (of
course, but it is interesting to know about the
nice&easy usescases as well). I'm surprised that the recovery from
a
failover (to restore the rendundancy) isn't getting
higher attention here. Are we (and others that has difficulties in this
area) running a unusual usecase?

BR,
Per

On Wed, Dec 4, 2013 at 12:17 PM, Per Hallsmark <per@xxxxxxxxxxxx>
wrote:

> Hello,
>
> I've found GlusterFS to be an interesting project. Not so much experience
> of it
> (although from similar usecases with DRBD+NFS setups) so I setup some
> testcase to try out failover and recovery.
>
> For this I have a setup with two glusterfs servers (each is a VM)
and one
> client (also a VM).
> I'm using GlusterFS 3.4 btw.
>
> The servers manages a gluster volume created as:
>
> gluster volume create testvol rep 2 transport tcp gs1:/export/vda1/brick
> gs2:/export/vda1/brick
> gluster volume start testvol
> gluster volume set testvol network.ping-timeout 5
>
> Then the client mounts this volume as:
> mount -t glusterfs gs1:/testvol /import/testvol
>
> Everything seems to work good in normal usecases, I can write/read
to the
> volume, take servers down and up again etc.
>
> As a fault scenario, I'm testing a fault injection like this:
>
> 1. continuesly writing timestamps to a file on the volume from the
client.
> It is automated in a smaller testscript like:
> :~/glusterfs-test$ cat scripts/test-gfs-client.sh
> #!/bin/sh
>
> gfs=/import/testvol
>
> while true; do
> date +%s >> $gfs/timestamp.txt
> ts=`tail -1 $gfs/timestamp.txt`
>  md5sum=`md5sum $gfs/timestamp.txt | cut -f1 -d" "`
> echo "Timestamp = $ts, md5sum = $md5sum"
>  sleep 1
> done
> :~/glusterfs-test$
>
> As can be seen, the client is a quite simple user of the glusterfs
volume.
> Low datarate and single user for example.
>
>
> 2. disabling ethernet in one of the VM (ifconfig eth0 down) to simulate
> like a broken network
>
> 3. After a short while, the failed server is brought alive again (ifconfig
> eth0 up)
>
> Step 2 and 3 is also automated in a testscript like:
>
> :~/glusterfs-test$ cat scripts/fault-injection.sh
> #!/bin/sh
>
> # fault injection script tailored for two glusterfs nodes named gs1
and gs2
>
> if [ "$HOSTNAME" == "gs1" ]; then
> peer="gs2"
> else
> peer="gs1"
> fi
>
> inject_eth_fault() {
> echo "network down..."
> ifconfig eth0 down
>  sleep 10
> ifconfig eth0 up
> echo "... and network up again."
> }
>
> recover() {
> echo "recovering from fault..."
> service glusterd restart
> }
>
> while true; do
> sleep 60
> if [ ! -f /tmp/nofault ]; then
> if ping -c 1 $peer; then
>  inject_eth_fault
> recover
> fi
> fi
> done
> :~/glusterfs-test$
>
>
> I then see that:
>
> A. This goes well first time, one server leaves the cluster and the
client
> hang for like 8 seconds before beeing able to write to the volume
again.
>
> B. When the failed server comes back, I can check that from both servers
> they see each other and "gluster peer status" shows they
believe the other
> is in connected state.
>
> C. When the failed server comes back, it is not automatically seeking
> active participation on syncing volume etc (the local storage timestamp
> file isn't updated).
>
> D. If I do restart of glusterd service (service glusterd restart)
the
> failed node seems to get back like it was before. Not always though...
The
> chance is higher if I have long time between fault injections (long
= 60
> sec or so, with a forced faulty state of 10 sec)
> With a period time of some minutes, I could have the cluster servicing
the
> client OK for up to 8+ hours at least.
> Shortening the period, I'm easily down to like 10-15 minutes.
>
> E. Sooner or later I enter a state where the two servers seems to
be up,
> seeing it's peer (gluster peer status) and such but none is serving
the
> volume to the client.
> I've tried to "heal" the volume in different way but it
doesn't help.
> Sometimes it is just that one of the timestamp copies in each of
> the servers is ahead which is simpler but sometimes both the timestamp
> files have added data at end that the other doesnt have.
>
> To the questions:
>
> * Is it so that from a design point of perspective, the choice in
the
> glusterfs team is that one shouldn't rely soley on glusterfs daemons
beeing
> able to  recover from a faulty state? There is need for cluster
manager
> services (like heartbeat for example) to be part? That would make
> experience C understandable and one could then take heartbeat or similar
> packages to start/stop services.
>
> * What would then be the recommended procedure to recover from a faulty
> glusterfs node? (so that experience D and E is not happening)
>
> * What is the expected failover timing (of course depending on config,
but
> say with a give ping timeout etc)?
>   and expected recovery timing (with similar dependency on config)?
>
> * What/how is glusterfs team testing to make sure that the failover,
> recovery/healing functionality etc works?
>
> Any opinion if the testcase is bad is of course also very welcome.
>
> Best regards,
> Per
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/69c23114/attachment-0001.html>

------------------------------

Message: 2
Date: Mon, 9 Dec 2013 15:51:31 +0200
From: "Alexandru Coseru" <alex.coseru@xxxxxxxxxx>
To: <gluster-users@xxxxxxxxxxx>
Subject:  Gluster - replica - Unable to self-heal

contents of                
'/' (possible split-brain)
Message-ID: <01fe01cef4e5$c3f2cb00$4bd86100$@coseru@xxxxxxxxxx>
Content-Type: text/plain; charset="us-ascii"

Hello,

I'm trying to build a replica volume, on two servers.

The servers are:  blade6 and blade7.  (another blade1 in the
peer, but with
no volumes)

The volume seems ok, but I cannot mount it from NFS.

Here are some logs:

[root@blade6 stor1]# df -h

/dev/mapper/gluster_stor1  882G  200M  837G   1% /gluster/stor1

[root@blade7 stor1]# df -h

/dev/mapper/gluster_fast   846G  158G  646G  20% /gluster/stor_fast

/dev/mapper/gluster_stor1  882G   72M  837G   1% /gluster/stor1

[root@blade6 stor1]# pwd

/gluster/stor1

[root@blade6 stor1]# ls -lh

total 0

[root@blade7 stor1]# pwd

/gluster/stor1

[root@blade7 stor1]# ls -lh

total 0

[root@blade6 stor1]# gluster volume info

Volume Name: stor_fast

Type: Distribute

Volume ID: ad82b554-8ff0-4903-be32-f8dcb9420f31

Status: Started

Number of Bricks: 1

Transport-type: tcp

Bricks:

Brick1: blade7.xen:/gluster/stor_fast

Options Reconfigured:

nfs.port: 2049

Volume Name: stor1

Type: Replicate

Volume ID: 6bd88164-86c2-40f6-9846-b21e90303e73

Status: Started

Number of Bricks: 1 x 2 = 2

Transport-type: tcp

Bricks:

Brick1: blade7.xen:/gluster/stor1

Brick2: blade6.xen:/gluster/stor1

Options Reconfigured:

nfs.port: 2049

[root@blade7 stor1]# gluster volume info

Volume Name: stor_fast

Type: Distribute

Volume ID: ad82b554-8ff0-4903-be32-f8dcb9420f31

Status: Started

Number of Bricks: 1

Transport-type: tcp

Bricks:

Brick1: blade7.xen:/gluster/stor_fast

Options Reconfigured:

nfs.port: 2049

Volume Name: stor1

Type: Replicate

Volume ID: 6bd88164-86c2-40f6-9846-b21e90303e73

Status: Started

Number of Bricks: 1 x 2 = 2

Transport-type: tcp

Bricks:

Brick1: blade7.xen:/gluster/stor1

Brick2: blade6.xen:/gluster/stor1

Options Reconfigured:

nfs.port: 2049

[root@blade6 stor1]# gluster volume status

Status of volume: stor_fast

Gluster process                

  Port    Online  Pid

----------------------------------------------------------------------------
--

Brick blade7.xen:/gluster/stor_fast          49152
  Y       1742

NFS Server on localhost              
                  2049  
 Y
20074

NFS Server on blade1.xen              
      2049    Y       22255

NFS Server on blade7.xen              
      2049    Y       7574

There are no active volume tasks

Status of volume: stor1

Gluster process                

  Port    Online  Pid

----------------------------------------------------------------------------
--

Brick blade7.xen:/gluster/stor1            
 49154   Y       7562

Brick blade6.xen:/gluster/stor1            
 49154   Y       20053

NFS Server on localhost              
                  2049  
 Y
20074

Self-heal Daemon on localhost            
              N/A     Y
20079

NFS Server on blade1.xen              
      2049    Y       22255

Self-heal Daemon on blade1.xen            
  N/A     Y       22260

NFS Server on blade7.xen              
      2049    Y       7574

Self-heal Daemon on blade7.xen            
  N/A     Y       7578

There are no active volume tasks

[root@blade7 stor1]# gluster volume status

Status of volume: stor_fast

Gluster process                

  Port    Online  Pid

----------------------------------------------------------------------------
--

Brick blade7.xen:/gluster/stor_fast          
 49152   Y       1742

NFS Server on localhost              
               2049    Y
      7574

NFS Server on blade6.xen              
                2049    Y
      20074

NFS Server on blade1.xen              
        2049    Y       22255

There are no active volume tasks

Status of volume: stor1

Gluster process                

  Port    Online  Pid

----------------------------------------------------------------------------
--

Brick blade7.xen:/gluster/stor1            
 49154   Y       7562

Brick blade6.xen:/gluster/stor1            
 49154   Y       20053

NFS Server on localhost              
                  2049  
 Y       7574

Self-heal Daemon on localhost            
              N/A     Y  
    7578

NFS Server on blade1.xen              
      2049    Y       22255

Self-heal Daemon on blade1.xen            
  N/A     Y       22260

NFS Server on blade6.xen              
               2049    Y
      20074

Self-heal Daemon on blade6.xen            
           N/A     Y    
  20079

There are no active volume tasks

[root@blade6 stor1]# gluster peer status

Number of Peers: 2

Hostname: blade1.xen

Port: 24007

Uuid: 194a57a7-cb0e-43de-a042-0ac4026fd07b

State: Peer in Cluster (Connected)

Hostname: blade7.xen

Port: 24007

Uuid: 574eb256-30d2-4639-803e-73d905835139

State: Peer in Cluster (Connected)

[root@blade7 stor1]# gluster peer status

Number of Peers: 2

Hostname: blade6.xen

Port: 24007

Uuid: a65cadad-ef79-4821-be41-5649fb204f3e

State: Peer in Cluster (Connected)

Hostname: blade1.xen

Uuid: 194a57a7-cb0e-43de-a042-0ac4026fd07b

State: Peer in Cluster (Connected)

[root@blade6 stor1]# gluster volume heal stor1 info

Gathering Heal info on volume stor1 has been successful

Brick blade7.xen:/gluster/stor1

Number of entries: 0

Brick blade6.xen:/gluster/stor1

Number of entries: 0

[root@blade7 stor1]# gluster volume heal stor1 info

Gathering Heal info on volume stor1 has been successful

Brick blade7.xen:/gluster/stor1

Number of entries: 0

Brick blade6.xen:/gluster/stor1

Number of entries: 0

When I'm trying to mount the volume with NFS, I have the following errors:

[2013-12-09 13:20:52.066978] E
[afr-self-heal-common.c:197:afr_sh_print_split_brain_log]
0-stor1-replicate-0: Unable to self-heal contents of '/' (possible
split-brain). Please delete the file from all but the preferred subvolume.-
Pending matrix:  [ [ 0 2 ] [ 2 0 ] ]

[2013-12-09 13:20:52.067386] E
[afr-self-heal-common.c:2212:afr_self_heal_completion_cbk]
0-stor1-replicate-0: background  meta-data self-heal failed on /

[2013-12-09 13:20:52.067452] E [mount3.c:290:mnt3svc_lookup_mount_cbk]
0-nfs: error=Input/output error

[2013-12-09 13:20:53.092039] E
[afr-self-heal-common.c:197:afr_sh_print_split_brain_log]
0-stor1-replicate-0: Unable to self-heal contents of '/' (possible
split-brain). Please delete the file from all but the preferred subvolume.-
Pending matrix:  [ [ 0 2 ] [ 2 0 ] ]

[2013-12-09 13:20:53.092497] E
[afr-self-heal-common.c:2212:afr_self_heal_completion_cbk]
0-stor1-replicate-0: background  meta-data self-heal failed on /

[2013-12-09 13:20:53.092559] E [mount3.c:290:mnt3svc_lookup_mount_cbk]
0-nfs: error=Input/output error

What I'm doing wrong ?

PS:  Volume stor_fast works like a charm.

Best Regards,

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/b0b21677/attachment-0001.html>

------------------------------

Message: 3
Date: Mon, 09 Dec 2013 14:18:28 +0100
From: Heiko Kr?mer <hkraemer@xxxxxxxxxxx>
To: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
Subject:  Gluster infrastructure question
Message-ID: <52A5C324.4090408@xxxxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Heyho guys,

I'm running since years glusterfs in a small environment without big
problems.

Now I'm going to use glusterFS for a bigger cluster but I've some
questions :)

Environment:
* 4 Servers
* 20 x 2TB HDD, each
* Raidcontroller
* Raid 10
* 4x bricks => Replicated, Distributed volume
* Gluster 3.4

1)
I'm asking me, if I can delete the raid10 on each server and create
for each HDD a separate brick.
In this case have a volume 80 Bricks so 4 Server x 20 HDD's. Is there
any experience about the write throughput in a production system with
many of bricks like in this case? In addition i'll get double of HDD
capacity.

2)
I've heard a talk about glusterFS and out scaling. The main point was
if more bricks are in use, the scale out process will take a long
time. The problem was/is the Hash-Algo. So I'm asking me how is it if
I've one very big brick (Raid10 20TB on each server) or I've much more
bricks, what's faster and is there any issues?
Is there any experiences ?

3)
Failover of a HDD is for a raid controller with HotSpare HDD not a big
deal. Glusterfs will rebuild automatically if a brick fails and there
are no data present, this action will perform a lot of network traffic
between the mirror bricks but it will handle it equal as the raid
controller right ?

Thanks and cheers
Heiko

- -- 
Anynines.com

Avarteq GmbH
B.Sc. Informatik
Heiko Kr?mer
CIO
Twitter: @anynines

- ----
Gesch?ftsf?hrer: Alexander Fai?t, Dipl.-Inf.(FH) Julian Fischer
Handelsregister: AG Saarbr?cken HRB 17413, Ust-IdNr.: DE262633168
Sitz: Saarbr?cken
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJSpcMfAAoJELxFogM4ixOF/ncH/3L9DvOWHrF0XBqCgeT6QQ6B
lDwtXiD9xoznht0Zs2S9LA9Z7r2l5/fzMOUSOawEMv6M16Guwq3gQ1lClUi4Iwj0
GKKtYQ6F4aG4KXHY4dlu1QKT5OaLk8ljCQ47Tc9aAiJMhfC1/IgQXOslFv26utdJ
N9jxiCl2+r/tQvQRw6mA4KAuPYPwOV+hMtkwfrM4UsIYGGbkNPnz1oqmBsfGdSOs
TJh6+lQRD9KYw72q3I9G6ZYlI7ylL9Q7vjTroVKH232pLo4G58NLxyvWvcOB9yK6
Bpf/gRMxFNKA75eW5EJYeZ6EovwcyCAv7iAm+xNKhzsoZqbBbTOJxS5zKm4YWoY=
=bDly
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hkraemer.vcf
Type: text/x-vcard
Size: 277 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/d70112ef/attachment-0001.vcf>

------------------------------

Message: 4
Date: Mon, 9 Dec 2013 09:51:41 -0500
From: Kal Black <kaloblak@xxxxxxxxx>
To: Paul Robert Marino <prmarino1@xxxxxxxxx>
Cc: "gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>
Subject: Re:  How reliable is XFS under Gluster?
Message-ID:

<CADZk1LMcRjn=qG-mWbc5S8SeJtkFB2AZica2NKuU3Z7mwQ=2kQ@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"

Thank you all for the wonderful input,
I haven't used extensively XFS so far and my concerns primarily came from
reading an article (mostly the discussion after it) by Jonathan Corbetrom
on LWN (http://lwn.net/Articles/476263/)
and another one
http://toruonu.blogspot.ca/2012/12/xfs-vs-ext4.html.
They are both
relatively recent and I was under the impression the XFS still has
problems, in certain cases of power loss, where the metadata and the actual
data are not being in sync, which might lead existing data being corrupted.
But again, like Paul Robert Marino pointed out, choosing a right IO
scheduler might greatly reduce the risk of this to happen.

On Sun, Dec 8, 2013 at 11:04 AM, Paul Robert Marino <prmarino1@xxxxxxxxx>wrote:

> XFS is fine Ive been using it on various distros in production for
> over a decade now and I've rarely had any problems with it and when
I
> have they have been trivial to fix which is something I honestly cant
> say about ext3 or ext4.
>
> Usually when there is a power failure during a write if the
> transaction wasn't completely committed to the disk it is rolled back
> via the journal.the one exception to this is when you have a battery
> backed cache where the battery discharges before power is restored,
or
> a very cheap consumer grade disk which uses its cache for writes and
> lies about the sync state.
> in either of these scenarios any file system will have problems.
>
> Out of any of the filesystems Ive worked with in general XFS handles
> the battery discharge senario the cleanest and is the easiest to
> recover.
> if you have the second scenario with the cheap disks with a cache
that
> lies nothing will help you not even a fsync because the hardware lies.
> Also the subject of fsync is a little more complicated than most
> people think there are several kinds of fsync and each behaves
> differently on different filesystems. PostgreSQL has documentation
> about it here
> http://www.postgresql.org/docs/9.1/static/runtime-config-wal.html
> looks at wal_sync_method if you would like to have a better about
how
> fsync works without getting too deep into the subject.
>
> By the way most apps don't need to do fsyncs and it would bring your
> system to a crawl if they all did so take people saying
> all programs should fsync with a grain of salt.
>
> In most cases when these problems come up its really that they didn't
> set the right IO scheduler for what the server does. For example CFQ
> which is the EL default can leave your write in ram cache for quite
a
> while before sending it to disk in an attempt to optimize your IO;
> however the deadline scheduler will attempt to optimize your IO but
> will predictably sync it to disk after a period of time regardless
of
> whether it was able to fully optimize it or not. Also there is noop
> which does no optimization at all and leaves every thing to the
> hardware, this is common and recommended for VM's and there is some
> argument to use it with high end raid controllers for things like
> financial data where you need to absolutely ensure the write happen
> ASAP because there may be fines or other large penalties if you loose
> any data.
>
>
>
> On Sat, Dec 7, 2013 at 3:04 AM, Franco Broi <Franco.Broi@xxxxxxxxxx>
> wrote:
> > Been using ZFS for about 9 months and am about to add as other
400TB, no
> > issues so far.
> >
> > On 7 Dec 2013 04:23, Brian Foster <bfoster@xxxxxxxxxx>
wrote:
> > On 12/06/2013 01:57 PM, Kal Black wrote:
> >> Hello,
> >> I am in the point of picking up a FS for new brick nodes.
I was used to
> >> like and use ext4 until now but I recently red for an issue
introduced
> by
> >> a
> >> patch in ext4 that breaks the distributed translator. In
the same time,
> it
> >> looks like the recommended FS for a brick is no longer ext4
but XFS
> which
> >> apparently will also be the default FS in the upcoming RedHat7.
On the
> >> other hand, XFS is being known as a file system that can
be easily
> >> corrupted (zeroing files) in case of a power failure. Supporters
of the
> >> file system claim that this should never happen if an application
has
> been
> >> properly coded (properly committing/fsync-ing data to storage)
and the
> >> storage itself has been properly configured (disk cash disabled
on
> >> individual disks and battery backed cache used on the controllers).
My
> >> question is, should I be worried about losing data in a power
failure or
> >> similar scenarios (or any) using GlusterFS and XFS? Are there
best
> >> practices for setting up a Gluster brick + XFS? Has the ext4
issue been
> >> reliably fixed? (my understanding is that this will be impossible
unless
> >> ext4 isn't being modified to allow popper work with Gluster)
> >>
> >
> > Hi Kal,
> >
> > You are correct in that Red Hat recommends using XFS for gluster
bricks.
> > I'm sure there are plenty of ext4 (and other fs) users as well,
so other
> > users should chime in as far as real experiences with various
brick
> > filesystems goes. Also, I believe the dht/ext issue has been
resolved
> > for some time now.
> >
> > With regard to "XFS zeroing files on power failure,"
I'd suggest you
> > check out the following blog post:
> >
> >
> http://sandeen.net/wordpress/computers/xfs-does-not-null-files-and-requires-no-flux/
> >
> > My cursory understanding is that there were apparently situations
where
> > the inode size of a recently extended file would be written to
the log
> > before the actual extending data is written to disk, thus creating
a
> > crash window where the updated size would be seen, but not the
actual
> > data. In other words, this isn't a "zeroing files"
behavior in as much
> > as it is an ordering issue with logging the inode size. This
is probably
> > why you've encountered references to fsync(), because with the
fix your
> > data is still likely lost (unless/until you've run an fsync to
flush to
> > disk), you just shouldn't see the extended inode size unless
the actual
> > data made it to disk.
> >
> > Also note that this was fixed in 2007. ;)
> >
> > Brian
> >
> >> Best regards
> >>
> >>
> >>
> >> _______________________________________________
> >> Gluster-users mailing list
> >> Gluster-users@xxxxxxxxxxx
> >> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> >>
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users@xxxxxxxxxxx
> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
> >
> > ________________________________
> >
> >
> > This email and any files transmitted with it are confidential
and are
> > intended solely for the use of the individual or entity to whom
they are
> > addressed. If you are not the original recipient or the person
> responsible
> > for delivering the email to the intended recipient, be advised
that you
> have
> > received this email in error, and that any use, dissemination,
> forwarding,
> > printing, or copying of this email is strictly prohibited. If
you
> received
> > this email in error, please immediately notify the sender and
delete the
> > original.
> >
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users@xxxxxxxxxxx
> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/4b56a323/attachment-0001.html>

------------------------------

Message: 5
Date: Mon, 09 Dec 2013 15:44:24 +0000
From: Nux! <nux@xxxxxxxxx>
To: gluster-users@xxxxxxxxxxx
Subject: Re:  Gluster infrastructure question
Message-ID: <9775f8114ebbc392472010f2d9bdf432@xxxxxxxxx>
Content-Type: text/plain; charset=UTF-8; format=flowed

On 09.12.2013 13:18, Heiko Kr?mer wrote:
> 1)
> I'm asking me, if I can delete the raid10 on each server and create
> for each HDD a separate brick.
> In this case have a volume 80 Bricks so 4 Server x 20 HDD's. Is there
> any experience about the write throughput in a production system with
> many of bricks like in this case? In addition i'll get double of HDD
> capacity.

I have found problems with bricks to be disruptive whereas replacing a

RAID member is quite trivial. I would recommend against dropping RAID.

> 3)
> Failover of a HDD is for a raid controller with HotSpare HDD not a
big
> deal. Glusterfs will rebuild automatically if a brick fails and there
> are no data present, this action will perform a lot of network traffic
> between the mirror bricks but it will handle it equal as the raid
> controller right ?

Gluster will not "rebuild automatically" a brick, you will need
to 
manually add/remove it.
Additionally, if a brick goes bad gluster won't do anything about it, 
the affected volumes will just slow down or stop working at all.

Again, my advice is KEEP THE RAID and set up good monitoring of drives.

:)

HTH
Lucian

-- 
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

------------------------------

Message: 6
Date: Mon, 9 Dec 2013 07:57:47 -0800
From: Randy Breunling <rbreunling@xxxxxxxxx>
To: gluster-users@xxxxxxxxxxx
Cc: Randy Breunling <rbreunling@xxxxxxxxx>
Subject:  Scalability - File system or Object Store
Message-ID:

<CAJwwApQ5-SvboWV_iRGC+HJSuT25xSoz_9CBJfGDmpqT4tDJzw@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"

>From any experience...which has shown to scale better...a file system
or an
object store?

--Randy
San Jose CA
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/dcf7491e/attachment-0001.html>

------------------------------

Message: 7
Date: Mon, 9 Dec 2013 11:07:58 -0500
From: Jay Vyas <jayunit100@xxxxxxxxx>
To: Randy Breunling <rbreunling@xxxxxxxxx>
Cc: "Gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>
Subject: Re:  Scalability - File system or Object Store
Message-ID:

<CAAu13zE4kYJ1Dt9ypOMt=M=ps7QfyPSn4LSqZ3YLYBnW5pE4yA@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"

in object stores you sacrifice the consistency gauranteed by filesystems
for **higher** availability.     probably by "scale"
you mean higher
availability, so...  the answer is probably object storage.

That said, gluster is an interesting file system in that it is
"object-like" --- it is really fast for lookups.... and so if
you aren't
really sure you need objects, you might be able to do just fine with
gluster out of the box.

One really cool idea that is permeating the gluster community nowadays
is
this "UFO" concept, -- you can easily start with regular gluster,
and then
layer an object store on top at a later date if you want to  sacrifice
posix operations for (even) higher availability.

"Unified File and Object Storage - Unified file and object storage
allows
admins to utilize the same data store for both POSIX-style mounts as well
as S3 or Swift-compatible APIs."   (from
http://gluster.org/community/documentation/index.php/3.3beta)

On Mon, Dec 9, 2013 at 10:57 AM, Randy Breunling <rbreunling@xxxxxxxxx>wrote:

> From any experience...which has shown to scale better...a file system
or
> an object store?
>
> --Randy
> San Jose CA
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>

-- 
Jay Vyas
http://jayunit100.blogspot.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/e46cf569/attachment-0001.html>

------------------------------

Message: 8
Date: Mon, 09 Dec 2013 08:09:24 -0800
From: Joe Julian <joe@xxxxxxxxxxxxxxxx>
To: Nux! <nux@xxxxxxxxx>,gluster-users@xxxxxxxxxxx
Subject: Re:  Gluster infrastructure question
Message-ID: <698ab788-9f27-44a6-bd98-a53eb25f4573@xxxxxxxxxxxxxxxxx>
Content-Type: text/plain; charset=UTF-8

Nux! <nux@xxxxxxxxx> wrote:
>On 09.12.2013 13:18, Heiko Kr?mer wrote:
>> 1)
>> I'm asking me, if I can delete the raid10 on each server and create
>> for each HDD a separate brick.
>> In this case have a volume 80 Bricks so 4 Server x 20 HDD's. Is
there
>> any experience about the write throughput in a production system
with
>> many of bricks like in this case? In addition i'll get double
of HDD
>> capacity.
>
>I have found problems with bricks to be disruptive whereas replacing
a 
>RAID member is quite trivial. I would recommend against dropping RAID.
>

Brick disruption has been addressed in 3.4.

>> 3)
>> Failover of a HDD is for a raid controller with HotSpare HDD not
a
>big
>> deal. Glusterfs will rebuild automatically if a brick fails and
there
>> are no data present, this action will perform a lot of network
>traffic
>> between the mirror bricks but it will handle it equal as the raid
>> controller right ?
>
>Gluster will not "rebuild automatically" a brick, you will
need to 
>manually add/remove it.
Not exactly, but you will have to manually add an attribute and "heal...full"
to re-mirror the replacement.

>Additionally, if a brick goes bad gluster won't do anything about it,

>the affected volumes will just slow down or stop working at all.
>

Again, addressed in 3.4.

>Again, my advice is KEEP THE RAID and set up good monitoring of drives.
>

I'm not arguing for or against RAID. It's another tool in our tool box.
I, personally, use JBOD. Our use case has a lot of different files being
used by different clients. JBOD maximizes our use of cache.

------------------------------

Message: 9
Date: Mon, 9 Dec 2013 11:28:05 -0500 (EST)
From: John Mark Walker <johnmark@xxxxxxxxxxx>
To: "Kaleb S. KEITHLEY" <kkeithle@xxxxxxxxxx>
Cc: "Gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>,

Gluster Devel <gluster-devel@xxxxxxxxxx>
Subject: Re: [Gluster-users] [Gluster-devel] GlusterFest Test Weekend

- 3.5                
Test #1
Message-ID:

<1654421306.26844542.1386606485161.JavaMail.root@xxxxxxxxxx>
Content-Type: text/plain; charset=utf-8

Incidentally, we're wrapping this up today. If you want to be included
in the list of swag-receivers (t-shirt, USB car charger, and stickers),
you still have a couple of hours to file a bug and have it verified by
the dev team.

Thanks, everyone :)

-JM

----- Original Message -----
> On 12/05/2013 09:31 PM, John Mark Walker wrote:
> > Greetings,
> >
> > If you've been keeping up with our weekly meetings and the 3.5
planning
> > page, then you know that tomorrow, December 6, is the first testing
"day"
> > for 3.5. But since this is a Friday, we're going to make the
party last
> > all weekend, through mid-day Monday.
> >
> 
> YUM repos with 3.5.0qa3 RPMs for EPEL-6 and Fedora 18, 19, and 20
 are
> available at
> http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.5.0qa3/
> 
> 
> --
> 
> Kaleb
> 
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> https://lists.nongnu.org/mailman/listinfo/gluster-devel
> 

------------------------------

Message: 10
Date: Mon, 09 Dec 2013 16:43:42 +0000
From: Nux! <nux@xxxxxxxxx>
To: Joe Julian <joe@xxxxxxxxxxxxxxxx>
Cc: gluster-users@xxxxxxxxxxx
Subject: Re:  Gluster infrastructure question
Message-ID: <b48aa7ed1b14432fc4047c934320e941@xxxxxxxxx>
Content-Type: text/plain; charset=UTF-8; format=flowed

On 09.12.2013 16:09, Joe Julian wrote:
>> 
> 
> Brick disruption has been addressed in 3.4.

Good to know! What exactly happens when the brick goes unresponsive?

>> Additionally, if a brick goes bad gluster won't do anything about
it,
>> the affected volumes will just slow down or stop working at all.
>> 
> 
> Again, addressed in 3.4.

How? What is the expected behaviour now?

Thanks!

-- 
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

------------------------------

Message: 11
Date: Mon, 9 Dec 2013 18:03:59 +0100
From: samuel <samu60@xxxxxxxxx>
To: "gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>
Subject:  compatibility between 3.3 and 3.4
Message-ID:

<CAOg=WDc-JT=CfqE39qWSPTjP2OqKj4L_oCfDG8icQKVTpi+0JQ@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"

Hi all,

We're playing around with new versions and uprading options. We currently
have a 2x2x2 stripped-distributed-replicated volume based on 3.3.0 and
we're planning to upgrade to 3.4 version.

We've tried upgrading fist the clients and we've tried with 3.4.0, 3.4.1
and 3.4.2qa2 but all of them caused the same error:

Failed to get stripe-size

So it seems as if 3.4 clients are not compatible to 3.3 volumes. Is this
assumtion right?

Is there any procedure to upgrade the gluster from 3.3 to 3.4 without
stopping the service?
Where are the compatibility limitations between these 2 versions?

Any hint or link to documentation would be highly appreciated.

Thank you in advance,
Samuel.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/cec50893/attachment-0001.html>

------------------------------

Message: 12
Date: Mon, 9 Dec 2013 19:52:57 +0100
From: bernhard glomm <bernhard.glomm@xxxxxxxxxxx>
To: Heiko Kr?mer <hkraemer@xxxxxxxxxxx>
Cc: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
Subject: Re:  Gluster infrastructure question
Message-ID: <E2AB54DC-4D82-4734-9BE2-E7B0B700BBA3@xxxxxxxxxxx>
Content-Type: text/plain; charset="windows-1252"

Hi Heiko,

some years ago I had to deliver a reliable storage that should be easy
to grow in size over time.
For that I was in close contact with
presto prime who produced a lot of interesting research results accessible
to the public.
http://www.prestoprime.org/project/public.en.html
what was striking me was the general concern of how and when and with which
pattern hard drives will fail,
and the rebuilding time in case a "big" (i.e. 2TB+) drive fails.
(one of the papers at pp was dealing in detail with that)
From that background my approach was to build relatively small raid6 bricks
(9 * 2 TB + 1 Hot-Spare)
and connect them together with a distributed glusterfs.
I never experienced any problems with that and felt quite comfortable about
it.
That was for just a lot of big file data exported via samba.
At the same time I used another, mirrored, glusterfs as a storage backend
for 
my VM-images, same there, no problem and much less hazel and headache than
drbd and ocfs2 
which I run on another system.
hth
best 

Bernhard

 Bernhard Glomm
IT Administration

Phone:                
 +49 (30) 86880 134
Fax:                
 +49 (30) 86880 100
Skype:                
 bernhard.glomm.ecologic

Ecologic Institut gemeinn?tzige GmbH | Pfalzburger Str. 43/44 | 10717 Berlin
| Germany
GF: R. Andreas Kraemer | AG: Charlottenburg HRB 57947 | USt/VAT-IdNr.:
DE811963464
Ecologic? is a Trade Mark (TM) of Ecologic Institut gemeinn?tzige GmbH

On Dec 9, 2013, at 2:18 PM, Heiko Kr?mer <hkraemer@xxxxxxxxxxx> wrote:

> Signed PGP part
> Heyho guys,
> 
> I'm running since years glusterfs in a small environment without big
> problems.
> 
> Now I'm going to use glusterFS for a bigger cluster but I've some
> questions :)
> 
> Environment:
> * 4 Servers
> * 20 x 2TB HDD, each
> * Raidcontroller
> * Raid 10
> * 4x bricks => Replicated, Distributed volume
> * Gluster 3.4
> 
> 1)
> I'm asking me, if I can delete the raid10 on each server and create
> for each HDD a separate brick.
> In this case have a volume 80 Bricks so 4 Server x 20 HDD's. Is there
> any experience about the write throughput in a production system with
> many of bricks like in this case? In addition i'll get double of HDD
> capacity.
> 
> 2)
> I've heard a talk about glusterFS and out scaling. The main point
was
> if more bricks are in use, the scale out process will take a long
> time. The problem was/is the Hash-Algo. So I'm asking me how is it
if
> I've one very big brick (Raid10 20TB on each server) or I've much
more
> bricks, what's faster and is there any issues?
> Is there any experiences ?
> 
> 3)
> Failover of a HDD is for a raid controller with HotSpare HDD not a
big
> deal. Glusterfs will rebuild automatically if a brick fails and there
> are no data present, this action will perform a lot of network traffic
> between the mirror bricks but it will handle it equal as the raid
> controller right ?
> 
> 
> 
> Thanks and cheers
> Heiko
> 
> 
> 
> --
> Anynines.com
> 
> Avarteq GmbH
> B.Sc. Informatik
> Heiko Kr?mer
> CIO
> Twitter: @anynines
> 
> ----
> Gesch?ftsf?hrer: Alexander Fai?t, Dipl.-Inf.(FH) Julian Fischer
> Handelsregister: AG Saarbr?cken HRB 17413, Ust-IdNr.: DE262633168
> Sitz: Saarbr?cken
> 
> <hkraemer.vcf>_______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/c95b9cc8/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 495 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/c95b9cc8/attachment-0001.sig>

------------------------------

Message: 13
Date: Mon, 9 Dec 2013 14:26:45 -0500 (EST)
From: Ben Turner <bturner@xxxxxxxxxx>
To: Heiko Kr?mer <hkraemer@xxxxxxxxxxx>
Cc: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
Subject: Re:  Gluster infrastructure question
Message-ID: <124648027.2334242.1386617205234.JavaMail.root@xxxxxxxxxx>
Content-Type: text/plain; charset=utf-8

----- Original Message -----
> From: "Heiko Kr?mer" <hkraemer@xxxxxxxxxxx>
> To: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
> Sent: Monday, December 9, 2013 8:18:28 AM
> Subject:  Gluster infrastructure question
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Heyho guys,
> 
> I'm running since years glusterfs in a small environment without big
> problems.
> 
> Now I'm going to use glusterFS for a bigger cluster but I've some
> questions :)
> 
> Environment:
> * 4 Servers
> * 20 x 2TB HDD, each
> * Raidcontroller
> * Raid 10
> * 4x bricks => Replicated, Distributed volume
> * Gluster 3.4
> 
> 1)
> I'm asking me, if I can delete the raid10 on each server and create
> for each HDD a separate brick.
> In this case have a volume 80 Bricks so 4 Server x 20 HDD's. Is there
> any experience about the write throughput in a production system with
> many of bricks like in this case? In addition i'll get double of HDD
> capacity.

Have a look at:

http://rhsummit.files.wordpress.com/2012/03/england-rhs-performance.pdf

Specifically:

? RAID arrays
? More RAID LUNs for better concurrency
? For RAID6, 256-KB stripe size

I use a single RAID 6 that is divided into several LUNs for my bricks.
 For example, on my Dell servers(with PERC6 RAID controllers) each
server has 12 disks that I put into raid 6.  Then I break the RAID
6 into 6 LUNs and create a new PV/VG/LV for each brick.  From there
I follow the recommendations listed in the presentation.

HTH!

-b

> 2)
> I've heard a talk about glusterFS and out scaling. The main point
was
> if more bricks are in use, the scale out process will take a long
> time. The problem was/is the Hash-Algo. So I'm asking me how is it
if
> I've one very big brick (Raid10 20TB on each server) or I've much
more
> bricks, what's faster and is there any issues?
> Is there any experiences ?
> 
> 3)
> Failover of a HDD is for a raid controller with HotSpare HDD not a
big
> deal. Glusterfs will rebuild automatically if a brick fails and there
> are no data present, this action will perform a lot of network traffic
> between the mirror bricks but it will handle it equal as the raid
> controller right ?
> 
> 
> 
> Thanks and cheers
> Heiko
> 
> 
> 
> - --
> Anynines.com
> 
> Avarteq GmbH
> B.Sc. Informatik
> Heiko Kr?mer
> CIO
> Twitter: @anynines
> 
> - ----
> Gesch?ftsf?hrer: Alexander Fai?t, Dipl.-Inf.(FH) Julian Fischer
> Handelsregister: AG Saarbr?cken HRB 17413, Ust-IdNr.: DE262633168
> Sitz: Saarbr?cken
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.14 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
> 
> iQEcBAEBAgAGBQJSpcMfAAoJELxFogM4ixOF/ncH/3L9DvOWHrF0XBqCgeT6QQ6B
> lDwtXiD9xoznht0Zs2S9LA9Z7r2l5/fzMOUSOawEMv6M16Guwq3gQ1lClUi4Iwj0
> GKKtYQ6F4aG4KXHY4dlu1QKT5OaLk8ljCQ47Tc9aAiJMhfC1/IgQXOslFv26utdJ
> N9jxiCl2+r/tQvQRw6mA4KAuPYPwOV+hMtkwfrM4UsIYGGbkNPnz1oqmBsfGdSOs
> TJh6+lQRD9KYw72q3I9G6ZYlI7ylL9Q7vjTroVKH232pLo4G58NLxyvWvcOB9yK6
> Bpf/gRMxFNKA75eW5EJYeZ6EovwcyCAv7iAm+xNKhzsoZqbBbTOJxS5zKm4YWoY=
> =bDly
> -----END PGP SIGNATURE-----
> 
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

------------------------------

Message: 14
Date: Mon, 9 Dec 2013 14:31:00 -0500 (EST)
From: Ben Turner <bturner@xxxxxxxxxx>
To: Heiko Kr?mer <hkraemer@xxxxxxxxxxx>
Cc: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
Subject: Re:  Gluster infrastructure question
Message-ID:

<1676822821.2336090.1386617460049.JavaMail.root@xxxxxxxxxx>
Content-Type: text/plain; charset=utf-8

----- Original Message -----
> From: "Ben Turner" <bturner@xxxxxxxxxx>
> To: "Heiko Kr?mer" <hkraemer@xxxxxxxxxxx>
> Cc: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
> Sent: Monday, December 9, 2013 2:26:45 PM
> Subject: Re: [Gluster-users] Gluster infrastructure question
> 
> ----- Original Message -----
> > From: "Heiko Kr?mer" <hkraemer@xxxxxxxxxxx>
> > To: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
> > Sent: Monday, December 9, 2013 8:18:28 AM
> > Subject:  Gluster infrastructure question
> > 
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> > 
> > Heyho guys,
> > 
> > I'm running since years glusterfs in a small environment without
big
> > problems.
> > 
> > Now I'm going to use glusterFS for a bigger cluster but I've
some
> > questions :)
> > 
> > Environment:
> > * 4 Servers
> > * 20 x 2TB HDD, each
> > * Raidcontroller
> > * Raid 10
> > * 4x bricks => Replicated, Distributed volume
> > * Gluster 3.4
> > 
> > 1)
> > I'm asking me, if I can delete the raid10 on each server and
create
> > for each HDD a separate brick.
> > In this case have a volume 80 Bricks so 4 Server x 20 HDD's.
Is there
> > any experience about the write throughput in a production system
with
> > many of bricks like in this case? In addition i'll get double
of HDD
> > capacity.
> 
> Have a look at:
> 
> http://rhsummit.files.wordpress.com/2012/03/england-rhs-performance.pdf

That one was from 2012, here is the latest:

http://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf

-b

> Specifically:
> 
> ? RAID arrays
> ? More RAID LUNs for better concurrency
> ? For RAID6, 256-KB stripe size
> 
> I use a single RAID 6 that is divided into several LUNs for my bricks.
 For
> example, on my Dell servers(with PERC6 RAID controllers) each server
has 12
> disks that I put into raid 6.  Then I break the RAID 6 into 6
LUNs and
> create a new PV/VG/LV for each brick.  From there I follow the
> recommendations listed in the presentation.
> 
> HTH!
> 
> -b
>  
> > 2)
> > I've heard a talk about glusterFS and out scaling. The main point
was
> > if more bricks are in use, the scale out process will take a
long
> > time. The problem was/is the Hash-Algo. So I'm asking me how
is it if
> > I've one very big brick (Raid10 20TB on each server) or I've
much more
> > bricks, what's faster and is there any issues?
> > Is there any experiences ?
> > 
> > 3)
> > Failover of a HDD is for a raid controller with HotSpare HDD
not a big
> > deal. Glusterfs will rebuild automatically if a brick fails and
there
> > are no data present, this action will perform a lot of network
traffic
> > between the mirror bricks but it will handle it equal as the
raid
> > controller right ?
> > 
> > 
> > 
> > Thanks and cheers
> > Heiko
> > 
> > 
> > 
> > - --
> > Anynines.com
> > 
> > Avarteq GmbH
> > B.Sc. Informatik
> > Heiko Kr?mer
> > CIO
> > Twitter: @anynines
> > 
> > - ----
> > Gesch?ftsf?hrer: Alexander Fai?t, Dipl.-Inf.(FH) Julian Fischer
> > Handelsregister: AG Saarbr?cken HRB 17413, Ust-IdNr.: DE262633168
> > Sitz: Saarbr?cken
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.4.14 (GNU/Linux)
> > Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
> > 
> > iQEcBAEBAgAGBQJSpcMfAAoJELxFogM4ixOF/ncH/3L9DvOWHrF0XBqCgeT6QQ6B
> > lDwtXiD9xoznht0Zs2S9LA9Z7r2l5/fzMOUSOawEMv6M16Guwq3gQ1lClUi4Iwj0
> > GKKtYQ6F4aG4KXHY4dlu1QKT5OaLk8ljCQ47Tc9aAiJMhfC1/IgQXOslFv26utdJ
> > N9jxiCl2+r/tQvQRw6mA4KAuPYPwOV+hMtkwfrM4UsIYGGbkNPnz1oqmBsfGdSOs
> > TJh6+lQRD9KYw72q3I9G6ZYlI7ylL9Q7vjTroVKH232pLo4G58NLxyvWvcOB9yK6
> > Bpf/gRMxFNKA75eW5EJYeZ6EovwcyCAv7iAm+xNKhzsoZqbBbTOJxS5zKm4YWoY=
> > =bDly
> > -----END PGP SIGNATURE-----
> > 
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users@xxxxxxxxxxx
> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

------------------------------

Message: 15
Date: Mon, 09 Dec 2013 14:57:08 -0500
From: Jeff Darcy <jdarcy@xxxxxxxxxx>
To: Randy Breunling <rbreunling@xxxxxxxxx>, gluster-users@xxxxxxxxxxx
Subject: Re: [Gluster-users] Scalability - File system or Object Store
Message-ID: <52A62094.1000507@xxxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 12/09/2013 10:57 AM, Randy Breunling wrote:
> From any experience...which has shown to scale better...a file system
>  or an object store?

In terms of numbers of files/objects, I'd have to say object stores.  S3
and Azure are both over a *trillion* objects, and I've never heard of a
filesystem that size.  In terms of performance it might go the other
way.  More importantly, I think the object stores give up too much
in
terms of semantics - e.g. hierarchical directories and rename, byte
granularity, consistency/durability guarantees.  It saddens me to
see so
many people working around these limitations in their apps based on
object stores - duplicating each others' work, creating
incompatibibility (e.g. with a half dozen "conventions" for simulating
hierarchical directories), and sometimes even losing data to subtle
distributed-coordination bugs.  An app that uses a subset of an
underlying filesystem's functionality is far more likely to be correct
and portable than one that tries to build extra abstractions on top of
a
bare-bones object store.

------------------------------

Message: 16
Date: Tue, 10 Dec 2013 07:58:25 +1000
From: Dan Mons <dmons@xxxxxxxxxxxxxxxxxx>
To: Ben Turner <bturner@xxxxxxxxxx>
Cc: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>,

Heiko Kr?mer <hkraemer@xxxxxxxxxxx>
Subject: Re:  Gluster infrastructure question
Message-ID:

<CACa6TycgVYLNOWkk7eO2L80hhEdQLJpgk-+Bav_dfL2gPVGpjw@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset=UTF-8

I went with big RAID on each node (16x 3TB SATA disks in RAID6 with a
hot spare per node) rather than brick-per-disk.  The simple reason
being that I wanted to configure distribute+replicate at the GlusterFS
level, and be 100% guaranteed that the replication happened across to
another node, and not to another brick on the same node.  As each
node
only has one giant brick, the cluster is forced to replicate to a
separate node each time.

Some careful initial setup could probably have done the same, but I
wanted to avoid the dramas of my employer expanding the cluster one
node at a time later on, causing that design goal to fail as the new
single node with many bricks found replication partners on itself.

On a different topic, I find no real-world difference in RAID10 to
RAID6 with GlusterFS.  Most of the access delay in Gluster has little
to do with the speed of the disk.  The only downside to RAID6 is a
long rebuild time if you're unlucky enough to blow a couple of drives
at once.  RAID50 might be a better choice if you're up at 20 drives
per node.

We invested in SSD caching on our nodes, and to be honest it was
rather pointless.  Certainly not bad, but the real-world speed boost
is not noticed by end users.

-Dan

----------------
Dan Mons
R&D SysAdmin
Unbreaker of broken things
Cutting Edge
http://cuttingedge.com.au

On 10 December 2013 05:31, Ben Turner <bturner@xxxxxxxxxx> wrote:
> ----- Original Message -----
>> From: "Ben Turner" <bturner@xxxxxxxxxx>
>> To: "Heiko Kr?mer" <hkraemer@xxxxxxxxxxx>
>> Cc: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
>> Sent: Monday, December 9, 2013 2:26:45 PM
>> Subject: Re: [Gluster-users] Gluster infrastructure question
>>
>> ----- Original Message -----
>> > From: "Heiko Kr?mer" <hkraemer@xxxxxxxxxxx>
>> > To: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
>> > Sent: Monday, December 9, 2013 8:18:28 AM
>> > Subject:  Gluster infrastructure question
>> >
>> > -----BEGIN PGP SIGNED MESSAGE-----
>> > Hash: SHA1
>> >
>> > Heyho guys,
>> >
>> > I'm running since years glusterfs in a small environment
without big
>> > problems.
>> >
>> > Now I'm going to use glusterFS for a bigger cluster but I've
some
>> > questions :)
>> >
>> > Environment:
>> > * 4 Servers
>> > * 20 x 2TB HDD, each
>> > * Raidcontroller
>> > * Raid 10
>> > * 4x bricks => Replicated, Distributed volume
>> > * Gluster 3.4
>> >
>> > 1)
>> > I'm asking me, if I can delete the raid10 on each server
and create
>> > for each HDD a separate brick.
>> > In this case have a volume 80 Bricks so 4 Server x 20 HDD's.
Is there
>> > any experience about the write throughput in a production
system with
>> > many of bricks like in this case? In addition i'll get double
of HDD
>> > capacity.
>>
>> Have a look at:
>>
>> http://rhsummit.files.wordpress.com/2012/03/england-rhs-performance.pdf
>
> That one was from 2012, here is the latest:
>
> http://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf
>
> -b
>
>> Specifically:
>>
>> ? RAID arrays
>> ? More RAID LUNs for better concurrency
>> ? For RAID6, 256-KB stripe size
>>
>> I use a single RAID 6 that is divided into several LUNs for my
bricks.  For
>> example, on my Dell servers(with PERC6 RAID controllers) each
server has 12
>> disks that I put into raid 6.  Then I break the RAID 6 into
6 LUNs and
>> create a new PV/VG/LV for each brick.  From there I follow
the
>> recommendations listed in the presentation.
>>
>> HTH!
>>
>> -b
>>
>> > 2)
>> > I've heard a talk about glusterFS and out scaling. The main
point was
>> > if more bricks are in use, the scale out process will take
a long
>> > time. The problem was/is the Hash-Algo. So I'm asking me
how is it if
>> > I've one very big brick (Raid10 20TB on each server) or I've
much more
>> > bricks, what's faster and is there any issues?
>> > Is there any experiences ?
>> >
>> > 3)
>> > Failover of a HDD is for a raid controller with HotSpare
HDD not a big
>> > deal. Glusterfs will rebuild automatically if a brick fails
and there
>> > are no data present, this action will perform a lot of network
traffic
>> > between the mirror bricks but it will handle it equal as
the raid
>> > controller right ?
>> >
>> >
>> >
>> > Thanks and cheers
>> > Heiko
>> >
>> >
>> >
>> > - --
>> > Anynines.com
>> >
>> > Avarteq GmbH
>> > B.Sc. Informatik
>> > Heiko Kr?mer
>> > CIO
>> > Twitter: @anynines
>> >
>> > - ----
>> > Gesch?ftsf?hrer: Alexander Fai?t, Dipl.-Inf.(FH) Julian Fischer
>> > Handelsregister: AG Saarbr?cken HRB 17413, Ust-IdNr.: DE262633168
>> > Sitz: Saarbr?cken
>> > -----BEGIN PGP SIGNATURE-----
>> > Version: GnuPG v1.4.14 (GNU/Linux)
>> > Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>> >
>> > iQEcBAEBAgAGBQJSpcMfAAoJELxFogM4ixOF/ncH/3L9DvOWHrF0XBqCgeT6QQ6B
>> > lDwtXiD9xoznht0Zs2S9LA9Z7r2l5/fzMOUSOawEMv6M16Guwq3gQ1lClUi4Iwj0
>> > GKKtYQ6F4aG4KXHY4dlu1QKT5OaLk8ljCQ47Tc9aAiJMhfC1/IgQXOslFv26utdJ
>> > N9jxiCl2+r/tQvQRw6mA4KAuPYPwOV+hMtkwfrM4UsIYGGbkNPnz1oqmBsfGdSOs
>> > TJh6+lQRD9KYw72q3I9G6ZYlI7ylL9Q7vjTroVKH232pLo4G58NLxyvWvcOB9yK6
>> > Bpf/gRMxFNKA75eW5EJYeZ6EovwcyCAv7iAm+xNKhzsoZqbBbTOJxS5zKm4YWoY=
>> > =bDly
>> > -----END PGP SIGNATURE-----
>> >
>> > _______________________________________________
>> > Gluster-users mailing list
>> > Gluster-users@xxxxxxxxxxx
>> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users@xxxxxxxxxxx
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

------------------------------

Message: 17
Date: Mon, 09 Dec 2013 14:09:11 -0800
From: Joe Julian <joe@xxxxxxxxxxxxxxxx>
To: Dan Mons <dmons@xxxxxxxxxxxxxxxxxx>
Cc: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
Subject: Re: [Gluster-users] Gluster infrastructure question
Message-ID: <52A63F87.8070107@xxxxxxxxxxxxxxxx>
Content-Type: text/plain; charset=UTF-8; format=flowed

Replicas are defined in the order bricks are listed in the volume create

command. So
   gluster volume create myvol replica 2 server1:/data/brick1 
server2:/data/brick1 server3:/data/brick1 server4:/data/brick1
will replicate between server1 and server2 and replicate between server3

and server4.

Bricks added to a replica 2 volume after it's been created will require

pairs of bricks,

The best way to "force" replication to happen on another server
is to 
just define it that way.

On 12/09/2013 01:58 PM, Dan Mons wrote:
> I went with big RAID on each node (16x 3TB SATA disks in RAID6 with
a
> hot spare per node) rather than brick-per-disk.  The simple reason
> being that I wanted to configure distribute+replicate at the GlusterFS
> level, and be 100% guaranteed that the replication happened across
to
> another node, and not to another brick on the same node.  As
each node
> only has one giant brick, the cluster is forced to replicate to a
> separate node each time.
>
> Some careful initial setup could probably have done the same, but
I
> wanted to avoid the dramas of my employer expanding the cluster one
> node at a time later on, causing that design goal to fail as the new
> single node with many bricks found replication partners on itself.
>
> On a different topic, I find no real-world difference in RAID10 to
> RAID6 with GlusterFS.  Most of the access delay in Gluster has
little
> to do with the speed of the disk.  The only downside to RAID6
is a
> long rebuild time if you're unlucky enough to blow a couple of drives
> at once.  RAID50 might be a better choice if you're up at 20
drives
> per node.
>
> We invested in SSD caching on our nodes, and to be honest it was
> rather pointless.  Certainly not bad, but the real-world speed
boost
> is not noticed by end users.
>
> -Dan
>
> ----------------
> Dan Mons
> R&D SysAdmin
> Unbreaker of broken things
> Cutting Edge
> http://cuttingedge.com.au
>
>
> On 10 December 2013 05:31, Ben Turner <bturner@xxxxxxxxxx> wrote:
>> ----- Original Message -----
>>> From: "Ben Turner" <bturner@xxxxxxxxxx>
>>> To: "Heiko Kr?mer" <hkraemer@xxxxxxxxxxx>
>>> Cc: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
>>> Sent: Monday, December 9, 2013 2:26:45 PM
>>> Subject: Re:  Gluster infrastructure question
>>>
>>> ----- Original Message -----
>>>> From: "Heiko Kr?mer" <hkraemer@xxxxxxxxxxx>
>>>> To: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
>>>> Sent: Monday, December 9, 2013 8:18:28 AM
>>>> Subject:  Gluster infrastructure question
>>>>
>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>> Hash: SHA1
>>>>
>>>> Heyho guys,
>>>>
>>>> I'm running since years glusterfs in a small environment
without big
>>>> problems.
>>>>
>>>> Now I'm going to use glusterFS for a bigger cluster but
I've some
>>>> questions :)
>>>>
>>>> Environment:
>>>> * 4 Servers
>>>> * 20 x 2TB HDD, each
>>>> * Raidcontroller
>>>> * Raid 10
>>>> * 4x bricks => Replicated, Distributed volume
>>>> * Gluster 3.4
>>>>
>>>> 1)
>>>> I'm asking me, if I can delete the raid10 on each server
and create
>>>> for each HDD a separate brick.
>>>> In this case have a volume 80 Bricks so 4 Server x 20
HDD's. Is there
>>>> any experience about the write throughput in a production
system with
>>>> many of bricks like in this case? In addition i'll get
double of HDD
>>>> capacity.
>>> Have a look at:
>>>
>>> http://rhsummit.files.wordpress.com/2012/03/england-rhs-performance.pdf
>> That one was from 2012, here is the latest:
>>
>> http://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf
>>
>> -b
>>
>>> Specifically:
>>>
>>> ? RAID arrays
>>> ? More RAID LUNs for better concurrency
>>> ? For RAID6, 256-KB stripe size
>>>
>>> I use a single RAID 6 that is divided into several LUNs for
my bricks.  For
>>> example, on my Dell servers(with PERC6 RAID controllers) each
server has 12
>>> disks that I put into raid 6.  Then I break the RAID
6 into 6 LUNs and
>>> create a new PV/VG/LV for each brick.  From there I follow
the
>>> recommendations listed in the presentation.
>>>
>>> HTH!
>>>
>>> -b
>>>
>>>> 2)
>>>> I've heard a talk about glusterFS and out scaling. The
main point was
>>>> if more bricks are in use, the scale out process will
take a long
>>>> time. The problem was/is the Hash-Algo. So I'm asking
me how is it if
>>>> I've one very big brick (Raid10 20TB on each server) or
I've much more
>>>> bricks, what's faster and is there any issues?
>>>> Is there any experiences ?
>>>>
>>>> 3)
>>>> Failover of a HDD is for a raid controller with HotSpare
HDD not a big
>>>> deal. Glusterfs will rebuild automatically if a brick
fails and there
>>>> are no data present, this action will perform a lot of
network traffic
>>>> between the mirror bricks but it will handle it equal
as the raid
>>>> controller right ?
>>>>
>>>>
>>>>
>>>> Thanks and cheers
>>>> Heiko
>>>>
>>>>
>>>>
>>>> - --
>>>> Anynines.com
>>>>
>>>> Avarteq GmbH
>>>> B.Sc. Informatik
>>>> Heiko Kr?mer
>>>> CIO
>>>> Twitter: @anynines
>>>>
>>>> - ----
>>>> Gesch?ftsf?hrer: Alexander Fai?t, Dipl.-Inf.(FH) Julian
Fischer
>>>> Handelsregister: AG Saarbr?cken HRB 17413, Ust-IdNr.:
DE262633168
>>>> Sitz: Saarbr?cken
>>>> -----BEGIN PGP SIGNATURE-----
>>>> Version: GnuPG v1.4.14 (GNU/Linux)
>>>> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>>>>
>>>> iQEcBAEBAgAGBQJSpcMfAAoJELxFogM4ixOF/ncH/3L9DvOWHrF0XBqCgeT6QQ6B
>>>> lDwtXiD9xoznht0Zs2S9LA9Z7r2l5/fzMOUSOawEMv6M16Guwq3gQ1lClUi4Iwj0
>>>> GKKtYQ6F4aG4KXHY4dlu1QKT5OaLk8ljCQ47Tc9aAiJMhfC1/IgQXOslFv26utdJ
>>>> N9jxiCl2+r/tQvQRw6mA4KAuPYPwOV+hMtkwfrM4UsIYGGbkNPnz1oqmBsfGdSOs
>>>> TJh6+lQRD9KYw72q3I9G6ZYlI7ylL9Q7vjTroVKH232pLo4G58NLxyvWvcOB9yK6
>>>> Bpf/gRMxFNKA75eW5EJYeZ6EovwcyCAv7iAm+xNKhzsoZqbBbTOJxS5zKm4YWoY=
>>>> =bDly
>>>> -----END PGP SIGNATURE-----
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users@xxxxxxxxxxx
>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users@xxxxxxxxxxx
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users@xxxxxxxxxxx
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

------------------------------

Message: 18
Date: Tue, 10 Dec 2013 09:38:03 +1000
From: Dan Mons <dmons@xxxxxxxxxxxxxxxxxx>
To: Joe Julian <joe@xxxxxxxxxxxxxxxx>
Cc: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
Subject: Re:  Gluster infrastructure question
Message-ID:

<CACa6TyenCTAgoKKsXCmrvd0G191VdBPkdNf3j4yROkT_9jTyhQ@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1

On 10 December 2013 08:09, Joe Julian <joe@xxxxxxxxxxxxxxxx> wrote:
> Replicas are defined in the order bricks are listed in the volume
create
> command. So
>   gluster volume create myvol replica 2 server1:/data/brick1
> server2:/data/brick1 server3:/data/brick1 server4:/data/brick1
> will replicate between server1 and server2 and replicate between server3
and
> server4.
>
> Bricks added to a replica 2 volume after it's been created will require
> pairs of bricks,
>
> The best way to "force" replication to happen on another
server is to just
> define it that way.

Yup, that's understood.  The problem is when (for argument's sake)
:

* We've defined 4 hosts with 10 disks each
* Each individual disk is a brick
* Replication is defined correctly when creating the volume initially
* I'm on holidays, my employer buys a single node, configures it
brick-per-disk, and the IT junior adds it to the cluster

All good up until that final point, and then I've got that fifth node
at the end replicating to itself.  Node goes down some months later,
chaos ensues.

Not a GlusterFS/technology problem, but a problem with what frequently
happens at a human level.  As a sysadmin, these are also things I
need
to work around, even if it means deviating from best practices. :)

-Dan

------------------------------

Message: 19
Date: Tue, 10 Dec 2013 11:06:06 +0700
From: Diep Pham Van <imeo@xxxxxxxxxx>
To: "gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>
Subject: Re:  [CentOS 6] Upgrade to the glusterfs

version in base or in glusterfs-epel
Message-ID: <20131210110606.2e217dc6@debbox>
Content-Type: text/plain; charset=US-ASCII

On Mon, 9 Dec 2013 19:53:20 +0900
Nguyen Viet Cuong <mrcuongnv@xxxxxxxxx> wrote:

> There is no glusterfs-server in the "base" repository, just
client.
Silly me.
After install and attempt to mount with base version of glusterfs-fuse,
I realize that I have to change 'backupvolfile-server' mount option to
'backup-volfile-servers'[1].

Links:
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1023950

-- 
PHAM Van Diep

------------------------------

Message: 20
Date: Mon, 09 Dec 2013 20:44:06 -0800
From: harry mangalam <harry.mangalam@xxxxxxx>
To: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
Subject:  Where does the 'date' string in

'/var/log/glusterfs/gl.log' come from?
Message-ID: <34671480.j6DT7uby7B@stunted>
Content-Type: text/plain; charset="us-ascii"

Admittedly I should search the source, but I wonder if anyone knows this

offhand.

Background:  of our  84 ROCKS (6.1) -provisioned compute nodes,
4 have picked 
up an 'advanced date'  in the /var/log/glusterfs/gl.log file - that
date 
string is running about 5-6 hours ahead of the system date and all the
Gluster 
servers (which are identical and correct).  The time advancement does
not 
appear to be identical tho it's hard to tell since it only shows on errors
and 
those update irregularly.

All the clients are the same version and all the servers are the same (gluster

v 3.4.0-8.el6.x86_64

This would not be of interest except that those 4 clients are losing files,

unable to reliably do IO, etc on the gluster fs.  They don't appear
to be 
having problems with NFS mounts, nor with a Fraunhofer FS that is also
mounted 
on each node,

Rebooting 2 of them has no effect - they come right back with an advanced

date.

---
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
---
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131209/9cde5ba3/attachment-0001.html>

------------------------------

Message: 21
Date: Tue, 10 Dec 2013 12:49:25 +0800
From: Sharuzzaman Ahmat Raslan <sharuzzaman@xxxxxxxxx>
To: harry mangalam <harry.mangalam@xxxxxxx>
Cc: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
Subject: Re:  Where does the 'date' string in

'/var/log/glusterfs/gl.log' come from?
Message-ID:

<CAK+zuc=5SY7wuFXUe-i2nUXAhGr+Ddaahr_7TKYgMxgtWKh1zg@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"

Hi Harry,

Did you setup ntp on each of the node, and sync the time to one single
source?

Thanks.

On Tue, Dec 10, 2013 at 12:44 PM, harry mangalam <harry.mangalam@xxxxxxx>wrote:

>  Admittedly I should search the source, but I wonder if anyone
knows this
> offhand.
>
>
>
> Background: of our 84 ROCKS (6.1) -provisioned compute nodes, 4 have
> picked up an 'advanced date' in the /var/log/glusterfs/gl.log file
- that
> date string is running about 5-6 hours ahead of the system date and
all the
> Gluster servers (which are identical and correct). The time advancement
> does not appear to be identical tho it's hard to tell since it only
shows
> on errors and those update irregularly.
>
>
>
> All the clients are the same version and all the servers are the same
> (gluster v 3.4.0-8.el6.x86_64
>
>
>
> This would not be of interest except that those 4 clients are losing
> files, unable to reliably do IO, etc on the gluster fs. They don't
appear
> to be having problems with NFS mounts, nor with a Fraunhofer FS that
is
> also mounted on each node,
>
>
>
> Rebooting 2 of them has no effect - they come right back with an advanced
> date.
>
>
>
>
>
> ---
>
> Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
>
> [m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
>
> 415 South Circle View Dr, Irvine, CA, 92697 [shipping]
>
> MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
>
> ---
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>

-- 
Sharuzzaman Ahmat Raslan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/d0de4ecd/attachment-0001.html>

------------------------------

Message: 22
Date: Tue, 10 Dec 2013 04:49:50 +0000
From: Bobby Jacob <bobby.jacob@xxxxxxxxxxx>
To: "gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>
Subject:  FW: Self Heal Issue GlusterFS 3.3.1
Message-ID:

<AC3305F9C186F849B835A3E6D3C9BEFEB5A763@xxxxxxxxxxxxxxx.local>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

Can someone please advise on this issue. ?? Urgent. Selfheal is working
every 10 minutes only. ??

Thanks & Regards,
Bobby Jacob

From: Bobby Jacob
Sent: Tuesday, December 03, 2013 8:51 AM
To: gluster-users@xxxxxxxxxxx
Subject: FW: Self Heal Issue GlusterFS 3.3.1

Just and addition: on the node where the self heal is not working when
I check /var/log/glusterd/glustershd.log, I see the following:

[2013-12-03 05:49:18.348637] E [afr-self-heald.c:685:_link_inode_update_loc]
0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.350273] E [afr-self-heald.c:685:_link_inode_update_loc]
0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.354813] E [afr-self-heald.c:685:_link_inode_update_loc]
0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.355893] E [afr-self-heald.c:685:_link_inode_update_loc]
0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.356901] E [afr-self-heald.c:685:_link_inode_update_loc]
0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.357730] E [afr-self-heald.c:685:_link_inode_update_loc]
0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.359136] E [afr-self-heald.c:685:_link_inode_update_loc]
0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.360276] E [afr-self-heald.c:685:_link_inode_update_loc]
0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.361168] E [afr-self-heald.c:685:_link_inode_update_loc]
0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.362135] E [afr-self-heald.c:685:_link_inode_update_loc]
0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.363569] E [afr-self-heald.c:685:_link_inode_update_loc]
0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.364232] E [afr-self-heald.c:685:_link_inode_update_loc]
0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.364872] E [afr-self-heald.c:685:_link_inode_update_loc]
0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.365777] E [afr-self-heald.c:685:_link_inode_update_loc]
0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.367383] E [afr-self-heald.c:685:_link_inode_update_loc]
0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)
[2013-12-03 05:49:18.368075] E [afr-self-heald.c:685:_link_inode_update_loc]
0-glustervol-replicate-0: inode link failed on the inode (00000000-0000-0000-0000-000000000000)

Thanks & Regards,
Bobby Jacob

From: gluster-users-bounces@xxxxxxxxxxx [mailto:gluster-users-bounces@xxxxxxxxxxx]
On Behalf Of Bobby Jacob
Sent: Tuesday, December 03, 2013 8:48 AM
To: gluster-users@xxxxxxxxxxx
Subject:  Self Heal Issue GlusterFS 3.3.1

Hi,

I'm running glusterFS 3.3.1 on Centos 6.4.

?  Gluster volume status

Status of volume: glustervol

Gluster process                

  Port    Online  Pid

------------------------------------------------------------------------------

Brick KWTOCUATGS001:/mnt/cloudbrick          
          24009   Y      
20031

Brick KWTOCUATGS002:/mnt/cloudbrick          
          24009   Y      
1260

NFS Server on localhost              

                38467   Y
      43320

Self-heal Daemon on localhost            

 N/A     Y       43326

NFS Server on KWTOCUATGS002              
              38467   Y  
    5842

Self-heal Daemon on KWTOCUATGS002            
          N/A     Y      
5848

The self heal stops working and application write only to 1 brick and it
doesn't replicate. When I check /var/log/glusterfs/glustershd.log I see
the following.:

[2013-12-03 05:42:32.033563] W [socket.c:410:__socket_keepalive] 0-socket:
failed to set keep idle on socket 8
[2013-12-03 05:42:32.033646] W [socket.c:1876:socket_server_event_handler]
0-socket.glusterfsd: Failed to set keep-alive: Operation not supported
[2013-12-03 05:42:32.790473] I [client-handshake.c:1614:select_server_supported_programs]
0-glustervol-client-1: Using Program GlusterFS 3.3.2, Num (1298437), Version
(330)
[2013-12-03 05:42:32.790840] I [client-handshake.c:1411:client_setvolume_cbk]
0-glustervol-client-1: Connected to 172.16.95.153:24009, attached to remote
volume '/mnt/cloudbrick'.
[2013-12-03 05:42:32.790884] I [client-handshake.c:1423:client_setvolume_cbk]
0-glustervol-client-1: Server and Client lk-version numbers are not same,
reopening the fds
[2013-12-03 05:42:32.791003] I [afr-common.c:3685:afr_notify] 0-glustervol-replicate-0:
Subvolume 'glustervol-client-1' came back up; going online.
[2013-12-03 05:42:32.791161] I [client-handshake.c:453:client_set_lk_version_cbk]
0-glustervol-client-1: Server lk version = 1
[2013-12-03 05:42:32.795103] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
0-glustervol-replicate-0: open of <gfid:a7e88fd1-6e32-40ab-90f6-ea452242a7c6>
failed on child glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.798064] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
0-glustervol-replicate-0: open of <gfid:081c6657-301a-42a4-9f95-6eeba6c67413>
failed on child glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.799278] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
0-glustervol-replicate-0: open of <gfid:565f1358-449c-45e2-8535-93b5632c0d1e>
failed on child glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.800636] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
0-glustervol-replicate-0: open of <gfid:9c7010ac-5c11-4561-8b86-5c4d6561f34e>
failed on child glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.802223] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
0-glustervol-replicate-0: open of <gfid:25fd406f-63e0-4037-bb01-da282cbe4d76>
failed on child glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.803339] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
0-glustervol-replicate-0: open of <gfid:a109c429-5885-499e-8711-09fdccd396f2>
failed on child glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.804308] E [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
0-glustervol-replicate-0: open of <gfid:5a8fd3bf-9215-444c-b974-5c280f5699a6>
failed on child glustervol-client-0 (Transport endpoint is not connected)
[2013-12-03 05:42:32.804877] I [client-handshake.c:1614:select_server_supported_programs]
0-glustervol-client-0: Using Program GlusterFS 3.3.2, Num (1298437), Version
(330)
[2013-12-03 05:42:32.807517] I [client-handshake.c:1411:client_setvolume_cbk]
0-glustervol-client-0: Connected to 172.16.107.154:24009, attached to remote
volume '/mnt/cloudbrick'.
[2013-12-03 05:42:32.807562] I [client-handshake.c:1423:client_setvolume_cbk]
0-glustervol-client-0: Server and Client lk-version numbers are not same,
reopening the fds
[2013-12-03 05:42:32.810357] I [client-handshake.c:453:client_set_lk_version_cbk]
0-glustervol-client-0: Server lk version = 1
[2013-12-03 05:42:32.827437] E [afr-self-heal-data.c:764:afr_sh_data_fxattrop_fstat_done]
0-glustervol-replicate-0: Unable to self-heal contents of '<gfid:1262d40d-46a3-4e57-b07b-0fcc972c8403>'
(possible split-brain). Please delete the file from all but the preferred
subvolume.
[2013-12-03 05:42:39.205157] E [afr-self-heal-metadata.c:472:afr_sh_metadata_fix]
0-glustervol-replicate-0: Unable to self-heal permissions/ownership of
'<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>' (possible split-brain).
Please fix the file on all backend volumes
[2013-12-03 05:42:39.215793] E [afr-self-heal-metadata.c:472:afr_sh_metadata_fix]
0-glustervol-replicate-0: Unable to self-heal permissions/ownership of
'<gfid:c0660768-289f-48ac-b8e5-e5b5a3a4b965>' (possible split-brain).
Please fix the file on all backend volumes

PLEASE ADVICE.

Thanks & Regards,
Bobby Jacob

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/8fa935eb/attachment-0001.html>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ATT00001.txt
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/8fa935eb/attachment-0001.txt>

------------------------------

Message: 23
Date: Mon, 09 Dec 2013 20:59:21 -0800
From: Joe Julian <joe@xxxxxxxxxxxxxxxx>
To: Bobby Jacob <bobby.jacob@xxxxxxxxxxx>
Cc: "gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>
Subject: Re:  Self Heal Issue GlusterFS 3.3.1
Message-ID: <1386651561.2455.12.camel@xxxxxxxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain; charset="UTF-8"

On Tue, 2013-12-03 at 05:47 +0000, Bobby Jacob wrote:
> Hi,
> 
>  
> 
> I?m running glusterFS 3.3.1 on Centos 6.4. 
> 
> ? Gluster volume status
> 
>  
> 
> Status of volume: glustervol
> 
> Gluster process                

  Port    Online
> Pid
> 
> ------------------------------------------------------------------------------
> 
> Brick KWTOCUATGS001:/mnt/cloudbrick          
          24009   Y
> 20031
> 
> Brick KWTOCUATGS002:/mnt/cloudbrick          
          24009   Y
> 1260
> 
> NFS Server on localhost
>                    
  38467   Y       43320
> 
> Self-heal Daemon on localhost            

 N/A
> Y       43326
> 
> NFS Server on KWTOCUATGS002            
                38467   Y
> 5842
> 
> Self-heal Daemon on KWTOCUATGS002          
            N/A     Y
> 5848
> 
>  
> 
> The self heal stops working and application write only to 1 brick
and
> it doesn?t replicate. When I check /var/log/glusterfs/glustershd.log
I
> see the following.:
> 
>  
> 
> [2013-12-03 05:42:32.033563] W [socket.c:410:__socket_keepalive]
> 0-socket: failed to set keep idle on socket 8
> 
> [2013-12-03 05:42:32.033646] W
> [socket.c:1876:socket_server_event_handler] 0-socket.glusterfsd:
> Failed to set keep-alive: Operation not supported
> 
> [2013-12-03 05:42:32.790473] I
> [client-handshake.c:1614:select_server_supported_programs]
> 0-glustervol-client-1: Using Program GlusterFS 3.3.2, Num (1298437),
> Version (330)
> 
> [2013-12-03 05:42:32.790840] I
> [client-handshake.c:1411:client_setvolume_cbk] 0-glustervol-client-1:
> Connected to 172.16.95.153:24009, attached to remote volume
> '/mnt/cloudbrick'.
> 
> [2013-12-03 05:42:32.790884] I
> [client-handshake.c:1423:client_setvolume_cbk] 0-glustervol-client-1:
> Server and Client lk-version numbers are not same, reopening the fds
> 
> [2013-12-03 05:42:32.791003] I [afr-common.c:3685:afr_notify]
> 0-glustervol-replicate-0: Subvolume 'glustervol-client-1' came back
> up; going online.
> 
> [2013-12-03 05:42:32.791161] I
> [client-handshake.c:453:client_set_lk_version_cbk]
> 0-glustervol-client-1: Server lk version = 1
> 
> [2013-12-03 05:42:32.795103] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:a7e88fd1-6e32-40ab-90f6-ea452242a7c6> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.798064] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:081c6657-301a-42a4-9f95-6eeba6c67413> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.799278] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:565f1358-449c-45e2-8535-93b5632c0d1e> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.800636] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:9c7010ac-5c11-4561-8b86-5c4d6561f34e> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.802223] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:25fd406f-63e0-4037-bb01-da282cbe4d76> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.803339] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:a109c429-5885-499e-8711-09fdccd396f2> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.804308] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:5a8fd3bf-9215-444c-b974-5c280f5699a6> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.804877] I
> [client-handshake.c:1614:select_server_supported_programs]
> 0-glustervol-client-0: Using Program GlusterFS 3.3.2, Num (1298437),
> Version (330)
> 
> [2013-12-03 05:42:32.807517] I
> [client-handshake.c:1411:client_setvolume_cbk] 0-glustervol-client-0:
> Connected to 172.16.107.154:24009, attached to remote volume
> '/mnt/cloudbrick'.
> 
> [2013-12-03 05:42:32.807562] I
> [client-handshake.c:1423:client_setvolume_cbk] 0-glustervol-client-0:
> Server and Client lk-version numbers are not same, reopening the fds
> 
> [2013-12-03 05:42:32.810357] I
> [client-handshake.c:453:client_set_lk_version_cbk]
> 0-glustervol-client-0: Server lk version = 1
> 
> [2013-12-03 05:42:32.827437] E
> [afr-self-heal-data.c:764:afr_sh_data_fxattrop_fstat_done]
> 0-glustervol-replicate-0: Unable to self-heal contents of
> '<gfid:1262d40d-46a3-4e57-b07b-0fcc972c8403>' (possible split-brain).
> Please delete the file from all but the preferred subvolume.

That file is at
$brick/.glusterfs/12/62/1262d40d-46a3-4e57-b07b-0fcc972c8403

Try picking one to remove like it says.
> 
> [2013-12-03 05:42:39.205157] E
> [afr-self-heal-metadata.c:472:afr_sh_metadata_fix]
> 0-glustervol-replicate-0: Unable to self-heal permissions/ownership
of
> '<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>' (possible split-brain).
> Please fix the file on all backend volumes
> 
> [2013-12-03 05:42:39.215793] E
> [afr-self-heal-metadata.c:472:afr_sh_metadata_fix]
> 0-glustervol-replicate-0: Unable to self-heal permissions/ownership
of
> '<gfid:c0660768-289f-48ac-b8e5-e5b5a3a4b965>' (possible split-brain).
> Please fix the file on all backend volumes
> 
>  
If that doesn't allow it to heal, you may need to find which filename
that's hardlinked to. ls -li the gfid file at the path I demonstrated
earlier. With that inode number in hand, find $brick -inum $inode_number
Once you know which filenames it's linked with, remove all linked copies
from all but one replica. Then the self-heal can continue successfully.

------------------------------

Message: 24
Date: Tue, 10 Dec 2013 13:09:38 +0800
From: Franco Broi <franco.broi@xxxxxxxxxx>
To: "gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>
Subject:  Pausing rebalance
Message-ID: <1386652178.1682.110.camel@tc1>
Content-Type: text/plain; charset="UTF-8"

Before attempting a rebalance on my existing distributed Gluster volume
I thought I'd do some testing with my new storage. I created a volume
consisting of 4 bricks on the same server and wrote some data to it. I
then added a new brick from a another server. I ran the fix-layout and
wrote some new files and could see them on the new brick. All good so
far, so I started the data rebalance. After it had been running for a
while I wanted to add another brick, which I obviously couldn't do while
it was running so I stopped it. Even with it stopped It wouldn't let me
add a brick so I tried restarting it, but it wouldn't let me do that
either. I presume you just reissue the start command as there's no
restart?

[root@nas3 ~]# gluster vol rebalance test-volume status

               Node Rebalanced-files
         size       scanned  
   failures       skipped      
  status run time in secs
---------      -----------   -----------   -----------
  -----------   -----------   ------------   --------------
localhost                7  
    611.7GB          1358    
        0            10
       stopped          4929.00
localhost                7  
    611.7GB          1358    
        0            10
       stopped          4929.00
 nas4-10g                0  
     0Bytes          1506  
          0          
  0      completed          
  8.00
volume rebalance: test-volume: success: 
[root@nas3 ~]# gluster vol add-brick test-volume nas4-10g:/data14/gvol
volume add-brick: failed: Volume name test-volume rebalance is in progress.
Please retry after completion
[root@nas3 ~]# gluster vol rebalance test-volume start
volume rebalance: test-volume: failed: Rebalance on test-volume is already
started

In the end I used the force option to make it start but was that the
right thing to do?

glusterfs 3.4.1 built on Oct 28 2013 11:01:59
Volume Name: test-volume
Type: Distribute
Volume ID: 56ee0173-aed1-4be6-a809-ee0544f9e066
Status: Started
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: nas3-10g:/data9/gvol
Brick2: nas3-10g:/data10/gvol
Brick3: nas3-10g:/data11/gvol
Brick4: nas3-10g:/data12/gvol
Brick5: nas4-10g:/data13/gvol

------------------------------

Message: 25
Date: Tue, 10 Dec 2013 10:42:28 +0530
From: Vijay Bellur <vbellur@xxxxxxxxxx>
To: harry mangalam <harry.mangalam@xxxxxxx>,

"gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
Subject: Re:  Where does the 'date' string in

'/var/log/glusterfs/gl.log' come from?
Message-ID: <52A6A2BC.7010501@xxxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 12/10/2013 10:14 AM, harry mangalam wrote:
> Admittedly I should search the source, but I wonder if anyone knows
this
> offhand.
>
> Background: of our 84 ROCKS (6.1) -provisioned compute nodes, 4 have
> picked up an 'advanced date' in the /var/log/glusterfs/gl.log file
-
> that date string is running about 5-6 hours ahead of the system date
and
> all the Gluster servers (which are identical and correct). The time
> advancement does not appear to be identical tho it's hard to tell
since
> it only shows on errors and those update irregularly.

The timestamps in the log file are by default in UTC. That could 
possibly explain why the timestamps look advanced in the log file.

>
> All the clients are the same version and all the servers are the same
> (gluster v 3.4.0-8.el6.x86_64
>
> This would not be of interest except that those 4 clients are losing
> files, unable to reliably do IO, etc on the gluster fs. They don't
> appear to be having problems with NFS mounts, nor with a Fraunhofer
FS
> that is also mounted on each node,

Do you observe anything in the client log files of these machines that

indicate I/O problems?

Thanks,
Vijay

------------------------------

Message: 26
Date: Tue, 10 Dec 2013 10:56:52 +0530
From: shishir gowda <gowda.shishir@xxxxxxxxx>
To: Franco Broi <franco.broi@xxxxxxxxxx>
Cc: "gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>
Subject: Re:  Pausing rebalance
Message-ID:

<CAMYy+hVgyiPMYiDtkKtA1EBbbcpJAyp3O1_1=oAqKq1dc4NN+g@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"

Hi Franco,

If a file is under migration, and a rebalance stop is encountered, then
rebalance process exits only after the completion of the migration.
That might be one of the reasons why you saw rebalance in progress message
while trying to add the brick

Could you please share the average file size in your setup?

You could always check the rebalance status command to ensure rebalance
has
indeed completed/stopped before proceeding with the add-brick. Using
add-brick force while rebalance is on-going should not be used in normal
scenarios. I do see that in your case, they show stopped/completed.
Glusterd logs would help in triaging the issue.

Rebalance re-writes layouts, and migrates data. While this is happening,
if
a add-brick is done, then the cluster might go into a imbalanced stated.
Hence, the check if rebalance is in progress while doing add-brick

With regards,
Shishir

On 10 December 2013 10:39, Franco Broi <franco.broi@xxxxxxxxxx> wrote:

>
> Before attempting a rebalance on my existing distributed Gluster volume
> I thought I'd do some testing with my new storage. I created a volume
> consisting of 4 bricks on the same server and wrote some data to it.
I
> then added a new brick from a another server. I ran the fix-layout
and
> wrote some new files and could see them on the new brick. All good
so
> far, so I started the data rebalance. After it had been running for
a
> while I wanted to add another brick, which I obviously couldn't do
while
> it was running so I stopped it. Even with it stopped It wouldn't let
me
> add a brick so I tried restarting it, but it wouldn't let me do that
> either. I presume you just reissue the start command as there's no
> restart?
>
> [root@nas3 ~]# gluster vol rebalance test-volume status
>                    
                Node Rebalanced-files
         size
>     scanned      failures      
skipped         status run time in secs
> ---------      -----------   -----------  
-----------   -----------
> -----------   ------------   --------------
> localhost                7
      611.7GB          1358  
          0
>      10        stopped    
     4929.00
> localhost                7
      611.7GB          1358  
          0
>      10        stopped    
     4929.00
>  nas4-10g                0
       0Bytes          1506
            0
>       0      completed    
        8.00
> volume rebalance: test-volume: success:
> [root@nas3 ~]# gluster vol add-brick test-volume nas4-10g:/data14/gvol
> volume add-brick: failed: Volume name test-volume rebalance is in
> progress. Please retry after completion
> [root@nas3 ~]# gluster vol rebalance test-volume start
> volume rebalance: test-volume: failed: Rebalance on test-volume is
already
> started
>
> In the end I used the force option to make it start but was that the
> right thing to do?
>
> glusterfs 3.4.1 built on Oct 28 2013 11:01:59
> Volume Name: test-volume
> Type: Distribute
> Volume ID: 56ee0173-aed1-4be6-a809-ee0544f9e066
> Status: Started
> Number of Bricks: 5
> Transport-type: tcp
> Bricks:
> Brick1: nas3-10g:/data9/gvol
> Brick2: nas3-10g:/data10/gvol
> Brick3: nas3-10g:/data11/gvol
> Brick4: nas3-10g:/data12/gvol
> Brick5: nas4-10g:/data13/gvol
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/1944e9e8/attachment-0001.html>

------------------------------

Message: 27
Date: Tue, 10 Dec 2013 11:02:52 +0530
From: Vijay Bellur <vbellur@xxxxxxxxxx>
To: Alex Pearson <alex@xxxxxxxxxxx>
Cc: gluster-users Discussion List <Gluster-users@xxxxxxxxxxx>
Subject: Re: [Gluster-users] replace-brick failing -

transport.address-family not specified
Message-ID: <52A6A784.6070404@xxxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 12/08/2013 05:44 PM, Alex Pearson wrote:
> Hi All,
> Just to assist anyone else having this issue, and so people can correct
me if I'm wrong...
>
> It would appear that replace-brick is 'horribly broken' and should
not be used in Gluster 3.4.  Instead a combination of "remove-brick
... count X ... start" should be used to remove the resilience from
a volume and the brick, then "add-brick ... count X" to add the
new brick.
>
> This does beg the question of why the hell a completely broken command
was left in the 'stable' release of the software.  This sort of thing
really hurts Glusters credibility.

A mention of replace-brick not being functional was made in the release

note for 3.4.0:

https://github.com/gluster/glusterfs/blob/release-3.4/doc/release-notes/3.4.0.md

>
> Ref: http://www.gluster.org/pipermail/gluster-users/2013-August/036936.html

This discussion happened after the release of GlusterFS 3.4. However, I

do get the point you are trying to make here. We can have an explicit 
warning in CLI when operations considered broken are attempted. There is

a similar plan to add a warning for rdma volumes:

https://bugzilla.redhat.com/show_bug.cgi?id=1017176

There is a patch under review currently to remove the replace-brick 
command from CLI:

http://review.gluster.org/6031

This is intended for master. If you can open a bug report indicating an

appropriate warning message that you would like to see when 
replace-brick is attempted, I would be happy to get such a fix in to 
both 3.4 and 3.5.

Thanks,
Vijay

>
> Cheers
>
> Alex
>
> ----- Original Message -----
> From: "Alex Pearson" <alex@xxxxxxxxxxx>
> To: gluster-users@xxxxxxxxxxx
> Sent: Friday, 6 December, 2013 5:25:43 PM
> Subject: [Gluster-users] replace-brick failing - transport.address-family

not specified
>
> Hello,
> I have what I think is a fairly basic Gluster setup, however when
I try to carry out a replace-brick operation it consistently fails...
>
> Here are the command line options:
>
> root@osh1:~# gluster volume info media
>
> Volume Name: media
> Type: Replicate
> Volume ID: 4c290928-ba1c-4a45-ac05-85365b4ea63a
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: osh1.apics.co.uk:/export/sdc/media
> Brick2: osh2.apics.co.uk:/export/sdb/media
>
> root@osh1:~# gluster volume replace-brick media osh1.apics.co.uk:/export/sdc/media
osh1.apics.co.uk:/export/WCASJ2055681/media start
> volume replace-brick: success: replace-brick started successfully
> ID: 60bef96f-a5c7-4065-864e-3e0b2773d7bb
> root@osh1:~# gluster volume replace-brick media osh1.apics.co.uk:/export/sdc/media
osh1.apics.co.uk:/export/WCASJ2055681/media status
> volume replace-brick: failed: Commit failed on localhost. Please check
the log file for more details.
>
> root@osh1:~# tail /var/log/glusterfs/bricks/export-sdc-media.log
> [2013-12-06 17:24:54.795754] E [name.c:147:client_fill_address_family]
0-media-replace-brick: transport.address-family not specified. Could not
guess default value from (remote-host:(null) or transport.unix.connect-path:(null))
options
> [2013-12-06 17:24:57.796422] W [dict.c:1055:data_to_str] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(+0x528b)
[0x7fb826e3428b] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e)
[0x7fb826e3a25e] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(client_fill_address_family+0x200)
[0x7fb826e39f50]))) 0-dict: data is NULL
> [2013-12-06 17:24:57.796494] W [dict.c:1055:data_to_str] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(+0x528b)
[0x7fb826e3428b] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e)
[0x7fb826e3a25e] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(client_fill_address_family+0x20b)
[0x7fb826e39f5b]))) 0-dict: data is NULL
> [2013-12-06 17:24:57.796519] E [name.c:147:client_fill_address_family]
0-media-replace-brick: transport.address-family not specified. Could not
guess default value from (remote-host:(null) or transport.unix.connect-path:(null))
options
> [2013-12-06 17:25:00.797153] W [dict.c:1055:data_to_str] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(+0x528b)
[0x7fb826e3428b] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e)
[0x7fb826e3a25e] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(client_fill_address_family+0x200)
[0x7fb826e39f50]))) 0-dict: data is NULL
> [2013-12-06 17:25:00.797226] W [dict.c:1055:data_to_str] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(+0x528b)
[0x7fb826e3428b] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e)
[0x7fb826e3a25e] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(client_fill_address_family+0x20b)
[0x7fb826e39f5b]))) 0-dict: data is NULL
> [2013-12-06 17:25:00.797251] E [name.c:147:client_fill_address_family]
0-media-replace-brick: transport.address-family not specified. Could not
guess default value from (remote-host:(null) or transport.unix.connect-path:(null))
options
> [2013-12-06 17:25:03.797811] W [dict.c:1055:data_to_str] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(+0x528b)
[0x7fb826e3428b] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e)
[0x7fb826e3a25e] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(client_fill_address_family+0x200)
[0x7fb826e39f50]))) 0-dict: data is NULL
> [2013-12-06 17:25:03.797883] W [dict.c:1055:data_to_str] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(+0x528b)
[0x7fb826e3428b] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e)
[0x7fb826e3a25e] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(client_fill_address_family+0x20b)
[0x7fb826e39f5b]))) 0-dict: data is NULL
> [2013-12-06 17:25:03.797909] E [name.c:147:client_fill_address_family]
0-media-replace-brick: transport.address-family not specified. Could not
guess default value from (remote-host:(null) or transport.unix.connect-path:(null))
options
>
>
> I've tried placing the transport.address-family option in various
places, however it hasn't helped.
>
> Any help would be very much appreciated.
>
> Thanks in advance
>
> Alex
>

------------------------------

Message: 28
Date: Tue, 10 Dec 2013 11:04:49 +0530
From: Vijay Bellur <vbellur@xxxxxxxxxx>
To: Diep Pham Van <imeo@xxxxxxxxxx>,        
        "gluster-users@xxxxxxxxxxx"

<gluster-users@xxxxxxxxxxx>
Subject: Re:  [CentOS 6] Upgrade to the glusterfs

version in base or in glusterfs-epel
Message-ID: <52A6A7F9.2090009@xxxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 12/10/2013 09:36 AM, Diep Pham Van wrote:
> On Mon, 9 Dec 2013 19:53:20 +0900
> Nguyen Viet Cuong <mrcuongnv@xxxxxxxxx> wrote:
>
>> There is no glusterfs-server in the "base" repository,
just client.
> Silly me.
> After install and attempt to mount with base version of glusterfs-fuse,
> I realize that I have to change 'backupvolfile-server' mount option
to
> 'backup-volfile-servers'[1].

And a patch to provide backward compatibility for 'backupvolfile-server'

is available now [1].

-Vijay

[1] http://review.gluster.org/6464

>
> Links:
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1023950
>

------------------------------

Message: 29
Date: Tue, 10 Dec 2013 13:39:38 +0800
From: Franco Broi <franco.broi@xxxxxxxxxx>
To: shishir gowda <gowda.shishir@xxxxxxxxx>
Cc: "gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>
Subject: Re: [Gluster-users] Pausing rebalance
Message-ID: <1386653978.1682.125.camel@tc1>
Content-Type: text/plain; charset="utf-8"

On Tue, 2013-12-10 at 10:56 +0530, shishir gowda wrote: 
> Hi Franco,
> 
> 
> If a file is under migration, and a rebalance stop is encountered,
> then rebalance process exits only after the completion of the
> migration.
> 
> That might be one of the reasons why you saw rebalance in progress
> message while trying to add the brick

The status said it was stopped. I didn't do a top on the machine but are
you saying that it was still rebalancing despite saying it had stopped?

> 
> Could you please share the average file size in your setup?
> 

Bit hard to say, I just copied some data from our main processing
system. The sizes range from very small to 10's of gigabytes.

> 
> You could always check the rebalance status command to ensure
> rebalance has indeed completed/stopped before proceeding with the
> add-brick. Using add-brick force while rebalance is on-going should
> not be used in normal scenarios. I do see that in your case, they
show
> stopped/completed. Glusterd logs would help in triaging the issue.

See attached.

> 
> 
> Rebalance re-writes layouts, and migrates data. While this is
> happening, if a add-brick is done, then the cluster might go into
a
> imbalanced stated. Hence, the check if rebalance is in progress while
> doing add-brick

I can see that but as far as I could tell, the rebalance had stopped
according to the status.

Just to be clear, what command restarts the rebalancing?

> 
> 
> With regards,
> Shishir
> 
> 
> 
> On 10 December 2013 10:39, Franco Broi <franco.broi@xxxxxxxxxx>
wrote:
>         
>         Before attempting a rebalance on my existing
distributed
>         Gluster volume
>         I thought I'd do some testing with my
new storage. I created a
>         volume
>         consisting of 4 bricks on the same server
and wrote some data
>         to it. I
>         then added a new brick from a another
server. I ran the
>         fix-layout and
>         wrote some new files and could see them
on the new brick. All
>         good so
>         far, so I started the data rebalance.
After it had been
>         running for a
>         while I wanted to add another brick, which
I obviously
>         couldn't do while
>         it was running so I stopped it. Even with
it stopped It
>         wouldn't let me
>         add a brick so I tried restarting it,
but it wouldn't let me
>         do that
>         either. I presume you just reissue the
start command as
>         there's no
>         restart?
>         
>         [root@nas3 ~]# gluster vol rebalance test-volume
status
>                    

  Node Rebalanced-files
>              size    
  scanned      failures       skipped
>         status run time in secs
>         ---------      -----------
  -----------   -----------
>         -----------   -----------  
------------   --------------
>         localhost        
       7       611.7GB    
     1358
>         0            10
       stopped          4929.00
>         localhost        
       7       611.7GB    
     1358
>         0            10
       stopped          4929.00
>          nas4-10g        
       0        0Bytes    
     1506
>         0            
0      completed            
8.00
>         volume rebalance: test-volume: success:
>         [root@nas3 ~]# gluster vol add-brick test-volume
>         nas4-10g:/data14/gvol
>         volume add-brick: failed: Volume name
test-volume rebalance is
>         in progress. Please retry after completion
>         [root@nas3 ~]# gluster vol rebalance test-volume
start
>         volume rebalance: test-volume: failed:
Rebalance on
>         test-volume is already started
>         
>         In the end I used the force option to
make it start but was
>         that the
>         right thing to do?
>         
>         glusterfs 3.4.1 built on Oct 28 2013 11:01:59
>         Volume Name: test-volume
>         Type: Distribute
>         Volume ID: 56ee0173-aed1-4be6-a809-ee0544f9e066
>         Status: Started
>         Number of Bricks: 5
>         Transport-type: tcp
>         Bricks:
>         Brick1: nas3-10g:/data9/gvol
>         Brick2: nas3-10g:/data10/gvol
>         Brick3: nas3-10g:/data11/gvol
>         Brick4: nas3-10g:/data12/gvol
>         Brick5: nas4-10g:/data13/gvol
>         
>         
>         _______________________________________________
>         Gluster-users mailing list
>         Gluster-users@xxxxxxxxxxx
>         http://supercolony.gluster.org/mailman/listinfo/gluster-users
> 
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: etc-glusterfs-glusterd.vol.log.gz
Type: application/gzip
Size: 7209 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/adc5d486/attachment-0001.bin>

------------------------------

Message: 30
Date: Tue, 10 Dec 2013 11:09:47 +0530
From: Vijay Bellur <vbellur@xxxxxxxxxx>
To: Nguyen Viet Cuong <mrcuongnv@xxxxxxxxx>
Cc: "Gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
Subject: Re:  replace-brick failing -

transport.address-family not specified
Message-ID: <52A6A923.4030208@xxxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 12/08/2013 07:06 PM, Nguyen Viet Cuong wrote:
> Thanks for sharing.
>
> Btw, I do believe that GlusterFS 3.2.x is much more stable than 3.4.x
in
> production.
>

This is quite contrary to what we have seen in the community. From a 
development perspective too, we feel much better about 3.4.1. Are there

specific instances that worked well with 3.2.x which does not work fine

for you in 3.4.x?

Cheers,
Vijay

------------------------------

Message: 31
Date: Tue, 10 Dec 2013 11:30:21 +0530
From: Kaushal M <kshlmster@xxxxxxxxx>
To: Franco Broi <franco.broi@xxxxxxxxxx>
Cc: "gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>
Subject: Re:  Pausing rebalance
Message-ID:

<CAOujamU0J4Tam9ojFAmCoPqSzd5Tm1FeyfMYEBv2znMX9yN=4A@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1

On Tue, Dec 10, 2013 at 11:09 AM, Franco Broi <franco.broi@xxxxxxxxxx>
wrote:
> On Tue, 2013-12-10 at 10:56 +0530, shishir gowda wrote:
>> Hi Franco,
>>
>>
>> If a file is under migration, and a rebalance stop is encountered,
>> then rebalance process exits only after the completion of the
>> migration.
>>
>> That might be one of the reasons why you saw rebalance in progress
>> message while trying to add the brick
>
> The status said it was stopped. I didn't do a top on the machine but
are
> you saying that it was still rebalancing despite saying it had stopped?
>

The 'stopped' status is a little bit misleading. The rebalance process
could have been migrating a large file when the stop command was
issued, so the process would continue migrating that file and quit
once it finished. In this time period, though the status says
'stopped' the rebalance process is actually running, which prevents
other operations from happening. Ideally, we would have a 'stopping'
status which would convey the correct meaning. But for now we can only
verify that a rebalance process has actually stopped by monitoring the
actual rebalance process. The rebalance process is a 'glusterfs'
process with some arguments containing rebalance.

>>
>> Could you please share the average file size in your setup?
>>
>
> Bit hard to say, I just copied some data from our main processing
> system. The sizes range from very small to 10's of gigabytes.
>
>>
>> You could always check the rebalance status command to ensure
>> rebalance has indeed completed/stopped before proceeding with
the
>> add-brick. Using add-brick force while rebalance is on-going should
>> not be used in normal scenarios. I do see that in your case, they
show
>> stopped/completed. Glusterd logs would help in triaging the issue.
>
> See attached.
>
>>
>>
>> Rebalance re-writes layouts, and migrates data. While this is
>> happening, if a add-brick is done, then the cluster might go into
a
>> imbalanced stated. Hence, the check if rebalance is in progress
while
>> doing add-brick
>
> I can see that but as far as I could tell, the rebalance had stopped
> according to the status.
>
> Just to be clear, what command restarts the rebalancing?
>
>>
>>
>> With regards,
>> Shishir
>>
>>
>>
>> On 10 December 2013 10:39, Franco Broi <franco.broi@xxxxxxxxxx>
wrote:
>>
>>         Before attempting a rebalance on my
existing distributed
>>         Gluster volume
>>         I thought I'd do some testing with
my new storage. I created a
>>         volume
>>         consisting of 4 bricks on the same
server and wrote some data
>>         to it. I
>>         then added a new brick from a another
server. I ran the
>>         fix-layout and
>>         wrote some new files and could see
them on the new brick. All
>>         good so
>>         far, so I started the data rebalance.
After it had been
>>         running for a
>>         while I wanted to add another brick,
which I obviously
>>         couldn't do while
>>         it was running so I stopped it. Even
with it stopped It
>>         wouldn't let me
>>         add a brick so I tried restarting
it, but it wouldn't let me
>>         do that
>>         either. I presume you just reissue
the start command as
>>         there's no
>>         restart?
>>
>>         [root@nas3 ~]# gluster vol rebalance
test-volume status
>>                  

    Node Rebalanced-files
>>              size    
  scanned      failures       skipped
>>         status run time in secs
>>         ---------      -----------
  -----------   -----------
>>         -----------   -----------  
------------   --------------
>>         localhost        
       7       611.7GB    
     1358
>>         0          
 10        stopped        
 4929.00
>>         localhost        
       7       611.7GB    
     1358
>>         0          
 10        stopped        
 4929.00
>>          nas4-10g      
         0        0Bytes  
       1506
>>         0          
  0      completed          
  8.00
>>         volume rebalance: test-volume: success:
>>         [root@nas3 ~]# gluster vol add-brick
test-volume
>>         nas4-10g:/data14/gvol
>>         volume add-brick: failed: Volume name
test-volume rebalance is
>>         in progress. Please retry after completion
>>         [root@nas3 ~]# gluster vol rebalance
test-volume start
>>         volume rebalance: test-volume: failed:
Rebalance on
>>         test-volume is already started
>>
>>         In the end I used the force option
to make it start but was
>>         that the
>>         right thing to do?
>>
>>         glusterfs 3.4.1 built on Oct 28 2013
11:01:59
>>         Volume Name: test-volume
>>         Type: Distribute
>>         Volume ID: 56ee0173-aed1-4be6-a809-ee0544f9e066
>>         Status: Started
>>         Number of Bricks: 5
>>         Transport-type: tcp
>>         Bricks:
>>         Brick1: nas3-10g:/data9/gvol
>>         Brick2: nas3-10g:/data10/gvol
>>         Brick3: nas3-10g:/data11/gvol
>>         Brick4: nas3-10g:/data12/gvol
>>         Brick5: nas4-10g:/data13/gvol
>>
>>
>>         _______________________________________________
>>         Gluster-users mailing list
>>         Gluster-users@xxxxxxxxxxx
>>         http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

------------------------------

Message: 32
Date: Tue, 10 Dec 2013 14:32:46 +0800
From: Franco Broi <franco.broi@xxxxxxxxxx>
To: Kaushal M <kshlmster@xxxxxxxxx>
Cc: "gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>
Subject: Re:  Pausing rebalance
Message-ID: <1386657166.1682.130.camel@tc1>
Content-Type: text/plain; charset="UTF-8"

Thanks for clearing that up. I had to wait about 30 minutes for all
rebalancing activity to cease, then I was able to add a new brick.

What does it use to migrate the files? The copy rate was pretty slow
considering both bricks were on the same server, I only saw about
200MB/Sec. Each brick is a 16 disk ZFS raidz2, copying with dd I can get
well over 500MB/Sec.

On Tue, 2013-12-10 at 11:30 +0530, Kaushal M wrote: 
> On Tue, Dec 10, 2013 at 11:09 AM, Franco Broi <franco.broi@xxxxxxxxxx>
wrote:
> > On Tue, 2013-12-10 at 10:56 +0530, shishir gowda wrote:
> >> Hi Franco,
> >>
> >>
> >> If a file is under migration, and a rebalance stop is encountered,
> >> then rebalance process exits only after the completion of
the
> >> migration.
> >>
> >> That might be one of the reasons why you saw rebalance in
progress
> >> message while trying to add the brick
> >
> > The status said it was stopped. I didn't do a top on the machine
but are
> > you saying that it was still rebalancing despite saying it had
stopped?
> >
> 
> The 'stopped' status is a little bit misleading. The rebalance process
> could have been migrating a large file when the stop command was
> issued, so the process would continue migrating that file and quit
> once it finished. In this time period, though the status says
> 'stopped' the rebalance process is actually running, which prevents
> other operations from happening. Ideally, we would have a 'stopping'
> status which would convey the correct meaning. But for now we can
only
> verify that a rebalance process has actually stopped by monitoring
the
> actual rebalance process. The rebalance process is a 'glusterfs'
> process with some arguments containing rebalance.
> 
> >>
> >> Could you please share the average file size in your setup?
> >>
> >
> > Bit hard to say, I just copied some data from our main processing
> > system. The sizes range from very small to 10's of gigabytes.
> >
> >>
> >> You could always check the rebalance status command to ensure
> >> rebalance has indeed completed/stopped before proceeding
with the
> >> add-brick. Using add-brick force while rebalance is on-going
should
> >> not be used in normal scenarios. I do see that in your case,
they show
> >> stopped/completed. Glusterd logs would help in triaging the
issue.
> >
> > See attached.
> >
> >>
> >>
> >> Rebalance re-writes layouts, and migrates data. While this
is
> >> happening, if a add-brick is done, then the cluster might
go into a
> >> imbalanced stated. Hence, the check if rebalance is in progress
while
> >> doing add-brick
> >
> > I can see that but as far as I could tell, the rebalance had
stopped
> > according to the status.
> >
> > Just to be clear, what command restarts the rebalancing?
> >
> >>
> >>
> >> With regards,
> >> Shishir
> >>
> >>
> >>
> >> On 10 December 2013 10:39, Franco Broi <franco.broi@xxxxxxxxxx>
wrote:
> >>
> >>         Before attempting a rebalance
on my existing distributed
> >>         Gluster volume
> >>         I thought I'd do some testing
with my new storage. I created a
> >>         volume
> >>         consisting of 4 bricks on the
same server and wrote some data
> >>         to it. I
> >>         then added a new brick from a
another server. I ran the
> >>         fix-layout and
> >>         wrote some new files and could
see them on the new brick. All
> >>         good so
> >>         far, so I started the data rebalance.
After it had been
> >>         running for a
> >>         while I wanted to add another
brick, which I obviously
> >>         couldn't do while
> >>         it was running so I stopped it.
Even with it stopped It
> >>         wouldn't let me
> >>         add a brick so I tried restarting
it, but it wouldn't let me
> >>         do that
> >>         either. I presume you just reissue
the start command as
> >>         there's no
> >>         restart?
> >>
> >>         [root@nas3 ~]# gluster vol rebalance
test-volume status
> >>                  

    Node Rebalanced-files
> >>              size  
    scanned      failures      
skipped
> >>         status run time in secs
> >>         ---------      -----------
  -----------   -----------
> >>         -----------   -----------
  ------------   --------------
> >>         localhost      
         7       611.7GB  
       1358
> >>         0        
   10        stopped      
   4929.00
> >>         localhost      
         7       611.7GB  
       1358
> >>         0        
   10        stopped      
   4929.00
> >>          nas4-10g    
           0        0Bytes
         1506
> >>         0        
    0      completed        
    8.00
> >>         volume rebalance: test-volume:
success:
> >>         [root@nas3 ~]# gluster vol add-brick
test-volume
> >>         nas4-10g:/data14/gvol
> >>         volume add-brick: failed: Volume
name test-volume rebalance is
> >>         in progress. Please retry after
completion
> >>         [root@nas3 ~]# gluster vol rebalance
test-volume start
> >>         volume rebalance: test-volume:
failed: Rebalance on
> >>         test-volume is already started
> >>
> >>         In the end I used the force option
to make it start but was
> >>         that the
> >>         right thing to do?
> >>
> >>         glusterfs 3.4.1 built on Oct
28 2013 11:01:59
> >>         Volume Name: test-volume
> >>         Type: Distribute
> >>         Volume ID: 56ee0173-aed1-4be6-a809-ee0544f9e066
> >>         Status: Started
> >>         Number of Bricks: 5
> >>         Transport-type: tcp
> >>         Bricks:
> >>         Brick1: nas3-10g:/data9/gvol
> >>         Brick2: nas3-10g:/data10/gvol
> >>         Brick3: nas3-10g:/data11/gvol
> >>         Brick4: nas3-10g:/data12/gvol
> >>         Brick5: nas4-10g:/data13/gvol
> >>
> >>
> >>         _______________________________________________
> >>         Gluster-users mailing list
> >>         Gluster-users@xxxxxxxxxxx
> >>         http://supercolony.gluster.org/mailman/listinfo/gluster-users
> >>
> >>
> >
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users@xxxxxxxxxxx
> > http://supercolony.gluster.org/mailman/listinfo/gluster-users

------------------------------

Message: 33
Date: Tue, 10 Dec 2013 07:42:57 +0000
From: Bobby Jacob <bobby.jacob@xxxxxxxxxxx>
To: Joe Julian <joe@xxxxxxxxxxxxxxxx>
Cc: "gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>
Subject: Re:  Self Heal Issue GlusterFS 3.3.1
Message-ID:

<AC3305F9C186F849B835A3E6D3C9BEFEB5A841@xxxxxxxxxxxxxxx.local>
Content-Type: text/plain; charset="utf-8"

Hi,

Thanks Joe, the split brain files have been removed as you recommended.
How can we deal with this situation as there is no document which solves
such issues. ?

[root@KWTOCUATGS001 83]# gluster volume heal glustervol info
Gathering Heal info on volume glustervol has been successful

Brick KWTOCUATGS001:/mnt/cloudbrick
Number of entries: 14
/Tommy Kolega
<gfid:10429dd5-180c-432e-aa4a-8b1624b86f4b>
<gfid:7883309e-8764-4cf6-82a6-d8d81cb60dd7>
<gfid:3e3d77d6-2818-4766-ae3b-4f582118321b>
<gfid:8bd03482-025c-4c09-8704-60be9ddfdfd8>
<gfid:2685e11a-4eb9-4a92-883e-faa50edfa172>
<gfid:24d83cbd-e621-4330-b0c1-ae1f0fd2580d>
<gfid:197e50fa-bfc0-4651-acaa-1f3d2d73936f>
<gfid:3e094ee9-c9cf-4010-82f4-6d18c1ab9ca0>
<gfid:77783245-4e03-4baf-8cb4-928a57b266cb>
<gfid:70340eaa-7967-41d0-855f-36add745f16f>
<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>
<gfid:b1651457-175a-43ec-b476-d91ae8b52b0b>
/Tommy Kolega/lucene_index

Brick KWTOCUATGS002:/mnt/cloudbrick
Number of entries: 15
<gfid:7883309e-8764-4cf6-82a6-d8d81cb60dd7>
<gfid:0454d0d2-d432-4ac8-8476-02a8522e4a6a>
<gfid:a7e88fd1-6e32-40ab-90f6-ea452242a7c6>
<gfid:00389876-700f-4351-b00e-1c57496eed89>
<gfid:0cd48d89-1dd2-47f6-9311-58224b19446e>
<gfid:081c6657-301a-42a4-9f95-6eeba6c67413>
<gfid:565f1358-449c-45e2-8535-93b5632c0d1e>
<gfid:9c7010ac-5c11-4561-8b86-5c4d6561f34e>
<gfid:25fd406f-63e0-4037-bb01-da282cbe4d76>
<gfid:a109c429-5885-499e-8711-09fdccd396f2>
<gfid:5a8fd3bf-9215-444c-b974-5c280f5699a6>
/Tommy Kolega
/Tommy Kolega/lucene_index
<gfid:c49e9d76-e5d4-47dc-9cf1-3f858f6d07ea>
<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>

Thanks & Regards,
Bobby Jacob

-----Original Message-----
From: Joe Julian [mailto:joe@xxxxxxxxxxxxxxxx]

Sent: Tuesday, December 10, 2013 7:59 AM
To: Bobby Jacob
Cc: gluster-users@xxxxxxxxxxx
Subject: Re:  Self Heal Issue GlusterFS 3.3.1

On Tue, 2013-12-03 at 05:47 +0000, Bobby Jacob wrote:
> Hi,
> 
>  
> 
> I?m running glusterFS 3.3.1 on Centos 6.4. 
> 
> ? Gluster volume status
> 
>  
> 
> Status of volume: glustervol
> 
> Gluster process                

  Port    Online
> Pid
> 
> ----------------------------------------------------------------------
> --------
> 
> Brick KWTOCUATGS001:/mnt/cloudbrick          
          24009   Y
> 20031
> 
> Brick KWTOCUATGS002:/mnt/cloudbrick          
          24009   Y
> 1260
> 
> NFS Server on localhost
>                    
  38467   Y       43320
> 
> Self-heal Daemon on localhost            

 N/A
> Y       43326
> 
> NFS Server on KWTOCUATGS002            
                38467   Y
> 5842
> 
> Self-heal Daemon on KWTOCUATGS002          
            N/A     Y
> 5848
> 
>  
> 
> The self heal stops working and application write only to 1 brick
and 
> it doesn?t replicate. When I check /var/log/glusterfs/glustershd.log
I 
> see the following.:
> 
>  
> 
> [2013-12-03 05:42:32.033563] W [socket.c:410:__socket_keepalive]
> 0-socket: failed to set keep idle on socket 8
> 
> [2013-12-03 05:42:32.033646] W
> [socket.c:1876:socket_server_event_handler] 0-socket.glusterfsd:
> Failed to set keep-alive: Operation not supported
> 
> [2013-12-03 05:42:32.790473] I
> [client-handshake.c:1614:select_server_supported_programs]
> 0-glustervol-client-1: Using Program GlusterFS 3.3.2, Num (1298437),

> Version (330)
> 
> [2013-12-03 05:42:32.790840] I
> [client-handshake.c:1411:client_setvolume_cbk] 0-glustervol-client-1:
> Connected to 172.16.95.153:24009, attached to remote volume 
> '/mnt/cloudbrick'.
> 
> [2013-12-03 05:42:32.790884] I
> [client-handshake.c:1423:client_setvolume_cbk] 0-glustervol-client-1:
> Server and Client lk-version numbers are not same, reopening the fds
> 
> [2013-12-03 05:42:32.791003] I [afr-common.c:3685:afr_notify]
> 0-glustervol-replicate-0: Subvolume 'glustervol-client-1' came back

> up; going online.
> 
> [2013-12-03 05:42:32.791161] I
> [client-handshake.c:453:client_set_lk_version_cbk]
> 0-glustervol-client-1: Server lk version = 1
> 
> [2013-12-03 05:42:32.795103] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:a7e88fd1-6e32-40ab-90f6-ea452242a7c6> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.798064] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:081c6657-301a-42a4-9f95-6eeba6c67413> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.799278] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:565f1358-449c-45e2-8535-93b5632c0d1e> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.800636] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:9c7010ac-5c11-4561-8b86-5c4d6561f34e> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.802223] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:25fd406f-63e0-4037-bb01-da282cbe4d76> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.803339] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:a109c429-5885-499e-8711-09fdccd396f2> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.804308] E
> [afr-self-heal-data.c:1321:afr_sh_data_open_cbk]
> 0-glustervol-replicate-0: open of
> <gfid:5a8fd3bf-9215-444c-b974-5c280f5699a6> failed on child
> glustervol-client-0 (Transport endpoint is not connected)
> 
> [2013-12-03 05:42:32.804877] I
> [client-handshake.c:1614:select_server_supported_programs]
> 0-glustervol-client-0: Using Program GlusterFS 3.3.2, Num (1298437),

> Version (330)
> 
> [2013-12-03 05:42:32.807517] I
> [client-handshake.c:1411:client_setvolume_cbk] 0-glustervol-client-0:
> Connected to 172.16.107.154:24009, attached to remote volume 
> '/mnt/cloudbrick'.
> 
> [2013-12-03 05:42:32.807562] I
> [client-handshake.c:1423:client_setvolume_cbk] 0-glustervol-client-0:
> Server and Client lk-version numbers are not same, reopening the fds
> 
> [2013-12-03 05:42:32.810357] I
> [client-handshake.c:453:client_set_lk_version_cbk]
> 0-glustervol-client-0: Server lk version = 1
> 
> [2013-12-03 05:42:32.827437] E
> [afr-self-heal-data.c:764:afr_sh_data_fxattrop_fstat_done]
> 0-glustervol-replicate-0: Unable to self-heal contents of 
> '<gfid:1262d40d-46a3-4e57-b07b-0fcc972c8403>' (possible split-brain).
> Please delete the file from all but the preferred subvolume.

That file is at
$brick/.glusterfs/12/62/1262d40d-46a3-4e57-b07b-0fcc972c8403

Try picking one to remove like it says.
> 
> [2013-12-03 05:42:39.205157] E
> [afr-self-heal-metadata.c:472:afr_sh_metadata_fix]
> 0-glustervol-replicate-0: Unable to self-heal permissions/ownership
of 
> '<gfid:c590e3fb-a376-4ac9-86a6-14a80814e06f>' (possible split-brain).
> Please fix the file on all backend volumes
> 
> [2013-12-03 05:42:39.215793] E
> [afr-self-heal-metadata.c:472:afr_sh_metadata_fix]
> 0-glustervol-replicate-0: Unable to self-heal permissions/ownership
of 
> '<gfid:c0660768-289f-48ac-b8e5-e5b5a3a4b965>' (possible split-brain).
> Please fix the file on all backend volumes
> 
>  
If that doesn't allow it to heal, you may need to find which filename that's
hardlinked to. ls -li the gfid file at the path I demonstrated earlier.
With that inode number in hand, find $brick -inum $inode_number Once you
know which filenames it's linked with, remove all linked copies from all
but one replica. Then the self-heal can continue successfully.

------------------------------

Message: 34
Date: Tue, 10 Dec 2013 09:30:22 +0100
From: Johan Huysmans <johan.huysmans@xxxxxxxxx>
To: "gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>
Subject:  Structure needs cleaning on some files
Message-ID: <52A6D11E.4030406@xxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi All,

When reading some files we get this error:
md5sum: /path/to/file.xml: Structure needs cleaning

in /var/log/glusterfs/mnt-sharedfs.log we see these errors:
[2013-12-10 08:07:32.256910] W 
[client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-0: remote

operation failed: No such file or directory
[2013-12-10 08:07:32.257436] W 
[client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-1: remote

operation failed: No such file or directory
[2013-12-10 08:07:32.259356] W [fuse-bridge.c:705:fuse_attr_cbk] 
0-glusterfs-fuse: 8230: STAT() /path/to/file.xml => -1 (Structure needs

cleaning)

We are using gluster 3.4.1-3 on CentOS6.
Our servers are 64-bit, our clients 32-bit (we are already using 
--enable-ino32 on the mountpoint)

This is my gluster configuration:
Volume Name: testvolume
Type: Replicate
Volume ID: ca9c2f87-5d5b-4439-ac32-b7c138916df7
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: SRV-1:/gluster/brick1
Brick2: SRV-2:/gluster/brick2
Options Reconfigured:
performance.force-readdirp: on
performance.stat-prefetch: off
network.ping-timeout: 5

And this is how the applications work:
We have 2 client nodes who both have a fuse.glusterfs mountpoint.
On 1 client node we have a application which writes files.
On the other client node we have a application which reads these files.
On the node where the files are written we don't see any problem, and 
can read that file without problems.
On the other node we have problems (error messages above) reading that
file.
The problem occurs when we perform a md5sum on the exact file, when 
perform a md5sum on all files in that directory there is no problem.

How can we solve this problem as this is annoying.
The problem occurs after some time (can be days), an umount and mount of

the mountpoint solves it for some days.
Once it occurs (and we don't remount) it occurs every time.

I hope someone can help me with this problems.

Thanks,
Johan Huysmans

------------------------------

Message: 35
Date: Tue, 10 Dec 2013 08:56:56 +0000
From: "Bernhard Glomm" <bernhard.glomm@xxxxxxxxxxx>
To: vbellur@xxxxxxxxxx, mrcuongnv@xxxxxxxxx
Cc: gluster-users@xxxxxxxxxxx
Subject: Re:  replace-brick failing -

transport.address-family not specified
Message-ID: <03a55549428f5909f0b3db1dee93d8c55e3ba3c3@xxxxxxxxxxx>
Content-Type: text/plain; charset="utf-8"

Am 10.12.2013 06:39:47, schrieb Vijay Bellur:
> On 12/08/2013 07:06 PM, Nguyen Viet Cuong wrote:
> > Thanks for sharing.
> > 
> > Btw, I do believe that GlusterFS 3.2.x is much more stable than
3.4.x in
> > production.
> > 

> This is quite contrary to what we have seen in the community. From
a 
> development perspective too, we feel much better about 3.4.1. Are
there 
> specific instances that worked well with 3.2.x which does not work
fine 
> for you in 3.4.x?

987555 -?is that fixed in 3.5?Or did it even make it into 3.4.2couldn't
find a note on that.Show stopper for moving from?3.2.x to anywhere for
me!
cheersb?
> 
> Cheers,
> Vijay
> 
> 
> 
> 
> 
> 
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

-- 

            Bernhard Glomm

            IT Administration

                  Phone:

                  +49 (30)
86880 134

                  Fax:

                  +49 (30)
86880 100

                  Skype:

                  bernhard.glomm.ecologic

          Ecologic Institut gemeinn?tzige GmbH
| Pfalzburger Str. 43/44 | 10717 Berlin | Germany

          GF: R. Andreas Kraemer | AG: Charlottenburg
HRB 57947 | USt/VAT-IdNr.: DE811963464

          Ecologic? is a Trade Mark (TM) of Ecologic
Institut gemeinn?tzige GmbH

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/475454d4/attachment-0001.html>

------------------------------

Message: 36
Date: Tue, 10 Dec 2013 10:02:14 +0100
From: Johan Huysmans <johan.huysmans@xxxxxxxxx>
To: "gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>
Subject: Re:  Structure needs cleaning on some files
Message-ID: <52A6D896.1020404@xxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

I could reproduce this problem with while my mount point is running in

debug mode.
logfile is attached.

gr.
Johan Huysmans

On 10-12-13 09:30, Johan Huysmans wrote:
> Hi All,
>
> When reading some files we get this error:
> md5sum: /path/to/file.xml: Structure needs cleaning
>
> in /var/log/glusterfs/mnt-sharedfs.log we see these errors:
> [2013-12-10 08:07:32.256910] W 
> [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-0:

> remote operation failed: No such file or directory
> [2013-12-10 08:07:32.257436] W 
> [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-1:

> remote operation failed: No such file or directory
> [2013-12-10 08:07:32.259356] W [fuse-bridge.c:705:fuse_attr_cbk] 
> 0-glusterfs-fuse: 8230: STAT() /path/to/file.xml => -1 (Structure

> needs cleaning)
>
> We are using gluster 3.4.1-3 on CentOS6.
> Our servers are 64-bit, our clients 32-bit (we are already using 
> --enable-ino32 on the mountpoint)
>
> This is my gluster configuration:
> Volume Name: testvolume
> Type: Replicate
> Volume ID: ca9c2f87-5d5b-4439-ac32-b7c138916df7
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: SRV-1:/gluster/brick1
> Brick2: SRV-2:/gluster/brick2
> Options Reconfigured:
> performance.force-readdirp: on
> performance.stat-prefetch: off
> network.ping-timeout: 5
>
> And this is how the applications work:
> We have 2 client nodes who both have a fuse.glusterfs mountpoint.
> On 1 client node we have a application which writes files.
> On the other client node we have a application which reads these files.
> On the node where the files are written we don't see any problem,
and 
> can read that file without problems.
> On the other node we have problems (error messages above) reading
that 
> file.
> The problem occurs when we perform a md5sum on the exact file, when

> perform a md5sum on all files in that directory there is no problem.
>
>
> How can we solve this problem as this is annoying.
> The problem occurs after some time (can be days), an umount and mount

> of the mountpoint solves it for some days.
> Once it occurs (and we don't remount) it occurs every time.
>
>
> I hope someone can help me with this problems.
>
> Thanks,
> Johan Huysmans
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gluster_debug.log
Type: text/x-log
Size: 16600 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/bdf626dc/attachment-0001.bin>

------------------------------

Message: 37
Date: Tue, 10 Dec 2013 10:08:43 +0100
From: Heiko Kr?mer <hkraemer@xxxxxxxxxxxx>
To: gluster-users@xxxxxxxxxxx
Subject: Re:  Gluster infrastructure question
Message-ID: <52A6DA1B.3030209@xxxxxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi guys,

thanks for all these reports. Well, I think I'll change my Raid level
to 6 and let the Raid controller build and rebuild all Raid members
and replicate again with glusterFS. I get more capacity but I need to
check if the write throughput acceptable.

I think, I can't take advantage of using glusterFS with a lot of
Bricks because I've found more cons as pros in my case.

@Ben thx for this very detailed document!

Cheers and Thanks
Heiko

On 10.12.2013 00:38, Dan Mons wrote:
> On 10 December 2013 08:09, Joe Julian <joe@xxxxxxxxxxxxxxxx>
> wrote:
>> Replicas are defined in the order bricks are listed in the volume
>> create command. So gluster volume create myvol replica 2
>> server1:/data/brick1 server2:/data/brick1 server3:/data/brick1
>> server4:/data/brick1 will replicate between server1 and server2
>> and replicate between server3 and server4.
>> 
>> Bricks added to a replica 2 volume after it's been created will
>> require pairs of bricks,
>> 
>> The best way to "force" replication to happen on another
server
>> is to just define it that way.
> 
> Yup, that's understood.  The problem is when (for argument's
sake)
> :
> 
> * We've defined 4 hosts with 10 disks each * Each individual disk
> is a brick * Replication is defined correctly when creating the
> volume initially * I'm on holidays, my employer buys a single node,
> configures it brick-per-disk, and the IT junior adds it to the
> cluster
> 
> All good up until that final point, and then I've got that fifth
> node at the end replicating to itself.  Node goes down some months
> later, chaos ensues.
> 
> Not a GlusterFS/technology problem, but a problem with what
> frequently happens at a human level.  As a sysadmin, these are
also
> things I need to work around, even if it means deviating from best
> practices. :)
> 
> -Dan _______________________________________________ Gluster-users
> mailing list Gluster-users@xxxxxxxxxxx 
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> 

- -- 
Anynines.com

Avarteq GmbH
B.Sc. Informatik
Heiko Kr?mer
CIO
Twitter: @anynines

- ----
Gesch?ftsf?hrer: Alexander Fai?t, Dipl.-Inf.(FH) Julian Fischer
Handelsregister: AG Saarbr?cken HRB 17413, Ust-IdNr.: DE262633168
Sitz: Saarbr?cken
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJSptoTAAoJELxFogM4ixOFJTsIAJBWed3AGiiI+PDC2ubfboKc
UPkMc+zuirRh2+QJBAoZ4CsAv9eIZ5NowclSSby9PTq2XRjjLvMdKuI+IbXCRT4j
AbMLYfP3g4Q+agXnY6N6WJ6ZIqXQ8pbCK3shYp9nBfVYkiDUT1bGk0WcgQmEWTCw
ta1h17LYkworIDRtqWQAl4jr4JR4P3x4cmwOZiHCVCtlyOP02x/fN4dji6nyOtuB
kQPBVsND5guQNU8Blg5cQoES5nthtuwJdkWXB+neaCZd/u3sexVSNe5m15iWbyYg
mAoVvlBJ473IKATlxM5nVqcUhmjFwNcc8MMwczXxTkwniYzth53BSoltPn7kIx4=
=epys
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hkraemer.vcf
Type: text/x-vcard
Size: 277 bytes
Desc: not available
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/f663943d/attachment-0001.vcf>

------------------------------

Message: 38
Date: Tue, 10 Dec 2013 10:42:43 +0100
From: Johan Huysmans <johan.huysmans@xxxxxxxxx>
To: gluster-users@xxxxxxxxxxx, bill.mair@xxxxxx
Subject: Re:  Errors from PHP stat() on files and

directories in a glusterfs mount
Message-ID: <52A6E213.3000109@xxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

Hi,

It seems I have a related problem (just posted this on the mailing list).
Do you already have a solution for this problem?

gr.
Johan Huysmans

On 05-12-13 20:05, Bill Mair wrote:
> Hi,
>
> I'm trying to use glusterfs to mirror the ownCloud "data"
area between 
> 2 servers.
>
> They are using debian jessie due to some dependancies that I have
for 
> other components.
>
> This is where my issue rears it's ugly head. This is failing because
I 
> can't stat the files and directories on my glusterfs mount.
>
> /var/www/owncloud/data is where I am mounting the volume and I can

> reproduce the error using a simple php test application, so I don't

> think that it is apache or owncloud related.
>
> I'd be grateful for any pointers on how to resolve this problem.
>
> Thanks,
>
>     Bill
>
> Attached is "simple.php" test and the results of executing
"strace 
> php5 simple.php" twice, once with the glusterfs mounted 
> (simple.php.strace-glusterfs) and once against the file system when

> unmounted (simple.php.strace-unmounted).
>
> ------------------------------------------------------------------------
>
> Here is what I get in the gluster log when I run the test (as root):
>
> /var/log/glusterfs/var-www-owncloud-data.log
>
> [2013-12-05 18:33:50.802250] D 
> [client-handshake.c:185:client_start_ping] 0-gv-ocdata-client-0: 
> returning as transport is already disconnected OR there are no frames

> (0 || 0)
> [2013-12-05 18:33:50.825132] D 
> [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 
> 0-gv-ocdata-replicate-0: pending_matrix: [ 0 0 ]
> [2013-12-05 18:33:50.825322] D 
> [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 
> 0-gv-ocdata-replicate-0: pending_matrix: [ 0 0 ]
> [2013-12-05 18:33:50.825393] D 
> [afr-self-heal-common.c:887:afr_mark_sources] 0-gv-ocdata-replicate-0:

> Number of sources: 0
> [2013-12-05 18:33:50.825456] D 
> [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type]

> 0-gv-ocdata-replicate-0: returning read_child: 0
> [2013-12-05 18:33:50.825511] D 
> [afr-common.c:1380:afr_lookup_select_read_child] 
> 0-gv-ocdata-replicate-0: Source selected as 0 for /
> [2013-12-05 18:33:50.825579] D 
> [afr-common.c:1117:afr_lookup_build_response_params] 
> 0-gv-ocdata-replicate-0: Building lookup response from 0
> [2013-12-05 18:33:50.827069] D 
> [afr-common.c:131:afr_lookup_xattr_req_prepare] 
> 0-gv-ocdata-replicate-0: /check.txt: failed to get the gfid from dict
> [2013-12-05 18:33:50.829409] D 
> [client-handshake.c:185:client_start_ping] 0-gv-ocdata-client-0: 
> returning as transport is already disconnected OR there are no frames

> (0 || 0)
> [2013-12-05 18:33:50.836719] D 
> [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 
> 0-gv-ocdata-replicate-0: pending_matrix: [ 0 0 ]
> [2013-12-05 18:33:50.836870] D 
> [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 
> 0-gv-ocdata-replicate-0: pending_matrix: [ 0 0 ]
> [2013-12-05 18:33:50.836941] D 
> [afr-self-heal-common.c:887:afr_mark_sources] 0-gv-ocdata-replicate-0:

> Number of sources: 0
> [2013-12-05 18:33:50.837002] D 
> [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type]

> 0-gv-ocdata-replicate-0: returning read_child: 0
> [2013-12-05 18:33:50.837058] D 
> [afr-common.c:1380:afr_lookup_select_read_child] 
> 0-gv-ocdata-replicate-0: Source selected as 0 for /check.txt
> [2013-12-05 18:33:50.837129] D 
> [afr-common.c:1117:afr_lookup_build_response_params] 
> 0-gv-ocdata-replicate-0: Building lookup response from 0
>
> Other bits of information
>
> root@bbb-1:/var/www/owncloud# uname -a
> Linux bbb-1 3.8.13-bone30 #1 SMP Thu Nov 14 02:59:07 UTC 2013 armv7l

> GNU/Linux
>
> root@bbb-1:/var/www/owncloud# dpkg -l glusterfs-*
> Desired=Unknown/Install/Remove/Purge/Hold
> | 
> Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> ||/ Name Version              
      Architecture Description
> +++-============================================-===========================-===========================-==============================================================================================
> ii  glusterfs-client 3.4.1-1          
          armhf clustered 
> file-system (client package)
> ii  glusterfs-common 3.4.1-1          
          armhf GlusterFS 
> common libraries and translator modules
> ii  glusterfs-server 3.4.1-1          
          armhf clustered 
> file-system (server package)
>
> mount
>
> bbb-1:gv-ocdata on /var/www/owncloud/data type fuse.glusterfs 
> (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
>
> /etc/fstab
>
> UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx /sdhc ext4 defaults 0 0
> bbb-1:gv-ocdata /var/www/owncloud/data glusterfs 
> defaults,_netdev,log-level=DEBUG 0 0
>
> ls -al on the various paths
>
> root@bbb-1:/var/log/glusterfs# ll -d /sdhc/
> drwxrwxr-x 7 root root 4096 Nov 28 19:15 /sdhc/
>
> root@bbb-1:/var/log/glusterfs# ll -d /sdhc/gv-ocdata/
> drwxrwx--- 5 www-data www-data 4096 Dec  5 00:50 /sdhc/gv-ocdata/
>
> root@bbb-1:/var/log/glusterfs# ll -d /sdhc/gv-ocdata/check.txt
> -rw-r--r-- 2 root root 10 Dec  5 00:50 /sdhc/gv-ocdata/check.txt
>
> root@bbb-1:/var/www/owncloud# ll -d /var/www/owncloud/data/
> drwxrwx--- 5 www-data www-data 4096 Dec  5 00:50 /var/www/owncloud/data/
>
> root@bbb-1:/var/www/owncloud# ll -d /var/www/owncloud/data/check.txt
> -rw-r--r-- 1 root root 10 Dec  5 00:50 /var/www/owncloud/data/check.txt
>
> file & dir attr information:
>
> root@bbb-1:/var/www/owncloud# attr -l /var/www/owncloud/data
> Attribute "glusterfs.volume-id" has a 16 byte value for

> /var/www/owncloud/data
>
> root@bbb-1:/var/www/owncloud# attr -l /var/www/owncloud/data/check.txt
> root@bbb-1:/var/www/owncloud#
>
> root@bbb-1:/var/www/owncloud# attr -l /sdhc/gv-ocdata/
> Attribute "glusterfs.volume-id" has a 16 byte value for
/sdhc/gv-ocdata/
> Attribute "gfid" has a 16 byte value for /sdhc/gv-ocdata/
> Attribute "glusterfs.dht" has a 16 byte value for /sdhc/gv-ocdata/
> Attribute "afr.gv-ocdata-client-0" has a 12 byte value for

> /sdhc/gv-ocdata/
> Attribute "afr.gv-ocdata-client-1" has a 12 byte value for

> /sdhc/gv-ocdata/
>
> root@bbb-1:/var/www/owncloud# attr -l /sdhc/gv-ocdata/check.txt
> Attribute "gfid" has a 16 byte value for /sdhc/gv-ocdata/check.txt
> Attribute "afr.gv-ocdata-client-0" has a 12 byte value for

> /sdhc/gv-ocdata/check.txt
> Attribute "afr.gv-ocdata-client-1" has a 12 byte value for

> /sdhc/gv-ocdata/check.txt
> root@bbb-1:/var/www/owncloud#
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/d77e25bb/attachment-0001.html>

------------------------------

Message: 39
Date: Tue, 10 Dec 2013 21:03:36 +1100
From: Andrew Lau <andrew@xxxxxxxxxxxxxx>
To: Ben Turner <bturner@xxxxxxxxxx>
Cc: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
Subject: Re:  Gluster infrastructure question
Message-ID:

<CAD7dF9c3uexEG++1YEHwh3zw7a1Xy+=Co_xO+zrDrggDuV2DJQ@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="utf-8"

Hi Ben,

For glusterfs would you recommend the enterprise-storage
or throughput-performance tuned profile?

Thanks,
Andrew

On Tue, Dec 10, 2013 at 6:31 AM, Ben Turner <bturner@xxxxxxxxxx>
wrote:

> ----- Original Message -----
> > From: "Ben Turner" <bturner@xxxxxxxxxx>
> > To: "Heiko Kr?mer" <hkraemer@xxxxxxxxxxx>
> > Cc: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
> > Sent: Monday, December 9, 2013 2:26:45 PM
> > Subject: Re: [Gluster-users] Gluster infrastructure question
> >
> > ----- Original Message -----
> > > From: "Heiko Kr?mer" <hkraemer@xxxxxxxxxxx>
> > > To: "gluster-users@xxxxxxxxxxx List" <gluster-users@xxxxxxxxxxx>
> > > Sent: Monday, December 9, 2013 8:18:28 AM
> > > Subject:  Gluster infrastructure question
> > >
> > > -----BEGIN PGP SIGNED MESSAGE-----
> > > Hash: SHA1
> > >
> > > Heyho guys,
> > >
> > > I'm running since years glusterfs in a small environment
without big
> > > problems.
> > >
> > > Now I'm going to use glusterFS for a bigger cluster but
I've some
> > > questions :)
> > >
> > > Environment:
> > > * 4 Servers
> > > * 20 x 2TB HDD, each
> > > * Raidcontroller
> > > * Raid 10
> > > * 4x bricks => Replicated, Distributed volume
> > > * Gluster 3.4
> > >
> > > 1)
> > > I'm asking me, if I can delete the raid10 on each server
and create
> > > for each HDD a separate brick.
> > > In this case have a volume 80 Bricks so 4 Server x 20 HDD's.
Is there
> > > any experience about the write throughput in a production
system with
> > > many of bricks like in this case? In addition i'll get double
of HDD
> > > capacity.
> >
> > Have a look at:
> >
> > http://rhsummit.files.wordpress.com/2012/03/england-rhs-performance.pdf
>
> That one was from 2012, here is the latest:
>
>
> http://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf
>
> -b
>
> > Specifically:
> >
> > ? RAID arrays
> > ? More RAID LUNs for better concurrency
> > ? For RAID6, 256-KB stripe size
> >
> > I use a single RAID 6 that is divided into several LUNs for my
bricks.
>  For
> > example, on my Dell servers(with PERC6 RAID controllers) each
server has
> 12
> > disks that I put into raid 6.  Then I break the RAID 6 into
6 LUNs and
> > create a new PV/VG/LV for each brick.  From there I follow
the
> > recommendations listed in the presentation.
> >
> > HTH!
> >
> > -b
> >
> > > 2)
> > > I've heard a talk about glusterFS and out scaling. The main
point was
> > > if more bricks are in use, the scale out process will take
a long
> > > time. The problem was/is the Hash-Algo. So I'm asking me
how is it if
> > > I've one very big brick (Raid10 20TB on each server) or
I've much more
> > > bricks, what's faster and is there any issues?
> > > Is there any experiences ?
> > >
> > > 3)
> > > Failover of a HDD is for a raid controller with HotSpare
HDD not a big
> > > deal. Glusterfs will rebuild automatically if a brick fails
and there
> > > are no data present, this action will perform a lot of network
traffic
> > > between the mirror bricks but it will handle it equal as
the raid
> > > controller right ?
> > >
> > >
> > >
> > > Thanks and cheers
> > > Heiko
> > >
> > >
> > >
> > > - --
> > > Anynines.com
> > >
> > > Avarteq GmbH
> > > B.Sc. Informatik
> > > Heiko Kr?mer
> > > CIO
> > > Twitter: @anynines
> > >
> > > - ----
> > > Gesch?ftsf?hrer: Alexander Fai?t, Dipl.-Inf.(FH) Julian
Fischer
> > > Handelsregister: AG Saarbr?cken HRB 17413, Ust-IdNr.: DE262633168
> > > Sitz: Saarbr?cken
> > > -----BEGIN PGP SIGNATURE-----
> > > Version: GnuPG v1.4.14 (GNU/Linux)
> > > Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
> > >
> > > iQEcBAEBAgAGBQJSpcMfAAoJELxFogM4ixOF/ncH/3L9DvOWHrF0XBqCgeT6QQ6B
> > > lDwtXiD9xoznht0Zs2S9LA9Z7r2l5/fzMOUSOawEMv6M16Guwq3gQ1lClUi4Iwj0
> > > GKKtYQ6F4aG4KXHY4dlu1QKT5OaLk8ljCQ47Tc9aAiJMhfC1/IgQXOslFv26utdJ
> > > N9jxiCl2+r/tQvQRw6mA4KAuPYPwOV+hMtkwfrM4UsIYGGbkNPnz1oqmBsfGdSOs
> > > TJh6+lQRD9KYw72q3I9G6ZYlI7ylL9Q7vjTroVKH232pLo4G58NLxyvWvcOB9yK6
> > > Bpf/gRMxFNKA75eW5EJYeZ6EovwcyCAv7iAm+xNKhzsoZqbBbTOJxS5zKm4YWoY=
> > > =bDly
> > > -----END PGP SIGNATURE-----
> > >
> > > _______________________________________________
> > > Gluster-users mailing list
> > > Gluster-users@xxxxxxxxxxx
> > > http://supercolony.gluster.org/mailman/listinfo/gluster-users
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users@xxxxxxxxxxx
> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/b19779ff/attachment-0001.html>

------------------------------

Message: 40
Date: Tue, 10 Dec 2013 15:34:56 +0530
From: Vijay Bellur <vbellur@xxxxxxxxxx>
To: Bernhard Glomm <bernhard.glomm@xxxxxxxxxxx>, mrcuongnv@xxxxxxxxx
Cc: gluster-users@xxxxxxxxxxx
Subject: Re:  replace-brick failing -

transport.address-family not specified
Message-ID: <52A6E748.5070300@xxxxxxxxxx>
Content-Type: text/plain; charset=UTF-8; format=flowed

On 12/10/2013 02:26 PM, Bernhard Glomm wrote:
> Am 10.12.2013 06:39:47, schrieb Vijay Bellur:
>
>     On 12/08/2013 07:06 PM, Nguyen Viet Cuong wrote:
>
>         Thanks for sharing.
>
>         Btw, I do believe that GlusterFS 3.2.x
is much more stable than
>         3.4.x in
>         production.
>
>
>     This is quite contrary to what we have seen in the community.
>From a
>     development perspective too, we feel much better about
3.4.1. Are there
>     specific instances that worked well with 3.2.x which
does not work fine
>     for you in 3.4.x?
>
>
> 987555 - is that fixed in 3.5?
>
> Or did it even make it into 3.4.2
>
> couldn't find a note on that.
>

Yes, this will be part of 3.4.2. Note that the original problem was due

to libvirt being rigid about the ports that it needs to use for 
migrations. AFAIK this has been addressed in upstream libvirt as well.
Through this bug fix, glusterfs provides a mechanism where it can use a

separate range of ports for bricks. This configuration can be enabled to

work with other applications that do not adhere with guidelines laid out

by IANA.

Cheers,
Vijay

------------------------------

Message: 41
Date: Tue, 10 Dec 2013 15:38:16 +0530
From: Vijay Bellur <vbellur@xxxxxxxxxx>
To: Alexandru Coseru <alex.coseru@xxxxxxxxxx>,

gluster-users@xxxxxxxxxxx
Subject: Re:  Gluster - replica - Unable to self-heal

contents of '/' (possible split-brain)
Message-ID: <52A6E810.9050900@xxxxxxxxxx>
Content-Type: text/plain; charset=windows-1252; format=flowed

On 12/09/2013 07:21 PM, Alexandru Coseru wrote:

>
> [2013-12-09 13:20:52.066978] E
> [afr-self-heal-common.c:197:afr_sh_print_split_brain_log]
> 0-stor1-replicate-0: Unable to self-heal contents of '/' (possible
> split-brain). Please delete the file from all but the preferred
> subvolume.- Pending matrix:  [ [ 0 2 ] [ 2 0 ] ]
>
> [2013-12-09 13:20:52.067386] E
> [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk]
> 0-stor1-replicate-0: background  meta-data self-heal failed on
/
>
> [2013-12-09 13:20:52.067452] E [mount3.c:290:mnt3svc_lookup_mount_cbk]
> 0-nfs: error=Input/output error
>
> [2013-12-09 13:20:53.092039] E
> [afr-self-heal-common.c:197:afr_sh_print_split_brain_log]
> 0-stor1-replicate-0: Unable to self-heal contents of '/' (possible
> split-brain). Please delete the file from all but the preferred
> subvolume.- Pending matrix:  [ [ 0 2 ] [ 2 0 ] ]
>
> [2013-12-09 13:20:53.092497] E
> [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk]
> 0-stor1-replicate-0: background  meta-data self-heal failed on
/
>
> [2013-12-09 13:20:53.092559] E [mount3.c:290:mnt3svc_lookup_mount_cbk]
> 0-nfs: error=Input/output error
>
> What I?m doing wrong ?

Looks like there is a metadata split-brain on /.

The split-brain resolution document at [1] can possibly be of help here.

-Vijay

[1] https://github.com/gluster/glusterfs/blob/master/doc/split-brain.md

>
> PS:  Volume stor_fast works like a charm.
>

Good to know, thanks!

------------------------------

Message: 42
Date: Tue, 10 Dec 2013 11:59:44 +0100
From: "Mariusz Sobisiak" <MSobisiak@xxxxxx>
To: <gluster-users@xxxxxxxxxxx>
Subject:  Error after crash of Virtual Machine during

migration
Message-ID:

<507D8C234E515F4F969362F9666D7EBBE875D1@xxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain;            
    charset="us-ascii"

Greetings,

Legend:
storage-gfs-3-prd - the first gluster.
storage-1-saas - new gluster where "the first gluster" had to
be
migrated.
storage-gfs-4-prd - the second gluster (which had to be migrated later).

I've started command replace-brick:
'gluster volume replace-brick sa_bookshelf storage-gfs-3-prd:/ydp/shared
storage-1-saas:/ydp/shared start'

During that Virtual Machine (Xen) has crashed. Now I can't abort
migration and continue it again.

When I try:
'# gluster volume replace-brick sa_bookshelf
storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared abort'
The command lasts about 5 minutes then finishes with no results. Apart
from that Gluster after that command starts behave very strange. 
For example I can't do '# gluster volume heal sa_bookshelf info' because
it lasts about 5 minutes and returns black screen (the same like abort).

Then I restart Gluster server and Gluster returns to normal work except
the replace-brick commands. When I do:
'# gluster volume replace-brick sa_bookshelf
storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared status'
I get:
Number of files migrated = 0       Current file=
I can do 'volume heal info' commands etc. until I call the command:
'# gluster volume replace-brick sa_bookshelf
storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared abort'.

# gluster --version
glusterfs 3.3.1 built on Oct 22 2012 07:54:24 Repository revision:
git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS
comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU
General Public License.

Brick (/ydp/shared) logs (repeats the same constantly):
[2013-12-06 11:29:44.790299] W [dict.c:995:data_to_str]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab
) [0x7ff4a5d35fcb]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_r
emote_sockaddr+0x15d) [0x7ff4a5d3d64d]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address
_family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL
[2013-12-06 11:29:44.790402] W [dict.c:995:data_to_str]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab
) [0x7ff4a5d35fcb]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_r
emote_sockaddr+0x15d) [0x7ff4a5d3d64d]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address
_family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL
[2013-12-06 11:29:44.790465] E [name.c:141:client_fill_address_family]
0-sa_bookshelf-replace-brick: transport.address-family not specified.
Could not guess default value from (remote-host:(null) or
transport.unix.connect-path:(null)) options
[2013-12-06 11:29:47.791037] W [dict.c:995:data_to_str]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab
) [0x7ff4a5d35fcb]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_r
emote_sockaddr+0x15d) [0x7ff4a5d3d64d]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address
_family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL
[2013-12-06 11:29:47.791141] W [dict.c:995:data_to_str]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab
) [0x7ff4a5d35fcb]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_r
emote_sockaddr+0x15d) [0x7ff4a5d3d64d]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address
_family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL
[2013-12-06 11:29:47.791174] E [name.c:141:client_fill_address_family]
0-sa_bookshelf-replace-brick: transport.address-family not specified.
Could not guess default value from (remote-host:(null) or
transport.unix.connect-path:(null)) options
[2013-12-06 11:29:50.791775] W [dict.c:995:data_to_str]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab
) [0x7ff4a5d35fcb]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_r
emote_sockaddr+0x15d) [0x7ff4a5d3d64d]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address
_family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL
[2013-12-06 11:29:50.791986] W [dict.c:995:data_to_str]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab
) [0x7ff4a5d35fcb]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_r
emote_sockaddr+0x15d) [0x7ff4a5d3d64d]
(-->/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address
_family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL
[2013-12-06 11:29:50.792046] E [name.c:141:client_fill_address_family]
0-sa_bookshelf-replace-brick: transport.address-family not specified.
Could not guess default value from (remote-host:(null) or
transport.unix.connect-path:(null)) options

# gluster volume info

Volume Name: sa_bookshelf
Type: Distributed-Replicate
Volume ID: 74512f52-72ec-4538-9a54-4e50c4691722
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: storage-gfs-3-prd:/ydp/shared
Brick2: storage-gfs-4-prd:/ydp/shared
Brick3: storage-gfs-3-prd:/ydp/shared2
Brick4: storage-gfs-4-prd:/ydp/shared2

# gluster volume status
Status of volume: sa_bookshelf
Gluster process                

  Port    Online
Pid
------------------------------------------------------------------------
------
Brick storage-gfs-3-prd:/ydp/shared          
          24009   Y
758
Brick storage-gfs-4-prd:/ydp/shared          
          24009   Y
730
Brick storage-gfs-3-prd:/ydp/shared2          
         24010   Y
764
Brick storage-gfs-4-prd:/ydp/shared2          
         24010   Y
4578
NFS Server on localhost              
                  38467  
Y
770
Self-heal Daemon on localhost            
              N/A     Y
776
NFS Server on storage-1-saas            
               38467   Y
840
Self-heal Daemon on storage-1-saas            
         N/A     Y
846
NFS Server on storage-gfs-4-prd            
            38467   Y
4584
Self-heal Daemon on storage-gfs-4-prd          
        N/A     Y
4590

storage-gfs-3-prd:~# gluster peer status Number of Peers: 2

Hostname: storage-1-saas
Uuid: 37b9d881-ce24-4550-b9de-6b304d7e9d07
State: Peer in Cluster (Connected)

Hostname: storage-gfs-4-prd
Uuid: 4c384f45-873b-4c12-9683-903059132c56
State: Peer in Cluster (Connected)

(from storage-1-saas)# gluster peer status Number of Peers: 2

Hostname: 172.16.3.60
Uuid: 1441a7b0-09d2-4a40-a3ac-0d0e546f6884
State: Peer in Cluster (Connected)

Hostname: storage-gfs-4-prd
Uuid: 4c384f45-873b-4c12-9683-903059132c56
State: Peer in Cluster (Connected)

Clients work properly.
I googled for that but I found that was a bug but in 3.3.0 version. How
can I repair that and continue my migration? Thank You for any help.

BTW: I moved Gluster Server via Gluster 3.4: Brick Restoration - Replace
Crashed Server how to.

Regards,
Mariusz

------------------------------

Message: 43
Date: Tue, 10 Dec 2013 12:52:29 +0100
From: Johan Huysmans <johan.huysmans@xxxxxxxxx>
To: "gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>
Subject: Re:  Structure needs cleaning on some files
Message-ID: <52A7007D.6020005@xxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

Hi All,

It seems I can easily reproduce the problem.

* on node 1 create a file (touch , cat , ...).
* on node 2 take md5sum of direct file (md5sum /path/to/file)
* on node 1 move file to other name (mv file file1)
* on node 2 take md5sum of direct file (md5sum /path/to/file), this is

still working although the file is not really there
* on node 1 change file content
* on node 2 take md5sum of direct file (md5sum /path/to/file), this is

still working and has a changed md5sum

This is really strange behaviour.
Is this normal, can this be altered with a a setting?

Thanks for any info,
gr.
Johan

On 10-12-13 10:02, Johan Huysmans wrote:
> I could reproduce this problem with while my mount point is running
in 
> debug mode.
> logfile is attached.
>
> gr.
> Johan Huysmans
>
> On 10-12-13 09:30, Johan Huysmans wrote:
>> Hi All,
>>
>> When reading some files we get this error:
>> md5sum: /path/to/file.xml: Structure needs cleaning
>>
>> in /var/log/glusterfs/mnt-sharedfs.log we see these errors:
>> [2013-12-10 08:07:32.256910] W 
>> [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-0:

>> remote operation failed: No such file or directory
>> [2013-12-10 08:07:32.257436] W 
>> [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-1:

>> remote operation failed: No such file or directory
>> [2013-12-10 08:07:32.259356] W [fuse-bridge.c:705:fuse_attr_cbk]

>> 0-glusterfs-fuse: 8230: STAT() /path/to/file.xml => -1 (Structure

>> needs cleaning)
>>
>> We are using gluster 3.4.1-3 on CentOS6.
>> Our servers are 64-bit, our clients 32-bit (we are already using

>> --enable-ino32 on the mountpoint)
>>
>> This is my gluster configuration:
>> Volume Name: testvolume
>> Type: Replicate
>> Volume ID: ca9c2f87-5d5b-4439-ac32-b7c138916df7
>> Status: Started
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: SRV-1:/gluster/brick1
>> Brick2: SRV-2:/gluster/brick2
>> Options Reconfigured:
>> performance.force-readdirp: on
>> performance.stat-prefetch: off
>> network.ping-timeout: 5
>>
>> And this is how the applications work:
>> We have 2 client nodes who both have a fuse.glusterfs mountpoint.
>> On 1 client node we have a application which writes files.
>> On the other client node we have a application which reads these
files.
>> On the node where the files are written we don't see any problem,
and 
>> can read that file without problems.
>> On the other node we have problems (error messages above) reading

>> that file.
>> The problem occurs when we perform a md5sum on the exact file,
when 
>> perform a md5sum on all files in that directory there is no problem.
>>
>>
>> How can we solve this problem as this is annoying.
>> The problem occurs after some time (can be days), an umount and
mount 
>> of the mountpoint solves it for some days.
>> Once it occurs (and we don't remount) it occurs every time.
>>
>>
>> I hope someone can help me with this problems.
>>
>> Thanks,
>> Johan Huysmans
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users@xxxxxxxxxxx
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20131210/32f9069c/attachment-0001.html>

------------------------------

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users

End of Gluster-users Digest, Vol 68, Issue 11
*********************************************

**

This email and any attachments may contain information that is confidential and/or privileged for the sole use of the intended recipient.  Any use, review, disclosure, copying, distribution or reliance by others, and any forwarding of this email or its contents, without the express permission of the sender is strictly prohibited by law.  If you are not the intended recipient, please contact the sender immediately, delete the e-mail and destroy all copies.

**
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users