Self healing of 3.3.0 cause our 2 bricks replicated cluster freeze (client read/write timeout)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



when you mount xfs, also use the inode64 option. That will help with xfs
performance.

My offhand guess is you are likely running into limited network bandwidth
for the 2 bricks to sync. As the network gets flooded nfs response gets
poor. Make sure you are getting full-duplex connections - or upgrade your
network to 10G or (even better) Infiniband.


On Mon, Nov 26, 2012 at 1:46 AM, ZHANG Cheng <czhang.oss at gmail.com> wrote:

> Early this morning our 2 bricks replicated cluster had an outage. The
> disk space for one of the brick server (brick02) was used up. When we
> responded to the disk full alert, the issue already lasted for a few
> hours. We reclaimed some disk space, and reboot the brick02 server,
> expecting once it come back it will go self healing.
>
> It did go self healing, but just after couple minutes, access to
> gluster filesystem freeze. Tons of "nfs: server brick not responding,
> still trying" popped up in dmesg. The load average on app server went
> up to 200 something from usual 0.10. We had to shutdown brick02 server
> or stop gluster server process on it, to get the gluster cluster back
> working.
>
> How could we deal with this issue? Thanks in advance.
>
> Our gluster setup is followed the official doc.
>
> gluster> volume info
>
> Volume Name: staticvol
> Type: Replicate
> Volume ID: fdcbf635-5faf-45d6-ab4e-be97c74d7715
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: brick01:/exports/static
> Brick2: brick02:/exports/static
>
> Underlying filesystem is xfs (on a lvm volume), as:
> /dev/mapper/vg_node-brick on /exports/static type xfs
> (rw,noatime,nodiratime,nobarrier,logbufs=8)
>
> The brick servers don't act as gluster client.
>
> Our app servers are the gluster client, mount via nfs.
> brick:/staticvol on /mnt/gfs-static type nfs
> (rw,noatime,nodiratime,vers=3,rsize=8192,wsize=8192,addr=10.10.10.51)
>
> brick is a DNS round-robin record for brick01 and brick02.
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20121128/96357baf/attachment-0001.html>


[Index of Archives]     [Gluster Development]     [Linux Filesytems Development]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux