Re: gluster 3.4 self-heal

Hi Ravishankar,

thank you for the explanation.
I expected a performance hit after such a long shutdown; the only problem is that I couldn't tell
whether the healing was progressing or not.
After launching gluster volume heal vol1 full I can see the number of files in the .glusterfs/indices/xattrop/
directory decreasing, but at this rate it would take two weeks to finish; maybe I would rather delete the volume
and recreate it from scratch with 3.5.
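
For reference, this is roughly how I am watching the progress (the brick path /data/brick1 below is just a placeholder; substitute the actual brick directory):

    # count pending-heal entries on the brick; re-run periodically
    # and compare the counts to estimate the heal rate
    ls /data/brick1/.glusterfs/indices/xattrop/ | wc -l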

Thanks
Ivano

On 5/27/14 7:35 PM, Ravishankar N wrote:
On 05/27/2014 08:47 PM, Ivano Talamo wrote:
Dear all,

we have a replicated volume (2 servers with 1 brick each) on Scientific Linux 6.2 with gluster 3.4.
Everything was running fine until we shut down one of the two and kept it down for 2 months.
When it came up again the volume could not be healed, and we have the following symptoms
(call #1 the always-up server, #2 the server that was kept down):

-doing I/O on the volume gives very bad performance (impossible to keep VM images on it)

A replica's bricks are not supposed to be intentionally kept down even for hours, let alone months :-( . If you do, then when the brick does come back up there will be tons of stuff to heal, so a performance hit is expected.
-on #1 there are 3997354 files in .glusterfs/indices/xattrop/ and the number doesn't go down

When #2 was down, did the I/O involve directory renames? (See if there are entries in .glusterfs/landfill on #2.) If yes, then this is a known issue and a fix is in progress: http://review.gluster.org/#/c/7879/
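
Something along these lines on #2 would show it (the brick path /data/brick1 is only an example; use your real brick directory):

    # entries here are directories scheduled for background deletion;
    # a non-empty listing points to the known rename issue above
    ls -l /data/brick1/.glusterfs/landfill/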

-on #1 the first run of gluster volume heal vol1 info takes a long time to finish and doesn't show anything;
after that it prints "Another transaction is in progress. Please try again after sometime."
This is fixed in glusterfs 3.5, where heal info is much more responsive.

Furthermore, on #1 glustershd.log is full of messages like this:
[2014-05-27 15:07:44.145326] W [client-rpc-fops.c:1538:client3_3_inodelk_cbk] 0-vol1-client-0: remote operation failed: No such file or directory
[2014-05-27 15:07:44.145880] W [client-rpc-fops.c:1640:client3_3_entrylk_cbk] 0-vol1-client-0: remote operation failed: No such file or directory
[2014-05-27 15:07:44.146070] E [afr-self-heal-entry.c:2296:afr_sh_post_nonblocking_entry_cbk] 0-vol1-replicate-0: Non Blocking entrylks failed for <gfid:bfbe65db-7426-4ca0-bf0b-7d1a28de2052>.
[2014-05-27 15:13:34.772856] E [afr-self-heal-data.c:1270:afr_sh_data_open_cbk] 0-vol1-replicate-0: open of <gfid:18a358e0-23d3-4f56-8d74-f5cc38a0d0ea> failed on child vol1-client-0 (No such file or directory)

On #2's bricks I see some updates, i.e. new filenames appearing, and .glusterfs/indices/xattrop/ is usually empty.

Do you know what's happening? How can we fix this?
You could try a `gluster volume heal vol1 full` to see if the bricks get synced.
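
To watch whether it is making progress, something like this (the `info healed` sub-command should be available on 3.4 as well, if I remember correctly):

    # list entries the self-heal daemon has healed so far
    gluster volume heal vol1 info healed
    # and entries it failed to heal
    gluster volume heal vol1 info heal-failed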

Regards,
Ravi

thank you,
Ivano






_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
