Re: Problem with self-heal

Hi,
Yesterday I tried to replicate the error, but I didn't manage to, so I started to wonder whether it wasn't a false alarm...

I read the following links, so I would like to ask :D Does it mean that this bug is caused by a very fast recovery of the connection? Or are there other factors in play? I am running 3.5.1 on production servers for less important stuff, and one of those servers went down this weekend. In the end the heal process was completely fine. The real server takes nearly 5 minutes to boot. Could that be the reason why I didn't experience this bug?


When can we expect Gluster 3.5.2 to be released?

Thanks, Milos




On 7/13/2014 10:23 PM, Ravishankar N wrote:
On 07/13/2014 09:05 PM, Miloš Kozák wrote:
Hi, I would like to ask about the progress. Nothing new has been
added to the ticket...



I haven't had a chance to look at the logs or reproduce the bug. I will
get to it in a couple of days.
Thanks,
Ravi


Thanks, Milos



On 14-07-02 11:37 PM, Miloš Kozák wrote:
Submitted: 1115748

Milos

On 14-07-02 11:40 AM, Vijay Bellur wrote:
On 07/02/2014 06:15 PM, Milos Kozak wrote:
Hi,

I am going to replicate the problem on a clean gluster configuration
later today. So far my answers are below.

On 7/2/2014 1:38 AM, Ravishankar N wrote:
On 07/02/2014 02:28 AM, Miloš Kozák wrote:
Hi,
I am running some tests on top of v3.5.1 in my two-node configuration
with one disk each and replica 2 mode.

I have two servers connected by a cable, over which glusterd
communicates. I start dd to create a relatively large file. In the
middle of the writing process I disconnect the cable, so when the write
finishes I can see all the data on one server (node1) and only a part
of the file on the other one (node2).

Does this mean your client (mount point) is also on node 1?

Yes, I mounted the volume on both servers as follows:
localhost:vg0    /mnt
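
That fstab entry corresponds to a manual glusterfs mount along the
lines of:

mount -t glusterfs localhost:/vg0 /mnt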

.. no surprise so far.

Then I put the cable back. After a while the peers are discovered and
the self-heal daemons start to communicate, so I can see:

gluster volume heal vg0 info
Brick node1:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1

Brick node2:/dist1/brick/fs/
/node-middle - Possibly undergoing heal
Number of entries: 1

But no data is moving over the network, which I verify with df...
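
Concretely, that check amounts to watching the brick filesystem usage
on node2, e.g. (brick path taken from the heal info output above, the
interval is arbitrary):

watch -n 5 df -h /dist1/brick/fs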

When you get "Possibly undergoing heal" and no I/O is going on from the
client, it means the self-heal daemon is healing the file. Can you
check whether there are messages in the glustershd.log of node1 about
self-heal completion?
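
A quick way to check, assuming the default log location, would be
something like:

grep -i heal /var/log/glusterfs/glustershd.log | tail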

There are no such lines in the log, which is eventually why I wrote
this email.

Any help? In my opinion the nodes should get synchronized after a
while, but after 20 minutes of waiting there is still nothing (the file
was 2 GB).
Does gluster volume status show all processes being online?

All processes are running.


Output of strace -f -p <self-heal-daemon pid> from both nodes might
also help.
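
The self-heal daemon PID can be read from the gluster volume status
output, or matched directly, e.g. (assuming the daemon's command line
contains "glustershd", as it does on a standard install):

strace -f -p $(pgrep -f glustershd)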

Thanks,
Vijay


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users




