Re: non-blocking connect() returned: 111 (Connection refused) [solved]

Jordi Moles Blanco <jordi@xxxxxxxxx> · Thu, 18 Dec 2008 09:54:51 +0100

Basavanagowda Kanur wrote:
Jordi,
  Do you have a firewall running on any of the machines?

--
gowda

On Thu, Dec 18, 2008 at 1:47 PM, Jordi Moles Blanco <jordi@xxxxxxxxx> wrote:

    Raghavendra G wrote:

        Hi Jordi,

        Have you started glusterfsd on each of the newly added nodes?
        If not, please start them.

        Some comments are inlined below.

        On Wed, Dec 17, 2008 at 3:28 PM, Jordi Moles Blanco
        <jordi@xxxxxxxxx> wrote:

           Hi,

           I've got 6 nodes providing a storage unit with gluster 2.5 patch
           800. They are set up in 2 groups of 3 nodes each.

           On top of that, I've got a Xen 3.2 machine storing its virtual
           machines on the gluster mount point.

           I used to have only 2 nodes per group, that's 4 nodes in total,
           and today I'm trying to add 1 extra node to each group.

           This is the final configuration on the Xen side:

           **************

           volume espai1
                 type protocol/client
                 option transport-type tcp/client
                 option remote-host 10.0.0.3
                 option remote-subvolume espai
           end-volume

           volume espai2
                 type protocol/client
                 option transport-type tcp/client
                 option remote-host 10.0.0.4
                 option remote-subvolume espai
           end-volume

           volume espai3
                 type protocol/client
                 option transport-type tcp/client
                 option remote-host 10.0.0.5
                 option remote-subvolume espai
           end-volume

           volume espai4
                 type protocol/client
                 option transport-type tcp/client
                 option remote-host 10.0.0.6
                 option remote-subvolume espai
           end-volume

           volume espai5
                 type protocol/client
                 option transport-type tcp/client
                 option remote-host 10.0.0.7
                 option remote-subvolume espai
           end-volume

           volume espai6
                 type protocol/client
                 option transport-type tcp/client
                 option remote-host 10.0.0.8
                 option remote-subvolume espai
           end-volume

           volume namespace1
                 type protocol/client
                 option transport-type tcp/client
                 option remote-host 10.0.0.4
                 option remote-subvolume nm
           end-volume

           volume namespace2
                 type protocol/client
                 option transport-type tcp/client
                 option remote-host 10.0.0.5
                 option remote-subvolume nm
           end-volume

           volume grup1
                 type cluster/afr
                 subvolumes espai1 espai3 espai5
           end-volume

           volume grup2
                 type cluster/afr
                 subvolumes espai2 espai4 espai6
           end-volume

           volume nm
                 type cluster/afr
                 subvolumes namespace1 namespace2
           end-volume

           volume g01
                 type cluster/unify
                 subvolumes grup1 grup2
                 option scheduler rr
                 option namespace nm
           end-volume

           volume io-cache
                 type performance/io-cache
                 option cache-size 512MB
                 option page-size 1MB
                 option force-revalidate-timeout 2
                 subvolumes g01
           end-volume  
           **************

           So I stopped all the virtual machines, unmounted gluster on Xen,
           updated the spec file (the one above) and ran gluster again on
           Xen.

           I've set up different gluster environments, but I had never tried
           this, and now I'm facing some problems.

           From what I had read before, I thought that after adding an extra
           node to a group and "remounting" on the client side, the healing
           feature would copy all the content of the nodes already present
           in the group to the new one. That hasn't happened, even when I've
           tried to force it by listing the files or by doing what you
           suggest in your documentation:

           **********

           find /mnt/glusterfs -type f -print0 | xargs -0 head -c1 >/dev/null

           **********

           So my first question is: does "self-healing" work this way? If
           it doesn't, what is the best way to add a node to a group? Do I
           have to run a "copy" command manually to get the new node ready?
           I've also noticed that I have to unmount gluster from Xen. Is
           there a way to avoid stopping all the virtual machines, unmounting
           and mounting again? Is there a feature like "refresh config file"?

        Hot add ("refresh config file") is on the roadmap.

           And finally, I looked into the logs to see why self-healing
           wasn't working, and I found this on the Xen side:

           **********
           2008-12-17 12:08:30 E [tcp-client.c:190:tcp_connect] espai6:
           non-blocking connect() returned: 111 (Connection refused)
           **********

           It keeps logging this whenever I access files that were created
           on the "old nodes".

           Is this a bug? How can I work around it?

           If I create new files, though, they replicate to all 3 nodes
           with no problem. The only problem is with the old files that
           were already present before I added the new node.

           Thanks in advance for your help, and let me know if you need any
           further information.

           _______________________________________________
           Gluster-devel mailing list
           Gluster-devel@xxxxxxxxxx

           http://lists.nongnu.org/mailman/listinfo/gluster-devel

        -- 
        Raghavendra G

    Hi, yes.

    When gluster behaves like this, all nodes are running. As I said,
    when you create new data, it replicates to all the nodes of each
    group, so that part is working fine.
    However, it keeps logging "connection refused", which I thought was
    reported only when a node wasn't available, but they are all
    available and replicating data fine.

    The problem, though, is that old data is not being replicated to
    the new nodes.

    Is there any way to "force" replication to the new nodes? Could I
    be getting the "connection refused" error because the new nodes
    won't accept the existing data?

    Thanks for your help.

--
hard work often pays off after time, but laziness always pays off now

Well, sorry to have bothered you about this; I found what the problem
was in the end.

I got mixed up with a couple of things:

- On the one hand, there was a mistake in the .vol file on Xen: one node
  was declared with a wrong IP address, so it was giving the "connection
  refused" status. I didn't pay enough attention to all 6 nodes; 5 were
  replicating OK, I missed the one that wasn't, and I thought they were
  all working fine.
- On the other hand, old data was not replicated to the new nodes because
  I didn't set the attributes to "trusted gluster" when adding the new
  nodes. Now all the data appears fine on all nodes.
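
For anyone hitting the same "connection refused" message: one way to catch
a wrong address early is to list every remote-host declared in the client
.vol file and try a plain TCP connect to each of them. What follows is only
a rough sketch and not part of glusterfs. The function names remote_hosts
and check_hosts are made up for illustration, and the port is an assumption
that has to be passed in (use whatever port your glusterfsd servers listen
on).

```python
import errno
import socket


def remote_hosts(volfile_text):
    """Map each volume name to its 'option remote-host' value, if it has one."""
    hosts = {}
    current = None
    for raw in volfile_text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        if words[0] == "volume" and len(words) == 2:
            current = words[1]
        elif words[0] == "end-volume":
            current = None
        elif current and len(words) >= 3 and words[:2] == ["option", "remote-host"]:
            hosts[current] = words[2]
    return hosts


def check_hosts(hosts, port, timeout=3.0):
    """Try a TCP connect to every host and return {volume: "ok" or error name}.

    'port' is an assumption supplied by the caller; it is not read from
    the .vol file.
    """
    results = {}
    for volume, host in hosts.items():
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.connect((host, port))
                results[volume] = "ok"
            except OSError as exc:
                # e.g. "ECONNREFUSED"; fall back to the message for timeouts
                results[volume] = errno.errorcode.get(exc.errno, str(exc))
    return results
```

Feeding the text of the client .vol file to remote_hosts() and the result
to check_hosts() gives one entry per protocol/client volume. A node declared
with the wrong address would typically show up as ECONNREFUSED (errno 111 on
Linux, the same code as in the log quoted above) or as a timeout, while the
others report "ok".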

Sorry for that and thanks for your help and patience :)