rdma problems with glusterfs 3.1.0

Michael,

I was seeing at least a similar symptom to the "Transport endpoint is
not connected" message you list, and in my case it was because I was
using a version of OFED that wasn't new enough. When I switched to
OFED 1.5.1, the problem went away.

You might look in the archives for the thread 'hanging "df" (3.1,
infiniband)' from Oct 19th, which records the diagnosis and the fix,
in case it helps you.

.. Lana (lana.deere at gmail.com)

On Thu, Oct 28, 2010 at 11:26 AM, Michael Galloway
<michael.d.galloway at gmail.com> wrote:
> Good day all,
>
> I've built a new glusterfs volume using 20 nodes of one of my clusters, each
> with a 2TB SATA disk formatted with ext3 (the system is CentOS 5.2, x86_64).
> The volume looks like this:
>
> Volume Name: gfsvol1
> Type: Distributed-Replicate
> Status: Started
> Number of Bricks: 10 x 2 = 20
> Transport-type: rdma
> Bricks:
> Brick1: node002:/gfs
> Brick2: node003:/gfs
> Brick3: node004:/gfs
> Brick4: node005:/gfs
> Brick5: node006:/gfs
> Brick6: node007:/gfs
> Brick7: node008:/gfs
> Brick8: node009:/gfs
> Brick9: node010:/gfs
> Brick10: node011:/gfs
> Brick11: node012:/gfs
> Brick12: node013:/gfs
> Brick13: node014:/gfs
> Brick14: node015:/gfs
> Brick15: node016:/gfs
> Brick16: node017:/gfs
> Brick17: node019:/gfs
> Brick18: node020:/gfs
> Brick19: node021:/gfs
> Brick20: node022:/gfs
>
> The volume mounts on a client:
>
> [root@moldyn ~]# mount -t glusterfs -o transport=rdma node002:/gfsvol1 /gfsvol1
> [root@moldyn ~]# df
> Filesystem           1K-blocks      Used   Available Use% Mounted on
> glusterfs#node002:/gfsvol1
>                      19228583424   2001664 18249825792   1% /gfsvol1
>
> I get this error on a copy into the gluster volume:
>
> [mgx@moldyn ~]$ cp -R pmemd/ /gfsvol1/mgx/pmemd
> cp: writing `/gfsvol1/mgx/pmemd/fmdrun.out': Transport endpoint is not connected
> cp: closing `/gfsvol1/mgx/pmemd/fmdrun.out': Resource temporarily unavailable
>
> It did copy the other files; it just failed on that one:
>
> /gfsvol1/mgx/pmemd/:
> total 4357376
> -rw-rw-r-- 1 root root  514711552 Oct 27 13:02 fmdrun.out
> -rw-rw-r-- 1 mgx  mgx        4754 Oct 27 13:01 fmdrun.out.new
> -rw-rw-r-- 1 mgx  mgx   851832631 Oct 27 13:03 fmdrun.out_run1
> -rw-rw-r-- 1 mgx  mgx          81 Oct 27 13:01 mdinfo
> -rw------- 1 mgx  mgx         803 Oct 27 13:02 md.out
> -rw-rw-r-- 1 mgx  mgx         342 Oct 27 13:03 md.sub
> -rw-rw-r-- 1 mgx  mgx  1567835776 Oct 27 13:02 new.mdcrd
> -rw-rw-r-- 1 mgx  mgx  1522326100 Oct 27 13:01 new.mdcrd_run1
> -rw-rw-r-- 1 mgx  mgx      155957 Oct 27 13:02 new.rst
> -rw-rw-r-- 1 mgx  mgx      155957 Oct 27 13:01 old.rst
> drwxrwxr-x 3 mgx  mgx       40960 Oct 27 13:01 rbenew
> -rw-rw-r-- 1 mgx  mgx        1008 Oct 27 13:03 vp_mdrun.in
> -rw-rw-r-- 1 mgx  mgx       26190 Oct 27 13:01 vp.prmtop
> -rw-rw-r-- 1 mgx  mgx      348092 Oct 27 13:01 vp_wat.prmtop
>
> pmemd/:
> total 4711216
> -rw-rw-r-- 1 mgx mgx  876818259 Apr  2  2010 fmdrun.out
> -rw-rw-r-- 1 mgx mgx       4754 Mar 19  2010 fmdrun.out.new
> -rw-rw-r-- 1 mgx mgx  851832631 Mar  6  2010 fmdrun.out_run1
> -rw-rw-r-- 1 mgx mgx         81 Apr  2  2010 mdinfo
> -rw------- 1 mgx mgx        803 Apr  2  2010 md.out
> -rw-rw-r-- 1 mgx mgx        342 Mar 31  2010 md.sub
> -rw-rw-r-- 1 mgx mgx 1567835776 Apr  2  2010 new.mdcrd
> -rw-rw-r-- 1 mgx mgx 1522326100 Mar  6  2010 new.mdcrd_run1
> -rw-rw-r-- 1 mgx mgx     155957 Apr  2  2010 new.rst
> -rw-rw-r-- 1 mgx mgx     155957 Mar  9  2010 old.rst
> drwxrwxr-x 3 mgx mgx       4096 Mar 31  2010 rbenew
> -rw-rw-r-- 1 mgx mgx       1008 Mar  2  2010 vp_mdrun.in
> -rw-rw-r-- 1 mgx mgx      26190 Mar  2  2010 vp.prmtop
> -rw-rw-r-- 1 mgx mgx     348092 Mar  2  2010 vp_wat.prmtop
>
> The fmdrun.out file is truncated and has the wrong ownership (root instead of mgx).
>
> The volume was created following the 3.1 documentation.
>
> Where is the problem: Gluster or IB? My IB stack is OFED 1.3.1, and I have
> SDR Mellanox HCAs.
>
> --- michael
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
>

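The truncated copy of fmdrun.out (514711552 bytes on the volume versus 876818259 bytes in the source) only shows up by comparing the two listings by eye. A minimal sketch of automating that check is below. `cmp_sizes` is a hypothetical helper, not part of GlusterFS; it assumes GNU find and GNU stat (`stat -c %s`), as shipped with CentOS. It compares sizes only, so it will not catch same-size corruption or the ownership change.

```shell
#!/bin/sh
# Hypothetical helper, not part of GlusterFS: compare the size of every
# regular file under SRC with its copy under DST.
# Prints one line per missing or size-mismatched file.
# Returns 0 if everything matches, 1 if any mismatch, 2 on bad arguments.
# Assumes GNU find and GNU stat (stat -c %s); names containing newlines
# are not handled.
cmp_sizes() {
  if [ ! -d "$1" ] || [ ! -d "$2" ]; then
    echo "usage: cmp_sizes SRC_DIR DST_DIR" >&2
    return 2
  fi
  # Resolve DST to an absolute path before changing into SRC.
  dst_abs=$(cd "$2" && pwd) || return 2
  # Walk SRC inside a subshell so the caller's working directory is unchanged.
  report=$(
    cd "$1" || exit 2
    find . -type f | while IFS= read -r f; do
      if [ ! -f "$dst_abs/$f" ]; then
        echo "missing: $f"
      else
        s=$(stat -c %s "$f")
        d=$(stat -c %s "$dst_abs/$f")
        [ "$s" = "$d" ] || echo "size differs: $f ($s vs $d bytes)"
      fi
    done
  )
  if [ -n "$report" ]; then
    printf '%s\n' "$report"
    return 1
  fi
  return 0
}
```

Run against the directories from the report, e.g. `cmp_sizes pmemd /gfsvol1/mgx/pmemd`, it would be expected to list ./fmdrun.out among any mismatches. Running the walk in a subshell keeps the caller's working directory intact, at the cost of collecting the report as a string.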
