Lana -
Looks like you have the IPoIB stack installed, but not support for ibverbs. Let's try this -

# yum install libibverbs
# service glusterd restart

Thanks,

Craig

--
Craig Carl
Senior Systems Engineer; Gluster, Inc.
Cell - (408) 829-9953 (California, USA)
Office - (408) 770-1884
Gtalk - craig.carl at gmail.com
Twitter - @gluster
Installing Gluster Storage Platform, the movie!
http://rackerhacker.com/2010/08/11/one-month-with-glusterfs-in-production/

From: "Lana Deere" <lana.deere at gmail.com>
To: "Craig Carl" <craig at gluster.com>
Cc: gluster-users at gluster.org, landman at scalableinformatics.com
Sent: Tuesday, October 19, 2010 4:02:11 PM
Subject: Re: hanging "df" (3.1, infiniband)

They show up in ibhosts and I can ping or ssh via IPoIB to them, but
perhaps they are not completely configured properly. Or perhaps I
have mixed some references to the regular Ethernet into the
configuration for rdma? Anyway, here are the outputs you requested:

[root@storage0 ~]# lsmod
Module                  Size  Used by
iptable_filter         36161  0
ip_tables              55201  1 iptable_filter
x_tables               50505  1 ip_tables
fuse                   83057  1
autofs4                63049  3
hidp                   83521  2
rfcomm                104937  0
l2cap                  89409  10 hidp,rfcomm
bluetooth             118853  5 hidp,rfcomm,l2cap
lockd                 101553  0
sunrpc                199945  2 lockd
cpufreq_ondemand       42449  8
acpi_cpufreq           47937  0
freq_table             38977  2 cpufreq_ondemand,acpi_cpufreq
ib_iser                69569  0
libiscsi2              77765  1 ib_iser
scsi_transport_iscsi2  74073  2 ib_iser,libiscsi2
scsi_transport_iscsi   35017  1 scsi_transport_iscsi2
ib_srp                 67465  0
rds                   401393  0
ib_sdp                144285  0
ib_ipoib              113057  0
ipoib_helper           35537  2 ib_ipoib
ipv6                  435489  77 ib_ipoib
xfrm_nalgo             43333  1 ipv6
crypto_api             42945  1 xfrm_nalgo
rdma_ucm               47681  0
rdma_cm                68437  4 ib_iser,rds,ib_sdp,rdma_ucm
ib_ucm                 50121  0
ib_uverbs              68720  2 rdma_ucm,ib_ucm
ib_umad                50153  0
ib_cm                  72809  4 ib_srp,ib_ipoib,rdma_cm,ib_ucm
iw_cm                  43465  1 rdma_cm
ib_addr                41929  1 rdma_cm
ib_sa                  74953  4 ib_srp,ib_ipoib,rdma_cm,ib_cm
mlx4_ib                94461  0
ib_mad                 70629  4 ib_umad,ib_cm,ib_sa,mlx4_ib
ib_core               104901  15 ib_iser,ib_srp,rds,ib_sdp,ib_ipoib,rdma_ucm,rdma_cm,ib_ucm,ib_uverbs,ib_umad,ib_cm,iw_cm,ib_sa,mlx4_ib,ib_mad
xfs                   508625  1
loop                   48721  0
dm_mirror              54737  0
dm_multipath           56921  0
scsi_dh                42177  1 dm_multipath
raid456               152417  1
xor                    38865  1 raid456
video                  53197  0
backlight              39873  1 video
sbs                    49921  0
power_meter            47053  0
hwmon                  36553  1 power_meter
i2c_ec                 38593  1 sbs
dell_wmi               37601  0
wmi                    41985  1 dell_wmi
button                 40545  0
battery                43849  0
asus_acpi              50917  0
acpi_memhotplug        40517  0
ac                     38729  0
parport_pc             62313  0
lp                     47121  0
parport                73165  2 parport_pc,lp
mlx4_en               107985  0
joydev                 43969  0
i2c_i801               41813  0
igb                   122709  0
i2c_core               56641  2 i2c_ec,i2c_i801
8021q                  57425  1 igb
shpchp                 70893  0
mlx4_core             152773  2 mlx4_ib,mlx4_en
serio_raw              40517  0
dca                    41221  1 igb
sg                     70377  0
pcspkr                 36289  0
dm_raid45              99657  0
dm_message             36289  1 dm_raid45
dm_region_hash         46145  1 dm_raid45
dm_log                 44993  3 dm_mirror,dm_raid45,dm_region_hash
dm_mod                101649  4 dm_mirror,dm_multipath,dm_raid45,dm_log
dm_mem_cache           38977  1 dm_raid45
mpt2sas               159337  12
scsi_transport_sas     66753  1 mpt2sas
ahci                   69705  6
libata                209489  1 ahci
sd_mod                 56513  32
scsi_mod              196953  10 ib_iser,libiscsi2,scsi_transport_iscsi2,ib_srp,scsi_dh,sg,mpt2sas,scsi_transport_sas,libata,sd_mod
raid1                  56001  3
ext3                  168913  2
jbd                    94769  1 ext3
uhci_hcd               57433  0
ohci_hcd               56309  0
ehci_hcd               66125  0

[root@storage0 ~]# ibv_devinfo
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
No IB devices found

[root@storage0 ~]# lspci
00:00.0 Host bridge: Intel Corporation 5520 I/O Hub to ESI Port (rev 22)
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 22)
00:03.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 3 (rev 22)
00:05.0 PCI bridge: Intel Corporation 5520/X58 I/O Hub PCI Express Root Port 5 (rev 22)
00:07.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 7 (rev 22)
00:09.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 9 (rev 22)
00:13.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub I/OxAPIC Interrupt Controller (rev 22)
00:14.0 PIC: Intel Corporation 5520/5500/X58 I/O Hub System Management Registers (rev 22)
00:14.1 PIC: Intel Corporation 5520/5500/X58 I/O Hub GPIO and Scratch Pad Registers (rev 22)
00:14.2 PIC: Intel Corporation 5520/5500/X58 I/O Hub Control Status and RAS Registers (rev 22)
00:14.3 PIC: Intel Corporation 5520/5500/X58 I/O Hub Throttle Registers (rev 22)
00:16.0 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.1 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.2 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.3 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.4 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.5 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.6 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:16.7 System peripheral: Intel Corporation 5520/5500/X58 Chipset QuickData Technology Device (rev 22)
00:1a.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #4
00:1a.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #5
00:1a.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #6
00:1a.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #2
00:1d.0 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #1
00:1d.1 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #2
00:1d.2 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB UHCI Controller #3
00:1d.7 USB Controller: Intel Corporation 82801JI (ICH10 Family) USB2 EHCI Controller #1
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90)
00:1f.0 ISA bridge: Intel Corporation 82801JIR (ICH10R) LPC Interface Controller
00:1f.2 SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller
00:1f.3 SMBus: Intel Corporation 82801JI (ICH10 Family) SMBus Controller
01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
02:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 02)
05:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
06:01.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200eW WPCM450 (rev 0a)

[root@storage0 ~]# /etc/init.d/openibd status
Low level hardware support loaded:
    mlx4_ib
Upper layer protocol modules:
    ib_iser ib_srp rds ib_sdp ib_ipoib
User space access modules:
    rdma_ucm ib_ucm ib_uverbs ib_umad
Connection management modules:
    rdma_cm ib_cm iw_cm
Configured IPoIB interfaces: ib0
Currently active IPoIB interfaces: ib0
[root@storage0 ~]#

.. Lana (lana.deere at gmail.com)

On Tue, Oct 19, 2010 at 6:48 PM, Craig Carl <craig at gluster.com> wrote:
> Lana -
> The first couple of lines of the log identify our problem -
>
> [2010-10-19 07:47:49.315416] C [rdma.c:3817:rdma_init] rpc-transport/rdma:
> No IB devices found
> [2010-10-19 07:47:49.315438] E [rdma.c:4744:init] rdma.management: Failed to
> initialize IB Device
> [2010-10-19 07:47:49.315452] E [rpc-transport.c:965:rpc_transport_load]
> rpc-transport: 'rdma' initialization failed
>
> Are you sure your IB cards are working? Can you send the output of -
>
> # lsmod
> # ibv_devinfo
> # lspci
> # /etc/init.d/openibd status
>
> Thanks,
>
> Craig
>
> --
> Craig Carl
> Senior Systems Engineer; Gluster, Inc.
> Cell - (408) 829-9953 (California, USA)
> Office - (408) 770-1884
> Gtalk - craig.carl at gmail.com
> Twitter - @gluster
> Installing Gluster Storage Platform, the movie!
> http://rackerhacker.com/2010/08/11/one-month-with-glusterfs-in-production/
>
> ________________________________
> From: "Lana Deere" <lana.deere at gmail.com>
> To: "Craig Carl" <craig at gluster.com>
> Cc: gluster-users at gluster.org, landman at scalableinformatics.com
> Sent: Tuesday, October 19, 2010 3:29:41 PM
> Subject: Re: hanging "df" (3.1, infiniband)
>
> For the last little while I've been using storage0 as both client and
> server, so those files are both client and server files at the same
> time. If it would be helpful, I could go back to using a different
> host as client (but then 'df' will hang instead of reporting the
> Transport message).
>
> [root@storage0 ~]# cat /etc/glusterd/.cmd_log_history
> [2010-10-19 07:54:36.244333] peer probe : on host storage1:24007
> [2010-10-19 07:54:36.249891] peer probe : on host storage1:24007 FAILED
> [2010-10-19 07:54:43.745558] peer probe : on host storage2:24007
> [2010-10-19 07:54:43.750752] peer probe : on host storage2:24007 FAILED
> [2010-10-19 07:54:48.915378] peer probe : on host storage3:24007
> [2010-10-19 07:54:48.920595] peer probe : on host storage3:24007 FAILED
> [2010-10-19 07:59:49.737251] Volume create : on volname: RaidData attempted
> [2010-10-19 07:59:49.737314] Volume create : on volname: RaidData
> type:DEFAULT count:4 bricks: storage0:/data storage1:/data
> storage2:/data storage3:/data
> [2010-10-19 07:59:49.737631] Volume create : on volname: RaidData SUCCESS
> [2010-10-19 08:01:36.909963] volume start : on volname: RaidData SUCCESS
>
> The /var/log file was pretty big, so I put it on pastebin:
> http://pastebin.com/m6WbHPUp
>
> .. Lana (lana.deere at gmail.com)
>
> On Tue, Oct 19, 2010 at 6:10 PM, Craig Carl <craig at gluster.com> wrote:
>> Lana -
>> Can you also post the contents of
>>
>> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
>> and
>> /etc/glusterd/.cmd_log_history
>>
>> on both the client and server to the list?
>
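
The three rdma errors quoted from the glusterd log in this thread share one layout (timestamp, one-letter severity, source location, component, message), so the same check could be scripted. What follows is only a rough Python sketch and not part of glusterfs: the regular expression is inferred solely from the log lines quoted above, the names LOG_LINE, LogEntry and rdma_failures are made up for illustration, and it assumes each entry sits on a single line as it would in the log file itself (not wrapped as in the quoted mail).

```python
import re
from typing import List, NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    level: str      # single letter; "C" and "E" appear in the quoted lines
    source: str     # e.g. "rdma.c:3817:rdma_init"
    component: str  # e.g. "rpc-transport/rdma"
    message: str


# Layout inferred from the log lines quoted in this thread; this is an
# assumption, not a documented glusterd log format.
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"\[(?P<source>[^\]]+)\]\s+"
    r"(?P<component>[^:]+):\s+"
    r"(?P<message>.*)$"
)


def rdma_failures(log_text: str) -> List[LogEntry]:
    """Return critical/error entries that mention rdma in any field."""
    hits = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        entry = LogEntry(**match.groupdict())
        text = (entry.source + entry.component + entry.message).lower()
        if entry.level in ("C", "E") and "rdma" in text:
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # The three entries quoted in the thread, rejoined onto single lines.
    sample = "\n".join([
        "[2010-10-19 07:47:49.315416] C [rdma.c:3817:rdma_init] "
        "rpc-transport/rdma: No IB devices found",
        "[2010-10-19 07:47:49.315438] E [rdma.c:4744:init] "
        "rdma.management: Failed to initialize IB Device",
        "[2010-10-19 07:47:49.315452] E [rpc-transport.c:965:rpc_transport_load] "
        "rpc-transport: 'rdma' initialization failed",
    ])
    for entry in rdma_failures(sample):
        print(entry.level, entry.component, "-", entry.message)
```

Run against /var/log/glusterfs/etc-glusterfs-glusterd.vol.log, a non-empty result would point at the same condition seen here: the rdma transport failed to initialize, which is worth checking before looking at volume or peer configuration.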