GFS LogVol00cluster.1: withdrawn / rejecting I/O to dead device

Hi,

We are building an HA cluster with GFS 6.1 on Fedora Core 4.

Our SAN box had an outage and was then reconnected.

Now we are unable to mount the GFS cluster filesystem:

Sep 27 20:05:19 www5 kernel: scsi2 (0:0): rejecting I/O to dead device
Sep 27 20:05:19 www5 kernel: GFS: fsid=xxxcluster:LogVol00cluster.1: fatal: I/O error
Sep 27 20:05:19 www5 kernel: GFS: fsid=xxxcluster:LogVol00cluster.1:   block = 9498835
Sep 27 20:05:19 www5 kernel: GFS: fsid=xxxcluster:LogVol00cluster.1:   function = gfs_logbh_wait
Sep 27 20:05:19 www5 kernel: GFS: fsid=xxxcluster:LogVol00cluster.1:   file = /usr/src/build/607778-i686/BUILD/smp/src/gfs/dio.c, line = 923
Sep 27 20:05:19 www5 kernel: GFS: fsid=xxxcluster:LogVol00cluster.1:   time = 1127844319
Sep 27 20:05:19 www5 kernel: GFS: fsid=xxxcluster:LogVol00cluster.1: about to withdraw from the cluster
Sep 27 20:05:19 www5 kernel: GFS: fsid=xxxcluster:LogVol00cluster.1: waiting for outstanding I/O
Sep 27 20:05:19 www5 kernel: GFS: fsid=xxxcluster:LogVol00cluster.1: telling LM to withdraw
Sep 27 20:05:19 www5 kernel: lock_dlm: withdraw abandoned memory
Sep 27 20:05:19 www5 kernel: GFS: fsid=xxxcluster:LogVol00cluster.1: withdrawn
Sep 27 20:05:43 www5 kernel: scsi2 (0:0): rejecting I/O to dead device
Sep 27 20:05:43 www5 kernel: Buffer I/O error on device dm-3, logical block 20971504
Sep 27 20:05:43 www5 kernel: scsi2 (0:0): rejecting I/O to dead device
Sep 27 20:05:43 www5 kernel: Buffer I/O error on device dm-3, logical block 20971504

Sep 27 20:52:17 www3 kernel: scsi2 (0:0): rejecting I/O to dead device
Sep 27 20:52:17 www3 kernel: Buffer I/O error on device dm-1, logical block 20971504
Sep 27 20:52:17 www3 kernel: scsi2 (0:0): rejecting I/O to dead device
Sep 27 20:52:17 www3 kernel: Buffer I/O error on device dm-1, logical block 20971504
Sep 27 20:52:17 www3 kernel: scsi2 (0:0): rejecting I/O to dead device
Sep 27 20:52:17 www3 kernel: Buffer I/O error on device dm-1, logical block 0

Neither the "rejecting I/O" messages nor the LM withdraw appeared on the third node (www4), and on www3 only the "rejecting I/O" errors appeared, not the LM withdraw.

[root@www4 ~]# mount /mnt/ /dev/VolGroupDaten01/LogVol00cluster -t gfs
mount: /mnt/ is not a block device
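As an aside, the "not a block device" error here comes from the argument order rather than from GFS itself: mount(8) expects the device before the mount point. The corrected invocation would look like this (the device path is the one from lvdisplay below):

```shell
# mount(8) syntax: mount -t <type> <device> <dir>
# The invocation above had device and mount point swapped, hence
# "mount: /mnt/ is not a block device".
mount -t gfs /dev/VolGroupDaten01/LogVol00cluster /mnt
```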

We need to avoid restarting the server nodes. The volume groups are still visible, and access with e.g. fdisk is possible.
Another, single server, which uses only a non-cluster LVM2 volume, was able to remount without a reboot.
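For reference, one thing we are considering (sketched here only as an assumption; the device and host names are taken from the scsi2 and /dev/sde entries below and may not match every node) is to drop the dead SCSI device via sysfs and rescan the HBA, so the kernel re-probes the reconnected LUN without a reboot:

```shell
# Assumption: sde is the dead SAN device and host2 is the HBA
# that logged "scsi2 (0:0): rejecting I/O to dead device".
# Remove the stale device the kernel marked dead...
echo 1 > /sys/block/sde/device/delete
# ...then rescan all channels/targets/LUNs on that host so the
# reconnected LUN is probed again.
echo "- - -" > /sys/class/scsi_host/host2/scan
```

Whether GFS can rejoin after its withdraw without a node reboot is exactly what we are unsure about.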

Any help would be really welcome,

Thanks
Thomas

[root@www3 ~]# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "VolGroupDaten02" using metadata type lvm2
  Found volume group "VolGroupDaten01" using metadata type lvm2

[root@www3 ~]# lvdisplay VolGroupDaten01
  --- Logical volume ---
  LV Name                /dev/VolGroupDaten01/LogVol00cluster
  VG Name                VolGroupDaten01
  LV UUID                o38bnG-sLSi-WhUJ-47Bs-3u6g-qSUm-5yBkNr
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                80.00 GB
  Current LE             20480
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:1

[root@www3 ~]# pvdisplay

...
...
...

  --- Physical volume ---
  PV Name               /dev/sde
  VG Name               VolGroupDaten01
  PV Size               540.00 GB / not usable 0
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              138239
  Free PE               117759
  Allocated PE          20480
  PV UUID               oVeByo-8IoA-qFlt-fsN9-ULAR-xUju-niLTEO




[root@www3 ~]# cman_tool status
Protocol version: 5.0.1
Config version: 2
Cluster name: xxxcluster
Cluster ID: 57396
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 3
Expected_votes: 3
Total_votes: 3
Quorum: 2
Active subsystems: 3
Node name: www3.xxx.cc
Node addresses: 192.168.2.23

[root@www3 ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    3   M   www5.xxx.cc
   2    1    3   M   www4.xxx.cc
   3    1    3   M   www3.xxx.cc

[root@www3 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster name="xxxcluster" config_version="3">
  <clusternodes>
    <clusternode name="www5.xxx.cc" votes="1">
     <fence>
      <method name="single">
       <device name="human" ipaddr="192.168.2.25"/>
     </method>
    </fence>
   </clusternode>
   <clusternode name="www3.xxx.cc" votes="1">
    <fence>
     <method name="single">
       <device name="human" ipaddr="192.168.2.23"/>
     </method>
    </fence>
    </clusternode>
   <clusternode name="www4.xxx.cc" votes="1">
    <fence>
     <method name="single">
       <device name="human" ipaddr="192.168.2.24"/>
     </method>
    </fence>
  </clusternode>
 </clusternodes>
<fence_devices>
 <fence_device name="human" agent="fence_manual"/>
</fence_devices>
</cluster>

--

Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
