Device-mapper-multipath not working correctly with GNBD devices

"Svetoslav Rogalski" <svetoslav.rogalski@xxxxxxxxxxxx> · Tue, 4 Mar 2008 09:30:38 +0200

Hi all,

I am trying to configure a failover multipath between
2 GNBD devices.

I have a 4-node Red Hat Cluster Suite (RCS) cluster.
Three of the nodes run services and one provides central storage. In the future
I am going to add a second machine for central storage. The 2 storage
machines will export the same disk, so that the machine exporting the
storage is not a single point of failure.

As a proof of concept I am using a single machine on which I
have configured 2 GNBD exports of exactly the same disk.
They are configured with:

# /sbin/gnbd_export -d /dev/sdb1 -e gnbd0 -u gnbd

# /sbin/gnbd_export -d /dev/sdb1 -e gnbd1 -u gnbd

They are exported with the same id (-u gnbd), so that the multipath
driver will automatically treat them as alternative paths to the same
storage.

Now on one of the cluster nodes used for running
services I am importing these GNBD devices with:

# /sbin/gnbd_import -i gnbd1

where gnbd1 is the hostname of the machine exporting
the GNBD devices.

Both devices are imported OK:

# gnbd_import -l
Device name : gnbd1
----------------------
    Minor # : 0
 sysfs name : /block/gnbd0
     Server : gnbd11
       Port : 14567
      State : Open Connected Clear
   Readonly : No
    Sectors : 41941688

Device name : gnbd0
----------------------
    Minor # : 1
 sysfs name : /block/gnbd1
     Server : gnbd1
       Port : 14567
      State : Open Connected Clear
   Readonly : No
    Sectors : 41941688

#
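
To compare the two imports at a glance, the listing can be condensed to one line per device. This is only a minimal sketch: it assumes the output keeps the "key : value" layout shown above and that "Sectors" is the last field of each record. The function name summarise_gnbd_imports is made up for illustration and is not part of the GNBD tools.

```shell
#!/bin/sh
# Condense a `gnbd_import -l` style listing (read from stdin) into one line
# per device: "<device> <server> <sectors> <state>".
summarise_gnbd_imports() {
    awk -F' : ' '
        # Strip leading and trailing blanks from a key or a value.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        NF >= 2 {
            key = trim($1); val = trim($2)
            if (key == "Device name")  dev = val
            else if (key == "Server")  server = val
            else if (key == "State")   state = val
            # Assumption: Sectors closes each record, so print the summary here.
            else if (key == "Sectors") print dev, server, val, state
        }
    '
}

# Example with one record copied from the listing in this post.
summarise_gnbd_imports <<'EOF'
Device name : gnbd1
----------------------
    Minor # : 0
 sysfs name : /block/gnbd0
     Server : gnbd11
       Port : 14567
      State : Open Connected Clear
   Readonly : No
    Sectors : 41941688
EOF
```

On a live system it would be fed from the command itself, for example `gnbd_import -l | summarise_gnbd_imports`; two paths to the same disk should then show the same sector count.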

Next, I configured device-mapper multipath
by commenting out the "blacklist" section in /etc/multipath.conf and
adding this "defaults" section:

defaults {
        user_friendly_names yes
        polling_interval 5
        #path_grouping_policy failover
        path_grouping_policy multibus
        rr_min_io 1
        failback immediate
        #failback manual
        no_path_retry fail
        #no_path_retry queue
}

Now I have the mpath device configured correctly
(IMHO):

# multipath -ll
mpath0 (gnbd) dm-2 GNBD,GNBD
[size=20G][features=0][hwhandler=0]
\_ round-robin 0 [prio=2][enabled]
 \_ #:#:#:# gnbd0 252:0 [active][ready]
 \_ #:#:#:# gnbd1 252:1 [active][ready]
#

# dmsetup ls
mpath0 (253, 2)
VolGroup00-LogVol01 (253, 1)
VolGroup00-LogVol00 (253, 0)
#

Next I ran mkfs.ext3 on the mpath0 device to create a
filesystem and mounted it. Then I started copying a file (with scp, to have a
progress bar), and during the copy I shut down one of the exported GNBD
devices on the exporting machine with:

# gnbd_export -r gnbd1 -O

After a while in the maillog:

gnbd_recvd[3357]: client lost connection with gnbd11 : Broken pipe
gnbd_recvd[3357]: reconnecting
kernel: gnbd1: Receive control failed (result -32)
kernel: gnbd1: shutting down socket
kernel: exiting GNBD_DO_IT ioctl
kernel: gnbd1: Attempted send on closed socket
gnbd_recvd[3357]: ERROR [gnbd_recvd.c:292] login refused by the server : No such device
gnbd_recvd[3357]: reconnecting
kernel: device-mapper: multipath: Failing path 252:1.
multipathd: gnbd1: directio checker reports path is down
multipathd: checker failed path 252:1 in map mpath0
multipathd: mpath0: remaining active paths: 1
gnbd_recvd[3357]: ERROR [gnbd_recvd.c:292] login refused by the server : No such device
gnbd_recvd[3357]: reconnecting

At this point the copy process is frozen. It stays that way
until the GNBD device is exported again. While it was frozen I tried some
commands on the multipath machine:

# multipath -ll
gnbd1: checker msg is "directio checker reports path is down"
mpath0 (gnbd) dm-2 GNBD,GNBD
[size=20G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ #:#:#:# gnbd0 252:0 [active][ready]
 \_ #:#:#:# gnbd1 252:1 [failed][faulty]
<frozen, the prompt does not return>

The prompt comes back after the GNBD device is
exported again.
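
To make the path states easier to track between runs, the path lines can be pulled out of the listing and counted. This is only a rough sketch: it assumes the older `multipath -ll` layout shown above, where each path line ends in "major:minor [dm_state][checker_state]". The function names list_paths and count_usable_paths are made up for illustration and are not part of multipath-tools.

```shell
#!/bin/sh
# Print one line per path from `multipath -ll` style output on stdin:
# "<device> <major:minor> <dm_state> <checker_state>".
list_paths() {
    awk '
        # A path line has a major:minor number followed by two bracketed states.
        NF >= 4 && $(NF-1) ~ /^[0-9]+:[0-9]+$/ && $NF ~ /^\[[a-z]+\]\[[a-z]+\]$/ {
            s = $NF
            gsub(/^\[|\]$/, "", s)      # "[failed][faulty]" becomes "failed][faulty"
            split(s, st, /\]\[/)        # st[1] = dm state, st[2] = checker state
            print $(NF-2), $(NF-1), st[1], st[2]
        }
    '
}

# Count the paths that are both [active] and [ready].
count_usable_paths() {
    list_paths | awk '$3 == "active" && $4 == "ready" { n++ } END { print n + 0 }'
}

# Example with the listing captured while the copy was frozen.
list_paths <<'EOF'
mpath0 (gnbd) dm-2 GNBD,GNBD
[size=20G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ #:#:#:# gnbd0 252:0 [active][ready]
 \_ #:#:#:# gnbd1 252:1 [failed][faulty]
EOF
```

Both functions read stdin, so they also work on a saved copy of the output, which matters here because `multipath -ll` itself did not return while the path was down.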

I expected that in such a scenario the
multipath driver would switch the requests to the other path and
everything would continue to work. Am I wrong?

I have upgraded all the RPMs to the latest version.
I am using CentOS 5.1.

I have tried different multipath settings (the ones that are
commented out in the multipath.conf "defaults" section I pasted
previously), but the behaviour does not change.

This may be useful: when the machine starts, the
log shows:

multipathd: gnbd0: add path (uevent)
kernel: device-mapper: multipath round-robin: version 1.0.0 loaded
multipathd: mpath0: load table [0 41941688 multipath 0 0 1 1 round-robin 0 1 1 252:0 1000]
multipathd: mpath0: event checker started
multipathd: dm-2: add map (uevent)
multipathd: dm-2: devmap already registered
gnbd_recvd[3357]: gnbd_recvd started
kernel: resending requests
multipathd: gnbd1: add path (uevent)
multipathd: mpath0: load table [0 41941688 multipath 0 0 1 1 round-robin 0 2 1 252:0 1000 252:1 1000]
multipathd: dm-2: add map (uevent)
multipathd: dm-2: devmap already registered
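
The "load table" lines are easier to read when split into their fields. This is only a sketch: it assumes the usual device-mapper multipath table layout (start, length, target name, feature args, hardware handler args, number of path groups, initial group, then per group a selector followed by its paths) and 512-byte sectors. The function names decode_multipath_table and sectors_to_gib are made up for illustration.

```shell
#!/bin/sh
# Decode one device-mapper multipath table string into readable lines.
decode_multipath_table() {
    printf '%s\n' "$1" | awk '
    {
        i = 4                        # field 4: number of feature args
        n = $i; i += 1 + n           # skip the feature count and the args
        n = $i; i += 1 + n           # skip the hardware handler count and args
        npg = $i                     # number of path groups
        i += 2                       # also skip the initial path group number
        print "start=" $1, "sectors=" $2, "target=" $3, "path_groups=" npg
        for (g = 1; g <= npg; g++) {
            selector = $i
            n = $(i + 1); i += 2 + n # skip selector name, arg count and args
            npaths = $i              # paths in this group
            nargs = $(i + 1)         # selector args that follow each path
            i += 2
            for (p = 0; p < npaths; p++) {
                print "group=" g, "selector=" selector, "path=" $i
                i += 1 + nargs
            }
        }
    }'
}

# Convert a sector count to GiB, assuming 512-byte sectors.
sectors_to_gib() {
    awk -v s="$1" 'BEGIN { printf "%.1f\n", s * 512 / (1024 * 1024 * 1024) }'
}

# Example with the second table from the log above.
decode_multipath_table "0 41941688 multipath 0 0 1 1 round-robin 0 2 1 252:0 1000 252:1 1000"
sectors_to_gib 41941688
```

For the second table this reports one round-robin path group containing 252:0 and 252:1, and 41941688 sectors works out to about 20.0 GiB, which matches the size=20G shown by multipath -ll.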

Maybe this is a bug in GNBD rather than in multipath? Any
help getting this
working would be much appreciated.

Thanks.

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos