Re: [dm-devel] Problems with multipathd

===> I found some settings in /sys/module/qla2xxx/parameters/...,
but most of them are read-only values. I have changed ql2xretrycount
and ql2xsuspendcount but without success. Any suggestions for
this driver?


Here are the interesting ones, I guess.

[root@s64p17bibro ~]# find /sys/class/ -name "*tmo*"
/sys/class/fc_remote_ports/rport-1:0-3/dev_loss_tmo
/sys/class/fc_remote_ports/rport-1:0-2/dev_loss_tmo
/sys/class/fc_remote_ports/rport-1:0-1/dev_loss_tmo
/sys/class/fc_remote_ports/rport-1:0-0/dev_loss_tmo
/sys/class/scsi_host/host1/lpfc_nodev_tmo

OK, I have a 6-second timeout now :-)
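
For the record, this is roughly what I did (a sketch only; the rport names are the ones from the find output above, adjust them for your host):

# set the fibre channel dev_loss_tmo to 6 seconds on every remote port
for f in /sys/class/fc_remote_ports/rport-*/dev_loss_tmo; do
    echo 6 > "$f"
done
# verify one of them
cat /sys/class/fc_remote_ports/rport-1:0-0/dev_loss_tmo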


I have commented out this line, but udev still has difficulties creating these links. Therefore I have changed /etc/dev.d/block/multipath.dev (the script is attached at the end of this post) and added debug messages. The most important modification is that kpartx now uses the block device files in /dev/mapper/... instead of /dev/...
===> Why isn't that the default? Are there any disadvantages?


Not really. All distributors seem to have their own ideas about naming
policies. You should ask about, and follow, the Gentoo philosophy, I
guess.


I'm sure I'm not the only one who has problems with missing /dev/... links. It's possible for multipath to install a device-mapper table without errors while kpartx still fails, because udev doesn't create the links in /dev/... So I think multipath.dev should execute kpartx with /dev/mapper/... instead of /dev/... by default.
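
For illustration, the change boils down to something like this (a sketch only; "150gb" is the map name from my setup, and the real multipath.dev script builds the name from udev variables):

# old: relies on udev having created /dev/<mapname>
# kpartx -a /dev/150gb
# new: use the device-mapper node directly
kpartx -a /dev/mapper/150gb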


===> Without "udevstart" udev doesn't create the /dev/150gb*
links! Is this a udev bug?

You can still identify the udev problems by keeping the node creation
in /dev/. Maybe all path setup is done in the initrd/initramfs without
multipath being able to react.

multipath is able to react. I don't understand why I have to execute udevstart.
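
At the moment this is what I have to do by hand before the partition links show up (a sketch of my manual workaround):

multipath            # create/update the multipath maps
udevstart            # without this, /dev/150gb* never appears
ls -l /dev/150gb*    # now the links are there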



===> First multipathd says "8:0: tur checker reports
path is down" and multipath prints sda "failed" (ok).
After a few seconds sda is "ready" and multipathd says
"8:0: tur checker reports path is up"?! I have changed
nothing during this time.


Maybe the checker is confused by the long timeouts.
Worth another try after lowering them.

After lowering the timeouts to 6 seconds multipathd shows the same behavior.



===> Multipathing seems to work without multipathd, but not with it.
It's very slow, but Christophe Varoqui wrote that I have to lower
the HBA timeouts (unfortunately, I don't know how to do this,
see above). Do I really need multipathd? I suppose so :-)


multipathd is needed to reinstate paths.
In your case the rport disappears and reappears, so the mechanism is all
hotplug-driven and thus may work without the daemon ... if memory
resources permit hotplug and multipath(8) execution, that is.

What do you mean by "In your case..."? Because kernel 2.6 and udev are multipath-tools dependencies, all systems running multipath have the same environment: they all use kernel 2.6 and udev, which is hotplug-driven. The kernel starts the hotplug process and udev executes multipath. Sorry, but I have to ask again: do we really need multipathd?


After lowering the dev_loss_tmo timeouts and stopping multipathd I have a working multipath environment :-))) I tested this with a little Perl script and a MySQL database:



My traffic-maker host executed this script 27 times in parallel:

...
# insert loop: write one row per iteration
for(my $count=1;$count<=1000000;$count++)
{
  ...
  my $sql="INSERT INTO $table VALUES($id,\"$value\")";
  my $return=$dbh->do($sql);
  ...
}
...
# afterwards: count the rows to verify that no INSERT got lost
{
  my $sql="SELECT COUNT(*) FROM $table WHERE id=$id";
  my $sth=$dbh->prepare($sql);
  my $return=$sth->execute();
  ...
  $selectCount=$sth->fetchrow_array();
  ...;
}


The database host had to insert these 30-byte strings, and I started some copy jobs (cp -a /usr/* /partition_mounted_with_multipath/ etc.) to increase the I/O load. During this test I disabled and enabled the different HBA switch ports, with the following result: it took 6 to 15 seconds before "multipath -l" showed that a path was down (15 seconds when the host had a load of 30.0 and responded very slowly), but no INSERT got lost :-)))
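
(I simply polled the map state while toggling the switch ports, roughly like this:)

# crude loop to see how long it takes until the failed path shows up
while true; do
    date
    multipath -l
    sleep 2
done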

But sometimes multipath seems to be a bit confused...



1.) one path disabled

In the majority of cases multipath prints...

testhalde2 sbin # multipath -l
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ #:#:#:#     8:0  [active]
 \_ 1:0:0:1 sdb 8:16 [active]


But sometimes I get...

testhalde2 usr # multipath -l
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 4:0:0:1 sdb 8:16 [active]



2.) all paths enabled (default)

In the majority of cases multipath prints...

testhalde2 sbin # multipath -l
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
 \_ 1:0:0:1 sdb 8:16 [active]
 \_ 0:0:0:1 sdc 8:32 [active]


But sometimes I get...

testhalde2 usr # multipath -l
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
 \_ 0:0:0:1 sdb 8:16 [active]
\_ round-robin 0 [enabled]
 \_ 4:0:0:1 sdc 8:32 [active]


Regards
Simon

