Re: Raid5 over sbp2 : sbp2 command abort

Francois Barre wrote:
This is a cross-post (sorry for that), but I don't know where it comes from yet.

Alas we get similar reports about software RAID over SBP-2 now and then on linux1394-devel or -user. I very much suspect sbp2 to be the culprit.

One person reported different results with different software RAID levels but I am too lazy right now to dig for the post in the list archive.

Question to the linux-raid folks: Does md support disks on different SCSI host adapters in the same RAID set?

A. The setup
VIA EPIA 10k Nehemiah, OHCI with VIA
4 sbp2 250 GB IDE drives

Are these drives' bridges based on a Prolific chip? If so, check whether a firmware update is available.

Vanilla 2.6.15.1 kernel, mdadm 2.2, superblock 0.90
ohci1394+sbp2 in kernel (default params : serialize_io=1, ...), raid5
as a module.

I recommend building the FireWire drivers as modules. This lets you unload and reload them, e.g. to recover from some failures or to try different parameters. However, whether the drivers are linked statically or built as modules has no effect on reliability during data transfers.
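
A minimal sketch of such a reload cycle, assuming sbp2 is built as a module and that no sbp2 disk is mounted or part of a running array at that moment. The helper names run and reload_sbp2 are made up for this example. The script only prints the commands unless RUN=1 is set:

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set RUN=1 (as root, with the array stopped) to really execute them.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Unload sbp2, then load it again with the given module parameters.
reload_sbp2() {
    run modprobe -r sbp2
    run modprobe sbp2 "$@"
}

reload_sbp2 serialize_io=1
```

With the drivers linked into the kernel, as in the setup described above, this cycle is not available and parameters can only be changed at boot time.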

B. The tests
Test0 : Creating a 4-drive raid5 with 1 drive missing, copying the 4th
drive content to the raid5, works great.
Stress-testing multiple drive copy seems to be ok (Test0 + various
tests), very responsive, absolutely no error, but Test1 has a lot of
'command abort' errors, which blocks io for seconds, then starts
again.

Test1 : Building from scratch the raid5 with 4 drives (i.e. none
missing), causes 'sbp2 : command abort' messages.

Are there any other suspicious messages from sbp2, ieee1394, or ohci1394?

At the end of Test1, raid5 is not created : one drive is set faulty.

C. The questions :
How could I run a paranoid/degraded bandwidth mode ? I tried playing
with /proc/sys/dev/raid/speed_limit_max, reducing to far away from
highest bandwidth, but it did not have the expected behaviour : io
runs to highest bw for seconds, then stops, then runs again at highest
rate, ...

What about sbp2's max_speed parameter?
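
A rough sketch of how max_speed could be passed, assuming the value encoding given in the sbp2 parameter description (0 = 100, 1 = 200, 2 = 400, 3 = 800 Mbit/s). The function names are made up for this example, and the script only prints the option strings, it does not change anything:

```shell
#!/bin/sh
# Map a FireWire bus speed in Mbit/s to the sbp2 max_speed code.
# Assumed encoding: 0 = 100, 1 = 200, 2 = 400, 3 = 800 Mbit/s.
sbp2_speed_code() {
    case "$1" in
        100) echo 0 ;;
        200) echo 1 ;;
        400) echo 2 ;;
        800) echo 3 ;;
        *)   echo "unsupported speed: $1" >&2; return 1 ;;
    esac
}

# Print (do not execute) the two ways of passing the limit:
# as a modprobe argument, or on the kernel command line when
# sbp2 is linked into the kernel.
sbp2_speed_hint() {
    code=$(sbp2_speed_code "$1") || return 1
    echo "modprobe sbp2 max_speed=$code"
    echo "sbp2.max_speed=$code"
}

sbp2_speed_hint 200
```

Since sbp2 is built into the kernel in the setup above, the second form (on the kernel command line) would be the one to try there.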

Is there a way to avoid write back at sbp2 level ? I could not find
any way to do so...

What do you mean by that?

What kernel version should I rather use ? Seems like scsi on 2.6.15.x
is not really trustworthy, should I run 2.6.14.x ?

"aborting sbp2 command" issues have been reported for quite a long time now, especially for Linux 2.6, although 2.4's sbp2 isn't fundamentally different. I don't think 2.6.14.x would behave any differently from 2.6.15.x with regard to this particular problem.

BTW, I'm hoping to get some spare time in February in order to work on this particular problem. I never used software RAID over sbp2 myself and don't intend to do so any time soon, but I get what I suspect to be the same type of failures with a 1394a disk and with a 1394b JBOD device (or hardware "R"AID-0) myself.

In the case of my 1394a disk, the failures vanish either with serialize_io=1 (this was not required with an older kernel; I don't remember which one) or --- curiously enough --- with "gap count optimization". As I wrote an hour ago on linux1394-user, gap count optimization is a performance tuning of the FireWire bus and is not yet implemented in the kernel.

You can get gap count optimization manually with "echo p 0x00450000 | 1394commander" for a single external device or "echo p 0x004a0000 | 1394commander" if 4 external devices are daisy-chained. Run the command after all disks have been connected and switched on, otherwise the command may inhibit access to newly added devices. www.linux1394.org has a link to 1394commander.
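
For reference, the two hex values read like the first quadlet of an IEEE 1394 PHY configuration packet with the T bit (bit 22) set and the gap count in bits 21-16, i.e. gap counts 5 and 10. Assuming that layout, the quadlet can be computed as sketched below. The function name phy_gap_quadlet is made up for this example, and which gap count suits topologies other than the two named above is not covered here:

```shell
#!/bin/sh
# Build the first quadlet of a PHY configuration packet that sets the
# gap count: packet identifier 00 (bits 31-30), T bit (bit 22) set,
# gap_cnt in bits 21-16. Root ID and R bit are left at zero.
phy_gap_quadlet() {
    gap=$1
    if [ "$gap" -lt 0 ] || [ "$gap" -gt 63 ]; then
        echo "gap count must be 0..63" >&2
        return 1
    fi
    printf '0x%08x\n' $(( (1 << 22) | (gap << 16) ))
}

phy_gap_quadlet 5    # prints 0x00450000
phy_gap_quadlet 10   # prints 0x004a0000
```

The output of phy_gap_quadlet could then be passed to 1394commander in place of the literal values, e.g. echo "p $(phy_gap_quadlet 5)" | 1394commander.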
--
Stefan Richter
-=====-=-==- ---= ====-
http://arcgraph.de/sr/