pdcraid bug in 2.4.18

Chris Abbey <linux@xxxxxxxxxx> · Sun, 19 May 2002 13:16:01 -0500 (CDT)

Hi all, a few weeks ago I posted that I was having problems with a
Promise 20268R controller based stripe set. The executive summary of
that post was that reading the first 512 bytes from the array was
returning all zeros, whereas reading from the first drive in the array
returned a valid partition table. Further digging found that it wasn't
returning all zeros so much as it was returning the contents of the
second disc instead of the first (which was in fact all zeros since
the first partition on the array had been cleared using the binary-only
driver from Promise and dd from /dev/zero). Further, the pattern continued
all through the array: only data from the second drive was ever returned,
and it was duplicated in blocks of stride length. Lots of printk's in
the driver later, I figured out why. Here is a snippet of the printk
output from pdcraid's module init (all lines that begin "cabbey:" are
my additions):

> cabbey: probedisk(4, 0, 0)
> cabbey: read_disk_sb(33, 0)
> cabbey: calc_pdcblock_offset(33, 0)
> cabbey: calc_pdcblock_offset returning 195371505
> cabbey: probedisk(5, 0, 0)
> cabbey: read_disk_sb(33, 64)
> cabbey: calc_pdcblock_offset(33, 64)
> cabbey: probedisk(6, 0, 0)
> cabbey: read_disk_sb(34, 0)
> cabbey: calc_pdcblock_offset(34, 0)
> cabbey: calc_pdcblock_offset returning 195371505
> cabbey: probedisk(7, 0, 0)
> cabbey: read_disk_sb(34, 64)
> cabbey: calc_pdcblock_offset(34, 64)
> cabbey: probedisk(8, 0, 0)
> cabbey: read_disk_sb(56, 0)
> cabbey: probedisk(9, 0, 0)
[...]
> cabbey: probedisk(13, 0, 0)
> cabbey: read_disk_sb(88, 64)
> ataraid/d0:
>  unknown partition table
> Drive 0 is 95396 Mb (34 / 0)
> Drive 1 is 95396 Mb (34 / 0)
> Raid0 array consists of 2 drives.

Note that when the drives are probed it is done with major/minor
pairs 33/0 and 34/0. This is correct. Note, however, that later
on, when the partition table is not found, they are both
being treated as 34/0!!
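
To make the symptom concrete, here is a minimal user-space sketch of
plain RAID0 striping over two members. It is not the pdcraid code: the
struct, the function names and the stride of 128 sectors used in the
usage note are all assumptions chosen for illustration. It only shows
how a member table in which both entries say 34/0 sends every read to
the second drive.

```c
/* Hypothetical user-space model of RAID0 striping, not pdcraid itself. */
#define NDISKS 2

struct member {
    unsigned int major;
    unsigned int minor;
};

/* Index of the member that holds a given logical sector. */
static unsigned int stripe_member(unsigned long sector, unsigned long stride)
{
    return (unsigned int)((sector / stride) % NDISKS);
}

/* Sector offset on that member for a given logical sector. */
static unsigned long stripe_offset(unsigned long sector, unsigned long stride)
{
    return (sector / (stride * NDISKS)) * stride + sector % stride;
}

/* Major number of the device a read of this logical sector would hit. */
static unsigned int read_major(const struct member *m, unsigned long sector,
                               unsigned long stride)
{
    return m[stripe_member(sector, stride)].major;
}
```

With an assumed stride of 128 and a table of { 33/0, 34/0 }, logical
sector 0 goes to major 33 and sector 128 to major 34, both at member
offset 0. With a table of { 34/0, 34/0 }, both reads land on offset 0 of
the same drive, which matches second-drive data repeated in blocks of
stride length.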

By inserting the following line at line 557 of pdcraid.c I was able to
force the first drive to be the correct device node, and everything
worked fine from there forward. (It's a horrid hack, I know, but a
functional one.)

        raid[0].disk[0].device = MKDEV(33,0);

It's important that this be fixed prior to the ataraid_register_disk call
that follows, because that eventually calls into the partition checking
code.
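
A less hard-coded alternative to the one-line hack would be to check
for duplicate member devices before registering the array. The sketch
below is user-space only. The macros are copies of the 2.4-era kdev_t
macros as I understand them (8-bit minors), using unsigned int rather
than the kernel's kdev_t, and members_distinct() is a hypothetical
helper that does not exist in pdcraid.c.

```c
/* User-space copies of the 2.4-era device number macros (assumed layout:
 * major in the high bits, 8-bit minor in the low bits). */
#define MINORBITS 8
#define MINORMASK ((1U << MINORBITS) - 1)
#define MAJOR(dev) ((unsigned int)((dev) >> MINORBITS))
#define MINOR(dev) ((unsigned int)((dev) & MINORMASK))
#define MKDEV(ma, mi) (((ma) << MINORBITS) | (mi))

/* Hypothetical helper: returns 1 if every member device number is unique,
 * 0 as soon as two members share the same device number. */
static int members_distinct(const unsigned int *dev, int n)
{
    int i, j;

    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (dev[i] == dev[j])
                return 0;
    return 1;
}
```

Run against the two members reported above (both 34/0), the check
fails. After forcing the first member to MKDEV(33,0) it passes.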

I'm not sure why both devices end up with the second device's node. If
the two drives weren't exactly the same make/model, I suspect I'd see
that both entries end up with complete copies of the second drive's data,
almost as if each just got a shallow copy of a structure that was
manipulated as the search loop progressed. But I don't see any evidence
of that type of bug in the code.
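
For reference, this is the kind of shallow-copy bug meant above,
reduced to a user-space sketch. It illustrates the hypothesis only.
All names here are made up, and nothing in it is taken from pdcraid.c,
where no such pattern was found.

```c
/* Hypothetical illustration of the suspected bug class. */
struct probe_result {
    unsigned int major;
    unsigned int minor;
};

/* Buggy pattern: every slot stores the address of the same scratch
 * struct, so all slots show whatever the last loop iteration wrote. */
static void fill_aliased(const struct probe_result **slots,
                         struct probe_result *scratch,
                         const unsigned int *majors, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        scratch->major = majors[i];
        scratch->minor = 0;
        slots[i] = scratch;
    }
}

/* Correct pattern: each slot gets its own copy by value. */
static void fill_copied(struct probe_result *slots,
                        const unsigned int *majors, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        struct probe_result tmp = { majors[i], 0 };
        slots[i] = tmp;
    }
}
```

Probing majors 33 and 34 through fill_aliased() leaves both slots
reporting 34, the same picture as the "Drive 0 is ... (34 / 0)" lines
in the log, while fill_copied() keeps 33 and 34 apart.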

I should point out for completeness that I'm working with the 2.4.18
kernel sources SuSE shipped with 8.0 (hence the cc: to them). However,
while they have a number of patches to deal with the card as a raw IDE
device, there are no changes in their code base for when it's under
pdcraid's control.

-- 
Never make a technical decision based upon the politics of the situation.
Never make a political decision based upon technical issues.
The only place these realms meet is in the mind of the unenlightened.
			-- Geoffrey James, The Zen of Programming