On Wed, 16 Aug 2006, Arkadiusz Miskiewicz wrote:

> On Wednesday 16 August 2006 20:14, Andrew Vasquez wrote:
>
> > > Everything works fine on 32-bit dual xeon machine with card inserted into
> > > PCI slot. Here in new dual core opteron machine the card sits in PCI-X
> > > slot. It's Thunder K8SRE S2891 mainboard, Transport GT24 B2891 1U
> > > barebone, Tyan M2075 riser card, 2 x Opteron 270, 6GB ram.
> >
> > Have you ruled out motherboard/memory issues? We've run 23xx and 24xx
> > boards on a variety of AMD64 motherboards with more than 4GB of
> > memory.
> Not exactly. I have three such servers (the same board, cpu, ram).
>
> One of these has PCI-E scsi raid controller and works without any problem.
>
> Second one (which I'm using for testing) has Adaptec SCSI controller
> in Tyan specific TARRO slot (which afaik sits on the same PCI-X bus as PCI-X slot).
> I had an array on that adaptec for some time and there were no problems.
> The second one also got own qlogic 2312 card.
>
> Third machine has other qlogic 2312 card and I tried it two times,
> too - the same issues observed.
>
> So mainboards and memory should be fine but I guess there is possibility
> that PCI-X slot is somehow broken there. I'll test that in several weeks
> (unfortunately I have no physical access to these now) by putting some
> typical scsi controller into that pci-x slot.
>
> > > Note that booting with 32-bit kernel (which works fine on Xeon system)
> > > doesn't cure the problem on Opteron system. Booting 64bit 2.6.18rc4
> > > kernel with mem=3G also doesn't fix anything.
> >
> > Hmm... Can you send your dmesg output post boot and driver load?
>
> dmesg and lspci -vvv attached.
>
> There is one thing that worries me:
> QLogic QLA2340 -
> ISP2312: PCI-X (133 MHz) @ 0000:09:08.0 hdma+, host#=0, fw=3.03.20 IPX
> Why it's showing 133MHz here?
>
> Tyan manual ftp://ftp.tyan.com/manuals/m_s2891_100.pdf says on page 4
> ,,One PCI-X 100MHz slot (PCI-X B)''.
>
> Hm, while looking at that manual I see that on page 13 it says ,,one pci-x 100MHz device''.
> Maybe that's the problem since I didn't remove Adaptec TARO controller. I should be able to
> get this tested again without adaptec sooner than several weeks. Maybe I misunderstand
> what documentation says here.
>
> Anyway still don't know why driver displays 133Mhz while manual says about
> 100MHz one pci-x slot.

That is odd... The driver just dumps the bits from the control register,
which are set by hardware to indicate which type of slot the HBA is
connected to.

> > > /t is on tmpfs, /dev/sda2 in on FC array. Reading the same data several
> > > times and I get different md5sum results each time, see below.
> > >
> > > How I can track where corruption occurs?
>
> I started dd'ing text file onto /dev/sda2, then back to tmpfs and diffing these. Corruption
> is one byte, several not corrupted bytes and again one corrupted byte. Some regular pattern.

Could you send me a snippet (2 to 4KB) of the good and bad data?

Regards,
Andrew Vasquez
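
The "one corrupted byte, several good bytes, one corrupted byte" pattern described above can be characterized before sending a snippet. What follows is only a minimal sketch, not anything from the qla2xxx driver or from this thread: it compares a known-good copy with a copy read back from the array, lists the differing offsets, the distance between them, and the XOR of each good/bad byte pair, and suggests a 4KB window around the first difference. The function names, the 512-byte alignment and the file paths in the usage line are assumptions made for illustration.

```python
import sys


def diff_offsets(good: bytes, bad: bytes) -> list:
    """Offsets (within the common length) at which the two buffers differ."""
    return [i for i, (g, b) in enumerate(zip(good, bad)) if g != b]


def strides(offsets: list) -> list:
    """Distances between consecutive differing offsets; a constant value
    suggests a regular corruption pattern."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


def xor_masks(good: bytes, bad: bytes, offsets: list) -> list:
    """XOR of good and bad byte at each differing offset, i.e. which bits flipped."""
    return [good[i] ^ bad[i] for i in offsets]


def snippet_window(offsets: list, size: int = 4096, align: int = 512):
    """(start, end) of a `size`-byte window that begins at the `align`-aligned
    offset at or before the first difference, or None if nothing differs.
    The 512-byte alignment is an assumption (a common sector size)."""
    if not offsets:
        return None
    start = (offsets[0] // align) * align
    return start, start + size


def main(good_path: str, bad_path: str) -> None:
    with open(good_path, "rb") as f:
        good = f.read()
    with open(bad_path, "rb") as f:
        bad = f.read()
    offs = diff_offsets(good, bad)
    print("differing bytes:", len(offs))
    print("first offsets:  ", offs[:16])
    print("strides:        ", strides(offs)[:16])
    print("xor masks:      ", [hex(m) for m in xor_masks(good, bad, offs)[:16]])
    print("snippet window: ", snippet_window(offs))


# Hypothetical usage: python3 cmpcorrupt.py /t/good.txt /t/readback.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

A constant stride together with a constant XOR mask would point at a specific bit or byte lane rather than random memory errors, which is useful context to send along with the good and bad data.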