Re: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1


 



Hello again.
2008/3/14, Aron Stansvik <elvstone@xxxxxxxxx>:
> 2008/2/26, nickcheng <nick.cheng@xxxxxxxxxxxx>:
> > Hi Aron,
> > Thanks for your patience.
> > If you still got into trouble, please let me know.
> > Thank you again,
>
> I have now tried:
>
>  * Turning on/off NCQ in the Areca RAID.
>  * Turning on/off read-ahead cache in the Areca RAID.
>  * Putting the disks in anti-vibration mounts in 5.25" slots.
>  * Switching SATA cables.
>  * Using legacy ATA power connectors instead of the SATA ones.
>
> But I still have the problem. The power supply is 650W so there should
> be plenty of power. There's only two Raptor disks, an Opteron CPU and
> an nVidia 6600GT in the machine.
>
> The Raptor two Raptor disks have different firmware on them, could
> that cause any problem?
>
> Two people who had read my post here on LKML have contacted me on
> e-mail and have the same problem, but they have Seagate and Samsung
> disks, and use the 1220 controller.
>
> The problem is hard to trigger, I've not been able to trigger it with
> any benchmarking tool, but in ~95% of the cases I can trigger it by
> just copying a directory with lots of small files (around 500 MB).
>
> Anyone else seeing this? I'd really like to get it to work since this
> is my only computer :(
>
> Should I try with XFS or ReiserFS instead of EXT3?

I've now tried with ReiserFS, same problem. My distribution didn't
have XFS as a choice at installation, so I haven't tried it yet. I'll
try with a LiveCD that supports XFS later. The two disks in the array
are the only ones in the system.

Aron
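
For anyone who wants to try reproducing this, here is a minimal sketch of the kind of workload described in the quoted message above: a directory with lots of small files, around 500 MB in total, copied in one go. The file count, file size, directory layout and function names are all assumptions chosen for illustration; this is not the actual directory that triggers the aborts here.

```python
import os
import shutil
import sys


def make_small_files(root, n_files=125000, size=4096, per_dir=100):
    """Create n_files files of `size` bytes each under root.

    The defaults (125000 x 4 KiB, roughly 500 MB) are an assumption meant
    to approximate "a directory with lots of small files".
    Returns the total number of payload bytes written.
    """
    payload = os.urandom(size)
    for i in range(n_files):
        # Spread the files over subdirectories of per_dir files each.
        subdir = os.path.join(root, "d%05d" % (i // per_dir))
        os.makedirs(subdir, exist_ok=True)
        with open(os.path.join(subdir, "f%07d" % i), "wb") as fh:
            fh.write(payload)
    return n_files * size


def copy_tree_synced(src, dst):
    """Copy src to dst (dst must not exist), then flush to disk."""
    shutil.copytree(src, dst)
    if hasattr(os, "sync"):
        os.sync()  # make sure the writes actually reach the array


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python smallfiles.py SRC_DIR DST_DIR (both on the RAID volume)
    make_small_files(sys.argv[1])
    copy_tree_synced(sys.argv[1], sys.argv[2])
```

Watching `dmesg` while the copy runs should show whether the arcmsr abort messages appear.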
>
> Regards,
> Aron
>
> > -----Original Message-----
> > From: Aron Stansvik [mailto:elvstone@xxxxxxxxx]
> > Sent: Tuesday, February 26, 2008 6:52 AM
> > To: erich
> > Cc: nick.cheng@xxxxxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx;
> > linux-scsi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> > Subject: Re: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
> >
> > Hi Erich.
> >
> > 2008/2/25, nickcheng <nick.cheng@xxxxxxxxxxxx>:
> > > Hi Aron,
> > > From our field experiences and customers' feedbacks, all of them
> > > direct to vibration and power issues.
> > > The vibration could be caused by FANs not only by themselves.
> >
> > Okay. I have a chassi fan that is quite close to the drives, I will
> > try disabling it. I've also ordered two Nexus TwinDisk anti vibration
> > harddrive mounts with which I'll place the disks in my 5.25" slots
> > instead, away from any fans.
> >
> > If this doesn't work, I'm stumped, as I really don't think it's the
> > power supply and I don't have the money to buy a new one.
> >
> > > You mentioned it could be the F/W issue.
> > > If the environment does not meet the prerequisite, FW could not work
> > > correctly.
> > > Actually FW just reacts to the situations not it causes the issue.
> >
> > Of course, I understand this. Just trying to figure this problem out..
> >
> > > Please check it out!!
> >
> > I'll report back with my findings with moving disk away from fans and
> > using anti-vibrations mounts.
> >
> > Thanks for taking your time to reply.
> >
> > Aron
> >
> > > Thank you,
> > >
> > > -----Original Message-----
> > > From: Aron Stansvik [mailto:elvstone@xxxxxxxxx]
> > > Sent: Sunday, February 24, 2008 1:54 AM
> > > To: nick.cheng@xxxxxxxxxxxx
> > > Cc: erich; akpm@xxxxxxxxxxxxxxxxxxxx; linux-scsi@xxxxxxxxxxxxxxx;
> > > linux-kernel@xxxxxxxxxxxxxxx
> > > Subject: Re: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
> > >
> > > Hello again Areca and LKML hackers.
> > >
> > > 2008/2/18, Aron Stansvik <elvstone@xxxxxxxxx>:
> > > > Hello Nick.
> > > >
> > > > Sorry that I'm not answering until now. I've been busy.
> > > >
> > > > 2008/2/13, nickcheng <nick.cheng@xxxxxxxxxxxx>:
> > > > > Hi Aron,
> > > > > From our experience and some customers' feedback, your issue
> > > > > could be caused by power instability or vibration to your HDs.
> > > > > Please check step by step:
> > > > > (1).under your original environment, increase the SCSI command
> > > > > value, default=30, with the shell script,
> > > > > set_scsicmd_timeout(). 90 or 120 is enough.
> > > > > (2).if method 1 does not work, find out the vibration source or
> > > > > change the power supply
> > > >
> > > > I will try to increase that value. I don't think it's vibration; the
> > > > disks are firmly in place in a very heavy chassi (Silverstone
> > > > SST-TJ05B-T). And I really don't think there's something wrong with
> > > > the power supply, it's a pretty new Silverstone ST65ZF 650W. This is
> > > > my own personal workstation, so I don't just have another power
> > > > supply to test with :/
> > > >
> > > > I will report back on my success/failure. Thanks for your answer.
> > >
> > > I've now tried with both 90 and 120 for the timeout value, and the
> > > problem still persists. It seems to happen when lots of small writes
> > > are occuring, e.g. when installing something.
> > >
> > > I really don't think the disks are vibrating, I don't see how they
> > > could. One more thing I'm going to test is to use the legacy ATA power
> > > connector instead of the SATA power connector. This was what I was
> > > using before when I only had a single drive and no RAID controller.
> > > Maybe my power supply is malfunctioning and not giving enough power on
> > > the SATA power connectors.. but I doubt it.
> > >
> > > Is there anything else that could cause this? Have you guys at Areca
> > > tested the ARC-1200 with Raptors in RAID1?
> > >
> > > :(
> > >
> > > Regards,
> > > Aron
> > >
> > > > Aron
> > > >
> > > > > If your still have any questions, please feel free to let me know.
> > > > >
> > > > > P.S. The attached driver source, arcmsr-1.20.00.15-71224, has been
> > > > > upstreamed to kernel.org and will be released in kernel 2.6.25.
> > > > > If you like, you could update your driver with it.
> > > > > It fixes some minor bugs, but these bugs are nothing to do with
> > > > > your issue.
> > > > >
> > > > > -----Original Message-----
> > > > > From: erich [mailto:erich@xxxxxxxxxxxx]
> > > > > Sent: Wednesday, February 13, 2008 4:33 PM
> > > > > To: (廣安科技)鄭守謙
> > > > > Subject: Fw: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
> > > > >
> > > > > ----- Original Message -----
> > > > > From: "Andrew Morton" <akpm@xxxxxxxxxxxxxxxxxxxx>
> > > > > To: "Aron Stansvik" <elvstone@xxxxxxxxx>
> > > > > Cc: <linux-kernel@xxxxxxxxxxxxxxx>; <linux-scsi@xxxxxxxxxxxxxxx>;
> > > > > "erich" <erich@xxxxxxxxxxxx>
> > > > > Sent: Wednesday, February 13, 2008 4:03 PM
> > > > > Subject: Re: Aborted commands with arcmsr and 2xWD1500ADFD in RAID1
> > > > >
> > > > > > (cc's added)
> > > > > >
> > > > > > On Mon, 11 Feb 2008 17:44:08 +0100 "Aron Stansvik"
> > > > > > <elvstone@xxxxxxxxx> wrote:
> > > > > >
> > > > > > > Hello LKML.
> > > > > > >
> > > > > > > Under semi-high disk I/O (e.g. installing a compiled KDE), I get
> > > > > > > the following (accompanied by seconds of lock-ups on the machine):
> > > > > > >
> > > > > > > [ 7727.345183] arcmsr0: abort device command of scsi id = 0 lun = 0
> > > > > > > [ 7730.348776] arcmsr0:                 scsi id = 0 lun = 0 ccb = '0xdfb461c0' poll command abort successfully
> > > > > > > [ 8053.795943] arcmsr0: abort device command of scsi id = 0 lun = 0
> > > > > > > [ 8056.799528] arcmsr0:                 scsi id = 0 lun = 0 ccb = '0xdfb595e0' poll command abort successfully
> > > > > > > [ 8884.592810] arcmsr0: abort device command of scsi id = 0 lun = 0
> > > > > > > [ 8887.596392] arcmsr0:                 scsi id = 0 lun = 0 ccb = '0xdfb56d80' poll command abort successfully
> > > > > > > [ 8917.760216] arcmsr0: abort device command of scsi id = 0 lun = 0
> > > > > > > [ 8920.763797] arcmsr0:                 scsi id = 0 lun = 0 ccb = '0xdfb472c0' poll command abort successfully
> > > > > > > [ 9074.106547] arcmsr0: abort device command of scsi id = 0 lun = 0
> > > > > > >
> > > > > > > This is my setup:
> > > > > > >
> > > > > > > 1 x MSI K8N Master2-FAR
> > > > > > > 1 x Opteron 252
> > > > > > > 1 x Areca ARC1200 (sitting in a PCIe x4 socket)
> > > > > > > 2 x WD1500ADFD in RAID1
> > > > > > >
> > > > > > > astan@rubik:~$ uname -a
> > > > > > > Linux rubik 2.6.24-7-generic #1 SMP Thu Feb 7 01:29:58 UTC 2008 i686 GNU/Linux
> > > > > > > astan@rubik:~$ modinfo arcmsr
> > > > > > > filename:       /lib/modules/2.6.24-7-generic/kernel/drivers/scsi/arcmsr/arcmsr.ko
> > > > > > > version:        Driver Version 1.20.00.15 2007/08/30
> > > > > > > license:        Dual BSD/GPL
> > > > > > > description:    ARECA (ARC11xx/12xx/13xx/16xx) SATA/SAS RAID HOST Adapter
> > > > > > > author:         Erich Chen <support@xxxxxxxxxxxx>
> > > > > > > srcversion:     28EAD6AB49D4491CA04D465
> > > > > > > [...]
> > > > > > >
> > > > > > > I've read some previous posts here on LKML that it could be the
> > > > > > > Areca firmware who doesn't like my WD disks. Anyone know if this
> > > > > > > is an IRQ handling problem in the kernel, or if it's a problem
> > > > > > > with the RAID controller firmware?
> > > > > > >
> > > > > > > Erich Chen (of Areca); have you tried the new ARC1200 in RAID1
> > > > > > > configuration with Raptor disks on Linux?
> > > > > > >
> > > > > > > As a side note, I can tell you that I first tried running FreeBSD
> > > > > > > 6.3 (RELENG_6) on this machine, but got random reboots during disk
> > > > > > > I/O (even with a kernel with KDB debugging turned on). This leads
> > > > > > > me to believe that it might be a firmware issue, and that Linux
> > > > > > > just handles it more gracefully than FreeBSD.
> > > > > > >
> > > > > > > Any ideas or advice is appriciated. This is my first post to the
> > > > > > > LKML, so please instruct me if you want more information or if
> > > > > > > you want me to take further debugging actions.
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Aron Stansvik
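
The timeout change suggested earlier in the thread (default 30, raised to 90 or 120) can be sketched as below. This is not Areca's set_scsicmd_timeout script, whose contents are not shown in this thread; it is only a sketch that assumes the standard Linux sysfs attribute /sys/block/<device>/device/timeout. The device name "sda" in the usage comment and the function names are assumptions, and writing the real attribute requires root.

```python
import os


def timeout_path(device, sysfs_root="/sys/block"):
    """Path of the per-device SCSI command timeout attribute (seconds)."""
    return os.path.join(sysfs_root, device, "device", "timeout")


def read_timeout(device, sysfs_root="/sys/block"):
    """Return the current SCSI command timeout in seconds."""
    with open(timeout_path(device, sysfs_root)) as fh:
        return int(fh.read().strip())


def set_timeout(device, seconds, sysfs_root="/sys/block"):
    """Write a new timeout and return the value read back afterwards."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    with open(timeout_path(device, sysfs_root), "w") as fh:
        fh.write("%d\n" % seconds)
    return read_timeout(device, sysfs_root)


# Hypothetical usage, as root, assuming the array shows up as sda:
#   set_timeout("sda", 90)
```

The setting does not survive a reboot, so it would have to be reapplied at boot if it turned out to help.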

