Re: removing faulty drive on 3ware 9xxx card

Harry Mangalam <hjm@xxxxxxxxx> · Thu, 9 Jun 2005 13:08:36 -0700

That looks right, though you haven't mentioned which version of the SW you're 
using. And you DO have the docs, right? ;)

If not, go here to get them:
http://www.3ware.com/support/downloadpageeng.asp?SNO=7

Or you could test the robustness of the system and just yank it. I'd be 
interested in the results. :)

After the bad disk is pulled, the rebuild should start immediately on your hot 
spare AFAIK, and when you replace the bad disk, you should then be able to 
specify it as the hot spare.

The web version of their SW (3dm2) works for me and is considerably more 
intuitive than tw_cli (though that's not saying a lot).

You might also try to get the SMART info from the disk (the 3ware SW can 
extract the raw numbers but will not interpret them).
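
If you want to pull SMART data for every port in one go, smartmontools can address individual disks behind a 3ware card with its "-d 3ware,N" option, where N is the port number. Here's a rough sketch that just builds those command lines. The 12-port count and the /dev/twa0 node (what the 3w-9xxx driver uses; /dev/twe0 is the 3w-xxxx equivalent) are my assumptions, so adjust for your box:

```python
# Hypothetical helper: build smartctl command lines for each port behind a
# 3ware controller. The port count (12) and device node are assumptions.
import shlex
from typing import List


def smartctl_argv(port: int, device: str = "/dev/twa0") -> List[str]:
    """argv asking smartctl for all SMART data (-a) of the disk on `port`."""
    if port < 0:
        raise ValueError("port must be >= 0")
    # -d 3ware,N tells smartctl which port behind the controller to query.
    return ["smartctl", "-a", "-d", f"3ware,{port}", device]


def all_ports(num_ports: int = 12, device: str = "/dev/twa0") -> List[List[str]]:
    """One argv per port, 0 .. num_ports-1."""
    return [smartctl_argv(p, device) for p in range(num_ports)]


if __name__ == "__main__":
    # Print only; run the one you care about (e.g. port 4) by hand as root.
    for argv in all_ports():
        print(shlex.join(argv))
```

Comparing port 4's reallocated/pending sector counts against its neighbours should tell you fairly quickly whether it's really the disk.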

also:

Konstantin Olchanski <olchansk@xxxxxxxxxxxxx> recently wrote that:
I use the 3ware driver that comes with the Red Hat kernels, the
additional monitoring tools from 3ware do not work. SMART monitoring
works via "smartctl -a -d 3ware,0 /dev/twe0".
and added offline:
BTW, I had to mknod /dev/twe0 manually; this
is what it looks like:

[root@tw00 ~]# ls -l /dev/twe0
crw-------  1 root root 254, 0 Jun  8 15:03 /dev/twe0
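
The 254 major in that listing is whatever the kernel handed out when the driver loaded, so it may differ on your machine. A sketch of doing the same thing without hard-coding it, by reading the major from /proc/devices first. My assumption (from the driver source, as far as I recall) is that 3w-xxxx registers its character device as "twe" and 3w-9xxx as "twa":

```python
# Sketch: recreate a missing 3ware control node, assuming the driver has
# registered a character device named "twe" (3w-xxxx) or "twa" (3w-9xxx).
import os
import stat
from typing import Optional


def char_major(proc_devices: str, name: str) -> Optional[int]:
    """Look up `name` in the "Character devices:" section of /proc/devices."""
    in_char = False
    for raw in proc_devices.splitlines():
        line = raw.strip()
        if line.endswith("devices:"):
            # Section header: only the character-device section counts.
            in_char = line.startswith("Character")
            continue
        parts = line.split(None, 1)
        if in_char and len(parts) == 2 and parts[1] == name:
            return int(parts[0])
    return None


def make_node(path: str, major: int, minor: int = 0) -> None:
    """Equivalent of `mknod path c major minor` with root-only rw (crw-------)."""
    os.mknod(path, stat.S_IFCHR | 0o600, os.makedev(major, minor))


if __name__ == "__main__":
    with open("/proc/devices") as f:
        major = char_major(f.read(), "twe")  # use "twa" with the 3w-9xxx driver
    if major is None:
        raise SystemExit("no 'twe' entry in /proc/devices; is the driver loaded?")
    make_node("/dev/twe0", major)  # needs root
```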

Here's the relevant section of the man page for my version of tw_cli (2.00.00.042):

[maint] rebuild cid uid pid [ignoreECC]
    This command allows you to rebuild a DEGRADED unit by using the specified 
port. Rebuild only applies to redundant arrays such as RAID-1, RAID-5, 
RAID-10 and RAID-50. During rebuild, bad sectors on the source disk will 
cause the rebuild to fail. You can allow for the operation to continue via 
ignoreECC. Rebuild process is a background task and will change the state of 
a unit to REBUILDING. Various info commands also show a percent completion as 
rebuilding progresses. 

    Note that the port (disk) to be used to rebuild a unit, must be a SPARE or 
configured disk.
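
If the hot spare doesn't kick in on its own and the unit sits in DEGRADED, you'd have to issue the rebuild by hand. A small sketch that finds ports with no unit assigned in `info c0` output (in the same layout as your listing below) and builds the rebuild line from the man page syntax above. Treating "Unit == -" as usable is an assumption: such a port may be a SPARE or simply unconfigured, so check its Status before trusting it:

```python
# Sketch: pick a spare-looking port from `info c0` output and build the
# tw_cli rebuild line. Treating Unit == "-" as "available" is an assumption.
from typing import List


def unassigned_ports(info_text: str) -> List[str]:
    """Port ids (e.g. 'p11') whose Unit column is '-' in `info cN` output."""
    ports = []
    for line in info_text.splitlines():
        fields = line.split()
        # Port rows start with pN, followed by Status and Unit columns.
        if len(fields) >= 3 and fields[0][:1] == "p" and fields[0][1:].isdigit():
            if fields[2] == "-":
                ports.append(fields[0])
    return ports


def rebuild_command(cid: str, uid: str, pid: str, ignore_ecc: bool = False) -> str:
    """tw_cli line following `[maint] rebuild cid uid pid [ignoreECC]`."""
    cmd = f"maint rebuild {cid} {uid} {pid}"
    return cmd + " ignoreECC" if ignore_ecc else cmd
```

With your listing this should point at p11, giving "maint rebuild c0 u0 p11". I'd leave ignoreECC off unless a plain rebuild fails on bad sectors, since it lets the rebuild carry on past them.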

Let us know what happens...
hjm

On Thursday 09 June 2005 12:08 pm, Richard Jacobsen wrote:
> Hello everyone,
>
> I have a drive which is constantly putting out:
>
> 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0009): Drive timeout detected:port=4,
>
> However the 3ware cli reports it as still a valid member of the array:
>
> //beautemps> info c0
>
> Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
> ------------------------------------------------------------------------------
> u0    RAID-5    OK             -      64K     2328.2    ON     OFF      OFF
>
> Port   Status           Unit   Size        Blocks        Serial
> ---------------------------------------------------------------
> p0     OK               u0     232.88 GB   488397168     WD-WMAEP28256
> p1     OK               u0     232.88 GB   488397168     WD-WMAEP28252
> p2     OK               u0     232.88 GB   488397168     WD-WMAEP27015
> p3     OK               u0     232.88 GB   488397168     WD-WMAEP28280
> p4     OK               u0     232.88 GB   488397168     WD-WMAEP28256
> p5     OK               u0     232.88 GB   488397168     WD-WMAEP28257
> p6     OK               u0     232.88 GB   488397168     WD-WMAEP28253
> p7     OK               u0     232.88 GB   488397168     WD-WMAEP28252
> p8     OK               u0     232.88 GB   488397168     WD-WMAEP28566
> p9     OK               u0     232.88 GB   488397168     WD-WMAEP25657
> p10    OK               u0     232.88 GB   488397168     WD-WMAEP28584
> p11    OK               -      232.88 GB   488397168     WD-WMAEP28250
>
> Since I'm assuming that this constant drive timeout is what is slowing my
> array to a crawl, I'd like to remove p4 from the array, have the
> hot spare on p11 take over, then replace p4.
>
> I'm thinking that:
>
> maint remove c0 p4
>
> Is the command I'm looking for.  Any caveats before I try?
>
> Thanks,
> Richard
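
Before pulling p4 it may be worth confirming from the logs that port 4 is the only one throwing timeouts. A quick sketch that tallies the "Drive timeout detected:port=N" AENs per port and builds the remove line for the worst offender. The regex is based only on the one message you quoted, and it assumes the AEN port number matches tw_cli's pN numbering:

```python
# Sketch: tally "Drive timeout detected" AENs per port from kernel log lines
# and suggest the tw_cli remove line for the worst offender. The regex is
# based only on the single message quoted in this thread.
import re
from collections import Counter
from typing import Iterable, Optional

TIMEOUT_RE = re.compile(r"Drive timeout detected:port=(\d+)")


def timeout_counts(lines: Iterable[str]) -> Counter:
    """Count timeout AENs per port number."""
    counts: Counter = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts


def remove_command(cid: str, lines: Iterable[str]) -> Optional[str]:
    """`maint remove cN pM` for the port with the most timeouts, if any."""
    counts = timeout_counts(lines)
    if not counts:
        return None
    port, _ = counts.most_common(1)[0]
    return f"maint remove {cid} p{port}"
```

Feeding it your syslog should give "maint remove c0 p4" if port 4 really is the culprit; if several ports show up, I'd suspect cabling or the controller before the disk.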

-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm@xxxxxxxxx 
            <<plain text preferred>>
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html