I was just trying to be helpful. *backs away slowly*

Cameron

On Fri, Jan 20, 2017 at 5:16 PM, Valeri Galtsev <galtsev@xxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, January 20, 2017 7:00 pm, Cameron Smith wrote:
> > Hi Valeri,
> >
> > Before you pull a drive you should check to make sure that doing so
> > won't kill the whole array.
>
> Wow! What did I say to make you treat me as an ultimate idiot!? ;-) All my
> comments, at least in my own reading, were about things you need to do to
> make sure that when you hot-unplug a bad drive it is indeed the failed
> drive you have to replace.
>
> Valeri
>
> > MegaCli can help you prevent a storage disaster and can let you have
> > more insight into your RAID and the status of the virtual disks and the
> > disks that make up each array.
> >
> > MegaCli will let you see the health and status of each drive: does it
> > have media errors, is it in predictive failure mode, what firmware
> > version does it have, etc. MegaCli will also let you see the status of
> > the enclosure, the adapter and the virtual disks (logical disks).
> >
> > Before you pull a drive it's a good idea to properly prepare it for
> > removal, after confirming that it's OK to remove it.
> >
> > Here are a few commands:
> >
> > OFFLINE A DISK
> > MegaCli -PDOffline -PhysDrv[32:0] -a0
> >
> > MARK A DISK AS MISSING
> > MegaCli -pdmarkmissing -physdrv[32:0] -a0
> >
> > MARK A DISK AS PREPARED FOR REMOVAL
> > MegaCli -pdprprmv -physdrv[32:0] -a0
> >
> > Here are some easy overview commands that I run when first looking at
> > the storage on a system:
> >
> > MegaCli -AdpAllInfo -aAll | grep -A 8 "Device Present";
> > MegaCli -PDList -aALL | grep "Firmware state";
> > MegaCli -PDList -aALL | grep "Media Error Count";
> > MegaCli -PDList -aALL | grep "Predictive Failure Count";
> > MegaCli -PDList -aALL | grep "Inquiry Data";
> > MegaCli -PDList -aALL | grep "Device Firmware Level";
> > MegaCli -PDList -aALL | grep "Drive has flagged";
> > MegaCli -PDList -aALL | grep Temperature;
> >
> > I also leverage MegaCli from bash scripts on my older 11th-generation
> > Dells, run from cron.hourly, that check the health status of my arrays
> > and email me if there is an issue.
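For reference, a minimal sketch of that kind of cron.hourly check might look
like the one below. The /opt/MegaRAID/MegaCli/MegaCli64 path, the use of
mailx and the root@localhost address are only assumptions for the example;
adjust them for your own install.

#!/bin/bash
# /etc/cron.hourly/raid-health: sketch of an hourly MegaRAID health check.
# Assumes MegaCli64 lives under /opt/MegaRAID/MegaCli/ and mailx is
# installed; change MEGACLI and MAILTO to match your system.

MEGACLI=/opt/MegaRAID/MegaCli/MegaCli64
MAILTO=root@localhost
HOST=$(hostname -s)

# Pull the per-drive fields we care about in one pass.
REPORT=$("$MEGACLI" -PDList -aALL | \
    grep -E "Slot Number|Firmware state|Media Error Count|Predictive Failure Count")

# Any drive not reporting "Online, Spun Up" (or sitting as a hot spare), and
# any non-zero media-error or predictive-failure counter, is worth an email.
BAD_STATE=$(echo "$REPORT" | grep "Firmware state" | grep -vE "Online, Spun Up|Hotspare")
BAD_COUNT=$(echo "$REPORT" | grep -E "Media Error Count|Predictive Failure Count" | grep -v ": 0")

if [ -n "$BAD_STATE" ] || [ -n "$BAD_COUNT" ]; then
    {
        echo "MegaRAID health check on $HOST flagged a problem:"
        echo
        echo "$REPORT"
    } | mailx -s "RAID warning on $HOST" "$MAILTO"
fi

Dropped into /etc/cron.hourly and made executable, it stays quiet while every
drive reports "Online, Spun Up" and only mails when a drive changes state or
a counter moves.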
> >
> > Cameron Smith
> > Technical Operations Manager
> > Network Redux, LLC
> > Cell: 503-926-4928
> >
> > On Fri, Jan 20, 2017 at 3:38 PM, Valeri Galtsev <galtsev@xxxxxxxxxxxxxxxxx> wrote:
> >
> >> On Fri, January 20, 2017 5:16 pm, Joseph L. Casale wrote:
> >> >> This is why, before configuring and installing everything, you may
> >> >> want to attach drives one at a time, and upon boot take a note of
> >> >> which physical drive number the controller has for that drive, and
> >> >> definitely label it so you will know which drive to pull when a
> >> >> drive failure is reported.
> >> >
> >> > Sorry Valeri, that only works if you're the only guy in the org.
> >>
> >> Well, this is true; I'm the only sysadmin working for two departments
> >> here...
> >>
> >> > In reality, you cannot and should not rely on this given how easily
> >> > it can change, and more than likely someone won't update it.
> >> >
> >> > Would you walk up to a production unit in a degraded state and simply
> >> > pull out a drive and risk a production issue? I wouldn't...
> >>
> >> I routinely do: I just hot-remove the failed drive from a running
> >> production system and replace it with a good drive (take note of what I
> >> said about my job above, though). Not one of our users ever notices.
> >> When I do it, I am usually only taking the chance of making a degraded
> >> RAID6 (with one drive already failed) yet more degraded and no longer
> >> fault tolerant, though still online with all data on it. But even that
> >> chance is slim, given that I take all precautions when I initially set
> >> up the box.
> >>
> >> > You need to assert the position of the drive and prepare it in the
> >> > array controller for removal, then swap, scan, add it to the virtual
> >> > disk, then initiate the rebuild.
> >>
> >> Hm, I'm not certain what process you describe. Most of my controllers
> >> are 3ware and LSI; I just pull the failed drive (and I know the failed
> >> physical drive number), put a good one in its place, and the rebuild
> >> starts right away. I have a couple of Areca ones (I love them too!); I
> >> don't remember if I have to manually initiate the rebuild. (I'm lucky
> >> in using good drives - very careful in choosing good ones ;-).
> >>
> >> > Not to mention, if it's a busy system, confirm that the IO load from
> >> > the rebuild is not having an impact on the application. You may need
> >> > to lower the rate.
> >>
> >> Indeed, in the 3ware configuration there is a choice of several grades
> >> of rebuild vs. IO; I usually choose slower rebuild - faster IO. If I
> >> have only one drive fail on me during a year in a given rack, there is
> >> almost zero chance of a second drive failing for quite some time (we
> >> had a heated discussion about it once, and I still stand by my opinion
> >> that drive failures are independent events). So my degraded RAID-6 can
> >> keep running and even still stay redundant ("single redundant", akin to
> >> RAID-5) for the period of the rebuild, even if that takes quite long.
> >>
> >> Valeri
>
> ++++++++++++++++++++++++++++++++++++++++
> Valeri Galtsev
> Sr System Administrator
> Department of Astronomy and Astrophysics
> Kavli Institute for Cosmological Physics
> University of Chicago
> Phone: 773-702-4247
> ++++++++++++++++++++++++++++++++++++++++
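For reference, the prepare/swap/rebuild sequence Joseph describes maps onto
MegaCli roughly as sketched below. Treat it as a sketch only: [32:0] and -a0
are just the example enclosure:slot and adapter numbers from the commands
earlier in the thread, autorebuild behaviour differs between setups, and the
exact RebuildRate syntax varies between MegaCli versions, so verify against
MegaCli -h before relying on any of it.

# 1. Confirm you have the right physical drive (match slot, serial, state):
MegaCli -PDList -aALL | grep -E "Slot Number|Inquiry Data|Firmware state"

# 2. Prepare it for removal:
MegaCli -PDOffline -PhysDrv[32:0] -a0
MegaCli -pdmarkmissing -physdrv[32:0] -a0
MegaCli -pdprprmv -physdrv[32:0] -a0

# 3. Physically swap the drive. With autorebuild enabled the rebuild should
#    start on its own (which matches what Valeri sees); either way its
#    progress can be watched with:
MegaCli -PDRbld -ShowProg -PhysDrv[32:0] -a0

# 4. If the rebuild IO is hurting the application, check the current rebuild
#    rate (a percentage of controller resources) and lower it if needed via
#    -AdpSetProp RebuildRate (see MegaCli -h for the exact value syntax):
MegaCli -AdpGetProp RebuildRate -aALL

The grades of rebuild vs. IO that Valeri describes on his 3ware controllers
are the same trade-off exposed through a different CLI.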