Justin,

There was actually a discussion I fired off a few weeks ago about how best to run SW RAID on this hardware. Here's the recap:

We're running RHEL, so no access to ZFS/XFS. I really wish we could do ZFS, but no luck. The box presents 48 drives split across six SATA controllers, so disks sda-sdh sit on one controller, and so on. In our configuration, I run a RAID5 MD array for each controller, then run LVM on top of those six arrays to form one large VolGroup. I found it was easiest to set up ext3 with a maximum of 2 TB per partition, so running on top of the massive LVM VolGroup are a handful of ext3 partitions, each mounted at its own point in the filesystem. (I've put a rough sketch of the commands at the bottom of this message.)

This is less than ideal (ZFS would allow us one large partition), but we're rewriting some software to take advantage of the multi-partition scheme. In this setup we should be fairly well protected against drive failure; we are, however, vulnerable to a controller failure. If such a failure occurred, we'd have to restore from backup.

Hope this helps. Let me know if you have any questions or suggestions. I'm certainly no expert here!

Thanks,

Norman

On 2/19/08, Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx> wrote:
> Norman,
>
> I am extremely interested in what distribution you are running on it and
> what type of SW raid you are employing (besides the one you showed here),
> are all 48 drives filled, or?
>
> Justin.
>
> On Tue, 19 Feb 2008, Norman Elton wrote:
>
> > Justin,
> >
> > This is a Sun X4500 (Thumper) box, so it's got 48 drives inside.
> > /dev/sd[a-z] are all there as well, just in other RAID sets. Once you
> > get to /dev/sdz, it starts up at /dev/sdaa, sdab, etc.
> >
> > I'd be curious if what I'm experiencing is a bug. What should I try to
> > restore the array?
> >
> > Norman
> >
> > On 2/19/08, Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx> wrote:
> >> Neil,
> >>
> >> Is this a bug?
> >>
> >> Also, I have a question for Norman-- how come your drives are sda[a-z]1?
> >> Typically it is /dev/sda1 /dev/sdb1 etc?
> >>
> >> Justin.
> >>
> >> On Tue, 19 Feb 2008, Norman Elton wrote:
> >>
> >>> But why do two show up as "removed"?? I would expect /dev/sdal1 to show up
> >>> someplace, either active or failed.
> >>>
> >>> Any ideas?
> >>>
> >>> Thanks,
> >>>
> >>> Norman
> >>>
> >>>
> >>> On Feb 19, 2008, at 12:31 PM, Justin Piszcz wrote:
> >>>
> >>>> How many drives actually failed?
> >>>>> Failed Devices : 1
> >>>>
> >>>>
> >>>> On Tue, 19 Feb 2008, Norman Elton wrote:
> >>>>
> >>>>> So I had my first "failure" today, when I got a report that one drive
> >>>>> (/dev/sdam) failed. I've attached the output of "mdadm --detail". It
> >>>>> appears that two drives are listed as "removed", but the array is
> >>>>> still functioning. What does this mean? How many drives actually
> >>>>> failed?
> >>>>>
> >>>>> This is all a test system, so I can dink around as much as necessary.
> >>>>> Thanks for any advice!
> >>>>>
> >>>>> Norman Elton
> >>>>>
> >>>>> ====== OUTPUT OF MDADM =====
> >>>>>
> >>>>>         Version : 00.90.03
> >>>>>   Creation Time : Fri Jan 18 13:17:33 2008
> >>>>>      Raid Level : raid5
> >>>>>      Array Size : 6837319552 (6520.58 GiB 7001.42 GB)
> >>>>>     Device Size : 976759936 (931.51 GiB 1000.20 GB)
> >>>>>    Raid Devices : 8
> >>>>>   Total Devices : 7
> >>>>> Preferred Minor : 4
> >>>>>     Persistence : Superblock is persistent
> >>>>>
> >>>>>     Update Time : Mon Feb 18 11:49:13 2008
> >>>>>           State : clean, degraded
> >>>>>  Active Devices : 6
> >>>>> Working Devices : 6
> >>>>>  Failed Devices : 1
> >>>>>   Spare Devices : 0
> >>>>>
> >>>>>          Layout : left-symmetric
> >>>>>      Chunk Size : 64K
> >>>>>
> >>>>>            UUID : b16bdcaf:a20192fb:39c74cb8:e5e60b20
> >>>>>          Events : 0.110
> >>>>>
> >>>>>     Number   Major   Minor   RaidDevice State
> >>>>>        0      66        1        0      active sync   /dev/sdag1
> >>>>>        1      66       17        1      active sync   /dev/sdah1
> >>>>>        2      66       33        2      active sync   /dev/sdai1
> >>>>>        3      66       49        3      active sync   /dev/sdaj1
> >>>>>        4      66       65        4      active sync   /dev/sdak1
> >>>>>        5       0        0        5      removed
> >>>>>        6       0        0        6      removed
> >>>>>        7      66      113        7      active sync   /dev/sdan1
> >>>>>
> >>>>>        8      66       97        -      faulty spare   /dev/sdam1
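
P.S. In case it's useful, here is roughly how the layering gets built. This is a sketch
from memory rather than our actual scripts, so treat the volume group name, logical
volume names, mount points, and the second controller's device names as illustrative
placeholders, not the exact values we use.

====== ROUGH SETUP SKETCH (ILLUSTRATIVE, NOT OUR EXACT COMMANDS) =====

# One RAID5 md array per controller (8 disks each). md0 shown here; the same
# is repeated for md1-md5 with each controller's disks (presumably sdi1-sdp1
# for the next one, and so on).
mdadm --create /dev/md0 --level=5 --raid-devices=8 \
    /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 \
    /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1

# LVM across the six md arrays to form one large VolGroup
# ("VolGroup00" is a placeholder name).
pvcreate /dev/md0 /dev/md1 /dev/md2 /dev/md3 /dev/md4 /dev/md5
vgcreate VolGroup00 /dev/md0 /dev/md1 /dev/md2 /dev/md3 /dev/md4 /dev/md5

# Carve out 2 TB logical volumes, put ext3 on each, and mount them
# ("data01" and /data/01 are placeholder names).
lvcreate -L 2T -n data01 VolGroup00
mkfs.ext3 /dev/VolGroup00/data01
mkdir -p /data/01
mount /dev/VolGroup00/data01 /data/01
# ...repeated for data02, data03, and so on, each with its own mount point.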
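
P.P.S. On the degraded array in the quoted output above: since it's a test box, this
is the sort of thing I was going to poke at next, just to see what each member's
superblock thinks happened (whether sdal1 simply dropped out, and how its event count
compares). I haven't run any of this yet, and I'm genuinely not sure whether a
--re-add is the right next move, so consider it a sketch and tell me if I'm off base.

====== DIAGNOSTIC SKETCH (NOT RUN YET) =====

# What the running array currently reports (Preferred Minor is 4, so md4).
mdadm --detail /dev/md4

# What the individual members' superblocks report; sdal1 is the drive that
# no longer shows up in the listing, sdam1 is the one reported faulty.
mdadm --examine /dev/sdal1
mdadm --examine /dev/sdam1

# Possible follow-up once the superblocks make sense (not run yet, and I'm
# unsure whether --re-add or a plain --add and full rebuild is correct):
# mdadm /dev/md4 --re-add /dev/sdal1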