If I remember correctly, in the video they suggested not making a RAID 10 too big (i.e. too many (big) disks), because a RAID resync could then take a long time. They didn't mention a hard limit; on my 3 servers with 2 RAID 10 arrays each (1x4 disks, 1x6 disks), no disk has failed so far, but the automatic periodic redundancy checks (mdadm checkarray) ran for a couple of days, increasing load on the servers and reducing responsiveness of glusterfs on the clients. Still, almost no one even noticed that the mdadm checks were running :-)

But compare that with our old JBOD setup: after a disk change the heal took about a month, resulting in really poor performance on the client side. As we didn't want to go through that period again, we threw hardware at the problem. Maybe a different setup (10 disks -> 5 RAID 1 arrays, building a distribute-replicate volume) would have been even better, but so far we're happy with the current setup.

On Thu, 6 Jun 2019 at 18:48, Eduardo Mayoral <emayoral@xxxxxxxx> wrote:
>
> Your comment actually helps me more than you think. One of the main
> doubts I have is whether to go for JBOD with replica 3, or SW RAID 6 with
> replica 2 + arbiter. Before reading your email I was leaning more
> towards JBOD, as reconstruction of a moderately big RAID 6 with mdadm
> can be painful too. Now I see a reconstruction is going to be painful
> either way...
>
> For the record, the workload I am going to migrate is currently
> 18,314,445 MB and 34,752,784 inodes (which is not exactly the same as
> files, but let's use that for a rough estimate), for an average file
> size of about 539 KB per file.
>
> Thanks a lot for your time and insights!
>
> On 6/6/19 8:53, Hu Bert wrote:
> > Good morning,
> >
> > my comment won't help you directly, but I thought I'd send it anyway...
> >
> > Our first glusterfs setup had 3 servers with 4 disks = bricks (10 TB,
> > JBOD) each. It was running fine in the beginning, but then 1 disk failed.
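For reference, the periodic redundancy checks mentioned above can be started, throttled, or aborted by hand if the load impact becomes a problem. A minimal sketch, assuming the array is /dev/md0 (hypothetical device name; run as root):

```shell
#!/bin/sh
# Sketch: manually drive an mdadm redundancy check on /dev/md0.
# This is what Debian's monthly checkarray cron job triggers, too.

# Start a read-and-compare check of the array:
echo check > /sys/block/md0/md/sync_action

# Watch progress:
cat /proc/mdstat

# Lower the kernel-wide resync/check speed ceiling (in KB/s) so the
# check competes less with glusterfs I/O:
echo 50000 > /proc/sys/dev/raid/speed_limit_max

# Abort the check entirely if the servers become too loaded:
echo idle > /sys/block/md0/md/sync_action
```

Lowering speed_limit_max stretches the check out over more days but keeps client responsiveness closer to normal, which is usually the better trade-off on a live gluster volume.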
> > The following heal took ~1 month, with bad performance (quite high
> > IO). Shortly after the heal had finished, another disk failed -> same
> > problems again. Not funny.
> >
> > For our new system we decided to use 3 servers with 10 disks (10 TB)
> > each, but now with the 10 disks in a SW RAID 10 (well, we split the 10
> > disks into 2 SW RAID 10 arrays; each of them is a brick, and we have 2
> > gluster volumes). A lot of disk space is "wasted" with this type of SW
> > RAID and a replica 3 setup, but we wanted to avoid the "healing takes
> > a long time with bad performance" problem. Now mdadm takes care of
> > replicating data, so glusterfs should always see "good" bricks.
> >
> > The decision may also depend on what kind of data you have. Many small
> > files, like tens of millions? Or fewer, but bigger files? I once
> > watched a video (I think it was this one:
> > https://www.youtube.com/watch?v=61HDVwttNYI). The recommendation
> > there: RAID 6 or 10 for small files; for big files... well, it's
> > already 2 years "old" ;-)
> >
> > As I said, this won't help you directly. You have to identify what's
> > most important for your scenario; as you said, high performance is not
> > an issue - if that still holds even when you have slight performance
> > issues after a disk failure, then ok. My experience so far: the bigger
> > and slower the disks are and the more data you have -> healing will
> > hurt -> try to avoid it. If the disks are small and fast (SSDs),
> > healing will be faster -> JBOD is an option.
> >
> >
> > hth,
> > Hubert
> >
> > On Wed, 5 Jun 2019 at 11:33, Eduardo Mayoral <emayoral@xxxxxxxx> wrote:
> >> Hi,
> >>
> >> I am looking into a new gluster deployment to replace an ancient one.
> >>
> >> For this deployment I will be using some repurposed servers I
> >> already have in stock. The disk specs are 12 * 3 TB SATA disks, no HW
> >> RAID controller.
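The two layouts discussed above (one big brick per RAID 10 array vs. many smaller bricks in a distribute-replicate volume) differ only in how many bricks each server exposes to gluster. A hedged sketch of the volume-create commands, with hypothetical server names (gls1..gls3) and brick paths:

```shell
#!/bin/sh
# Layout described above: each server contributes one brick backed by a
# SW RAID 10 array; replica 3 across the three servers.
gluster volume create vol1 replica 3 \
    gls1:/gluster/md3/brick gls2:/gluster/md3/brick gls3:/gluster/md3/brick

# Alternative considered above: several RAID 1 pairs per server, each
# pair a separate brick, combined into a distribute-replicate volume.
# Gluster groups consecutive bricks into replica sets, so each line
# below is one 3-way replica subvolume; data is distributed across them.
gluster volume create vol2 replica 3 \
    gls1:/gluster/md4/brick gls2:/gluster/md4/brick gls3:/gluster/md4/brick \
    gls1:/gluster/md5/brick gls2:/gluster/md5/brick gls3:/gluster/md5/brick
```

With the distribute-replicate layout a failed disk only forces a heal of the one subvolume that lived on its RAID 1 pair, which is why it might have been the better choice for limiting heal pain.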
> >> They also have some SSDs which would be nice to leverage as a cache
> >> or similar to improve performance, since they are already there.
> >> Advice on how to leverage the SSDs would be greatly appreciated.
> >>
> >> One of the design choices I have to make is using 3 nodes for a
> >> replica-3 with JBOD, or using 2 nodes with replica-2 and SW RAID 6
> >> for the disks, maybe adding a 3rd node with a smaller amount of disk
> >> as a metadata node for the replica set. I would love to hear advice
> >> on the pros and cons of each setup from the gluster experts.
> >>
> >> The data will be accessed from 4 to 6 systems with native gluster;
> >> not sure if that makes any difference.
> >>
> >> The amount of data I have to store there is currently 20 TB, with
> >> moderate growth. IOPS are quite low, so high performance is not an
> >> issue. The data will fit in either of the two setups.
> >>
> >> Thanks in advance for your advice!
> >>
> >> --
> >> Eduardo Mayoral Jimeno
> >> Systems engineer, platform department. Arsys Internet.
> >> emayoral@xxxxxxxx - +34 941 620 105 - ext 2153
> >>
> >>
> >> _______________________________________________
> >> Gluster-users mailing list
> >> Gluster-users@xxxxxxxxxxx
> >> https://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> Eduardo Mayoral Jimeno
> Systems engineer, platform department. Arsys Internet.
> emayoral@xxxxxxxx - +34 941 620 105 - ext 2153
>
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
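For the two ideas raised in the quoted message, the "3rd node with a smaller amount of disk" maps onto gluster's arbiter brick, and the SSDs can front the RAID 6 bricks via LVM's dm-cache. A hedged sketch with hypothetical node names, device names, and LVM names (this is not from the thread, just one common way to wire it up):

```shell
#!/bin/sh
# Replica 2 + arbiter: full data on node1/node2, metadata-only copies on
# node3, so the small third node still breaks split-brain ties.
gluster volume create vol0 replica 3 arbiter 1 \
    node1:/bricks/raid6/brick node2:/bricks/raid6/brick \
    node3:/bricks/arbiter/brick

# SSD as a cache in front of the RAID 6 brick LV, using lvmcache.
# Assumes the brick LV is vg_bricks/brick_lv and the SSD is /dev/sdm.
vgextend vg_bricks /dev/sdm
lvcreate --type cache --cachemode writethrough -l 100%PVS \
    -n brick_cache vg_bricks/brick_lv /dev/sdm
```

Writethrough mode keeps the RAID 6 array authoritative (an SSD failure loses no data) at the cost of not accelerating writes; writeback is faster but should be mirrored across two SSDs.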