problem with missing bricks

Can you post the output of df -h?  It looks like you have mounted the
drives as /gluster/brick[0-9]?  If that is correct, you can mitigate
the problem by creating a sub-directory on each of those mount points,
across all the bricks, and doing writes only inside that sub-directory.
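
For an existing volume you would have to move the bricks to the new paths
(or rebuild the volume), so treat the following as a rough sketch of the
idea rather than a recipe.  It assumes the host name "host" and the brick
mounts from your volume info below, and uses "data" as an arbitrary
sub-directory name:

  # on the server: create a sub-directory inside each mounted brick drive
  for n in $(seq -w 1 10); do
      mkdir -p /gluster/brick$n/data
  done

  # then point the volume at the sub-directories instead of the mount
  # points, e.g. when (re)creating it -- adjust names to your setup:
  gluster volume create volume1 replica 2 \
      host:/gluster/brick01/data host:/gluster/brick06/data \
      host:/gluster/brick02/data host:/gluster/brick07/data \
      host:/gluster/brick03/data host:/gluster/brick08/data \
      host:/gluster/brick04/data host:/gluster/brick09/data \
      host:/gluster/brick05/data host:/gluster/brick10/data

The point is that if a drive fails to mount after a reboot, the data
sub-directory will not exist on the bare mount point, so that brick simply
fails to come up instead of gluster quietly writing into the root filesystem.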

On Sat, Dec 31, 2011 at 8:50 AM, Todd Pfaff <pfaff at rhpcs.mcmaster.ca> wrote:
> Gluster-user folks,
>
> I'm trying to use gluster in a way that may be considered an unusual use
> case for gluster.  Feel free to let me know if you think what I'm doing
> is dumb.  It just feels very comfortable doing this with gluster.
> I have been using gluster in other, more orthodox configurations, for
> several years.
>
> I have a single system with 45 inexpensive SATA drives - it's a self-built
> Backblaze-style storage pod, similar to the one documented at this URL but
> with some upgrades and substitutions:
>
>  http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/
>
> We use this system for disk-to-disk backups only, no primary storage,
> nothing mission critical.
>
> For the past two years I have been using this system with Linux software
> RAID, with the drives organized as multiple RAID 5/6/10 sets of 5 drives
> per set.  This has worked OK, but I have suffered enough multiple
> simultaneous drive failures to prompt me to explore alternatives to RAID.
> Yes, I know, that's what I get for using cheap SATA drives.
>
> What I'm experimenting with now is creating gluster distributed-replicated
> volumes on some of these drives, and maybe all in the future if this works
> reasonably well.
>
> At this point I am using 10 of the drives configured as shown here:
>
>  Volume Name: volume1
>  Type: Distributed-Replicate
>  Status: Started
>  Number of Bricks: 5 x 2 = 10
>  Transport-type: tcp
>  Bricks:
>  Brick1: host:/gluster/brick01
>  Brick2: host:/gluster/brick06
>  Brick3: host:/gluster/brick02
>  Brick4: host:/gluster/brick07
>  Brick5: host:/gluster/brick03
>  Brick6: host:/gluster/brick08
>  Brick7: host:/gluster/brick04
>  Brick8: host:/gluster/brick09
>  Brick9: host:/gluster/brick05
>  Brick10: host:/gluster/brick10
>  Options Reconfigured:
>  auth.allow: 127.0.0.1,10.10.10.10
>
>
> For the most part this is working fine so far.  The problem I have run
> into several times now is that when a drive fails and the system is
> rebooted, the volume comes up without that brick.  Gluster happily writes
> to the missing brick's mount point, thereby eventually filling up the root
> filesystem.  Once the root filesystem is full and processes writing to
> gluster space are hung, I can never recover from this state without
> rebooting.
>
> Is there any way to avoid this problem of gluster writing to a brick
> path that isn't really populated by the intended brick filesystem?
> Does gluster not create any sort of signature or meta-data that
> indicates whether or not a path is really a gluster brick?
> How do others deal with missing bricks?
>
> I realize that ultimately I should get the bricks replaced as soon as
> possible, but there may be times when I want to continue running for some
> time with a "degraded" volume, if you will.
>
> Any and all ideas, suggestions, comments, criticisms are welcome.
>
> Cheers and Happy New Year,
> Todd
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
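
As for "how do others deal with missing bricks": one common belt-and-braces
approach (not a gluster feature, just an example of the general idea) is to
refuse to start glusterd unless every brick drive is actually mounted.  A
minimal sketch, assuming the mount points from your volume info and a distro
where the service is called glusterd:

  # pre-flight check before starting glusterd; paths are examples
  for n in $(seq -w 1 10); do
      mountpoint -q /gluster/brick$n || {
          echo "/gluster/brick$n is not mounted, not starting glusterd" >&2
          exit 1
      }
  done
  service glusterd start

Combined with putting the bricks in a sub-directory of each mount (as above),
this keeps an unmounted drive from ever being written to on the root
filesystem.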

