brick out of space, unmounted brick

Response inline.

Jeff White
Linux/Unix Systems Engineer
University of Pittsburgh - CSSD
Jaw171 at pitt.edu


On 10/17/2011 01:11 PM, Jeff Shaw wrote:
> Hello Gluster users,
> Before I put Gluster into production, I am wondering how it determines whether a byte can be written, and where I should look in the source code to change these behaviors. My experiences are with glusterfs 3.2.4 on CentOS 6 64-bit.
>
> Suppose I have a Gluster volume made up of four 1 MB bricks, like this
>
> Volume Name: test
> Type: Distributed-Replicate
> Status: Started
> Number of Bricks: 2 x 2 = 4
> Transport-type: tcp
> Bricks:
> Brick1: gluster0-node0:/brick0
> Brick2: gluster0-node1:/brick1
> Brick3: gluster0-node0:/brick2
> Brick4: gluster0-node1:/brick3
>
> The mounted Gluster volume will report that the size of the volume is 2 MB, which creates a false impression that it can hold a 2 MB file. This isn't too bad, since people are used to a file system's maximum file size being smaller than the file system's maximum total size.
>
> Scenario 1: One brick runs out of space first.
>
> Taking this a step further, suppose brick0 is actually 2 MB, and I attempt to copy a 2 MB file to the Gluster volume. If Gluster chooses to copy the file to brick0 and brick1, then the copy succeeds, although brick1 only stores half the file. When brick0 fails, only half of the file is available for reading. It would be better if Gluster stopped writing and reported a failure when one brick in the replication group ran out of space.
>
> Scenario 2: One brick is unmounted.
>
> Suppose after Scenario 1 completes, brick0 goes offline. Then, a user attempts to retrieve the 2 MB file. The user receives the file fragment. Because gluster0-node0:/brick0 is unmounted, the file doesn't exist at that location, and so the gateway copies the file fragment from gluster0-node1:/brick1 onto gluster0-node0:/brick0. Then, even worse, the user starts copying files onto the Gluster volume. All the files destined for the first replication group appear under /brick0, even though it's unmounted. This eventually will fill up the root file system.
>
> I think to fix this, when creating a file, Gluster should make sure that the file system that the brick was originally created on is mounted.

I had an idea for this already: http://bugs.gluster.com/show_bug.cgi?id=3578
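
To make the kind of check being discussed concrete, here is a minimal sketch in Python. It is not Gluster's code and not what the bug report proposes; the function names and the idea of recording the device ID when the brick is created are assumptions made purely for illustration. If the brick's file system is later unmounted, the brick directory falls back to the parent file system and its device ID changes.

```python
import os

def record_brick_device(brick_path):
    """Return the device ID of the file system currently holding brick_path.

    Hypothetical helper: an admin script would call this once, when the
    brick is created, and store the result somewhere outside the brick.
    """
    return os.stat(brick_path).st_dev

def brick_fs_still_mounted(brick_path, recorded_dev):
    """Return True if brick_path is still on the file system recorded earlier.

    Returns False if the path is missing or now lives on another device,
    which is what happens when the brick's file system is unmounted and the
    directory falls through to the parent file system.
    """
    try:
        return os.stat(brick_path).st_dev == recorded_dev
    except OSError:
        return False
```

One caveat: device IDs are not guaranteed to be stable across reboots on every system, so a real check would probably need a more durable identifier, such as a marker file or extended attribute written inside the brick.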

> Also, perhaps bricks should only be able to be created at mount points.

I think this would be too limiting.  Some people might have a large 
/data mount point but only want /data/gluster to hold the gluster files.
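
For a brick that lives in a subdirectory like that, a check would have to look at the mount point containing the brick rather than the brick path itself. A rough sketch, assuming a POSIX system; the function name is made up for this example and nothing here is part of Gluster:

```python
import os

def containing_mount_point(path):
    """Walk up from path until a mount point is reached and return it.

    The loop always terminates because the file system root is itself
    a mount point.
    """
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path
```

With the layout above, containing_mount_point("/data/gluster") would be expected to return "/data" while the storage is mounted, and "/" if /data has been unmounted, so an admin script could compare the result against the expected value before starting the brick.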

> A colleague of mine suggested mounting all the Gluster bricks within another file system's path that's read only.

This would be more complicated for a Gluster admin to set up, but it 
could work.  You could also mount tmpfs or something similar at /data 
and then mount your real storage at /data/gluster.  I don't think tmpfs 
itself works with Gluster, though, so I'm not sure what would happen if 
/data/gluster were unmounted and Gluster suddenly found itself on an 
unsupported filesystem type.

In any case, I wouldn't make this a requirement.  I'd just leave it as 
one way an admin could design their servers if they wish.

> Gluster's source code is quite large, so if someone could point me to the right files to edit, I'd be happy to change its behavior to match what I expect.
>
> Thanks,
> Jeff
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

