On Wed, Sep 7, 2011 at 4:27 PM, Dan Bretherton <d.a.bretherton at reading.ac.uk> wrote:

On 17/08/11 16:19, Dan Bretherton wrote:

Dan Bretherton wrote:
On 15/08/11 20:00, gluster-users-request at gluster.org wrote:

Message: 1
Date: Sun, 14 Aug 2011 23:24:46 +0300
From: "Deyan Chepishev - SuperHosting.BG" <dchepishev at superhosting.bg>
Subject: cluster.min-free-disk separate for each brick
To: gluster-users at gluster.org
Message-ID: <4E482F0E.3030604 at superhosting.bg>
Content-Type: text/plain; charset=UTF-8; format=flowed

Hello,

I have a gluster setup with very different brick sizes.

brick1: 9T
brick2: 9T
brick3: 37T

With this configuration, if I set the parameter cluster.min-free-disk to 10% it applies to all bricks, which is quite awkward with these brick sizes, because 10% of the small bricks is ~1T but 10% of the big brick is ~3.7T. What happens in the end is that if all bricks reach 90% usage and I continue writing, the small ones eventually fill up to 100% while the big one still has plenty of free space.

My question is: is there a way to set cluster.min-free-disk per brick instead of setting it for the entire volume, or any other way to work around this problem?

Thank you in advance.

Regards,
Deyan

Hello Deyan,

I have exactly the same problem and I have asked about it before - see the links below.

http://community.gluster.org/q/in-version-3-1-4-how-can-i-set-the-minimum-amount-of-free-disk-space-on-the-bricks/

http://gluster.org/pipermail/gluster-users/2011-May/007788.html

My understanding is that the patch referred to in Amar's reply in the May thread prevents a "migrate-data" rebalance operation from failing by running out of space on smaller bricks, but that doesn't solve the problem we are having. Being able to set min-free-disk for each brick separately would be useful, as would being able to set this value as a number of bytes rather than a percentage. However, even if these features were present we would still have a problem when the amount of free space becomes less than min-free-disk, because this just results in a warning message in the logs and doesn't actually prevent more files from being written. In other words, min-free-disk is a soft limit rather than a hard limit. When a volume is more than 90% full there may still be hundreds of gigabytes of free space spread over the large bricks, but the small bricks may each only have a few gigabytes left, or even less. Users do "df" and see lots of free space in the volume, so they continue writing files. However, when GlusterFS chooses to write a file to a small brick, the write fails with "device full" errors if the file grows too large, which is often the case here, with files typically several gigabytes in size for some applications.

I would really like to know if there is a way to make min-free-disk a hard limit. Ideally, GlusterFS would choose a brick on which to write a file based on how much free space it has left, rather than choosing a brick at random (or however it is done now). That would solve the problem of non-uniform brick sizes without the need for a hard min-free-disk limit.
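To illustrate what our users see, the mismatch shows up roughly like this when you compare df on the client mount with df on the individual bricks (the host names and brick paths below are made up for the example, not our real layout):

    # On a client, the volume as a whole still looks comfortable:
    df -h /mnt/glustervol

    # On the servers, the small bricks are already nearly full:
    ssh server1 df -h /export/brick1    # 9T brick,  ~99% used
    ssh server2 df -h /export/brick2    # 9T brick,  ~98% used
    ssh server3 df -h /export/brick3    # 37T brick, ~85% used

A file that happens to be placed on brick1 or brick2 then fails with "device full" once it grows past the few gigabytes left there, even though "df" on the mount point still shows plenty of space.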
Amar's comment in the May thread, about QA testing being done only on volumes with uniform brick sizes, prompted me to start standardising on a uniform brick size for each volume in my cluster. My impression is that implementing the features needed for users with non-uniform brick sizes is not a priority for Gluster, and that users are all expected to use uniform brick sizes. I really think this should be stated clearly in the GlusterFS documentation, for example in the sections on creating volumes in the Administration Guide. That would stop other users from going down the path that I did initially, which has given me a real headache because I am now having to move tens of terabytes of data off bricks that are larger than the new standard size.

Regards,
Dan.

Hello,

This is really bad news, because I have already migrated my data and I have just realized that I am screwed, because Gluster simply does not take brick sizes into account. It is impossible for me to move to uniform brick sizes.

Currently we use 2TB HDDs, but disks keep growing and soon we will probably use 3TB HDDs or whatever larger sizes appear on the market. So if we choose to use RAID5 and some level of redundancy (for example six HDDs in RAID5, no matter what their size is), sooner or later this will lead us to non-uniform bricks, which is a problem, and it is not reasonable to expect that we always can, or want to, provide uniformly sized bricks.

By this way of thinking, if we currently have 10T from 6x2T disks in RAID5, then at some point, when a single disk holds 10T, we will have to use no RAID at all just because Gluster cannot handle non-uniform bricks.

Regards,
Deyan

I think Amar might have provided the answer in his posting to the thread yesterday, which has just appeared in my autospam folder.

http://gluster.org/pipermail/gluster-users/2011-August/008579.html

"With size option, you can have a hardbound on min-free-disk"

This means that you can set a hard limit on min-free-disk, and set a value in GB that is bigger than the biggest file that is ever likely to be written. This looks likely to solve our problem and make non-uniform brick sizes a practical proposition. I wish I had known about this back in May when I embarked on my cluster restructuring exercise; the issue was discussed in this thread in May as well:
http://gluster.org/pipermail/gluster-users/2011-May/007794.html

Once I have moved all the data off the large bricks and standardised on a uniform brick size, it will be relatively easy to stick to it, because I use LVM and create logical volumes for new bricks whenever a volume needs extending. The only problem with this approach is what happens when the amount of free space left on a server is less than the size of the brick you want to create. The only option then would be to use new servers, potentially wasting several TB of free space on the existing servers. The standard brick size for most of my volumes is 3TB, which allows me to use a mixture of small servers and large servers in a volume and limits the amount of free space that would be wasted if there wasn't quite enough free space on a server to create another brick. Another consequence of having 3TB bricks is that a single server typically has two or more bricks belonging to the same volume, although I do my best to distribute the volumes across different servers in order to spread the load. I am not aware of any problems associated with exporting multiple bricks from a single server, and it has not caused me any problems so far.
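For what it's worth, adding another 3TB brick from LVM boils down to something like the following on my servers (the volume group, brick path, server and volume names are only examples, and mkfs.xfs is just a placeholder for whatever filesystem you normally put on your bricks):

    # Carve a 3TB logical volume out of the server's volume group
    lvcreate -L 3T -n brick7 vg_bricks
    mkfs.xfs /dev/vg_bricks/brick7
    mkdir -p /export/brick7
    mount /dev/vg_bricks/brick7 /export/brick7

    # Add it to the existing distributed volume and rebalance
    gluster volume add-brick myvolume server5:/export/brick7
    gluster volume rebalance myvolume start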
-Dan.

Hello Deyan,

Have you tried giving min-free-disk a value in gigabytes, and if so does it prevent new files being written to your bricks when they are nearly full? I recently tried it myself and found that min-free-disk had no effect at all. I deliberately filled my test/backup volume and most of the bricks became 100% full. I set min-free-disk to "20GB", as reported in "gluster volume ... info" below.

cluster.min-free-disk: 20GB

Unless I am doing something wrong, it seems as though we cannot "have a hardbound on min-free-disk" after all, and uniform brick size is therefore an essential requirement. It still doesn't say that in the documentation, at least not in the volume creation sections.

-Dan.

On 08/09/11 06:35, Raghavendra Bhat wrote:

This is how it is supposed to work.

Suppose a distribute volume is created with 2 bricks. The 1st brick has 25GB of free space and the 2nd brick has 35GB of free space. If one sets a minimum free disk of 30GB through volume set (gluster volume set <volname> min-free-disk 30GB), then whenever a file is created, if the file is hashed to the 1st brick (which has 25GB of free space), the actual file will be created on the 2nd brick and a linkfile will be created on the 1st brick. The linkfile points to the actual file. A warning message, indicating that the minimum free disk limit has been crossed and that more nodes should be added, will be printed in the glusterfs log file. So any file which is hashed to the 1st brick will be created on the 2nd brick.

Once the free space on the 2nd brick also drops below 30GB, files will be created on their respective hashed bricks. There will be a warning message in the log file about the 2nd brick also crossing the minimum free disk limit.

Regards,
Raghavendra Bhat

Dear Raghavendra,

Thanks for explaining this to me. This mechanism should allow a volume to function correctly with non-uniform brick sizes even though min-free-disk is not a hard limit. I can understand now why I had so many problems with the default value of 10% for min-free-disk. 10% of a large brick can be very large compared to 10% of a small brick, so once all the bricks had less than 10% free space and continued filling up at the same rate, the small bricks usually filled up long before the large ones, giving "device full" errors even while df still showed a lot of free space in the volume. At least now we can minimise this effect by setting min-free-disk to a value in GB.

-Dan.
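P.S. If I have understood the linkfile mechanism correctly, the redirected files should show up on the nearly-full brick as zero-byte entries with only the sticky bit set, carrying an xattr that names the subvolume holding the real file. Something along these lines (the brick path is made up, and this assumes the usual DHT link-file layout) should reveal them:

    # Zero-length files with the sticky bit set should be DHT linkfiles
    find /export/brick1 -type f -size 0 -perm /01000

    # The xattr points at the subvolume that holds the actual data
    getfattr -n trusted.glusterfs.dht.linkto -e text /export/brick1/some/file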