On 08/02/2013 06:22 AM, Xavier Trilla wrote:
>
> Hi,
>
> We have been playing for a while with GlusterFS (now with version 3.4). We are running tests and playing with it to check whether GlusterFS can really be used as the distributed storage for OpenStack block storage (Cinder), as new features in KVM, GlusterFS and OpenStack point to GlusterFS as the future of OpenStack open source block and object storage.
>
> But we've found a problem just when we started playing with GlusterFS... the way the distribute translator (DHT) balances the load. We understand and see the benefits of a metadata-less setup. Hashing filenames and assigning a hash range to each brick is clever, reliable and fast, but from our understanding there is a big problem when it comes to storing the VM images of an OpenStack deployment.
>
> OpenStack Block Storage (Cinder) assigns a name (a GUID) to each volume it creates, so GlusterFS hashes that filename and decides which brick it should be stored on. But since in this scenario we don't have many files (we would have just one big file per VM), we may end up with really unbalanced storage.
>
> Let's say we have a 4-brick setup with DHT distribute and we want to store 100 VMs there. The ideal scenario would be:
>
> Brick1: 25 VMs
> Brick2: 25 VMs
> Brick3: 25 VMs
> Brick4: 25 VMs
>
> As VMs are IO intensive, it's really important to balance the load correctly, since each brick has a limited amount of IOPS. But as DHT is based purely on a filename hash, we could end up with something like the following scenario (or even worse):
>
> Brick1: 50 VMs
> Brick2: 10 VMs
> Brick3: 35 VMs
> Brick4: 5 VMs
>
> And if we scale this out, things may get even worse: we may end up with almost all VM files on one or two bricks and all the other bricks almost empty. And if we use growing VM disk image files like qcow2, the "min-free-disk" option will not prevent all VM disk image files from being stored on the same brick. So, I understand DHT works well for large numbers of small files, but for a few big IO-intensive files it doesn't seem to be a really good solution... (We are looking for a solution able to handle around 32 bricks and around 1,500 VMs for the initial deployment, and able to scale up to 256 bricks and 12,000 VMs :/ )
>
> So, does anybody have a suggestion about how to handle this? So far we only see two options: either use the legacy unify translator with the ALU scheduler, or use the cluster/stripe translator with a big block-size so that at least the load gets balanced across all bricks in some way. But obviously we don't like unify as it needs a namespace brick, and striping seems to have an impact on performance and really complicates backup/restore/recovery strategies.

Another suggestion that you may want to try: have your GlusterFS nodes also serve as OpenStack Cinder nodes and use NUFA [1].

~shanks

[1] http://gluster.org/community/documentation/index.php/Translators/cluster/nufa
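For anyone who wants to see the imbalance Xavier describes, here is a minimal Python sketch of filename-hash placement. It is only an illustration: it uses MD5 modulo the brick count as a stand-in (GlusterFS's DHT uses its own 32-bit hash and per-directory layout ranges, not this), and it assumes Cinder-style "volume-<uuid>" names.

```python
#!/usr/bin/env python
# Rough illustration of filename-hash placement. NOT GlusterFS's real
# DHT hash or layout ranges -- just a generic hash to show how a small
# number of large files can land unevenly across a few bricks.
import hashlib
import uuid
from collections import Counter

BRICKS = 4
VMS = 100

counts = Counter()
for _ in range(VMS):
    volume_name = "volume-%s" % uuid.uuid4()   # Cinder-style GUID name
    h = int(hashlib.md5(volume_name.encode()).hexdigest(), 16)
    counts[h % BRICKS] += 1                    # pick a brick purely by hash

for brick in range(BRICKS):
    print("Brick%d: %d VMs" % (brick + 1, counts[brick]))
```

Running it a few times shows the point: with only a handful of big files per brick the relative spread is noticeable, whereas with millions of small files the same hashing averages out.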
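On the NUFA suggestion: the idea is that new files are created on the brick local to the node that writes them, falling back to other bricks when the local one is short on space, so a Cinder node co-located with a brick keeps its volumes on local disk. The following is a toy model of that placement policy under simplified assumptions (a made-up min_free_gb threshold and a hash fallback); it is not the actual cluster/nufa translator logic.

```python
# Toy model of NUFA-style placement: prefer the local brick when it has
# room, otherwise fall back to hash-based placement across the others.
# (Simplified assumption -- not the real cluster/nufa translator code.)
import hashlib

def pick_brick(filename, local_brick, bricks, free_gb, min_free_gb=10):
    """bricks: list of brick names; free_gb: dict of brick -> free GB."""
    if free_gb[local_brick] >= min_free_gb:
        return local_brick                     # NUFA: keep the file local
    others = [b for b in bricks if b != local_brick]
    h = int(hashlib.md5(filename.encode()).hexdigest(), 16)
    return others[h % len(others)]             # local brick full: hash elsewhere

bricks = ["brick1", "brick2", "brick3", "brick4"]
free = {"brick1": 5, "brick2": 800, "brick3": 650, "brick4": 900}
print(pick_brick("volume-123e4567", "brick1", bricks, free))  # brick1 is low on space, so another brick is chosen
```

The trade-off is the one implied in the thread: placement follows where the VM is created rather than the filename hash, which helps locality and balance for a few large images, but it ties a volume's data to the node that created it.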