Are you actually observing this? With cryptographic hashes being effectively uniform, the probability of such an extreme distribution is extraordinarily low (the binomial sketch at the end of this thread puts a number on it).

On Aug 2, 2013 6:23 AM, "Xavier Trilla" <xavier.trilla at silicontower.net> wrote:

> Hi,
>
> We have been playing for a while with GlusterFS (now with version 3.4). We
> are running tests to check whether GlusterFS can really be used as the
> distributed storage for OpenStack Block Storage (Cinder), since new
> features in KVM, GlusterFS and OpenStack point to GlusterFS as the future
> of OpenStack open source block and object storage.
>
> But we've found a problem just as we started playing with GlusterFS: the
> way the distribute translator (DHT) balances the load. We understand and
> see the benefits of a metadata-less setup. Using hashes based on filenames
> and assigning a hash range to each brick is clever, reliable and fast, but
> from our understanding there is a big problem when it comes to storing the
> VM images of an OpenStack deployment. (A sketch of this placement scheme
> follows the message.)
>
> OpenStack Block Storage (Cinder) assigns a name (a GUID) to each volume it
> creates, so GlusterFS hashes the filename and decides which brick the file
> should be stored on. But since in this scenario we don't have many files
> (just one big file per VM), we may end up with really unbalanced storage.
>
> Let's say we have a 4-brick setup with DHT distribute and we want to store
> 100 VMs there. The ideal scenario would be:
>
> Brick1: 25 VMs
> Brick2: 25 VMs
> Brick3: 25 VMs
> Brick4: 25 VMs
>
> As VMs are I/O intensive, it's really important to balance the load
> correctly, since each brick has a limited amount of IOPS. But as DHT is
> based purely on a filename hash, we could end up with something like the
> following scenario (or even worse):
>
> Brick1: 50 VMs
> Brick2: 10 VMs
> Brick3: 35 VMs
> Brick4: 5 VMs
>
> And if we scale this out, things may get even worse: we may end up with
> almost all the VM files on one or two bricks and all the other bricks
> almost empty. And if we use growing VM disk image files like qcow2, the
> "min-free-disk" option will not prevent all the VM disk image files from
> being stored on the same brick. So, I understand that DHT works well for
> large numbers of small files, but for a few big I/O-intensive files it
> doesn't seem to be a really good solution... (We are looking for a
> solution able to handle around 32 bricks and around 1500 VMs for the
> initial deployment, and able to scale up to 256 bricks and 12000 VMs. :/ )
>
> So, does anybody have a suggestion about how to handle this? So far we
> only see two options: either use the legacy unify translator with the ALU
> scheduler, or use the cluster/stripe translator with a big block size so
> that at least the load gets balanced across all bricks in some way. But
> obviously we don't like unify, as it needs a namespace brick, and striping
> seems to have an impact on performance and really complicates
> backup/restore/recovery strategies.
>
> So, suggestions? :)
>
> Thanks!
>
> Best regards,
>
> Xavier Trilla P.
> Silicon Hosting <https://siliconhosting.com/>
>
> Did you know that SiliconHosting now
> answers your technical questions for free?
> More information at: siliconhosting.com/qa/
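Below is a minimal sketch, not GlusterFS source, of the placement scheme Xavier describes: hash the filename, cut the hash space into one equal range per brick, and store the file on the brick whose range contains the hash. The brick count, the "volume-<uuid>" naming, and the use of MD5 as the hash are all stand-ins (real DHT uses its own 32-bit hash over per-directory layout ranges).

```python
# Minimal model of DHT-style placement: hash the filename, cut the 32-bit
# hash space into one equal range per brick, and store the file on the brick
# whose range contains the hash. MD5 stands in for DHT's actual hash
# function; "volume-<uuid>" mimics Cinder's volume file naming.
import hashlib
import uuid
from collections import Counter

BRICKS = 4
VMS = 100

def pick_brick(filename: str, bricks: int = BRICKS) -> int:
    """Map a filename into [0, bricks) via equal slices of a 32-bit hash space."""
    h = int.from_bytes(hashlib.md5(filename.encode()).digest()[:4], "big")
    return h * bricks // 2**32  # h in [0, 2^32) -> brick index in [0, bricks)

names = [f"volume-{uuid.uuid4()}" for _ in range(VMS)]
counts = Counter(pick_brick(name) for name in names)
for b in range(BRICKS):
    print(f"Brick{b + 1}: {counts[b]} VMs")
# A typical run prints per-brick counts somewhere around 18-32:
# uneven, but far from the 50/10/35/5 split feared above.
```

Under this same model the relative imbalance shrinks as the file count grows: with 1500 VMs on 32 bricks (about 47 per brick on average), a draw almost never puts more than about 70 VMs on any one brick.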
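And to put a number on the reply's "extraordinarily low": under a uniform hash, each of the 100 images independently lands on any of the 4 bricks with probability 1/4, so per-brick counts follow Binomial(100, 1/4). A short exact computation (plain Python, standard library only):

```python
# Exact tail probability of one brick receiving k or more of n uniformly
# hashed files, where each file lands on a given brick with probability p.
from math import comb

def binom_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, p = 100, 1 / 4
tail = binom_tail(n, p, 50)
print(f"P(a given brick gets >= 50 of 100): {tail:.1e}")       # ~ 6.6e-8
print(f"P(any of 4 bricks gets >= 50): <= {4 * tail:.1e}")     # union bound, ~ 2.6e-7
```

So a single brick taking half of 100 files is roughly a one-in-four-million event, and the full 50/10/35/5 pattern is rarer still; a realistic "bad" draw of 100 files over 4 bricks looks more like 31/27/23/19.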