Are you actually observing this? With cryptographic hashes being effectively uniform, the probability of such an extreme distribution is extraordinarily low (the binomial sketch at the end of this thread puts a number on it).

On Aug 2, 2013 6:23 AM, "Xavier Trilla" <xavier.trilla at silicontower.net> wrote:

> Hi,
>
> We have been playing for a while with GlusterFS (now with version 3.4). We
> are running tests to check whether GlusterFS can really be used as the
> distributed storage for OpenStack Block Storage (Cinder), since new
> features in KVM, GlusterFS and OpenStack point to GlusterFS as the future
> of OpenStack open source block and object storage.
>
> But we've found a problem just as we started playing with GlusterFS: the
> way the distribute translator (DHT) balances the load. We understand and
> see the benefits of a metadata-less setup. Using hashes based on filenames
> and assigning a hash range to each brick is clever, reliable and fast, but
> from our understanding there is a big problem when it comes to storing the
> VM images of an OpenStack deployment. (A sketch of this placement scheme
> follows the message.)
>
> OpenStack Block Storage (Cinder) assigns a name (a GUID) to each volume it
> creates, so GlusterFS hashes the filename and decides which brick the file
> should be stored on. But since in this scenario we don't have many files
> (just one big file per VM), we may end up with really unbalanced storage.
>
> Let's say we have a 4-brick setup with DHT distribute and we want to store
> 100 VMs there. The ideal scenario would be:
>
> Brick1: 25 VMs
> Brick2: 25 VMs
> Brick3: 25 VMs
> Brick4: 25 VMs
>
> As VMs are I/O intensive, it's really important to balance the load
> correctly, since each brick has a limited amount of IOPS. But as DHT is
> based purely on a filename hash, we could end up with something like the
> following scenario (or even worse):
>
> Brick1: 50 VMs
> Brick2: 10 VMs
> Brick3: 35 VMs
> Brick4: 5 VMs
>
> And if we scale this out, things may get even worse: we may end up with
> almost all the VM files on one or two bricks and all the other bricks
> almost empty. And if we use growing VM disk image files like qcow2, the
> "min-free-disk" option will not prevent all the VM disk image files from
> being stored on the same brick. So, I understand that DHT works well for
> large numbers of small files, but for a few big I/O-intensive files it
> doesn't seem to be a really good solution... (We are looking for a
> solution able to handle around 32 bricks and around 1500 VMs for the
> initial deployment, and able to scale up to 256 bricks and 12000 VMs. :/ )
>
> So, does anybody have a suggestion about how to handle this? So far we
> only see two options: either use the legacy unify translator with the ALU
> scheduler, or use the cluster/stripe translator with a big block size so
> that at least the load gets balanced across all bricks in some way. But
> obviously we don't like unify, as it needs a namespace brick, and striping
> seems to have an impact on performance and really complicates
> backup/restore/recovery strategies.
>
> So, suggestions? :)
>
> Thanks!
>
> Best regards,
>
> Xavier Trilla P.
> Silicon Hosting <https://siliconhosting.com/>
>
> Did you know that SiliconHosting now
> answers your technical questions for free?
> More information at: siliconhosting.com/qa/
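Below is a minimal sketch, not GlusterFS source, of the placement scheme Xavier describes: hash the filename, cut the hash space into one equal range per brick, and store the file on the brick whose range contains the hash. The brick count, the "volume-<uuid>" naming, and the use of MD5 as the hash are all stand-ins (real DHT uses its own 32-bit hash over per-directory layout ranges).

```python
# Minimal model of DHT-style placement: hash the filename, cut the 32-bit
# hash space into one equal range per brick, and store the file on the brick
# whose range contains the hash. MD5 stands in for DHT's actual hash
# function; "volume-<uuid>" mimics Cinder's volume file naming.
import hashlib
import uuid
from collections import Counter

BRICKS = 4
VMS = 100

def pick_brick(filename: str, bricks: int = BRICKS) -> int:
    """Map a filename into [0, bricks) via equal slices of a 32-bit hash space."""
    h = int.from_bytes(hashlib.md5(filename.encode()).digest()[:4], "big")
    return h * bricks // 2**32  # h in [0, 2^32) -> brick index in [0, bricks)

names = [f"volume-{uuid.uuid4()}" for _ in range(VMS)]
counts = Counter(pick_brick(name) for name in names)
for b in range(BRICKS):
    print(f"Brick{b + 1}: {counts[b]} VMs")
# A typical run prints per-brick counts somewhere around 18-32:
# uneven, but far from the 50/10/35/5 split feared above.
```

Under this same model the relative imbalance shrinks as the file count grows: with 1500 VMs on 32 bricks (about 47 per brick on average), a draw almost never puts more than about 70 VMs on any one brick.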
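And to put a number on the reply's "extraordinarily low": under a uniform hash, each of the 100 images independently lands on any of the 4 bricks with probability 1/4, so per-brick counts follow Binomial(100, 1/4). A short exact computation (plain Python, standard library only):

```python
# Exact tail probability of one brick receiving k or more of n uniformly
# hashed files, where each file lands on a given brick with probability p.
from math import comb

def binom_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, p = 100, 1 / 4
tail = binom_tail(n, p, 50)
print(f"P(a given brick gets >= 50 of 100): {tail:.1e}")       # ~ 6.6e-8
print(f"P(any of 4 bricks gets >= 50): <= {4 * tail:.1e}")     # union bound, ~ 2.6e-7
```

So a single brick taking half of 100 files is roughly a one-in-four-million event, and the full 50/10/35/5 pattern is rarer still; a realistic "bad" draw of 100 files over 4 bricks looks more like 31/27/23/19.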