Re: Couple of questions


 



Hi Chris,

Some answers inline.

On Sun, May 21, 2017 at 2:17 AM, Chris Knipe <savage@xxxxxxxxxxxxx> wrote:

Hi All,

 

I have a couple of questions that I hope someone can shed some light on for me, please.  I don’t think it’s anything serious; I am really just trying to better understand how the underlying GlusterFS works.

 

Firstly, I plan to build a GlusterFS cluster using SM 5018D8-* boxes.  Essentially, 12 x 10TB disks, 128GB RAM, and a Xeon D-1537 CPU.  Each box has an on-board LSI RAID controller, but there’s not a lot of detail forthcoming in terms of RAID configurations (caches, for example).

 

In terms of the nodes, I don’t care TOO much about data integrity (i.e. it is OK to lose SOME data; availability in terms of the underlying hardware is more important).  Also, it may not be the perfect scenario for GlusterFS (although it currently works perfectly fine through NFS on standard servers), but we are talking about millions of > 500K < 1M files.  Files are stored in a very specific structure, so each file is read from or written to precisely one unique directory.  There’s no expensive scanning of directories (i.e. ls) or anything like that.  It’s a simple and very static read/write operation for each file on the system.

 

Currently we store articles using an MD5 hash as the file name, with 5 directory levels, so: /a/a0/a02/a02b/a02ba/a02ba1234567813dfa23bd2348901d33.  Again, everything works fine using multiple servers and standard ext4 / NFS exports.  We host /a on one server, /b on another server, etc.  So whilst the directories (and the IO load) are split to address load issues, we are a bit limited in terms of how, and how much, we can expand.  I’m hoping to move all of this to GlusterFS.  The applications are very random-IO intensive, and whilst we are nowhere CLOSE to the capabilities of the hardware, it is actually the seek times that are our limiting factor and biggest bottleneck.  Therefore, I am fairly certain that growing through either NFS or GlusterFS should be suitable and workable for us.
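The directory layout described above can be sketched in a few lines of Python. This is only an illustration of the prefix scheme shown in the example path: the function names are hypothetical, and what exactly gets hashed (here an opaque article identifier) is an assumption, since the post does not say.

```python
import hashlib


def path_for_digest(digest: str, levels: int = 5) -> str:
    """Build /a/a0/a02/a02b/a02ba/<digest> from a hex digest."""
    # Each directory level is a progressively longer prefix of the digest.
    prefixes = [digest[:n] for n in range(1, levels + 1)]
    return "/" + "/".join(prefixes) + "/" + digest


def path_for_article(article_id: bytes, levels: int = 5) -> str:
    """Hash an article identifier (assumed input) and map it to its path."""
    return path_for_digest(hashlib.md5(article_id).hexdigest(), levels)


if __name__ == "__main__":
    # Reproduces the example path given in the post.
    print(path_for_digest("a02ba1234567813dfa23bd2348901d33"))
```

Because every file maps to exactly one directory, a read or write needs only a path lookup and never a directory listing.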

 

My main reason for wanting to go with GlusterFS is better and easier expansion of storage.  It seems easier to manage, whilst also providing some degree of redundancy (even if only partially in the case of a Distributed volume, which I believe would be adequate for us).  All drives are hot swappable, and we will more than likely look at either a Distributed or a Striped volume.  In the case of a Distributed volume, we can live with the fact that the majority of files remain available whilst a certain number of files become unavailable should a node or brick fail, so Distributed will more than likely be adequate for our needs.  Striped would be nice to have, but I think it would bring some complexities given our specific use case.  We are also talking about high concurrency (we currently do about 6K reads/writes per second over NFS, per NFS server).

 

1 On the client(s), the documentation is clear that mounting GlusterFS only fetches the GlusterFS configuration, and that the client thereafter reads/writes directly to the GlusterFS nodes.  How non-stop is this?  If there is already a mount made and additional nodes are added to / removed from the GlusterFS, do the client(s) get informed of this without the need to re-mount the file system?  What about the capacity on the mount (at the client) when a node is added?  Can I assume that (in a perfect world) the client would never need to re-mount the file system?  Are there any operations in GlusterFS that would require a client to re-mount?


No need to re-mount. The GlusterFS client fetches volume config changes from the server from which it mounted, and presents the scaled-out storage layout to its clients / applications.
 

 

2 Given the Distributed nature of GlusterFS and how files are written to nodes, would it be safe to assume that the more nodes there are in the GlusterFS, the better the IO performance would be?  Surely the IO load is distributed between the nodes, together with the individual files, right?  What kind of IO could (or should) reasonably be expected given the hardware mentioned above (yes, I know this is a "how long is a piece of string" question)?


Here, the performance improvement shows up when more clients are using the volume as you add servers. In most cases, if you keep the number of clients the same, the client network becomes the bottleneck and you will not see any performance gain. But if you increase the number of clients along with the servers, we generally see a linear improvement in file I/O performance.
 

 

3 When bricks are added / removed / expanded / rebalanced / etc… what does GlusterFS actually do?  What would happen, for instance, if I have a 250TB volume with 10M files on it, and I add another node with ~50TB?  What is the impact on performance whilst these expensive operations are run?  Again, how non-stop is this in terms of the clients reading/writing a few thousand files per second?  If running a *purely* Distributed volume, would a rebalance still be required when adding a new node?  What impact does add/remove/rebalance have on large GlusterFS systems?  I would expect a rebalance in particular to become more and more expensive as more and more bricks are added.  Given the large number of files I intend to have on GlusterFS, I am concerned about directory scans (for example) happening internally in GlusterFS…

 


This is where a *lot* of planning is required. Gluster provides easy CLI options to manage scale-out and shrink operations on a volume, but we see a lot of complaints from users about *performance* while these operations are taking place.

Hence, if you are starting fresh, our recommendation is to start with more bricks than nodes (say, if you have 16 nodes, start with 48 or 64 bricks). This way, when you add nodes, you can just do an 'add-brick' followed by a 'remove-brick' to migrate a subset of the data, which reduces the number of extra migrations and should work optimally for you. Again, all these operations work while the volume is online, so clients won't see any downtime. Of course, there will be some hit in performance, as the server nodes will be busy rebalancing the data between them.
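To make the "more bricks than nodes" idea concrete, here is a rough planning sketch in Python. It is not part of Gluster: the function name, node names, and the greedy strategy are assumptions for illustration only. Each planned move stands for one 'add-brick' on the destination node followed by one 'remove-brick' (start, then commit once migration finishes) on the source node.

```python
def plan_brick_moves(bricks_per_node, new_nodes):
    """Greedily even out brick counts after adding empty nodes.

    bricks_per_node: dict of node name -> current brick count.
    new_nodes: names of nodes being added (they start with 0 bricks).
    Returns (moves, counts), where moves is a list of (source, destination).
    """
    counts = dict(bricks_per_node)
    for node in new_nodes:
        counts.setdefault(node, 0)

    moves = []
    while True:
        # Sorting by name makes tie-breaking deterministic.
        names = sorted(counts)
        fullest = max(names, key=lambda n: counts[n])
        emptiest = min(names, key=lambda n: counts[n])
        # Stop once no node holds 2+ more bricks than any other.
        if counts[fullest] - counts[emptiest] <= 1:
            break
        counts[fullest] -= 1
        counts[emptiest] += 1
        moves.append((fullest, emptiest))
    return moves, counts


if __name__ == "__main__":
    # Hypothetical cluster: 4 nodes with 4 bricks each, one node added.
    current = {"n1": 4, "n2": 4, "n3": 4, "n4": 4}
    moves, counts = plan_brick_moves(current, ["n5"])
    for src, dst in moves:
        print(f"migrate one brick: {src} -> {dst}")
    print(counts)
```

With several bricks per node, only the bricks named in the plan have their data migrated; with one large brick per node, there is no such small unit to move.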

 

Hopefully I wasn’t too vague in my questions, but let’s see if at least some of them can be dealt with :-)


I think I understood the questions, and have answered to the best of my limited knowledge.

-Amar
 

 

Thanks,

Chris.

 

 

 


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users



--
Amar Tumballi (amarts)
