Re: gluster fails under heavy array job load

On 12/13/2013 02:00 PM, Alex Chekholko wrote:

> My best guess is that you overloaded your interconnect. Do you have metrics for if/when your network was saturated? That would cause Gluster clients to time out.

> My best guess is that you went into the "E" state of your "USE (Utilization, Saturation, Errors)" spectrum.

IME, that is a common pattern for our Lustre/GPFS clients: you get all kinds of weird error states if you manage to saturate your I/O for an extended period and fill all of the buffers everywhere.

When we tried to roll out GlusterFS for a production environment a few years ago, we ran into exactly this problem. Our scenario was a multi-master cluster, and the worst part turned out to be log files. Every time a host appended to a log file, that write had to be synchronized across the replicas, and with multiple masters writing constantly, this very quickly clogged our interconnect and killed the deployment.

We ended up rolling back GlusterFS for this purpose and moved to a distributed, asynchronous logging system we rolled in-house on top of Linux kernel message queues, with the understanding that replicated log files would show a small amount of jitter and out-of-order entries between hosts. That may sound cavalier, but every log entry carries a timestamp anyway, so ordering can be reconstructed when it matters, and the approach has worked well for us.
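
For a concrete picture, here is a minimal sketch of that kind of hand-off over System V message queues (msgget/msgsnd/msgrcv, which is one reading of "Linux kernel message queues"). The queue key, message layout, and shipping step are illustrative assumptions, not a description of the actual in-house system:

/*
 * Sketch: host-local, asynchronous log hand-off over a System V
 * message queue.  Applications enqueue entries without blocking;
 * a separate drain process ships them to peer hosts at its own pace.
 */
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define LOG_MTYPE 1              /* single message type for log entries */

struct log_msg {
    long mtype;                  /* required first field for msgsnd/msgrcv */
    char mtext[512];             /* timestamped log line */
};

/* Producer side: called from application code; returns immediately. */
int log_async(int qid, const char *line)
{
    struct log_msg m = { .mtype = LOG_MTYPE };
    time_t now = time(NULL);

    /* Every entry carries its own timestamp, so consumers can tolerate
     * out-of-order arrival across hosts, as noted above. */
    snprintf(m.mtext, sizeof(m.mtext), "%ld %s", (long)now, line);

    /* IPC_NOWAIT: if the queue is full, fail (or count a drop) rather
     * than block -- the application never stalls on logging. */
    return msgsnd(qid, &m, strlen(m.mtext) + 1, IPC_NOWAIT);
}

/* Consumer side: runs in a separate daemon and ships entries to the
 * other hosts (the network send is omitted here). */
void drain(int qid)
{
    struct log_msg m;

    for (;;) {
        ssize_t n = msgrcv(qid, &m, sizeof(m.mtext), LOG_MTYPE, 0);
        if (n < 0) {
            perror("msgrcv");
            break;
        }
        /* Replace this with an asynchronous send to peer hosts. */
        printf("ship: %.*s\n", (int)n, m.mtext);
    }
}

int main(void)
{
    /* ftok() on a path visible to all local processes; 'L' is arbitrary. */
    key_t key = ftok("/tmp", 'L');
    int qid = msgget(key, IPC_CREAT | 0600);
    if (qid < 0) {
        perror("msgget");
        return 1;
    }

    log_async(qid, "example log entry");
    /* In the real setup drain(qid) runs in its own daemon process. */
    return 0;
}

The point of the design is that the only synchronous cost a writer ever pays is a local enqueue, so log traffic never competes with application I/O on the interconnect in a latency-critical path.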

It may be that this has been fixed recently, but it's a use case I thought might warrant consideration.

-Ben


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users



