Haven't seen the "Transport endpoint" problem again, yet. Instead, a different
problem surfaced this morning. The servers began to "hang", taking 60 seconds
or more to return a read. lsof showed many open files on the GlusterFS
partition, all being read. The traffic monitor showed extremely high-volume
data flow (essentially the full 1 Gbit) between the primary webserver and its
GlusterFS server twin.

Shutting down the webserver, GlusterFS as client, and GlusterFS as server,
then restarting the whole stack from server to client to Apache resulted in a
system that was responsive - for a while. The reads were nearly all of the
same few dozen files that I want to have replicated on GlusterFS.

Based on what I've seen, I guess that:

1) GlusterFS does some kind of coherency check at every file read.
2) GlusterFS processes these coherency checks serially.
3) The coherency checking was backing up.

Am I out in left field, here? Is there something terrible and fundamental that
I'm missing, or is GlusterFS + Ethernet + stock FUSE + basic config just not
going to do all that well with medium-to-large amounts of reads of a few
hundred small files? (say, 50/second)

I ended up rolling back GlusterFS and went back to a single, local file
system, and would like to move forward on this...

On Monday 19 January 2009 11:59:05 pm you wrote:
> 1. whether glusterfsd is running on the server or not, with the
> process state (from ps) if running
> 2. backtrace of coredump using gdb if it has crashed.
>
> we can figure the next step only after having one of the above two
>
> avati
>
> On Tue, Jan 20, 2009 at 1:24 PM, Benjamin Smith
> > <lists at benjamindsmith.com> wrote:
> > On Monday 19 January 2009 05:05:20 pm you wrote:
> >> Do you have a coredump on the server? was glusterfsd running on the
> >> server at all?
> >
> > No.
> >
> > If it should happen again, what should I do to provide you with what you
> > need?
> >
> > -Ben
> >
> >> <lists at benjamindsmith.com> wrote:
> >> > Late last week, I rolled out GlusterFS on our production cluster.
> >> > Config is very simple, two active servers that are also clients to
> >> > each other. Usage is for a fairly low-volume distribution of file
> >> > settings for an application cluster that are updated perhaps a few
> >> > times per day and read constantly. (pretty much every web page hit)
> >> > Here are the numbers:
> >>
> >> --
> >> This message has been scanned for viruses and
> >> dangerous content by MailScanner, and is
> >> believed to be clean.
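PS: for the record, here is a rough sketch of what I plan to run on the server
the next time it hangs or crashes, to capture what Avati asked for (process
state from ps, or a gdb backtrace from a core). The daemon name, binary path,
and core file location are my assumptions, not something verified against this
config - adjust to taste:

```shell
#!/bin/sh
# Hypothetical diagnostic capture for the next glusterfsd hang/crash.
# Assumes the daemon is named "glusterfsd", the binary lives at
# /usr/sbin/glusterfsd, and core files land in / as core.* -- all of
# these may differ on a real setup.

# 1. Process state (from ps), if glusterfsd is still running.
if ps -C glusterfsd -o pid=,stat=,etime=,args=; then
    echo "glusterfsd is running (state above)"
else
    echo "glusterfsd is NOT running"
fi

# 2. Backtrace from any core dump with gdb, if it crashed.
for core in /core.*; do
    [ -e "$core" ] || continue
    gdb --batch -ex "bt full" /usr/sbin/glusterfsd "$core"
done
```

Running this on a box where glusterfsd is down and no core exists should just
report that the daemon is not running, which is still useful to include in a
report.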