At 05:16 AM 1/14/2009, artur.k wrote:
>I have 6 www servers with lighttpd. Gluster resource is mounted on
>those servers. 2 gluster servers are using AFR. Everything works
>great until one of the gluster servers goes down. When this happens
>everything works fine using one glusterfs server but when the other
>one goes back on-line then after a few hours gluster starts working
>slowly for 20 - 30 minutes. After that time period everything starts
>to work normally however lighttpd tends to have problems when files
>are not available to it "fast enough" (which happens during the 20 -
>30 minutes time period after the second gluster server is back).
>Lighttpd simply shows HTTP 500 when it cannot access the file during
>a certain time frame. What is the problem?

During this time, most likely, gluster is auto-healing the server that was down.

Unfortunately, it seems the way it does this has changed in 2.0. I guess it's more robust, but it's also more time consuming. Previously, a file was only healed when you accessed that file. Now, it seems files are healed when you access a directory.

So when lighttpd accesses file x in directory Y, gluster auto-heals not only file x but also ALL the other files in Y, and it blocks the IO request until it has healed the entire directory.

This is the safest thing, but what it should do is heal the file we need, return to the application, then continue auto-healing the rest of the files. I've no idea if they're going to change this or not (or if it's too difficult), but it is kind of a pain having processes sit waiting while unrelated files are being dealt with.

>glusterfs 2.0.0qa1 built on Jan 9 2009 14:14:17
>Repository revision: glusterfs--mainline--3.0--patch-840
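
To make the difference concrete, here is a toy Python sketch of the two behaviours described above. It is not GlusterFS code and says nothing about how AFR is actually implemented. The names heal_file, open_blocking and open_wanted_first are made up for illustration, and "healing" is just appending a name to a list.

```python
import threading


def heal_file(name, healed):
    """Hypothetical stand-in for re-syncing one stale file from the good replica."""
    healed.append(name)


def heal_rest(names, healed):
    """Heal the remaining files; meant to run in a background thread."""
    for name in names:
        heal_file(name, healed)


def open_blocking(directory, wanted):
    """What seems to happen now: heal the whole directory before returning."""
    healed = []
    for name in directory:
        heal_file(name, healed)
    # The caller has waited for every file, including unrelated ones.
    return len(healed), healed


def open_wanted_first(directory, wanted):
    """What I'd prefer: heal only the wanted file, return, heal the rest later."""
    healed = []
    heal_file(wanted, healed)
    healed_before_return = len(healed)  # work the caller actually waited for
    rest = [name for name in directory if name != wanted]
    worker = threading.Thread(target=heal_rest, args=(rest, healed))
    worker.start()
    return healed_before_return, healed, worker


if __name__ == "__main__":
    directory = ["x", "a", "b", "c"]

    waited, _ = open_blocking(directory, "x")
    print("blocking: caller waited for", waited, "files")

    waited, healed, worker = open_wanted_first(directory, "x")
    print("wanted-first: caller waited for", waited, "file")
    worker.join()  # background healing finishes on its own schedule
    print("eventually healed:", sorted(healed))
```

In the second version the application only waits for the one file it asked for, and the other files in the directory are healed afterwards. Whether that is safe or practical inside gluster itself is something only the developers can say.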