Whit,

Genius! This morning I set out to remove as many variables as possible and whittle the repro case down as far as I could. I've become pretty good at debugging memory dumps on the Windows side over the years, so I also inspected the web processes. Nothing looked out of the ordinary there, just a bunch of threads waiting to get file attribute data from the Gluster share.

So then, to follow your lead, I reduced the Page of Death from thousands of images to just five. I tried accessing the page, and boom, everything was frozen for minutes. Interesting. So I reduced it to one image, accessed the page, and boom, everything was dead instantly. That one image is a file that doesn't exist.

So now, knowing that GlusterFS is kicking into overdrive fretting about a file it can't find, I decided to eliminate the web server altogether. I opened up Windows Explorer and typed in a directory that didn't exist, and sure enough, I was unable to navigate through the share in another Explorer window until it finally responded again a minute later.

I think the Page of Death was dying so massively (i.e. only able to respond again upwards of five minutes later) because it was systematically trying to access several files that weren't found, and each file it can't find causes the SMB connection to hang for close to a minute.

I feel like this is major progress toward pinpointing the problem for a possible resolution. Here are some additional details that may help:

The GlusterFS directory in question, /storage, has about 80,000 subdirs in it. As such, I'm using ext4 to overcome the subdir limitations of ext3. The non-existent image file that causes everything to freeze is referenced as /storage/thisdirdoesntexist/images/blah.gif, where "thisdirdoesntexist" would sit in /storage alongside those 80,000 real subdirs.
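The Explorer test described above could also be scripted so the delay can be measured. What follows is only a minimal sketch, assuming the share is mounted or mapped at some local path that you pass in as `root`. The function names `time_missing_lookup` and `make_tree` are made up for illustration, and the temporary local directory in the demo merely stands in for the real share, so it is not expected to reproduce the minute-long hang by itself.

```python
import os
import tempfile
import time


def time_missing_lookup(root, missing_dir="thisdirdoesntexist"):
    """Stat a file inside a subdir of `root` that does not exist.

    Returns (found, seconds_elapsed).
    """
    target = os.path.join(root, missing_dir, "images", "blah.gif")
    start = time.perf_counter()
    try:
        os.stat(target)
        found = True
    except (FileNotFoundError, NotADirectoryError):
        found = False
    return found, time.perf_counter() - start


def make_tree(parent, n_subdirs):
    """Create n_subdirs empty sibling directories under `parent` (demo helper)."""
    for i in range(n_subdirs):
        os.mkdir(os.path.join(parent, f"dir{i:06d}"))
    return parent


if __name__ == "__main__":
    # Stand-in for the share: compare a parent with few siblings against
    # one with many. Point `root` at the real mount to test the actual setup.
    with tempfile.TemporaryDirectory() as tmp:
        for n in (5, 2000):
            root = os.path.join(tmp, f"root{n}")
            os.mkdir(root)
            make_tree(root, n)
            found, secs = time_missing_lookup(root)
            print(f"{n} siblings: found={found}, lookup took {secs:.6f}s")
```

Running the same probe against the real share, once with a sparsely populated parent and once with /storage, would show whether the delay tracks the number of sibling subdirs.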

I know it's a pretty laborious thing for Gluster to piece together a directory listing. Combined with Joseph's observation of the flood of "getdents" calls, does it seem reasonable that Gluster or Samba is freezing because it is for some reason generating a subdir listing of /storage whenever it can't find one of its subdirs?

As another test, if I access a file inside a non-existent subdir of a dir that has only five subdirs, nothing freezes. So the freezing seems to be a function of the number of subdirectories that are siblings of the first part of the path that doesn't exist, if that makes sense. So in /this/is/a/long/path, if "is" doesn't exist, then it appears Samba will generate a list of subdirs under "/this". And if "/this" has 100,000 immediate subdirs under it, then you're about to experience a world of hurt.

I read somewhere that FUSE's implementation of readdir() is a blocking operation. If true, then the listing behavior described above, combined with FUSE's blocking readdir(), is to blame. And I am therefore up a creek. It is not feasible to force the system to have only a few subdirs at any given level to prevent the lockup.

Unless somebody, after reading this novel, has some ideas for me to try. =) Any magical ways to get FUSE not to block, or any trickery on Samba's side?

Ken

On Sun, Jul 17, 2011 at 10:29 PM, Whit Blauvelt <whit.gluster at transpect.com> wrote:

> On Sun, Jul 17, 2011 at 10:19:00PM -0500, Ken Randall wrote:
>
> > (The no such file or directory part is expected since some of the image
> > references don't exist.)
>
> Wild guess on that: Gluster may work harder at files it doesn't find than
> files it finds. It's going to look on one side or the other of the
> replicated file at first, and if it finds the file deliver it. But if it
> doesn't find the file, wouldn't it then check the other side of the
> replicated storage to make sure this wasn't a replication error?
>
> Might be interesting to run a version of the test where all the images
> referenced do exist, to see if it's the missing files that are driving up
> the CPU cycles.
>
> Whit
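
The hypothesis in the reply above (a missing path component triggers a listing of its parent) can be illustrated with a toy model. This is a sketch under stated assumptions, not Samba's or Gluster's actual code: the directory tree is a plain nested dict, `resolve` is a made-up name, and the reason for the scan (looking for a case-insensitive match after the exact name misses) is an assumption that nothing in this thread confirms.

```python
def resolve(tree, path):
    """Walk `path` through a nested-dict `tree`.

    An exact-name hit costs nothing extra. On a miss, every sibling in the
    parent is enumerated in search of a case-insensitive match (assumed
    fallback behavior, for illustration only).

    Returns (found, entries_scanned).
    """
    node = tree
    scanned = 0
    for part in path.strip("/").split("/"):
        if part in node:
            node = node[part]
            continue
        match = None
        for name in node:  # stands in for a full readdir() of the parent
            scanned += 1
            if name.lower() == part.lower():
                match = name
                break
        if match is None:
            return False, scanned
        node = node[match]
    return True, scanned


if __name__ == "__main__":
    small = {"this": {f"d{i}": {} for i in range(5)}}
    big = {"this": {f"d{i}": {} for i in range(100_000)}}

    print(resolve(small, "/this/is/a/long/path"))  # (False, 5)
    print(resolve(big, "/this/is/a/long/path"))    # (False, 100000)
    print(resolve(big, "/this/d42"))               # (True, 0)
```

In this model the cost of a miss is proportional to the number of siblings of the first missing component, which matches the observation that a parent with five subdirs does not freeze while /storage does.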