Hi everybody! Love this community, and I love GlusterFS. All that, despite being burned by it, likely due to my own failures. Here's the scenario where I got burned, and my best guess at why it happened.

We run a popular .NET-based web app that gets a lot of traffic, where people build websites using our system. The long and short of it is, we tested, tweaked, and tested some more, over a full month. After we deployed it to production, we saw performance take a dive into the dumpster. We had to revert fairly quickly.

The obvious blame is in our testing. We load tested the system many, many times over the course of an entire month, but with a narrow range of test scenarios. The much wider range of live production traffic rendered our testing moot. We tucked our tail between our legs and are researching tools that will let us play back live traffic to serve as a better simulation. In our earlier load testing we were able to achieve many multiples of our peak traffic, but again, it wasn't realistic traffic.

Before I get to my suspicion of what's happening, keep in mind that we have 50+ million files (spread over hundreds of thousands of directories), most of them small, and each page request will pull in 10 to 40 supporting assets (images, Flash files, CSS, JS, etc.). We also have people executing directory listings whenever they're editing their site, as they choose images, etc. to insert onto the page. We're also exporting the volume over CIFS so our Windows servers can access the GlusterFS client on the Linux machines in the cluster. The Samba settings on there were tweaked to the hilt as well: turning off case-insensitivity, bumping up caches and async IO, etc.

It appears as if GlusterFS has some kind of I/O blocking going on. Whenever a directory listing is being pieced together, it noticeably slows down (or stops?) other operations through the same client.
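
For anyone who wants to reproduce the effect, here is a rough sketch of the kind of check involved. It is an illustration only, not the exact script we ran: the function names and the choice of `os.stat` as the probe operation are assumptions. It times small metadata calls on their own, then again while a second thread loops over a directory listing through the same mount.

```python
import os
import statistics
import threading
import time


def time_stat_calls(paths, rounds=50):
    """Return per-call latencies (in seconds) for os.stat() on each path."""
    latencies = []
    for _ in range(rounds):
        for path in paths:
            start = time.perf_counter()
            os.stat(path)
            latencies.append(time.perf_counter() - start)
    return latencies


def list_directory_until(directory, stop_event):
    """Keep listing `directory` until stop_event is set (the getdents load)."""
    while not stop_event.is_set():
        with os.scandir(directory) as entries:
            for _ in entries:
                pass


def compare(directory, sample_paths, rounds=50):
    """Measure stat latency alone, then again while a listing loop runs."""
    baseline = time_stat_calls(sample_paths, rounds)

    stop_event = threading.Event()
    lister = threading.Thread(
        target=list_directory_until, args=(directory, stop_event), daemon=True
    )
    lister.start()
    try:
        contended = time_stat_calls(sample_paths, rounds)
    finally:
        # Always stop the background listing, even if a stat call fails.
        stop_event.set()
        lister.join()

    return {
        "baseline_median": statistics.median(baseline),
        "contended_median": statistics.median(contended),
        "baseline_max": max(baseline),
        "contended_max": max(contended),
    }
```

Point `compare()` at a large directory on the GlusterFS mount, with `sample_paths` being a handful of small files elsewhere on the same mount, and compare the baseline and contended numbers. If listings really do block other operations on the client, the contended maximum should stand out. Running the same thing against a local disk gives a useful control.
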
For a high-concurrency app like ours, where the storage backend needs to be able to pull off 10 to 100 directory listings a second and 5,000 to 10,000 IOPS overall, it's easy to see how perf would degrade if my blocking suspicion is correct.

The biggest culprit, in my guess, is the directory listing. Executing one makes things drag. I've been able to demonstrate that through a simple script. And we're running some pretty monster machines with 24 cores, 24 GB RAM, etc.

I tried as many tuning permutations as possible, only to run into the same result. Jacking up the cache-size, raising the io-thread-count to 64, etc. certainly helped performance, but the system continued to exhibit this blocking behavior.

I also made sure that each web server accessing the GlusterFS backend was talking to its own GlusterFS client, in the hopes of increasing parallelization. I'm sure it helped, but not enough. It's nowhere close to the concurrency and performance of a straight-out Windows share. (I realize a clustered file system carries overhead and will have less perf than a straight share, but we saw performance drop by roughly an order of magnitude as load increased.)

Am I way off? Does GlusterFS block on directory listings (getdents) or any other operations? If so, is there a way to enable the database equivalent of "dirty reads" so it doesn't block?

Ken