On 08/06/2012 09:51 AM, John Mark Walker wrote:
>
>> ------------------------------------------------------------------------------
>>
>> Hi Ben,
>>
>> Thanks for the expert advice.
>>
>> On Fri, Aug 3, 2012 at 2:35 PM, Ben England <bengland at redhat.com
>> <mailto:bengland at redhat.com>> wrote:
>>
>>>     4. Re: kernel parameters for improving gluster writes on millions of
>>>        small writes (long) (Harry Mangalam)
>>>
>>> Harry, You are correct, Glusterfs throughput with small write transfer
>>> sizes is a client-side problem, here are workarounds that at least some
>>> applications could use.
>>
>> Not to be impertinent nor snarky, but why is the gluster client written in
>> this way and is that a high priority for fixing? It seems that
>> caching/buffering is one of the great central truths of computer science in
>> general. Is there a countering argument for not doing this?

I'd say that caching/buffering is also one of the great central
*battlegrounds* of computer science. ;) While they're useful and often
necessary performance enhancers, when done without proper attention to
issues such as ordering/consistency and fault handling they can also lead to
some pretty awful problems. The GlusterFS philosophy has generally been to
handle performance issues via horizontal scaling, which works even for data
sizes greater than any cache, and to be conservative about the other issues.
The consistency issue alone could be dealt with in other ways, but most of
those would require more complexity overall and, in particular, more
complexity and overhead on the servers (violating another principle: moving
work to the more numerous clients). Clearly these tradeoffs could have been
made differently, but I'd argue that the additional effort necessary would
have required sacrifice in other areas. Witness, e.g., Ceph, which is
technically superior in several ways but *as a direct consequence* has taken
longer to mature and stabilize. The idea of trying to implement, e.g.,
dynamic reconfiguration in a more stateful and complex system makes me a bit
queasy.

> As a general point, you'll find that glusterfs always (almost always?) errs on
> the side of data consistency, even if it adversely affects performance. An NFS
> client can cache because it doesn't have to worry about HA - which has to be
> implemented with other tools. With recent changes in GlusterFS code, including
> further development of server-side code, it should be possible to create some
> type of client-side caching in the near future. There are also developments in
> fuse to think about, but mostly, it has to do with glusterfs' new server code.
> Previously, all the intelligence was in the client, so data consistency on the
> client was absolutely essential. Now, with "smarter" server-side translators,
> e.g. self-heal and rebalancing, this is shifting. 3.3 was the first release with
> the shift in that direction, and more is coming. I know this doesn't help you
> *now*, but I wanted to give you an idea of why it is this way, and how it's
> changing going forward.

To elaborate a bit on that point, I think some of the changes are likely to
involve support for explicit and deliberate sacrifices of consistency for the
sake of performance, for users who want to make that choice. The defaults will
probably remain fairly conservative, to support current users/workloads.
Examples might include things like TTL-based caching or asynchronous
replication implemented in a modular way, not deep changes such as an
MPFS-style cache protocol.

Apparently I'll be out west in a few weeks, with a trip up the coast
preceding the Gluster workshop at LinuxCon. I'd be glad to discuss these
issues in person with anyone who's interested, preferably over some sort of
beverage. ;)
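To make the TTL-based caching idea mentioned in this thread concrete, here is a minimal Python sketch. It is not GlusterFS code: the `TTLCache` class, its `fetch` callback, and the simulated "server" dict are all hypothetical. Within the TTL window a read is served locally and may return data another client has already overwritten, which is the explicit consistency-for-performance trade-off being discussed. A TTL of zero makes every read go back to the server, i.e. the conservative default.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class TTLCache(Generic[K, V]):
    """Hypothetical client-side read cache that trusts an entry for `ttl` seconds.

    Reads inside the TTL window never contact the backend, so they may be
    stale; reads after it expires re-fetch from the backend.
    """

    def __init__(self, fetch: Callable[[K], V], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch    # hypothetical backend read (a server round trip)
        self._ttl = ttl        # staleness the caller is willing to accept
        self._clock = clock    # injectable so the example can control time
        self._entries: Dict[K, Tuple[V, float]] = {}
        self.backend_reads = 0

    def get(self, key: K) -> V:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]      # cache hit: fast, but possibly stale
        self.backend_reads += 1
        value = self._fetch(key)
        self._entries[key] = (value, now)
        return value

    def invalidate(self, key: K) -> None:
        # Explicit invalidation, e.g. after this client's own write.
        self._entries.pop(key, None)


# Simulated "server" state and a manually advanced clock.
server = {"f": "v1"}
fake_now = [0.0]
cache = TTLCache(lambda k: server[k], ttl=5.0, clock=lambda: fake_now[0])

print(cache.get("f"))  # first read goes to the server: v1
server["f"] = "v2"     # another client overwrites the file
print(cache.get("f"))  # still inside the TTL window: stale v1
fake_now[0] = 6.0
print(cache.get("f"))  # TTL expired, re-fetched: v2
```

The window of possible staleness is bounded by `ttl`, which is what makes this kind of trade-off something a user could opt into deliberately rather than a hidden consistency hole.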