On 11/13/2013 10:19 AM, Vijay Bellur wrote:
Makes me wonder what would be a typical deployment scenario - would we have a single volume that spans around 10K nodes? If yes, what are the scalability problems that we foresee? DHT's directory spread is on the top of my mind. Would the directory spread count option be good enough to address this?
The idea of a single volume spanning 10K nodes kind of freaks me out, but if we support that many nodes (in this kind of scenario they're likely to be both servers and clients) then it's almost inevitable that we'll have users who try to create volumes across all of them. After all, that's the "unified namespace" value prop, right?
I don't think directory spread count is sufficient to address this. At that level, we can *never* do anything that involves hitting all bricks. That includes getxattr to fetch layouts (even if most of them or empty for a particular directory), it includes mkdir, and so on. We'll have to do *everything* via consistent hashing, including the things where we currently rely on information being global. Even having that many connections is going to be a serious problem, so we'll probably have to do some pooling or proxying or something. Tracking and coordinating rebalance state is going to be another problem, so we'll probably need a fundamentally different approach there as well.
But first, we have to solve the glusterd scaling issues. Any scaling in DHT or elsewhere in the I/O plane won't even matter until the management plane can support building a cluster that large.