On 09/08/2009 04:14 AM, Daniel Maher wrote:
> Alan Ivey wrote:
>> Like the subject implies, how does replication work exactly?
>>
>> If a client is the only one that has the IP addresses defined for the
>> servers, does that mean that only a client writing a file ensures
>> that it goes to both servers? That would tell me that the servers
>> don't directly communicate with each other for replication.
>>
>> If so, how does healing work? Since the client is the only
>> configuration with the multiple server IP addresses, is it the
>> client's "task" to make sure the server heals itself once it's back
>> online?
>>
>> If not, how do the servers know each other exist if not for the
>> client config file?
>
> You've answered your own question. :) AFAIK, in the recommended
> simple replication scenario, the client is actually responsible for
> replication, as each server is functionally independent.
> (This seems crazy to me, but yes, that's how it works.)

For Alan: active healing should only be necessary if the system is not working properly. Healing should only be required after a system crash or bug, a GlusterFS server or client crash or bug, or somebody messing around with the backing store file system underneath. For systems that are up and running without problems, healing should be completely unnecessary.

For Daniel: as for the "seems crazy" - compared to what? Every time I look at other solutions such as Lustre and see how they rely on a single metadata server, which is itself supposed to be made highly available by other means, I have to ask: are they really solving the high availability problem, or are they just narrowing its scope? If the whole cluster of 2 to 1000 nodes relies on a single server being up, that server is the weakest link. Sure, having one weakest link to deal with is easier to solve using traditional means than having 1000 weakest links, but it seems clear that Lustre has not SOLVED the problem. They've just reduced it to something that might be more manageable.

Even the "traditional means" of shared disk storage such as GFS and OCFS rely on a single piece of hardware - the shared storage. As a result, they make the shared storage really expensive - dual interfaces, dual power supplies, dual disks, ... - but it's still one piece of hardware that everything else relies on.

For "shared nothing", each node really does need to be fully independent and able to make its own decisions. I think the GlusterFS folk have the model right in this regard. The remaining question is whether they have the *implementation* right. :-)

Right now they seem to be in a compromised position between simplicity, performance, and correctness. It seems it is a difficult problem to have all three no matter which model is selected (shared disk, shared metadata only, shared nothing). Self-healing is a good feature, but they seem to be leaning on it to provide correctness, so that they can provide performance with some amount of simplicity. An example here is how directory listings come from "the first up server". In theory, we could have correctness through self-healing if directory listings always queried all servers: the combined directory listing would be shown, and self-healing would kick off in the background. But this would cost performance, as every server in the cluster would be involved in every directory listing. This is just one example. I think GlusterFS has a lot of potential to close holes such as these.
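To make that concrete, here is a rough sketch in Python of the "query all servers" idea. This is not how GlusterFS itself is implemented (the real replicate translator is C code inside the client); the Replica class and schedule_self_heal() below are made-up names, purely for illustration:

    # Sketch only: list a directory on every up replica, show the union
    # to the caller, and queue a background self-heal for any replica
    # that is missing entries.  Not GlusterFS code.

    from concurrent.futures import ThreadPoolExecutor

    class Replica:
        def __init__(self, name, entries):
            self.name = name
            self._entries = entries            # stand-in for the backing store

        def listdir(self, path):
            return set(self._entries.get(path, set()))

    def schedule_self_heal(replica, path, missing):
        # A real system would copy the missing entries over from a good
        # replica; here we only record the intent.
        print("self-heal queued on %s: %s missing %s"
              % (replica.name, path, sorted(missing)))

    def merged_listdir(replicas, path):
        # Ask every replica in parallel instead of only "the first up server".
        with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
            listings = list(pool.map(lambda r: r.listdir(path), replicas))

        union = set().union(*listings)

        # Any replica whose listing falls short of the union is stale;
        # kick off healing in the background and return the union now.
        for replica, listing in zip(replicas, listings):
            missing = union - listing
            if missing:
                schedule_self_heal(replica, path, missing)
        return sorted(union)

    if __name__ == "__main__":
        a = Replica("server-a", {"/data": {"x.txt", "y.txt"}})
        b = Replica("server-b", {"/data": {"x.txt"}})   # stale replica
        print(merged_listdir([a, b], "/data"))

The cost is visible right in the sketch: every listing fans out to every replica, which is exactly the performance hit I described above.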
I don't think it would be difficult to add in things like an automatic election model for defining which machines are considered stable and the safest masters to use (the simplest might be "the one with the highest glusterfsd uptime"?), having clients pull things like directory listings only from the first stable / safest master, and having the non-stable / non-safe machines go into automatic full self-heal until they are back up-to-date with the master. In such a model, I'd like to see the locks being held against the stable/safe masters used for reads. Just throwing stuff out there...

For me, I'm looking at this as: I have a problem to solve, and very few solutions seem to meet my requirements. GlusterFS looks very close. Do I write my own, which would probably start out only solving my requirements - and since my requirements will probably grow, this would mean eventually writing something the size of GlusterFS? Or do I start looking into this GlusterFS thing, point out the problems, and see if I can help? I'm leaning towards the latter: try it out, point out the problems, see if I can help.

As it is, I think GlusterFS is very stable, with sufficient performance for the requirements of most potential users. It's the people who are really trying to push it to its limits who are causing the majority of the breakage being reported here. For these people, which includes me, I've looked around, and the solutions out there that are competitive are either very expensive or insufficient.

Cheers,
mark

-- 
Mark Mielke <mark at mielke.cc>