Hey Greg,

On 2011. June 7. 04:15:31 Gregory Farnum wrote:
> 2011/6/6 Székelyi Szabolcs <szekelyi@xxxxxxx>:
> > I have a three-node Ceph setup, two nodes playing all three roles
> > (OSD, MDS, MON), and one being just a monitor (which happens to be
> > the client I'm using the filesystem from).
> >
> > I want to achieve high availability by mirroring all data between
> > the OSDs and being able to still access everything even if one of
> > them goes down. The mirroring works fine; I see the space being
> > consumed on both nodes as I copy data onto the file system.
> > According to `ceph -s`, all PGs are in active+clean state. If I
> > start reading a big file and shut down one of the (OSD+MDS+MON)
> > nodes, the file can still be read until the end, that's fine.
> > Moreover, the contents read back seem correct when compared to the
> > original file. Very nice. But if I start reading the file while one
> > of the nodes is down, it blocks until the node comes up again. I
> > can't even kill the reading process with KILL, TERM, or INT.
> >
> > Am I doing something wrong, or was I not careful enough reading the
> > docs, or may this be a bug? My ceph.conf is attached.
>
> The problem isn't in the OSD, it's the MDS. :)
>
> The MDS system is *slightly* less resilient than the OSD system is.
> You can set up "standby" MDSes that will take over if the system
> detects that an MDS has died; you can even set up "standby-replay"
> MDSes that follow a specific MDS and keep all its data cached in
> memory so they can take over right when a failure is detected. But if
> you lose one MDS, its data won't automatically be imported into the
> remaining MDSes. (Because the MDS keeps all its data on the OSDs,
> there's no danger of losing data -- it's a matter of how the data is
> segregated that requires a new daemon. And generally the process is
> dominated by the timeout, not the time it takes the new MDS to take
> over.)

Thanks for the clarification. I still have a few questions.

If I understand things correctly, Ceph tries to keep max_mds MDSes
active at all times. I can have more MDSes than that number, but the
excess ones will be standbys, right? I can't really see the difference
between a standby and an active MDS. Right now I have two active and no
standby MDSes, and the filesystem stops working if I kill either of
them. Does this mean the system stops working whenever it can't bring
the number of active MDSes up to max_mds from the standby pool? What is
the reason for running standby MDSes instead of just setting max_mds to
the total number of MDSes?

> So in your case, you're trying to open a file that is controlled by
> the MDS that you killed, and the client can't get the "capability"
> bits that it needs in order to look at the file. So you've got a few
> options:
> 1) Kill the OSD, but not the MDS.

Well, if a machine crashes, then both fall victim. :(

> 2) Create an extra MDS daemon, perhaps on your monitor node. When the
> system detects that one of your MDSes has died (a configurable
> timeout, IIRC in the neighborhood of 30-60 seconds), this extra
> daemon will take over.

I will do this. Should this be a standby or an active MDS? I.e.,
should I increase max_mds from 2 to 3 after creating the new MDS?
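For concreteness, here is a minimal sketch of what I am planning,
assuming the daemon name (mds.gamma) and the host name (mon0, my
monitor-only node) are placeholders for my setup -- please correct me
if I got the section format wrong:

    ; added to ceph.conf: a third MDS, living on the monitor-only node
    [mds.gamma]
            host = mon0

    ; then start it on that node, e.g. via the init script:
    ;   /etc/init.d/ceph start mds.gamma

If I understand your explanation correctly, with max_mds left at 2 this
third daemon should come up as a standby and take over when one of the
two active ones dies.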
> (Or you can just start up the new daemon after you kill the old one,
> doesn't matter.)
> 3) Create a new system with only one MDS and don't kill that one.
> (Eventually you will be able to shrink the number of MDSes, but this
> isn't well-tested or documented so I'm not sure what state it's in
> right now.)

This is not an option, since it would create a SPOF, and that's exactly
what I'm trying to avoid by using Ceph.

Thanks,
--
cc