Hey Jon,

Sorry nobody's been able to help you so far; I think your emails must have fallen through the cracks. :( I'm going to go through and try to address some of the things that sound like they might still be relevant...

On Tue, Jul 2, 2013 at 5:05 PM, Jon <three18ti@xxxxxxxxx> wrote:
> Now if I could figure out the exact same issue on my other host...

Which issue are you currently seeing on your other host? You mentioned several in your first two emails, and it didn't sound like they were all going on at the same time.

On Thu, Jun 27, 2013 at 9:34 AM, Jon <three18ti@xxxxxxxxx> wrote:
> The last recorded error was an "Unable to open superblock" error, that was
> several weeks ago, but I think that coincides with the initial trouble I
> experienced. I have tested these disks and can confirm that they have not
> failed.

This, in combination with the checksum error below, makes it sound like there's some kind of issue going on with your disks, though. :/ I see you mentioned at least one power outage, though Ceph should generally survive those as it's quite careful about disk commit ordering. What's your underlying FS, and are you mounting it with any options that might reduce safety (the most common one is nobarrier), or using a RAID card that might have similar safety configuration issues?

> Any help is greatly appreciated as I am really stumped.
> I think my biggest frustration is the init scripts not working as described
> in the docs. After I use ceph-deploy, do I need to write a config file?
> Based on my interpretation of the docs and upstart scripts, I don't think
> so; the respective daemons start on boot...

Hmm, what OS are you using? ceph-deploy does not require writing any additional config files (unless you want to for some reason); the modern scripts auto-detect disks of the appropriate type and folders in the appropriate locations, and start things up that way. Your issue with things starting up in the wrong order sounds like some of the init-system ordering trouble we've run into with systemd and other non-upstart, non-sysvinit systems, and we have some recent patches that should fix that up.

On Sun, Jun 9, 2013 at 11:36 AM, Jon <three18ti@xxxxxxxxx> wrote:
> I've tried a number of things in the docs, but something seems amiss, because
> when I try to restart monitors or osds, the init script tells me it's not
> found.
> I've copied my ceph.conf at the end of this e-mail.
>
>>> root@shepard:~# ls /var/lib/ceph/mon/
>>> ceph-shepard
>>> root@shepard:~# /etc/init.d/ceph restart mon.shepard
>>> /etc/init.d/ceph: mon.shepard not found (/etc/ceph/ceph.conf defines ,
>>> /var/lib/ceph defines )
> (I've also tried mon.0 .. mon.3)

Hmm, what are the contents of /var/lib/ceph and its subdirectories?

> My last question is where does ceph-deploy create the configs? I have the
> original files in a directory where I ran ceph-deploy,
> and I know about the /etc/ceph/ceph.conf file, but there seems to be some
> other config that the cluster is pulling from.
> Maybe I'm mistaken, but there are no osds in my ceph.conf.

Right. You can specify daemons in one of a couple of ways:
1) Put them in the ceph.conf.
2) Tag OSD disks appropriately.
3) Put the data directories for those daemons in the standard system locations (/var/lib/ceph/*).
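As a rough illustration of the first method, here's a minimal sketch of what explicit daemon sections look like in a sysvinit-style ceph.conf. The fsid, monitor address, and OSD ID below are placeholders I've made up for the example (only the hostname "shepard" comes from your output), so treat it as a shape to follow rather than something to copy verbatim:

    # /etc/ceph/ceph.conf -- minimal sketch; fsid, IP, and OSD ID are placeholders
    [global]
        fsid = <your cluster fsid>
        mon host = 10.0.0.1          # assumed monitor address

    [mon.shepard]
        host = shepard               # hostname the sysvinit script matches against

    [osd.0]
        host = shepard

With daemons listed like that, the sysvinit script has something to match when you run "/etc/init.d/ceph restart mon.shepard"; when it finds neither daemon sections for the local host nor daemon directories it recognizes under /var/lib/ceph, you get exactly the empty "defines , defines" error you pasted above. That said, as explained next, ceph-deploy deliberately leans on the other two methods instead.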
We are, in general, moving away from having a single monolithic config that lists every daemon, because it's basically a lie: even with a monolithic config, each daemon looks at its own local copy, so if those copies disagree, the things that work based on config contents (a select number of sysvinit commands) will behave differently across those hosts. ceph-deploy chooses to define daemons via the second and third methods in most cases, because it's designed to allow incremental changes and doesn't want to have to handle simultaneous changes to the conf on a remote host. So when you create a cluster it gives you a skeleton config to modify as you see fit, then pushes that out to every node on which you specify a daemon, but it doesn't change that config when you add new daemons.

When you turn on Ceph, the init system looks for existing daemon data stores and starts them up, and the daemons read in the contents of that minimal ceph.conf in order to find the monitor IPs and any other config options you might have specified for their daemon type.

Given your odd disk sizes, I'm thinking they aren't real disks, so you're either losing the partition types that ceph-disk sets, or else they aren't located in quite the right place for the init system to find them. So again, what are the contents of /var/lib/ceph and its subdirectories on one of your working nodes? :)

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com