Hi Mandell,

On Mon, 8 Oct 2012, Mandell Degerness wrote:
> Hi list,
>
> I've run into a bit of a weird error and I'm hoping that you can tell
> me what is going wrong. There seems to be a race condition in the way
> I am using "ceph osd create <uuid>" and actually creating the OSD's.
> The log from one of the servers is at:
>
> https://gist.github.com/528e347a5c0ffeb30abd
>
> The process I am trying to follow (for the OSDs) is:
>
> 1) Create XFS file system on disk.
> 2) Use FS UUID as source to get a new OSD id #.
> 'ceph', 'osd', 'create', '32895846-ca1c-4265-9ce7-9f2a42b41672'
> (Returns 2.)
> 3) Pass the UUID and OSD id to the create osd command
>
> ceph-osd -c /etc/ceph/ceph.conf --fsid
> e61c1b11-4a1c-47aa-868d-7b51b1e610d3 --osd-uuid
> 32895846-ca1c-4265-9ce7-9f2a42b41672 -i 2 --mkfs --osd-journal-size
> 8192
> 4) Start the OSD, as part of the start process, I verify that the
> whoami and osd fsid agree (in case this disk came from a previous
> cluster, somehow) - should be just a sanity check
> 'ceph', 'osd', 'create', '32895846-ca1c-4265-9ce7-9f2a42b41672'
> (Returns 1!)
>
> This is clearly a race condition because we have several cluster
> creations without this happening and then this happens about once
> every 8 times or so. Thoughts?

That definitely sounds like a race. I'm not seeing it by inspection,
though, and wasn't able to reproduce. Is it possible to capture a
monitor log (debug ms = 1, debug mon = 20) of this occurring and share
that?

Thanks!
sage
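
For anyone trying to reproduce this, the sanity check in step 4 of the quoted procedure could be sketched roughly as below. This is only an illustration, not the script Mandell is actually running: the helper names osd_create and verify_osd_id and the injectable run parameter are hypothetical, and the sketch assumes "ceph osd create <uuid>" prints just the numeric OSD id on stdout.

```python
import subprocess


def osd_create(uuid, run=subprocess.check_output):
    """Return the OSD id the monitors associate with this UUID.

    Assumption: 'ceph osd create <uuid>' prints only the numeric id on stdout.
    'run' is injectable so the logic can be exercised without a cluster.
    """
    out = run(["ceph", "osd", "create", uuid])
    return int(out.decode("ascii").strip())


def verify_osd_id(uuid, expected_id, run=subprocess.check_output):
    """Re-issue the create call and raise if the id differs from the one
    that was passed to 'ceph-osd --mkfs -i <id>' earlier."""
    actual_id = osd_create(uuid, run=run)
    if actual_id != expected_id:
        raise RuntimeError(
            "osd id mismatch for %s: expected %d, got %d"
            % (uuid, expected_id, actual_id)
        )
    return actual_id
```

In the failing run described above, verify_osd_id("32895846-ca1c-4265-9ce7-9f2a42b41672", 2) would raise, since the second create call returned 1 rather than 2. Logging the time of each call alongside the returned id should make it easier to line the mismatch up with the monitor log.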