On Tuesday, March 6, 2012 at 1:19 PM, Tommi Virtanen wrote:
> As you may have noticed, the docs [1] and Chef cookbooks [2] currently
> use /srv/osd.$id and similar paths. That is, shall we say, Not Ideal(tm).
>
> [1] http://ceph.newdream.net/docs/latest/ops/install/mkcephfs/#creating-a-ceph-conf-file
> [2] https://github.com/ceph/ceph-cookbooks/blob/master/ceph/recipes/bootstrap_osd.rb#L70
>
> I initially used /srv purely because I needed to get things going quickly,
> and that directory was guaranteed to exist. Let's figure out the
> long-term goal.
>
> The kinds of things we have:
>
> - configuration, edited by humans (ONLY)
> - machine-editable state similar to configuration
> - OSD data is typically a dedicated filesystem; accommodate that
> - OSD journal can be just about any file, including block devices
>
> OSD journal flexibility is limiting for automation; support three
> major use cases:
>
> - OSD journal may be a fixed-basename file inside the OSD data directory
> - OSD journal may be a file on a shared SSD
> - OSD journal may be a block device (e.g. a full SSD, a partition on an
>   SSD, or a 2nd LUN on the same RAID with different tuning)
>
> Requirements:
>
> - FHS compliant: http://www.pathname.com/fhs/
> - works well with Debian and RPM packaging
> - OSD creation/teardown is completely automated
> - ceph.conf is static for the whole cluster; not edited by per-machine
>   automation
> - we're assuming GPT partitions, at least for now
>
> Desirable things:
>
> - ability to isolate daemons from each other more, e.g.
>   AppArmor/SELinux/different uids; e.g. do not assume all daemons can
>   mkdir in the same directory (ceph-mon vs ceph-osd)
> - ability to move an OSD data disk from server A to server B (e.g.
>   chassis swap due to a faulty motherboard)
>
> The Plan (ta-daaa!):
>
> (These will be just the defaults -- if you're hand-rolling your setup
> and disagree, just override them.)
> (Apologies if this gets sketchy, I haven't had time to distill these
> thoughts into something prettier.)
>
> - FHS says human-editable configuration goes in /etc
> - FHS says machine-editable state goes in /var/lib/ceph
> - use /var/lib/ceph/mon/$id/ for mon.$id
> - use /var/lib/ceph/osd-journal/$id for the osd.$id journal; symlink to
>   the actual location
> - use /var/lib/ceph/osd-data/$id for osd.$id data; may be a symlink to
>   the actual location?
> - embed the same random UUID in the osd data & osd journal at ceph-osd
>   mkfs time, for safety
>
> On a disk hot-plug event (and at bootup):
>
> - found = {}
> - scan the partitions for a partition label with the prefix
>   "ceph-osd-data-". Take the remaining portion as $id and mount the fs
>   at /var/lib/ceph/osd-data/$id. Add $id to found (TODO: handle
>   pre-existing). If osd-data/$id/journal exists, symlink osd-journal/$id
>   to it (TODO: handle pre-existing).
> - scan for a partition label with the prefix "ceph-osd-journal-" and the
>   special GUID type. Take the remaining portion as $id and symlink
>   /var/lib/ceph/osd-journal/$id to the block device. Add $id to found.
>   (TODO: handle pre-existing)
> - for each $id in found, if we have both osd-journal and osd-data,
>   start a ceph-osd for it
>
> Moving the journal
>
> As an admin, I want to move an OSD data disk from one physical host
> (chassis) to another (e.g. for maintenance of a non-hotswap power
> supply).
>
> I might have a single SSD, divided into multiple partitions, each
> acting as the journal for a single OSD data disk. I want to spread the
> load evenly across the rest of the cluster, so I move the OSD data
> disks to multiple destination machines, as long as each has one slot
> free. Naturally, I cannot easily saw the SSD apart and move it
> physically.
>
> I would like to be able to:
>
> 1. shut down the osd daemon
> 2. explicitly flush out & invalidate the journal on the SSD (after this,
>    the journal would no longer be marked with the osd id and fsid)
> 3. move the HDD
> 4.
>    on the new host, assign a blank SSD partition and initialize it
>    with the right fsid etc. metadata

I have no thoughts on the rest of it, but I believe what you're asking
for here is the existing ceph-osd --flushjournal. Although this doesn't
invalidate the existing journal (now, at least), it will let you do
prototyping without much difficulty. :)
-Greg

> It may actually be nicer to think of this as:
>
> 1. shut down the osd daemon
> 2. move the journal inside the osd data dir, and invalidate the old one
>    (flushing it is an optimization)
> 3. physically move the HDD
> 4. move the journal from inside the osd data dir to the assigned block
>    device
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx (mailto:majordomo@xxxxxxxxxxxxxxx)
> More majordomo info at http://vger.kernel.org/majordomo-info.html
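The hot-plug scan proposed above can be sketched as a toy model. The
"ceph-osd-data-" and "ceph-osd-journal-" label prefixes are the ones from
the mail; everything else (the function name, the plain list-of-tuples
input standing in for a real blkid scan) is illustrative only, not Ceph
code:

```python
# Toy sketch of the proposed hot-plug scan: given partition labels,
# decide which OSD ids have both a data fs and a journal and so can
# have a ceph-osd started. Mounting and symlinking are left as
# comments; only the pairing logic is modeled here.

DATA_PREFIX = "ceph-osd-data-"
JOURNAL_PREFIX = "ceph-osd-journal-"

def scan_partitions(partitions):
    """partitions: iterable of (device, label) pairs.

    Returns (data, journals, startable): dicts mapping $id -> device,
    plus the set of ids for which both halves were found.
    """
    data = {}
    journals = {}
    for dev, label in partitions:
        if label.startswith(DATA_PREFIX):
            osd_id = label[len(DATA_PREFIX):]
            # real code would mount dev at /var/lib/ceph/osd-data/$id
            # and handle a pre-existing mount (a TODO in the proposal)
            data[osd_id] = dev
        elif label.startswith(JOURNAL_PREFIX):
            osd_id = label[len(JOURNAL_PREFIX):]
            # real code would also check the special GUID type and
            # symlink /var/lib/ceph/osd-journal/$id to the block device
            journals[osd_id] = dev
    # a journal file inside the data dir would also satisfy the journal
    # requirement; that case is omitted here for brevity
    startable = set(data) & set(journals)
    return data, journals, startable
```

With labels "ceph-osd-data-12" and "ceph-osd-journal-12" present, osd.12
is startable; an id with data but no journal (or vice versa) is simply
left alone, which matches the "for each $id in found, if we have both"
step.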
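The "embed the same random UUID in osd data & osd journal" safety check,
and the invalidate/assign steps of the journal-move procedure, could look
roughly like this. The dict-based "metadata" and all function names are
my own stand-ins; in reality this state would live in the OSD superblock
and the journal header:

```python
import uuid

def mkfs(osd_id):
    """At ceph-osd mkfs time, stamp data and journal with one fsid."""
    fsid = str(uuid.uuid4())
    data = {"osd_id": osd_id, "fsid": fsid}
    journal = {"osd_id": osd_id, "fsid": fsid}
    return data, journal

def invalidate(journal):
    """Move step 2: after flushing, clear the id/fsid marks so the old
    SSD partition can no longer be mistaken for this OSD's journal."""
    journal["osd_id"] = None
    journal["fsid"] = None

def assign(journal, data):
    """Move step 4: initialize a blank partition on the new host with
    the right fsid etc. metadata."""
    journal["osd_id"] = data["osd_id"]
    journal["fsid"] = data["fsid"]

def safe_to_start(data, journal):
    """The safety check: only start an OSD whose data and journal carry
    the same fsid."""
    return journal["fsid"] is not None and journal["fsid"] == data["fsid"]
```

The point of the shared UUID is exactly what safe_to_start expresses: a
stale or blank journal partition fails the check, so a moved data disk
cannot accidentally replay someone else's journal.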