Re: Braindump: path names, partition labels, FHS, auto-discovery

On Tuesday, March 6, 2012 at 1:19 PM, Tommi Virtanen wrote:
> As you may have noticed, the docs [1] and Chef cookbooks [2] currently
> use /srv/osd.$id and such paths. That's, shall we say, Not Ideal(tm).
> 
> [1] http://ceph.newdream.net/docs/latest/ops/install/mkcephfs/#creating-a-ceph-conf-file
> [2] https://github.com/ceph/ceph-cookbooks/blob/master/ceph/recipes/bootstrap_osd.rb#L70
> 
> 
> I initially used /srv purely because I needed to get things going
> quickly, and that directory was guaranteed to exist. Let's figure out
> the long-term goal.
> 
> The kinds of things we have:
> 
> - configuration, edited by humans (ONLY)
> - machine-editable state similar to configuration
> - OSD data is typically a dedicated filesystem; accommodate that
> - OSD journal can be just about any file, including block devices
> 
> OSD journal flexibility makes automation harder; we need to support
> three major use cases (a small sketch follows this list):
> 
> - OSD journal may be fixed-basename file inside osd data directory
> - OSD journal may be a file on a shared SSD
> - OSD journal may be a block device (e.g. full SSD, partition on SSD,
> 2nd LUN on the same RAID with different tuning)
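> 
> Concretely, the only distinction automation needs to make is roughly
> this (a Python sketch; the function and the path comparison are
> illustrative, not an actual tool):
> 
> import os
> import stat
> 
> def journal_kind(journal_path, data_dir):
>     # which of the three cases above is this journal?
>     st = os.stat(journal_path)
>     if stat.S_ISBLK(st.st_mode):
>         return "block device"  # whole SSD, SSD partition, 2nd LUN, ...
>     if (os.path.dirname(os.path.realpath(journal_path))
>             == os.path.realpath(data_dir)):
>         return "file inside osd data dir"
>     return "file elsewhere (e.g. on a shared SSD)"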
> 
> Requirements:
> 
> - FHS compliant: http://www.pathname.com/fhs/
> - works well with Debian and RPM packaging
> - OSD creation/teardown is completely automated
> - ceph.conf is static for the whole cluster; not edited by per-machine
> automation
> - we're assuming GPT partitions, at least for now
> 
> Desirable things:
> 
> - ability to isolate daemons from each other more, e.g. via
> AppArmor/SELinux/distinct uids; in particular, do not assume all
> daemons can mkdir in the same directory (ceph-mon vs ceph-osd)
> - ability to move an OSD data disk from server A to server B (e.g.
> chassis swap due to a faulty motherboard)
> 
> 
> The Plan (ta-daaa!):
> 
> (These will be just the defaults -- if you're hand-rolling your setup,
> and disagree, just override them.)
> 
> (Apologies if this gets sketchy, I haven't had time to distill these
> thoughts into something prettier.)
> 
> - FHS says human-editable configuration goes in /etc
> - FHS says machine-editable state goes in /var/lib/ceph
> - use /var/lib/ceph/mon/$id/ for mon.$id
> - use /var/lib/ceph/osd-journal/$id for osd.$id journal; symlink to
> actual location
> - use /var/lib/ceph/osd-data/$id for osd.$id data; may be a symlink to
> actual location?
> - embed the same random UUID in osd data & osd journal at ceph-osd
> mkfs time, for safety (sketched below)
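> 
> The UUID pairing could be as simple as this at mkfs time (a sketch;
> the on-disk format, the "journal_uuid" file name and the offset are
> all made up here):
> 
> import os
> import uuid
> 
> def stamp_pair(data_dir, journal_path):
>     # write the same random UUID into both halves so a later scan can
>     # refuse to pair a data disk with somebody else's journal
>     token = uuid.uuid4().bytes
>     with open(os.path.join(data_dir, "journal_uuid"), "wb") as f:
>         f.write(token)
>     # os.open works the same for a journal file or a block device
>     fd = os.open(journal_path, os.O_WRONLY | os.O_CREAT, 0o600)
>     try:
>         os.write(fd, token)  # offset 0 purely for illustration
>     finally:
>         os.close(fd)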
> 
> On a disk hotplug event (and at bootup), do the following (a rough
> code sketch follows the list):
> - found = {}
> - scan the partitions for a label with the prefix
> "ceph-osd-data-". Take the remaining portion as $id and mount the fs
> at /var/lib/ceph/osd-data/$id. Add $id to found (TODO handle
> pre-existing). If osd-data/$id/journal exists, symlink osd-journal/$id
> to it (TODO handle pre-existing).
> - scan for partitions with the label prefix "ceph-osd-journal-" and
> the special GPT type GUID. Take the remaining portion as $id and
> symlink /var/lib/ceph/osd-journal/$id to the block device. Add $id to
> found (TODO handle pre-existing).
> - for each $id in found, if we have both osd-journal and osd-data,
> start a ceph-osd for it
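> 
> Roughly, in Python (a sketch: the /dev/disk/by-partlabel lookup stands
> in for a real blkid/udev query, and mounting, symlinking and daemon
> startup are elided):
> 
> import glob
> import os
> 
> BASE = "/var/lib/ceph"
> DATA = "ceph-osd-data-"
> JOURNAL = "ceph-osd-journal-"
> 
> def scan():
>     found = {}
>     for dev in glob.glob("/dev/disk/by-partlabel/" + DATA + "*"):
>         osd_id = os.path.basename(dev)[len(DATA):]
>         mnt = os.path.join(BASE, "osd-data", osd_id)
>         # mount dev at mnt here (TODO handle pre-existing)
>         found.setdefault(osd_id, set()).add("data")
>         if os.path.exists(os.path.join(mnt, "journal")):
>             found[osd_id].add("journal")  # TODO handle pre-existing
>     for dev in glob.glob("/dev/disk/by-partlabel/" + JOURNAL + "*"):
>         # TODO also verify the special GPT type GUID, not just the label
>         osd_id = os.path.basename(dev)[len(JOURNAL):]
>         # symlink /var/lib/ceph/osd-journal/$id -> dev (TODO pre-existing)
>         found.setdefault(osd_id, set()).add("journal")
>     for osd_id, have in sorted(found.items()):
>         if have == {"data", "journal"}:
>             pass  # start a ceph-osd for this $id (init integration elided)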
> 
> 
> Moving journal
> 
> As an admin, I want to move an OSD data disk from one physical host
> (chassis) to another (e.g. for maintenance of non-hotswap power
> supply).
> I might have a single SSD, divided into multiple partitions, each
> acting as the journal for a single OSD data disk. I want to spread the
> load evenly across the rest of the cluster, so I move the OSD data
> disks to multiple destination machines, each of which only needs one
> free slot. Naturally, I cannot easily saw the SSD apart and move it
> physically.
> 
> I would like to be able to (a sketch follows the list):
> 
> 1. shut down the osd daemon
> 2. explicitly flush out & invalidate the journal on SSD (after this,
> the journal would not be marked with the osd id and fsid anymore)
> 3. move the HDD
> 4. on the new host, assign a blank SSD partition and initialize it
> with the right fsid etc metadata
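> 
> In tool form that might look roughly like this (a sketch, not real
> ceph-osd behavior; it assumes a flush command along the lines of
> ceph-osd --flush-journal, and the zeroed header size is made up):
> 
> import os
> import subprocess
> 
> def flush_and_invalidate(osd_id, journal_dev):
>     # steps 1-2: daemon already stopped; flush, then clobber the
>     # journal header so nothing mistakes this for a live journal
>     subprocess.check_call(["ceph-osd", "-i", osd_id, "--flush-journal"])
>     fd = os.open(journal_dev, os.O_WRONLY)
>     try:
>         os.write(fd, b"\0" * 4096)  # header size is illustrative
>     finally:
>         os.close(fd)
> 
> def adopt_journal(osd_id, new_dev):
>     # step 4: initialize new_dev with the right fsid etc (elided) and
>     # point the osd-journal symlink at it
>     os.symlink(new_dev, "/var/lib/ceph/osd-journal/" + osd_id)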

I have no thoughts on the rest of it, but I believe what you're asking for here is the existing
ceph-osd --flush-journal
Although this doesn't invalidate the existing journal (for now, at least), it will let you do prototyping without
much difficulty. :)
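(Run with the daemon stopped, something like "ceph-osd -i $id
--flush-journal".)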
-Greg

 
> 
> It may actually be nicer to think of this as (again, sketched below):
> 
> 1. shut down the osd daemon
> 2. move the journal inside the osd data dir, invalidate the old one
> (flushing it is an optimization)
> 3. physically move the HDD
> 4. move the journal from inside the osd data dir to the assigned
> block device (sketched below)
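> 
> In other words, steps 2 and 4 reduce to moving a file and flipping a
> symlink (sketch; flushing and re-initializing the journal itself is
> elided):
> 
> import os
> 
> BASE = "/var/lib/ceph"
> 
> def journal_into_data_dir(osd_id):
>     # step 2: the journal becomes a plain file inside the osd data
>     # dir, so the HDD carries everything across the physical move
>     link = os.path.join(BASE, "osd-journal", osd_id)
>     target = os.path.join(BASE, "osd-data", osd_id, "journal")
>     if os.path.islink(link):
>         os.unlink(link)
>     os.symlink(target, link)
> 
> def journal_onto_device(osd_id, dev):
>     # step 4: the reverse, onto the newly assigned block device
>     link = os.path.join(BASE, "osd-journal", osd_id)
>     os.unlink(link)
>     os.symlink(dev, link)
>     os.unlink(os.path.join(BASE, "osd-data", osd_id, "journal"))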




