On Wed, 19 Aug 2015, Varada Kari wrote:
> Hi all,
>
> This is regarding generalizing ceph-disk to work with different OSD backends such as FileStore, KeyValueStore, NewStore, etc.
> All of these object store implementations have different needs for the disks used to hold data and metadata.
> In one of the pull requests Sage suggested the requirements ceph-disk should satisfy in order to handle all the backends optimally. Based on the current implementations of the supported object store backends, below are the requirements ceph-disk is expected to meet.
>
> FileStore:
> 1. Needs a partition/disk for the file system.
> 2. Needs a partition/disk for the journal.
> 3. Additionally, the omap (LevelDB/RocksDB) could be placed on a separate partition, depending on whether the backing medium is HDD or SSD.
>
> NewStore:
> 1. Needs a file system on a disk/partition.
> 2. Optionally needs a file system for the journal, depending on the backend DB used (LevelDB/RocksDB, ...).
> 3. Optionally needs a file system on a faster medium to hold the data for the warm levels.
>
> KeyValueStore:
> 1. Needs a small partition, with a file system, to hold the OSD's metadata.
> 2. Needs a partition/disk to hold data. Some backends need a file system, some can work off a raw partition/disk.
> 3. Optionally may need a partition to hold the cache or journal.
>
> Please add any details I have missed.
>
> Ideally, ceph-disk should make its decisions based on input given by the user through the conf file, or through options to ceph-disk in a manual deployment. In the FileStore case the user's input could include what kind of file system to create, the file system size, the device to create it on, and so on.
> Similarly for KeyValueStore: the backend can work on a raw partition or disk, or else it needs a file system.
>
> Quoting Sage again here:
> Alternatively, we could say that it's the admin's job to express to ceph-disk what kind of OSD it should create (backend type, secondary fs's or partitions, etc.) instead of inferring that from the environment. In that case, we could
> * make a generic way to specify which backend to use in the osd_data dir
> * make sure all secondary devices or file systems are symlinked from the osd_data dir, the way the journal is today. This could be done in a backend-specific way; e.g., FileStore wants the journal link (to a bdev), NewStore wants a db_wal link (to a small + fast fs), etc.
> * we could create uuid types for each secondary device type. A raw block dev would work just like ceph-disk activate-journal. A new uuid would be for secondary fs's, which would mount and then trigger ceph-disk activate-slave-fs DEV or similar.
> * ceph-disk activate[-*] can ensure that all symlinks in the data dir resolve to real things (all devices or secondary fs's are mounted) before starting ceph-osd.
>
> I will make the changes once we agree on the requirements and implementation specifics. Please correct me if I have understood anything wrong.

I think the trick here is to figure out how to describe these requirements. I think it ought to be some structured thing ceph-osd can spit out for a given backend that says what it needs.
For example, for filestore,

  {
    "data": {
      "type": "fs",
      "min_size": 10485760,
      "max_size": 1000000000000000,   # whatever
      "preferred_size": 100000000000000000,
      "required": true
    },
    "journal": {
      "type": "block",
      "min_size": 10485760,
      "max_size": 104857600,
      "preferred_size": 40960000,
      "required": false,
      "preferred": true
    }
  }

Then ceph-disk can be fed the devices to use based on those names, e.g.,

  ceph-disk prepare objectstore=filestore data=/dev/sda journal=/dev/sdb

Or for your KV backend,

  {
    "data": {
      "type": "fs",
      "min_size": 10485760,
      "max_size": 10485760,
      "preferred_size": 10485760,
      "required": true
    },
    "kvdata": {
      "type": "block",
      "min_size": 10485760,
      "max_size": 1000000000000000,   # whatever
      "preferred_size": 100000000000000000,
      "required": true
    },
    "journal": {
      "type": "block",
      "min_size": 10485760,
      "max_size": 104857600,
      "preferred_size": 40960000,
      "required": false,
      "preferred": false
    }
  }

  ceph-disk prepare objectstore=keyvaluestore data=/dev/sda kvdata=/dev/sda journal=/dev/sdb

The ceph-disk logic would create partitions on the given devices as needed, trying for the preferred size but doing what it needs to in order to make it fit. If something is required/preferred but not specified (e.g., filestore's journal), it'll use the same device as the other stuff, so that the filestore case could simplify to

  ceph-disk prepare objectstore=filestore data=/dev/sda

or whatever.

Would something like this be general enough to capture the possibilities and still do everything we need it to?

sage
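
To make the sizing and fallback behaviour above concrete, here is a minimal sketch of how ceph-disk could turn such a descriptor plus the user-supplied devices into a partition plan. This is only an illustration of the idea: the function name, the exact clamping rule, and the "fall back to the data device" behaviour are assumptions, not existing ceph-disk code.

# Illustrative only: pick a size for each component named in the
# backend's descriptor, preferring "preferred_size" but clamping it
# into [min_size, max_size], and letting a preferred-but-unspecified
# component (e.g. filestore's journal) share the data device.

def plan_devices(descriptor, user_devices):
    # descriptor: dict of the form shown above, as emitted by ceph-osd.
    # user_devices: e.g. {'data': '/dev/sda', 'journal': '/dev/sdb'}.
    # Returns a list of (component, device, size_in_bytes) to create.
    plan = []
    for name, spec in descriptor.items():
        dev = user_devices.get(name)
        if dev is None:
            if spec['required']:
                raise ValueError('%s is required but no device was given' % name)
            if not spec.get('preferred', False):
                continue                      # optional and not preferred: skip it
            dev = user_devices['data']        # share the data device
        size = min(max(spec['preferred_size'], spec['min_size']), spec['max_size'])
        plan.append((name, dev, size))
    return plan

With the filestore descriptor above and only data=/dev/sda supplied, the journal entry would land on /dev/sda as well, which matches the simplified "ceph-disk prepare objectstore=filestore data=/dev/sda" invocation.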
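
Similarly, a rough sketch of the last quoted bullet (ceph-disk activate making sure every symlink in the data dir resolves before ceph-osd starts); again an assumed shape, not what ceph-disk does today:

import os

# Illustrative only: before starting ceph-osd, fail if any symlink in
# the osd_data dir (journal, db_wal, secondary fs mount points, ...)
# does not resolve to a real device or mounted file system.

def check_data_dir_links(osd_data):
    dangling = []
    for name in os.listdir(osd_data):
        path = os.path.join(osd_data, name)
        # os.path.exists() follows symlinks, so a dangling link is False
        if os.path.islink(path) and not os.path.exists(path):
            dangling.append(name)
    if dangling:
        raise RuntimeError('unresolved symlinks in %s: %s'
                           % (osd_data, ', '.join(sorted(dangling))))

A secondary fs would still need to be mounted first (the activate-slave-fs step above); this check only guards the final ceph-osd start.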