Applicability and migration path

I'm looking for some high-level information about the usefulness of Ceph for a particular use case and, assuming it's considered a good choice, whether the migration path I have in mind has any particular gotchas that I should be on the lookout for.

The current situation is that I've inherited responsibility for a set of large-ish file servers, each with a filesystem between 80TB and 130TB (20 disks each, of varying sizes). It's probably not important, but for completeness: the volumes vary from server to server; some are ZFS pools, others are MD/LVM2 software RAIDs. The file servers use NFS to share data with a set of other servers that need access. In addition to the redundancy provided by the volumes themselves, data is also duplicated between multiple file servers.

I want to replace this setup with Ceph, or something like it, for several reasons:

- More efficient use of space: between the RAID redundancy and the cross-chassis copies, everything is stored at a minimum of 4x (worse for raidz2), and that could be reduced.
- More complete use of disks: datasets are large and not easily split, so a dataset can only move to a file server that has room for the whole thing, which leaves available capacity stranded.
- More predictable access: a single CephFS mount with a single copy of each dataset, rather than trying to figure out which copy of a dataset to use from which NFS mount.

...and the list goes on.
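On the space point in particular, here's roughly what I had in mind. This is only a sketch; the pool and profile names are placeholders, and I'm assuming a filesystem named "cephfs" and enough hosts to satisfy the EC profile:

    # 3x replication across chassis instead of the ~4x we effectively have today
    ceph osd pool set cephfs_data size 3

    # or, for the bulk data, an erasure-coded pool (4+2 here, i.e. 1.5x overhead);
    # with crush-failure-domain=host this needs at least k+m hosts
    ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
    ceph osd pool create cephfs_data_ec erasure ec42
    ceph osd pool set cephfs_data_ec allow_ec_overwrites true
    ceph fs add_data_pool cephfs cephfs_data_ec

    # point a directory's layout at the EC pool
    setfattr -n ceph.dir.layout.pool -v cephfs_data_ec /mnt/cephfs/datasets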

So that's what I'm working with; on to the questions.

First, in my tests and reading I haven't encountered anything that suggests I should expect problems from using a small number of large file servers in a cluster. But I recognize that this isn't the preferred configuration, and I'm wondering if I should be worried about any operational issues. Obviously I/O won't be as good as it could be, but I don't expect it to be worse than software RAID served over NFS. Is there anything I've missed? Eventually the plan is to swap out the large file servers for a larger number of smaller servers, but that will take years of regular hardware refresh cycles.
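For what it's worth, the main things I've been checking in my tests for the few-large-hosts case are that the CRUSH failure domain is the host, so replicas always land on different chassis, and that no single host is so full that it can't absorb recovery if another one dies. A sketch, with placeholder pool and rule names:

    # confirm the rule used by the pool separates replicas by host
    ceph osd pool get cephfs_data crush_rule
    ceph osd crush rule dump replicated_rule    # look for a chooseleaf step of type "host"

    # per-host and per-OSD fill levels
    ceph osd df tree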

Second, to migrate the current setup to Ceph I'd need to vacate one file server, convert it and create the first OSDs, move data onto the new Ceph filesystem to vacate the next file server, convert and add that one to the cluster, and so on like dominoes until they're all converted and joined. Is there any problem with this migration plan? The only thing I'm not clear on is whether Ceph will automatically handle cross-chassis replication and rebalance data as I convert file servers and add new OSDs. Anything else I need to watch out for?
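In case it helps clarify the plan, this is roughly how I pictured each conversion step going; a sketch only, assuming ceph-volume-managed OSDs and that I'd want to throttle the resulting data movement (device names are just examples):

    # quiesce data movement while the new chassis's disks are added
    ceph osd set norebalance

    # on the converted file server, create one OSD per disk
    ceph-volume lvm create --data /dev/sdb
    ceph-volume lvm create --data /dev/sdc
    # ...and so on for the remaining disks

    # let CRUSH rebalance onto the new host, gently
    ceph config set osd osd_max_backfills 1
    ceph osd unset norebalance

    # watch progress
    ceph -s
    ceph osd df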

Thanks for any comments. I've tried to sort out as much of this as I can in testing, but I can't build a lab at the scale of the eventual production deployment because of the capital cost involved, and I recognize that there's a chance there are nuances I won't spot until it's too late.
