Re: Ceph for "home lab" / hobbyist use?

I run Ceph on both a home server and a personal offsite backup server (both single-host setups). It's definitely feasible and comes with a lot of advantages over traditional RAID, ZFS, and the like. The main disadvantages are performance overhead and resource consumption.

On 07/09/2019 06.16, William Ferrell wrote:
> They're about $50 each, can boot from MicroSD or eMMC flash (basically
> an SSD with a custom connector), and have one SATA port. They have
> 8-core 32-bit CPUs, 2GB of RAM and a gigabit ethernet port. Four of
> them (including disks) can run off a single 12V/8A power adapter
> (basically 100 watts per set of 4). The obvious appeal is price, plus
> they're stackable so they'd be easy to hide away in a closet.
>
> Is it feasible for these to work as OSDs at all? The Ceph hardware
> recommendations page suggests OSDs need 1GB per TB of space, so does
> this mean these wouldn't be suitable with, say, a 4TB or 8TB disk? Or
> would they work, but just more slowly?

2GB seems tight, but *should* work if you're literally running an OSD and only an OSD on the thing.

I use a 32GB RAM server to run 64TB worth of raw storage (8x8TB SATA disks), plus mon/mds, plus a bunch of unrelated applications and servers, routing, and even a desktop UI (I'll soon be splitting off some of these duties, since this box has grown into way too much of a kitchen sink). It used to be 16GB when I first moved to Ceph, and that juuust about worked but it was tight. So the 1GB/1TB recommendation is ample, 1GB/2TB works well, and 1GB/4TB is tight.

I configure my OSDs for a 1.7GB memory target, so that should work on your 2GB RAM board, but it doesn't give you much headroom for increased consumption during recovery. To be safe I'd set them up for 1.2GB or so target on a board like yours.
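
For reference, the knob in question is osd_memory_target, which takes a value in bytes. A minimal sketch, assuming a release with the centralized config database (Mimic or later); the exact number here is just an example:

    # target ~1.2 GiB of memory per OSD (value is in bytes)
    ceph config set osd osd_memory_target 1288490188

    # or equivalently in ceph.conf on the OSD host:
    [osd]
    osd memory target = 1288490188

Keep in mind this is a best-effort target for the cache autotuner, not a hard cap, so the process can still exceed it under recovery load. That is exactly why the headroom matters.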

I would recommend keeping your PG count low and relying on the Ceph balancer to even out your disk usage. Keep in mind that the true number of PGs is your PG count multiplied by your pool width: that's your replica count for replicated pools, or the erasure code k+m width for EC pools. I use 8 PGs for my metadata pool (x3 replication = 24 PGs, 3 per OSD) and 64 PGs for my data pool (x7 for my k=5, m=2 RS profile = 448 PGs, 56 per OSD). All my data is on CephFS on my home setup.

Since your memory usage is tight, if you use EC you might want to drop that a bit, e.g. target something like 24 PGs per OSD. For utilization balance, as long as your overall PG count (multiplied by pool width) is a multiple of your OSD count, you should be able to achieve a "perfect" distribution with the balancer (otherwise you'll be off by +/- 1 PG per OSD, and more PGs make that a smaller relative imbalance).
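
To make that concrete, here's a rough sketch of what the pool setup could look like on a hypothetical 4-OSD / 4-host cluster like the one you're describing. The pool names and the k=2, m=2 profile are placeholders rather than a recommendation; adjust k/m to your actual host count:

    # EC profile spread across hosts (needs at least k+m hosts)
    ceph osd erasure-code-profile set ec22 k=2 m=2 crush-failure-domain=host

    # 16 PGs x (k+m=4) = 64 PG copies / 4 OSDs = 16 per OSD
    ceph osd pool create cephfs_data 16 16 erasure ec22
    ceph osd pool set cephfs_data allow_ec_overwrites true

    # 8 PGs x 3 replicas = 24 PG copies / 4 OSDs = 6 per OSD
    ceph osd pool create cephfs_metadata 8 8 replicated

    # let the balancer even out whatever imbalance remains
    ceph balancer mode upmap
    ceph balancer on

That lands at around 22 PGs per OSD, and both totals divide evenly by the OSD count, so the balancer has a shot at a perfectly even distribution.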

> Pushing my luck further (assuming the HC2 can handle OSD duties at
> all), is that enough muscle to run the monitor and/or metadata
> servers? Should monitors and MDS's be run separately, or can/should
> they piggyback on hosts running OSDs?

The MDS will need RAM to cache your metadata (and you want it to, because it makes performance *way* better). I would definitely keep it well away from such tight OSDs. In fact the MDS in my setup uses more RAM than any given OSD, about 2GB or so (the cache size is also configurable). If you have fewer, larger files, resource consumption will be lower (I have a mixed workload with most of the space taken up by large files, but also a few trees full of tiny ones, totaling several million files). You might get away with dedicating a separate identical board to the MDS. Multi-MDS is worth considering, but I've found that during recovery it adds some phases that eat up even more RAM, so it might not be a good idea here.
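
If you do end up squeezing the MDS onto a small board, the cache size is the main lever. A sketch, assuming the memory-based mds_cache_memory_limit knob (Luminous and later); note that actual MDS resident memory tends to run noticeably higher than the cache limit itself:

    # cap the MDS metadata cache at ~1 GiB (value is in bytes)
    ceph config set mds mds_cache_memory_limit 1073741824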

The mon isn't as bad, but I wouldn't push it and try to co-host it on such limited hosts. Mine's sitting at around 1GB resident.

In either case, keep a larger x86 box around that has the software installed and that you can plug some disks into. In the worst case, if you end up in an out-of-memory loop, you can move e.g. the MDS or some OSDs to that machine and bring the cluster back to health.

> I'd be perfectly happy with a setup like this even if it could only
> achieve speeds in the 20-30MB/sec range.

I can't speak for small ARM boards, but my performance (on a quad-core Haswell i5 hosting everything) is somewhere around half of what I'd get with RAID6 over the same storage, using a roughly equivalent k=5, m=2 RS encoding and dm-crypt (AES-NI) under the OSDs. Since you'd be running a single OSD per host, I imagine you should be able to get reasonable aggregate performance out of the whole thing, but I've never tried a setup like that.
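
If you want to put rough numbers on a setup like that before trusting it with real data, rados bench against a scratch pool is a quick (if crude) way to measure raw throughput. The pool name "bench" here is just a placeholder:

    # 30-second write test, keeping the objects around for the read test
    rados bench -p bench 30 write --no-cleanup

    # sequential read test over the objects written above, then clean up
    rados bench -p bench 30 seq
    rados -p bench cleanup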

I'm actually considering this kind of thing in the future (moving from one monolithic server to a more cluster-like setup) but it's just an idea for now.

--
Hector Martin (hector@xxxxxxxxxxxxxx)
Public Key: https://mrcn.st/pub
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


