Hi Michael,
On 08.06.21 11:38, Ml Ml wrote:
> Now i was asked if i could also build a cheap 200-500TB Cluster
> Storage, which should also scale. Just for Data Storage such as
> NextCloud/OwnCloud.
With similar requirements (server primarily for Samba and NextCloud,
some RBD use, very limited budget) I am using HDD for data and SSD for
system and CephFS metadata.
Note that I am running NextCloud on CephFS storage. If you want to go
with RGW/S3 as a storage backend instead, the following might not apply
to your use case.
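In case it is useful: keeping the CephFS metadata pool on the SSD OSDs
is typically done with a device-class CRUSH rule. Roughly like this
(rule and pool names are just examples, adjust to your setup):

  # replicated rule that only picks OSDs with device class "ssd"
  ceph osd crush rule create-replicated replicated-ssd default host ssd
  # pin the metadata pool to that rule
  ceph osd pool set cephfs_metadata crush_rule replicated-ssd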
My nodes (bought end of 2020) are:
- 2U chassis with 12 3.5" SATA slots
- Intel Xeon Silver 4208
- 128 GB RAM
- 2 x 480 GB Samsung PM883 SSD
-> 50 GB in MD-RAID1 for system
-> 430 GB OSD (one per SSD)
- initially 6 x 14 TB Enterprise HDD
- 4 x 10 GBase-T (active/passive bonded, dedicated backend network)
Each node with this configuration cost about 4k EUR net at the end of
2020; due to rising storage prices it will be a bit more expensive now.
I am running five nodes by now and have added a few more disks (ranging
from 8 to 14 TB), nearly filling up the nodes.
My experience so far:
- I had to throttle scrubbing (see below for details)
- For pure NextCloud and Samba use, performance is sufficient for a few
hundred concurrent users, including a handful of power users
- Migrating the mail server to this cluster was a disaster due to
limited IOPS; I had to add some more SSDs and place the mail server in
an SSD-only pool (see the sketch after this list)
- The MDS needs a lot of memory for larger CephFS installs; I will
probably move it to a dedicated server next year. 128 GB per node works,
but I would not recommend any less (the cache knob is shown after this
list)
- Rebalancing takes an eternity (2-3 weeks), so make sure that your PG
nums are okay from the start
- I have all but given up on snapshots with CephFS due to severe
performance degradation with the kernel client during backups
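The SSD-only pool works the same way via the device class. Reusing the
SSD rule from above, it looks something like this (pool name and PG
count are placeholders):

  ceph osd pool create rbd-ssd 64 64 replicated replicated-ssd
  rbd pool init rbd-ssd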
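As for MDS memory: the knob to watch is mds_cache_memory_limit. I will
not quote an exact value as a recommendation, but setting it looks like
this (16 GiB here only as an example):

  ceph config set mds mds_cache_memory_limit 17179869184   # 16 GiB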
My scrubbing config looks like this:
osd_backfill_scan_max 16
osd_backfill_scan_min 4
osd_deep_scrub_interval 2592000.000000
osd_deep_scrub_randomize_ratio 0.030000
osd_recovery_max_active_hdd 1
osd_recovery_max_active_ssd 5
osd_recovery_sleep_hdd 0.050000
osd_scrub_begin_hour 18
osd_scrub_end_hour 7
osd_scrub_chunk_max 1
osd_scrub_chunk_min 1
osd_scrub_max_interval 2419200.000000
osd_scrub_min_interval 172800.000000
osd_scrub_sleep 0.100000
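For reference, on recent releases these can be set at runtime via the
config database, e.g.:

  ceph config set osd osd_scrub_sleep 0.1
  ceph config set osd osd_deep_scrub_interval 2592000
  # verify what an OSD actually uses
  ceph config get osd.0 osd_scrub_sleep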
My data is in a replicated pool with n=3 and no compression. You might
also consider EC, in which case you will want to aim for more nodes; a
rough example follows below.
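An EC setup could start out something like this (k/m, PG count and pool
name are placeholders; you need at least k+m hosts, ideally a couple
more):

  ceph osd erasure-code-profile set ec-4-2 k=4 m=2 crush-failure-domain=host
  ceph osd pool create cephfs_data_ec 128 128 erasure ec-4-2
  # required if CephFS or RBD data goes onto the EC pool
  ceph osd pool set cephfs_data_ec allow_ec_overwrites true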
Cheers
Sebastian