Re: OSD memory leak?

Hi Anthony and Mark,

thanks for your answers.

I have seen recommendations derived from test clusters with bluestore OSDs that read 16GB base line + 1GB per HDD OSD + 4GB per SSD OSD, probably from the times when bluestore memory consumption had a base line plus a stress-dependent component. I would consider this quite a lot already. I understand that for high-performance requirements one adds RAM etc. to speed things up.
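
To put that rule of thumb in numbers (just my own arithmetic from the figures above, which may differ by release): a host with 16 HDD OSDs and no SSDs would come out at 16GB + 16x1GB = 32GB of RAM, and adding 2 SSD OSDs on top pushes it to 40GB.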

For a mostly cold data store with a thin layer of warm/hot data, however, this is quite a lot compared with what a standard disk controller can do with a cheap CPU, 4GB of RAM and 16 drives connected. Essentially, ceph turns a server into a disk controller, and it should be possible to run a configuration that does not require much more per disk than an ordinary hardware controller while still delivering reasonable performance. I'm thinking along the lines of 25MB/s throughput and maybe 10 IOPS per NL-SAS HDD OSD on the user side (simple collocated deployment, EC pool). This ought to be possible with hardware requirements comparable to those of a RAID controller.

Good aggregate performance then comes from scale and from the fact that the layer of hot data is only a few GB per drive (a full re-write of just the hot data takes only a few minutes). I thought this was the idea of ceph. Instead of trying to accommodate high-performance wishes for ridiculously small ceph clusters (I do see these "I have 3 servers with 3 disks each, why is it so slow" kinds of complaints, which I would simply ignore), we should be talking about scale-out systems with thousands of OSDs. Something like 20 hosts serving 200 disks each would count as a small cluster. If the warm/hot data is only 1% or even less, such a system will be quite satisfying.

For low-cost scale-out we have ceph. For performance, we have technologies like Lustre (which by the way has much more moderate minimum hardware requirements).

For anything that requires higher performance one can then start using tiering, WAL/DB devices, SSD-only pools, lots of RAM, whatever. However, there should be a stable, well-tested and low-demanding base line config for a cold-store use case with hardware requirements similar to a NAS box per storage unit (one server + JBODs). I am starting to miss support for the latter. 2 or even 4GB of RAM and a core-GHz of CPU per HDD is really a lot compared with such systems.
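
To make concrete what I mean by a low-demanding base line, a sketch (the 2GB figure is just an example I would want to test, not a recommendation; osd_memory_target is the knob that drives the bluestore cache autotuning, value in bytes):

    [osd]
    # aim for ~2GB per OSD instead of the 4GB default (a target, not a hard limit)
    osd_memory_target = 2147483648

or the same at runtime with "ceph config set osd osd_memory_target 2147483648".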

Please don't take this as the start of a long discussion. It's just a wish from my side to have low-demanding configs available that scale easily and are easy to administer at an overall low cost.

I will look into memory profiling of some OSDs; it doesn't look like it would be a performance killer.
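
In case it is useful to others, the plan is to use the tcmalloc heap profiler built into the OSDs, roughly along these lines (osd.NNN being whichever OSD I end up picking):

    ceph tell osd.NNN heap start_profiler
    ... let it run under normal load for a while ...
    ceph tell osd.NNN heap stats
    ceph tell osd.NNN heap dump
    ceph tell osd.NNN heap stop_profiler

and then look at the dump files with pprof.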

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Anthony D'Atri <anthony.datri@xxxxxxxxx>
Sent: 14 July 2020 17:29
To: ceph-users@xxxxxxx
Subject:  Re: OSD memory leak?

>>  In the past, the minimum recommendation was 1GB RAM per HDD blue store OSD.

There was a rule of thumb of 1GB RAM *per TB* of HDD Filestore OSD, perhaps you were influenced by that?
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx