Thanks for the information, Burkhard.

My current setup shows a bunch of these warnings (24 osds with spillover out of 36 which have wal/db on the ssd):

osd.36 spilled over 1.9 GiB metadata from 'db' device (7.2 GiB used of 30 GiB) to slow device
osd.37 spilled over 13 GiB metadata from 'db' device (4.2 GiB used of 30 GiB) to slow device
osd.44 spilled over 26 GiB metadata from 'db' device (13 GiB used of 30 GiB) to slow device
osd.45 spilled over 33 GiB metadata from 'db' device (10 GiB used of 30 GiB) to slow device
osd.46 spilled over 37 GiB metadata from 'db' device (8.8 GiB used of 30 GiB) to slow device

From the above, for example, osd.36 is a 3TB disk and osd.45 is a 10TB disk. I was hoping to address those spillovers with the upgrade too, if it means increasing the ssd space. Currently we've got a WAL of 1GB and a DB of 30GB. Am I right in understanding that in the case of osd.46 the DB size should be at least 67GB to stop the spillover (30 + 37)?

Cheers

Andrei

----- Original Message -----
> From: "Burkhard Linke" <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxx>
> Sent: Wednesday, 1 July, 2020 13:09:34
> Subject: Re: Advice on SSD choices for WAL/DB?

> Hi,
>
> On 7/1/20 1:57 PM, Andrei Mikhailovsky wrote:
>> Hello,
>>
>> We are planning to perform a small upgrade to our cluster and slowly start
>> adding 12TB SATA HDD drives. We need to accommodate for additional SSD WAL/DB
>> requirements as well. Currently we are considering the following:
>>
>> HDD Drives - Seagate EXOS 12TB
>> SSD Drives for WAL/DB - Intel D3 S4510 960GB or Intel D3 S4610 960GB
>>
>> Our cluster isn't hosting any IO intensive DBs nor IO hungry VMs such as
>> Exchange, MSSQL, etc.
>>
>> From the documentation that I've read the recommended size for DB is between 1%
>> and 4% of the size of the osd. Would 2% figure be sufficient enough (so around
>> 240GB DB size for each 12TB osd?)
>
>
> The documentation is wrong.
> RocksDB uses different levels to store data,
> and needs to store each level either completely in the DB partition or on
> the data partition. There have been a number of mail threads about the
> correct sizing.
>
>
> In your case the best size would be 30GB for the DB part + the WAL size
> (usually 2 GB). For compaction and other actions the ideal DB size needs
> to be doubled, so you end up with 62GB per OSD. Larger DB partitions are
> a waste of capacity, unless it can hold the next level (300GB per OSD).
>
>
> If you have spare capacity on the SSD (>100GB) you can either leave it
> untouched or create a small SSD based OSD for small pools that require a
> lower latency, e.g. a small extra fast pool for RBD or the RGW
> configuration pools.
>
>>
>> Also, from your experience, which is a better model for the SSD DB/WAL? Would
>> Intel S4510 be sufficient enough for our purpose or would the S4610 be a much
>> better choice? Are there any other cost effective performance to consider
>> instead of the above models?
>
> The SSD model should support fast sync writes, similar to the known
> requirements for filestore journal SSDs. If your selected model is a
> good fit according to the test methods, then it is probably also a good
> choice for bluestore DBs.
>
>
> Since not all data is written to the bluestore DB (no full data journal
> in contrast to filestore), the amount of data written to the SSD is
> probably lower. The DWPD requirements might be lower. To be on the safe
> side, use the better model (higher DWPD / "write intensive") if possible.
>
> Regards,
>
> Burkhard
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
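
For reference, the arithmetic discussed in this thread can be sketched in a few lines of Python. This is only an illustration, not a Ceph tool: the function names `parse_spillover` and `partition_size_gb` are made up here, the regular expression only assumes the warning format quoted above, and treating "used" plus "spilled" as the total metadata per OSD is my reading of the warning text rather than something confirmed in the thread. The sizing function encodes Burkhard's rule of thumb as I understand it (double the 30GB DB part for compaction headroom, then add the WAL).

```python
import re

# Matches spillover warnings in the form quoted in this thread (assumed format).
SPILL_RE = re.compile(
    r"(osd\.\d+) spilled over ([\d.]+) GiB metadata from 'db' device "
    r"\(([\d.]+) GiB used of ([\d.]+) GiB\)"
)


def parse_spillover(text):
    """Return (osd, spilled_gib, used_gib, db_size_gib) for each warning found."""
    return [
        (m.group(1), float(m.group(2)), float(m.group(3)), float(m.group(4)))
        for m in SPILL_RE.finditer(text)
    ]


def partition_size_gb(db_part_gb=30, wal_gb=2):
    """Rule of thumb from the thread: doubled DB part plus WAL (hypothetical helper)."""
    return 2 * db_part_gb + wal_gb


warnings = """
osd.36 spilled over 1.9 GiB metadata from 'db' device (7.2 GiB used of 30 GiB) to slow device
osd.46 spilled over 37 GiB metadata from 'db' device (8.8 GiB used of 30 GiB) to slow device
"""

for osd, spilled, used, size in parse_spillover(warnings):
    # Assumption: total metadata is what sits on the db device plus what spilled.
    total = used + spilled
    print(f"{osd}: {used} GiB on db + {spilled} GiB spilled = {total:.1f} GiB "
          f"(db partition is {size:g} GiB)")

print(f"suggested partition per OSD: {partition_size_gb()} GB")
```

Under that reading, the figure to compare against the partition size would be the used amount plus the spilled amount, not the partition size plus the spilled amount; whether the spillover actually stops also depends on the RocksDB level sizes Burkhard describes.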