Re: SSD Sizing for DB/WAL: 4% for large drives?

We have a similar setup, but with 24 disks and 2x P4800X. The 375GB NVMe 
drives are _not_ large enough:


2019-05-29 07:00:00.000108 mon.bcf-03 [WRN] overall HEALTH_WARN BlueFS 
spillover detected on 22 OSD(s)

root@bcf-10:~# parted /dev/nvme0n1 print
Model: NVMe Device (nvme)
Disk /dev/nvme0n1: 375GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
  1      1049kB  31.1GB  31.1GB
  2      31.1GB  62.3GB  31.1GB
  3      62.3GB  93.4GB  31.1GB
  4      93.4GB  125GB   31.1GB
  5      125GB   156GB   31.1GB
  6      156GB   187GB   31.1GB
  7      187GB   218GB   31.1GB
  8      218GB   249GB   31.1GB
  9      249GB   280GB   31.1GB
10      280GB   311GB   31.1GB
11      311GB   343GB   31.1GB
12      343GB   375GB   32.6GB


The second NVMe has the same partition layout. The twelfth partition is 
actually large enough to hold all the data, but the other 11 partitions 
on this drive are a little bit too small. I'm still trying to work out 
the exact sweet spot...
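
A back-of-the-envelope check (my own sketch, not from the cluster): 
parted reports sizes in decimal GB, and the commonly cited RocksDB level 
sizing suggests a block.db of roughly 30 GiB to keep the next level on 
the fast device. Converting the partition sizes above:

    # Sketch: compare the parted partition sizes (decimal GB) with the
    # ~30 GiB block.db size commonly cited to avoid BlueFS spillover.
    GiB = 2**30
    partitions = [31.1e9] * 11 + [32.6e9]   # partitions 1-11 and partition 12
    for num, size_bytes in enumerate(partitions, start=1):
        size_gib = size_bytes / GiB
        verdict = "ok" if size_gib >= 30 else "too small"
        print(f"partition {num:2d}: {size_gib:5.2f} GiB  {verdict}")
    # Partitions 1-11 come out at ~28.96 GiB, just under 30 GiB; partition
    # 12 is ~30.36 GiB, which would explain why only the OSDs on the last
    # partition of each drive avoid the spillover warning.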


With 24 OSDs and only two of them having a just-large-enough DB 
partition, I end up with 22 OSDs that never fully use their DB partition 
yet still spill over onto the slow disk, exactly as reported by Ceph.

Details for one of the affected OSDs:

     "bluefs": {
         "gift_bytes": 0,
         "reclaim_bytes": 0,
         "db_total_bytes": 31138504704,
         "db_used_bytes": 2782912512,
         "wal_total_bytes": 0,
         "wal_used_bytes": 0,
         "slow_total_bytes": 320062095360,
         "slow_used_bytes": 5838471168,
         "num_files": 135,
         "log_bytes": 13295616,
         "log_compactions": 9,
         "logged_bytes": 338104320,
         "files_written_wal": 2,
         "files_written_sst": 5066,
         "bytes_written_wal": 375879721287,
         "bytes_written_sst": 227201938586,
         "bytes_written_slow": 65162240000,
         "max_bytes_wal": 0,
         "max_bytes_db": 5265940480,
         "max_bytes_slow": 7540310016
     },
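
For reference, converting those counters to GiB (a quick sketch; the 
JSON above looks like the bluefs section of `ceph daemon osd.<id> perf 
dump`):

    # Sketch: convert the bluefs counters above into GiB.
    bluefs = {
        "db_total_bytes": 31138504704,   # size of the fast block.db partition
        "db_used_bytes": 2782912512,
        "slow_used_bytes": 5838471168,   # data BlueFS placed on the slow disk
    }
    GiB = 2**30
    print("db partition : %5.2f GiB" % (bluefs["db_total_bytes"] / GiB))   # ~29.00
    print("db used      : %5.2f GiB" % (bluefs["db_used_bytes"] / GiB))    # ~ 2.59
    print("spilled slow : %5.2f GiB" % (bluefs["slow_used_bytes"] / GiB))  # ~ 5.44

So the DB partition is ~29 GiB and mostly empty, yet ~5.4 GiB still ends 
up on the slow device, presumably because a whole RocksDB level no longer 
fits on the fast partition.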

Maybe it's just a matter of shifting some megabytes. We are about to 
deploy more of these nodes, so I would be grateful if anyone could 
comment on the correct size of the DB partitions. Otherwise I'll have to 
use a RAID-0 across the two drives.


Regards,



Your block.db is 29 GiB; it should be 30 GiB to prevent spillover to the 
slow backend.
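
In parted's decimal units that works out to roughly (quick sketch, 
assuming the partition must hold a full 30 GiB):

    # Sketch: minimum partition size for a 30 GiB block.db, expressed in the
    # decimal GB that parted prints.
    required_bytes = 30 * 2**30     # 32212254720 bytes
    print(required_bytes / 1e9)     # ~32.21, so each db partition should show
                                    # as >= ~32.3 GB in parted; the 31.1 GB
                                    # partitions above fall short.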



k

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
