Re: Slow Write Issues


 



Hi João,

 

You can see how much RocksDB space has been used with the command “ceph daemon osd.X perf dump”, where X is the id of an OSD on the node you are running the command on.

 

You are looking for this section in the output:

    "bluefs": {

        "gift_bytes": 0,

        "reclaim_bytes": 0,

        "db_total_bytes": 23966253056,

        "db_used_bytes": 1714421760,

        "wal_total_bytes": 0,

        "wal_used_bytes": 0,

        "slow_total_bytes": 0,

        "slow_used_bytes": 0,

        "num_files": 24,

        "log_bytes": 552120320,

        "log_compactions": 0,

        "logged_bytes": 537051136,

        "files_written_wal": 1,

        "files_written_sst": 11,

        "bytes_written_wal": 429315193,

        "bytes_written_sst": 601384180,

        "bytes_written_slow": 0,

        "max_bytes_wal": 0,

        "max_bytes_db": 1714421760,

        "max_bytes_slow": 0

    },

 

If you have non-zero numbers in the slow_* entries, then your RocksDB is spilling over onto the HDD.
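
For a quick check across all the OSDs on a node, something along these lines works (a rough sketch, assuming the admin sockets are in the default /var/run/ceph location and that jq is installed):

    # Print the DB usage and spillover counters for every OSD admin socket on this node.
    # Assumes the default socket paths /var/run/ceph/ceph-osd.*.asok and that jq is available.
    for sock in /var/run/ceph/ceph-osd.*.asok; do
        echo "== ${sock} =="
        ceph daemon "${sock}" perf dump | \
            jq '.bluefs | {db_total_bytes, db_used_bytes, slow_total_bytes, slow_used_bytes}'
    done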

 

As to whether moving RocksDB and the WAL onto the HDDs can cause a performance degradation, that depends on how busy your disks are. If your HDDs are already working hard and you are now going to throw a lot more workload onto them, then performance will degrade, possibly substantially. I have seen performance impacts of up to 75% when things have started spilling over from NVMe to HDD.

By that I mean I had a lovely flat line ingesting objects, and that line suddenly dropped by 75% once the RocksDB had filled up and spilt over onto the HDD.

 

 

 

 

From: João Victor Rodrigues Soares <jvrs2683@xxxxxxxxx>
Date: Wednesday, 25 September 2019 at 14:37
To: "ceph-users@xxxxxxx" <ceph-users@xxxxxxx>
Subject: [ceph-users] Slow Write Issues

 

Hello,

In my company, we currently have the following infrastructure:

- Ceph Luminous
- OpenStack Pike.

We have a cluster of 3 osd nodes with the following configuration:

- 1 x Xeon (R) D-2146NT CPU @ 2.30GHz
- 128GB RAM
- 128GB ROOT DISK
- 12 x 10TB SATA ST10000NM0146 (OSD)
- 1 x Intel Optane P4800X SSD DC 375GB (block.DB / block.wal)
- Ubuntu 16.04
- 2 x 10Gb network interfaces configured with LACP


The compute nodes have
- 4 x 10Gb network interfaces with LACP.

We also have 4 monitors with:
- 4 x 10Gb LACP network interfaces.
- The monitor nodes show approx. 90% CPU idle time with 32GB / 256GB of RAM available.

For each OSD disk we have created a 33GB partition for block.db and block.wal.

We are currently facing a number of performance issues. Virtual machines created in OpenStack are experiencing slow write speeds (approx. 50MB/s).

Monitoring of the OSD nodes shows an average of 20% CPU iowait time and 70% CPU idle time.
Memory consumption is around 30%.
We have no latency issues (9ms average).

My question is whether what is happening may have to do with the amount of disk dedicated to DB/WAL. The Ceph documentation recommends that block.db should not be smaller than 4% of the block device size.

In this case, for each disk in my environment, block.db should not be less than 400GB per OSD.
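
As a rough check of that arithmetic (assuming the 4% guideline is applied to the raw 10TB device size, treating 1TB as 1000GB):

    # 4% of a 10TB OSD:
    echo $(( 10 * 1000 * 4 / 100 ))   # => 400, i.e. roughly 400GB of block.db per OSD vs. the 33GB partitions in use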

Another question is whether, if I configure my disks to keep block.db/block.wal on the mechanical disks themselves, that could lead to a performance degradation.

 

Regards,

João Victor Rodrigues Soares

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
