Re: Slow Write Issues

I see...

This is the info from one of my spinning disks:

    "bluefs": {
        "gift_bytes": 0,
        "reclaim_bytes": 0,
        "db_total_bytes": 30169620480,
        "db_used_bytes": 2517630976,
        "wal_total_bytes": 1073737728,
        "wal_used_bytes": 524288000,
        "slow_total_bytes": 400033841152,
        "slow_used_bytes": 3996123136,
        "num_files": 583,
        "log_bytes": 7798784,
        "log_compactions": 39,
        "logged_bytes": 786444288,
        "files_written_wal": 2,
        "files_written_sst": 48742,
        "bytes_written_wal": 2410376267722,
        "bytes_written_sst": 2620043565235
    },

On my solid-state disks, I don't have any slow_ usage:

    "bluefs": {
        "gift_bytes": 0,
        "reclaim_bytes": 0,
        "db_total_bytes": 153631064064,
        "db_used_bytes": 6822035456,
        "wal_total_bytes": 0,
        "wal_used_bytes": 0,
        "slow_total_bytes": 0,
        "slow_used_bytes": 0,
        "num_files": 250,
        "log_bytes": 16420864,
        "log_compactions": 511,
        "logged_bytes": 9285406720,
        "files_written_wal": 2,
        "files_written_sst": 79316,
        "bytes_written_wal": 4393750671932,
        "bytes_written_sst": 4626359292945
    },


In my understanding, the solid-state disks manage the DB automatically, so about 150GB of the 3.84TB total disk was reserved for it.

My spinning disk is showing about 400GB of slow_total_bytes, while I have dedicated only 33GB of NVMe for each raw disk.

Well, I believe I have to think about increasing my db partitions.
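Before resizing anything I will probably check every OSD on the node the same way. Below is a rough, untested sketch of what I have in mind (the script itself is mine; it just wraps the "ceph daemon osd.X perf dump" command Darren mentioned and takes the OSD ids as arguments). Any OSD reporting a non-zero slow_used_bytes is spilling over:

    #!/usr/bin/env python3
    # Rough sketch: print BlueFS DB usage and spillover for the OSD ids given
    # on the command line, using "ceph daemon osd.X perf dump" on this node.
    import json
    import subprocess
    import sys

    GiB = 1024 ** 3

    for osd_id in sys.argv[1:]:
        # Same command Darren suggested, run per OSD via the admin socket.
        out = subprocess.check_output(
            ["ceph", "daemon", f"osd.{osd_id}", "perf", "dump"])
        bluefs = json.loads(out)["bluefs"]

        db_used = bluefs["db_used_bytes"] / GiB
        db_total = bluefs["db_total_bytes"] / GiB
        slow_used = bluefs["slow_used_bytes"] / GiB

        status = "SPILLING over to HDD" if slow_used > 0 else "ok"
        print(f"osd.{osd_id}: db {db_used:.1f}/{db_total:.1f} GiB, "
              f"slow used {slow_used:.1f} GiB -> {status}")

    # Example: sudo python3 check_spillover.py 0 1 2 3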

Thank you for your feedback, Darren!

Joao Victor R Soares


Darren Soothill wrote:
> Hi Joao,
> 
> You can see how much RocksDB space has been used with this command “ceph daemon osd.X perf
> dump”, where X is an OSD id on the node you are running the command on.
> 
> You are looking for this section in the output :-
>     "bluefs": {
>         "gift_bytes": 0,
>         "reclaim_bytes": 0,
>         "db_total_bytes": 23966253056,
>         "db_used_bytes": 1714421760,
>         "wal_total_bytes": 0,
>         "wal_used_bytes": 0,
>         "slow_total_bytes": 0,
>         "slow_used_bytes": 0,
>         "num_files": 24,
>         "log_bytes": 552120320,
>         "log_compactions": 0,
>         "logged_bytes": 537051136,
>         "files_written_wal": 1,
>         "files_written_sst": 11,
>         "bytes_written_wal": 429315193,
>         "bytes_written_sst": 601384180,
>         "bytes_written_slow": 0,
>         "max_bytes_wal": 0,
>         "max_bytes_db": 1714421760,
>         "max_bytes_slow": 0
>     },
> 
> If you have numbers in the slow_ entries then your RocksDB is spilling over onto the
> HDD.
> 
> As to whether moving RocksDB and the WAL onto HDD can cause a performance degradation, it depends
> on how busy your disks are. If your HDDs are working hard and you are now going to throw a
> lot more workload onto them then performance will degrade, possibly substantially. I have
> seen performance impacts of up to 75% when things have started spilling over from NVMe to
> HDD.
> By that I mean I had a lovely flat line ingesting objects, and that line suddenly dropped
> by 75% once the RocksDB had filled up and spilt over onto the HDD.
> 
> 
> 
> 
> From: João Victor Rodrigues Soares <jvrs2683(a)gmail.com>
> Date: Wednesday, 25 September 2019 at 14:37
> To: "ceph-users(a)ceph.io" <ceph-users(a)ceph.io>
> Subject:  Slow Write Issues
> 
> Hello,
> 
> In my company, we currently have the following infrastructure:
> 
> - Ceph Luminous
> - OpenStack Pike.
> 
> We have a cluster of 3 osd nodes with the following configuration:
> 
> - 1 x Xeon (R) D-2146NT CPU @ 2.30GHz
> - 128GB RAM
> - 128GB ROOT DISK
> - 12 x 10TB SATA ST10000NM0146 (OSD)
> - 1 x Intel Optane P4800X SSD DC 375GB (block.DB / block.wal)
> - Ubuntu 16.04
> - 2 x 10Gb network interfaces configured with lacp
> 
> 
> The compute nodes have
> - 4 x 10Gb network interfaces with lacp.
> 
> We also have 4 monitors with:
> - 4 x 10Gb lacp network interfaces.
> - The monitor nodes show approx. 90% CPU idle time, with 32GB / 256GB RAM available
> 
> For each OSD disk we have created a partition of 33GB to block.db and block.wal.
> 
> We have recently been facing a number of performance issues. Virtual machines created in
> OpenStack are experiencing slow writes (approx. 50MB/s).
> 
> Monitoring of the OSD nodes shows an average of 20% CPU iowait time and 70% CPU idle time.
> Memory consumption is around 30%.
> We have no latency issues (9ms average).
> 
> My question is whether what is happening may have to do with the amount of disk dedicated to the DB
> / WAL. The Ceph documentation says it is recommended that the block.db size not be
> smaller than 4% of the block device.
> 
> In this case, for each disk in my environment, block.db could not be less than 400GB per
> OSD.
> 
> Another question is whether, if I set my disks to use block.db / block.wal on the mechanical disks
> themselves, that could lead to a performance degradation.
> 
> Regards,
> João Victor Rodrigues Soares
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



