Re: Slow Write Issues

If you look at your db_used_bytes, it's only 2.5GB on the NVMe. Because of the way RocksDB works with levels, it can't store the next level on the NVMe, as the levels increase by a factor of 10.

You need to look at slow_used_bytes, which shows that you have used about 4GB of space on your HDD. The other number (slow_total_bytes) is the space it has currently reserved there.
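If you want to check that quickly across every OSD on a node, something along these lines should do it from the OSD host (a rough sketch, assuming jq is installed and the admin sockets are in the default /var/run/ceph location):

    for sock in /var/run/ceph/ceph-osd.*.asok; do
        # Pull just the BlueFS counters that matter for spillover
        echo -n "$sock: "
        ceph daemon "$sock" perf dump | jq -c '.bluefs | {db_used_bytes, slow_used_bytes}'
    done

Any OSD reporting a non-zero slow_used_bytes is spilling onto the HDD.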

Increasing the RocksDB and WAL space to around 70GB per disk would allow all of that to go on the NVMe.
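As for actually making the DB bigger, the db partition or LV has to be grown first and then BlueFS told about the new size; on older releases redeploying the OSD onto a larger partition may be the only practical route. Very roughly, and only as a sketch (N is a placeholder OSD id, and check that your release has ceph-bluestore-tool bluefs-bdev-expand before relying on it):

    # Stop the OSD before touching its BlueFS devices
    systemctl stop ceph-osd@N
    # ... grow the underlying block.db partition or LV here ...
    # Let BlueFS pick up the extra space, then restart
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-N
    systemctl start ceph-osd@N

Obviously test this on a single OSD first and let the cluster settle before doing the rest.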

Darren




On 26/09/2019, 15:39, "jvsoares@binario.cloud" <jvsoares@binario.cloud> wrote:

    I see...
    
    This is the info from one of my spinning disks:
    
        "bluefs": {
            "gift_bytes": 0,
            "reclaim_bytes": 0,
            "db_total_bytes": 30169620480,
            "db_used_bytes": 2517630976,
            "wal_total_bytes": 1073737728,
            "wal_used_bytes": 524288000,
            "slow_total_bytes": 400033841152,
            "slow_used_bytes": 3996123136,
            "num_files": 583,
            "log_bytes": 7798784,
            "log_compactions": 39,
            "logged_bytes": 786444288,
            "files_written_wal": 2,
            "files_written_sst": 48742,
            "bytes_written_wal": 2410376267722,
            "bytes_written_sst": 2620043565235
        },
    
    On my solid-state disks, I don't have any usage in the slow_ entries:
    
        "bluefs": {
            "gift_bytes": 0,
            "reclaim_bytes": 0,
            "db_total_bytes": 153631064064,
            "db_used_bytes": 6822035456,
            "wal_total_bytes": 0,
            "wal_used_bytes": 0,
            "slow_total_bytes": 0,
            "slow_used_bytes": 0,
            "num_files": 250,
            "log_bytes": 16420864,
            "log_compactions": 511,
            "logged_bytes": 9285406720,
            "files_written_wal": 2,
            "files_written_sst": 79316,
            "bytes_written_wal": 4393750671932,
            "bytes_written_sst": 4626359292945
        },
    
    
    In my understanding, the solid-state disks manage the DB automatically, so about 150GB of the 3.84TB total disk was reserved for it.
    
    My spinning disk is showing about 384GB of slow_total_bytes, while I have only dedicated 33GB of NVMe for each raw disk.
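    Back-of-the-envelope (assuming the default RocksDB level sizing of a 256MB base and a x10 multiplier per level, which is an assumption on my part rather than something I checked in the OSD config):

        base_mb=256; total_mb=0
        for level in 1 2 3 4; do
            size_mb=$(( base_mb * 10 ** (level - 1) ))   # roughly 256MB, 2.5GB, 25GB, 250GB
            total_mb=$(( total_mb + size_mb ))
            echo "level $level ~ $(( size_mb / 1024 ))GB, cumulative ~ $(( total_mb / 1024 ))GB"
        done

    So a 33GB db partition holds up to roughly the third level (~27GB cumulative), and the next level has nowhere to go but the spinning disk.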
    
    Well, I believe I have to think about increasing my db partitions.
    
    Thank you for your feedback, Darren!
    
    Joao Victor R Soares
    
    
    Darren Soothill wrote:
    > Hi Joao,
    > 
    > You can see how much RocksDB space has been used with the command “ceph daemon osd.X perf
    > dump”, where X is an OSD id on the node you are running the command on.
    > 
    > You are looking for this section in the output :-
    >     "bluefs": {
    >         "gift_bytes": 0,
    >         "reclaim_bytes": 0,
    >         "db_total_bytes": 23966253056,
    >         "db_used_bytes": 1714421760,
    >         "wal_total_bytes": 0,
    >         "wal_used_bytes": 0,
    >         "slow_total_bytes": 0,
    >         "slow_used_bytes": 0,
    >         "num_files": 24,
    >         "log_bytes": 552120320,
    >         "log_compactions": 0,
    >         "logged_bytes": 537051136,
    >         "files_written_wal": 1,
    >         "files_written_sst": 11,
    >         "bytes_written_wal": 429315193,
    >         "bytes_written_sst": 601384180,
    >         "bytes_written_slow": 0,
    >         "max_bytes_wal": 0,
    >         "max_bytes_db": 1714421760,
    >         "max_bytes_slow": 0
    >     },
    > 
    > If you have numbers in the slow_ entries then your RocksDB is spilling over onto the
    > HDD.
    > 
    > As to whether moving the RocksDB and WAL onto the HDD can cause performance degradation, it depends
    > on how busy your disks are. If your HDDs are working hard and you are now going to throw a
    > lot more workload onto them, then performance will degrade, possibly substantially. I have
    > seen performance impacts of up to 75% when things have started spilling over from NVMe to
    > HDD.
    > By that I mean I had a lovely flat line ingesting objects, and that line suddenly dropped
    > by 75% once the RocksDB had filled up and spilt over onto the HDD.
    > 
    > 
    > 
    > 
    > From: João Victor Rodrigues Soares <jvrs2683(a)gmail.com>
    > Date: Wednesday, 25 September 2019 at 14:37
    > To: "ceph-users(a)ceph.io" <ceph-users(a)ceph.io>
    > Subject:  Slow Write Issues
    > 
    > Hello,
    > 
    > In my company, we currently have the following infrastructure:
    > 
    > - Ceph Luminous
    > - OpenStack Pike.
    > 
    > We have a cluster of 3 osd nodes with the following configuration:
    > 
    > - 1 x Xeon (R) D-2146NT CPU @ 2.30GHz
    > - 128GB RAM
    > - 128GB ROOT DISK
    > - 12 x 10TB SATA ST10000NM0146 (OSD)
    > - 1 x Intel Optane P4800X SSD DC 375GB (block.DB / block.wal)
    > - Ubuntu 16.04
    > - 2 x 10Gb network interfaces configured with LACP
    > 
    > 
    > The compute nodes have
    > - 4 x 10Gb network interfaces with LACP.
    > 
    > We also have 4 monitors with:
    > - 4 x 10Gb LACP network interfaces.
    > - The monitor nodes show approx. 90% CPU idle time with 32GB / 256GB available RAM
    > 
    > For each OSD disk we have created a 33GB partition for block.db and block.wal.
    > 
    > We have recently been facing a number of performance issues. Virtual machines created in
    > OpenStack are experiencing slow writes (approx. 50MB/s).
    > 
    > Monitoring on the OSD nodes shows an average of 20% CPU iowait time and 70% CPU idle time.
    > Memory consumption is around 30%.
    > We have no latency issues (9ms average).
    > 
    > My question is whether what is happening may have to do with the amount of disk dedicated to
    > the DB/WAL. The Ceph documentation recommends that the block.db size be no
    > smaller than 4% of block.
    > 
    > In that case, for each disk in my environment, block.db could not be less than 400GB per
    > OSD.
    > 
    > Another question is whether configuring my disks to use block.db/block.wal on the mechanical disks
    > themselves could lead to performance degradation.
    > 
    > Att.
    > João Victor Rodrigues Soares

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



