Hi Christian,

----- Original Message -----

> From: "Christian Balzer" <chibi@xxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Cc: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
> Sent: Tuesday, 16 October, 2018 08:51:36
> Subject: Re: Luminous with osd flapping, slow requests when deep scrubbing

> Hello,
>
> On Mon, 15 Oct 2018 12:26:50 +0100 (BST) Andrei Mikhailovsky wrote:
>
>> Hello,
>>
>> I am currently running Luminous 12.2.8 on Ubuntu with the 4.15.0-36-generic
>> kernel from the official Ubuntu repo. The cluster has 4 mon + osd servers.
>> Each osd server has a total of 9 spinning osds and 1 ssd for the hdd and
>> ssd pools. The hdds are backed by S3710 ssds for journaling at a ratio of
>> 1:5. The ssd pool osds are not using external journals. Ceph is used as the
>> primary storage for Cloudstack - all vm disk images are stored on the
>> cluster.
>>
>
> For the record, are you seeing the flapping only on HDD pools or with SSD
> pools as well?

I believe so - this tends to happen on the HDD pool.

> When migrating to Bluestore, did you see this starting to happen before
> the migration was complete (and just on Bluestore OSDs of course)?

Nope, not that I can recall. I did have some performance issues initially, but I added a few temporary disks to the servers to help with free space. The cluster was very unhappy when usage spiked above 90% on some of the osds. Once the temporary disks were in place, the cluster was happy again.

> What's your HW like, in particular RAM? Current output of "free"?

Each of the mon/osd servers has 64GB of RAM. Here is the current memory usage of one of the servers (it was restarted 30 mins ago):

root@arh-ibstorage4-ib:/home/andrei# free -h
              total        used        free      shared  buff/cache   available
Mem:            62G         11G         50G         10M        575M         49G
Swap:           45G          0B         45G

The servers with 24 hours of uptime show a similar picture, just with a slightly larger used amount.

>
> If you didn't tune your bluestore cache you're likely just using a
> fraction of the RAM for caching, making things a LOT harder for OSDs to
> keep up when compared to filestore and the global (per node) page cache.
>

I haven't made any bluestore cache changes at all after moving to bluestore. Could you please point me in the right direction? (I have sketched what I am considering further down.)

> See the various bluestore cache threads here, one quite recently.
>
> If your cluster was close to the brink with filestore just moving it to
> bluestore would nicely fit into what you're seeing, especially for the
> high stress and cache bypassing bluestore deep scrubbing.
>

I have put the following config settings in place in the [global] section:

# Settings to try to minimise client IO impact / slow requests / osd flapping
# from scrubbing and snap trimming
osd_scrub_chunk_min = 1
osd_scrub_chunk_max = 5
#osd_scrub_begin_hour = 21
#osd_scrub_end_hour = 5
osd_scrub_sleep = 0.1
osd_scrub_max_interval = 1209600
osd_scrub_min_interval = 259200
osd_deep_scrub_interval = 1209600
osd_deep_scrub_stride = 1048576
osd_scrub_priority = 1
osd_snap_trim_priority = 1

Following the restart of the servers, and after a few tests in which I manually invoked 6 deep scrub processes (see the commands below), I haven't seen any more osd flapping or slow requests. I will keep an eye on it over the next few weeks to see if the issue is resolved.
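For reference, this is the bluestore cache change I am considering, based on the threads you mentioned. It is only a sketch: the 3GB figure is my own guess and still needs checking against the actual memory headroom on the nodes. On Luminous the per-OSD cache is controlled by bluestore_cache_size_hdd (default 1GB) and bluestore_cache_size_ssd (default 3GB):

[osd]
# Tentative: raise the per-OSD cache from the 1GB HDD default to 3GB.
# With 10 osds per node that is ~30GB of cache on a 64GB box, which
# should still leave room for the per-OSD overhead on top of the cache.
bluestore_cache_size_hdd = 3221225472
bluestore_cache_size_ssd = 3221225472

The osds need a restart to pick this up; afterwards the effective value can be confirmed via the admin socket, e.g. "ceph daemon osd.0 config show | grep bluestore_cache_size".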
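And this is roughly how I ran the manual deep scrub test; the osd ids below are just the ones I happened to pick, any set will do:

# Kick off deep scrubs on six osds by hand (scrubs the PGs they are
# primary for):
for i in 0 5 11 17 23 29; do ceph osd deep-scrub $i; done

# Count how many PGs are currently deep scrubbing:
ceph pg dump pgs_brief 2>/dev/null | grep -c 'scrubbing+deep'

# Watch the cluster log while the scrubs run; the telltale entries for
# flapping are osds being reported down or "wrongly marked me down":
ceph -w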
> Regards,
>
> Christian
>
>> I have recently migrated all osds to bluestore, which was a long process
>> with ups and downs, but I am happy to say that the migration is done.
>> During the migration I disabled scrubbing (both deep and standard). After
>> re-enabling scrubbing I noticed that the cluster started having a large
>> number of slow requests and poor client IO (to the point of vms stalling
>> for minutes). Further investigation showed that the slow requests happen
>> because of the osds flapping. In a single day my logs have over 1000
>> entries reporting an osd going down. This affects random osds. Disabling
>> deep scrubbing stabilises the cluster: the osds no longer flap and the
>> slow requests disappear. As a short term solution I have disabled deep
>> scrubbing, but I was hoping to fix the issue with your help.
>>
>> At the moment, I am running the cluster with default settings apart from
>> the following:
>>
>> [global]
>> osd_disk_thread_ioprio_priority = 7
>> osd_disk_thread_ioprio_class = idle
>> osd_recovery_op_priority = 1
>>
>> [osd]
>> debug_ms = 0
>> debug_auth = 0
>> debug_osd = 0
>> debug_bluestore = 0
>> debug_bluefs = 0
>> debug_bdev = 0
>> debug_rocksdb = 0
>>
>> Could you share your experiences with deep scrubbing of bluestore osds?
>> Are there any options I should set to make sure the osds do not flap and
>> client IO remains available?
>>
>> Thanks
>>
>> Andrei
>
>
> --
> Christian Balzer                Network/Systems Engineer
> chibi@xxxxxxx                   Rakuten Communications

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com