Hi Christian,

----- Original Message -----

> From: "Christian Balzer" <chibi@xxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Cc: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
> Sent: Tuesday, 16 October, 2018 08:51:36
> Subject: Re: Luminous with osd flapping, slow requests when deep scrubbing

> Hello,
>
> On Mon, 15 Oct 2018 12:26:50 +0100 (BST) Andrei Mikhailovsky wrote:
>
>> Hello,
>>
>> I am currently running Luminous 12.2.8 on Ubuntu with the 4.15.0-36-generic
>> kernel from the official Ubuntu repo. The cluster has 4 mon + osd servers.
>> Each osd server has a total of 9 spinning osds and 1 ssd for the hdd and
>> ssd pools. The hdds are backed by S3710 ssds for journaling at a ratio of
>> 1:5. The ssd pool osds are not using external journals. Ceph is used as the
>> primary storage for Cloudstack - all vm disk images are stored on the
>> cluster.
>>
>
> For the record, are you seeing the flapping only on HDD pools or with SSD
> pools as well?

I believe so - this tends to happen on the HDD pool.

> When migrating to Bluestore, did you see this starting to happen before
> the migration was complete (and just on Bluestore OSDs of course)?

Nope, not that I can recall. I did have some performance issues initially, but I added a few temporary disks to the servers to help with free space. The cluster was very unhappy when usage spiked above 90% on some of the osds. Once the temporary disks were in place, the cluster was happy again.

> What's your HW like, in particular RAM? Current output of "free"?

Each of the mon/osd servers has 64GB of RAM. Here is the current memory usage of one of the servers (it was restarted 30 mins ago):

root@arh-ibstorage4-ib:/home/andrei# free -h
              total        used        free      shared  buff/cache   available
Mem:            62G         11G         50G         10M        575M         49G
Swap:           45G          0B         45G

The servers with 24 hours of uptime show a similar picture, just with a slightly larger used amount.

>
> If you didn't tune your bluestore cache you're likely just using a
> fraction of the RAM for caching, making things a LOT harder for OSDs to
> keep up when compared to filestore and the global (per node) page cache.
>

I haven't made any bluestore cache changes at all after moving to bluestore. Could you please point me in the right direction? (I have sketched what I am considering further down.)

> See the various bluestore cache threads here, one quite recently.
>
> If your cluster was close to the brink with filestore just moving it to
> bluestore would nicely fit into what you're seeing, especially for the
> high stress and cache bypassing bluestore deep scrubbing.
>

I have put the following config settings in place in the [global] section:

# Settings to try to minimise client IO impact / slow requests / osd flapping
# from scrubbing and snap trimming
osd_scrub_chunk_min = 1
osd_scrub_chunk_max = 5
#osd_scrub_begin_hour = 21
#osd_scrub_end_hour = 5
osd_scrub_sleep = 0.1
osd_scrub_max_interval = 1209600
osd_scrub_min_interval = 259200
osd_deep_scrub_interval = 1209600
osd_deep_scrub_stride = 1048576
osd_scrub_priority = 1
osd_snap_trim_priority = 1

Following the restart of the servers, and after a few tests in which I manually invoked 6 deep scrub processes (see the commands below), I haven't seen any more osd flapping or slow requests. I will keep an eye on it over the next few weeks to see if the issue is resolved.
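For reference, this is the bluestore cache change I am considering, based on the threads you mentioned. It is only a sketch: the 3GB figure is my own guess and still needs checking against the actual memory headroom on the nodes. On Luminous the per-OSD cache is controlled by bluestore_cache_size_hdd (default 1GB) and bluestore_cache_size_ssd (default 3GB):

[osd]
# Tentative: raise the per-OSD cache from the 1GB HDD default to 3GB.
# With 10 osds per node that is ~30GB of cache on a 64GB box, which
# should still leave room for the per-OSD overhead on top of the cache.
bluestore_cache_size_hdd = 3221225472
bluestore_cache_size_ssd = 3221225472

The osds need a restart to pick this up; afterwards the effective value can be confirmed via the admin socket, e.g. "ceph daemon osd.0 config show | grep bluestore_cache_size".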
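And this is roughly how I ran the manual deep scrub test; the osd ids below are just the ones I happened to pick, any set will do:

# Kick off deep scrubs on six osds by hand (scrubs the PGs they are
# primary for):
for i in 0 5 11 17 23 29; do ceph osd deep-scrub $i; done

# Count how many PGs are currently deep scrubbing:
ceph pg dump pgs_brief 2>/dev/null | grep -c 'scrubbing+deep'

# Watch the cluster log while the scrubs run; the telltale entries for
# flapping are osds being reported down or "wrongly marked me down":
ceph -w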
> Regards,
>
> Christian
>
>> I have recently migrated all osds to bluestore, which was a long process
>> with ups and downs, but I am happy to say that the migration is done.
>> During the migration I disabled scrubbing (both deep and standard). After
>> re-enabling scrubbing I noticed that the cluster started having a large
>> number of slow requests and poor client IO (to the point of vms stalling
>> for minutes). Further investigation showed that the slow requests happen
>> because of the osds flapping. In a single day my logs have over 1000
>> entries reporting an osd going down. This affects random osds. Disabling
>> deep scrubbing stabilises the cluster: the osds no longer flap and the
>> slow requests disappear. As a short term solution I have disabled deep
>> scrubbing, but I was hoping to fix the issue with your help.
>>
>> At the moment, I am running the cluster with default settings apart from
>> the following:
>>
>> [global]
>> osd_disk_thread_ioprio_priority = 7
>> osd_disk_thread_ioprio_class = idle
>> osd_recovery_op_priority = 1
>>
>> [osd]
>> debug_ms = 0
>> debug_auth = 0
>> debug_osd = 0
>> debug_bluestore = 0
>> debug_bluefs = 0
>> debug_bdev = 0
>> debug_rocksdb = 0
>>
>> Could you share your experiences with deep scrubbing of bluestore osds?
>> Are there any options I should set to make sure the osds do not flap and
>> client IO remains available?
>>
>> Thanks
>>
>> Andrei
>
>
> --
> Christian Balzer                Network/Systems Engineer
> chibi@xxxxxxx                   Rakuten Communications

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com