Hi Oliver,

I am also seeing this as strange behaviour indeed! I went through the logs and was not able to find any errors or issues. There were also no slow/blocked requests that I could see during the recovery process. Does anyone have an idea what the issue could be here? I don't want to shut down all VMs every time there is a new release with updated tunable values. (Notes on applying and staging such settings at runtime follow the quoted thread below.)

Andrei

----- Original Message -----
> From: "Oliver Dzombic" <info@xxxxxxxxxxxxxxxxx>
> To: "andrei" <andrei@xxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Sunday, 19 June, 2016 10:14:35
> Subject: Re: cluster down during backfilling, Jewel tunables and client IO optimisations
> Hi,
>
> So far the key values for that are:
>
> osd_client_op_priority = 63 (the default anyway, but I set it explicitly as a reminder)
> osd_recovery_op_priority = 1
>
> In addition I set:
>
> osd_max_backfills = 1
> osd_recovery_max_active = 1
>
> -------------------
>
> But according to your settings it's all OK.
>
> According to what you described, the problem was not the backfilling but
> something else inside the cluster. Maybe something was blocked somewhere
> and only a reset could help. The logs might have given an answer about that.
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:info@xxxxxxxxxxxxxxxxx
>
> Address:
>
> IP Interactive UG (haftungsbeschraenkt)
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402, Amtsgericht Hanau
> Managing director: Oliver Dzombic
>
> Tax no.: 35 236 3622 1
> VAT ID: DE274086107
>
>
> On 18.06.2016 at 18:04, Andrei Mikhailovsky wrote:
>> Hello ceph users,
>>
>> I've recently upgraded my ceph cluster from Hammer to Jewel (10.2.1 and
>> then 10.2.2). The cluster was running okay after the upgrade. I decided
>> to use the optimal tunables for Jewel, as the ceph status was complaining
>> about the straw version and my cluster settings were not optimal for
>> Jewel. I had not touched the tunables since the Firefly release, I think.
>> After reading the release notes and the tunables section, I decided to
>> set the crush tunables value to optimal. For context, a few weeks earlier
>> I had done a reweight-by-utilization, which moved around about 8% of the
>> cluster's objects. That process did not cause any downtime and IO to the
>> virtual machines remained available. I have also altered several settings
>> to prioritise client IO during repair and backfilling (see the config
>> show output below).
>>
>> Right, so, after I set the tunables to optimal, the cluster indicated
>> that it needed to move around 61% of the data in the cluster. The process
>> started and I was seeing recovery speeds of between 800MB/s and 1.5GB/s.
>> My cluster is pretty small (3 OSD servers with 30 OSDs in total). The
>> load on the OSD servers was pretty low: I was seeing a typical load of 4,
>> spiking to around 10. The IO wait values on the OSD servers were also
>> pretty reasonable, around 5-15%. There were around 10-15 backfilling
>> processes.
>>
>> About 10 minutes after the optimal tunables were set, I noticed that the
>> IO wait on the VMs started to increase. Initially it was 15%; after
>> another 10 minutes or so it increased to around 50%, and about 30-40
>> minutes later the iowait reached 95-100% on all VMs. Shortly after that
>> the VMs showed a bunch of hung tasks in the dmesg output and soon
>> stopped responding altogether.
>> This kind of behaviour did not happen after the reweight-by-utilization
>> I had done a few weeks prior. The VMs' IO wait during the reweighting was
>> around 15-20%, there were no hung tasks and all VMs were running pretty
>> well.
>>
>> I wasn't sure how to resolve the problem. On the one hand I know that
>> recovery and backfilling cause extra load on the cluster, but they should
>> never break client IO. After all, that would negate one of the key points
>> behind Ceph: a resilient storage cluster. Looking at the ceph -w output,
>> client IO had decreased to 0-20 IOPS, whereas the typical load I see at
>> that time of day is around 700-1000 IOPS.
>>
>> The strange thing is that after the cluster had finished moving the data
>> (it took around 11 hours), client IO was still not available! I was not
>> able to start any new VMs despite an OK health status and all PGs being
>> in the active+clean state. This was pretty strange: all OSD servers had
>> almost zero load, all PGs were active+clean, all OSDs were up and all
>> mons were up, yet there was no client IO. The cluster became operational
>> once again after a reboot of one of the OSD servers, which seems to have
>> brought the cluster back to life.
>>
>> My question to the community is: what Ceph options should be used to make
>> sure that client IO is _always_ available and has the highest priority
>> during any recovery/migration/backfilling operations?
>>
>> My current settings, which I've gathered over the years from the advice
>> of mailing list and IRC members, are:
>>
>> osd_recovery_max_chunk = 8388608
>> osd_recovery_op_priority = 1
>> osd_max_backfills = 1
>> osd_recovery_max_active = 1
>> osd_recovery_threads = 1
>> osd_disk_thread_ioprio_priority = 7
>> osd_disk_thread_ioprio_class = idle
>> osd_scrub_chunk_min = 1
>> osd_scrub_chunk_max = 5
>> osd_deep_scrub_stride = 1048576
>> mon_osd_min_down_reporters = 6
>> mon_osd_report_timeout = 1800
>> mon_osd_min_down_reports = 7
>> osd_heartbeat_grace = 60
>> osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k,allocsize=4M"
>> osd_mkfs_options_xfs = -f -i size=2048
>> filestore_max_sync_interval = 15
>> filestore_op_threads = 8
>> filestore_merge_threshold = 40
>> filestore_split_multiple = 8
>> osd_disk_threads = 8
>> osd_op_threads = 8
>> osd_pool_default_pg_num = 1024
>> osd_pool_default_pgp_num = 1024
>> osd_crush_update_on_start = false
>>
>> Many thanks
>>
>> Andrei

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
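
For reference, recovery/backfill throttles like the ones Oliver lists above do
not require an OSD (or VM) restart: they can be injected into the running
daemons and, separately, written to ceph.conf so they survive future restarts.
A minimal sketch, assuming a Jewel-era cluster and a working admin keyring
(osd.0 below is just an example daemon id):

  # Throttle recovery/backfill on all running OSDs (runtime only; mirror the
  # same values in the [osd] section of ceph.conf to make them persistent).
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
  ceph tell osd.* injectargs '--osd-recovery-op-priority 1 --osd-client-op-priority 63'

  # Confirm what a given OSD is actually running with (run on that OSD's host).
  ceph daemon osd.0 config get osd_max_backfills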
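
Separately, and not something suggested in the thread itself, a large tunables
change can be staged so that the resulting data movement starts only when you
are ready to watch client IO. A sketch using the standard OSD cluster flags
(available in Jewel), assuming the goal is the same "optimal" profile the
thread describes:

  # Pause data movement before touching the tunables; this does not avoid the
  # eventual rebalance, it only holds it back until the flags are cleared.
  ceph osd set norebalance
  ceph osd set nobackfill
  ceph osd set norecover

  # Switch to the optimal tunables profile (roughly the change described above).
  ceph osd crush tunables optimal

  # Re-enable movement when ready, watching client IO with 'ceph -w'; the flags
  # can be set again at any time to pause backfilling if clients start to suffer.
  ceph osd unset norecover
  ceph osd unset nobackfill
  ceph osd unset norebalance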