How did you generate these scenarios? At first glance it looks to me like you've got very low limits set on how many PGs an OSD can be recovering at once, and in the first example they were all targeted at one OSD, while in the second they were distributed.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Wed, Nov 13, 2013 at 3:00 AM, Andrey Korolyov <andrey@xxxxxxx> wrote:
> Hello,
>
> Using 5c65e1ee3932a021cfd900a74cdc1d43b9103f0f with a large amount of committed data and a relatively low PG count, I've observed unexplainably long recovery times for PGs even though the degraded object count is almost zero:
>
> 04:44:42.521896 mon.0 [INF] pgmap v24807947: 2048 pgs: 911 active+clean, 1131 active+recovery_wait, 6 active+recovering; 5389 GB data, 16455 GB used, 87692 GB / 101 TB avail; 5839KB/s rd, 2986KB/s wr, 567op/s; 865/4162926 degraded (0.021%); recovering 2 o/s, 9251KB/s
>
> At this moment we have a freshly restarted cluster and a large number of PGs in the recovery_wait state; after a couple of minutes the picture changes a little:
>
> 2013-11-13 05:30:18.020093 mon.0 [INF] pgmap v24809483: 2048 pgs: 939 active+clean, 1105 active+recovery_wait, 4 active+recovering; 5394 GB data, 16472 GB used, 87676 GB / 101 TB avail; 1627KB/s rd, 3866KB/s wr, 1499op/s; 2456/4167201 degraded (0.059%)
>
> After a couple of hours we reach a peak in degraded objects, with PGs still moving to active+clean:
>
> 2013-11-13 10:05:36.245917 mon.0 [INF] pgmap v24816326: 2048 pgs: 1191 active+clean, 854 active+recovery_wait, 3 active+recovering; 5467 GB data, 16690 GB used, 87457 GB / 101 TB avail; 16339KB/s rd, 18006KB/s wr, 16025op/s; 23495/4223061 degraded (0.556%)
>
> Once the peak has passed, the object count starts to decrease, and it seems the cluster will reach a completely clean state within the next ten hours.
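
To see what limits the OSDs are actually running with (the ones that cap how many PGs can recover at once), each OSD's admin socket can be queried, and injectargs can raise the values at runtime while recovery is in progress; the socket path and the numbers here are just examples:

  # show the effective recovery/backfill limits on one OSD (default admin socket path assumed)
  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep -E 'osd_max_backfills|osd_recovery_max_active'

  # temporarily raise them cluster-wide; revert once recovery has settled
  ceph tell osd.\* injectargs '--osd_recovery_max_active 15 --osd_max_backfills 4'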
>
> For example, with a PG count ten times higher, recovery goes much faster:
>
> 2013-11-05 03:20:56.330767 mon.0 [INF] pgmap v24143721: 27648 pgs: 26171 active+clean, 1474 active+recovery_wait, 3 active+recovering; 7855 GB data, 25609 GB used, 78538 GB / 101 TB avail; 3298KB/s rd, 7746KB/s wr, 3581op/s; 183/6554634 degraded (0.003%)
>
> 2013-11-05 04:04:55.779345 mon.0 [INF] pgmap v24145291: 27648 pgs: 27646 active+clean, 1 active+recovery_wait, 1 active+recovering; 7857 GB data, 25615 GB used, 78533 GB / 101 TB avail; 999KB/s rd, 690KB/s wr, 563op/s
>
> Recovery and backfill settings were the same during all tests:
> "osd_max_backfills": "1",
> "osd_backfill_full_ratio": "0.85",
> "osd_backfill_retry_interval": "10",
> "osd_recovery_threads": "1",
> "osd_recover_clone_overlap": "true",
> "osd_backfill_scan_min": "64",
> "osd_backfill_scan_max": "512",
> "osd_recovery_thread_timeout": "30",
> "osd_recovery_delay_start": "300",
> "osd_recovery_max_active": "5",
> "osd_recovery_max_chunk": "8388608",
> "osd_recovery_forget_lost_objects": "false",
> "osd_kill_backfill_at": "0",
> "osd_debug_skip_full_check_in_backfill_reservation": "false",
> "osd_recovery_op_priority": "10",
>
> Also, during recovery some heartbeats may be missed. This is not related to the current situation, but it has been observed for a very long time (for now it seems like four-second delays between heartbeats, distributed almost randomly over time):
>
> 2013-11-13 14:57:11.316459 mon.0 [INF] pgmap v24826822: 2048 pgs: 1513 active+clean, 529 active+recovery_wait, 6 active+recovering; 5469 GB data, 16708 GB used, 87440 GB / 101 TB avail; 16098KB/s rd, 4085KB/s wr, 623op/s; 15670/4227330 degraded (0.371%)
> 2013-11-13 14:57:12.328538 mon.0 [INF] pgmap v24826823: 2048 pgs: 1513 active+clean, 529 active+recovery_wait, 6 active+recovering; 5469 GB data, 16708 GB used, 87440 GB / 101 TB avail; 3806KB/s rd, 3446KB/s wr, 284op/s; 15670/4227330 degraded (0.371%)
> 2013-11-13 14:57:13.336618 mon.0 [INF] pgmap v24826824: 2048 pgs: 1513 active+clean, 529 active+recovery_wait, 6 active+recovering; 5469 GB data, 16708 GB used, 87440 GB / 101 TB avail; 11051KB/s rd, 12171KB/s wr, 1470op/s; 15670/4227330 degraded (0.371%)
> 2013-11-13 14:57:16.317271 mon.0 [INF] pgmap v24826825: 2048 pgs: 1513 active+clean, 529 active+recovery_wait, 6 active+recovering; 5469 GB data, 16708 GB used, 87440 GB / 101 TB avail; 3610KB/s rd, 3171KB/s wr, 1820op/s; 15670/4227330 degraded (0.371%)
> 2013-11-13 14:57:17.366554 mon.0 [INF] pgmap v24826826: 2048 pgs: 1513 active+clean, 529 active+recovery_wait, 6 active+recovering; 5469 GB data, 16708 GB used, 87440 GB / 101 TB avail; 11323KB/s rd, 1759KB/s wr, 13195op/s; 15670/4227330 degraded (0.371%)
> 2013-11-13 14:57:18.379340 mon.0 [INF] pgmap v24826827: 2048 pgs: 1513 active+clean, 529 active+recovery_wait, 6 active+recovering; 5469 GB data, 16708 GB used, 87440 GB / 101 TB avail; 38113KB/s rd, 7461KB/s wr, 46511op/s; 15670/4227330 degraded (0.371%)
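
For what it's worth, the ten-times-higher PG count from the comparison above can also be applied to an existing pool: pg_num can only be increased (never decreased), and pgp_num has to be raised to the same value before the new PGs actually get rebalanced across OSDs. The pool name and target count below are placeholders:

  # split an existing pool's PGs (placeholder pool name and count)
  ceph osd pool set <pool> pg_num 4096
  ceph osd pool set <pool> pgp_num 4096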