Snapshot removed, cluster thrashed...

I've hit some strange behaviour in my Ceph cluster, and I'm asking for some
feedback here.

Some cluster info: 3 nodes, 12 OSDs (4 per node, symmetrical), size=3.
Proxmox based, still on hammer, so used for RBD only.
The cluster was built from some spare servers, and one of the nodes is a
bit 'underpowered', so some stress under load is expected. ;(


Last week I used the snapshot feature for the first time. Beforehand I did
some tests on a 'spare' VM: a snapshot of a powered-off VM (as expected, it
was nearly instantaneous) and of a powered-on one (clearly, snapshotting
the RAM puts some stress on that VM, but not much on the overall system,
as expected).
I also tried deleting the snapshots I had created, a few minutes after
taking them, and nothing notable happened.
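
(For reference, as far as I understand, what Proxmox does under the hood
boils down to plain RBD snapshot operations, roughly like the following;
the pool and image names here are only a guess based on the usual Proxmox
naming, and the RAM state is saved separately by qemu:)

	# take a snapshot of the VM disk image
	rbd snap create rbd/vm-107-disk-1@Jessie

	# list the snapshots present on the image
	rbd snap ls rbd/vm-107-disk-1

	# remove the snapshot again
	rbd snap rm rbd/vm-107-disk-1@Jessie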


On Friday, after 18:00 local time, so with very little load on the system,
I removed the snapshot I had taken of my main VM a week before:

 Jun 23 18:10:10 thor pvedaemon[39766]: <gaio@PASIAN> starting task UPID:thor:0000A0E2:09D67B5F:594D3D62:qmdelsnapshot:107:gaio@PASIAN:
 Jun 23 18:10:10 thor pvedaemon[41186]: <gaio@PASIAN> delete snapshot VM 107: Jessie
 [...]
 Jun 23 18:11:58 thor pvedaemon[39766]: <gaio@PASIAN> end task UPID:thor:0000A0E2:09D67B5F:594D3D62:qmdelsnapshot:107:gaio@PASIAN: OK

and suddenly the *WHOLE* system got thrashed, Ceph went into HEALTH_WARN
status, and so on:

 2017-06-23 18:12:24.941538 mon.0 10.27.251.7:6789/0 1408585 : cluster [INF] pgmap v17394099: 768 pgs: 768 active+clean; 2356 GB data, 7172 GB used, 9586 GB / 16758 GB avail; 176 kB/s rd, 2936 kB/s wr, 46 op/s
 2017-06-23 18:12:26.020387 mon.0 10.27.251.7:6789/0 1408586 : cluster [INF] pgmap v17394100: 768 pgs: 768 active+clean; 2356 GB data, 7171 GB used, 9586 GB / 16758 GB avail; 176 kB/s rd, 2913 kB/s wr, 48 op/s
 2017-06-23 18:12:27.086199 mon.0 10.27.251.7:6789/0 1408587 : cluster [INF] pgmap v17394101: 768 pgs: 768 active+clean; 2356 GB data, 7171 GB used, 9587 GB / 16758 GB avail; 7582 B/s rd, 36017 B/s wr, 11 op/s
 2017-06-23 18:12:28.147427 mon.0 10.27.251.7:6789/0 1408588 : cluster [INF] pgmap v17394102: 768 pgs: 768 active+clean; 2356 GB data, 7170 GB used, 9588 GB / 16758 GB avail; 5742 B/s rd, 36366 B/s wr, 9 op/s
 2017-06-23 18:12:28.588263 osd.11 10.27.251.7:6800/3810 1511 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.083296 secs
 2017-06-23 18:12:28.588337 osd.11 10.27.251.7:6800/3810 1512 : cluster [WRN] slow request 30.083296 seconds old, received at 2017-06-23 18:11:58.504270: osd_op(client.37904158.0:60595026 rbd_data.4384f22ae8944a.0000000000000042 [set-alloc-hint object_size 4194304 write_size 4194304,write 1384448~16384] 1.6e0a2eb snapc 2=[] ack+ondisk+write+known_if_redirected e4132) currently waiting for subops from 2,6
 2017-06-23 18:12:28.955268 osd.6 10.27.251.9:6800/4523 3167 : cluster [WRN] 2 slow requests, 2 included below; oldest blocked for > 30.449628 secs
 2017-06-23 18:12:28.955454 osd.6 10.27.251.9:6800/4523 3168 : cluster [WRN] slow request 30.340693 seconds old, received at 2017-06-23 18:11:58.614354: osd_op(client.37904104.1:731096 rbd_data.723c0238e1f29.0000000000000600 [set-alloc-hint object_size 4194304 write_size 4194304,write 16384~4096] 3.d31445db ondisk+write e4132) currently waiting for subops from 3,11
 2017-06-23 18:12:28.955461 osd.6 10.27.251.9:6800/4523 3169 : cluster [WRN] slow request 30.449628 seconds old, received at 2017-06-23 18:11:58.505420: osd_op(client.37904158.0:60595031 rbd_data.4384f22ae8944a.0000000000006881 [set-alloc-hint object_size 4194304 write_size 4194304,write 1646592~4096] 1.2610cd9b snapc 2=[] ack+ondisk+write+known_if_redirected e4132) currently waiting for subops from 1,3
 2017-06-23 18:12:29.588719 osd.11 10.27.251.7:6800/3810 1513 : cluster [WRN] 23 slow requests, 22 included below; oldest blocked for > 31.084351 secs
 2017-06-23 18:12:29.588729 osd.11 10.27.251.7:6800/3810 1514 : cluster [WRN] slow request 30.974787 seconds old, received at 2017-06-23 18:11:58.613834: osd_op(client.37904104.1:731102 rbd_data.723c0238e1f29.0000000000000800 [set-alloc-hint object_size 4194304 write_size 4194304,write 409600~4096] 3.338471cd ondisk+write e4132) currently waiting for subops from 4,7
 2017-06-23 18:12:29.588738 osd.11 10.27.251.7:6800/3810 1515 : cluster [WRN] slow request 30.974705 seconds old, received at 2017-06-23 18:11:58.613916: osd_op(client.37904104.1:731103 rbd_data.723c0238e1f29.0000000000000800 [set-alloc-hint object_size 4194304 write_size 4194304,write 483328~4096] 3.338471cd ondisk+write e4132) currently waiting for subops from 4,7
 2017-06-23 18:12:29.588744 osd.11 10.27.251.7:6800/3810 1516 : cluster [WRN] slow request 30.974635 seconds old, received at 2017-06-23 18:11:58.613986: osd_op(client.37904104.1:731104 rbd_data.723c0238e1f29.0000000000000800 [set-alloc-hint object_size 4194304 write_size 4194304,write 974848~4096] 3.338471cd ondisk+write e4132) currently waiting for subops from 4,7
 2017-06-23 18:12:29.588749 osd.11 10.27.251.7:6800/3810 1517 : cluster [WRN] slow request 30.974565 seconds old, received at 2017-06-23 18:11:58.614057: osd_op(client.37904104.1:731105 rbd_data.723c0238e1f29.0000000000000800 [set-alloc-hint object_size 4194304 write_size 4194304,write 995328~4096] 3.338471cd ondisk+write e4132) currently waiting for subops from 4,7
 2017-06-23 18:12:29.588754 osd.11 10.27.251.7:6800/3810 1518 : cluster [WRN] slow request 30.974481 seconds old, received at 2017-06-23 18:11:58.614140: osd_op(client.37904104.1:731106 rbd_data.723c0238e1f29.0000000000000800 [set-alloc-hint object_size 4194304 write_size 4194304,write 1052672~4096] 3.338471cd ondisk+write e4132) currently waiting for subops from 4,7
 2017-06-23 18:12:30.241211 mon.0 10.27.251.7:6789/0 1408589 : cluster [INF] pgmap v17394103: 768 pgs: 768 active+clean; 2356 GB data, 7169 GB used, 9589 GB / 16758 GB avail; 29802 B/s rd, 1125 kB/s wr, 12 op/s
 2017-06-23 18:12:29.116927 osd.9 10.27.251.9:6812/14449 1353 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.599687 secs
 [...]
 2017-06-23 18:12:33.941663 mon.0 10.27.251.7:6789/0 1408593 : cluster [INF] HEALTH_WARN; 46 requests are blocked > 32 sec
 2017-06-23 18:12:35.539814 mon.0 10.27.251.7:6789/0 1408594 : cluster [INF] pgmap v17394107: 768 pgs: 768 active+clean; 2355 GB data, 7167 GB used, 9591 GB / 16758 GB avail; 988 kB/s wr, 9 op/s
 2017-06-23 18:12:29.549916 osd.10 10.27.251.7:6812/4486 1501 : cluster [WRN] 4 slow requests, 4 included below; oldest blocked for > 30.936068 secs
 2017-06-23 18:12:29.550229 osd.10 10.27.251.7:6812/4486 1502 : cluster [WRN] slow request 30.936068 seconds old, received at 2017-06-23 18:11:58.613542: osd_op(client.37904104.1:731091 rbd_data.723c0238e1f29.0000000000000200 [set-alloc-hint object_size 4194304 write_size 4194304,write 65536~4096] 3.a4f93bb0 ondisk+write e4132) currently waiting for subops from 3,6
 2017-06-23 18:12:29.550238 osd.10 10.27.251.7:6812/4486 1503 : cluster [WRN] slow request 30.935936 seconds old, received at 2017-06-23 18:11:58.613674: osd_op(client.37904104.1:731092 rbd_data.723c0238e1f29.0000000000000200 [set-alloc-hint object_size 4194304 write_size 4194304,write 131072~4096] 3.a4f93bb0 ondisk+write e4132) currently waiting for subops from 3,6
 2017-06-23 18:12:29.550243 osd.10 10.27.251.7:6812/4486 1504 : cluster [WRN] slow request 30.935169 seconds old, received at 2017-06-23 18:11:58.614441: osd_op(client.37904104.1:731093 rbd_data.723c0238e1f29.0000000000000200 [set-alloc-hint object_size 4194304 write_size 4194304,write 745472~4096] 3.a4f93bb0 ondisk+write e4132) currently waiting for subops from 3,6
 2017-06-23 18:12:29.550247 osd.10 10.27.251.7:6812/4486 1505 : cluster [WRN] slow request 30.935066 seconds old, received at 2017-06-23 18:11:58.614544: osd_op(client.37904104.1:731094 rbd_data.723c0238e1f29.0000000000000200 [set-alloc-hint object_size 4194304 write_size 4194304,write 966656~4096] 3.a4f93bb0 ondisk+write e4132) currently waiting for subops from 3,6
 2017-06-23 18:12:30.550561 osd.10 10.27.251.7:6812/4486 1506 : cluster [WRN] 5 slow requests, 1 included below; oldest blocked for > 31.936964 secs
 2017-06-23 18:12:30.550569 osd.10 10.27.251.7:6812/4486 1507 : cluster [WRN] slow request 30.436836 seconds old, received at 2017-06-23 18:12:00.113670: osd_op(client.37904104.1:731124 rbd_data.723c0238e1f29.0000000000000435 [set-alloc-hint object_size 4194304 write_size 4194304,write 3936256~28672] 3.7ddab187 ondisk+write e4132) currently waiting for subops from 3,7
 2017-06-23 18:12:34.552135 osd.10 10.27.251.7:6812/4486 1508 : cluster [WRN] 12 slow requests, 7 included below; oldest blocked for > 35.938502 secs

and it took about ten minutes to come back to HEALTH_OK:

 2017-06-23 18:19:30.568317 osd.3 10.27.251.8:6812/18013 4172 : cluster [WRN] 2 slow requests, 1 included below; oldest blocked for > 160.806750 secs
 2017-06-23 18:19:30.568399 osd.3 10.27.251.8:6812/18013 4173 : cluster [WRN] slow request 60.332478 seconds old, received at 2017-06-23 18:18:30.235779: osd_op(client.37904139.0:30499048 rbd_data.21f7ba238e1f29.000000000001e8c2 [set-alloc-hint object_size 4194304 write_size 4194304,write 335360~8192] 4.735cdd7b snapc 5=[5] ack+ondisk+write+known_if_redirected e4137) currently no flag points reached
 2017-06-23 18:19:40.994385 mon.0 10.27.251.7:6789/0 1408933 : cluster [INF] pgmap v17394401: 768 pgs: 768 active+clean; 2314 GB data, 7016 GB used, 9742 GB / 16758 GB avail; 863 kB/s rd, 349 kB/s wr, 112 op/s
 2017-06-23 18:19:42.063219 mon.0 10.27.251.7:6789/0 1408934 : cluster [INF] pgmap v17394402: 768 pgs: 768 active+clean; 2314 GB data, 7016 GB used, 9742 GB / 16758 GB avail; 354 kB/s rd, 695 kB/s wr, 40 op/s
 2017-06-23 18:19:43.106629 mon.0 10.27.251.7:6789/0 1408935 : cluster [INF] pgmap v17394403: 768 pgs: 768 active+clean; 2314 GB data, 7016 GB used, 9742 GB / 16758 GB avail; 1918 B/s rd, 925 kB/s wr, 34 op/s
 2017-06-23 18:19:44.212196 mon.0 10.27.251.7:6789/0 1408936 : cluster [INF] pgmap v17394404: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 477 B/s rd, 63935 B/s wr, 17 op/s
 2017-06-23 18:19:46.322367 mon.0 10.27.251.7:6789/0 1408937 : cluster [INF] pgmap v17394405: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 319 B/s rd, 71906 B/s wr, 17 op/s
 2017-06-23 18:19:47.385412 mon.0 10.27.251.7:6789/0 1408938 : cluster [INF] pgmap v17394406: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 6434 B/s rd, 379 kB/s wr, 27 op/s
 2017-06-23 18:19:49.458938 mon.0 10.27.251.7:6789/0 1408939 : cluster [INF] pgmap v17394407: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 6498 B/s rd, 338 kB/s wr, 15 op/s
 2017-06-23 18:19:50.568192 mon.0 10.27.251.7:6789/0 1408940 : cluster [INF] pgmap v17394408: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 2589 B/s rd, 12625 B/s wr, 3 op/s
 2017-06-23 18:19:51.660889 mon.0 10.27.251.7:6789/0 1408941 : cluster [INF] pgmap v17394409: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 28382 B/s rd, 276 kB/s wr, 52 op/s
 2017-06-23 18:19:52.735827 mon.0 10.27.251.7:6789/0 1408942 : cluster [INF] pgmap v17394410: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 47066 B/s rd, 2849 kB/s wr, 166 op/s
 2017-06-23 18:19:53.802365 mon.0 10.27.251.7:6789/0 1408943 : cluster [INF] pgmap v17394411: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 51028 B/s rd, 4144 kB/s wr, 174 op/s
 2017-06-23 18:19:56.814558 mon.0 10.27.251.7:6789/0 1408945 : cluster [INF] pgmap v17394412: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 136 kB/s rd, 1159 kB/s wr, 52 op/s
 2017-06-23 18:19:58.034610 mon.0 10.27.251.7:6789/0 1408946 : cluster [INF] pgmap v17394413: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 112 kB/s rd, 989 kB/s wr, 116 op/s
 2017-06-23 18:19:59.112915 mon.0 10.27.251.7:6789/0 1408947 : cluster [INF] pgmap v17394414: 768 pgs: 768 active+clean; 2314 GB data, 7015 GB used, 9742 GB / 16758 GB avail; 43695 B/s rd, 1380 kB/s wr, 146 op/s
 2017-06-23 18:20:00.223605 mon.0 10.27.251.7:6789/0 1408948 : cluster [INF] HEALTH_OK


Three questions:

a) Why does a 'snapshot remove' action put the system under such load?

b) Just as there are options like:

	osd scrub during recovery = false
	osd recovery op priority = 1
	osd recovery max active = 5
	osd max backfills = 1

 to limit the impact of recovery, are there options to reduce the impact
 of a snapshot remove? (See also the sketch after these questions.)

c) Are snapshots handled differently from other IO ops, or should I expect
 a similar result when doing something comparable (e.g. a restore from a
 backup)?
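
Regarding b), the only snapshot-specific knob I've found so far is
'osd snap trim sleep', which as far as I understand throttles the snapshot
trimming the OSDs do after a snapshot is deleted; something like the
following is what I had in mind, but I have no idea whether it actually
helps on hammer (the value is just a guess):

	# ceph.conf, [osd] section: seconds to sleep between snap trim
	# operations; default is 0, i.e. no throttling at all
	osd snap trim sleep = 0.1

	# or, tentatively, injected at runtime on all OSDs
	ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1'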


Thanks.

-- 
dott. Marco Gaiarin				        GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''          http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797

		Donate your 5 PER MILLE to LA NOSTRA FAMIGLIA!
      http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
	(tax code 00307430132, category ONLUS or RICERCA SANITARIA)



