requests are blocked - problem

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

Our setup is this:

4 x OSD nodes:
E5-1630 CPU
32 GB RAM
Mellanox MT27520 56Gbps network cards
SATA controller LSI Logic SAS3008
Storage nodes are connected to SuperMicro chassis: 847E1C-R1K28JBOD
Each node has 2-3 spinning OSDs (6TB drives) and 2 ssd OSDs (240GB drives)
3 monitors running on OSD nodes
ceph hammer 0.94.2
Ubuntu 14.04
cache tier with ecpool (3+1)

We've ran some tests on the cluster and results were promising - speeds, iops etc as expected, but now we tried to use more than one client for testing and ran into some problems:

We've created a couple rbd images and mapped them on clients (kernel rbd) running two rsync processes and one dd on a large number of files (~ 250 GB of maildirs rsync'ed from one rbd image to the other and a dd process writing one big 2TB file on another rbd image)

And the speeds now are less than OK, plus we have a lot requests blocked warnings. We've left the processes to run over night but this morning I came and none of the processes finished - they are able to write data, but at a very, very slow rate.

Please help me diagnose this problem, everything seems to work, just very, very slow... when we ran the tests with fio (librbd engine) everything seemd fine.. I know that kernel implementation is slower, but is this normal? I can't understand why are so many requests blocked.

Some diagnostic data:

root@cf01:/var/log/ceph# ceph -s
    cluster 3469081f-9852-4b6e-b7ed-900e77c48bb5
     health HEALTH_WARN
            31 requests are blocked > 32 sec
monmap e1: 3 mons at {cf01=10.4.10.211:6789/0,cf02=10.4.10.212:6789/0,cf03=10.4.10.213:6789/0}
            election epoch 202, quorum 0,1,2 cf01,cf02,cf03
     osdmap e1319: 18 osds: 18 up, 18 in
      pgmap v933010: 2112 pgs, 19 pools, 10552 GB data, 2664 kobjects
            14379 GB used, 42812 GB / 57192 GB avail
                2111 active+clean
                   1 active+clean+scrubbing
  client io 0 B/s rd, 12896 kB/s wr, 35 op/s


root@cf01:/var/log/ceph# ceph health detail
HEALTH_WARN 23 requests are blocked > 32 sec; 6 osds have slow requests
1 ops are blocked > 131.072 sec
22 ops are blocked > 65.536 sec
1 ops are blocked > 65.536 sec on osd.2
1 ops are blocked > 65.536 sec on osd.3
1 ops are blocked > 65.536 sec on osd.4
1 ops are blocked > 131.072 sec on osd.7
18 ops are blocked > 65.536 sec on osd.10
1 ops are blocked > 65.536 sec on osd.12
6 osds have slow requests


root@cf01:/var/log/ceph# grep WRN ceph.log | tail -50
2015-08-19 10:23:34.505669 osd.14 10.4.10.213:6810/21207 17942 : cluster [WRN] 2 slow requests, 2 included below; oldest blocked for > 30.575870 secs 2015-08-19 10:23:34.505796 osd.14 10.4.10.213:6810/21207 17943 : cluster [WRN] slow request 30.575870 seconds old, received at 2015-08-19 10:23:03.929722: osd_op(client.9203.1:22822591 rbd_data.37e02ae8944a.00000000000180ca [set-alloc-hint object_size 4194304 write_size 4194304,write 0~462848] 5.1c2aff5f ondisk+write e1319) currently waiting for blocked object 2015-08-19 10:23:34.505803 osd.14 10.4.10.213:6810/21207 17944 : cluster [WRN] slow request 30.560009 seconds old, received at 2015-08-19 10:23:03.945583: osd_op(client.9203.1:22822592 rbd_data.37e02ae8944a.00000000000180ca [set-alloc-hint object_size 4194304 write_size 4194304,write 462848~524288] 5.1c2aff5f ondisk+write e1319) currently waiting for blocked object 2015-08-19 10:23:35.489927 osd.1 10.4.10.211:6812/9198 7921 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.112783 secs 2015-08-19 10:23:35.490326 osd.1 10.4.10.211:6812/9198 7922 : cluster [WRN] slow request 30.112783 seconds old, received at 2015-08-19 10:23:05.376339: osd_op(osd.14.1299:731492 rbd_data.37e02ae8944a.0000000000017a90 [copy-from ver 61293] 4.5aa9fb69 ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e1319) currently commit_sent 2015-08-19 10:23:36.799861 osd.6 10.4.10.213:6806/22569 22663 : cluster [WRN] 2 slow requests, 1 included below; oldest blocked for > 60.414800 secs 2015-08-19 10:23:36.800018 osd.6 10.4.10.213:6806/22569 22664 : cluster [WRN] slow request 60.414800 seconds old, received at 2015-08-19 10:22:36.384453: osd_op(osd.11.1299:705465 rbd_data.37e02ae8944a.0000000000017cd2 [copy-from ver 61156] 4.c961fc05 ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e1319) currently commit_sent 2015-08-19 10:23:34.652970 osd.7 10.4.10.213:6808/22888 21331 : cluster [WRN] 16 slow requests, 1 included below; oldest blocked for > 58.978274 secs 2015-08-19 10:23:34.653013 osd.7 10.4.10.213:6808/22888 21332 : cluster [WRN] slow request 30.703792 seconds old, received at 2015-08-19 10:23:03.948535: osd_op(osd.14.1299:731491 rbd_data.37e02ae8944a.00000000000180ca [copy-get max 8388608] 4.1c2aff5f ack+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e1319) currently started 2015-08-19 10:23:35.653883 osd.7 10.4.10.213:6808/22888 21333 : cluster [WRN] 17 slow requests, 3 included below; oldest blocked for > 59.979456 secs 2015-08-19 10:23:35.654260 osd.7 10.4.10.213:6808/22888 21334 : cluster [WRN] slow request 30.737485 seconds old, received at 2015-08-19 10:23:04.916024: MOSDECSubOpWrite(4.1aes1 1319 ECSubWrite(tid=508685, reqid=osd.15.1301:577673, at_version=1319'5556, trim_to=1318'2500, trim_rollback_to=1319'5554)) currently started 2015-08-19 10:23:35.654268 osd.7 10.4.10.213:6808/22888 21335 : cluster [WRN] slow request 30.533515 seconds old, received at 2015-08-19 10:23:05.119993: osd_op(osd.11.1299:705493 rbd_data.37e02ae8944a.0000000000017dcc [copy-from ver 60974] 4.4c845496 ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e1319) currently commit_sent 2015-08-19 10:23:35.654272 osd.7 10.4.10.213:6808/22888 21336 : cluster [WRN] slow request 30.255396 seconds old, received at 2015-08-19 10:23:05.398113: MOSDECSubOpWrite(4.169s1 1319 ECSubWrite(tid=614933, reqid=osd.14.1299:731492, at_version=1319'5541, trim_to=1318'2500, trim_rollback_to=1319'5538)) currently started 2015-08-19 10:23:36.654759 osd.7 10.4.10.213:6808/22888 21337 : cluster [WRN] 16 slow requests, 2 included below; oldest blocked for > 60.980631 secs 2015-08-19 10:23:36.654766 osd.7 10.4.10.213:6808/22888 21338 : cluster [WRN] slow request 60.980631 seconds old, received at 2015-08-19 10:22:35.674053: MOSDECSubOpWrite(4.1a9s1 1319 ECSubWrite(tid=604142, reqid=osd.16.1272:599035, at_version=1319'5715, trim_to=1318'2700, trim_rollback_to=1319'5711)) currently started 2015-08-19 10:23:36.654772 osd.7 10.4.10.213:6808/22888 21339 : cluster [WRN] slow request 60.271394 seconds old, received at 2015-08-19 10:22:36.383289: MOSDECSubOpWrite(4.1a9s1 1319 ECSubWrite(tid=604143, reqid=osd.16.1272:599036, at_version=1319'5716, trim_to=1318'2700, trim_rollback_to=1319'5711)) currently started 2015-08-19 10:23:37.654933 osd.7 10.4.10.213:6808/22888 21340 : cluster [WRN] 15 slow requests, 1 included below; oldest blocked for > 61.980829 secs 2015-08-19 10:23:37.655100 osd.7 10.4.10.213:6808/22888 21341 : cluster [WRN] slow request 60.225464 seconds old, received at 2015-08-19 10:22:37.429417: MOSDECSubOpWrite(4.5s3 1319 ECSubWrite(tid=563247, reqid=osd.11.1299:705465, at_version=1319'5545, trim_to=1318'2500, trim_rollback_to=1319'5539)) currently started 2015-08-19 10:23:42.656604 osd.7 10.4.10.213:6808/22888 21342 : cluster [WRN] 10 slow requests, 1 included below; oldest blocked for > 66.272631 secs 2015-08-19 10:23:42.657055 osd.7 10.4.10.213:6808/22888 21343 : cluster [WRN] slow request 30.549952 seconds old, received at 2015-08-19 10:23:12.105968: osd_op(osd.14.1299:731499 rbd_data.37e02ae8944a.00000000000177ee [copy-from ver 66450] 4.a71f73bb ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e1319) currently commit_sent 2015-08-19 10:23:35.164040 osd.2 10.4.10.211:6808/10427 7307 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.272625 secs 2015-08-19 10:23:35.164208 osd.2 10.4.10.211:6808/10427 7308 : cluster [WRN] slow request 30.272625 seconds old, received at 2015-08-19 10:23:04.890780: osd_op(osd.15.1301:577673 rbd_data.37e02ae8944a.0000000000017b2c [copy-from ver 61313] 4.217af5ae ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e1319) currently commit_sent 2015-08-19 10:23:43.567637 osd.3 10.4.10.212:6804/31427 7180 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.708292 secs 2015-08-19 10:23:43.567645 osd.3 10.4.10.212:6804/31427 7181 : cluster [WRN] slow request 60.708292 seconds old, received at 2015-08-19 10:22:42.859163: osd_op(osd.14.1299:731455 rbd_data.37e02ae8944a.0000000000017f5c [copy-from ver 61192] 4.a6121781 ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e1319) currently commit_sent 2015-08-19 10:23:36.762303 osd.4 10.4.10.212:6806/31912 7895 : cluster [WRN] 4 slow requests, 1 included below; oldest blocked for > 61.142907 secs 2015-08-19 10:23:36.762685 osd.4 10.4.10.212:6806/31912 7896 : cluster [WRN] slow request 60.400444 seconds old, received at 2015-08-19 10:22:36.360861: osd_op(osd.16.1272:599036 rbd_data.37e02ae8944a.0000000000017bf5 [copy-from ver 63134] 4.95eb47a9 ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e1319) currently commit_sent 2015-08-19 10:23:43.765242 osd.4 10.4.10.212:6806/31912 7897 : cluster [WRN] 4 slow requests, 1 included below; oldest blocked for > 67.403887 secs 2015-08-19 10:23:43.765623 osd.4 10.4.10.212:6806/31912 7898 : cluster [WRN] slow request 30.849907 seconds old, received at 2015-08-19 10:23:12.914840: osd_op(osd.16.1272:599065 rbd_data.37e02ae8944a.00000000000180ae [copy-get max 8388608] 4.faca2da9 ack+read+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e1319) currently started 2015-08-19 10:23:45.493885 osd.1 10.4.10.211:6812/9198 7923 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 30.014812 secs 2015-08-19 10:23:45.494246 osd.1 10.4.10.211:6812/9198 7924 : cluster [WRN] slow request 30.014812 seconds old, received at 2015-08-19 10:23:15.478121: osd_op(osd.16.1272:599072 rbd_data.37e02ae8944a.0000000000017b1c [copy-from ver 65567] 4.bead06e1 ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e1319) currently commit_sent 2015-08-19 10:23:46.803793 osd.6 10.4.10.213:6806/22569 22665 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 60.664609 secs 2015-08-19 10:23:46.803971 osd.6 10.4.10.213:6806/22569 22666 : cluster [WRN] slow request 60.664609 seconds old, received at 2015-08-19 10:22:46.138538: osd_op(osd.11.1299:705468 rbd_data.37e02ae8944a.0000000000017abe [copy-from ver 61142] 4.c0253e05 ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e1319) currently commit_sent 2015-08-19 10:23:43.657718 osd.7 10.4.10.213:6808/22888 21344 : cluster [WRN] 12 slow requests, 3 included below; oldest blocked for > 67.273998 secs 2015-08-19 10:23:43.657725 osd.7 10.4.10.213:6808/22888 21345 : cluster [WRN] slow request 30.728099 seconds old, received at 2015-08-19 10:23:12.929188: osd_op(osd.11.1299:705499 rbd_data.37e02ae8944a.0000000000017d85 [copy-from ver 60964] 4.ba4f1896 ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e1319) currently commit_sent 2015-08-19 10:23:43.657887 osd.7 10.4.10.213:6808/22888 21346 : cluster [WRN] slow request 30.675844 seconds old, received at 2015-08-19 10:23:12.981443: MOSDECSubOpRead(4.1a9s1 1319 ECSubRead(tid=604164, to_read={faca2da9/rbd_data.37e02ae8944a.00000000000180ae/head//4=0,2797408,0}, attrs_to_read=)) currently no flag points reached 2015-08-19 10:23:43.657894 osd.7 10.4.10.213:6808/22888 21347 : cluster [WRN] slow request 60.771125 seconds old, received at 2015-08-19 10:22:42.886162: MOSDECSubOpWrite(4.181s2 1319 ECSubWrite(tid=566649, reqid=osd.14.1299:731455, at_version=1319'5518, trim_to=1318'2500, trim_rollback_to=1319'5514)) currently started 2015-08-19 10:23:44.658111 osd.7 10.4.10.213:6808/22888 21348 : cluster [WRN] 13 slow requests, 2 included below; oldest blocked for > 68.274757 secs 2015-08-19 10:23:44.658118 osd.7 10.4.10.213:6808/22888 21349 : cluster [WRN] slow request 30.969646 seconds old, received at 2015-08-19 10:23:13.688399: osd_op(osd.10.1297:558590 rbd_data.37e02ae8944a.0000000000017b0e [copy-from ver 63188] 4.646dd0af ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e1319) currently started 2015-08-19 10:23:44.658122 osd.7 10.4.10.213:6808/22888 21350 : cluster [WRN] slow request 30.969351 seconds old, received at 2015-08-19 10:23:13.688695: osd_op(osd.10.1297:558591 rbd_data.37e02ae8944a.0000000000017973 [copy-from ver 63177] 4.fd21a2af ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e1319) currently started 2015-08-19 10:23:45.658546 osd.7 10.4.10.213:6808/22888 21351 : cluster [WRN] 13 slow requests, 1 included below; oldest blocked for > 69.275060 secs 2015-08-19 10:23:45.658554 osd.7 10.4.10.213:6808/22888 21352 : cluster [WRN] slow request 30.156082 seconds old, received at 2015-08-19 10:23:15.502267: MOSDECSubOpWrite(4.e1s2 1319 ECSubWrite(tid=614936, reqid=osd.16.1272:599072, at_version=1319'5510, trim_to=1318'2500, trim_rollback_to=1319'5508)) currently started 2015-08-19 10:23:46.658900 osd.7 10.4.10.213:6808/22888 21353 : cluster [WRN] 13 slow requests, 2 included below; oldest blocked for > 70.275453 secs 2015-08-19 10:23:46.658908 osd.7 10.4.10.213:6808/22888 21354 : cluster [WRN] slow request 30.776570 seconds old, received at 2015-08-19 10:23:15.882172: osd_op(osd.11.1299:705504 rbd_data.37e02ae8944a.0000000000018028 [copy-from ver 60995] 4.cf21fa96 ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e1319) currently started 2015-08-19 10:23:46.658913 osd.7 10.4.10.213:6808/22888 21355 : cluster [WRN] slow request 60.273271 seconds old, received at 2015-08-19 10:22:46.385472: MOSDECSubOpWrite(4.5s3 1319 ECSubWrite(tid=563263, reqid=osd.11.1299:705468, at_version=1319'5546, trim_to=1318'2500, trim_rollback_to=1319'5539)) currently started 2015-08-19 10:23:49.659552 osd.7 10.4.10.213:6808/22888 21356 : cluster [WRN] 12 slow requests, 1 included below; oldest blocked for > 73.276082 secs 2015-08-19 10:23:49.659559 osd.7 10.4.10.213:6808/22888 21357 : cluster [WRN] slow request 30.075904 seconds old, received at 2015-08-19 10:23:19.583468: osd_op(osd.12.1298:558271 rbd_data.37e02ae8944a.0000000000017f03 [copy-from ver 62273] 4.aafb718f ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e1319) currently commit_sent 2015-08-19 10:23:51.660604 osd.7 10.4.10.213:6808/22888 21358 : cluster [WRN] 11 slow requests, 1 included below; oldest blocked for > 65.274645 secs 2015-08-19 10:23:51.661052 osd.7 10.4.10.213:6808/22888 21359 : cluster [WRN] slow request 30.271682 seconds old, received at 2015-08-19 10:23:21.388434: osd_op(osd.12.1298:558278 rbd_data.37e02ae8944a.0000000000017b74 [copy-from ver 62260] 4.60d7e38f ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e1319) currently commit_sent 2015-08-19 10:23:53.768995 osd.4 10.4.10.212:6806/31912 7899 : cluster [WRN] 2 slow requests, 1 included below; oldest blocked for > 60.660749 secs 2015-08-19 10:23:53.769507 osd.4 10.4.10.212:6806/31912 7900 : cluster [WRN] slow request 60.660749 seconds old, received at 2015-08-19 10:22:53.107765: osd_op(osd.16.1272:599047 rbd_data.37e02ae8944a.000000000001803d [copy-from ver 63185] 4.a327fba9 ondisk+write+ignore_overlay+enforce_snapc+known_if_redirected e1319) currently commit_sent

root@cf01:/var/log/ceph# ceph osd tree
ID  WEIGHT   TYPE NAME                 UP/DOWN REWEIGHT PRIMARY-AFFINITY
-15 10.00000 root sata
 -7  3.00000     disktype cf01sata
  0  1.00000         osd.0                  up  1.00000          1.00000
  1  1.00000         osd.1                  up  1.00000          1.00000
  2  1.00000         osd.2                  up  1.00000          1.00000
 -9  2.00000     disktype cf02sata
  3  1.00000         osd.3                  up  1.00000          1.00000
  4  1.00000         osd.4                  up  1.00000          1.00000
-11  3.00000     disktype cf03sata
  5  1.00000         osd.5                  up  1.00000          1.00000
  6  1.00000         osd.6                  up  1.00000          1.00000
  7  1.00000         osd.7                  up  1.00000          1.00000
-13  2.00000     disktype cf04sata
  8  1.00000         osd.8                  up  1.00000          1.00000
  9  1.00000         osd.9                  up  1.00000          1.00000
-14  8.00000 root ssd
 -6  2.00000     disktype cf01ssd
 10  1.00000         osd.10                 up  1.00000          1.00000
 11  1.00000         osd.11                 up  1.00000          1.00000
 -8  2.00000     disktype cf02ssd
 12  1.00000         osd.12                 up  1.00000          1.00000
 13  1.00000         osd.13                 up  1.00000          1.00000
-10  2.00000     disktype cf03ssd
 14  1.00000         osd.14                 up  1.00000          1.00000
 15  1.00000         osd.15                 up  1.00000          1.00000
-12  2.00000     disktype cf04ssd
 17  1.00000         osd.17                 up  1.00000          1.00000
 16  1.00000         osd.16                 up  1.00000          1.00000
 -1 18.00000 root default
 -2  5.00000     host cf01
 -6  2.00000         disktype cf01ssd
 10  1.00000             osd.10             up  1.00000          1.00000
 11  1.00000             osd.11             up  1.00000          1.00000
 -7  3.00000         disktype cf01sata
  0  1.00000             osd.0              up  1.00000          1.00000
  1  1.00000             osd.1              up  1.00000          1.00000
  2  1.00000             osd.2              up  1.00000          1.00000
 -3  4.00000     host cf02
 -8  2.00000         disktype cf02ssd
 12  1.00000             osd.12             up  1.00000          1.00000
 13  1.00000             osd.13             up  1.00000          1.00000
 -9  2.00000         disktype cf02sata
  3  1.00000             osd.3              up  1.00000          1.00000
  4  1.00000             osd.4              up  1.00000          1.00000
 -4  5.00000     host cf03
-10  2.00000         disktype cf03ssd
 14  1.00000             osd.14             up  1.00000          1.00000
 15  1.00000             osd.15             up  1.00000          1.00000
-11  3.00000         disktype cf03sata
  5  1.00000             osd.5              up  1.00000          1.00000
  6  1.00000             osd.6              up  1.00000          1.00000
  7  1.00000             osd.7              up  1.00000          1.00000
 -5  4.00000     host cf04
-12  2.00000         disktype cf04ssd
 17  1.00000             osd.17             up  1.00000          1.00000
 16  1.00000             osd.16             up  1.00000          1.00000
-13  2.00000         disktype cf04sata
  8  1.00000             osd.8              up  1.00000          1.00000
  9  1.00000             osd.9              up  1.00000          1.00000


Cheers,
J

--
Jacek Jarosiewicz
Administrator Systemów Informatycznych

----------------------------------------------------------------------------------------
SUPERMEDIA Sp. z o.o. z siedzibą w Warszawie
ul. Senatorska 13/15, 00-075 Warszawa
Sąd Rejonowy dla m.st.Warszawy, XII Wydział Gospodarczy Krajowego Rejestru Sądowego,
nr KRS 0000029537; kapitał zakładowy 42.756.000 zł
NIP: 957-05-49-503
Adres korespondencyjny: ul. Jubilerska 10, 04-190 Warszawa

----------------------------------------------------------------------------------------
SUPERMEDIA ->   http://www.supermedia.pl
dostep do internetu - hosting - kolokacja - lacza - telefonia
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux