Hello cephers,
I've been testing flashcache and enhanceio block device caching for the osds and i've noticed i have started getting the slow requests. The caching type that I use is ready only, so all writes bypass the caching ssds and go directly to osds, just like what it used to be before introducing the caching layer. Prior to introducing caching, i rarely had the slow requests. Judging by the logs, all slow requests are looking like these:
2014-10-16 01:09:15.600807 osd.7 192.168.168.200:6836/32031 100 : [WRN] slow request 30.999641 seconds old, received at 2014-10-16 01:08:44.601040: osd_op(client.36035566.0:16626375 rbd_data.51da686763845e
.0000000000005a15 [set-alloc-hint object_size 4194304 write_size 4194304,write 2007040~16384] 5.7b16421b snapc c4=[c4] ack+ondisk+write e61892) v4 currently waiting for subops from 9
2014-10-16 01:09:15.600811 osd.7 192.168.168.200:6836/32031 101 : [WRN] slow request 30.999581 seconds old, received at 2014-10-16 01:08:44.601100: osd_op(client.36035566.0:16626376 rbd_data.51da686763845e
.0000000000005a15 [set-alloc-hint object_size 4194304 write_size 4194304,write 2039808~16384] 5.7b16421b snapc c4=[c4] ack+ondisk+write e61892) v4 currently waiting for subops from 9
2014-10-16 01:09:16.185530 osd.2 192.168.168.200:6811/31891 76 : [WRN] 20 slow requests, 1 included below; oldest blocked for > 57.003961 secs
2014-10-16 01:09:16.185564 osd.2 192.168.168.200:6811/31891 77 : [WRN] slow request 30.098574 seconds old, received at 2014-10-16 01:08:46.086854: osd_op(client.38917806.0:3481697 rbd_data.251d05e3db45a54.
0000000000000304 [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 2732032~8192] 5.e4683bbb ack+ondisk+write e61892) v4 currently waiting for subops from 11
2014-10-16 01:09:16.601020 osd.7 192.168.168.200:6836/32031 102 : [WRN] 16 slow requests, 2 included below; oldest blocked for > 43.531516 secs
In general, I see between 0 and about 2,000 slow request log entries per day. On one day I saw over 100k entries, but it only happened once.
I am struggling to understand what is casing the slow requests? If all the writes go the same path as before caching was introduced, how come I am getting them? How can I investigate this further?
Thanks
Andrei
I've been testing flashcache and enhanceio block device caching for the osds and i've noticed i have started getting the slow requests. The caching type that I use is ready only, so all writes bypass the caching ssds and go directly to osds, just like what it used to be before introducing the caching layer. Prior to introducing caching, i rarely had the slow requests. Judging by the logs, all slow requests are looking like these:
2014-10-16 01:09:15.600807 osd.7 192.168.168.200:6836/32031 100 : [WRN] slow request 30.999641 seconds old, received at 2014-10-16 01:08:44.601040: osd_op(client.36035566.0:16626375 rbd_data.51da686763845e
.0000000000005a15 [set-alloc-hint object_size 4194304 write_size 4194304,write 2007040~16384] 5.7b16421b snapc c4=[c4] ack+ondisk+write e61892) v4 currently waiting for subops from 9
2014-10-16 01:09:15.600811 osd.7 192.168.168.200:6836/32031 101 : [WRN] slow request 30.999581 seconds old, received at 2014-10-16 01:08:44.601100: osd_op(client.36035566.0:16626376 rbd_data.51da686763845e
.0000000000005a15 [set-alloc-hint object_size 4194304 write_size 4194304,write 2039808~16384] 5.7b16421b snapc c4=[c4] ack+ondisk+write e61892) v4 currently waiting for subops from 9
2014-10-16 01:09:16.185530 osd.2 192.168.168.200:6811/31891 76 : [WRN] 20 slow requests, 1 included below; oldest blocked for > 57.003961 secs
2014-10-16 01:09:16.185564 osd.2 192.168.168.200:6811/31891 77 : [WRN] slow request 30.098574 seconds old, received at 2014-10-16 01:08:46.086854: osd_op(client.38917806.0:3481697 rbd_data.251d05e3db45a54.
0000000000000304 [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 2732032~8192] 5.e4683bbb ack+ondisk+write e61892) v4 currently waiting for subops from 11
2014-10-16 01:09:16.601020 osd.7 192.168.168.200:6836/32031 102 : [WRN] 16 slow requests, 2 included below; oldest blocked for > 43.531516 secs
In general, I see between 0 and about 2,000 slow request log entries per day. On one day I saw over 100k entries, but it only happened once.
I am struggling to understand what is casing the slow requests? If all the writes go the same path as before caching was introduced, how come I am getting them? How can I investigate this further?
Thanks
Andrei
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com