Re: Lot of blocked operations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

I think I found the problem : a way too large journal.
I catch this from logs of an OSD having blocked queries :

OSD.15 :

2015-09-19 00:41:12.717062 7fb8a3c44700  1 journal check_for_full at 3548528640 : JOURNAL FULL 3548528640 >= 1376255 (max_size 4294967296 start 3549904896)
2015-09-19 00:41:43.124590 7fb8a6181700  0 log [WRN] : 6 slow requests, 6 included below; oldest blocked for > 30.405719 secs
2015-09-19 00:41:43.124596 7fb8a6181700  0 log [WRN] : slow request 30.405719 seconds old, received at 2015-09-19 00:41:12.718829: osd_op(client.31621623.1:5392489797 rb.0.1b844d6.238e1f29.0000000004d3 [write 0~4096] 6.3aed306f snapc 11114=[11114,11096,11018] ondisk+write e847952) v4 currently waiting for subops from 19
2015-09-19 00:41:43.124599 7fb8a6181700  0 log [WRN] : slow request 30.172735 seconds old, received at 2015-09-19 00:41:12.951813: osd_op(client.31435077.1:8423014905 rb.0.1c39394.238e1f29.00000000037a [write 1499136~8192] 6.2ffed26e snapc 11118=[11118,1109a,1101c] ondisk+write e847952) v4 currently waiting for subops from 28
2015-09-19 00:41:43.124602 7fb8a6181700  0 log [WRN] : slow request 30.172703 seconds old, received at 2015-09-19 00:41:12.951845: osd_op(client.31435077.1:8423014906 rb.0.1c39394.238e1f29.00000000037a [write 1523712~8192] 6.2ffed26e snapc 11118=[11118,1109a,1101c] ondisk+write e847952) v4 currently waiting for subops from 28
2015-09-19 00:41:43.124604 7fb8a6181700  0 log [WRN] : slow request 30.172576 seconds old, received at 2015-09-19 00:41:12.951972: osd_op(client.31435077.1:8423014907 rb.0.1c39394.238e1f29.00000000037a [write 1515520~8192] 6.2ffed26e snapc 11118=[11118,1109a,1101c] ondisk+write e847952) v4 currently waiting for subops from 28
2015-09-19 00:41:43.124606 7fb8a6181700  0 log [WRN] : slow request 30.172546 seconds old, received at 2015-09-19 00:41:12.952002: osd_op(client.31435077.1:8423014909 rb.0.1c39394.238e1f29.00000000037a [write 1531904~8192] 6.2ffed26e snapc 11118=[11118,1109a,1101c] ondisk+write e847952) v4 currently waiting for subops from 28

and at same time on OSD.19 :

2015-09-19 00:41:19.549508 7f55973c0700  0 -- 192.168.42.22:6806/28596 >> 192.168.42.16:6828/38905 pipe(0x230f0000 sd=358 :6806 s=2 pgs=14268 cs=3 l=0 c=0x6d9cb00).fault with nothing to send, going to standby
2015-09-19 00:41:43.246421 7f55ba277700  0 log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 30.253274 secs
2015-09-19 00:41:43.246428 7f55ba277700  0 log [WRN] : slow request 30.253274 seconds old, received at 2015-09-19 00:41:12.993123: osd_op(client.31626115.1:4664205553 rb.0.1c918ad.238e1f29.000000002da9 [write 3063808~16384] 6.604ba242 snapc 10aaf=[10aaf,10a31,109b3] ondisk+write e847952) v4 currently waiting for subops from 15
2015-09-19 00:42:13.251591 7f55ba277700  0 log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 60.258446 secs
2015-09-19 00:42:13.251596 7f55ba277700  0 log [WRN] : slow request 60.258446 seconds old, received at 2015-09-19 00:41:12.993123: osd_op(client.31626115.1:4664205553 rb.0.1c918ad.238e1f29.000000002da9 [write 3063808~16384] 6.604ba242 snapc 10aaf=[10aaf,10a31,109b3] ondisk+write e847952) v4 currently waiting for subops from 15

So the blocking seem to be the "JOURNAL FULL" event, with big numbers. 
3548528640, is the journal size ?
I just reduced the filestore_max_sync_interval to 30s, and everything
seems to work fine.

For SSD OSD, with journal on same device, big journal is a crazy
thing... I suppose I break this setup when trying to tune the journal
for the HDD pool.

At same time, is there tips tuning journal in case of HDD OSD, with
(potentially big) SSD journal, and hardware RAID card which handle
write back ?

Thanks for your help.

Olivier


Le vendredi 18 septembre 2015 à 02:35 +0200, Olivier Bonvalet a écrit :
> Hi,
> 
> I have a cluster with lot of blocked operations each time I try to
> move
> data (by reweighting a little an OSD).
> 
> It's a full SSD cluster, with 10GbE network.
> 
> In logs, when I have blocked OSD, on the main OSD I can see that :
> 2015-09-18 01:55:16.981396 7f89e8cb8700  0 log [WRN] : 2 slow
> requests, 1 included below; oldest blocked for > 33.976680 secs
> 2015-09-18 01:55:16.981402 7f89e8cb8700  0 log [WRN] : slow request
> 30.125556 seconds old, received at 2015-09-18 01:54:46.855821:
> osd_op(client.29760717.1:18680817544
> rb.0.1c16005.238e1f29.00000000027f [write 180224~16384] 6.c11916a4
> snapc 11065=[11065,10fe7,10f69] ondisk+write e845819) v4 currently
> reached pg
> 2015-09-18 01:55:46.986319 7f89e8cb8700  0 log [WRN] : 2 slow
> requests, 1 included below; oldest blocked for > 63.981596 secs
> 2015-09-18 01:55:46.986324 7f89e8cb8700  0 log [WRN] : slow request
> 60.130472 seconds old, received at 2015-09-18 01:54:46.855821:
> osd_op(client.29760717.1:18680817544
> rb.0.1c16005.238e1f29.00000000027f [write 180224~16384] 6.c11916a4
> snapc 11065=[11065,10fe7,10f69] ondisk+write e845819) v4 currently
> reached pg
> 
> How should I read that ? What this OSD is waiting for ?
> 
> Thanks for any help,
> 
> Olivier
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux