RE: Long tail latency due to journal aio io_submit takes long time to return

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Please check this PR:&nbsp;<a href="https://github.com/ceph/ceph/pull/5670"; target="_blank">https://github.com/ceph/ceph/pull/5670</a><br><br>Thanks.<br>Zhi Zhang (David)<br><br><br>----------------------------------------<br>&gt; Date: Tue, 25 Aug 2015 06:10:19 -0700<br>&gt; From: sage@xxxxxxxxxxxx<br>&gt; To: zhangz.david@xxxxxxxxxxx<br>&gt; CC: ceph-devel@xxxxxxxxxxxxxxx<br>&gt; Subject: Re: Long tail latency due to journal aio io_submit takes long time to return<br>&gt;<br>&gt; On Tue, 25 Aug 2015, Z Zhang wrote:<br>&gt; &gt; Hi Ceph-devel,<br>&gt; &gt;<br>&gt; &gt; Recently we find there are some (not much) long tail latency due to<br>&gt; &gt; journal aio io_submit takes long time to return. We added some logs in<br>&gt; &gt; the FileJouranl.cc and get following output as an example. AIO's<br>&gt; &gt; io_submit function takes ~400ms to return, occasionally even takes up to<br>&gt; &gt; 900ms.<br>&gt; &gt;<br>&gt; &gt; ------------------------------------------------------------<br>&gt; &gt; 2015-08-25 15:24:39.535206 7f22699c2700 20 journal write_thread_entry woke up<br>&gt; &gt; 2015-08-25 15:24:39.535212 7f22699c2700 10 journal room 10460594175 max_size 10484711424 pos 4786679808 header.start 4762566656 top 4096<br>&gt; &gt; 2015-08-25 15:24:39.535215 7f22699c2700 10 journal check_for_full at 4786679808 : 532480 &lt; 10460594175<br>&gt; &gt; 2015-08-25 15:24:39.535216 7f22699c2700 15 journal prepare_single_write 1 will write 4786679808 : seq 13075224 len 526537 -&gt; 532480 (head 40 pre_pad 4046 ebl 526537 post_pad 1817 tail 40) (ebl alignment 4086)<br>&gt; &gt; 2015-08-25 15:24:39.535322 7f22699c2700 20 journal prepare_multi_write queue_pos now 4787212288<br>&gt; &gt; 2015-08-25 15:24:39.535324 7f22699c2700 15 journal do_aio_write writing 4786679808~532480<br>&gt; &gt; 2015-08-25 15:24:39.535434 7f22699c2700 10 journal align_bl total memcopy: 532480<br>&gt; &gt; 2015-08-25 15:24:39.535439 7f22699c2700 20 journal write_aio_bl 4786679808~532480 seq 13075224<br>&gt; &gt; 2015-08-25 15:24:39.535442 7f22699c
2700 20 journal write_aio_bl .. 4786679808~532480 in 1<br>&gt; &gt; 2015-08-25 15:24:39.535444 7f22699c2700 5 journal io_submit starting...<br>&gt; &gt;<br>&gt; &gt; 2015-08-25 15:24:39.990372 7f22699c2700 5 journal io_submit return value: 1<br>&gt; &gt; 2015-08-25 15:24:39.990396 7f22699c2700 5 journal put_throttle finished 1 ops and 526537 bytes, now 12 ops and 6318444 bytes<br>&gt; &gt; 2015-08-25 15:24:39.990401 7f22691c1700 20 journal write_finish_thread_entry waiting for aio(s)<br>&gt; &gt; 2015-08-25 15:24:39.990406 7f22699c2700 20 journal write_thread_entry aio throttle: aio num 1 bytes 532480 ... exp 2 min_new 4 ... pending 6318444<br>&gt; &gt; 2015-08-25 15:24:39.990410 7f22699c2700 10 journal room 10460061695 max_size 10484711424 pos 4787212288 header.start 4762566656 top 4096<br>&gt; &gt; 2015-08-25 15:24:39.990413 7f22699c2700 10 journal check_for_full at 4787212288 : 532480 &lt; 10460061695<br>&gt; &gt; 2015-08-25 15:24:39.990415 7f22699c2700 15 journal prepare_single_write 1 will write 4787212288 : seq 13075225 len 526537 -&gt; 532480 (head 40 pre_pad 4046 ebl 526537 post_pad 1817 tail 40) (ebl alignment 4086)<br>&gt; &gt; ------------------------------------------------------------<br>&gt;<br>&gt; This is probably the kernel throttling us because the request queue is<br>&gt; full / device is busy / whatever.<br>&gt;<br>&gt; &gt; And write_aio_bl will hold aio_lock until it returns, so<br>&gt; &gt; write_finish_thread_entry can't process completed aio when io_submit is<br>&gt; &gt; blocking. We reduced the aio_lock's locking scope in<br>&gt; &gt; write_aio_bl, specifically not locking io_submit. It doesn't seem to<br>&gt; &gt; need lock on io_submit in journal because there is only 1<br>&gt; &gt; write_thread_entry and 1 write_finish_thread_entry.<br>&gt; &gt;<br>&gt; &gt; This might help those completed aio to get processed asap. But for the<br>&gt; &gt; one blocked in io_submit and the ones in the queue, they still need to<br>&gt; &gt; take more time.<br>&gt; &gt;<br>&gt; &gt; Any idea about this
?<br>&gt;<br>&gt; That sounds like the right fix! I don't think there is anything else to<br>&gt; be done.. if the kernel is throttling us it's generally for a good reason,<br>&gt; and there isn't anything else we can do.<br>&gt;<br>&gt; Do you have a patch?<br>&gt;<br>&gt; sage<br> 		 	   		  ?韬{.n?????%??檩??w?{.n????u朕?Ф?塄}?财??j:+v??????2??璀??摺?囤??z夸z罐?+?????w棹f




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux