Re: Fwd: aio stalls in recent BlueStore

On Tue, 31 Jan 2017, Igor Fedotov wrote:
> The issue was introduced by commit:
> 
> c776710a8ec04a2fd810fc5a1899fa8868cbc80a
> 
> os/bluestore/BlueFS: increase size threshold before we flush
> 
> Mainly the bug applies to 'dummy' writes, i.e. writes that update object
> metadata only (enabled by bluestore_debug_omit_block_device_write=true).
> 
> See http://tracker.ceph.com/issues/18742 for more details.

That's super weird!  It looks to me like the small flush size is masking 
some underlying bug, since the IO submitted earlier is what takes >90s to 
complete.  A few thoughts:

1- We can verify the device is actually doing the IO by capturing a 
blktrace during the run...

2- We've had bugs in the past with subtle races or missed wakeups in the 
aio completion path.  This is my guess...  were there any other aios 
that were submitted and/or completed when the aio finally came back?  It's 
odd that it is always ~90s.
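
Just to illustrate the kind of race I mean, here is a minimal hypothetical
sketch (not the actual KernelDevice/BlueFS code): if the waiter peeks at the
outstanding-aio count and then blocks without re-checking it under the same
lock the completion thread takes, a completion that lands in between never
wakes it up, and the waiter only gets unstuck by some later, unrelated
signal:

#include <condition_variable>
#include <mutex>

// Hypothetical sketch only -- not the actual KernelDevice/BlueFS code.
struct AioWaiter {
  std::mutex lock;
  std::condition_variable cond;
  int num_running = 0;               // outstanding aios

  // completion thread: called once per finished aio
  void aio_finish() {
    std::lock_guard<std::mutex> l(lock);
    if (--num_running == 0)
      cond.notify_all();
  }

  // racy waiter: peeks at the counter, then waits with no predicate.
  // If aio_finish() runs between the peek and the wait, the notify is
  // missed and we block until something else signals.
  void aio_wait_racy() {
    if (num_running == 0)            // unlocked peek
      return;
    std::unique_lock<std::mutex> l(lock);
    cond.wait(l);                    // lost-wakeup hazard
  }

  // safe waiter: the predicate is re-evaluated under the lock, so a
  // completion can never slip through unnoticed.
  void aio_wait() {
    std::unique_lock<std::mutex> l(lock);
    cond.wait(l, [this] { return num_running == 0; });
  }
};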

sage



> 
> Thanks,
> 
> Igor
> 
> 
> -------- Forwarded Message --------
> Subject:     aio stalls in recent BlueStore
> Date:     Mon, 30 Jan 2017 20:34:09 +0300
> From:     Igor Fedotov <ifedotov@xxxxxxxxxxxx>
> To:     ceph-devel <ceph-devel@xxxxxxxxxxxxxxx>
> 
> 
> Hi,
> 
> As I mentioned during the BlueStore standup, I'm observing write stalls
> when performing an append-only test case with FIO.
> 
> Here is a log snippet.
> 
> Full log is available at
> https://drive.google.com/file/d/0B-4q9QFReegLd1lNVTdzUXRxZ2M/view?usp=sharing
> 
> Look for "LONG AIO_WAIT!!!" to locate all the occurrences.
> 
> A master branch clone made in mid-December doesn't have the issue.
> 
> 2017-01-30 17:08:01.563477 7f72497f2700 20
> bdev(./fio-bluestore/block.wal) aio_write 0x233a000~81000 (direct)
> 2017-01-30 17:08:01.563650 7f72497f2700 20
> bdev(./fio-bluestore/block.wal) aio_write rebuilding buffer to be aligned
> 2017-01-30 17:08:01.563660 7f72497f2700 20
> bdev(./fio-bluestore/block.wal) _aio_log_start 0x233a000~81000
> 2017-01-30 17:08:01.563666 7f72497f2700  5
> bdev(./fio-bluestore/block.wal) aio_write 0x233a000~81000 aio 0x7f72301b3940
> 2017-01-30 17:08:01.563669 7f72497f2700 20
> bdev(./fio-bluestore/block.wal) aio_submit ioc 0x7f7230091ee0 pending 1
> running 0
> 2017-01-30 17:08:01.563671 7f72497f2700 20
> bdev(./fio-bluestore/block.wal) aio_submit  aio 0x7f72301b3940 fd 13
> 0x233a000~81000
> 2017-01-30 17:08:01.563757 7f72497f2700 20 bluefs _flush_range h
> 0x7f72301dd090 pos now 0x1eba17c
> 
> 2017-01-30 17:08:01.563807 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 537 < min_flush_size 524288
> 2017-01-30 17:08:01.563831 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 1080 < min_flush_size 524288
> 2017-01-30 17:08:01.563851 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 1618 < min_flush_size 524288
> 2017-01-30 17:08:01.563874 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 2163 < min_flush_size 524288
> 2017-01-30 17:08:01.563896 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 2701 < min_flush_size 524288
> 2017-01-30 17:08:01.563923 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 3239 < min_flush_size 524288
> 2017-01-30 17:08:01.563947 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 3776 < min_flush_size 524288
> 2017-01-30 17:08:01.563971 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 4320 < min_flush_size 524288
> 2017-01-30 17:08:01.563995 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 4864 < min_flush_size 524288
> 2017-01-30 17:08:01.564026 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 5409 < min_flush_size 524288
> 2017-01-30 17:08:01.564043 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 5946 < min_flush_size 524288
> 2017-01-30 17:08:01.564069 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 6483 < min_flush_size 524288
> 2017-01-30 17:08:01.564097 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 7028 < min_flush_size 524288
> 2017-01-30 17:08:01.564120 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 7565 < min_flush_size 524288
> 2017-01-30 17:08:01.564141 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 8103 < min_flush_size 524288
> 2017-01-30 17:08:01.564163 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 8646 < min_flush_size 524288
> 2017-01-30 17:08:01.564180 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 9183 < min_flush_size 524288
> 2017-01-30 17:08:01.564204 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 9720 < min_flush_size 524288
> 2017-01-30 17:08:01.564225 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 10265 < min_flush_size 524288
> 2017-01-30 17:08:01.564253 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 10802 < min_flush_size 524288
> 2017-01-30 17:08:01.564275 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 11346 < min_flush_size 524288
> 2017-01-30 17:08:01.564297 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 11883 < min_flush_size 524288
> 2017-01-30 17:08:01.564331 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 12426 < min_flush_size 524288
> 2017-01-30 17:08:01.564357 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 12963 < min_flush_size 524288
> 2017-01-30 17:08:01.564379 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 13508 < min_flush_size 524288
> 2017-01-30 17:08:01.564408 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 14051 < min_flush_size 524288
> 2017-01-30 17:08:01.564431 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 14595 < min_flush_size 524288
> 2017-01-30 17:08:01.564499 7f72497f2700 10 bluefs get_usage bdev 0 free
> 160581025792 (149 GB) / 161061269504 (149 GB), used 0%
> 2017-01-30 17:08:01.564510 7f72497f2700 10 bluefs get_usage bdev 1 free
> 107338530816 (102366 MB) / 107373125632 (102398 MB), used 0%
> 2017-01-30 17:08:01.564513 7f72497f2700 10 bluefs get_usage bdev 2 free
> 15032385536 (14336 MB) / 15033425920 (14336 MB), used 0%
> 2017-01-30 17:08:01.564519 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 14618 < min_flush_size 524288
> 2017-01-30 17:08:01.564521 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> ignoring, length 14618 < min_flush_size 524288
> 2017-01-30 17:08:01.564524 7f72497f2700 10 bluefs _fsync 0x7f72301dd090
> file(ino 15 size 0xdbe066c mtime 2017-01-30 17:07:58.151831 bdev 0
> extents [0:0x500000+c400000,0:0xcd00000+1800000])
> 2017-01-30 17:08:01.564538 7f72497f2700 10 bluefs _flush 0x7f72301dd090
> 0x1eba17c~391a to file(ino 15 size 0xdbe066c mtime 2017-01-30
> 17:07:58.151831 bdev 0 extents [0:0x500000+c400000,0:0xcd00000+1800000])
> 2017-01-30 17:08:01.564543 7f72497f2700 10 bluefs _flush_range
> 0x7f72301dd090 pos 0x1eba17c 0x1eba17c~391a to file(ino 15 size
> 0xdbe066c mtime 2017-01-30 17:07:58.151831 bdev 0 extents
> [0:0x500000+c400000,0:0xcd00000+1800000])
> 2017-01-30 17:08:01.564548 7f72497f2700 20 bluefs _flush_range file now
> file(ino 15 size 0xdbe066c mtime 2017-01-30 17:07:58.151831 bdev 0
> extents [0:0x500000+c400000,0:0xcd00000+1800000])
> 2017-01-30 17:08:01.564555 7f72497f2700 20 bluefs _flush_range in
> 0:0x500000+c400000 x_off 0x1eba17c
> 2017-01-30 17:08:01.564558 7f72497f2700 20 bluefs _flush_range using
> partial tail 0x17c
> 2017-01-30 17:08:01.564560 7f72497f2700 20 bluefs _flush_range waiting
> for previous aio to complete
> 2017-01-30 17:08:01.564562 7f72497f2700 10 bdev aio_wait 0x7f7230091ee0
> waiting for 1 aios and/or 0 readers to complete
> 
> >>>>>90 seconds!
> 
> 2017-01-30 17:09:32.117785 7f725d108700 20
> bdev(./fio-bluestore/block.wal) _aio_log_finish 1 0x233a000~81000
> 2017-01-30 17:09:32.117819 7f725d108700 10
> bdev(./fio-bluestore/block.wal) _aio_thread finished aio 0x7f72301b3940
> r 528384 ioc 0x7f7230091ee0 with  aios left
> 2017-01-30 17:09:32.117886 7f72497f2700 20 bdev aio_wait 0x7f7230091ee0 done
> 2017-01-30 17:09:32.117904 7f72497f2700 -1 bdev LONG AIO_WAIT!!!
> 2017-01-30 17:09:32.117914 7f72497f2700 20 bdev aio_wait 0x7f7230083f60 done
> 2017-01-30 17:09:32.117949 7f72497f2700 20 bdev aio_wait 0x7f72301ae810 done
> 2017-01-30 17:09:32.118018 7f72497f2700 20 bluefs _flush_range caching
> tail of 0xa96 and padding block with 0x56a
> 2017-01-30 17:09:32.118043 7f72497f2700 20
> bdev(./fio-bluestore/block.wal) aio_write 0x23ba000~4000 (direct)
> 2017-01-30 17:09:32.118100 7f72497f2700 20
> bdev(./fio-bluestore/block.wal) aio_write rebuilding buffer to be aligned
> 2017-01-30 17:09:32.118109 7f72497f2700 20
> bdev(./fio-bluestore/block.wal) _aio_log_start 0x23ba000~4000
> 2017-01-30 17:09:32.118123 7f72497f2700  5
> bdev(./fio-bluestore/block.wal) aio_write 0x23ba000~4000 aio 0x7f723000d6d0
> 2017-01-30 17:09:32.118127 7f72497f2700 20
> bdev(./fio-bluestore/block.wal) aio_submit ioc 0x7f7230091ee0 pending 1
> running 0
> 2017-01-30 17:09:32.118133 7f72497f2700 20
> bdev(./fio-bluestore/block.wal) aio_submit  aio 0x7f723000d6d0 fd 13
> 0x23ba000~4000
> 2017-01-30 17:09:32.118205 7f72497f2700 20 bluefs _flush_range h
> 0x7f72301dd090 pos now 0x1ebda96
> 2017-01-30 17:09:32.118219 7f72497f2700 10 bluefs _claim_completed_aios
> got 2 aios
> 2017-01-30 17:09:32.118224 7f72497f2700 10 bluefs wait_for_aio
> 0x7f72301dd090
> 2017-01-30 17:09:32.118227 7f72497f2700 10 bdev aio_wait 0x7f7230091ee0
> waiting for 1 aios and/or 0 readers to complete
> 2017-01-30 17:09:32.118316 7f725d108700 20
> bdev(./fio-bluestore/block.wal) _aio_log_finish 1 0x23ba000~4000
> 2017-01-30 17:09:32.118332 7f725d108700 10
> bdev(./fio-bluestore/block.wal) _aio_thread finished aio 0x7f723000d6d0
> r 16384 ioc 0x7f7230091ee0 with  aios left
> 2017-01-30 17:09:32.118376 7f72497f2700 20 bdev aio_wait 0x7f7230091ee0 done
> 2017-01-30 17:09:32.118381 7f72497f2700 20 bdev aio_wait 0x7f7230083f60 done
> 2017-01-30 17:09:32.118402 7f72497f2700 20 bdev aio_wait 0x7f72301ae810 done
> 


