Re: Log message --> "bdev(/var/lib/ceph/osd/ceph-x/block) aio_submit retries"

On Wed, 15 Mar 2017, Brad Hubbard wrote:
> +ceph-devel
> 
> On Wed, Mar 15, 2017 at 5:25 PM, nokia ceph <nokiacephusers@xxxxxxxxx> wrote:
> > Hello,
> >
> > We see these messages not only at OSD creation time but also when the
> > cluster is idle. What is the impact of these errors? Can we safely ignore
> > them, or is there a config change that fixes the problem?
> >
> > A few occurrences of these events follow:
> >
> > ====
> > 2017-03-14 17:16:09.500370 7fedeba61700  4 rocksdb: (Original Log Time
> > 2017/03/14-17:16:09.453130) [default] Level-0 commit table #60 started
> > 2017-03-14 17:16:09.500374 7fedeba61700  4 rocksdb: (Original Log Time
> > 2017/03/14-17:16:09.500273) [default] Level-0 commit table #60: memtable #1
> > done
> > 2017-03-14 17:16:09.500376 7fedeba61700  4 rocksdb: (Original Log Time
> > 2017/03/14-17:16:09.500297) EVENT_LOG_v1 {"time_micros": 1489511769500289,
> > "job": 17, "event": "flush_finished", "lsm_state": [2, 4, 6, 0, 0, 0, 0],
> > "immutable_memtables": 0}
> > 2017-03-14 17:16:09.500382 7fedeba61700  4 rocksdb: (Original Log Time
> > 2017/03/14-17:16:09.500330) [default] Level summary: base level 1 max bytes
> > base 268435456 files[2 4 6 0 0 0 0] max score 0.76
> >
> > 2017-03-14 17:16:09.500390 7fedeba61700  4 rocksdb: [JOB 17] Try to delete
> > WAL files size 244090350, prev total WAL file size 247331500, number of live
> > WAL files 2.
> >
> > 2017-03-14 17:34:11.610513 7fedf3a71700 -1
> > bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 6
> 
> These errors come from here.
> 
> void KernelDevice::aio_submit(IOContext *ioc)
> {
> ...
>     int r = aio_queue.submit(*cur, &retries);
>     if (retries)
>       derr << __func__ << " retries " << retries << dendl;
> 
> The submit function is this one which calls libaio's io_submit
> function directly and increments retries if it receives EAGAIN.
> 
> #if defined(HAVE_LIBAIO)
> int FS::aio_queue_t::submit(aio_t &aio, int *retries)
> {
>   // 2^16 * 125us = ~8 seconds, so max sleep is ~16 seconds
>   int attempts = 16;
>   int delay = 125;
>   iocb *piocb = &aio.iocb;
>   while (true) {
>     int r = io_submit(ctx, 1, &piocb);     <-------------NOTE
>     if (r < 0) {
>       if (r == -EAGAIN && attempts-- > 0) {     <-------------NOTE
>         usleep(delay);
>         delay *= 2;
>         (*retries)++;
>         continue;
>       }
>       return r;
>     }
>     assert(r == 1);
>     break;
>   }
>   return 0;
> }
> 
> 
> From the man page.
> 
> IO_SUBMIT(2)              Linux Programmer's Manual              IO_SUBMIT(2)
> 
> NAME
>        io_submit - submit asynchronous I/O blocks for processing
> ...
> RETURN VALUE
>        On success, io_submit() returns the number of iocbs submitted (which may
>        be 0 if nr is zero).  For the failure return, see NOTES.
> 
> ERRORS
>        EAGAIN Insufficient resources are available to queue any iocbs.
> 
> I suspect increasing bdev_aio_max_queue_depth may help here but some
> of the other devs may have more/better ideas.

Yes--try increasing bdev_aio_max_queue_depth.  It defaults to 32; try 
changing it to 128, 1024, or 4096 and see if these errors go away.
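
As a rough sketch of that change, assuming the option is set in ceph.conf
under the [osd] section and the OSDs are restarted afterward (the queue
depth is probably only read when the block device is opened, so treat the
restart as an assumption to verify):

```ini
[osd]
# Assumption: 128 is only the first value to try; the default is 32.
# If the "aio_submit retries" messages persist, try 1024 or 4096.
bdev_aio_max_queue_depth = 128
```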

I've never been able to trigger this on my test boxes, but I put in the 
warning to help ensure we pick a good default.
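
For a sense of scale, the retry count in the log line can be turned into
time spent sleeping. The sketch below mirrors the backoff in the quoted
submit loop (125us initial delay, doubled after every EAGAIN). The helper
name total_backoff_us is hypothetical and not part of Ceph, and it assumes
all the reported retries happened within a single submit call:

```cpp
#include <cstdint>

// Hypothetical helper, not part of Ceph: total microseconds slept after
// `retries` EAGAIN retries, following the quoted loop where the delay
// starts at 125us and doubles after each sleep.
std::uint64_t total_backoff_us(int retries,
                               std::uint64_t initial_delay_us = 125) {
  std::uint64_t delay = initial_delay_us;
  std::uint64_t total = 0;
  for (int i = 0; i < retries; ++i) {
    total += delay;  // usleep(delay) in the quoted code
    delay *= 2;      // exponential backoff
  }
  return total;
}
```

Under those assumptions, total_backoff_us(6) is 7875, so "retries 6" means
the submitting thread slept a little under 8 ms in total. Using all 16
attempts gives total_backoff_us(16) = 8191875, roughly 8.2 seconds.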

What kernel version are you running?

Thanks!
sage