On Wed, 15 Mar 2017, Brad Hubbard wrote:
> +ceph-devel
>
> On Wed, Mar 15, 2017 at 5:25 PM, nokia ceph <nokiacephusers@xxxxxxxxx> wrote:
> > Hello,
> >
> > We suspect these messages not only at the time of OSD creation. But in idle
> > conditions also. May I know what is the impact of these error? Can we safely
> > ignore this? Or is there any way/config to fix this problem
> >
> > Few occurrence for these events as follows:---
> >
> > ====
> > 2017-03-14 17:16:09.500370 7fedeba61700 4 rocksdb: (Original Log Time
> > 2017/03/14-17:16:09.453130) [default] Level-0 commit table #60 started
> > 2017-03-14 17:16:09.500374 7fedeba61700 4 rocksdb: (Original Log Time
> > 2017/03/14-17:16:09.500273) [default] Level-0 commit table #60: memtable #1
> > done
> > 2017-03-14 17:16:09.500376 7fedeba61700 4 rocksdb: (Original Log Time
> > 2017/03/14-17:16:09.500297) EVENT_LOG_v1 {"time_micros": 1489511769500289,
> > "job": 17, "event": "flush_finished", "lsm_state": [2, 4, 6, 0, 0, 0, 0],
> > "immutable_memtables": 0}
> > 2017-03-14 17:16:09.500382 7fedeba61700 4 rocksdb: (Original Log Time
> > 2017/03/14-17:16:09.500330) [default] Level summary: base level 1 max bytes
> > base 268435456 files[2 4 6 0 0 0 0] max score 0.76
> >
> > 2017-03-14 17:16:09.500390 7fedeba61700 4 rocksdb: [JOB 17] Try to delete
> > WAL files size 244090350, prev total WAL file size 247331500, number of live
> > WAL files 2.
> >
> > 2017-03-14 17:34:11.610513 7fedf3a71700 -1
> > bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 6
>
> These errors come from here.
>
> void KernelDevice::aio_submit(IOContext *ioc)
> {
> ...
>   int r = aio_queue.submit(*cur, &retries);
>   if (retries)
>     derr << __func__ << " retries " << retries << dendl;
>
> The submit function is this one which calls libaio's io_submit
> function directly and increments retries if it receives EAGAIN.
>
> #if defined(HAVE_LIBAIO)
> int FS::aio_queue_t::submit(aio_t &aio, int *retries)
> {
>   // 2^16 * 125us = ~8 seconds, so max sleep is ~16 seconds
>   int attempts = 16;
>   int delay = 125;
>   iocb *piocb = &aio.iocb;
>   while (true) {
>     int r = io_submit(ctx, 1, &piocb); <-------------NOTE
>     if (r < 0) {
>       if (r == -EAGAIN && attempts-- > 0) { <-------------NOTE
>         usleep(delay);
>         delay *= 2;
>         (*retries)++;
>         continue;
>       }
>       return r;
>     }
>     assert(r == 1);
>     break;
>   }
>   return 0;
> }
>
>
> From the man page.
>
> IO_SUBMIT(2)            Linux Programmer's Manual            IO_SUBMIT(2)
>
> NAME
>        io_submit - submit asynchronous I/O blocks for processing
> ...
> RETURN VALUE
>        On success, io_submit() returns the number of iocbs submitted
>        (which may be 0 if nr is zero). For the failure
>        return, see NOTES.
>
> ERRORS
>        EAGAIN Insufficient resources are available to queue any iocbs.
>
> I suspect increasing bdev_aio_max_queue_depth may help here but some
> of the other devs may have more/better ideas.

Yes--try increasing bdev_aio_max_queue_depth. It defaults to 32; try
changing it to 128, 1024, or 4096 and see if these errors go away.

I've never been able to trigger this on my test boxes, but I put in the
warning to help ensure we pick a good default.

What kernel version are you running?

Thanks!
sage
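
A footnote on the retry loop quoted above: the backoff arithmetic can be
checked with a small standalone sketch. It assumes the loop behaves exactly
as quoted (16 attempts, 125 us initial delay, doubling after every sleep).
The names Backoff and total_backoff_usec are made up for illustration and
are not part of Ceph.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not Ceph code: models the sleeps performed by a retry
// loop that starts at initial_delay_us and doubles the delay after each sleep.
struct Backoff {
  std::uint64_t total_us;  // sum of all sleeps if every attempt hits EAGAIN
  std::uint64_t last_us;   // longest (final) single sleep
};

Backoff total_backoff_usec(int attempts, std::uint64_t initial_delay_us) {
  assert(attempts >= 0);
  Backoff b{0, 0};
  std::uint64_t delay = initial_delay_us;
  // Mirrors `attempts-- > 0`: exactly `attempts` sleeps happen before the
  // loop gives up and returns the error.
  for (int i = 0; i < attempts; ++i) {
    b.total_us += delay;
    b.last_us = delay;
    delay *= 2;
  }
  return b;
}
```

With the quoted constants, total_backoff_usec(16, 125) gives a total of
8191875 us (about 8.2 seconds) and a longest single sleep of 4096000 us
(about 4.1 seconds). So if the loop is as quoted, an OSD thread that hits
EAGAIN on every attempt stalls for roughly 8 seconds before io_submit's
error is returned, and the "~16 seconds" in the source comment reads as a
loose upper bound rather than the exact total.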