On Thu, 16 Mar 2017, Brad Hubbard wrote:
> On Thu, Mar 16, 2017 at 4:33 PM, nokia ceph <nokiacephusers@xxxxxxxxx> wrote:
> > Hello Brad,
> >
> > I meant for this parameter bdev_aio_max_queue_depth , Sage suggested try
> > diff values, 128,1024 , 4096 . So my doubt how this calculation happens? Is
> > this related to memory?
>
> The bdev_aio_max_queue_depth parameter represents the nr_events
> argument to the libaio io_setup function.
>
> int io_setup(unsigned nr_events, aio_context_t *ctx_idp);
>
> From the man page for io_setup:
>
> "The io_setup() system call creates an asynchronous I/O context
> suitable for concurrently processing nr_events operations."
>
> The current theory we are working with is that io_submit is returning
> EAGAIN because nr_events is too small at the default of 32. Therefore
> we have suggested raising this value. There is no real calculation
> involved in the values Sage is suggesting other than they are
> *larger*. It's a matter of playing with the value to see if, and when,
> the error messages go away. If we know a larger value reduces or
> eradicates the error we can then turn our focus more to *why*. Longer
> term this can assist us in setting a more reasonable default.

One nuance is that small values are equivalent because the kernel
apparently rounds up to a page size-aligned buffer full of struct iocb's
(or whatever the kernel equivalent is). My guess is that we want a
default that equates to 2 or 4 pages instead of 1 page.

We do lots of testing of the same kernel in the lab; I'm adding an item to
my list to look for these messages in our logs too.

FWIW, as long as the retry is succeeding this is pretty harmless (we're
basically just limiting the depth of the io queue at the kernel and device
to some probably-reasonable value). My curiosity here is somewhat
academic. :)

Thanks!
sage

>
> >
> > Thanks
> >
> >
> >
> >
> > On Thu, Mar 16, 2017 at 11:53 AM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
> >>
> >> On Thu, Mar 16, 2017 at 4:15 PM, nokia ceph <nokiacephusers@xxxxxxxxx>
> >> wrote:
> >> > Hello,
> >> >
> >> > We are running latest kernel - 3.10.0-514.2.2.el7.x86_64 { RHEL 7.3 }
> >> >
> >> > Sure I will try to alter this directive - bdev_aio_max_queue_depth and
> >> > will share our results.
> >> >
> >> > Could you please explain how this calculation happens?
> >>
> >> What calculation are you referring to?
> >>
> >> > Thanks
> >> >
> >> >
> >> > On Wed, Mar 15, 2017 at 7:54 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> >>
> >> >> On Wed, 15 Mar 2017, Brad Hubbard wrote:
> >> >> > +ceph-devel
> >> >> >
> >> >> > On Wed, Mar 15, 2017 at 5:25 PM, nokia ceph
> >> >> > <nokiacephusers@xxxxxxxxx> wrote:
> >> >> > > Hello,
> >> >> > >
> >> >> > > We suspect these messages not only at the time of OSD creation. But
> >> >> > > in idle conditions also. May I know what is the impact of these
> >> >> > > error? Can we safely ignore this?
> >> >> > > Or is there any way/config to fix this problem
> >> >> > >
> >> >> > > Few occurrence for these events as follows:---
> >> >> > >
> >> >> > > ====
> >> >> > > 2017-03-14 17:16:09.500370 7fedeba61700 4 rocksdb: (Original Log Time 2017/03/14-17:16:09.453130) [default] Level-0 commit table #60 started
> >> >> > > 2017-03-14 17:16:09.500374 7fedeba61700 4 rocksdb: (Original Log Time 2017/03/14-17:16:09.500273) [default] Level-0 commit table #60: memtable #1 done
> >> >> > > 2017-03-14 17:16:09.500376 7fedeba61700 4 rocksdb: (Original Log Time 2017/03/14-17:16:09.500297) EVENT_LOG_v1 {"time_micros": 1489511769500289, "job": 17, "event": "flush_finished", "lsm_state": [2, 4, 6, 0, 0, 0, 0], "immutable_memtables": 0}
> >> >> > > 2017-03-14 17:16:09.500382 7fedeba61700 4 rocksdb: (Original Log Time 2017/03/14-17:16:09.500330) [default] Level summary: base level 1 max bytes base 268435456 files[2 4 6 0 0 0 0] max score 0.76
> >> >> > >
> >> >> > > 2017-03-14 17:16:09.500390 7fedeba61700 4 rocksdb: [JOB 17] Try to delete WAL files size 244090350, prev total WAL file size 247331500, number of live WAL files 2.
> >> >> > >
> >> >> > > 2017-03-14 17:34:11.610513 7fedf3a71700 -1 bdev(/var/lib/ceph/osd/ceph-73/block) aio_submit retries 6
> >> >> >
> >> >> > These errors come from here.
> >> >> >
> >> >> > void KernelDevice::aio_submit(IOContext *ioc)
> >> >> > {
> >> >> > ...
> >> >> >   int r = aio_queue.submit(*cur, &retries);
> >> >> >   if (retries)
> >> >> >     derr << __func__ << " retries " << retries << dendl;
> >> >> >
> >> >> > The submit function is this one which calls libaio's io_submit
> >> >> > function directly and increments retries if it receives EAGAIN.
> >> >> >
> >> >> > #if defined(HAVE_LIBAIO)
> >> >> > int FS::aio_queue_t::submit(aio_t &aio, int *retries)
> >> >> > {
> >> >> >   // 2^16 * 125us = ~8 seconds, so max sleep is ~16 seconds
> >> >> >   int attempts = 16;
> >> >> >   int delay = 125;
> >> >> >   iocb *piocb = &aio.iocb;
> >> >> >   while (true) {
> >> >> >     int r = io_submit(ctx, 1, &piocb); <-------------NOTE
> >> >> >     if (r < 0) {
> >> >> >       if (r == -EAGAIN && attempts-- > 0) { <-------------NOTE
> >> >> >         usleep(delay);
> >> >> >         delay *= 2;
> >> >> >         (*retries)++;
> >> >> >         continue;
> >> >> >       }
> >> >> >       return r;
> >> >> >     }
> >> >> >     assert(r == 1);
> >> >> >     break;
> >> >> >   }
> >> >> >   return 0;
> >> >> > }
> >> >> >
> >> >> >
> >> >> > From the man page.
> >> >> >
> >> >> > IO_SUBMIT(2)          Linux Programmer's Manual          IO_SUBMIT(2)
> >> >> >
> >> >> > NAME
> >> >> >        io_submit - submit asynchronous I/O blocks for processing
> >> >> > ...
> >> >> > RETURN VALUE
> >> >> >        On success, io_submit() returns the number of iocbs submitted
> >> >> >        (which may be 0 if nr is zero). For the failure return, see
> >> >> >        NOTES.
> >> >> >
> >> >> > ERRORS
> >> >> >        EAGAIN Insufficient resources are available to queue any
> >> >> >        iocbs.
> >> >> >
> >> >> > I suspect increasing bdev_aio_max_queue_depth may help here but some
> >> >> > of the other devs may have more/better ideas.
> >> >>
> >> >> Yes--try increasing bdev_aio_max_queue_depth. It defaults to 32; try
> >> >> changing it to 128, 1024, or 4096 and see if these errors go away.
> >> >>
> >> >> I've never been able to trigger this on my test boxes, but I put in the
> >> >> warning to help ensure we pick a good default.
> >> >>
> >> >> What kernel version are you running?
> >> >>
> >> >> Thanks!
> >> >> sage
> >> >
> >> >
> >>
> >>
> >> --
> >> Cheers,
> >> Brad
> >
> >
> >
>
> --
> Cheers,
> Brad
>
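
For anyone reading "aio_submit retries 6" in the log and wondering how long
the submitting thread was actually stalled: the quoted submit() loop sleeps
125us on the first EAGAIN and doubles the delay on each further retry, up to
16 retries. Below is a rough sketch of that arithmetic. The helper names
total_sleep_us and last_sleep_us are hypothetical (they are not part of
Ceph), and the 125us starting delay is taken from the quoted snippet only;
other Ceph versions may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of Ceph: total microseconds spent in
// usleep() after `retries` EAGAIN retries, assuming the delay starts at
// `initial_delay_us` and doubles after every retry, as in the quoted loop.
inline std::uint64_t total_sleep_us(unsigned retries,
                                    std::uint64_t initial_delay_us = 125) {
  assert(retries < 48);  // keep the doubling well inside 64 bits
  std::uint64_t total = 0;
  std::uint64_t delay = initial_delay_us;
  for (unsigned i = 0; i < retries; ++i) {
    total += delay;  // mirrors usleep(delay)
    delay *= 2;      // mirrors delay *= 2
  }
  return total;
}

// Hypothetical helper: length of the final (longest) single sleep in a run
// of `retries` retries; 0 if there were no retries at all.
inline std::uint64_t last_sleep_us(unsigned retries,
                                   std::uint64_t initial_delay_us = 125) {
  assert(retries < 48);
  return retries == 0 ? 0 : initial_delay_us << (retries - 1);
}
```

Under those assumptions, total_sleep_us(6) is 7875us, so the "retries 6"
message in the log corresponds to just under 8ms of accumulated sleep, and
total_sleep_us(16) is 8191875us, i.e. roughly 8.2 seconds if every one of
the 16 allowed retries is used before the error is returned to the caller.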