On Tue, Jul 28, 2020 at 11:39 AM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>
> On Tue, Jul 28, 2020 at 11:19 AM Johannes Naab
> <johannes.naab@xxxxxxxxxxxxxxxx> wrote:
> >
> > On 2020-07-28 15:52, Jason Dillaman wrote:
> > > On Tue, Jul 28, 2020 at 9:44 AM Johannes Naab
> > > <johannes.naab@xxxxxxxxxxxxxxxx> wrote:
> > >>
> > >> On 2020-07-28 14:49, Jason Dillaman wrote:
> > >>>> VM in libvirt with:
> > >>>> <pre>
> > >>>> <disk type='network' device='disk'>
> > >>>>   <driver name='qemu' type='raw' discard='unmap'/>
> > >>>>   <source protocol='rbd' name='pool/disk' index='4'>
> > >>>>     <!-- omitted -->
> > >>>>   </source>
> > >>>>   <iotune>
> > >>>>     <read_bytes_sec>209715200</read_bytes_sec>
> > >>>>     <write_bytes_sec>209715200</write_bytes_sec>
> > >>>>     <read_iops_sec>5000</read_iops_sec>
> > >>>>     <write_iops_sec>5000</write_iops_sec>
> > >>>>     <read_bytes_sec_max>314572800</read_bytes_sec_max>
> > >>>>     <write_bytes_sec_max>314572800</write_bytes_sec_max>
> > >>>>     <read_iops_sec_max>7500</read_iops_sec_max>
> > >>>>     <write_iops_sec_max>7500</write_iops_sec_max>
> > >>>>     <read_bytes_sec_max_length>60</read_bytes_sec_max_length>
> > >>>>     <write_bytes_sec_max_length>60</write_bytes_sec_max_length>
> > >>>>     <read_iops_sec_max_length>60</read_iops_sec_max_length>
> > >>>>     <write_iops_sec_max_length>60</write_iops_sec_max_length>
> > >>>>   </iotune>
> > >>>> </disk>
> > >>>> </pre>
> > >>>>
> > >>>> workload:
> > >>>> <pre>
> > >>>> fio --rw=write --name=test --size=10M
> > >>>> timeout 30s fio --rw=write --name=test --size=20G
> > >>>> timeout 3m fio --rw=write --name=test --size=20G --direct=1
> > >>>> timeout 1m fio --rw=randrw --name=test --size=20G --direct=1
> > >>>> timeout 10s fio --numjobs=8 --rw=randrw --name=test --size=1G --direct=1
> > >>>> # the backtraces are then observed while the following command is running
> > >>>> fio --ioengine=libaio --iodepth=16 --numjobs=8 --rw=randrw --name=test --size=1G --direct=1
> > >>>
> > >>> I'm not sure I understand this
> > >>> workload. Are you running these 6 "fio"
> > >>> processes sequentially or concurrently? Does it only crash on that
> > >>> last one? Do you have "exclusive-lock" enabled on the image? If it
> > >>> is enabled, "--numjobs 8" would cause lots of lock fighting.
> > >>
> > >> The workload is a virtual machine with the above libvirt device
> > >> configuration. Within that virtual machine, the workload is run
> > >> sequentially (as script crash.sh) on the xfs formatted device.
> > >>
> > >> I.e. librbd/ceph should only see the one qemu process, which is then
> > >> running the workload.
> > >>
> > >> Only the last fio invocation causes the problems.
> > >> When skipping some (I did not test it exhaustively) of the fio
> > >> invocations, the crash is no longer reliably triggered.
> > >
> > > Hmm, all those crash backtraces are in
> > > "AioCompletion::complete_event_socket", but QEMU does not have any
> > > code that utilizes the event socket notification system. AFAIK, only
> > > the fio librbd engine has integrated with that callback system.
> > >
> >
> >
> > The host is an Ubuntu 20.04 with minor backports in libvirt (6.0.0-0ubuntu8.1)
> > and qemu (1:4.2-3ubuntu6.3) for specific CPU IDs, and the ceph.com librbd1.
> >
> >
> > Upon further testing, changing the libvirt device configuration to:
> >
> > > <driver name='qemu' type='raw' cache='none' io='native' discard='unmap'/>
> >
> > (adding cache='none' and io='native') has not yet reproduced the crash.
> >
> > Based on my understanding, cache='writeback' and io='threads' are the
> > defaults when not otherwise configured. However, I do not yet fully
> > understand the dependencies between those options.
>
> The "io=native" vs "io=threads" is a no-op for all by file-based IO.

"for all non file-based"

> The "cache" setting will just configure the librbd in-memory cache
> mode (disabled, write-through, or write-back).
>
> > Are the libvirt <driver cache='...'> and librbd caches distinct, or do
> > they refer to the same cache
> > (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008486.html)?
> >
> >
> > >>> Are all the crashes seg faults? They all seem to hint that the
> > >>> internal ImageCtx instance was destroyed somehow while there was still
> > >>> in-flight IO. If the crashes appeared during the "timeout XYZ fio ..."
> > >>> calls, I would think it's highly likely that "fio" is incorrectly
> > >>> closing the RBD image while there was still in-flight IO via its
> > >>> signal handler.
> > >>
> > >> They are all segfaults of the qemu process, captured on the host system.
> > >> librbd should not see any image open/closing during the workload run
> > >> within the VM.
> > >> The `timeout` is used to approximate the initial (manual) workload
> > >> generation, which caused a crash.
> > >>
> > >
> > >
> >
> >
> --
> Jason

--
Jason