Thanks Vitaliy,

Posting here for the archives, in case anyone else sees the same problem; it might save them some work.

After going through the code and logs (debug bluestore 20/5), it looks like the write-small-pre-read counter increases every time the WAL gets appended to (it reads the previous entries, but these should be in the onode cache), and writes to the WAL get counted as deferred_write_ops regardless of bluestore_prefer_deferred_size, because they are not really data blocks.

Furthermore, I think I have discovered why there is an astronomical number of reads when numjobs >= 2 in FIO. Each job in FIO is in effect another client, and even though I have RBD mirroring disabled, each client is still generating I/O operations to RADOS journal objects, for example:

do_op osd_op(client.274191.0:25 38.4 38:22bdbd0a:::journal_data.38.3e91a1f693ff7.278:head [read 0~33554432 [fadvise_dontneed]] snapc 0=[] ondisk+read+known_if_redirected e529) v8 may_read -> read-ordered flags ondisk+read+known_if_redirected

It looks as though each job has to read in the other jobs' journals, causing a huge amount of read I/O that is not required. The pool does not have RBD mirroring enabled, but the image features do have journaling enabled (rbd_default_features = 125 in the client section). After changing the image features, the high read counts are gone.

Thank you for your tips and pointers, everyone!

Regards
--
Brad.

On Wed, 5 Feb 2020 at 12:44, <vitalif@xxxxxxxxxx> wrote:
> Hi,
>
> This helped to disable deferred writes in my case:
>
> bluestore_min_alloc_size=4096
> bluestore_prefer_deferred_size=0
> bluestore_prefer_deferred_size_ssd=0
>
> If you already deployed your OSDs with min_alloc_size=4K then you don't
> need to redeploy them again.
>
> > Hi Vitaliy,
> >
> > I completely destroyed the test cluster and re-deployed it after
> > changing these settings but it did not make a difference - there are
> > still a high number of deferred writes.
> >
> > Regards
> > --
> > Brad.
> >
> > On Wed, 5 Feb 2020 at 10:55, <vitalif@xxxxxxxxxx> wrote:
> >
> >> min_alloc_size can't be changed after formatting an OSD, and yes,
> >> bluestore defers all writes that are < min_alloc_size. And default
> >> min_alloc_size_ssd is 16KB.
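
Archive note on the rbd_default_features = 125 value mentioned in the message above: the setting is a bitmask, and decoding it shows why journal objects were being touched even with mirroring disabled. The sketch below is only an illustration, not part of the original thread. The helper names (decode_features, without_feature) are hypothetical, and the bit values are the standard librbd feature bits as I understand them, so check them against the documentation for your Ceph release.

```python
# RBD image feature bits (assumed from the librbd feature definitions;
# verify against the documentation for your Ceph release).
RBD_FEATURES = {
    1: "layering",
    2: "striping",
    4: "exclusive-lock",
    8: "object-map",
    16: "fast-diff",
    32: "deep-flatten",
    64: "journaling",
    128: "data-pool",
}


def decode_features(mask):
    """Return the names of the features whose bits are set in mask."""
    return [name for bit, name in RBD_FEATURES.items() if mask & bit]


def without_feature(mask, name):
    """Return mask with the bit for the named feature cleared."""
    for bit, feature in RBD_FEATURES.items():
        if feature == name:
            return mask & ~bit
    raise ValueError(f"unknown feature: {name}")


if __name__ == "__main__":
    # 125 has the journaling bit (64) set.
    print(decode_features(125))
    # The same feature set with journaling cleared.
    print(without_feature(125, "journaling"))
```

Under these assumptions, 125 decodes to layering, exclusive-lock, object-map, fast-diff, deep-flatten and journaling, and clearing the journaling bit gives 61. The message does not say how the image features were changed; on an existing image, one option would be `rbd feature disable <pool>/<image> journaling`.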