Re: Understanding Bluestore performance characteristics

Thanks Vitaliy

Posting here for the archives; if anyone else sees the same problem, it
might save them some work.

After going through the code and logs (debug bluestore 20/5), it looks as
though the write-small-pre-read counter increases every time the WAL is
appended to (it reads the previous entries, but these should be in the
onode cache), and writes to the WAL are counted as deferred_write_ops
regardless of bluestore_prefer_deferred_size - because they are not really
data blocks.

Furthermore, I think I have discovered why there is an astronomical
number of reads when numjobs >= 2 in FIO.

Each job in FIO is in effect another client, and even though I have RBD
mirroring disabled, each client is still generating I/O operations to rados
journal objects, for example:

do_op osd_op(client.274191.0:25 38.4
38:22bdbd0a:::journal_data.38.3e91a1f693ff7.278:head [read 0~33554432
[fadvise_dontneed]] snapc 0=[] ondisk+read+known_if_redirected e529) v8
may_read -> read-ordered flags ondisk+read+known_if_redirected

It looks as though each job has to read in the other jobs' journals,
causing a huge amount of unnecessary read I/O. The pool does not have RBD
mirroring enabled, but the images do have the journaling feature
(rbd_default_features = 125 in the client section). After changing the
image features there are no more high reads.
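
For anyone checking their own value: rbd_default_features is a bitmask, and
125 includes the journaling bit (64). Below is a minimal sketch of decoding
it, assuming the standard RBD feature bit values; the helper name
decode_rbd_features is only for illustration and is not part of any Ceph
tool.

```python
# RBD image feature bits, as used by rbd_default_features.
RBD_FEATURES = {
    1: "layering",
    2: "striping",
    4: "exclusive-lock",
    8: "object-map",
    16: "fast-diff",
    32: "deep-flatten",
    64: "journaling",
    128: "data-pool",
}


def decode_rbd_features(mask):
    """Return the names of the feature bits set in mask (illustrative helper)."""
    return [name for bit, name in sorted(RBD_FEATURES.items()) if mask & bit]


# 125 = 1 + 4 + 8 + 16 + 32 + 64, so journaling is among the enabled features.
print(decode_rbd_features(125))

# Clearing the journaling bit gives the value without journaling.
print(125 & ~64, decode_rbd_features(125 & ~64))
```

One way to change an existing image is "rbd feature disable <pool>/<image>
journaling"; lowering rbd_default_features only affects newly created
images.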

Thank you for your tips and pointers everyone!

Regards
--
Brad.


On Wed, 5 Feb 2020 at 12:44, <vitalif@xxxxxxxxxx> wrote:

> Hi,
>
> This helped to disable deferred writes in my case:
>
> bluestore_min_alloc_size=4096
> bluestore_prefer_deferred_size=0
> bluestore_prefer_deferred_size_ssd=0
>
> If you already deployed your OSDs with min_alloc_size=4K then you don't
> need to redeploy them.
>
> > Hi Vitaliy,
> >
> > I completely destroyed the test cluster and re-deployed it after
> > changing these settings but it did not make a difference - there are
> > still a high number of deferred writes.
> >
> > Regards
> > --
> > Brad.
> >
> > On Wed, 5 Feb 2020 at 10:55, <vitalif@xxxxxxxxxx> wrote:
> >
> >> min_alloc_size can't be changed after formatting an OSD, and yes,
> >> bluestore defers all writes that are < min_alloc_size. And default
> >> min_alloc_size_ssd is 16KB.
>