Re: How does mclock work?

Hello Frédéric,

Please see answers below.


> Could someone please explain how mclock works regarding reads and writes?
> Does mclock intervene on both read and write iops? Or only on reads or only
> on writes?
>

mClock schedules both read and write ops.


>
> And what type of underlying hardware performance is calculated and
> considered by mclock? Seems to be only write performance.
>

Random write performance is used to set the maximum IOPS capacity of an
OSD. This, along with the sequential bandwidth capability of the OSD, is
used to calculate the cost per IO that mClock uses internally for
scheduling ops. In addition, the mClock profiles use the capacity
information to allocate the reservation and limit for the different
classes of service (e.g., client, background recovery, scrub, snaptrim).
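
To make the cost-per-IO idea concrete, here is a rough Python sketch. It
is only an illustration, not the actual Ceph code: it assumes the cost per
IO is the ratio of sequential bandwidth to IOPS capacity, and that an op
is charged at least one IO's worth of bandwidth. The function names and
the example numbers (150 MiB/s, 300 IOPS) are made up for illustration.

```python
# Illustrative sketch only; not the actual Ceph implementation.
MIB = 1024 * 1024


def cost_per_io_bytes(max_seq_bandwidth_bytes: float, max_iops: float) -> float:
    """Bytes of sequential bandwidth 'spent' by one random IO (assumed ratio)."""
    if max_iops <= 0:
        raise ValueError("max_iops must be positive")
    return max_seq_bandwidth_bytes / max_iops


def op_cost_bytes(op_size_bytes: int, per_io_cost: float) -> float:
    """Assumed rule: an op costs at least one IO's worth of bandwidth."""
    return max(float(op_size_bytes), per_io_cost)


# Hypothetical device: 150 MiB/s sequential bandwidth, 300 random-write IOPS.
per_io = cost_per_io_bytes(150 * MIB, 300)
print(per_io)                           # 524288.0 bytes (512 KiB) per IO
print(op_cost_bytes(4096, per_io))      # small op is charged the per-IO cost
print(op_cost_bytes(4 * MIB, per_io))   # large op is charged by its size
```

Under these assumptions, small ops are dominated by the per-IO cost and
large ops by their size, which is why both the IOPS capacity and the
sequential bandwidth setting matter.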

The write performance is used to set a lower bound on the amount of
bandwidth allocated to the different classes of service. For example,
the 'balanced' profile reserves 50% of the OSD's IOPS capacity for client
ops. In other words, client ops (read or write) are guaranteed a minimum
of 50% of the OSD's bandwidth. The 'balanced' profile sets no upper limit
for client ops (i.e., the limit is set to MAX), which means that reads can
potentially use the maximum possible bandwidth (i.e., they are not
constrained by the max IOPS capacity) if there are no other competing ops.
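
As a small worked example of the reservation/limit split described above,
the following Python sketch turns a configured IOPS capacity into a
guaranteed minimum and an optional cap. The helper name and the 300 IOPS
capacity are hypothetical; only the 50% reservation and the unlimited
(MAX) limit for client ops come from the 'balanced' profile as described
here.

```python
# Illustrative sketch only; helper name and numbers are hypothetical.
from typing import Optional, Tuple


def client_allocation(
    iops_capacity: float,
    reservation_frac: float = 0.5,       # 'balanced': 50% reserved for clients
    limit_frac: Optional[float] = None,  # None stands for MAX (no upper limit)
) -> Tuple[float, Optional[float]]:
    """Return (reserved IOPS, IOPS limit); a limit of None means uncapped."""
    reserved = iops_capacity * reservation_frac
    limit = None if limit_frac is None else iops_capacity * limit_frac
    return reserved, limit


reserved, limit = client_allocation(300)
print(reserved)  # 150.0 IOPS guaranteed to client ops under contention
print(limit)     # None: client ops may exceed the configured capacity
```

The reservation only matters when other classes compete for the OSD;
with no competing ops and no limit, nothing in the profile holds client
reads or writes back.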

Please see
https://docs.ceph.com/en/reef/rados/configuration/mclock-config-ref/#built-in-profiles
for more information about mClock profiles.


> The mclock documentation shows HDDs and SSDs specific configuration
> options (capacity and sequential bandwidth) but nothing regarding hybrid
> setups and these configuration options do not distinguish reads and writes.
> But read and write performance are often not in par for a single drive and
> even less when using hybrid setups.
>
> With hybrid setups (RocksDB+WAL on SSDs or NVMes and Data on HDD), if
> mclock only considers write performance, it may fail to properly schedule
> read iops (does mclock schedule read iops?) as the calculated iops capacity
> would be way too high for reads.
>
> With HDD only setups (RocksDB+WAL+Data on HDD), if mclock only considers
> write performance, the OSD may not take advantage of higher read
> performance.
>
> Can someone please shed some light on this?
>

As mentioned above, as long as there are no competing ops, the mClock
profiles ensure that nothing constrains client ops from using the full
available bandwidth of an OSD for both reads and writes, regardless of the
type of setup (hybrid, HDD, SSD) being employed. The important aspect is
to ensure that the IOPS capacity set for the OSD is a fairly accurate
representation of the underlying device capability. This is because the
reservation criteria based on the IOPS capacity help maintain an
acceptable level of performance when other competing ops are active.

You could run some synthetic benchmarks with the default mClock profile
to confirm that read and write performance are along expected lines.

I hope this helps.

-Sridhar



