Re: Query regarding min_size.


 



min_size will also block reads. Just to add a +1 to what has been said: a write operation will always wait to ack until all OSDs for a PG have acked the write. min_size has absolutely no effect on this. min_size is evaluated BEFORE the write or read is handled by any OSD. If the PG does not have at least min_size OSDs available, the read or write will block until it does.
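To make the two rules in the reply above concrete, here is a toy Python model. It is a sketch under simplifying assumptions: the class `ToyPG` and its methods are hypothetical and do not correspond to Ceph's actual implementation. It encodes only what is described here: min_size gates whether a read or write is served at all, and a write is acked to the client only after every OSD in the acting set has acked.

```python
from dataclasses import dataclass


@dataclass
class ToyPG:
    """Hypothetical, simplified model of a replicated placement group.

    This is NOT Ceph code; it only encodes the two rules described above.
    """
    size: int          # desired number of replicas (the pool's 'size')
    min_size: int      # minimum replicas needed before any I/O is served
    acting: list[int]  # OSD ids currently serving this PG

    def io_allowed(self) -> bool:
        # Rule 1: min_size is checked before any OSD handles the op,
        # and it gates reads and writes alike.
        return len(self.acting) >= self.min_size

    def read(self) -> str:
        return "served" if self.io_allowed() else "blocked"

    def write(self, acked_by: set[int]) -> str:
        if not self.io_allowed():
            return "blocked"
        # Rule 2: the client gets its ack only after every OSD in the
        # acting set has acked; min_size plays no part in this step.
        if set(self.acting) <= acked_by:
            return "acked"
        return "pending"


healthy = ToyPG(size=3, min_size=2, acting=[1, 2, 3])
print(healthy.write({1, 2}))     # pending: OSD 3 has not acked yet
print(healthy.write({1, 2, 3}))  # acked

degraded = ToyPG(size=3, min_size=2, acting=[1, 2])
print(degraded.write({1, 2}))    # acked: 2 OSDs up, which meets min_size

too_few = ToyPG(size=3, min_size=2, acting=[1])
print(too_few.read())            # blocked
```

Note that lowering min_size in this model never changes when a healthy PG acks a write; it only changes whether a degraded PG serves I/O at all.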

On Wed, Jan 3, 2018 at 9:59 AM Ronny Aasen <ronny+ceph-users@xxxxxxxx> wrote:
On 03. jan. 2018 14:51, James Poole wrote:
> Hi all,
>
> Whilst on a training course recently I was told that 'min_size' had an
> effect on client write performance, in that it's the required number of
> copies before ceph reports back to the client that an object has been
> written. Therefore setting a 'min_size' of 0 would only require a write
> to be accepted by the journal before confirming it has been accepted.
>
> This is contrary to further reading elsewhere that the 'min_size' is the
> minimum number of copies required of an object to allow I/O and that
> 'size' is the parameter that would affect write speed i.e. desired
> number of replicas.
>
> Setting 'min_size' to 0 with a 'size' of 3, you would still have an
> effective 'min_size' of 2, according to:
>
> https://raw.githubusercontent.com/ceph/ceph/master/doc/release-notes.rst
>
> "* Degraded mode (when there are fewer than the desired number of replicas)
> is now more configurable on a per-pool basis, with the min_size
> parameter. By default, with min_size 0, this allows I/O to objects
> with N - floor(N/2) replicas, where N is the total number of
> expected copies. Argonaut behavior was equivalent to having min_size
> = 1, so I/O would always be possible if any completely up to date
> copy remained. min_size = 1 could result in lower overall
> availability in certain cases, such as flapping network partition"
>
> Which leads to the conclusion that changing 'min_size' has nothing to do
> with performance but is solely related to data integrity/resilience.
>
> Could someone confirm my assertion is correct?
>
> Many thanks
>
> James
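The release-note rule quoted above can be checked with a few lines of Python. This is only a sketch of the stated arithmetic, N - floor(N/2), for the min_size 0 default described in that note; the helper name `legacy_effective_min_size` is hypothetical, and the sketch makes no claim about how any particular Ceph release handles a min_size of 0.

```python
def legacy_effective_min_size(n: int) -> int:
    """Replicas required for I/O under the quoted rule: N - floor(N/2).

    n is the pool's 'size' (total expected copies). Hypothetical helper,
    not part of any Ceph API.
    """
    if n < 1:
        raise ValueError("size must be at least 1")
    return n - n // 2  # integer division is floor for non-negative n


for n in range(1, 6):
    print(f"size={n} -> effective min_size={legacy_effective_min_size(n)}")
```

For size 3 this gives 3 - 1 = 2, matching the effective 'min_size' of 2 mentioned in the question.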


You are correct that it is related to data integrity.


Writes to an OSD using FileStore are always acked internally once they
have hit the journal, regardless of size/min_size.

In normal operation, all OSDs must ack the write before the write is
acked to the client; in other words, all 3 (with size 3) must ack, and
min_size is not relevant here.

min_size is only relevant when a PG is degraded because of an OSD or
node failure, either while it is being remapped or backfilled, or because
there is no space to remap/backfill into. In that case, min_size specifies
how many OSDs must ack the write before the write is acked to the client.

Since failures are most likely when disks are under stress (e.g. during a
rebuild), reducing min_size is just asking for corruption and data loss.

kind regards
Ronny Aasen
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


