Hi,

One way you can see exactly what is happening when you write an object
is with --debug_ms=1. For example, I write a 100MB object to a test pool:

   rados --debug_ms=1 -p test put 100M.dat 100M.dat

I pasted the output of this here: https://pastebin.com/Zg8rjaTV

In this case, it first gets the cluster maps from a mon, then writes
the object to osd.58, which is the primary OSD for PG 119.77:

   # ceph pg 119.77 query | jq .up
   [
     58,
     49,
     31
   ]

Otherwise, I answered your questions below...

On Sun, Jun 17, 2018 at 8:29 PM Jialin Liu <jalnliu@xxxxxxx> wrote:
>
> Hello,
>
> I have a couple of questions regarding the IO on OSD via librados.
>
> 1. How to check which osd is receiving data?

See `ceph osd map`. For my example above:

   # ceph osd map test 100M.dat
   osdmap e236396 pool 'test' (119) object '100M.dat' -> pg 119.864b0b77
   (119.77) -> up ([58,49,31], p58) acting ([58,49,31], p58)

> 2. Can the write operation return immediately to the application once
> the write to the primary OSD is done? or does it return only when the
> data is replicated twice? (size=3)

The write returns only once it is safe on *all* replicas or EC chunks.

> 3. What is the I/O size in the lower level in librados, e.g., if I
> send a 100MB request with 1 thread, does librados send the data by a
> fixed transaction size?

This depends on the client. The `rados` CLI example I showed above broke
the 100MB object into 4MB parts. Most use-cases keep objects around 4MB
or 8MB.

> 4. I have 4 OSS, 48 OSDs, will the 4 OSS become the bottleneck? from
> the ceph documentation, once the cluster map is received by the
> client, the client can talk to OSD directly, so the assumption is the
> max parallelism depends on the number of OSDs, is this correct?

That's more or less correct -- the IOPS and bandwidth capacity of the
cluster generally scale linearly with the number of OSDs.

Cheers, Dan
CERN

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
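
P.S. A minimal back-of-the-envelope sketch of the chunking behaviour from
question 3 (this just does the arithmetic locally -- the 4MB op size is
the rados CLI default observed above, and the numbers are illustrative,
not an actual librados call):

```shell
# Sketch: how many write ops a 100MB object decomposes into when the
# client issues fixed-size 4MB writes (as the rados CLI did above).
OBJ_SIZE=$((100 * 1024 * 1024))   # 100MB object
OP_SIZE=$((4 * 1024 * 1024))      # 4MB per write op
echo $((OBJ_SIZE / OP_SIZE))      # 25 write ops
```

So the single `rados put` above fans out into 25 sequential 4MB writes,
each of which is only acknowledged once it is safe on all three replicas.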