raw rados listing with seek to marker

rgw uses raw rados listing when doing certain operations, for example
when listing metadata entries (these aren't indexed). When doing a
full metadata sync, the rgw at the secondary zone sends a request to
the master to get a list of all the metadata entries (of a specific
section), so that it can build an internal index of these entries and
then fetch them.
One of the main issues here is that, since the metadata listing relies
on raw rados object listing, the operation cannot be paged.
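To make the paging problem concrete, here is a minimal sketch of what the secondary's full-sync listing loop could look like if the master were able to page. It is a toy model, not rgw code: ListPage, build_full_sync_index and make_master are hypothetical names, and the toy master simply uses a position as its opaque marker. Producing such a marker from raw rados listing is exactly the open problem.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical paged listing call: (marker, max_entries) -> (entries, next_marker).
# next_marker is None once the section is exhausted. The marker is opaque to
# the caller; how to derive one from raw rados listing is the open question.
ListPage = Callable[[Optional[str], int], Tuple[List[str], Optional[str]]]


def build_full_sync_index(list_page: ListPage, max_entries: int = 1000) -> List[str]:
    """Secondary-zone side (sketch): collect all keys of one section, page by page."""
    index: List[str] = []
    marker: Optional[str] = None
    while True:
        entries, marker = list_page(marker, max_entries)
        index.extend(entries)
        if marker is None:
            return index


def make_master(keys: List[str]) -> ListPage:
    """Toy master: serves `keys` in pages, using the list position as the marker."""
    def list_page(marker: Optional[str], max_entries: int) -> Tuple[List[str], Optional[str]]:
        start = int(marker) if marker is not None else 0
        end = start + max_entries
        return keys[start:end], (str(end) if end < len(keys) else None)
    return list_page
```

With 2500 keys and max_entries=1000, the secondary issues three bounded requests instead of one response that grows with the size of the section.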
I've looked at the nobjects listing code, and this is what I
understand is happening:
 - we iterate through all the pgs in the pool, one by one and in order
 - for each pg we're at, we read a chunk of the next X (where X = 1024)
entries via the rados pgls operation
 - for each chunk rados returns a cookie, which is some kind of
descriptor that points at either the current or the next chunk, and
that can be used to fetch the next chunk
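A minimal sketch of the iteration described above, with an in-memory pool standing in for rados. ToyPool and its pgls method are hypothetical; in the real code the cookie is opaque, whereas here it is just an offset into the pg's object list, and a returned cookie of None stands for "this pg is exhausted".

```python
from typing import Dict, Iterator, List, Optional, Tuple

CHUNK = 1024  # "X" above: entries requested per pgls call


class ToyPool:
    """Hypothetical stand-in for a rados pool: pg id -> objects in that pg."""

    def __init__(self, pgs: Dict[int, List[str]]):
        self.pgs = pgs

    def pgls(self, pg: int, cookie: Optional[int],
             max_entries: int) -> Tuple[List[str], Optional[int]]:
        # Toy cookie: offset into the pg's list; None on input means "pg start".
        start = 0 if cookie is None else cookie
        objs = self.pgs[pg]
        chunk = objs[start:start + max_entries]
        end = start + len(chunk)
        return chunk, (end if end < len(objs) else None)


def nobjects_list(pool: ToyPool, chunk: int = CHUNK) -> Iterator[Tuple[int, str]]:
    """Yield (pg, object) for every object, walking the pgs one by one, in order."""
    for pg in sorted(pool.pgs):
        cookie: Optional[int] = None      # start at the beginning of the pg
        while True:
            entries, cookie = pool.pgls(pg, cookie, chunk)
            for obj in entries:
                yield pg, obj
            if cookie is None:            # no more chunks in this pg
                break
```

The only state this loop could hand back to a caller is (pg, cookie), which identifies a chunk, not a position inside it.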

rados itself, through the nobjects API, provides a seek mechanism, but
it's pretty rudimentary: it only seeks by rounding down to the current
pg. I was looking at introducing a marker for nobjects listing (see:
https://github.com/yehudasa/ceph/commits/wip-18079), but there are a
few points I'm not completely sure about:
 - I was using cookie and current_pg as the marker, but that only
points to the chunk that was just read (and not necessarily consumed
completely). Is there a way to generate a cookie that would point at
any entry within the current chunk?
 - Is the order of entries within a chunk guaranteed? E.g., if I know
that the last cookie was C and the last object we saw was O, can we
request C again, skip to object O, and not miss any entries created
before the original operation started? (It is OK to miss entries
created after the original operation started, as we're going to get
these via a separate log.)
 - Is there any other field that we need to keep for the marker other
than these two?
 - What to do with the old object listing API? The code internally
defaults to using it, so if we implement seek only for nobjects, it
won't work by default. I'm not sure we want to implement it for the
legacy listing infrastructure.
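A minimal sketch of the marker scheme the questions above are about: the marker keeps the pg, the cookie that fetched the chunk being consumed, and the last object returned. Resuming re-requests that cookie and skips past that object. It only works under the assumption raised in the second question, that re-reading the same cookie returns the same entries in the same order. Marker, ToyPool and list_page are hypothetical names, and the toy cookie is again just an offset.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


class ToyPool:
    """Hypothetical in-memory pool: pg id -> objects; cookie = offset into the pg."""

    def __init__(self, pgs: Dict[int, List[str]]):
        self.pgs = pgs

    def pgls(self, pg: int, cookie: Optional[int],
             max_entries: int) -> Tuple[List[str], Optional[int]]:
        start = 0 if cookie is None else cookie
        objs = self.pgs[pg]
        chunk = objs[start:start + max_entries]
        end = start + len(chunk)
        return chunk, (end if end < len(objs) else None)


@dataclass(frozen=True)
class Marker:
    pg: int                # pg we stopped in
    cookie: Optional[int]  # cookie that fetched the chunk we stopped in
    last_obj: str          # last object handed back to the caller


def list_page(pool: ToyPool, marker: Optional[Marker], max_entries: int,
              chunk: int = 1024) -> Tuple[List[str], Optional[Marker]]:
    """Return up to max_entries objects plus a resume marker (None when done)."""
    pgs = sorted(pool.pgs)
    if marker is None:
        pg_idx, cookie, skip_past = 0, None, None
    else:
        pg_idx, cookie, skip_past = pgs.index(marker.pg), marker.cookie, marker.last_obj
    out: List[str] = []
    while pg_idx < len(pgs):
        pg = pgs[pg_idx]
        entries, next_cookie = pool.pgls(pg, cookie, chunk)
        start = 0
        if skip_past is not None:
            # ASSUMPTION: same cookie -> same entries in the same order, so
            # everything up to and including last_obj was already returned.
            # If last_obj is gone (e.g. deleted meanwhile), this sketch
            # re-returns the whole chunk rather than risk skipping entries.
            if skip_past in entries:
                start = entries.index(skip_past) + 1
            skip_past = None
        for obj in entries[start:]:
            out.append(obj)
            if len(out) == max_entries:
                return out, Marker(pg, cookie, obj)
        if next_cookie is None:   # pg exhausted: move on to the next pg
            pg_idx, cookie = pg_idx + 1, None
        else:                     # more chunks left in this pg
            cookie = next_cookie
    return out, None
```

Unlike the pg-granular seek, resuming here costs at most one re-read of a single chunk, but it is only correct if the ordering assumption holds for real pgls results.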

Yehuda
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


