Re: OSD->ReplicaOSD->OSD questions


On Thu, Dec 3, 2015 at 1:30 AM, Lakis, Jacek <jacek.lakis@xxxxxxxxx> wrote:
> Hi cephers!
> I have two questions about the "client->osd->replica osd->osd->client" path that came up during my deep dive into this part.
>         1. eval_repop() is called twice [in the C_OSD_RepopCommit and C_OSD_RepopApplied context finish] in the primary OSD, after receiving the MOSDOpRepopReply message from the replica OSD. It's called twice with different flags (ondisk, onack) and sends a reply to the client two times. Does the client really need to receive two replies, and if so, why? Maybe a single reply after the operation is applied and committed is enough?

In common deployments the client is actually only getting sent the
ondisk response, but yes, we need to maintain both of those paths.
It's part of the protocol that if the data is made readable before
it's committed, we tell the client that it's happened.
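
To make the two reply paths concrete, here is a minimal toy model of the
idea, not the actual OSD code. The class RepOp, its fields, and the OSD
names are hypothetical, invented for illustration; only the "ack" versus
"ondisk" distinction comes from this thread.

```python
class RepOp:
    """Toy stand-in for one replicated write tracked by the primary (hypothetical)."""

    def __init__(self, osds):
        self.waiting_for_applied = set(osds)  # OSDs that have not applied yet
        self.waiting_for_commit = set(osds)   # OSDs that have not committed yet
        self.sent_ack = False
        self.sent_ondisk = False
        self.replies = []  # replies "sent" to the client, in order

    def on_applied(self, osd):
        self.waiting_for_applied.discard(osd)
        self._eval()

    def on_commit(self, osd):
        self.waiting_for_commit.discard(osd)
        self._eval()

    def _eval(self):
        # Called after every applied/commit event, loosely like eval_repop().
        if not self.waiting_for_commit and not self.sent_ondisk:
            # Durable everywhere: send the one reply most clients wait for.
            self.replies.append("ondisk")
            self.sent_ondisk = True
        elif (not self.waiting_for_applied
              and not self.sent_ack and not self.sent_ondisk):
            # Readable everywhere but not yet durable: tell the client now.
            self.replies.append("ack")
            self.sent_ack = True


# Applied everywhere before commit: the client hears about both stages.
early = RepOp(["osd.0", "osd.1"])
early.on_applied("osd.0")
early.on_applied("osd.1")
early.on_commit("osd.0")
early.on_commit("osd.1")

# Commit completes first: only the ondisk reply goes out.
late = RepOp(["osd.0", "osd.1"])
late.on_commit("osd.0")
late.on_commit("osd.1")
late.on_applied("osd.0")
late.on_applied("osd.1")
```

In this sketch the second case is the one described above as the common
deployment: the client sees a single ondisk reply, while the ack path
still exists for the case where data becomes readable first.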

>         2. MOSDOpRepopReply, caught by Pipe::reader(), needs to go through the whole dispatching->enq->deq->shards->workers path just to call finish() in the contexts mentioned before. Since the number of checks for this kind of message is smaller than for the OSD ops, maybe it's good to consider another, faster way to execute it, e.g. another simple queue with a single thread consuming and executing it, without the whole enqueueing-dequeueing-shards stuff? Are the ordering and PrioritizedQueue features really important for this kind of message?

That's an interesting question. Off the top of my head, maybe these
are important. The priority stuff probably isn't, but we do need to
maintain ordering within each PG, and I'm not sure if we can easily
identify which messages are "just" client data requests versus more
complicated things like returning data?
It's really a question of which particular pieces of the system we
could skip over, and whether those specific ones are worth the time
investment of doing so. I tend to assume it's not worth the effort:
the edge case handling would be hard to replace separately (e.g., what
happens when we get a reply for a PG which we no longer have?).
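
As a rough illustration of the two concerns above (per-PG ordering and
replies for a PG we no longer have), here is a toy sharded queue. It is
a sketch under stated assumptions, not how the OSD work queue is
implemented: ShardedQueue, the integer PG ids, and the message strings
are all hypothetical.

```python
from collections import deque


class ShardedQueue:
    """Toy sharded FIFO: every message for a PG lands on the same shard."""

    def __init__(self, num_shards):
        self.shards = [deque() for _ in range(num_shards)]

    def enqueue(self, pg_id, msg):
        # Same PG -> same shard -> FIFO order is preserved within that PG.
        self.shards[pg_id % len(self.shards)].append((pg_id, msg))

    def drain(self, shard_index, known_pgs):
        """Process one shard; split messages into handled and dropped."""
        handled, dropped = [], []
        queue = self.shards[shard_index]
        while queue:
            pg_id, msg = queue.popleft()
            if pg_id in known_pgs:
                handled.append((pg_id, msg))
            else:
                # Edge case from the discussion: a reply for a PG we no longer have.
                dropped.append((pg_id, msg))
        return handled, dropped


q = ShardedQueue(num_shards=2)
q.enqueue(4, "client_op")
q.enqueue(4, "repop_reply")
q.enqueue(7, "repop_reply")

handled0, dropped0 = q.drain(0, known_pgs={4})
handled1, dropped1 = q.drain(1, known_pgs={4})
```

A separate fast-path queue for replies would have to reproduce both
properties shown here: the reply for PG 4 must not overtake the earlier
op on the same PG, and the reply for the unknown PG 7 still needs a
defined outcome.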
-Greg

>
> Thank you.
>
> Best regards,
> JJ
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


