Re: [PATCH 1/3] libceph: call r_unsafe_callback when unsafe reply is received

"Yan, Zheng" <ukernel@xxxxxxxxx> · Tue, 2 Jul 2013 22:27:46 +0800

On Tue, Jul 2, 2013 at 9:07 PM, Alex Elder <alex.elder@xxxxxxxxxx> wrote:
> On 06/24/2013 01:41 AM, Yan, Zheng wrote:
>> From: "Yan, Zheng" <zheng.z.yan@xxxxxxxxx>
>
> Sorry it took so long, I intended to take a look at this
> for you sooner.
>
> I would also like to thank you for this nice clear
> description.  It made it very easy to understand
> why you were proposing the change, and to focus in
> on exactly which parts of the design it's affecting.
>
>> We can't use !req->r_sent to check if OSD request is sent for the
>> first time, this is because __cancel_request() zeros req->r_sent
>> when OSD map changes. Rather than adding a new variable to struct
>
> You're right.
>
>> ceph_osd_request to indicate if it's sent for the first time, We
>> can call the unsafe callback only when unsafe OSD reply is received.
>> If OSD's first reply is safe, just skip calling the unsafe callback.
>
> This seems reasonable, but it's different from the way I
> thought about what constituted "unsafe."  But I may be
> wrong, and the way this is used by the file system might
> do something that addresses my concern.
>
> The way I interpreted "unsafe" was simply that it was possible
> a write *could* have been made persistent, even if the client
> doesn't know about it.  A request could have made it to its
> target osd, been written, and the response could be in flight
> at the point something (maybe a router?) crashes and the response
> gets lost.  During that time window, the stored data may not be
> in a state that's consistent with the client's view of it.
>
> So I thought of "unsafe" as meaning that a write is in flight,
> and until we get a successful response, the storage might
> contain the old data or it might contain the new data; the
> client has no way of knowing which.
>
> With that interpretation, a request becomes unsafe the
> instant it leaves the client, and becomes safe again
> the instant a response arrives.
>
> If my interpretation is correct, this change is wrong.
>
> But I may be wrong, and there may really be no need to
> worry about a possible modification of data until after
> an acknowledgement response is received.  In that case,
> I've looked at your patch and it looks good.
>
> Can you explain why I'm wrong about what is "unsafe?"

I didn't say you are wrong.  the reason I changed the meaning of the
unsafe callback is that the "unsafe' callback is only used for fsync(2).
I think it's OK to change it as long as the change does not break
fsync(2).

regards
yan, zheng

>
>                                         -Alex
>
>> The purpose of unsafe callback is adding unsafe request to a list,
>> so that fsync(2) can wait for the safe reply. fsync(2) doesn't need
>> to wait for a write(2) that hasn't returned yet. So it's OK to add
>> request to the unsafe list when the first OSD reply is received.
>> (ceph_sync_write() returns after receiving the first OSD reply)
>>
>> Signed-off-by: Yan, Zheng <zheng.z.yan@xxxxxxxxx>
>> ---
>>  net/ceph/osd_client.c | 14 +++++++-------
>>  1 file changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
>> index 540dd29..dd47889 100644
>> --- a/net/ceph/osd_client.c
>> +++ b/net/ceph/osd_client.c
>> @@ -1337,10 +1337,6 @@ static void __send_request(struct ceph_osd_client *osdc,
>>
>>       ceph_msg_get(req->r_request); /* send consumes a ref */
>>
>> -     /* Mark the request unsafe if this is the first timet's being sent. */
>> -
>> -     if (!req->r_sent && req->r_unsafe_callback)
>> -             req->r_unsafe_callback(req, true);
>>       req->r_sent = req->r_osd->o_incarnation;
>>
>>       ceph_con_send(&req->r_osd->o_con, req->r_request);
>> @@ -1431,8 +1427,6 @@ static void handle_osds_timeout(struct work_struct *work)
>>
>>  static void complete_request(struct ceph_osd_request *req)
>>  {
>> -     if (req->r_unsafe_callback)
>> -             req->r_unsafe_callback(req, false);
>>       complete_all(&req->r_safe_completion);  /* fsync waiter */
>>  }
>>
>> @@ -1559,14 +1553,20 @@ static void handle_reply(struct ceph_osd_client *osdc, struct ceph_msg *msg,
>>       mutex_unlock(&osdc->request_mutex);
>>
>>       if (!already_completed) {
>> +             if (req->r_unsafe_callback &&
>> +                 result >= 0 && !(flags & CEPH_OSD_FLAG_ONDISK))
>> +                     req->r_unsafe_callback(req, true);
>>               if (req->r_callback)
>>                       req->r_callback(req, msg);
>>               else
>>                       complete_all(&req->r_completion);
>>       }
>>
>> -     if (flags & CEPH_OSD_FLAG_ONDISK)
>> +     if (flags & CEPH_OSD_FLAG_ONDISK) {
>> +             if (req->r_unsafe_callback && already_completed)
>> +                     req->r_unsafe_callback(req, false);
>>               complete_request(req);
>> +     }
>>
>>  done:
>>       dout("req=%p req->r_linger=%d\n", req, req->r_linger);
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html