Re: A bug may cause a request be executed twice in mClockQueue

"J. Eric Ivancich" <ivancich@xxxxxxxxxx> · Mon, 15 Oct 2018 14:04:10 -0400

This topic is now spread across three threads in ceph-devel. Can we keep
it in a single thread? That would help me, so I won't have to reply the
same way multiple times.

In another of the threads I asked whether you'd modified the source code
to turn off allow_limit_break, because 12.2.4 has allow_limit_break set
to true (later versions support three states and use an enum class to
distinguish them). If that's the case,
PullPriorityQueue::pull_request should not be returning a "future"
unless there's a bug.
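
To make the expected behavior concrete, here is a toy model of the three
pull outcomes. It is only a sketch under simplifying assumptions:
ToyQueue, its single tag per request, the dequeue() wrapper, and
retry_delay() are all made up for illustration and are not the dmclock
or mClockQueue code. With allow_limit_break set, the head request is
handed out even if its tag is still in the future, so a "future" result
can only appear when the flag is off.

```cpp
#include <cassert>
#include <deque>
#include <optional>

// Toy stand-ins for illustration only; these are NOT the dmclock types.
enum class NextReqType { returning, future, none };

struct PullReq {
  NextReqType type;
  int request_id;   // meaningful only when type == returning
  double when;      // meaningful only when type == future
  bool is_retn() const { return type == NextReqType::returning; }
  bool is_future() const { return type == NextReqType::future; }
};

struct ToyQueue {
  struct Entry { int id; double tag; };
  std::deque<Entry> entries;       // assumed ordered by tag, smallest first
  bool allow_limit_break = true;

  PullReq pull_request(double now) {
    if (entries.empty()) {
      return {NextReqType::none, -1, 0.0};
    }
    const Entry head = entries.front();
    if (head.tag <= now || allow_limit_break) {
      entries.pop_front();         // the request leaves the queue
      return {NextReqType::returning, head.id, 0.0};
    }
    // Not yet eligible: report when to retry and keep the request queued.
    return {NextReqType::future, -1, head.tag};
  }
};

// A caller that checks the result type before running anything.
std::optional<int> dequeue(ToyQueue& q, double now) {
  PullReq pr = q.pull_request(now);
  if (pr.is_retn()) {
    return pr.request_id;
  }
  return std::nullopt;             // "future" or "none": nothing to run now
}

// How long to wait before retrying after a "future" result.
double retry_delay(const PullReq& pr, double now) {
  assert(pr.is_future());
  return pr.when - now;
}
```

In this model a caller that ignores the result type on a "future" would
run a request that is still sitting in the queue, which is the
double-execution scenario described in the quoted report.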

I'm happy to track down the bug, but I'd first need to know whether the
source code has been modified to create this situation.

Eric

On 10/11/18 5:47 AM, 韦皓诚 wrote:
> Yes, I agree.
> kefu chai <tchaikov@xxxxxxxxx> wrote on Thu, Oct 11, 2018 at 5:25 PM:
>>
>> On Thu, Oct 11, 2018 at 1:19 PM 韦皓诚 <whc0000001@xxxxxxxxx> wrote:
>>>
>>> Hi guys
>>> Class mClockQueue calls PullPriorityQueue::pull_request in function
>>> dequeue(). But PullPriorityQueue::pull_request may return a "future"
>>> value. That means the mclock tag of the request is greater than now,
>>> so it should be executed in the future instead of now. The mistake is
>>> that mClockQueue::dequeue() does not check the type of the
>>> return value
>>
>> in that case, the returned PullReq is a "future" instead of a "retn",
>> so ceph_assert() will abort the OSD if a "future" is returned. but i agree
>> with you, probably we need to wait for "pr.getTime() -
>> crimson::dmclock::get_time()" before retrying.
>>
>>> and executes the request immediately. More seriously,
>>> PullPriorityQueue keeps the request in the queue, so it would be
>>> executed a second time in the future.
>>>
>>>
>>>                              From Weihaocheng
>>
>>
>>
>> --
>> Regards
>> Kefu Chai