-----Original Message-----
From: Jason Wang [mailto:jasowang@xxxxxxxxxx]
Sent: Friday, November 28, 2014 18:13 PM
To: Dexuan Cui
Cc: gregkh@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
driverdev-
devel@xxxxxxxxxxxxxxxxxxxxxx; olaf@xxxxxxxxx; apw@xxxxxxxxxxxxx; KY
Srinivasan; vkuznets@xxxxxxxxxx; Haiyang Zhang
Subject: RE: [PATCH v3] hv: hv_fcopy: drop the obsolete message on
transfer
failure
On Fri, Nov 28, 2014 at 4:36 PM, Dexuan Cui <decui@xxxxxxxxxxxxx>
wrote:
>> -----Original Message-----
>> From: Jason Wang [mailto:jasowang@xxxxxxxxxx]
>> Sent: Friday, November 28, 2014 14:47 PM
>> To: Dexuan Cui
>> Cc: gregkh@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>> driverdev-
>> devel@xxxxxxxxxxxxxxxxxxxxxx; olaf@xxxxxxxxx;
apw@xxxxxxxxxxxxx; KY
>> Srinivasan; vkuznets@xxxxxxxxxx; Haiyang Zhang
>> Subject: Re: [PATCH v3] hv: hv_fcopy: drop the obsolete message
on
>> transfer
>> failure
>> On Thu, Nov 27, 2014 at 9:09 PM, Dexuan Cui
<decui@xxxxxxxxxxxxx>
>> wrote:
>> > In the case the user-space daemon crashes, hangs or is
killed, we
>> > need to down the semaphore, otherwise, after the daemon starts
>> next
>> > time, the obsolete data in fcopy_transaction.message or
>> > fcopy_transaction.fcopy_msg will be used immediately.
>> >
>> > Cc: Jason Wang <jasowang@xxxxxxxxxx>
>> > Cc: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
>> > Cc: K. Y. Srinivasan <kys@xxxxxxxxxxxxx>
>> > Signed-off-by: Dexuan Cui <decui@xxxxxxxxxxxxx>
>> > ---
>> >
>> > v2: I removed the "FCP" prefix as Greg asked.
>> >
>> > I also updated the output message a little:
>> > "FCP: failed to acquire the semaphore" -->
>> > "can not acquire the semaphore: it is benign"
>> >
>> > v3: I added the code in fcopy_release() as Jason Wang
suggested.
>> > I removed the pr_debug (it isn't so meaningful)and added a
>> > comment instead.
>> >
>> > drivers/hv/hv_fcopy.c | 19 +++++++++++++++++++
>> > 1 file changed, 19 insertions(+)
>> >
>> > diff --git a/drivers/hv/hv_fcopy.c b/drivers/hv/hv_fcopy.c
>> > index 23b2ce2..faa6ba6 100644
>> > --- a/drivers/hv/hv_fcopy.c
>> > +++ b/drivers/hv/hv_fcopy.c
>> > @@ -86,6 +86,18 @@ static void fcopy_work_func(struct
work_struct
>> > *dummy)
>> > * process the pending transaction.
>> > */
>> > fcopy_respond_to_host(HV_E_FAIL);
>> > +
>> > + /* In the case the user-space daemon crashes, hangs or is
>> killed, we
>> > + * need to down the semaphore, otherwise, after the daemon
>> starts
>> > next
>> > + * time, the obsolete data in fcopy_transaction.message or
>> > + * fcopy_transaction.fcopy_msg will be used immediately.
>> > + *
>> > + * NOTE: fcopy_read() happens to get the semaphore (very
rare)?
>> > We're
>> > + * still OK, because we've reported the failure to the host.
>> > + */
>> > + if (down_trylock(&fcopy_transaction.read_sema))
>> > + ;
>>
>> Sorry, I'm not quite understand how if () ; can help here.
>>
>> Btw, a question not relate to this patch.
>>
>> What happens if a daemon is resume from SIGSTOP and expires the
>> check
>> here?
> Hi Jason,
> My idea is: here we need down_trylock(), but in case we can't get
the
> semaphore, it's OK anyway:
>
> Scenario 1):
> 1.1: when the daemon is blocked on the pread(), the daemon
receives
> SIGSTOP;
> 1.2: the host user runs the PowerShell Copy-VMFile command;
> 1.3.1: the driver reports the failure to the host user in 5s and
> 1.3.2: the driver down()-es the semaphore;
> 1.4: the daemon receives SIGCONT and it will be still blocked on
the
> pread().
> Without the down_trylock(), in 1.4, the daemon can receive an
> obsolete message.
> NOTE: in this scenario, the daemon is not killed.
>
> Scenario 2):
> In senario 1), if the daemon receives SIGCONT between 1.3.1 and
1.3.2
> and
> do down() in fcopy_read(), it will receive the message but: the
> driver has
> reported the failure to the host user and the driver's 1.3.2 can't
> get the
> semaphore -- IMO this is acceptably OK, though in the VM, an
> incomplete
> file will be left there.
> BTW, I think in the daemon's hv_start_fcopy() we should add a
> close(target_fd) before open()-ing a new one.
Right, but how about the case when resuming from SIGSTOP but no
timeout?