On Mon, Dec 01, 2014 at 06:16:28PM +0000, David Vrabel wrote:
> On 01/12/14 16:19, Luis R. Rodriguez wrote:
> > On Mon, Dec 01, 2014 at 03:54:24PM +0000, David Vrabel wrote:
> >> On 01/12/14 15:44, Luis R. Rodriguez wrote:
> >>> On Mon, Dec 1, 2014 at 10:18 AM, David Vrabel <david.vrabel@xxxxxxxxxx> wrote:
> >>>> On 01/12/14 15:05, Luis R. Rodriguez wrote:
> >>>>> On Mon, Dec 01, 2014 at 11:11:43AM +0000, David Vrabel wrote:
> >>>>>> On 27/11/14 18:36, Luis R. Rodriguez wrote:
> >>>>>>> On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
> >>>>>>>> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
> >>>>>>>>> From: "Luis R. Rodriguez" <mcgrof@xxxxxxxx>
> >>>>>>>>>
> >>>>>>>>> Some folks had reported that some xen hypercalls take a long time
> >>>>>>>>> to complete when issued from the userspace private ioctl mechanism,
> >>>>>>>>> this can happen for instance with some hypercalls that have many
> >>>>>>>>> sub-operations, this can happen for instance on hypercalls that use
> >>>>>> [...]
> >>>>>>>>> --- a/drivers/xen/privcmd.c
> >>>>>>>>> +++ b/drivers/xen/privcmd.c
> >>>>>>>>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
> >>>>>>>>> 			   hypercall.arg[0], hypercall.arg[1],
> >>>>>>>>> 			   hypercall.arg[2], hypercall.arg[3],
> >>>>>>>>> 			   hypercall.arg[4]);
> >>>>>>>>> +#ifndef CONFIG_PREEMPT
> >>>>>>>>> +	schedule();
> >>>>>>>>> +#endif
> >>>>>>
> >>>>>> As Juergen points out, this does nothing. You need to schedule while
> >>>>>> in the middle of the hypercall.
> >>>>>>
> >>>>>> Remember that Xen's hypercall preemption only preempts the hypercall
> >>>>>> to run interrupts in the guest.
> >>>>>
> >>>>> How is it ensured that when the kernel preempts on this code path on
> >>>>> CONFIG_PREEMPT=n kernel that only interrupts in the guest are run?
> >>>>
> >>>> Sorry, I really didn't describe this very well.
> >>>>
> >>>> If a hypercall needs a continuation, Xen returns to the guest with the
> >>>> IP set to the hypercall instruction, and on the way back to the guest
> >>>> Xen may schedule a different VCPU or it will do any upcalls (as per normal).
> >>>>
> >>>> The guest is free to return from the upcall to the original task
> >>>> (continuing the hypercall) or to a different one.
> >>>
> >>> OK so that addresses what Xen will do when using continuation and
> >>> hypercall preemption, my concern here was that using
> >>> preempt_schedule_irq() on CONFIG_PREEMPT=n kernels in the middle of a
> >>> hypercall on the return from an interrupt (e.g., the timer interrupt)
> >>> would still let the kernel preempt to tasks other than those related
> >>> to Xen.
> >>
> >> Um. Why would that be a problem? We do want to switch to any task the
> >> Linux scheduler thinks is best.
> >
> > Its safe but -- it technically is doing kernel preemption, unless we want
> > to adjust the definition of CONFIG_PREEMPT=n to exclude hypercalls. This
> > was my original concern with the use of preempt_schedule_irq() to do this.
> > I am afraid of setting precedents without being clear or wider review and
> > acceptance.
>
> It's voluntary preemption at a well defined point.

It's voluntarily preempting the kernel even for CONFIG_PREEMPT=n kernels...

> It's no different to a cond_resched() call.

Then I do agree it's a fair analogy (and find it odd, given how widespread
cond_resched() is, that we don't have an equivalent for IRQ context). Why
not avoid the special check then and use this all the time in the middle of
a hypercall on the return from an interrupt (e.g., the timer interrupt)?
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5e344bb..e60b5a1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2759,6 +2759,12 @@ static inline int signal_pending_state(long state, struct task_struct *p)
  */
 extern int _cond_resched(void);
 
+/*
+ * Voluntarily preempting the kernel even for CONFIG_PREEMPT=n kernels
+ * on very special circumstances.
+ */
+extern int cond_resched_irq(void);
+
 #define cond_resched() ({			\
 	__might_sleep(__FILE__, __LINE__, 0);	\
 	_cond_resched();			\
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 240157c..1c4d443 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4264,6 +4264,16 @@ int __sched _cond_resched(void)
 }
 EXPORT_SYMBOL(_cond_resched);
 
+int __sched cond_resched_irq(void)
+{
+	if (should_resched()) {
+		preempt_schedule_irq();
+		return 1;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(cond_resched_irq);
+
 /*
  * __cond_resched_lock() - if a reschedule is pending, drop the given lock,
  * call schedule, and on return reacquire the lock.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html