On Wed, Dec 13, 2023 at 09:55:23AM -0500, Michael S. Tsirkin wrote: > On Wed, Dec 13, 2023 at 01:45:35PM +0100, Tobias Huschle wrote: > > On Wed, Dec 13, 2023 at 07:00:53AM -0500, Michael S. Tsirkin wrote: > > > On Wed, Dec 13, 2023 at 11:37:23AM +0100, Tobias Huschle wrote: > > > > On Tue, Dec 12, 2023 at 11:15:01AM -0500, Michael S. Tsirkin wrote: > > > > > On Tue, Dec 12, 2023 at 11:00:12AM +0800, Jason Wang wrote: > > > > > > On Tue, Dec 12, 2023 at 12:54 AM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote: > > > > [...] > > > > > > Apparently schedule is already called? > > > > > > > What about this: > > > > static int vhost_task_fn(void *data) > > { > > <...> > > did_work = vtsk->fn(vtsk->data); --> this calls vhost_worker if I'm not mistaken > > if (!did_work) > > schedule(); > > <...> > > } > > > > static bool vhost_worker(void *data) > > { > > struct vhost_worker *worker = data; > > struct vhost_work *work, *work_next; > > struct llist_node *node; > > > > node = llist_del_all(&worker->work_list); > > if (node) { > > <...> > > llist_for_each_entry_safe(work, work_next, node, node) { > > <...> > > } > > } > > > > return !!node; > > } > > > > The llist_for_each_entry_safe does not actually change the node value, doesn't it? > > > > If it does not change it, !!node would return 1. > > Thereby skipping the schedule. > > > > This was changed recently with: > > f9010dbdce91 fork, vhost: Use CLONE_THREAD to fix freezer/ps regression > > > > It returned a hardcoded 0 before. The commit message explicitly mentions this > > change to make vhost_worker return 1 if it did something. > > > > Seems indeed like a nasty little side effect caused by EEVDF not scheduling > > the woken up kworker right away. > > > So we are actually making an effort to be nice. > Documentation/kernel-hacking/hacking.rst says: > > If you're doing longer computations: first think userspace. If you > **really** want to do it in kernel you should regularly check if you need > to give up the CPU (remember there is cooperative multitasking per CPU). > Idiom:: > > cond_resched(); /* Will sleep */ > > > and this is what vhost.c does. > > At this point I'm not sure why it's appropriate to call schedule() as opposed to > cond_resched(). Ideas? > Peter, would appreciate feedback on this. When is cond_resched() insufficient to give up the CPU? Should Documentation/kernel-hacking/hacking.rst be updated to require schedule() instead? > -- > MST