hi Bodo,
I have implemented a zero-copy prototype for tcmu(not sent out yet), it
reallys
improves throughput in large io size cases, and I met some code question
that
may need you kind help, thanks in advance. In my prototype, I choose to
unmap pages in tcmu_irqcontrol():
==> tcmu_irqcontrol, which holds cmdr_lock.
====> tcmu_handle_completions
======> tcmu_handle_completion
========> zap_page_range, which needs to hold mm related lock.
But in page fault procedure:
==> tcmu_vma_fault, which I think its caller hold mm related lock.
====> tcmu_try_get_data_page, which tries to hold cmdr_lock.
Then dead lock may happens. In tcmu_try_get_data_page(),
it mainly acts like below:
mutex_lock(&udev->cmdr_lock);
page = xa_load(&udev->data_pages, dpi);
if (likely(page)) {
mutex_unlock(&udev->cmdr_lock);
return page;
}
/*
* Userspace messed up and passed in a address not in the
* data iov passed to it.
*/
pr_err("Invalid addr to data page mapping (dpi %u) on device %s\n",
dpi, udev->name);
mutex_unlock(&udev->cmdr_lock);
I wonder do we really need to hold cmdr_lock here, seems it doesn't
protect any real resources. If cmdr_lock here is used to avoid concurrent
xa_erase(), after releasing cmdr_lock, page still can be removed.
So my question is that can we safely remove this cmdr_lock here? thanks.
Regards,
Xiaoguang Wang