Am 15.04.24 um 12:35 schrieb zhiguojiang:
在 2024/4/12 14:39, Christian König 写道:
[Some people who received this message don't often get email from
christian.koenig@xxxxxxx. Learn why this is important at
https://aka.ms/LearnAboutSenderIdentification ]
Am 12.04.24 um 08:19 schrieb zhiguojiang:
[SNIP]
-> Here task 2220 do epoll again where internally it will get/put then
start to free twice and lead to final crash.
Here is the basic flow:
1. Thread A install the dma_buf_fd via dma_buf_export, the fd refcount
is 1
2. Thread A add the fd to epoll list via epoll_ctl(EPOLL_CTL_ADD)
3. After use the dma buf, Thread A close the fd, then here fd refcount
is 0,
and it will run __fput finally to release the file
Stop, that isn't correct.
The fs layer which calls dma_buf_poll() should make sure that the file
can't go away until the function returns.
Then inside dma_buf_poll() we add another reference to the file while
installing the callback:
/* Paired with fput in dma_buf_poll_cb */
get_file(dmabuf->file);
Hi,
The problem may just occurred here.
Is it possible file reference count already decreased to 0 and fput
already being in progressing just before calling
get_file(dmabuf->file) in dma_buf_poll()?
No, exactly that isn't possible.
If a function gets a dma_buf pointer or even more general any reference
counted pointer which has already decreased to 0 then that is a major
bug in the caller of that function.
BTW: It's completely illegal to work around such issues by using
file_count() or RCU functions. So when you suggest stuff like that it
will immediately face rejection.
Regards,
Christian.
This reference is only dropped after the callback is completed in
dma_buf_poll_cb():
/* Paired with get_file in dma_buf_poll */
fput(dmabuf->file);
So your explanation for the issue just seems to be incorrect.
4. Here Thread A not do epoll_ctl(EPOLL_CTL_DEL) manunally, so it
still resides in one epoll_list.
Although __fput will call eventpoll_release to remove the file from
binded epoll list,
but it has small time window where Thread B jumps in.
Well if that is really the case then that would then be a bug in
epoll_ctl().
5. During the small window, Thread B do the poll action for
dma_buf_fd, where it will fget/fput for the file,
this means the fd refcount will be 0 -> 1 -> 0, and it will call
__fput again.
This will lead to __fput twice for the same file.
6. So the potenial fix is use get_file_rcu which to check if file
refcount already zero which means under free.
If so, we just return and no need to do the dma_buf_poll.
Well to say it bluntly as far as I can see this suggestion is completely
nonsense.
When the reference to the file goes away while dma_buf_poll() is
executed then that's a massive bug in the caller of that function.
Regards,
Christian.
Here is the race condition:
Thread A Thread B
dma_buf_export
fd_refcount is 1
epoll_ctl(EPOLL_ADD)
add dma_buf_fd to epoll list
close(dma_buf_fd)
fd_refcount is 0
__fput
dma_buf_poll
fget
fput
fd_refcount is zero again
Thanks