On 2024/5/20 20:54, Gao Xiang wrote:
On 2024/5/20 20:42, Baokun Li wrote:

On 2024/5/20 18:04, Jeff Layton wrote:

On Mon, 2024-05-20 at 12:06 +0800, Baokun Li wrote:

Hi Jeff,

Thank you very much for your review!

On 2024/5/19 19:11, Jeff Layton wrote:

On Wed, 2024-05-15 at 20:51 +0800, libaokun@xxxxxxxxxxxxxxx wrote:

From: Baokun Li <libaokun1@xxxxxxxxxx>

Reusing the msg_id after a maliciously completed reopen request may cause
a read request to remain unprocessed and result in a hung task, as shown
below:

t1                     | t2                                 | t3
--------------------------------------------------------------------------
cachefiles_ondemand_select_req
 cachefiles_ondemand_object_is_close(A)
 cachefiles_ondemand_set_object_reopening(A)
 queue_work(fscache_object_wq, &info->work)
                         ondemand_object_worker
                          cachefiles_ondemand_init_object(A)
                           cachefiles_ondemand_send_req(OPEN)
                            // get msg_id 6
                            wait_for_completion(&req_A->done)
cachefiles_ondemand_daemon_read
 // read msg_id 6 req_A
 cachefiles_ondemand_get_fd
 copy_to_user
                                                              // Malicious completion msg_id 6
                                                              copen 6,-1
                                                              cachefiles_ondemand_copen
                                                              complete(&req_A->done)
                                                              // will not set the object to close
                                                              // because ondemand_id && fd is valid.

                         // ondemand_object_worker() is done
                         // but the object is still reopening.

                                                              // new open req_B
                                                              cachefiles_ondemand_init_object(B)
                                                              cachefiles_ondemand_send_req(OPEN)
                                                              // reuse msg_id 6
process_open_req
 copen 6,A.size
 // The expected failed copen was executed successfully

The copen is expected to fail; when it fails, the fd is closed, which sets
the object to close, and the close in turn triggers the reopen again.
However, because the msg_id reuse lets the copen succeed, the anonymous fd
is not closed until the daemon exits, so read requests waiting for the
reopen to complete may trigger a hung task.

To avoid this issue, allocate msg_ids cyclically, so that a msg_id is not
reused within a very short period of time.

Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Baokun Li <libaokun1@xxxxxxxxxx>
---
 fs/cachefiles/internal.h |  1 +
 fs/cachefiles/ondemand.c | 20 ++++++++++++++++----
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 8ecd296cc1c4..9200c00f3e98 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -128,6 +128,7 @@ struct cachefiles_cache {
 	unsigned long			req_id_next;
 	struct xarray			ondemand_ids;	/* xarray for ondemand_id allocation */
 	u32				ondemand_id_next;
+	u32				msg_id_next;
 };
 
 static inline bool cachefiles_in_ondemand_mode(struct cachefiles_cache *cache)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index f6440b3e7368..b10952f77472 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -433,20 +433,32 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
 		smp_mb();
 
 		if (opcode == CACHEFILES_OP_CLOSE &&
-			!cachefiles_ondemand_object_is_open(object)) {
+		    !cachefiles_ondemand_object_is_open(object)) {
 			WARN_ON_ONCE(object->ondemand->ondemand_id == 0);
 			xas_unlock(&xas);
 			ret = -EIO;
 			goto out;
 		}
 
-		xas.xa_index = 0;
+		/*
+		 * Cyclically find a free xas to avoid msg_id reuse that would
+		 * cause the daemon to successfully copen a stale msg_id.
+		 */
+		xas.xa_index = cache->msg_id_next;
 		xas_find_marked(&xas, UINT_MAX, XA_FREE_MARK);
+		if (xas.xa_node == XAS_RESTART) {
+			xas.xa_index = 0;
+			xas_find_marked(&xas, cache->msg_id_next - 1, XA_FREE_MARK);
+		}
 		if (xas.xa_node == XAS_RESTART)
 			xas_set_err(&xas, -EBUSY);
+
 		xas_store(&xas, req);
-		xas_clear_mark(&xas, XA_FREE_MARK);
-		xas_set_mark(&xas, CACHEFILES_REQ_NEW);
+		if (xas_valid(&xas)) {
+			cache->msg_id_next = xas.xa_index + 1;
+			xas_clear_mark(&xas, XA_FREE_MARK);
+			xas_set_mark(&xas, CACHEFILES_REQ_NEW);
+		}
 		xas_unlock(&xas);
 	} while (xas_nomem(&xas, GFP_KERNEL));
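The effect of the cyclic search can be modelled in a few lines of userspace
C. The sketch below is only an illustration of the two allocation policies
being compared; it is not kernel code, and all names in it are made up.

/* Toy model of msg_id allocation: lowest-free vs. cyclic (illustration only). */
#include <stdbool.h>
#include <stdio.h>

#define NR_IDS 8			/* tiny id space so the effect is visible */
static bool in_use[NR_IDS];
static unsigned int next_id;		/* analogue of cache->msg_id_next */

/* Old behaviour: always hand out the lowest free id. */
static int alloc_lowest(void)
{
	for (unsigned int i = 0; i < NR_IDS; i++) {
		if (!in_use[i]) {
			in_use[i] = true;
			return i;
		}
	}
	return -1;
}

/* New behaviour: start searching at next_id and wrap around at the end. */
static int alloc_cyclic(void)
{
	for (unsigned int n = 0; n < NR_IDS; n++) {
		unsigned int i = (next_id + n) % NR_IDS;

		if (!in_use[i]) {
			in_use[i] = true;
			next_id = i + 1;
			return i;
		}
	}
	return -1;
}

int main(void)
{
	/* Lowest-free: an id freed by the kernel is immediately reused,
	 * even though the daemon may still be processing it. */
	int a = alloc_lowest();
	in_use[a] = false;
	printf("lowest-free: freed %d, next request gets %d\n", a, alloc_lowest());
	in_use[0] = false;

	/* Cyclic: the freed id is skipped until the whole id space wraps. */
	a = alloc_cyclic();
	in_use[a] = false;
	printf("cyclic:      freed %d, next request gets %d\n", a, alloc_cyclic());
	return 0;
}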
If you have a long-standing stuck request, could this counter wrap around
and you still end up with reuse?

Yes, msg_id_next is declared as a u32 in the hope that when xa_index
reaches UINT_MAX a wrap-around occurs and msg_id_next goes back to zero.
Limiting xa_index to no more than UINT_MAX keeps the xarray from getting
too deep. If, after the wrap-around, msg_id_next equals the id of a
long-standing stuck request, then yes, the reuse described above can still
occur.

But I feel that a long-stuck request is a problem in itself: it means that
after we have sent 4294967295 requests, the first one has still not been
processed, and even at a million requests per second that request would
have been outstanding for more than an hour. We have a keep-alive process
that pulls the daemon back up as soon as it exits, and the daemon has a
timeout mechanism for requests to prevent the kernel from waiting for long
periods of time. In other words, we should be avoiding the situation where
a request is stuck for a long time in the first place. If you think
UINT_MAX is not enough, perhaps we could raise the maximum value of
msg_id_next to ULONG_MAX?

Maybe this should be using ida_alloc/free instead, which would prevent
that too?

The id reuse here is that the kernel has finished the open request req_A
and freed its id_A, and then used id_A again when sending the open request
req_B, while the daemon is still working on req_A; the copen for id_A then
succeeds but operates on req_B. An id that is still in use by the kernel
is never handed out here, so it seems that ida_alloc/free does not prevent
this kind of reuse either. Could you elaborate a bit more on how that
works?

ida_alloc and free absolutely prevent reuse while the id is in use. That's
sort of the point of those functions. Basically it uses a set of bitmaps
in an xarray to track which IDs are in use, so ida_alloc only hands out
values which are not in use. See the comments over ida_alloc_range() in
lib/idr.c.

Thank you for the explanation! The logic now provides the same guarantees
as ida_alloc/free. The "reused" id is indeed no longer in use in the
kernel, but it is still in use in userland, so a multi-threaded daemon
could be handling two different requests with the same msg_id at the same
time. Previously, the logic for allocating msg_ids was to start at 0 and
look for a free xa_index, so an id could be allocated to a new request
just as it was being freed. With the change to cyclic allocation, the
kernel will not use the same id again until INT_MAX requests have been
sent, and in the time it takes to send that many requests the daemon has
long enough to finish the requests whose ids it is still holding but which
have already been freed in the kernel.
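For reference, the ida pattern Jeff describes would look roughly like the
sketch below. It is an illustration only; the names are made up, the real
request path also has to store the request in the xarray and mark it
CACHEFILES_REQ_NEW, and, as discussed above, an id can be handed back out
as soon as the kernel frees it, even if the daemon is still holding it.

/* Sketch of the ida_alloc/ida_free alternative; illustration only. */
#include <linux/idr.h>

static DEFINE_IDA(msg_ida);	/* made-up ida instance just for msg_ids */

static int msg_id_get(void)
{
	/* Returns the lowest id that is not currently allocated; the same
	 * id cannot be returned again until ida_free() is called on it. */
	return ida_alloc_max(&msg_ida, INT_MAX, GFP_KERNEL);
}

static void msg_id_put(int id)
{
	/* From this point on the id may be handed out to a new request,
	 * even if the daemon is still working on the old one. */
	ida_free(&msg_ida, id);
}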
Again, if I understand correctly, I think the main point here is the
wait_for_completion(&req_A->done), which could hang due to some malicious
daemon. But I think it should be switched to wait_for_completion_killable()
instead.

 * It's up to users to kill the mount instance if there is a malicious
   user daemon.

So in that case, a hung task will not be triggered anymore, and you don't
need to care about cyclic allocation either.

Thanks,
Gao Xiang
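For reference, the change being suggested would roughly amount to the
fragment below inside cachefiles_ondemand_send_req(). It is only a sketch:
it assumes the existing req->done/req->error fields, and the cleanup of
the still-pending request is left out.

	/* Sketch only: wait killably instead of uninterruptibly, so that a
	 * fatal signal (e.g. the user killing the mount instance) ends the
	 * wait even if a malicious daemon never completes the request. */
	ret = wait_for_completion_killable(&req->done);
	if (ret) {
		/* Interrupted by a fatal signal: the request is still in the
		 * xarray and would have to be removed/failed here (omitted). */
		goto out;
	}
	ret = req->error;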
Hi Xiang,

The problem is not as simple as you think. Making the wait killable only
means the hung task is no longer triggered in
cachefiles_ondemand_send_req(); any process waiting on the resources held
by the stuck request still hangs:

 * When the open/read request in the mount process gets stuck, a
   sync/drop_caches triggers a hung task panic in iterate_supers(), which
   waits for sb->s_umount to be unlocked.

 * After umount, the anonymous fd is not closed, causing a hung task panic
   in fscache_hash_cookie() while it waits for the cookie to be unhashed.

 * The dentry is left in a lookup state because the read request is never
   processed, so another process looking up the same dentry waits for the
   previous lookup to finish, which triggers a hung task panic in
   d_alloc_parallel().

Can all of this be made killable?

Thanks,
Baokun