Hi Trond,
Currently, cl_tasks is used to maintain the list of all rpc_tasks
for each rpc_clnt.
Under heavy write load, we've seen this list grow to millions
of entries. Even though the list is extremely long, the system
still runs fine until the user requests information on all
active RPC tasks by running:
# cat /sys/kernel/debug/sunrpc/rpc_clnt/N/tasks
When this happens, tasks_start() is called and it acquires the
rpc_clnt.cl_lock to walk the cl_tasks list, returning one entry
at a time to the caller. The cl_lock is held until all tasks on
this list have been processed.
While the cl_lock is held, completed RPC tasks have to spin-wait
in rpc_task_release_client() for the cl_lock. If there are millions
of entries in the cl_tasks list, it takes a long time before
tasks_stop() is called and the cl_lock is released.
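To make the pattern concrete, here is a minimal userspace sketch of
it. This is not the sunrpc code: the structs, clnt_init(),
try_release(), dump_tasks() and the blocked_attempts counter are
hypothetical stand-ins, and cl_lock is modelled with a C11
atomic_flag. It only shows that every release attempt made during
the walk finds the lock held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for rpc_task and rpc_clnt. */
struct task {
	struct task *next;
};

struct clnt {
	atomic_flag cl_lock;     /* models the cl_lock spinlock */
	struct task *cl_tasks;   /* models the cl_tasks list */
	size_t blocked_attempts; /* release attempts that found cl_lock held */
};

static void clnt_init(struct clnt *c, struct task *head)
{
	atomic_flag_clear(&c->cl_lock);
	c->cl_tasks = head;
	c->blocked_attempts = 0;
}

static bool clnt_trylock(struct clnt *c)
{
	/* test_and_set returns the previous value: false means we got it */
	return !atomic_flag_test_and_set(&c->cl_lock);
}

static void clnt_unlock(struct clnt *c)
{
	atomic_flag_clear(&c->cl_lock);
}

/* Iterator start: take cl_lock and return the first entry. */
static struct task *tasks_start(struct clnt *c)
{
	while (!clnt_trylock(c))
		; /* spin until the lock is free */
	return c->cl_tasks;
}

/* Iterator next: advance while cl_lock stays held. */
static struct task *tasks_next(struct task *t)
{
	assert(t != NULL);
	return t->next;
}

/* Iterator stop: only here is cl_lock dropped. */
static void tasks_stop(struct clnt *c)
{
	clnt_unlock(c);
}

/* Models one attempt by a completed task to take cl_lock. */
static bool try_release(struct clnt *c)
{
	if (!clnt_trylock(c)) {
		c->blocked_attempts++;
		return false;
	}
	clnt_unlock(c);
	return true;
}

/*
 * Walk the whole list the way the reader does. After each entry a
 * completed task probes the lock; every probe fails because the lock
 * is held from tasks_start() to tasks_stop(). Returns entries visited.
 */
static size_t dump_tasks(struct clnt *c)
{
	size_t n = 0;

	for (struct task *t = tasks_start(c); t; t = tasks_next(t)) {
		try_release(c);
		n++;
	}
	tasks_stop(c);
	return n;
}
```

In this model the number of failed release attempts equals the list
length, which is the part that hurts once the list has millions of
entries and the releasers are spinning on real CPUs.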
Under heavy load, the threads spinning in rpc_task_release_client()
use up all the available CPUs in the system, preventing other
jobs from running, and this causes the system to temporarily lock up.
I'm looking for suggestions on how to address this issue. I think
one option is to convert the cl_tasks list to an xarray to eliminate
the contention on the cl_lock, and I would like to hear the
community's opinion.
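As a rough illustration of why an indexed container could help, here
is a userspace sketch. It does not use the kernel xarray API; the
fixed-size slot array, MAX_TASKS, find_from(), try_release() and
dump_tasks() are all hypothetical stand-ins. The assumption being
illustrated is that a reader which resumes by index only needs the
lock for each lookup, so completed tasks can remove themselves
between entries.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 8 /* hypothetical capacity, for illustration only */

struct task {
	int pid;
};

struct clnt {
	atomic_flag lock;              /* protects slots[] */
	struct task *slots[MAX_TASKS]; /* index -> task, NULL if empty */
	size_t blocked_attempts;       /* release attempts that found lock held */
};

static void clnt_init(struct clnt *c)
{
	atomic_flag_clear(&c->lock);
	for (size_t i = 0; i < MAX_TASKS; i++)
		c->slots[i] = NULL;
	c->blocked_attempts = 0;
}

static bool clnt_trylock(struct clnt *c)
{
	return !atomic_flag_test_and_set(&c->lock);
}

static void clnt_unlock(struct clnt *c)
{
	atomic_flag_clear(&c->lock);
}

/*
 * Find the first occupied slot at or after *index and store its index
 * back in *index. The lock is held only for this one lookup. A real
 * implementation would also need to keep the returned task alive
 * (for example with RCU or a reference count); this sketch never
 * dereferences it after unlocking.
 */
static struct task *find_from(struct clnt *c, size_t *index)
{
	struct task *found = NULL;

	while (!clnt_trylock(c))
		; /* spin until the lock is free */
	for (size_t i = *index; i < MAX_TASKS; i++) {
		if (c->slots[i]) {
			found = c->slots[i];
			*index = i;
			break;
		}
	}
	clnt_unlock(c);
	return found;
}

/* Models a completed task removing itself from the container. */
static bool try_release(struct clnt *c, size_t index)
{
	if (!clnt_trylock(c)) {
		c->blocked_attempts++;
		return false;
	}
	c->slots[index] = NULL;
	clnt_unlock(c);
	return true;
}

/*
 * Walk all entries, resuming by index. Between lookups the lock is
 * free, so the task at release_index can be removed mid-walk.
 * Returns the number of entries visited.
 */
static size_t dump_tasks(struct clnt *c, size_t release_index)
{
	size_t n = 0, index = 0;

	while (find_from(c, &index)) {
		n++;
		index++;
		try_release(c, release_index);
	}
	return n;
}
```

The trade-off in this model is that the reader no longer sees an
atomic snapshot: an entry released during the walk is simply skipped.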
Thanks,
-Dai