On 2019-04-12 12:42, Liu, Changcheng wrote:
Hi all,
I'm enabling Ceph/RDMA (iWARP) in Ceph v14.2.0.
It always hits a segmentation fault when querying RDMA devices, after
querying them successfully several times.
I traced the live kernel and found the problem in the function
ib_uverbs_write:
1. ib_safe_file_access(filp) returns false, so ib_uverbs_write
returns -EACCES.
2. filp->f_cred == current_cred() is false, so
ib_safe_file_access returns false.
Could anyone suggest how to check further why filp->f_cred is not
equal to current_cred()?
Below are the kernel code and the trace log.
file: drivers/infiniband/core/uverbs_main.c
static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
                               size_t count, loff_t *pos)
{
+---- 9 lines: struct ib_uverbs_file *file = filp->private_data;-----
        if (!ib_safe_file_access(filp)) {
                pr_err_once("uverbs_write: process %d (%s) changed security contexts after opening file descriptor, this is not allowed.\n",
                            task_tgid_vnr(current), current->comm);
                return -EACCES;
        }
+--- 74 lines: if (count < sizeof(hdr))-------------------------------
}
file: kernel/include/rdma/ib.h
/*
 * The IB interfaces that use write() as bi-directional ioctl() are
 * fundamentally unsafe, since there are lots of ways to trigger "write()"
 * calls from various contexts with elevated privileges. That includes the
 * traditional suid executable error message writes, but also various kernel
 * interfaces that can write to file descriptors.
 *
 * This function provides protection for the legacy API by restricting the
 * calling context.
 */
static inline bool ib_safe_file_access(struct file *filp)
{
        return filp->f_cred == current_cred() && !uaccess_kernel();
}
Kernel trace log:
root@nstcloudcc1:/sys/kernel/debug/tracing# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
<...>-87018 [003] .... 15409.847504: rdma_verb_fs: (ib_uverbs_write+0x3c/0x3d0 [ib_uverbs]) filp_f_cred=0xffff8906bd855b00 current_cred=0xffff8906ad773500 get_fs=0xffffffffffffffff
<...>-87018 [003] d... 15409.847510: rdma_ib_verb: (__vfs_write+0x1b/0x40 <- ib_uverbs_write) ret=0xfffffffffffffff3 t_name="msgr-worker-0"
Hi Liu,
Let me guess: you are trying to start the whole cluster with the
"ms_type = async+rdma" option set? If so, then setting
"ms_cluster_type = async+rdma" should help.
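As a rough sketch of that suggestion, the relevant ceph.conf fragment might look like the one below. Placing the option in the [global] section and leaving "ms_type" unset (so that it keeps its default) are assumptions here, not something confirmed for this particular cluster.

```ini
[global]
# Assumption: "ms_type = async+rdma" is removed or left at its default,
# and only the cluster messenger is switched to RDMA.
ms_cluster_type = async+rdma
```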
Returning to your question about the changed credentials: the problem you
hit is the order of RDMA initialization, namely "/dev/infiniband/uverbs0"
is opened first and setuid() is called afterwards. setuid() changes the
current->cred pointer inside the kernel (see the commit_creds() call), so
it no longer matches the filp->f_cred saved at open time.
Here is the strace output, where it is clear that uverbs0 is opened first,
then setuid() is called, and then write() fails:
4050 openat(AT_FDCWD, "/dev/infiniband/uverbs0", O_RDWR|O_CLOEXEC) = 16
...
4050 setuid(167 <unfinished ...>
4050 <... setuid resumed> ) = 0
...
4050 write(16, "\30\0\0\0\32\0\20\0\300W*\307\221\177\0\0\300R\346bbU\0\0\0\0\0\0\4\0\0\0"..., 104 <unfinished ...>
4050 <... write resumed> ) = -1 EACCES (Permission denied)
The backtraces are as follows (in the order we hit them):
Init RDMA connection:
#0 0x00007fffef53bbb0 in AsyncConnection::AsyncConnection(CephContext*, AsyncMessenger*, DispatchQueue*, Worker*, bool, bool) () from /usr/lib64/ceph/libceph-common.so.0
#1 0x00007fffef545c49 in AsyncMessenger::create_connect(entity_addrvec_t const&, int) () from /usr/lib64/ceph/libceph-common.so.0
#2 0x00007fffef5466ae in AsyncMessenger::connect_to(int, entity_addrvec_t const&) () from /usr/lib64/ceph/libceph-common.so.0
#3 0x00007fffef5e85af in MonClient::_add_conn(unsigned int, unsigned long) () from /usr/lib64/ceph/libceph-common.so.0
#4 0x00007fffef5e8ce3 in MonClient::_add_conns(unsigned long) () from /usr/lib64/ceph/libceph-common.so.0
#5 0x00007fffef5ee1bf in MonClient::_reopen_session(int) () from /usr/lib64/ceph/libceph-common.so.0
#6 0x00007fffef5efe15 in MonClient::authenticate(double) () from /usr/lib64/ceph/libceph-common.so.0
#7 0x00007fffef5f0736 in MonClient::get_monmap_and_config() () from /usr/lib64/ceph/libceph-common.so.0
#8 0x00005555559a68f5 in global_pre_init()
...
#10 0x0000555555663c11 in main ()
and init RDMA:
Thread 4 "msgr-worker-1" hit Breakpoint 7, 0x00007fffeeb727e0 in open64 () from /lib64/libpthread.so.0
$67 = 0x555557240240 "/dev/infiniband/uverbs1"
#0 0x00007fffeeb727e0 in open64 () from /lib64/libpthread.so.0
#1 0x00007fffec80edaa in verbs_open_device () from /usr/lib64/libibverbs.so.1
#2 0x00007fffef59d940 in Device::Device(CephContext*, ibv_device*, ibv_context*) () from /usr/lib64/ceph/libceph-common.so.0
#3 0x00007fffef5a1517 in Infiniband::init() () from /usr/lib64/ceph/libceph-common.so.0
#4 0x00007fffef5b595a in RDMAWorker::connect(entity_addr_t const&, SocketOptions const&, ConnectedSocket*) () from /usr/lib64/ceph/libceph-common.so.0
#5 0x00007fffef53f677 in AsyncConnection::process() () from /usr/lib64/ceph/libceph-common.so.0
#6 0x00007fffef591847 in EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*) () from /usr/lib64/ceph/libceph-common.so.0
#7 0x00007fffef595c88 in ?? () from /usr/lib64/ceph/libceph-common.so.0
#8 0x00007fffee69638f in ?? () from /usr/lib64/libstdc++.so.6
#9 0x00007fffeeb68569 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fffeddc19af in clone () from /lib64/libc.so.6
and only then setuid() is called:
#0 0x00007fffedd90030 in setuid () from /lib64/libc.so.6
#1 0x00005555559a7844 in global_init() ()
#2 0x0000555555663c11 in main ()
It seems the proper solution is to start the mon connections only after
setuid() has been invoked.
Also, according to the code (global_init.cc::global_pre_init()), a simple
workaround is to use the --no-mon-config option: then no monitor connection
is created inside global_pre_init(), since that happens only under the
"if (!conf->no_mon_config)" path. I doubt this is a good way, though; it is
just a workaround.
--
Roman