Re: Async Messenger RDMA IB ib_uverbs_write return EACCES

On 2019-04-12 12:42, Liu, Changcheng wrote:
Hi all,
    I'm enabling Ceph/RDMA (iWARP) in Ceph v14.2.0.
    It always hits a segmentation fault when querying RDMA devices, after querying them successfully several times.

    I traced the live kernel and found the problem in function ib_uverbs_write:
      1. ib_safe_file_access(filp) returns false, so ib_uverbs_write returns -EACCES.
      2. ib_safe_file_access returns false because filp->f_cred == current_cred() is false.

    Could anyone suggest how to investigate further why filp->f_cred is not equal to current_cred()?

    Below is the kernel code and traced log.
    file: drivers/infiniband/core/uverbs_main.c
        static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf,
                                       size_t count, loff_t *pos)
        {
            /* ... 9 lines folded: struct ib_uverbs_file *file = filp->private_data; ... */
            if (!ib_safe_file_access(filp)) {
                pr_err_once("uverbs_write: process %d (%s) changed security contexts after opening file descriptor, this is not allowed.\n",
                            task_tgid_vnr(current), current->comm);
                return -EACCES;
            }
            /* ... 74 lines folded: if (count < sizeof(hdr)) ... */
        }

    file: kernel/include/rdma/ib.h
        /*
         * The IB interfaces that use write() as bi-directional ioctl() are
         * fundamentally unsafe, since there are lots of ways to trigger "write()"
         * calls from various contexts with elevated privileges. That includes the
         * traditional suid executable error message writes, but also various kernel
         * interfaces that can write to file descriptors.
         *
         * This function provides protection for the legacy API by restricting the
         * calling context.
         */
        static inline bool ib_safe_file_access(struct file *filp)
        {
            return filp->f_cred == current_cred() && !uaccess_kernel();
        }

    Kernel trace log:
        root@nstcloudcc1:/sys/kernel/debug/tracing# cat /sys/kernel/debug/tracing/trace
        # tracer: nop
        #
        #                              _-----=> irqs-off
        #                             / _----=> need-resched
        #                            | / _---=> hardirq/softirq
        #                            || / _--=> preempt-depth
        #                            ||| /     delay
        #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
        #              | |       |   ||||       |         |
                   <...>-87018 [003] .... 15409.847504: rdma_verb_fs: (ib_uverbs_write+0x3c/0x3d0 [ib_uverbs]) filp_f_cred=0xffff8906bd855b00 current_cred=0xffff8906ad773500 get_fs=0xffffffffffffffff
                   <...>-87018 [003] d... 15409.847510: rdma_ib_verb: (__vfs_write+0x1b/0x40 <- ib_uverbs_write) ret=0xfffffffffffffff3 t_name="msgr-worker-0"

Hi Liu,

Let me guess: you are trying to start the whole cluster with the "ms_type = async+rdma" option set? If so, setting "ms_cluster_type = async+rdma" should help.

Returning to your question about the changed credentials: the problem you hit is the order of RDMA initialization. The process opens "/dev/infiniband/uverbs0" before calling setuid(), and setuid() changes the current->cred pointer inside the kernel (see the commit_creds() call).

Here is the strace output. It clearly shows that uverbs0 is opened first, then setuid() is called, and then write() fails:


4050  openat(AT_FDCWD, "/dev/infiniband/uverbs0", O_RDWR|O_CLOEXEC) = 16
...
4050  setuid(167 <unfinished ...>
4050  <... setuid resumed> )            = 0
...
4050 write(16, "\30\0\0\0\32\0\20\0\300W*\307\221\177\0\0\300R\346bbU\0\0\0\0\0\0\4\0\0\0"..., 104 <unfinished ...>
4050  <... write resumed> )             = -1 EACCES (Permission denied)


Backtraces are the following (in the order we hit them):

Init RDMA connection:

#0  0x00007fffef53bbb0 in AsyncConnection::AsyncConnection(CephContext*, AsyncMessenger*, DispatchQueue*, Worker*, bool, bool) () from /usr/lib64/ceph/libceph-common.so.0
#1  0x00007fffef545c49 in AsyncMessenger::create_connect(entity_addrvec_t const&, int) () from /usr/lib64/ceph/libceph-common.so.0
#2  0x00007fffef5466ae in AsyncMessenger::connect_to(int, entity_addrvec_t const&) () from /usr/lib64/ceph/libceph-common.so.0
#3  0x00007fffef5e85af in MonClient::_add_conn(unsigned int, unsigned long) () from /usr/lib64/ceph/libceph-common.so.0
#4  0x00007fffef5e8ce3 in MonClient::_add_conns(unsigned long) () from /usr/lib64/ceph/libceph-common.so.0
#5  0x00007fffef5ee1bf in MonClient::_reopen_session(int) () from /usr/lib64/ceph/libceph-common.so.0
#6  0x00007fffef5efe15 in MonClient::authenticate(double) () from /usr/lib64/ceph/libceph-common.so.0
#7  0x00007fffef5f0736 in MonClient::get_monmap_and_config() () from /usr/lib64/ceph/libceph-common.so.0
#8  0x00005555559a68f5 in global_pre_init()
...
#10 0x0000555555663c11 in main ()

and init RDMA:

Thread 4 "msgr-worker-1" hit Breakpoint 7, 0x00007fffeeb727e0 in open64 () from /lib64/libpthread.so.0
$67 = 0x555557240240 "/dev/infiniband/uverbs1"
#0  0x00007fffeeb727e0 in open64 () from /lib64/libpthread.so.0
#1  0x00007fffec80edaa in verbs_open_device () from /usr/lib64/libibverbs.so.1
#2  0x00007fffef59d940 in Device::Device(CephContext*, ibv_device*, ibv_context*) () from /usr/lib64/ceph/libceph-common.so.0
#3  0x00007fffef5a1517 in Infiniband::init() () from /usr/lib64/ceph/libceph-common.so.0
#4  0x00007fffef5b595a in RDMAWorker::connect(entity_addr_t const&, SocketOptions const&, ConnectedSocket*) () from /usr/lib64/ceph/libceph-common.so.0
#5  0x00007fffef53f677 in AsyncConnection::process() () from /usr/lib64/ceph/libceph-common.so.0
#6  0x00007fffef591847 in EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*) () from /usr/lib64/ceph/libceph-common.so.0
#7  0x00007fffef595c88 in ?? () from /usr/lib64/ceph/libceph-common.so.0
#8  0x00007fffee69638f in ?? () from /usr/lib64/libstdc++.so.6
#9  0x00007fffeeb68569 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fffeddc19af in clone () from /lib64/libc.so.6


and only then setuid() is called:

#0  0x00007fffedd90030 in setuid () from /lib64/libc.so.6
#1  0x00005555559a7844 in global_init() ()
#2  0x0000555555663c11 in main ()


It seems the proper solution is to start the mon connections only after setuid() has been invoked.

Also, according to the code (global_init.cc::global_pre_init()), a simple workaround is the --no-mon-config option. With it, global_pre_init() skips the "if (!conf->no_mon_config)" path and creates no monitor connection. But I doubt this is a good solution; it is just a workaround.

--
Roman