On 09/03/2013 02:53 PM, Manfred Spraul wrote: > On 09/03/2013 11:16 AM, Vineet Gupta wrote: >> On 09/03/2013 02:27 PM, Manfred Spraul wrote: >>> On 09/03/2013 10:44 AM, Vineet Gupta wrote: >>>>> b) Could you check that it is not just a performance regression? >>>>> Does ./msgctl08 1000 16 hang, too? >>>> Nope that doesn't hang. The minimal configuration that hangs reliably is msgctl >>>> 50000 2 >>>> >>>> With this config there are 3 processes. >>>> ... >>>> 555 554 root S 1208 0.4 0 0.0 ./msgctl08 50000 2 >>>> 554 551 root S 1208 0.4 0 0.0 ./msgctl08 50000 2 >>>> 551 496 root S 1208 0.4 0 0.0 ./msgctl08 50000 2 >>>> ... >>>> >>>> [ARCLinux]$ cat /proc/551/stack >>>> [<80aec3c6>] do_wait+0xa02/0xc94 >>>> [<80aecad2>] SyS_wait4+0x52/0xa4 >>>> [<80ae24fc>] ret_from_system_call+0x0/0x4 >>>> >>>> [ARCLinux]$ cat /proc/555/stack >>>> [<80c2950e>] SyS_msgrcv+0x252/0x420 >>>> [<80ae24fc>] ret_from_system_call+0x0/0x4 >>>> >>>> [ARCLinux]$ cat /proc/554/stack >>>> [<80c28c82>] do_msgsnd+0x116/0x35c >>>> [<80ae24fc>] ret_from_system_call+0x0/0x4 >>>> >>>> Is this a case of lost wakeup or some such. I'm running with some more diagnostics >>>> and will report soon ... >>> What is the output of ipcs -q? Is the queue full or empty when it hangs? >>> I.e. do we forget to wake up a receiver or forget to wake up a sender? >> / # ipcs -q >> >> ------ Message Queues -------- >> key msqid owner perms used-bytes messages >> 0x72d83160 163841 root 600 0 0 >> >> > Ok, a sender is sleeping - even though there are no messages in the queue. > Perhaps it is the race that I mentioned in a previous mail: >> for (;;) { >> struct msg_sender s; >> >> err = -EACCES; >> if (ipcperms(ns, &msq->q_perm, S_IWUGO)) >> goto out_unlock1; >> >> err = security_msg_queue_msgsnd(msq, msg, msgflg); >> if (err) >> goto out_unlock1; >> >> if (msgsz + msq->q_cbytes <= msq->q_qbytes && >> 1 + msq->q_qnum <= msq->q_qbytes) { >> break; >> } >> > [snip] >> if (!pipelined_send(msq, msg)) { >> /* no one is waiting for this message, enqueue it */ >> list_add_tail(&msg->m_list, &msq->q_messages); >> msq->q_cbytes += msgsz; >> msq->q_qnum++; >> atomic_add(msgsz, &ns->msg_bytes); > The access to msq->q_cbytes is not protected. > > Vineet, could you try to move the test for free space after ipc_lock? > I.e. the lock must not get dropped between testing for free space and > enqueueing the messages. Hmm, the code movement is not trivial. I broke even the simplest of cases (patch attached). This includes the additional change which Linus/Davidlohr had asked for. -Vineet
diff --git a/ipc/msg.c b/ipc/msg.c index 9f29d9e..a512829 100644 --- a/ipc/msg.c +++ b/ipc/msg.c @@ -687,14 +687,6 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext, if (ipcperms(ns, &msq->q_perm, S_IWUGO)) goto out_unlock1; - err = security_msg_queue_msgsnd(msq, msg, msgflg); - if (err) - goto out_unlock1; - - if (msgsz + msq->q_cbytes <= msq->q_qbytes && - 1 + msq->q_qnum <= msq->q_qbytes) { - break; - } /* queue full, wait: */ if (msgflg & IPC_NOWAIT) { @@ -703,6 +695,10 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext, } ipc_lock_object(&msq->q_perm); + err = security_msg_queue_msgsnd(msq, msg, msgflg); + if (err) + goto out_unlock0; + ss_add(msq, &s); if (!ipc_rcu_getref(msq)) { @@ -734,6 +730,12 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext, } ipc_lock_object(&msq->q_perm); + + if (!(msgsz + msq->q_cbytes <= msq->q_qbytes && + 1 + msq->q_qnum <= msq->q_qbytes)) { + goto out_unlock0; + } + msq->q_lspid = task_tgid_vnr(current); msq->q_stime = get_seconds();