Re: ipc-msg broken again on 3.11-rc7?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 09/03/2013 02:53 PM, Manfred Spraul wrote:
> On 09/03/2013 11:16 AM, Vineet Gupta wrote:
>> On 09/03/2013 02:27 PM, Manfred Spraul wrote:
>>> On 09/03/2013 10:44 AM, Vineet Gupta wrote:
>>>>> b) Could you check that it is not just a performance regression?
>>>>>        Does ./msgctl08 1000 16 hang, too?
>>>> Nope that doesn't hang. The minimal configuration that hangs reliably is msgctl
>>>> 50000 2
>>>>
>>>> With this config there are 3 processes.
>>>> ...
>>>>     555   554 root     S     1208  0.4   0  0.0 ./msgctl08 50000 2
>>>>     554   551 root     S     1208  0.4   0  0.0 ./msgctl08 50000 2
>>>>     551   496 root     S     1208  0.4   0  0.0 ./msgctl08 50000 2
>>>> ...
>>>>
>>>> [ARCLinux]$ cat /proc/551/stack
>>>> [<80aec3c6>] do_wait+0xa02/0xc94
>>>> [<80aecad2>] SyS_wait4+0x52/0xa4
>>>> [<80ae24fc>] ret_from_system_call+0x0/0x4
>>>>
>>>> [ARCLinux]$ cat /proc/555/stack
>>>> [<80c2950e>] SyS_msgrcv+0x252/0x420
>>>> [<80ae24fc>] ret_from_system_call+0x0/0x4
>>>>
>>>> [ARCLinux]$ cat /proc/554/stack
>>>> [<80c28c82>] do_msgsnd+0x116/0x35c
>>>> [<80ae24fc>] ret_from_system_call+0x0/0x4
>>>>
>>>> Is this a case of lost wakeup or some such. I'm running with some more diagnostics
>>>> and will report soon ...
>>> What is the output of ipcs -q? Is the queue full or empty when it hangs?
>>> I.e. do we forget to wake up a receiver or forget to wake up a sender?
>> / # ipcs -q
>>
>> ------ Message Queues --------
>> key        msqid      owner      perms      used-bytes   messages
>> 0x72d83160 163841     root       600        0            0
>>
>>
> Ok, a sender is sleeping - even though there are no messages in the queue.
> Perhaps it is the race that I mentioned in a previous mail:
>>       for (;;) {
>>                 struct msg_sender s;
>>
>>                 err = -EACCES;
>>                 if (ipcperms(ns, &msq->q_perm, S_IWUGO))
>>                         goto out_unlock1;
>>
>>                 err = security_msg_queue_msgsnd(msq, msg, msgflg);
>>                 if (err)
>>                         goto out_unlock1;
>>
>>                 if (msgsz + msq->q_cbytes <= msq->q_qbytes &&
>>                                 1 + msq->q_qnum <= msq->q_qbytes) {
>>                         break;
>>                 }
>>
> [snip]
>>         if (!pipelined_send(msq, msg)) {
>>                 /* no one is waiting for this message, enqueue it */
>>                 list_add_tail(&msg->m_list, &msq->q_messages);
>>                 msq->q_cbytes += msgsz;
>>                 msq->q_qnum++;
>>                 atomic_add(msgsz, &ns->msg_bytes);
> The access to msq->q_cbytes is not protected.
>
> Vineet, could you try to move the test for free space after ipc_lock?
> I.e. the lock must not get dropped between testing for free space and 
> enqueueing the messages.

Hmm, the code movement is not trivial. I broke even the simplest of cases (patch
attached). This includes the additional change which Linus/Davidlohr had asked for.

-Vineet

diff --git a/ipc/msg.c b/ipc/msg.c
index 9f29d9e..a512829 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -687,14 +687,6 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
 		if (ipcperms(ns, &msq->q_perm, S_IWUGO))
 			goto out_unlock1;
 
-		err = security_msg_queue_msgsnd(msq, msg, msgflg);
-		if (err)
-			goto out_unlock1;
-
-		if (msgsz + msq->q_cbytes <= msq->q_qbytes &&
-				1 + msq->q_qnum <= msq->q_qbytes) {
-			break;
-		}
 
 		/* queue full, wait: */
 		if (msgflg & IPC_NOWAIT) {
@@ -703,6 +695,10 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
 		}
 
 		ipc_lock_object(&msq->q_perm);
+		err = security_msg_queue_msgsnd(msq, msg, msgflg);
+		if (err)
+			goto out_unlock0;
+
 		ss_add(msq, &s);
 
 		if (!ipc_rcu_getref(msq)) {
@@ -734,6 +730,12 @@ long do_msgsnd(int msqid, long mtype, void __user *mtext,
 	}
 
 	ipc_lock_object(&msq->q_perm);
+
+	if (!(msgsz + msq->q_cbytes <= msq->q_qbytes &&
+			1 + msq->q_qnum <= msq->q_qbytes)) {
+		goto out_unlock0;
+	}
+
 	msq->q_lspid = task_tgid_vnr(current);
 	msq->q_stime = get_seconds();
 

[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]