Re: ipc-msg broken again on 3.11-rc7?

On Mon, Sep 2, 2013 at 6:29 PM, Manfred Spraul <manfred@xxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> [forgot to cc everyone, thus I'll summarize some mails...]
>
> On 09/02/2013 06:58 AM, Vineet Gupta wrote:
>>
>> On 08/31/2013 11:20 PM, Linus Torvalds wrote:
>>>
>>> Vineet, actual patch for what Davidlohr suggests attached. Can you try
>>> it?
>>>
>>>               Linus
>>
>> Apologies for being late in getting back to this - I was away from my
>> computer for a bit.
>>
>> Unfortunately, with a quick test, this patch doesn't help.
>> FWIW, this is latest mainline (.config attached).
>>
>> Let me know what diagnostics I can add to help with this.
>
>
> msgctl08 is a bulk message send/receive test. I had to look at it once
> before; that time the cause turned out to be broken hardware:
> https://lkml.org/lkml/2008/6/12/365
> Broken hardware can be ruled out here, because the test works with 3.10.
>
> msgctl08 uses pairs of threads: one thread does msgsnd(), the other one
> msgrcv().
> There is no synchronization, i.e. msgsnd() can race ahead until the
> kernel buffer is full, followed by a block of msgrcv() calls, or the two
> can run as alternating msgsnd()/msgrcv() pairs.
> No special features are used: each pair of threads has its own message
> queue, and all messages have type=1.
>
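For anyone who wants to poke at the pattern outside of LTP, here is a
minimal sketch of one such sender/receiver pair. It is not the msgctl08
source: run_pair(), struct pair_msg and the 64-byte payload are made-up
names and values for illustration, and it uses fork() rather than
threads. Only msgget(), msgsnd(), msgrcv() and msgctl() are the real
System V calls.

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define PAYLOAD 64              /* illustrative size, not taken from msgctl08 */

struct pair_msg {
    long mtype;                 /* always 1, as in the description above */
    char data[PAYLOAD];
};

/*
 * Hypothetical helper: run one sender/receiver pair on a private queue.
 * Returns the number of messages received intact, or -1 on setup error.
 */
int run_pair(int nmsgs)
{
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        msgctl(qid, IPC_RMID, NULL);
        return -1;
    }

    if (pid == 0) {
        /* Sender: not synchronized with the receiver; msgsnd() only
         * blocks once the queue has no room left. */
        struct pair_msg m;
        m.mtype = 1;
        for (int i = 0; i < nmsgs; i++) {
            memset(m.data, i & 0xff, sizeof(m.data));
            if (msgsnd(qid, &m, sizeof(m.data), 0) < 0) {
                /* Remove the queue so the receiver fails instead of
                 * blocking forever. */
                msgctl(qid, IPC_RMID, NULL);
                _exit(1);
            }
        }
        _exit(0);
    }

    /* Receiver: messages of type 1 are returned in FIFO order. */
    int ok = 0;
    struct pair_msg m;
    for (int i = 0; i < nmsgs; i++) {
        if (msgrcv(qid, &m, sizeof(m.data), 1, 0) != (ssize_t)sizeof(m.data))
            break;
        if ((unsigned char)m.data[0] == (unsigned char)(i & 0xff))
            ok++;
    }

    int status;
    waitpid(pid, &status, 0);
    msgctl(qid, IPC_RMID, NULL);    /* harmless if already removed */
    return ok;
}
```

Calling run_pair(n) from a small main() and comparing the result with n
gives a crude pass/fail check for a single pair. Whether this is enough
to reproduce the hang is a separate question.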
> Vineet ran strace - and just before the signal from killing msgctl08, there
> are only msgsnd()/msgrcv() calls.
> Vineet:
> a) could you run strace tomorrow again, with '-ttt' as an additional option?
> I don't see where exactly it hangs.
> b) Could you check that it is not just a performance regression?
>     Does ./msgctl08 1000 16 hang, too?
>
> In ipc/msg.c, I haven't seen any obvious reason why it should hang.
> The only race I spotted so far is this one:
>>
>>       for (;;) {
>>                 struct msg_sender s;
>>
>>                 err = -EACCES;
>>                 if (ipcperms(ns, &msq->q_perm, S_IWUGO))
>>                         goto out_unlock1;
>>
>>
>>                 err = security_msg_queue_msgsnd(msq, msg, msgflg);
>>                 if (err)
>>                         goto out_unlock1;
>>
>>                 if (msgsz + msq->q_cbytes <= msq->q_qbytes &&
>>                                 1 + msq->q_qnum <= msq->q_qbytes) {
>>                         break;
>>                 }
>>
> [snip]
>>
>>         if (!pipelined_send(msq, msg)) {
>>                 /* no one is waiting for this message, enqueue it */
>>                 list_add_tail(&msg->m_list, &msq->q_messages);
>>                 msq->q_cbytes += msgsz;
>>                 msq->q_qnum++;
>>                 atomic_add(msgsz, &ns->msg_bytes);
>
>
> The access to msq->q_cbytes is not protected. Thus two parallel msgsnd()
> calls could both succeed, even if together they bring the queue length
> above the limit.
> But it can't explain why 3.11-rc7 hangs: as explained above, msgctl08 uses
> one queue for each thread pair.
>
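To make the interleaving Manfred describes concrete, here is a small
single-threaded model of it. This is not kernel code: struct
model_queue, has_room(), enqueue() and the two replay functions are
hypothetical names, and the model only replays the order "both senders
check, then both enqueue" against the same size test as in the quoted
loop. Whether the real locking allows that order is exactly what
Manfred is pointing at, not something this sketch proves.

```c
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-in for the two fields involved in the check. */
struct model_queue {
    size_t cbytes;              /* bytes currently queued */
    size_t qbytes;              /* queue size limit */
};

/* Same shape as the size test in the quoted msgsnd() loop. */
static bool has_room(const struct model_queue *q, size_t msgsz)
{
    return msgsz + q->cbytes <= q->qbytes;
}

static void enqueue(struct model_queue *q, size_t msgsz)
{
    q->cbytes += msgsz;
}

/* Both senders pass the check before either one enqueues. */
size_t replay_racy_interleaving(size_t qbytes, size_t msgsz)
{
    struct model_queue q = { .cbytes = 0, .qbytes = qbytes };
    bool a_ok = has_room(&q, msgsz);
    bool b_ok = has_room(&q, msgsz);

    if (a_ok)
        enqueue(&q, msgsz);
    if (b_ok)
        enqueue(&q, msgsz);
    return q.cbytes;
}

/* Each sender checks and enqueues as one step, as a lock would enforce. */
size_t replay_serialized(size_t qbytes, size_t msgsz)
{
    struct model_queue q = { .cbytes = 0, .qbytes = qbytes };

    if (has_room(&q, msgsz))
        enqueue(&q, msgsz);
    if (has_room(&q, msgsz))
        enqueue(&q, msgsz);
    return q.cbytes;
}
```

With a limit of 100 bytes and 60-byte messages, the racy order ends
with 120 bytes queued, while the serialized order stops at 60. As
Manfred says, this overshoot would not by itself explain a hang when
each pair has its own queue.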

Just FYI:

The Linux Test Project (LTP) will do a new release in the first week of
September. Some IPC test suites were reworked.
Manfred, can you look at them ("...msgctl08 uses one queue for each
thread pair.")?
( It might be worth raising this on the LTP mailing list (that the
test case is not ideal, etc.)? )

- Sedat -
