Re: pty master not woken up?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/19/2015 02:34 AM, Francesco Ruggeri wrote:
> On Wed, Sep 30, 2015 at 2:35 PM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:
>> On 09/30/2015 05:26 PM, Francesco Ruggeri wrote:
>>> On Wed, Sep 30, 2015 at 12:36 PM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote:
>>>>
>>>> What arch/platform are you seeing this on?
>>>
>>> x86_64 on Supermicro servers with two Xeon CPUs, 8 or 12 cores each,
>>> hyperthreading.
>>>
>>> I do not know when I will have a chance to try this on 4.X.
>>> I have not worked on this for a while now, but here are some updates
>>> about 3.18.19
>>> since my original post.
>>> - I had to apply the same logic to n_tty_read.
>>> - Using the memory barriers did not help (but I am still curious about
>>> that logic).
>>
>> Yeah, lack of memory barriers would not explain your observation on x86/64
>> because the situation you hypothesized [1] is not possible on x86/64 CPUs.
>> Further, the compiler should not be able to re-order those operations either;
>> what compiler are you using?  For completeness, would you send me a mixed
>> listing of drivers/tty/n_tty.c for that server/kernel/compiler so I can
>> eliminate compiler reordering as the problem.
> 
> I finally got around to running some tests on 4.2.3 and I ran into a
> similar problem in n_tty_read using the script below. I verified that
> the producer (__receive_buf) found the waitqueue inactive, and the
> consumer (n_tty_read) found !input_available_p.
> I see the script hang within a minute or two.

Francesco,

I didn't see that the store to commit_head might be deferred until after the
load of read_wait->next in waitqueue_active().

Kosuke Tatsukawa found the same stall and also recognized that the
wait_woken() scheduler changes added in 3.19 cause the same problem.
His patch, which just removes the waitqueue_active() tests, was added
to 4.3-rc5:

commit e81107d4c6bd098878af9796b24edc8d4a9524fd
Author: Kosuke Tatsukawa <tatsu@xxxxxxxxxxxxx>
Date:   Fri Oct 2 08:27:05 2015 +0000

    tty: fix stall caused by missing memory barrier in drivers/tty/n_tty.c

Regards,
Peter Hurley 


> #!/usr/bin/python
> 
> import os, select, pty, time, sys, fcntl
> 
> n_times=100000000
> m_str = 'String from master'
> s_str = 'String from slave' + '=' * 256 + 'End string from slave'
> 
> def read_n(fd, n):
>    buf = ''
>    left = n
>    while left > 0:
>      buf += os.read(fd, left)
>      left = n - len(buf)
>    return buf
> 
> (pid, pty_fd) = pty.fork()
> 
> if pid == 0:
>    # Slave
> 
>    os.system('stty -echo -echoe -echok -echonl')
> 
>    # Notify master that we started
>    print 'Slave process started with pid %d' % os.getpid()
> 
>    while True:
>       buf = read_n(0, len(m_str) + 1)
>       os.write(1, buf)
>       os.write(1, s_str)
>       os.write(1, '\n')
> 
> else:
>    # Master
>    os.system('sudo taskset -p 0x04 %d' % os.getpid())
>    os.system('sudo taskset -p 0x08 %d' % pid)
> 
>    # Wait for slave to start
>    read_len=100
>    buf = os.read(pty_fd, read_len)
>    print 'Master received: %s' % buf
> 
>    for i in range(n_times):
>       os.write(pty_fd, m_str)
>       os.write(pty_fd, '\n')
>       buf = read_n(pty_fd, (len(m_str) + 2)) # echo
>       buf = read_n(pty_fd, len(s_str))
>       buf = read_n(pty_fd, len('\n') + 1) #account for \r
>       if ((i+1)%10000) == 0:
>          print 'Loop %d' % (i+1)
> 
>    os.system('stty sane')
> 
> 
> FWIW the script completed successfully I after changed the memory
> barrier in __receive_buf as follows:
> 
> Index: linux-4.2.x86_64/drivers/tty/n_tty.c
> ===================================================================
> --- linux-4.2.x86_64.orig/drivers/tty/n_tty.c
> +++ linux-4.2.x86_64/drivers/tty/n_tty.c
> @@ -1663,7 +1679,7 @@ static void __receive_buf(struct tty_str
>   return;
> 
>   /* publish read_head to consumer */
> - smp_store_release(&ldata->commit_head, ldata->read_head);
> + smp_store_mb(ldata->commit_head, ldata->read_head);
> 
>   if ((read_cnt(ldata) >= ldata->minimum_to_wake) || L_EXTPROC(tty)) {
>   kill_fasync(&tty->fasync, SIGIO, POLL_IN);
> 
> 
> Thanks,
> Francesco Ruggeri
> 
> 
>>
>> Regards,
>> Peter Hurley
>>
>>
>> [1]
>>
>> On 08/28/2015 01:06 PM, Francesco Ruggeri wrote:
>>> It looks like the logic used is like this:
>>>
>>> producer (flush_to_ldisc)     consumer (select/n_tty_poll)
>>>
>>> advance index in read_buf     add_wait_queue
>>> (full memory barrier here?)   (full memory barrier here?)
>>> if waitqueue_active()         if !input_available_p()
>>>       wake up consumer                wait
>>

--
To unsubscribe from this list: send the line "unsubscribe linux-serial" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux PPP]     [Linux FS]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Linmodem]     [Device Mapper]     [Linux Kernel for ARM]

  Powered by Linux