On Wed, Sep 30, 2015 at 2:35 PM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote: > On 09/30/2015 05:26 PM, Francesco Ruggeri wrote: >> On Wed, Sep 30, 2015 at 12:36 PM, Peter Hurley <peter@xxxxxxxxxxxxxxxxxx> wrote: >>> >>> What arch/platform are you seeing this on? >> >> x86_64 on Supermicro servers with two Xeon CPUs, 8 or 12 cores each, >> hyperthreading. >> >> I do not know when I will have a chance to try this on 4.X. >> I have not worked on this for a while now, but here are some updates >> about 3.18.19 >> since my original post. >> - I had to apply the same logic to n_tty_read. >> - Using the memory barriers did not help (but I am still curious about >> that logic). > > Yeah, lack of memory barriers would not explain your observation on x86/64 > because the situation you hypothesized [1] is not possible on x86/64 CPUs. > Further, the compiler should not be able to re-order those operations either; > what compiler are you using? For completeness, would you send me a mixed > listing of drivers/tty/n_tty.c for that server/kernel/compiler so I can > eliminate compiler reordering as the problem. I finally got around to running some tests on 4.2.3 and I ran into a similar problem in n_tty_read using the script below. I verified that the producer (__receive_buf) found the waitqueue inactive, and the consumer (n_tty_read) found !input_available_p. I see the script hang within a minute or two. #!/usr/bin/python import os, select, pty, time, sys, fcntl n_times=100000000 m_str = 'String from master' s_str = 'String from slave' + '=' * 256 + 'End string from slave' def read_n(fd, n): buf = '' left = n while left > 0: buf += os.read(fd, left) left = n - len(buf) return buf (pid, pty_fd) = pty.fork() if pid == 0: # Slave os.system('stty -echo -echoe -echok -echonl') # Notify master that we started print 'Slave process started with pid %d' % os.getpid() while True: buf = read_n(0, len(m_str) + 1) os.write(1, buf) os.write(1, s_str) os.write(1, '\n') else: # Master os.system('sudo taskset -p 0x04 %d' % os.getpid()) os.system('sudo taskset -p 0x08 %d' % pid) # Wait for slave to start read_len=100 buf = os.read(pty_fd, read_len) print 'Master received: %s' % buf for i in range(n_times): os.write(pty_fd, m_str) os.write(pty_fd, '\n') buf = read_n(pty_fd, (len(m_str) + 2)) # echo buf = read_n(pty_fd, len(s_str)) buf = read_n(pty_fd, len('\n') + 1) #account for \r if ((i+1)%10000) == 0: print 'Loop %d' % (i+1) os.system('stty sane') FWIW the script completed successfully I after changed the memory barrier in __receive_buf as follows: Index: linux-4.2.x86_64/drivers/tty/n_tty.c =================================================================== --- linux-4.2.x86_64.orig/drivers/tty/n_tty.c +++ linux-4.2.x86_64/drivers/tty/n_tty.c @@ -1663,7 +1679,7 @@ static void __receive_buf(struct tty_str return; /* publish read_head to consumer */ - smp_store_release(&ldata->commit_head, ldata->read_head); + smp_store_mb(ldata->commit_head, ldata->read_head); if ((read_cnt(ldata) >= ldata->minimum_to_wake) || L_EXTPROC(tty)) { kill_fasync(&tty->fasync, SIGIO, POLL_IN); Thanks, Francesco Ruggeri > > Regards, > Peter Hurley > > > [1] > > On 08/28/2015 01:06 PM, Francesco Ruggeri wrote: >> It looks like the logic used is like this: >> >> producer (flush_to_ldisc) consumer (select/n_tty_poll) >> >> advance index in read_buf add_wait_queue >> (full memory barrier here?) (full memory barrier here?) >> if waitqueue_active() if !input_available_p() >> wake up consumer wait > -- To unsubscribe from this list: send the line "unsubscribe linux-serial" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html