Cause of the Cygwin terminal getting stuck during the build

Hello,

If you have tried building LibreOffice with the latest version of Cygwin (3.5.x) or Git for Windows (2.47), you may have noticed that the terminal sometimes gets stuck during the build.

This is the result of my effort to understand the problem:

The issue first appeared in Cygwin 3.3 a while ago and became worse in Cygwin 3.5. With the recent update of "Git for Windows", it is now also visible in "Git Bash".

Running "make -d" prints a lot of debugging information. The process may hang in many places; the following is one such hang, which occurs when make launches the mktemp utility:

$ make -d
...
[build DEP] LNK:Library/unobootstrapprotector.dll.d
CreateProcess(C:\cygwin64\bin\mktemp.exe,mktemp --tmpdir=C:/cygwin64/tmp gbuild.XXXXXX,...)

[ Terminal hangs here ]

I found the PID of the hanging mktemp with:

$ ps ax | grep mktemp
26144 1 23307 10708 cons1 197609 23:38:01 /usr/bin/mktemp
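
The number later passed to gdb (10708) is the fourth field of the ps line above. A minimal sketch of extracting it automatically follows, assuming the usual Cygwin ps column order (PID PPID PGID WINPID TTY UID STIME COMMAND) and an optional one-letter status marker in front of the PID. The function name find_winpids is hypothetical and only used for illustration.

```python
# Sketch: pick the WINPID of matching processes out of Cygwin `ps` output.
# Assumption: columns are PID PPID PGID WINPID TTY UID STIME COMMAND, and a
# row may start with a one-letter status marker (assumed here: I, O, or S).

def find_winpids(ps_output, name):
    """Return the WINPID (4th column) of every row whose command basename is `name`."""
    winpids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Drop an optional leading status marker so the columns line up.
        if fields and fields[0] in ("I", "O", "S"):
            fields = fields[1:]
        # Skip the header, blank lines, and rows that are too short.
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        command = " ".join(fields[7:])
        if command.rsplit("/", 1)[-1] == name:
            winpids.append(int(fields[3]))
    return winpids


sample = "26144 1 23307 10708 cons1 197609 23:38:01 /usr/bin/mktemp"
print(find_winpids(sample, "mktemp"))  # prints [10708]
```

Matching on the command basename instead of a substring also avoids picking up the "grep mktemp" process itself.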


After several tries, I got a meaningful backtrace from the hanging mktemp by attaching gdb to it. I had installed make-debuginfo and coreutils-debuginfo alongside gdb-14 so that the backtrace would include symbols:

$ gdb -p 10708
GNU gdb (GDB) (Cygwin 14.2-1) 14.2
...
Attaching to process 10708
[New Thread 10708.0x80b0]
[New Thread 10708.0x81e8]
Reading symbols from /usr/bin/mktemp.exe...
Reading symbols from /usr/lib/debug//usr/bin/mktemp.exe.dbg...
(gdb) interrupt
(gdb) bt
#0  0x00007ff8519e7a36 in fhandler_console::set_input_mode (m=m@entry=tty::cygwin,
    t=0x1a0030028, p=p@entry=0x800008da8)
    at /usr/src/debug/cygwin-3.5.4-1/winsup/cygwin/fhandler/console.cc:817
#1  0x00007ff8519f175c in fhandler_console::post_open_setup (this=0x800008ba8,
    fd=<optimized out>)
    at /usr/src/debug/cygwin-3.5.4-1/winsup/cygwin/fhandler/console.cc:1910
#2  0x00007ff851948796 in dtable::init_std_file_from_handle (this=this@entry=0x800004870,
    fd=fd@entry=0, handle=0xffffffffffffffff, handle@entry=0x424)
    at /usr/src/debug/cygwin-3.5.4-1/winsup/cygwin/dtable.cc:425
#3  0x00007ff851948a61 in dtable::stdio_init (this=0x800004870)
    at /usr/src/debug/cygwin-3.5.4-1/winsup/cygwin/dtable.cc:162
#4  0x00007ff8519370e7 in dll_crt0_1 ()
    at /usr/src/debug/cygwin-3.5.4-1/winsup/cygwin/dcrt0.cc:929
#5  0x00007ff851935d51 in _cygtls::call2 (this=0x7ffffce00,
    func=0x7ff851936f10 <dll_crt0_1(void*)>, arg=0x0, buf=buf@entry=0x7ffffcdf0)
    at /usr/src/debug/cygwin-3.5.4-1/winsup/cygwin/cygtls.cc:41
#6  0x00007ff851935dca in _cygtls::call (func=<optimized out>, arg=<optimized out>)
    at /usr/src/debug/cygwin-3.5.4-1/winsup/cygwin/cygtls.cc:28
#7  0x0000000000000000 in ?? ()

As the backtrace shows, the process is stuck in fhandler_console::set_input_mode() at console.cc:817. The console hangs after attach_console() is invoked. There are known issues with multiple processes trying to access the console at the same time, and this hang appears to have the same cause: several processes want to write to the console at once and run into a concurrency problem, possibly a deadlock.

This is one of the patches that was supposed to fix the problem:

[PATCH] Cygwin: console: Fix race issue on allocating console simultaneously.
https://cygwin.com/pipermail/cygwin-patches/2024q3/012722.html

More race-related fixes can be found by searching for "race" in the newlib-cygwin commit log:

https://cygwin.com/cgit/newlib-cygwin/log/?qt=grep&q=race

Looking at the Cygwin 3.5.4-1 sources locally, one can see that the fixes b160b690b6ace93ee4225f14a9287549e37f4a71 and 10477d95ec401213d5bded5ae3600ab0d2d5ed94 are already applied, yet the problem persists.

Also, the issue is not limited to Cygwin: it also happens in the shell of the recent "Git for Windows" version, whose runtime shares some sources with Cygwin. To see which runtime version your Git Bash uses, run 'uname -a' in it.

On Git Bash 2.46, 'uname -a' reports 3.4.10-2e2ef940.x86_64, but with the latest version, 2.47, it reports 3.5.4-1e8cf1a5.x86_64. On Git Bash 2.46 you may not face the problem, but on Git Bash 2.47 you may face it immediately after invoking "make" for the LibreOffice core source code.
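
The version comparison above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the 3.5.0 threshold is an assumption drawn from the two data points in this message (3.4.10 mostly fine, 3.5.4 affected), not a confirmed boundary.

```python
import re

# Assumption: hangs become frequent with runtime 3.5.x, as observed in this post.
ASSUMED_THRESHOLD = (3, 5, 0)


def runtime_version(release):
    """Parse the leading 'major.minor.patch' of a uname release string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized release string: %r" % release)
    return tuple(int(part) for part in match.groups())


def probably_affected(release):
    """True if the runtime is at or above the assumed threshold."""
    return runtime_version(release) >= ASSUMED_THRESHOLD


print(probably_affected("3.4.10-2e2ef940.x86_64"))  # prints False
print(probably_affected("3.5.4-1e8cf1a5.x86_64"))   # prints True
```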

This is from Git for Windows v2.47.0 release notes:
"Comes with the MSYS2 runtime (Git for Windows flavor) based on Cygwin v3.5.4, which drops Windows 7 and Windows 8 support."
https://github.com/git-for-windows/build-extra/blob/main/ReleaseNotes.md

Another observation, from LibreOffice developer Michael W, is that output from different processes is interleaved character by character on the terminal, which should not happen with a buffered STDOUT (standard output).

One last note: the more parallelism you use, the more likely the build is to get stuck. I use 20 parallel processes; you can use --with-parallelism=1 to avoid the issue, or set a larger value to reproduce it.
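
To probe the effect of parallelism outside of a full build, one could start many short-lived processes at once and count how many fail to finish in time. The following is a rough sketch under stated assumptions, not a confirmed reproducer: count_hangs is a hypothetical helper, the timeout is arbitrary, and whether a hang shows up will depend on the shell and Python used to launch it.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def count_hangs(command, jobs, timeout):
    """Run `command` `jobs` times in parallel; return how many runs exceeded `timeout` seconds."""

    def run_once(_):
        try:
            # stdin is inherited, so each child still sets up its own console input.
            subprocess.run(command, timeout=timeout, stdout=subprocess.DEVNULL)
            return False
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the child before raising this exception.
            return True

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return sum(pool.map(run_once, range(jobs)))


if __name__ == "__main__":
    # Default to a harmless no-op command; pass another command on the command line.
    command = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print("timed out:", count_hangs(command, jobs=20, timeout=10))
```

For example, passing "mktemp -u" as the command from a Cygwin or Git Bash terminal would exercise the same utility that hangs in the trace above.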

This is a more detailed backtrace:

(gdb) backtrace full
#0  0x00007ff8519e7a36 in fhandler_console::set_input_mode (m=m@entry=tty::cygwin,
    t=0x1a0030028, p=p@entry=0x800008da8)
    at /usr/src/debug/cygwin-3.5.4-1/winsup/cygwin/fhandler/console.cc:817
        unit = 0
        oflags = 4294967295
        resume_pid = <optimized out>
        flags = <optimized out>
#1  0x00007ff8519f175c in fhandler_console::post_open_setup (this=0x800008ba8,
    fd=<optimized out>)
    at /usr/src/debug/cygwin-3.5.4-1/winsup/cygwin/fhandler/console.cc:1910
No locals.
#2  0x00007ff851948796 in dtable::init_std_file_from_handle (this=this@entry=0x800004870,
    fd=fd@entry=0, handle=0xffffffffffffffff, handle@entry=0x424)
    at /usr/src/debug/cygwin-3.5.4-1/winsup/cygwin/dtable.cc:425
        fh = 0x800008ba8
        io = {{Status = 6, Pointer = 0x6}, Information = 140704499238208}
        fai = {AccessFlags = 8}
        openflags = 65538
        tp = {c_buf_old = 0, w_buf_old = 0}
        buf = {dwSize = {X = 69, Y = 0}, dwCursorPosition = {X = 7, Y = 0}, wAttributes = 16576,
          srWindow = {Left = 20929, Top = 32760, Right = 0, Bottom = 1024},
          dwMaximumWindowSize = {X = 20914, Y = 32760}}
        dcb = {DCBlength = 4294953784, BaudRate = 7, fBinary = 1, fParity = 0, fOutxCtsFlow = 0,
          fOutxDsrFlow = 0, fDtrControl = 0, fDsrSensitivity = 0, fTXContinueOnXoff = 0,
          fOutX = 1, fInX = 0, fErrorChar = 1, fNull = 0, fRtsControl = 0, fAbortOnError = 0,
          fDummy2 = 0, wReserved = 0, XonLim = 1280, XoffLim = 21, ByteSize = 0 '\000',
          Parity = 0 '\000', StopBits = 45 '-', XonChar = -58 '\306', XoffChar = -22 '\352',
          ErrorChar = 54 '6', EofChar = -89 '\247', EvtChar = -53 '\313', wReserved1 = 24591}
        bin = 65536
        dev = {<_device> = {_name = 0x7ff851b5c2f5 <msg1+11749> "/dev/console", d = {
              devn = 327681, devn_fh_devices = FH_CONSOLE, {minor = 1, major = 5}},
            _native = 0x7ff851b5c2f5 <msg1+11749> "/dev/console",
            exists_func = 0x7ff8519412bc <exists_console(device const&)>, _mode = 8192,
            lives_in_dev = true, dev_on_fs = false, name_allocated = false,
            native_allocated = false}, <No data fields>}
        access = <optimized out>
        ft = <optimized out>
        name = <optimized out>
        __PRETTY_FUNCTION__ = "void dtable::init_std_file_from_handle(int, HANDLE)"
#3  0x00007ff851948a61 in dtable::stdio_init (this=0x800004870)
    at /usr/src/debug/cygwin-3.5.4-1/winsup/cygwin/dtable.cc:162
        in = 0x424
        out = 0x18c
        err = 0x230
        __PRETTY_FUNCTION__ = "void dtable::stdio_init()"
#4  0x00007ff8519370e7 in dll_crt0_1 ()
    at /usr/src/debug/cygwin-3.5.4-1/winsup/cygwin/dcrt0.cc:929
        __PRETTY_FUNCTION__ = "void dll_crt0_1(void*)"
#5  0x00007ff851935d51 in _cygtls::call2 (this=0x7ffffce00,
    func=0x7ff851936f10 <dll_crt0_1(void*)>, arg=0x0, buf=buf@entry=0x7ffffcdf0)
    at /usr/src/debug/cygwin-3.5.4-1/winsup/cygwin/cygtls.cc:41
        res = <optimized out>
#6  0x00007ff851935dca in _cygtls::call (func=<optimized out>, arg=<optimized out>)
    at /usr/src/debug/cygwin-3.5.4-1/winsup/cygwin/cygtls.cc:28
        buf = '\000' <repeats 16 times>, "\r\000\000\000\000\000\000\000`=\301Q\370\177\000\000\030>\301Q\370\177\000\000\320>\301Q\370\177", '\000' <repeats 58 times>, "pX\223Q\370\177", '\000' <repeats 138 times>...
        protect = <optimized out>
#7  0x0000000000000000 in ?? ()
No symbol table info available.


Regards,
Hossein

--
Hossein Nourikhah, Ph.D., Developer Community Architect
Tel: +49 30 5557992-65 | Email: hossein@xxxxxxxxxxxxxxx
The Document Foundation, Winterfeldtstraße 52, 10781 Berlin, DE
Gemeinnützige rechtsfähige Stiftung des bürgerlichen Rechts
Legal details: https://www.documentfoundation.org/imprint
