Re: More information about hung Jenkins builds

On 19/06/2020 14:51, Stephan Bergmann wrote:
On 28/05/2020 22:19, Stephan Bergmann wrote:
For now, I have updated <https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/> to use the new kill-wrapper timeout feature instead of Jenkins' "Abort the build if it's stuck" option. (I am planning to roll it out to other Linux Jenkins jobs that could benefit from it once it has proven sufficiently stable.)

I have rolled out the kill-wrapper and its timeout feature now also for <https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil_branch/>, <https://ci.libreoffice.org/job/gerrit_linux_gcc_release/>, and <https://ci.libreoffice.org/job/lo_ubsan/>.

Just to note down the semi-obvious somewhere: One scenario that kill-wrapper apparently doesn't prevent is processes left behind on the build machine after Jenkins "has lost the connection" to it (for whatever reason, maybe a bug in Jenkins itself?).

<https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/62736/> had gone down with

[...]
[build JUT] linguistic_unoapi
FATAL: command execution failed
java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2738)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3213)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:896)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
	at hudson.remoting.Command.readFrom(Command.java:142)
	at hudson.remoting.Command.readFrom(Command.java:128)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused: java.io.IOException: Backing channel 'tb75-lilith' is disconnected.
	at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:216)
	at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:285)
	at com.sun.proxy.$Proxy66.isAlive(Unknown Source)
	at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1147)
	at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1139)
	at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741)
	at hudson.model.Build$BuildExecution.build(Build.java:206)
	at hudson.model.Build$BuildExecution.doRun(Build.java:163)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
	at hudson.model.Run.execute(Run.java:1880)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
	at hudson.model.ResourceController.execute(ResourceController.java:97)
	at hudson.model.Executor.run(Executor.java:428)
FATAL: Unable to delete script file /tmp/jenkins3180341342272089625.sh
java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2738)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3213)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:896)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
	at hudson.remoting.Command.readFrom(Command.java:142)
	at hudson.remoting.Command.readFrom(Command.java:128)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@629ec1e9:tb75-lilith": Remote call on tb75-lilith failed. The channel is closing down or has closed down
	at hudson.remoting.Channel.call(Channel.java:991)
	at hudson.FilePath.act(FilePath.java:1069)
	at hudson.FilePath.act(FilePath.java:1058)
	at hudson.FilePath.delete(FilePath.java:1543)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:123)
	at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741)
	at hudson.model.Build$BuildExecution.build(Build.java:206)
	at hudson.model.Build$BuildExecution.doRun(Build.java:163)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
	at hudson.model.Run.execute(Run.java:1880)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
	at hudson.model.ResourceController.execute(ResourceController.java:97)
	at hudson.model.Executor.run(Executor.java:428)
Build step 'Execute shell' marked build as failure
Finished: FAILURE

leaving behind some pstree forest of

oosplash─┬─soffice.bin─┬─soffice.bin
         │             └─182*[{soffice.bin}]
         └─{oosplash}

sh───sh───python.bin─┬─oosplash─┬─soffice.bin─┬─soffice.bin
                     │          │             └─294*[{soffice.bin}]
                     │          └─{oosplash}
                     └─2*[{python.bin}]

sh───sh───python.bin───oosplash

sh───sh───gdb-core-bt.sh───gdb

sh───sh───python.bin───oosplash

on tb75, where each of those processes belonged to the above build, as demonstrated for each respective PID with

$ cat /proc/$PID/environ | tr '\0' '\n' | grep BUILD_NUMBER
BUILD_NUMBER=62736

That caused later builds like <https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/62758/> on tb75 to fail with "the test UITest_calc_demo failed".
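
The per-process check above can be generalized into a scan over all of /proc. The following is only a minimal sketch, not something that is part of kill-wrapper or the Jenkins setup: the function name pids_for_build and its optional second argument (an alternative proc root, there only so it can be tried against a scratch directory) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the PID of every process whose environment
# contains exactly BUILD_NUMBER=<build-number>.
# Usage: pids_for_build <build-number> [proc-root]
pids_for_build() {
    build=$1
    proc_root=${2:-/proc}
    for env_file in "$proc_root"/[0-9]*/environ; do
        # Skip entries we cannot read (other users' processes, or
        # processes that exited while we were scanning).
        [ -r "$env_file" ] || continue
        # environ is NUL-separated; turn it into one variable per line
        # and require a whole-line match so 62736 does not match 627360.
        if tr '\0' '\n' < "$env_file" 2>/dev/null \
                | grep -qx "BUILD_NUMBER=$build"; then
            pid_dir=${env_file%/environ}
            echo "${pid_dir##*/}"
        fi
    done
}
```

Running `pids_for_build 62736` on the affected machine should then list the leftover processes of that build in one go. Reading another process's environ generally requires running as the same user (or as root), so the scan is best run as the user the Jenkins agent runs as.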
