Re: Reiser4 Linux 4.17.19-1 hangs in Google cloud VM, too.

Edward Shishkin <edward.shishkin@xxxxxxxxx> · Wed, 24 Oct 2018 12:10:37 +0200

On 10/24/2018 08:11 AM, Jose R R wrote:
On Tue, Oct 23, 2018 at 3:16 PM Edward Shishkin
<edward.shishkin@xxxxxxxxx> wrote:
On 10/23/2018 08:06 AM, Edward Shishkin wrote:
On 10/23/2018 06:28 AM, Jose R R wrote:
Thank you for replying, Al-

On Mon, Oct 22, 2018 at 8:38 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
On Mon, Oct 22, 2018 at 03:19:12AM -0700, Metztli Information
Technology wrote:
I installed reiser4-enhanced Linux kernel 4.17.19-1 --thus
replacing the prior hung reiser4-patched kernel 4.18.15-1 in the
Google Compute Engine (GCE) cloud instance. After less than 24
hours the 4.17.19-1 kernel hung in a similar way to 4.18.15-1.

Please note that I had been running my custom Metztli Reiser4
Debian Stretch image with reiser4 linux 4.14.20-1 without issues
for several months
<https://github.com/Metztli/reiser4-debian-kernel-packaging-4.14.20>
--until I decided to upgrade to newer kernel(s).
Hello.

Looks like a regression because of VFS/block-layer changes (I don't
test new releases carefully enough). Once I am back from vacation,
I'll have a look at this.

I can't confirm any regression, though. Most likely it's the old bug
(slow progress in committing a transaction), which is not always
reproducible. See (*) for example. One needs to add profiling and
accumulate statistics to understand what is wrong. For me personally,
it is not a high-priority task.
Fact of matter, Sir, is that *something* changed in upstream kernel
source development that has rendered your reiser4 patches for Linux
4.17.x  --and which continues into 4.18.y series-- unusable.

That "fact of matter" doesn't help. Hence, unusable.

For instance, on my local development 1.3TB machine, reiser4-enhanced
Linux 4.18.11/12/13/15 hung so frequently and so badly that I had to
power off the machine just as many times. I did not lose data on my
reiser4 1.3TB root fs partition, *although* in one instance I had to
reformat the ext2 boot partition because the hang / power-off / reboot
cycle corrupted one of the GRUB files and I was unable to delete it.

Accordingly, the serial output from my custom Google cloud instances
that hung under both 4.17.x and 4.18.y is all I can provide --hoping
it may offer a clue. This hang is more serious than you are assuming,
Ed. In any case, the failure pattern may be familiar to other kernel
developers working with the upstream source that triggered such
output -- although initially I was not sure if it was reiser4 per
se -- hence my decision to ask for help on those lists.

In your case the watchdog complains about no progress in the following
3 tasks:

1)  kworker/u2:1:2363
2)  freshclam:3158
3)  ktxnmgrd:sda5:r:198

(2) is waiting for (1) to complete page writeback.
(3) is also waiting for (1) to complete atom flushing.
(1) tries to lock a page without success: somebody (but not (2) or (3))
keeps the page locked. From your output it is impossible to tell
who the culprit is. Could you dump and send the list of all current
tasks (*) for possible hints?

(*) https://www.kernel.org/doc/html/v4.17/admin-guide/sysrq.html
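
For reference, one way to collect such a task dump is sketched below. It
is only a sketch under assumptions: it must run as root, the function names
trigger_sysrq and save_kernel_log are made up for illustration, and it
assumes the dmesg utility is installed. It writes the SysRq command key
't' (dump current tasks) to /proc/sysrq-trigger and then saves the kernel
log. The task list can be longer than the kernel ring buffer, so the serial
console output may be the more complete copy.

```python
import subprocess

# Standard procfs entry for triggering SysRq commands; writing needs root.
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def trigger_sysrq(key, trigger_path=SYSRQ_TRIGGER):
    """Write one SysRq command key to the trigger file.

    't' asks the kernel to dump the list of current tasks to the
    console / kernel log. trigger_path is a parameter only so the
    function can be tried against an ordinary file.
    """
    if len(key) != 1:
        raise ValueError("SysRq key must be a single character")
    with open(trigger_path, "w") as f:
        f.write(key)


def save_kernel_log(out_path):
    """Save the current kernel ring buffer (dmesg output) to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(["dmesg"], stdout=out, check=True)
```

Calling trigger_sysrq("t") followed by save_kernel_log("sysrq-t.txt") as
root should leave a file that can be attached to a reply; the output file
name is arbitrary.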

Thanks,
Edward.