Re: Strange "flush" process behaviour

On Fri, 2013-03-29 at 08:42 +0100, Piotr Szymaniak wrote:
> On Thu, Mar 28, 2013 at 10:22:52PM +0300, Vyacheslav Dubeyko wrote:
> > Thank you for the additional details. Unfortunately, your sysrq-trigger
> > output is not complete, so I can't draw a conclusion about which
> > operation caused the issue on your side. Could you send me the full
> > log of the sysrq-trigger output?
> 
> How do I generate a more verbose log?
> 

I meant that the complete sysrq-trigger output ends with a summary of
runnable tasks and of the locks held in the system, for example:

[ 8670.960040] runnable tasks:
[ 8670.960040]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
[ 8670.960040] ----------------------------------------------------------------------------------------------------------
[ 8670.960040]      migration/1    14         0.002037      2223     0         0.002037      2308.610315         0.000000 /
[ 8670.960040]      kworker/1:1    29   2713578.453275     43727   120   2713578.453275      2098.177537   8659787.039783 /
[ 8670.960040] R           bash 11089        61.102602       142   120        61.102602        39.773082     46581.905519 /autogroup-171
[ 8670.960040] 
[ 8670.960040] 
[ 8670.960040] Showing all locks held in the system:
[ 8670.960040] 1 lock held by getty/892:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 1 lock held by getty/901:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 1 lock held by getty/927:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 1 lock held by getty/930:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 1 lock held by getty/937:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 1 lock held by getty/1214:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 1 lock held by bash/9572:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 1 lock held by bash/9797:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 1 lock held by bash/9879:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 1 lock held by bash/10021:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 1 lock held by bash/10224:
[ 8670.960040]  #0:  (&ldata->atomic_read_lock){+.+.+.}, at: [<ffffffff81460139>] n_tty_read+0x399/0x950
[ 8670.960040] 2 locks held by bash/11089:
[ 8670.960040]  #0:  (sysrq_key_table_lock){......}, at: [<ffffffff814689c2>] __handle_sysrq+0x32/0x190
[ 8670.960040]  #1:  (tasklist_lock){.?.?.-}, at: [<ffffffff810bf4e4>] debug_show_all_locks+0x44/0x1e0
[ 8670.960040] 
[ 8670.960040] =============================================

So, I need the full sysrq-trigger output to understand the situation on
your side. Could you share it?
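
If part of the dump is missing, it may simply be that the kernel log
buffer is too small to hold the whole task dump. A possible way to
capture it (dmesg and the log_buf_len boot parameter are standard; the
output file name is just an example):

  # boot with a larger log buffer if lines get dropped, e.g. log_buf_len=4M
  echo t > /proc/sysrq-trigger
  dmesg > sysrq-t.txt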


> Btw, looking at the ps aux output it seems that this flush has been
> hanging there almost since the first boot:
> root       937 88.2  0.0      0     0 ?        S    Feb17 50160:39 [flush-8:0]
> 

Could you share the outputs of "cat /proc/partitions" and "mount"? I
need to understand which partition is being processed by [flush-8:0].
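
The flush thread is named after the major:minor numbers of the backing
device, so flush-8:0 points to block device 8:0, which is normally the
whole disk /dev/sda. Something like this should confirm the mapping on
your machine:

  ls -l /dev/sda         # should show major 8, minor 0
  cat /proc/partitions   # major/minor numbers of every partition
  mount                  # which filesystem lives on which partition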

> And an uptime:
> 08:25:18 up 39 days, 10:57,  4 users,  load average: 0.72, 0.89, 0.98
> 

Yes, that may point to the reason for the issue on your side.

> Also, I don't know if this isn't some kind of regression. I've been
> using nilfs for, well, some time now (looking at the fs creation date,
> Aug 2011) and didn't notice any strange behaviour before. I think I
> won't be able to check this, but before that last boot I used (95%
> sure) vanilla kernel 3.4.4 and it was not flushing things, or I didn't
> notice. I could go to some older LTS kernel on another machine and
> check that.
> 
> 
> > I can easily reproduce the issue by deleting or truncating a big
> > file (100 - 500 GB). Please, find the description in:
> > http://www.mail-archive.com/linux-nilfs@xxxxxxxxxxxxxxx/msg01504.html.
> 
> If this brings something new: I'm not using huge files like that (the
> flush above runs on a 3.7G device). But if this reproduces the issue,
> it could be related.  (:
> 

Yes, it is really important to understand the situation on your side,
because you may be hitting a different cause with similar symptoms. So,
we need to investigate your case more deeply, I think.
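
For reference, the reproduction I described in the linked message boils
down to something like this on a nilfs2 mount (the mount point and file
size here are only examples; I used files of 100 - 500 GB):

  dd if=/dev/zero of=/mnt/nilfs/bigfile bs=1M count=204800  # ~200 GB file
  sync
  rm /mnt/nilfs/bigfile                                     # delete it
  top -b -n1 | grep flush   # the flush thread then spins at high CPU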

With the best regards,
Vyacheslav Dubeyko.

> 
> Piotr Szymaniak.





