Re: System CPU increasing on idle 2.6.36

On Wed, Dec 08, 2010 at 11:37:17PM -0500, Trond Myklebust wrote:

> On Wed, 2010-12-08 at 14:36 -0800, Simon Kirby wrote:
> > On Wed, Dec 08, 2010 at 04:53:09PM -0500, Trond Myklebust wrote:
> >
> > > Possibly a side effect of the fs/fs-writeback.c changes in 2.6.36? You
> > > do appear to be hitting a lot of spinlock contention, but I suspect that
> > > a lot of it is coming from writeback_sb_inodes, writeback_single_inode
> > > and queue_io, all of which seem unnaturally high on your stats above.
> > >
> > > I don't see how you can be seeing no traffic on the wire. You are
> > > certainly hitting some page writeout (0.2% nfs_pageio_doio).
> >
> > It really seems to not be doing anything.  I ran nfsstat -Zcv for 5
> > minutes, and the only non-zero counters are rpc calls and authrefrsh,
> > even though perf top shows similar profiles the whole time.
>
> That sounds like a bug in nfsstat, then. As I said, your trace results
> definitely indicate that you are doing page writeout. I'd cross check
> using wireshark...

I know it's weird, but it really isn't writing anything; I confirmed that
with both tcpdump and tshark.  Maybe it eventually writes a single thing
after a lot of trying, but nothing shows up on the wire in the meantime.
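
Something along these lines is enough to confirm it (eth0 and the
default NFS port 2049 are just the obvious guesses for this setup):

# tcpdump -n -s 0 -i eth0 port 2049
# tshark -n -i eth0 -f "port 2049"

Both come up empty here even while the flush threads and the perf
profile below are busy.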

2.6.36.1 is also doing the same thing while completely idle, after
finishing a bunch of log crunching.  I'm about to try 2.6.37-rc5-git3 on
that box, plus your NFS fixes (which Linus seems to have half-merged and
uploaded as -git3, but not yet pushed to his public git) and your NFSv4
UID patch.  If that runs, it should show whether the same flush issue
exists there or not.

Also, I just saw something similar on 2.6.37-rc4-git1 plus your leak
fixes (the %sys CPU reported by this version of top is broken, so ignore it):

top - 16:54:44 up 8 days, 16:04,  1 user,  load average: 40.11, 43.36, 24.71
Tasks: 490 total,  28 running, 461 sleeping,   0 stopped,   1 zombie
Cpu(s): 17.8%us,  6.9%sy,  0.0%ni, 60.2%id, 13.1%wa,  0.0%hi,  2.0%si,  0.0%st
Mem:   4051256k total,  1717004k used,  2334252k free,    17112k buffers
Swap:   979960k total,   268756k used,   711204k free,   362940k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2221 root      20   0     0    0    0 R   16  0.0   4:37.71 flush-0:65
10557 root      20   0     0    0    0 R   16  0.0   2:54.90 flush-0:46
 7240 root      20   0     0    0    0 R   15  0.0   7:46.64 flush-0:31
 8682 root      20   0     0    0    0 R   15  0.0   3:04.89 flush-0:34
13915 root      20   0     0    0    0 R   15  0.0   4:37.43 flush-0:27
17778 root      20   0     0    0    0 R   15  0.0   4:53.57 flush-0:83
19281 root      20   0     0    0    0 R   15  0.0   3:59.55 flush-0:56
19484 root      20   0     0    0    0 R   15  0.0  17:12.80 flush-0:77
19693 root      20   0     0    0    0 R   15  0.0   6:16.20 flush-0:71
25272 root      20   0     0    0    0 R   15  0.0   3:47.70 flush-0:68
 5494 root      20   0     0    0    0 R   13  0.0  28:23.03 flush-0:75
 7095 root      20   0     0    0    0 R   13  0.0   1:55.52 flush-0:42
10298 root      20   0     0    0    0 R   13  0.0   3:39.00 flush-0:61
13900 root      20   0     0    0    0 R   13  0.0   8:38.14 flush-0:80
14389 root      20   0     0    0    0 R   13  0.0   4:52.59 flush-0:81
16412 root      20   0     0    0    0 R   13  0.0   2:52.46 flush-0:33
17225 root      20   0     0    0    0 R   13  0.0   4:01.92 flush-0:78
20985 root      20   0     0    0    0 R   13  0.0   3:09.37 flush-0:29
24424 root      20   0     0    0    0 R   13  0.0   2:01.69 flush-0:20
25273 root      20   0     0    0    0 R   13  0.0   4:08.11 flush-0:70
25560 root      20   0     0    0    0 R   13  0.0   3:40.00 flush-0:69
27578 root      20   0     0    0    0 R   13  0.0   6:17.51 flush-0:57
25270 root      20   0     0    0    0 R   12  0.0   5:15.30 flush-0:26
25392 root      20   0     0    0    0 R   12  0.0   2:39.36 flush-0:82
27488 root      20   0     0    0    0 R   12  0.0   3:35.94 flush-0:76
31283 root      20   0     0    0    0 R   12  0.0   2:58.57 flush-0:18
28461 root      20   0 29556 2404 1544 R    4  0.1   0:00.04 top
 6426 root      20   0     0    0    0 S    1  0.0   1:57.74 flush-0:73
14639 root      20   0     0    0    0 S    1  0.0   2:41.38 flush-0:58
20081 root      20   0     0    0    0 S    1  0.0   1:51.78 flush-0:60
24845 root      20   0     0    0    0 S    1  0.0   2:55.30 flush-0:32
    1 root      20   0 10364  332  328 S    0  0.0   0:10.35 init

   PerfTop:    3886 irqs/sec  kernel:98.8%  exact:  0.0% [1000Hz cycles],  (all, 4 CPUs)
----------------------------------------------------------------------------------------

             samples  pcnt function                        DSO
             _______ _____ _______________________________ __________________________

            17772.00 80.6% _raw_spin_lock                  [kernel.kallsyms]
              770.00  3.5% nfs_writepages                  [kernel.kallsyms]
              530.00  2.4% writeback_sb_inodes             [kernel.kallsyms]
              463.00  2.1% writeback_single_inode          [kernel.kallsyms]
              364.00  1.7% queue_io                        [kernel.kallsyms]
              283.00  1.3% nfs_write_inode                 [kernel.kallsyms]
              210.00  1.0% bit_waitqueue                   [kernel.kallsyms]
              185.00  0.8% redirty_tail                    [kernel.kallsyms]
              182.00  0.8% do_writepages                   [kernel.kallsyms]
              155.00  0.7% __iget                          [kernel.kallsyms]
              129.00  0.6% iput                            [kernel.kallsyms]
              122.00  0.6% __mark_inode_dirty              [kernel.kallsyms]
              102.00  0.5% write_cache_pages               [kernel.kallsyms]
               89.00  0.4% file_softmagic                  /usr/lib/libmagic.so.1.0.0
               67.00  0.3% _atomic_dec_and_lock            [kernel.kallsyms]
               53.00  0.2% is_bad_inode                    [kernel.kallsyms]
               48.00  0.2% page_fault                      [kernel.kallsyms]
               44.00  0.2% __wake_up_bit                   [kernel.kallsyms]
               32.00  0.1% find_get_pages_tag              [kernel.kallsyms]
               25.00  0.1% wake_up_bit                     [kernel.kallsyms]
               17.00  0.1% copy_page_c                     [kernel.kallsyms]
               17.00  0.1% clear_page_c                    [kernel.kallsyms]
               13.00  0.1% radix_tree_gang_lookup_tag_slot [kernel.kallsyms]
               12.00  0.1% ktime_get                       [kernel.kallsyms]
               12.00  0.1% symbol_filter                   /usr/local/bin/perf
               12.00  0.1% pagevec_lookup_tag              [kernel.kallsyms]
               11.00  0.0% mapping_tagged                  [kernel.kallsyms]
               10.00  0.0% do_page_fault                   [kernel.kallsyms]
               10.00  0.0% unmap_vmas                      [kernel.kallsyms]
                9.00  0.0% find_get_page                   [kernel.kallsyms]
                9.00  0.0% dso__load_sym                   /usr/local/bin/perf
                9.00  0.0% __phys_addr                     [kernel.kallsyms]
                8.00  0.0% nfs_pageio_init                 [kernel.kallsyms]
                8.00  0.0% __d_lookup                      [kernel.kallsyms]
                8.00  0.0% filemap_fault                   [kernel.kallsyms]
                7.00  0.0% sched_clock_local               [kernel.kallsyms]
                7.00  0.0% memset                          /lib/libc-2.7.so
                7.00  0.0% number                          [kernel.kallsyms]
                6.00  0.0% handle_mm_fault                 [kernel.kallsyms]
                6.00  0.0% symbols__insert                 /usr/local/bin/perf
                6.00  0.0% radix_tree_tagged               [kernel.kallsyms]
                5.00  0.0% perf_event_task_tick            [kernel.kallsyms]
                5.00  0.0% native_write_msr_safe           [kernel.kallsyms]
                5.00  0.0% kfree                           [kernel.kallsyms]
                5.00  0.0% __switch_to                     [kernel.kallsyms]
                5.00  0.0% strlen                          /lib/libc-2.7.so
                5.00  0.0% _cond_resched                   [kernel.kallsyms]
                5.00  0.0% format_decode                   [kernel.kallsyms]
                5.00  0.0% nfs_pageio_doio                 [kernel.kallsyms]
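
With almost everything sitting in _raw_spin_lock, the next step is
probably a call-graph profile to see which paths are hammering the lock;
something like this (the 10 second duration is just an example):

# perf record -a -g -- sleep 10
# perf report

That should attribute the _raw_spin_lock samples back to their callers,
presumably the writeback_sb_inodes/queue_io paths above.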

# cat /proc/5494/stack
[<ffffffff81052025>] __cond_resched+0x25/0x40
[<ffffffff81135459>] writeback_sb_inodes+0xb9/0x140
[<ffffffff81135efc>] writeback_inodes_wb+0xcc/0x140
[<ffffffff8113629b>] wb_writeback+0x32b/0x370
[<ffffffff811363b4>] wb_do_writeback+0xd4/0x1d0
[<ffffffff8113658b>] bdi_writeback_thread+0xdb/0x240
[<ffffffff81075046>] kthread+0x96/0xb0
[<ffffffff8100bca4>] kernel_thread_helper+0x4/0x10
[<ffffffffffffffff>] 0xffffffffffffffff
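
All of the flush-0:* threads can be dumped the same way; a quick loop
like this (pgrep matching the kernel thread comm) grabs them in one go:

# for p in $(pgrep flush); do echo "== $p"; cat /proc/$p/stack; done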

# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
24  0 256484 2346524  26168 385584    0    1    33    26    5   14 18  9 60 13
25  0 256320 2346528  26168 385676    4    0   156     0 2041 2900  2 98  0  0
24  0 256144 2348916  26168 385968   32    0   172     8 2030 2879  3 97  0  0
24  0 255984 2349260  26168 386136    8    0   156     4 2080 2970  1 99  0  0
24  0 255360 2351216  26176 386168  128    0   272    44 2082 2942  0 100  0  0
25  0 255152 2351352  26184 386248   52    0   196    44 2071 3030  2 98  0  0
23  0 254880 2353400  26192 386660  144    0   296    84 2249 2960  3 97  0  0
23  0 254660 2354028  26192 386756   40    0   200     8 2060 3114  1 100  0  0

Gradually, the flush processes go back to sleeping, and then everything
works again, as if nothing happened.  This is in contrast to what we are
seeing on the log-crunching server, where system time just keeps
increasing with every log-crunching run.  These two cases may or may not
be the same problem.
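
It would also be worth watching whether there is actually anything dirty
or unstable for NFS while the flushers are spinning, e.g.:

# watch -n1 'grep -E "Dirty|Writeback|NFS_Unstable" /proc/meminfo'

If those counters sit near zero while the flush-0:* threads burn CPU,
the writeback code is presumably just re-queueing inodes with nothing
left to write.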

See longer vmstat, nfsstat output, etc. here:

http://0x.ca/sim/ref/2.6.37-rc4-git1_flushy_nfs/

Simon-