Ok, I set the rsize/wsize mount options to 32MB, and after a fresh VM reboot I got a page allocation failure again. This time it is the first kernel stack trace after the reboot - there were no "hung task for more than 120s" warnings prior to it.
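For reference, the mount now looks roughly like the line below - the monitor address, mount point and secret file are placeholders, the relevant part is rsize/wsize = 33554432 bytes (32M):

  mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret,rsize=33554432,wsize=33554432

Trace follows: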
[Sat Apr 11 17:52:58 2020] kworker/7:1: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
[Sat Apr 11 17:52:58 2020] CPU: 7 PID: 2638 Comm: kworker/7:1 Not tainted 5.4.30 #5
[Sat Apr 11 17:52:58 2020] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[Sat Apr 11 17:52:58 2020] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[Sat Apr 11 17:52:58 2020] Call Trace:
[Sat Apr 11 17:52:58 2020] <IRQ>
[Sat Apr 11 17:52:58 2020] dump_stack+0x6d/0x95
[Sat Apr 11 17:52:58 2020] warn_alloc+0x10c/0x170
[Sat Apr 11 17:52:58 2020] __alloc_pages_slowpath+0xe6c/0xef0
[Sat Apr 11 17:52:58 2020] ? ip_local_deliver_finish+0x48/0x50
[Sat Apr 11 17:52:58 2020] ? ip_local_deliver+0x6f/0xe0
[Sat Apr 11 17:52:58 2020] ? tcp_v4_early_demux+0x11c/0x150
[Sat Apr 11 17:52:58 2020] __alloc_pages_nodemask+0x2f3/0x360
[Sat Apr 11 17:52:58 2020] page_frag_alloc+0x129/0x150
[Sat Apr 11 17:52:58 2020] __napi_alloc_skb+0x86/0xd0
[Sat Apr 11 17:52:58 2020] page_to_skb+0x67/0x350 [virtio_net]
[Sat Apr 11 17:52:58 2020] receive_buf+0xe47/0x16c0 [virtio_net]
[Sat Apr 11 17:52:58 2020] virtnet_poll+0xf2/0x364 [virtio_net]
[Sat Apr 11 17:52:58 2020] net_rx_action+0x265/0x3e0
[Sat Apr 11 17:52:58 2020] __do_softirq+0xf9/0x2aa
[Sat Apr 11 17:52:58 2020] irq_exit+0xae/0xb0
[Sat Apr 11 17:52:58 2020] do_IRQ+0x59/0xe0
[Sat Apr 11 17:52:58 2020] common_interrupt+0xf/0xf
[Sat Apr 11 17:52:58 2020] </IRQ>
[Sat Apr 11 17:52:58 2020] RIP: 0010:__check_object_size+0x40/0x1b0
[Sat Apr 11 17:52:58 2020] Code: 4c 8d 34 3e 41 55 41 54 44 0f b6 ea 49 8d 46 ff 53 49 89 f4 48 89 fb 48 39 c7 0f 87 48 01 00 00 48 83 ff 10 0f 86 54 01 00 00 <e8> 0b fe ff ff 85 c0 74 27 78 0f 83 f8 02 7f 0a 5b 41 5c 41 5d 41
[Sat Apr 11 17:52:58 2020] RSP: 0018:ffffb96c00c479f8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffffdc
[Sat Apr 11 17:52:58 2020] RAX: ffff92b9f36a7aca RBX: ffff92b9f36a7ab6 RCX: ffffb96c00c47c58
[Sat Apr 11 17:52:58 2020] RDX: 0000000000000001 RSI: 0000000000000015 RDI: ffff92b9f36a7ab6
[Sat Apr 11 17:52:58 2020] RBP: ffffb96c00c47a18 R08: 0000000000000000 R09: 0000000000000000
[Sat Apr 11 17:52:58 2020] R10: 0000000000000015 R11: ffff92bbea8957f8 R12: 0000000000000015
[Sat Apr 11 17:52:58 2020] R13: 0000000000000001 R14: ffff92b9f36a7acb R15: 0000000000000005
[Sat Apr 11 17:52:58 2020] simple_copy_to_iter+0x2a/0x50
[Sat Apr 11 17:52:58 2020] __skb_datagram_iter+0x162/0x280
[Sat Apr 11 17:52:58 2020] ? skb_kill_datagram+0x70/0x70
[Sat Apr 11 17:52:58 2020] skb_copy_datagram_iter+0x40/0xa0
[Sat Apr 11 17:52:58 2020] tcp_recvmsg+0x6f7/0xbc0
[Sat Apr 11 17:52:58 2020] ? aa_sk_perm+0x43/0x190
[Sat Apr 11 17:52:58 2020] inet_recvmsg+0x64/0xf0
[Sat Apr 11 17:52:58 2020] sock_recvmsg+0x66/0x70
[Sat Apr 11 17:52:58 2020] ceph_tcp_recvmsg+0x6f/0xa0 [libceph]
[Sat Apr 11 17:52:58 2020] read_partial.isra.26+0x50/0x80 [libceph]
[Sat Apr 11 17:52:58 2020] read_partial_message+0x198/0x7c0 [libceph]
[Sat Apr 11 17:52:58 2020] ceph_con_workfn+0xa6a/0x23d0 [libceph]
[Sat Apr 11 17:52:58 2020] ? __dir_lease_try_check+0x90/0x90 [ceph]
[Sat Apr 11 17:52:58 2020] process_one_work+0x167/0x400
[Sat Apr 11 17:52:58 2020] worker_thread+0x4d/0x460
[Sat Apr 11 17:52:58 2020] kthread+0x105/0x140
[Sat Apr 11 17:52:58 2020] ? rescuer_thread+0x370/0x370
[Sat Apr 11 17:52:58 2020] ? kthread_destroy_worker+0x50/0x50
[Sat Apr 11 17:52:58 2020] ret_from_fork+0x35/0x40
[Sat Apr 11 17:52:58 2020] Mem-Info:
[Sat Apr 11 17:52:58 2020] active_anon:6795 inactive_anon:8914 isolated_anon:0 active_file:6364 inactive_file:3914973 isolated_file:0 unevictable:0 dirty:25 writeback:0 unstable:0 slab_reclaimable:52533 slab_unreclaimable:45778 mapped:5289 shmem:564 pagetables:1216 bounce:0 free:24896 free_pcp:5276 free_cma:0
[Sat Apr 11 17:52:58 2020] Node 0 active_anon:27180kB inactive_anon:35656kB active_file:25456kB inactive_file:15659892kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:21156kB dirty:100kB writeback:0kB shmem:2256kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[Sat Apr 11 17:52:58 2020] Node 0 DMA free:15908kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[Sat Apr 11 17:52:58 2020] lowmem_reserve[]: 0 3444 15930 15930 15930
[Sat Apr 11 17:52:58 2020] Node 0 DMA32 free:55068kB min:14596kB low:18244kB high:21892kB active_anon:1940kB inactive_anon:2316kB active_file:1748kB inactive_file:3386696kB unevictable:0kB writepending:56kB present:3653608kB managed:3588072kB mlocked:0kB kernel_stack:16kB pagetables:28kB bounce:0kB free_pcp:11168kB local_pcp:1552kB free_cma:0kB
[Sat Apr 11 17:52:58 2020] lowmem_reserve[]: 0 0 12485 12485 12485
[Sat Apr 11 17:52:58 2020] Node 0 Normal free:28608kB min:77492kB low:90720kB high:103948kB active_anon:25240kB inactive_anon:33340kB active_file:23708kB inactive_file:12273660kB unevictable:0kB writepending:44kB present:13107200kB managed:12793540kB mlocked:0kB kernel_stack:7504kB pagetables:4836kB bounce:0kB free_pcp:9936kB local_pcp:1360kB free_cma:0kB
[Sat Apr 11 17:52:58 2020] lowmem_reserve[]: 0 0 0 0 0
[Sat Apr 11 17:52:58 2020] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
[Sat Apr 11 17:52:58 2020] Node 0 DMA32: 17*4kB (MEH) 22*8kB (UMH) 18*16kB (MH) 2*32kB (ME) 147*64kB (UME) 70*128kB (UME) 16*256kB (UM) 15*512kB (UME) 4*1024kB (UME) 2*2048kB (M) 4*4096kB (M) = 55316kB
[Sat Apr 11 17:52:58 2020] Node 0 Normal: 187*4kB (UMEH) 62*8kB (UMEH) 280*16kB (UMEH) 318*32kB (UMEH) 98*64kB (UMEH) 45*128kB (UMEH) 4*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 28956kB
[Sat Apr 11 17:52:58 2020] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Sat Apr 11 17:52:58 2020] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Sat Apr 11 17:52:58 2020] 3922168 total pagecache pages
[Sat Apr 11 17:52:58 2020] 279 pages in swap cache
[Sat Apr 11 17:52:58 2020] Swap cache stats: add 4764, delete 4485, find 347/500
[Sat Apr 11 17:52:58 2020] Free swap = 4019440kB
[Sat Apr 11 17:52:58 2020] Total swap = 4038652kB
[Sat Apr 11 17:52:58 2020] 4194200 pages RAM
[Sat Apr 11 17:52:58 2020] 0 pages HighMem/MovableOnly
[Sat Apr 11 17:52:58 2020] 94820 pages reserved
[Sat Apr 11 17:52:58 2020] 0 pages cma reserved
[Sat Apr 11 17:52:58 2020] 0 pages hwpoisoned
[Sat Apr 11 17:57:01 2020] warn_alloc: 50 callbacks suppressed

On Sat, Apr 11, 2020 at 3:20 PM Jesper Krogh <jesper.krogh@xxxxxxxxx> wrote:
>
> Ok, i'll change the mount options and report back in a few days on
> that topic. This is fairly reproducible, so I would expect to see the
> effect if it works.
>
> On Fri, Apr 10, 2020 at 6:32 PM Tony Lill <ajlill@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> >
> >
> > On 4/10/20 3:13 AM, Jesper Krogh wrote:
> > > Hi. What is the suggested change? - is it Ceph that has an rsize,wsize
> > > of 64MB?
> > >
> >
> > Sorry, set rsize and wsize in the mount options for your cephfs to
> > something smaller.
> >
> > My problem with this is that I use autofs to mount my filesystem.
> > Starting with 4.14.82, after a few mount/unmount cycles, the mount would
> > fail with an order 4 allocation error, and I'd have to reboot.
> >
> > I traced it to a change that doubled CEPH_MSG_MAX_DATA_LEN from 16M to
> > 32M. Later in the 5-series kernels, this was doubled again, and that
> > caused an order 5 allocation failure. This define is used to set the max
> > and default rsize and wsize.
> >
> > Reducing the rsize and wsize in the mount options fixed the problem for
> > me. This may do nothing for you, but, if it clears your allocation issue...
> >
> >
> > > On Fri, Apr 10, 2020 at 12:47 AM Tony Lill <ajlill@xxxxxxxxxxxxxxxxxxx> wrote:
> > >>
> > >>
> > >> On 4/9/20 12:30 PM, Jeff Layton wrote:
> > >>> On Thu, 2020-04-09 at 18:00 +0200, Jesper Krogh wrote:
> > >>>> Thanks Jeff - I'll try that.
> > >>>>
> > >>>> I would just add to the case that this is a problem we have had on a
> > >>>> physical machine - but too many "other" workloads at the same time -
> > >>>> so we isolated it off to a VM - assuming that it was the mixed
> > >>>> workload situation that did cause us issues. I cannot be sure that it
> > >>>> is "exactly" the same problem we're seeing but symptoms are
> > >>>> identical.
> > >>>>
> > >>>
> > >>> Do you see the "page allocation failure" warnings on bare metal hosts
> > >>> too? If so, then maybe we're dealing with a problem that isn't
> > >>> virtio_net specific. In any case, let's get some folks more familiar
> > >>> with that area involved first and take it from there.
> > >>>
> > >>> Feel free to cc me on the bug report too.
> > >>>
> > >>> Thanks,
> > >>>
> > >>
> > >> In 5.4.20, the default rsize and wsize is 64M. This has caused me page
> > >> allocation failures in a different context. Try setting it to something
> > >> sensible.
> > >> --
> > >> Tony Lill, OCT, ajlill@xxxxxxxxxxxxxxxxxxx
> > >> President, A. J. Lill Consultants (519) 650 0660
> > >> 539 Grand Valley Dr., Cambridge, Ont. N3H 2S2 (519) 241 2461
> > >> -------------- http://www.ajlc.waterloo.on.ca/ ---------------
> > >>
> > >>
> > >>
> >
> > --
> > Tony Lill, OCT, ajlill@xxxxxxxxxxxxxxxxxxx
> > President, A. J. Lill Consultants (519) 650 0660
> > 539 Grand Valley Dr., Cambridge, Ont. N3H 2S2 (519) 241 2461
> > -------------- http://www.ajlc.waterloo.on.ca/ ---------------
> >
> >
> >
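PS: For anyone skimming the thread later - the define Tony mentions lives in include/linux/ceph/libceph.h. Paraphrasing from memory of the 5.4 source (please check your own tree), it is roughly:

    /* include/linux/ceph/libceph.h, 5.4 - paraphrased, check your tree */
    #define CEPH_MSG_MAX_DATA_LEN   (64*1024*1024)

    /* fs/ceph/super.h derives the maximum (and default) rsize/wsize from this
     * define, which is why an uncapped cephfs mount on 5.4 ends up issuing
     * 64M reads and writes unless rsize/wsize are set lower at mount time. */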