On Wed, Sep 26, 2018 at 2:32 PM Fengguang Wu <fengguang.wu@xxxxxxxxx> wrote:
>
> CC cephfs list.
>
> On Wed, Sep 26, 2018 at 02:17:39PM +0200, Peter Zijlstra wrote:
> >On Wed, Sep 26, 2018 at 08:02:58PM +0800, Fengguang Wu wrote:
> >> CC LKP team.
> >>
> >> On Wed, Sep 26, 2018 at 01:36:23PM +0200, Peter Zijlstra wrote:
> >
> >> > I've been slow posting these, because the 0-day bot seems to be having trouble
> >> > and I've not been getting the regular cross-build green light emails that I
> >> > otherwise rely upon.
> >>
> >> Hi Philip, when will cephfs recover?
> >
> >Ah, you guys had massive fs trouble? Best of luck with that.
> >
> >I figured I'd prod you guys a bit, because I've heard a lot of
> >'complaining' about 0day bot not working on IRC. It seems quite a lot of
> >people are 'silently' relying on that thing :-)
>
> Yeah sorry. The build queues all turn red in our monitor tool.
>
> dmesg shows lots of high order (10) allocation failures.
> It's interesting cephfs will rely on such high order allocations.
> The allocation size seems directly come from user space read request size.

ceph_alloc_page_vector() kmallocs an array of page pointers.  For it to
ask for an order 10 allocation, is that a 2G read request?

>
> [798565.679727] pstree[48300]: segfault at 0 ip 00007fd53a906ae4 sp 00007fff28c06200 error 4 in libc-2.24.so[7fd53a8a0000+195000]
> [799491.313741] pxz: page allocation failure: order:10, mode:0x14040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null)
> [799491.327456] pxz cpuset=/ mems_allowed=0-1
> [799491.333252] CPU: 15 PID: 34827 Comm: pxz Not tainted 4.16.0-0.bpo.2-amd64 #1 Debian 4.16.16-2~bpo9+1
> [799491.344737] Hardware name: Intel Corporation S2600WTTR/S2600WTTR, BIOS SE5C610.86B.01.01.0022.062820171903 06/28/2017
> [799491.357971] Call Trace:
> [799491.362046]  dump_stack+0x5c/0x85
> [799491.367047]  warn_alloc+0xfc/0x180
> [799491.372757]  __alloc_pages_slowpath+0xded/0xe00
> [799491.379105]  ? __cap_is_valid+0x1c/0xa0 [ceph]
> [799491.385316]  __alloc_pages_nodemask+0x212/0x250
> [799491.391578]  kmalloc_order+0x14/0x40
> [799491.396732]  kmalloc_order_trace+0x1d/0xa0
> [799491.402451]  ceph_alloc_page_vector+0x1d/0x80 [libceph]
> [799491.409392]  ceph_read_iter+0x448/0x930 [ceph]
> [799491.415497]  ? __lru_cache_add+0x52/0x60
> [799491.420957]  ? new_sync_read+0xe9/0x140
> [799491.426288]  ? ceph_write_iter+0xb50/0xb50 [ceph]
> [799491.432620]  new_sync_read+0xe9/0x140
> [799491.437710]  vfs_read+0x91/0x130
> [799491.442287]  SyS_read+0x52/0xc0

Thanks,

                Ilya