Hi Richard, Marc, >>> [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0 Do you still have the backtraces that show the function call stack for errors that look like this? %s: page allocation failure: order:%d, mode:0x%x Please send as many relevant OOM failure traces that you can. I would like to see which memory allocation(s) are failing and if they are always the same stack trace. In the example above, order 7 means 2^7 of 4k pages, so it means the kernel can't find 512k of contiguous memory that can be allocated. It looks like the OOM is triggered in bch_cache_set_alloc, but might be cache_alloc too. I'm not sure if an alternate allocation mechanism can be used safely, but thats what I want to look into. Thanks! -Eric -- Eric Wheeler On Wed, 9 Mar 2016, Eric Wheeler wrote: > On Wed, 9 Mar 2016, Richard Bade wrote: > > > Hi Guys, > > I've also seen this issue when registering a device. > > [Fri Feb 5 14:06:59 2016] bcache: register_cache() error opening > > dm-6: cannot allocate memory > > > > For me the situation happens because I'm running Ceph OSD's on the > > host. These are using all the available memory and for some reason the > > register cannot allocate memory. > > I've reproduced this several times, but basically if I register on a > > system that has been up for a while and therefore using all it's RAM > > this will happen. > > Here's two specific time's it's happened: > > when creating using "make-bcache --cache /dev/vg-raid0/lv-ceph_j19 > > --bdev /dev/sdf1". It failed due to xfs still remaining on /dev/sdf1. > > Do you get a kernel failure, or is it just make-bcache being safe? > > > Then run wipefs and "make-bcache --cache /dev/vg-raid0/lv-ceph_j19 > > --bdev /dev/sdf1" again. CRASH. > > make-bcache will auto register. (Maybe there's an option to stop that, > I've not looked.) > > > when manually creating a cache device using "make-bcache --cache > > /dev/vg-raid0/lv-bcache". The cache set didn't show up in > > /sys/fs/bcache. echo "/dev/vg-raid0/lv-bcache" > > > /sys/fs/bcache/register. CRASH. > > I was able to work around this issue by stopping the osd's before > > registering and therefore freeing up memory. > > I would be curious if setting /proc/sys/vm/min_free_kbytes to 256*1024 > fixes that. > > > Because I am able to reproduce this relatively easily, I'm happy to > > test any patches. > > Yes, thank you, please do: > > Try the three patches at the top of these two branches and see if you > crash. I'm hoping it fails gracefully on OOM: > > https://bitbucket.org/ewheelerinc/linux/branch/v4.5-rc6-bcache-fixes > commits 84a2edd, a704484 > https://bitbucket.org/ewheelerinc/linux/branch/v4.5-rc7-bcache-fixes > commit 87264e9 > > If it stops crashing, then maybe we can work out a way to allocate memory > such that it succeeds. Please send any dmesg output. > > -Eric > > > Apart from the register crash, the cache sets have been up and stable > > for over 1 month. > > > > Regards, > > Richard > > > > On Mon, 7 Mar 2016, Marc MERLIN wrote: > > > > > I started a btrfs copy from a bcached device to another one (both are in > > > the 5 to 15TB range). > > > A minute later, my system that had been up for several days, crashed. > > > > > > Ultimately, it crashed on this: > > > bcache: register_cache() error opening sdl2: cannot allocate memory > > > BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8 > > > IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache] > > > > Strange about memory allocation issues. Do you have > > /proc/sys/vm/min_free_kbytes set to something like $((256*1024)) ? Is this > > a multi-socket machine with all memory plugged into only one CPU? > > > > I'm curious though, why was registration called a second time? Was the > > drive external? Could udev be re-registering the device? > > > > You might find where the registration is being done and prevent it from > > running automatically. At least that might solve the re-registration > > problem. > > > > As for the memory allocation issue, the backtrace indicates that this is a > > registration-time problem, not a runtime issue. I'm guessing it is one of > > the threads attempting to proceed after a memory allocation error similar > > to the writeback thread issue you had last time which was fixed by adding > > some locking around the initialization. > > > > I'll look and see if I can come up with a reasonable patch to cleanup the > > allocator or gc thread under OOM conditions at registration time. > > > > -Eric > > > > > > > > I'm not sure why it tried to register sdl2 then when it was already > > > registered from boot. Did it time out and had to be registered again? > > > > > > At this point, things with bcache have been bad enough, that I'm > > > considering removing it. My system has been very unstable with it :( > > > Or maybe I can just remove the caching devices, leave the backing ones, > > > and run without cache for now, giving me the option to re-add the cache later. > > > > > > I actually just added 16GB of RAM to the server after adding bcache, and > > > clearly going from 8 to 24GB made no difference. I'm wondering if there > > > is a memory leak somewhere? > > > > > > Just before the system crashed, memory looked like this, which isn't bad: > > > total used free shared buffers cached > > > Mem: 24392440 24241440 151000 0 88 20116748 > > > -/+ buffers/cache: 4124604 20267836 > > > Swap: 15616764 244452 15372312 > > > > > > I'm using btrfs on top of dmcrypt on top of bcache. > > > btrfs on top of dmcrypt was stable, I had been using it for years. Adding bcache is what > > > caused all my recent problems to start. > > > > > > Any ideas? > > > > > > Thanks, > > > Marc > > > > > > > > > [290569.458707] Process accounting resumed > > > [290623.673871] bcache-register: page allocation failure: order:7, mode:0x24080c0 > > > [290623.697143] CPU: 0 PID: 1284 Comm: bcache-register Tainted: G W 4.4.2-amd64-i915-volpreempt-20160214bc3 #5 > > > [290623.732143] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013 > > > [290623.762428] 0000000000000000 ffff88001df7bb08 ffffffff8134ae0a 0000000000000001 > > > [290623.787470] ffff88001df7bba0 ffffffff81124ab6 024080c01df7bc44 024080c000000040 > > > [290623.811775] 0000000700000001 0000000000000007 0000000000000007 0000000000000040 > > > [290623.836443] Call Trace: > > > [290623.846280] [<ffffffff8134ae0a>] dump_stack+0x44/0x55 > > > [290623.864123] [<ffffffff81124ab6>] warn_alloc_failed+0x114/0x12c > > > [290623.883425] [<ffffffff811274b8>] __alloc_pages_nodemask+0x7cb/0x84c > > > [290623.904159] [<ffffffff8107df80>] ? try_to_wake_up+0x1d9/0x1eb > > > [290623.923173] [<ffffffff8115f6d7>] alloc_pages_current+0xa9/0xcd > > > [290623.942435] [<ffffffff8112377e>] __get_free_pages+0xe/0x3c > > > [290623.960621] [<ffffffffc05192f0>] bch_cache_set_alloc+0x3c7/0x479 [bcache] > > > [290623.983825] [<ffffffffc051a5e3>] register_bcache+0x1241/0x1452 [bcache] > > > [290624.005456] [<ffffffff8134c853>] kobj_attr_store+0x10/0x1a > > > [290624.023729] [<ffffffff811df9df>] sysfs_kf_write+0x39/0x3b > > > [290624.041739] [<ffffffff811df2e1>] kernfs_fop_write+0xed/0x130 > > > [290624.060394] [<ffffffff8117d97d>] __vfs_write+0x26/0xa5 > > > [290624.077452] [<ffffffff810b2969>] ? current_kernel_time64+0x10/0x36 > > > [290624.097921] [<ffffffff812d6050>] ? security_file_permission+0x3b/0x42 > > > [290624.119197] [<ffffffff810915ea>] ? percpu_down_read+0x12/0x41 > > > [290624.138945] [<ffffffff8117fb61>] ? __sb_start_write+0x2b/0x48 > > > [290624.158291] [<ffffffff8117dfe8>] vfs_write+0x9d/0xe8 > > > [290624.174919] [<ffffffff8117e7bd>] SyS_write+0x4d/0x78 > > > [290624.191455] [<ffffffff810039c3>] do_fast_syscall_32+0xb3/0xf3 > > > [290624.210331] [<ffffffff816e3e32>] sysenter_flags_fixed+0x8/0x12 > > > [290624.229355] Mem-Info: > > > [290624.237432] active_anon:607145 inactive_anon:208571 isolated_anon:0 > > > [290624.237432] active_file:2103407 inactive_file:1565209 isolated_file:0 > > > [290624.237432] unevictable:1307 dirty:188846 writeback:138775 unstable:0 > > > [290624.237432] slab_reclaimable:180112 slab_unreclaimable:73373 > > > [290624.237432] mapped:419640 shmem:466581 pagetables:5778 bounce:0 > > > [290624.237432] free:988612 free_pcp:2130 free_cma:14 > > > [290624.351715] Node 0 DMA free:15884kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes > > > [290624.483875] lowmem_reserve[]: 0 3201 23800 23800 > > > [290624.500998] Node 0 DMA32 free:581000kB min:2648kB low:3308kB high:3972kB active_anon:229220kB inactive_anon:248156kB active_file:1102468kB inactive_file:832264kB unevictable:948kB isolated(anon):0kB isolated(file):68kB present:3362068kB managed:3283032kB mlocked:68719477684kB dirty:94316kB writeback:65996kB mapped:253400kB shmem:282168kB slab_reclaimable:96344kB slab_unreclaimable:41308kB kernel_stack:2480kB pagetables:2952kB unstable:0kB bounce:0kB free_pcp:4008kB local_pcp:140kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no > > > [290624.655741] lowmem_reserve[]: 0 0 20599 20599 > > > [290624.674232] Node 0 Normal free:3368136kB min:17048kB low:21308kB high:25572kB active_anon:2203780kB inactive_anon:586104kB active_file:7311280kB inactive_file:5430068kB unevictable:4280kB isolated(anon):0kB isolated(file):0kB present:21485568kB managed:21093516kB mlocked:481036341432kB dirty:662172kB writeback:472472kB mapped:1425288kB shmem:1584132kB slab_reclaimable:624104kB slab_unreclaimable:251680kB kernel_stack:11696kB pagetables:20180kB unstable:0kB bounce:0kB free_pcp:4304kB local_pcp:588kB free_cma:56kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no > > > [290624.834378] lowmem_reserve[]: 0 0 0 0 > > > [290624.848459] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15884kB > > > [290624.895688] Node 0 DMA32: 30635*4kB (UME) 23925*8kB (UME) 7478*16kB (UME) 2420*32kB (UME) 794*64kB (UME) 139*128kB (UME) 10*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 582196kB > > > [290624.947950] Node 0 Normal: 241499*4kB (UME) 161875*8kB (UME) 33424*16kB (UME) 10915*32kB (UME) 2795*64kB (UME) 334*128kB (UME) 15*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3370532kB > > > [290625.003724] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB > > > [290625.030463] 4142937 total pagecache pages > > > [290625.043969] 5782 pages in swap cache > > > [290625.056118] Swap cache stats: add 102837, delete 97055, find 627323/635133 > > > [290625.078329] Free swap = 15372324kB > > > [290625.090190] Total swap = 15616764kB > > > [290625.102186] 6215903 pages RAM > > > [290625.112399] 0 pages HighMem/MovableOnly > > > [290625.125747] 117793 pages reserved > > > [290625.137495] 4096 pages cma reserved > > > [290625.149199] 0 pages hwpoisoned > > > [290625.159961] bcache: register_cache() error opening sdl2: cannot allocate memory > > > [290625.183410] BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8 > > > [290625.208648] IP: [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache] > > > [290625.230734] PGD 0 > > > [290625.238446] Oops: 0000 [#1] SMP > > > [290625.249857] Modules linked in: udp_diag tcp_diag inet_diag veth ip6table_filter ip6_tables ebtable_nat ebtables ppdev lp xt_addrtype tun bridge stp llc autofs4 softdog binfmt_misc ftdi_sio nfsd nfs_acl auth_rpcgss nfs fscache lockd grace sunrpc ipt_REJECT nf_reject_ipv4 xt_conntrack xt_mark xt_nat xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG iptable_mangle iptable_filter bcache lm85 hwmon_vid pl2303 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 ip_tables nf_conntrack_ftp ipt_MASQUERADE x_tables nf_nat_masquerade_ipv4 nf_nat nf_conntrack sg st snd_pcm_oss snd_mixer_oss kvm_intel snd_hda_codec_realtek snd_hda_codec_generic snd_opl3_synth kvm snd_seq_midi_emul snd_hda_intel snd_cmipci gameport snd_hda_codec snd_opl3_lib snd_mpu401_uart snd_seq_midi irqbypass snd_seq_midi_event snd_seq snd_hda_core snd_hwdep snd_rawmidi asix coretemp eeepc_wmi tpm_infineon rc_ati_x10 snd_pcm libphy snd_seq_device asus_wmi snd_timer sparse_keymap rfkill intel_rapl 8250_fintek at i_ > re > > mo > > > te usbnet tpm_tis snd rc_core battery tpm evdev iosf_mbi wmi pcspkr input_leds intel_powerclamp parport_pc processor i2c_i801 xhci_pci x86_pkg_temp_thermal usbserial soundcore parport xhci_hcd lpc_ich e1000e ptp pps_core fuse raid456 multipath mmc_block mmc_core dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_crypt dm_mod async_raid6_recov async_pq async_xor async_memcpy async_tx blowfish_x86_64 blowfish_common ecb xts ansi_cprng drbg aesni_intel ablk_helper lrw gf128mul glue_helper aes_x86_64 crct10dif_pclmul crc32_pclmul crc32c_intel r8169 ehci_pci fjes ehci_hcd thermal mii sata_mv fan cryptd usbcore usb_common sata_sil24 [last unloaded: ftdi_sio] > > > [290625.713109] CPU: 0 PID: 25842 Comm: kworker/0:0 Tainted: G W 4.4.2-amd64-i915-volpreempt-20160214bc3 #5 > > > [290625.747597] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013 > > > [290625.776918] Workqueue: events cache_set_flush [bcache] > > > [290625.794260] task: ffff880467caa0c0 ti: ffff8805af184000 task.ti: ffff8805af184000 > > > [290625.818544] RIP: 0010:[<ffffffffc0516e8d>] [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache] > > > [290625.848082] RSP: 0018:ffff8805af187e10 EFLAGS: 00010202 > > > [290625.866890] RAX: 0000000000000001 RBX: ffff880153a40ce8 RCX: 0000000000000282 > > > [290625.891136] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > > [290625.914365] RBP: ffff8805af187e30 R08: 00000000000000fa R09: ffffe8ffffc01900 > > > [290625.937588] R10: ffff880088430838 R11: ffff880088430838 R12: 0000000000000000 > > > [290625.960781] R13: ffff880153a40de8 R14: 0000000000000000 R15: ffffe8ffffc01900 > > > [290625.983951] FS: 0000000000000000(0000) GS:ffff88061e200000(0000) knlGS:0000000000000000 > > > [290626.010918] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > [290626.030800] CR2: 00000000000009b8 CR3: 0000000001e0a000 CR4: 00000000001406f0 > > > [290626.053914] Stack: > > > [290626.061687] ffff8805d6183280 ffff88061e216040 ffff880153a40ce8 0000000000000000 > > > [290626.085725] ffff8805af187e78 ffffffff81070fcf ffff8805d61832b0 000000001e216058 > > > [290626.110510] ffff8805d6183280 ffff88061e216040 ffff8805d61832b0 ffff88061e216058 > > > [290626.135329] Call Trace: > > > [290626.145108] [<ffffffff81070fcf>] process_one_work+0x15b/0x260 > > > [290626.164241] [<ffffffff81071574>] worker_thread+0x1f0/0x29d > > > [290626.182561] [<ffffffff81071384>] ? rescuer_thread+0x281/0x281 > > > [290626.201721] [<ffffffff810758e9>] kthread+0xa5/0xad > > > [290626.217966] [<ffffffff81075844>] ? kthread_parkme+0x24/0x24 > > > [290626.236505] [<ffffffff816e1c0f>] ret_from_fork+0x3f/0x70 > > > [290626.255052] [<ffffffff81075844>] ? kthread_parkme+0x24/0x24 > > > [290626.274460] Code: 89 1c c1 4d 8b a4 24 f0 01 00 00 49 81 ec f0 01 00 00 eb b4 45 31 e4 0f b7 83 4c f7 ff ff 4a 8b 94 e3 58 ff ff ff 44 39 e0 76 16 <48> 8b ba b8 09 00 00 48 85 ff 74 05 e8 61 e8 b5 c0 49 ff c4 eb > > > [290626.337443] RIP [<ffffffffc0516e8d>] cache_set_flush+0x102/0x15c [bcache] > > > [290626.359833] RSP <ffff8805af187e10> > > > [290626.372050] CR2: 00000000000009b8 > > > [290626.383646] ---[ end trace f97d88b8bb456800 ]--- > > > [290626.399977] Kernel panic - not syncing: Fatal exception > > > [290626.418169] Kernel Offset: disabled > > > [290626.430278] Rebooting in 20 seconds.. > > > [290646.378903] ACPI MEMORY or I/O RESET_REG. > > > -- > > > "A mouse is a device used to point at the xterm you want to type in" - A.S.R. > > > Microsoft is to operating systems .... > > > .... what McDonalds is to gourmet cooking > > > Home page: http://marc.merlins.org/ | PGP 1024R/763BE901 > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in > > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in > > the body of a message to majordomo@xxxxxxxxxxxxxxx > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-bcache" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html