On 27/11/2019 16:43, Christoph Hellwig wrote: > On Fri, Nov 08, 2019 at 08:36:31AM +0100, Arkadiusz Miśkiewicz wrote: >>> The WARN_ON means that conversion of delalloc blocks failed to find >>> free space. Something that should not be possible due to the delalloc >>> reservations. What as the last kernel where you did not see something >>> like this? >>> >> >> 4.20.13 looks ok >> 5.1.15 bad (that warn_on triggered) >> >> on both machines according to my old logs > > Sorry for dropping the ball. We have some pretty significant changes > between those version as 5.3 has a pretty big rewrite of the high level > writeback code, and this happens in code called from that. Hm, 5.3 but I saw this on 5.1.15, too. See below. (or did you mean 5.1 was with big changes?) For now I'm on 4.19.XX for over 2 weeks without this issue, on both servers. > Oct 29 19:04:47 backup3 kernel: [1573435.718506][ T6332] WARNING: CPU: 2 PID: 6332 at fs/xfs/libxfs/xfs_bmap.c:4513 xfs_bmapi_convert_delalloc+0x44d/0x4b0 [xfs] > Oct 29 19:04:47 backup3 kernel: [1573435.718507][ T6332] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_sfq bonding nf_log_ipv4 nf_log_common xt_LOG xt_comment xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables bpfilter ext4 crc16 mbcache jbd2 input_leds joydev hid_generic usbhid hid coretemp intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel raid456 kvm async_raid6_recov irqbypass async_memcpy igb async_pq async_xor async_tx xor cmdlinepart intel_spi_platform intel_spi spi_nor mtd iTCO_wdt iTCO_vendor_support evdev crct10dif_pclmul crc32_pclmul ghash_clmulni_intel raid1 aesni_intel aes_x86_64 crypto_simd cryptd raid6_pq glue_helper hwmon dca intel_cstate pcspkr i2c_algo_bit intel_uncore i2c_i801 intel_pch_thermal intel_rapl_perf lpc_ich ipmi_ssif i2c_core xhci_pci xhci_hcd ses enclosure ehci_pci ehci_hcd usbcore ipmi_si ipmi_devintf ipmi_msghandler fan thermal button video ie31200_edac pcc_cpufreq sch_fq_codel xfs libcrc32c crc32c_generic > Oct 29 19:04:47 backup3 kernel: [1573435.718530][ T6332] crc32c_intel sd_mod mpt3sas raid_class scsi_transport_sas raid10 md_mod ahci libahci libata scsi_mod > Oct 29 19:04:47 backup3 kernel: [1573435.718536][ T6332] CPU: 2 PID: 6332 Comm: kworker/u8:9 Tainted: G T 5.1.15-1 #1 > Oct 29 19:04:47 backup3 kernel: [1573435.718537][ T6332] Hardware name: Supermicro X10SL7-F/X10SL7-F, BIOS 2.00 04/24/2014 > Oct 29 19:04:47 backup3 kernel: [1573435.718540][ T6332] Workqueue: writeback wb_workfn (flush-9:5) > Oct 29 19:04:47 backup3 kernel: [1573435.718555][ T6332] RIP: 0010:xfs_bmapi_convert_delalloc+0x44d/0x4b0 [xfs] > Oct 29 19:04:47 backup3 kernel: [1573435.718556][ T6332] Code: 8b 43 38 48 89 44 24 10 e9 3e fc ff ff 80 bb e1 00 00 00 03 48 8d 43 48 74 ce c7 84 24 c8 00 00 00 01 00 00 00 e9 85 fd ff ff <0f> 0b 41 bf e4 ff ff ff 4c 89 e7 44 89 fa 44 89 f6 e8 2d 71 ff ff > Oct 29 19:04:47 backup3 kernel: [1573435.718556][ T6332] RSP: 0018:ffffb8eb4384b858 EFLAGS: 00010246 > Oct 29 19:04:47 backup3 kernel: [1573435.718557][ T6332] RAX: 0000000000000000 RBX: ffff8d81fa9261c0 RCX: 0000000000000000 > Oct 29 19:04:47 backup3 kernel: [1573435.718558][ T6332] RDX: 0000000000001f8e RSI: ffffffffffffffff RDI: ffff8d83e8d66600 > Oct 29 19:04:47 backup3 kernel: [1573435.718558][ T6332] RBP: ffffb8eb4384b960 R08: 0000000000000000 R09: 000000000001bd06 > Oct 29 19:04:47 backup3 kernel: [1573435.718558][ T6332] R10: ffff8d7f5fafaaa8 R11: 0000000000000001 R12: ffffb8eb4384b890 > Oct 29 19:04:47 backup3 kernel: [1573435.718559][ T6332] R13: ffff8d83dfe8e000 R14: 0000000000000000 R15: 0000000000000000 > Oct 29 19:04:47 backup3 kernel: [1573435.718560][ T6332] FS: 0000000000000000(0000) GS:ffff8d841fd00000(0000) knlGS:0000000000000000 > Oct 29 19:04:47 backup3 kernel: [1573435.718560][ T6332] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > Oct 29 19:04:47 backup3 kernel: [1573435.718561][ T6332] CR2: 00000000021b3fc0 CR3: 000000061020e005 CR4: 00000000001606e0 > Oct 29 19:04:47 backup3 kernel: [1573435.718561][ T6332] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > Oct 29 19:04:47 backup3 kernel: [1573435.718562][ T6332] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Oct 29 19:04:47 backup3 kernel: [1573435.718562][ T6332] Call Trace: > Oct 29 19:04:47 backup3 kernel: [1573435.718582][ T6332] xfs_map_blocks+0x1ae/0x450 [xfs] > Oct 29 19:04:47 backup3 kernel: [1573435.718599][ T6332] xfs_do_writepage+0x114/0x420 [xfs] > Oct 29 19:04:47 backup3 kernel: [1573435.718603][ T6332] write_cache_pages+0x1bd/0x460 > Oct 29 19:04:47 backup3 kernel: [1573435.718620][ T6332] ? xfs_vm_writepages+0x90/0x90 [xfs] > Oct 29 19:04:47 backup3 kernel: [1573435.718637][ T6332] xfs_vm_writepages+0x5b/0x90 [xfs] > Oct 29 19:04:47 backup3 kernel: [1573435.718639][ T6332] do_writepages+0x1c/0x60 > Oct 29 19:04:47 backup3 kernel: [1573435.718641][ T6332] __writeback_single_inode+0x41/0x340 > Oct 29 19:04:47 backup3 kernel: [1573435.718644][ T6332] ? __wake_up_common_lock+0x88/0xc0 > Oct 29 19:04:47 backup3 kernel: [1573435.718645][ T6332] writeback_sb_inodes+0x1dd/0x460 > Oct 29 19:04:47 backup3 kernel: [1573435.718646][ T6332] wb_writeback+0x12a/0x310 > Oct 29 19:04:47 backup3 kernel: [1573435.718648][ T6332] wb_workfn+0xde/0x460 > Oct 29 19:04:47 backup3 kernel: [1573435.718650][ T6332] process_one_work+0x1fb/0x3f0 > Oct 29 19:04:47 backup3 kernel: [1573435.718651][ T6332] worker_thread+0x246/0x3d0 > Oct 29 19:04:47 backup3 kernel: [1573435.718652][ T6332] ? process_one_work+0x3f0/0x3f0 > Oct 29 19:04:47 backup3 kernel: [1573435.718654][ T6332] kthread+0x113/0x130 > Oct 29 19:04:47 backup3 kernel: [1573435.718655][ T6332] ? kthread_park+0x80/0x80 > Oct 29 19:04:47 backup3 kernel: [1573435.718658][ T6332] ret_from_fork+0x35/0x40 > Oct 29 19:04:47 backup3 kernel: [1573435.718659][ T6332] ---[ end trace 37cbd88cae263afa ]--- > Oct 29 19:04:47 backup3 kernel: [1573435.718663][ T6332] XFS (md5): page discard on page 000000007bcb59fa, inode 0x101f3041f, offset 234819584. > Oct 29 19:04:47 backup3 kernel: [1573435.718685][ T6332] XFS (md5): page discard on page 00000000b1b6fee1, inode 0x101f3041f, offset 234823680. > Oct 29 19:04:47 backup3 kernel: [1573435.718703][ T6332] XFS (md5): page discard on page 00000000f7525550, inode 0x101f3041f, offset 234827776. > But it > doesn't actually change (or should change) how we use the reservations, > which leaves me a little puzzled. If I gave you an instrumented kernel > with new trace points, could you provide traces based on that? Probably it will be easier to just bisect and I plan to do that after backup catches up with missing data (and it takes days). -- Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )