Applied that single line on top of 5.5.0-rc3

fs/btrfs/compression.c:449:17: error: implicit declaration of function ‘bio_set_bev’; did you mean ‘bio_set_dev’? [-Werror=implicit-function-declaration]

If I use bio_set_dev

...
  CC [M]  fs/btrfs/compression.o
fs/btrfs/compression.o: warning: objtool: end_compressed_bio_read.cold()+0x11: unreachable instruction
  LD [M]  fs/btrfs/btrfs.o
  GEN     .version
...

Despite that, it seems to work, and no crash with the reproducer.

On Wed, Dec 11, 2019 at 8:59 AM David Sterba <dsterba@xxxxxxx> wrote:
>
> On Wed, Dec 11, 2019 at 04:55:53PM +0100, David Sterba wrote:
> > On Wed, Dec 11, 2019 at 09:58:45AM -0500, Josef Bacik wrote:
> > > On 12/10/19 11:00 PM, Chris Murphy wrote:
> > > > Could continue to chat in one application, the desktop environment was
> > > > responsive, but no shells worked and I couldn't get to a tty and I
> > > > couldn't ssh into remotely. Looks like the journal has everything up
> > > > until I pressed and held down the power button.
> > > >
> > > >
> > > > /dev/nvme0n1p7 on / type btrfs
> > > > (rw,noatime,seclabel,compress=zstd:1,ssd,space_cache=v2,subvolid=274,subvol=/root)
> > > >
> > > > dmesg pretty
> > > > https://pastebin.com/pvG3ERnd
> > > >
> > > > dmesg (likely MUA stomped)
> > > > [10224.184137] flap.local kernel: perf: interrupt took too long (2522 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
> > > > [14712.698184] flap.local kernel: perf: interrupt took too long (3153 > 3152), lowering kernel.perf_event_max_sample_rate to 63000
> > > > [17903.211976] flap.local kernel: Lockdown: systemd-logind: hibernation is restricted; see man kernel_lockdown.7
> > > > [22877.667177] flap.local kernel: BUG: kernel NULL pointer dereference, address: 00000000000006c8
> > > > [22877.667182] flap.local kernel: #PF: supervisor read access in kernel mode
> > > > [22877.667184] flap.local kernel: #PF: error_code(0x0000) - not-present page
> > > > [22877.667187] flap.local kernel: PGD 0 P4D 0
> > > > [22877.667191] flap.local kernel: Oops: 0000 [#1] SMP PTI
> > > > [22877.667194] flap.local kernel: CPU: 2 PID: 14747 Comm: kworker/u8:7 Not tainted 5.5.0-0.rc1.git0.1.fc32.x86_64+debug #1
> > > > [22877.667196] flap.local kernel: Hardware name: HP HP Spectre Notebook/81A0, BIOS F.43 04/16/2019
> > > > [22877.667226] flap.local kernel: Workqueue: btrfs-delalloc btrfs_work_helper [btrfs]
> > > > [22877.667233] flap.local kernel: RIP: 0010:bio_associate_blkg_from_css+0x1c/0x3b0
> > >
> > > This looks like the extent_map bdev cleanup thing that was supposed to be fixed,
> > > did you send the patch without the fix for it Dave? Thanks,
> >
> > The fix for NULL bdev was added in 429aebc0a9a063667dba21 (and tested
> > with cgroups v2) and it's in a different function than the one that
> > appears on the stacktrace.
> >
> > This seems to be another instance where the bdev is needed right after
> > the bio is created but way earlier than it's actually known for real,
> > yet still needed for the blkcg thing.
> >
> >  443         bio = btrfs_bio_alloc(first_byte);
> >  444         bio->bi_opf = REQ_OP_WRITE | write_flags;
> >  445         bio->bi_private = cb;
> >  446         bio->bi_end_io = end_compressed_bio_write;
> >  447
> >  448         if (blkcg_css) {
> >  449                 bio->bi_opf |= REQ_CGROUP_PUNT;
> >  450                 bio_associate_blkg_from_css(bio, blkcg_css);
> >  451         }
> >
> > Strange that it takes so long to reproduce, meaning the 'if' branch is
> > not taken often.
>
> Compile tested only:
>
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -446,6 +446,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
>         bio->bi_end_io = end_compressed_bio_write;
>
>         if (blkcg_css) {
> +               bio_set_bev(bio, fs_info->fs_devices->latest_bdev);
>                 bio->bi_opf |= REQ_CGROUP_PUNT;
>                 bio_associate_blkg_from_css(bio, blkcg_css);
>         }
>

-- 
Chris Murphy