Hi Ted, Zhang,

On 30/06/2021 21:49, Theodore Ts'o wrote:
> The following changes since commit 614124bea77e452aa6df7a8714e8bc820b489922:
>
>   Linux 5.13-rc5 (2021-06-06 15:47:27 -0700)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git tags/ext4_for_linus
>
> for you to fetch changes up to 16aa4c9a1fbe763c147a964cdc1f5be8ed98ed13:
>
>   jbd2: export jbd2_journal_[un]register_shrinker() (2021-06-30 11:05:00 -0400)
>
> ----------------------------------------------------------------
> In addition to bug fixes and cleanups, there are two new features for
> ext4 in 5.14:
>  - Allow applications to poll on changes to /sys/fs/ext4/*/errors_count
>  - Add the ioctl EXT4_IOC_CHECKPOINT which allows the journal to be
>    checkpointed, truncated and discarded or zero'ed.
>
> ----------------------------------------------------------------

...

> Zhang Yi (12):
>       ext4: cleanup in-core orphan list if ext4_truncate() failed to get a transaction handle
>       ext4: remove check for zero nr_to_scan in ext4_es_scan()
>       ext4: correct the cache_nr in tracepoint ext4_es_shrink_exit
>       jbd2: remove the out label in __jbd2_journal_remove_checkpoint()
>       jbd2: ensure abort the journal if detect IO error when writing original buffer back
>       jbd2: don't abort the journal when freeing buffers
>       jbd2: remove redundant buffer io error checks
>       jbd2,ext4: add a shrinker to release checkpointed buffers


I have noticed that, with next-20210701, one of our eMMC tests started
failing on all our ARM and ARM64 platforms, and bisect is pointing to
commit 4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed
buffers"). Today I am seeing the same failure on mainline. Looking at the
kernel logs, I see the following crash ...
[ 74.430365] Unable to handle kernel paging request at virtual address ffff8001e353a000
[ 74.438304] Mem abort info:
[ 74.441110]   ESR = 0x96000005
[ 74.444226]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 74.449548]   SET = 0, FnV = 0
[ 74.452595]   EA = 0, S1PTW = 0
[ 74.455740]   FSC = 0x05: level 1 translation fault
[ 74.460620] Data abort info:
[ 74.463504]   ISV = 0, ISS = 0x00000005
[ 74.467343]   CM = 0, WnR = 0
[ 74.470314] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000081adc000
[ 74.477013] [ffff8001e353a000] pgd=10000002771ff803, p4d=10000002771ff803, pud=0000000000000000
[ 74.485718] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 74.491284] Modules linked in: tegra_drm snd_soc_tegra186_dspk cec snd_soc_tegra210_dmic snd_soc_tegra210_admaif snd_soc_tegra_pcm snd_soc_tegra210_i2s drm_kms_helper drm snd_soc_tegra210_ahub tegra210_adma crct10dif_ce snd_hda_codec_hdmi snd_soc_tegra_audio_graph_card snd_soc_audio_graph_card snd_hda_tegra snd_soc_simple_card_utils snd_hda_codec at24 tegra_bpmp_thermal snd_hda_core tegra_aconnect tegra_xudc ina3221 host1x ip_tables x_tables ipv6
[ 74.530804] CPU: 0 PID: 936 Comm: umount Tainted: G S 5.13.0-next-20210701-gfb0ca446157a #1
[ 74.540446] Hardware name: NVIDIA Jetson TX2 Developer Kit (DT)
[ 74.546354] pstate: a0000005 (NzCv daif -PAN -UAO -TCO BTYPE=--)
[ 74.552354] pc : percpu_counter_add_batch+0x30/0x118
[ 74.557317] lr : __jbd2_journal_remove_checkpoint+0x70/0x170
[ 74.562972] sp : ffff800013923b90
[ 74.566278] x29: ffff800013923b90 x28: ffff000080ba8d80 x27: 0000000000000000
[ 74.573408] x26: 0000000000000001 x25: 0000000000000006 x24: ffff000080ba8d80
[ 74.580536] x23: ffff00008965a450 x22: ffff800011ce9000 x21: ffff00008965a380
[ 74.587665] x20: ffffffffffffffff x19: ffff00008a9d8000 x18: 0000000000000011
[ 74.594792] x17: 0000000000000000 x16: 0000000000000000 x15: 000000000000038d
[ 74.601921] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 74.609048] x11: 0000000000000001 x10: 0000000000000960 x9 : ffff800013923b90
[ 74.616175] x8 : ffff000080ba9740 x7 : 0000000000000400 x6 : ffff00008965a0b0
[ 74.623304] x5 : ffff00008965a0b0 x4 : ffff8001e353a000 x3 : ffff000080ba8d80
[ 74.630430] x2 : 0000000000000020 x1 : 0000000000000000 x0 : ffff00008965a380
[ 74.637558] Call trace:
[ 74.640000]  percpu_counter_add_batch+0x30/0x118
[ 74.644610]  __jbd2_journal_remove_checkpoint+0x70/0x170
[ 74.649914]  jbd2_log_do_checkpoint+0xa8/0x398
[ 74.654351]  jbd2_journal_destroy+0x100/0x2a8
[ 74.658703]  ext4_put_super+0x7c/0x388
[ 74.662449]  generic_shutdown_super+0x70/0xf8
[ 74.666802]  kill_block_super+0x1c/0x60
[ 74.670633]  deactivate_locked_super+0x6c/0x98
[ 74.675071]  deactivate_super+0x84/0x90
[ 74.678901]  cleanup_mnt+0x8c/0x110
[ 74.682385]  __cleanup_mnt+0x10/0x18
[ 74.685953]  task_work_run+0x78/0x150
[ 74.689612]  do_notify_resume+0x31c/0x498
[ 74.693618]  work_pending+0xc/0x328
[ 74.697103] Code: 11000484 b9000864 d538d084 f9401001 (b8a46833)
[ 74.703186] ---[ end trace e18485293afc06e4 ]---

Is this causing problems for anyone else?

Thanks
Jon

-- 
nvpublic