On Fri, 2025-02-28 at 16:44 +0000, Jon Hunter wrote: > Hi James, > > On 07/01/2025 21:31, James Bottomley wrote: > > Hibernation allows other OSs to boot and thus the variable state > > might > > be altered by the time the hibernation image is resumed. Resync > > the > > variable state by looping over all the dentries and update the size > > (in case of alteration) delete any which no-longer exist. Finally, > > loop over all efi variables creating any which don't have > > corresponding dentries. > > > > Signed-off-by: James Bottomley > > <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> > > --- > > fs/efivarfs/internal.h | 3 +- > > fs/efivarfs/super.c | 151 > > ++++++++++++++++++++++++++++++++++++++++- > > fs/efivarfs/vars.c | 5 +- > > 3 files changed, 155 insertions(+), 4 deletions(-) > > ... > > > +static int efivarfs_pm_notify(struct notifier_block *nb, unsigned > > long action, > > + void *ptr) > > +{ > > + struct efivarfs_fs_info *sfi = container_of(nb, struct > > efivarfs_fs_info, > > + pm_nb); > > + struct path path = { .mnt = NULL, .dentry = sfi->sb- > > >s_root, }; > > + struct efivarfs_ctx ectx = { > > + .ctx = { > > + .actor = efivarfs_actor, > > + }, > > + .sb = sfi->sb, > > + }; > > + struct file *file; > > + static bool rescan_done = true; > > + > > + if (action == PM_HIBERNATION_PREPARE) { > > + rescan_done = false; > > + return NOTIFY_OK; > > + } else if (action != PM_POST_HIBERNATION) { > > + return NOTIFY_DONE; > > + } > > + > > + if (rescan_done) > > + return NOTIFY_DONE; > > + > > + pr_info("efivarfs: resyncing variable state\n"); > > + > > + /* O_NOATIME is required to prevent oops on NULL mnt */ > > + file = kernel_file_open(&path, O_RDONLY | O_DIRECTORY | > > O_NOATIME, > > + current_cred()); > > + if (!file) > > + return NOTIFY_DONE; > > + > > + rescan_done = true; > > + > > + /* > > + * First loop over the directory and verify each entry > > exists, > > + * removing it if it doesn't > > + */ > > + file->f_pos = 2; /* skip . and .. */ > > + do { > > + ectx.dentry = NULL; > > + iterate_dir(file, &ectx.ctx); > > + if (ectx.dentry) { > > + pr_info("efivarfs: removing variable > > %pd\n", > > + ectx.dentry); > > + simple_recursive_removal(ectx.dentry, > > NULL); > > + dput(ectx.dentry); > > + } > > + } while (ectx.dentry); > > + fput(file); > > + > > + /* > > + * then loop over variables, creating them if there's no > > matching > > + * dentry > > + */ > > + efivar_init(efivarfs_check_missing, sfi->sb, false); > > + > > + return NOTIFY_OK; > > +} > > > With the current mainline I have observed the following crash when > testing suspend on one of our Tegra devices ... > > rtcwake: wakeup from "mem" using /dev/rtc0 at Fri Feb 28 16:25:55 > 2025 > [ 246.593485] Unable to handle kernel NULL pointer dereference at > virtual address 0000000000000068 > [ 246.602601] Mem abort info: > [ 246.602603] ESR = 0x0000000096000004 > [ 246.602605] EC = 0x25: DABT (current EL), IL = 32 bits > [ 246.602608] SET = 0, FnV = 0 > [ 246.602610] EA = 0, S1PTW = 0 > [ 246.602612] FSC = 0x04: level 0 translation fault > [ 246.602615] Data abort info: > [ 246.602617] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 > [ 246.634959] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 > [ 246.634961] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 > [ 246.634964] user pgtable: 4k pages, 48-bit VAs, > pgdp=0000000105205000 > [ 246.634967] [0000000000000068] pgd=0000000000000000, > p4d=0000000000000000 > [ 246.634974] Internal error: Oops: 0000000096000004 [#1] PREEMPT > SMP > [ 246.665796] Modules linked in: qrtr bridge stp llc usb_f_ncm > usb_f_mass_storage usb_f_acm u_serial usb_f_rndis u_ether > libcomposite tegra_drm btusb btrtl drm_dp_aux_bus btintel cec nvme > btmtk nvme_core btbcm drm_display_helper bluetoot > h snd_soc_tegra210_admaif drm_client_lib snd_soc_tegra210_dmic > snd_soc_tegra186_asrc snd_soc_tegra210_mixer snd_soc_tegra_pcm > snd_soc_tegra210_mvc snd_soc_tegra210_ope snd_soc_tegra210_adx > snd_soc_tegra210_amx snd_soc_tegra210_sfc snd_soc > _tegra210_i2s drm_kms_helper ecdh_generic tegra_se ucsi_ccg ecc > snd_hda_codec_hdmi typec_ucsi snd_soc_tegra_audio_graph_card > snd_soc_tegra210_ahub rfkill tegra210_adma snd_hda_tegra > crypto_engine snd_soc_audio_graph_card typec snd_hda_cod > ec snd_soc_simple_card_utils tegra_aconnect arm_dsu_pmu > snd_soc_rt5640 snd_hda_core tegra_xudc ramoops phy_tegra194_p2u > snd_soc_rl6231 at24 pcie_tegra194 tegra_bpmp_thermal host1x > reed_solomon lm90 pwm_tegra ina3221 pwm_fan fuse drm backl > ight dm_mod ip_tables x_tables ipv6 > [ 246.756182] CPU: 9 UID: 0 PID: 1255 Comm: rtcwake Not tainted > 6.14.0-rc4-g8d538b296d56 #61 > [ 246.764677] Hardware name: NVIDIA NVIDIA Jetson AGX Orin Developer > Kit/Jetson, BIOS 00.0.0-dev-main_88214_5a0f5_a213e 02/26/2025 > [ 246.776569] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS > BTYPE=--) > [ 246.783718] pc : efivarfs_pm_notify+0x48/0x180 > [ 246.788285] lr : blocking_notifier_call_chain_robust+0x78/0xe4 > [ 246.794286] sp : ffff800085cebb60 > [ 246.797684] x29: ffff800085cebb60 x28: ffff000093f1b480 x27: > 0000000000000000 > [ 246.805021] x26: 0000000000000004 x25: ffff8000828d3638 x24: > 0000000000000003 > [ 246.812355] x23: 0000000000000000 x22: 0000000000000005 x21: > 0000000000000006 > [ 246.819694] x20: ffff000087f698c0 x19: 0000000000000003 x18: > 000000007f19bcda > [ 246.827029] x17: 00000000c42545a5 x16: 000000000000001c x15: > 000000001709e0a9 > [ 246.834372] x14: 00000000ac6b3a37 x13: 18286bf36c021b08 x12: > 00000039694cf81c > [ 246.841713] x11: 00000000f1e0faad x10: 0000000000000001 x9 : > 000000004ff99d57 > [ 246.849046] x8 : 00000000bb51f9d6 x7 : 00000001f4d4185c x6 : > 00000001f4d4185c > [ 246.856382] x5 : ffff800085cebb58 x4 : ffff800080952930 x3 : > ffff8000804cba44 > [ 246.863713] x2 : 0000000000000000 x1 : 0000000000000003 x0 : > ffff8000804cbf84 > [ 246.871045] Call trace: > [ 246.873550] efivarfs_pm_notify+0x48/0x180 (P) > [ 246.878119] blocking_notifier_call_chain_robust+0x78/0xe4 > [ 246.883753] pm_notifier_call_chain_robust+0x28/0x48 > [ 246.888852] pm_suspend+0x138/0x1c8 > [ 246.892438] state_store+0x8c/0xfc > [ 246.895931] kobj_attr_store+0x18/0x2c > [ 246.899791] sysfs_kf_write+0x44/0x54 > [ 246.903553] kernfs_fop_write_iter+0x118/0x1a8 > [ 246.908113] vfs_write+0x2b0/0x35c > [ 246.911608] ksys_write+0x68/0xfc > [ 246.915013] __arm64_sys_write+0x1c/0x28 > [ 246.919038] invoke_syscall+0x48/0x110 > [ 246.922897] el0_svc_common.constprop.0+0x40/0xe8 > [ 246.927731] do_el0_svc+0x20/0x2c > [ 246.931127] el0_svc+0x30/0xd0 > [ 246.934265] el0t_64_sync_handler+0x144/0x168 > [ 246.938737] el0t_64_sync+0x198/0x19c > [ 246.942505] Code: f9400682 f90027ff a906ffe2 f100043f (f9403440) > [ 246.948767] ---[ end trace 0000000000000000 ]--- > > > Bisect is pointing to this commit. I had a quick look at this and the > following fixes it for me ... Yes, this occurs because the notifier can be triggered before the superblock information is filled. Ard fixed this: https://web.git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git/commit/?h=urgent&id=cb6ae457bc6af58c84a7854df5e7e32ba1c6a715 Regards, James