On 12/13/22 8:29 AM, Petr Mladek wrote:
> On Tue 2022-12-13 00:13:46, Song Liu wrote:
>> On Mon, Dec 12, 2022 at 9:12 AM Petr Mladek <pmladek@xxxxxxxx> wrote:
>>>
>>> On Fri 2022-12-09 11:59:35, Song Liu wrote:
>>>> On Fri, Dec 9, 2022 at 3:41 AM Petr Mladek <pmladek@xxxxxxxx> wrote:
>>>>> On Mon 2022-11-28 17:57:06, Song Liu wrote:
>>>>>> On Fri, Nov 18, 2022 at 8:24 AM Petr Mladek <pmladek@xxxxxxxx> wrote:
>>>>>>>
>>>>>>>> --- a/arch/powerpc/kernel/module_64.c
>>>>>>>> +++ b/arch/powerpc/kernel/module_64.c
>>>>>>>> +#ifdef CONFIG_LIVEPATCH
>>>>>>>> +void clear_relocate_add(Elf64_Shdr *sechdrs,
>>>>>>>> +                       const char *strtab,
>>>>>>>> +                       unsigned int symindex,
>>>>>>>> +                       unsigned int relsec,
>>>>>>>> +                       struct module *me)
>>>>>>>> +{
>>>
>>> [...]
>>>
>>>>>>>> +
>>>>>>>> +               instruction = (u32 *)location;
>>>>>>>> +               if (is_mprofile_ftrace_call(symname))
>>>>>>>> +                       continue;
>>>>>
>>>>> Why do we ignore these symbols?
>>>>>
>>>>> I can't find any counterpart in apply_relocate_add(). It looks super
>>>>> tricky. It would deserve a comment.
>>>>>
>>>>> And I have no idea how we could maintain these exceptions.
>>>>>
>>>>>>>> +               if (!instr_is_relative_link_branch(ppc_inst(*instruction)))
>>>>>>>> +                       continue;
>>>>>
>>>>> Same here. It looks super tricky and there is no explanation.
>>>>
>>>> The two checks are from restore_r2(). But I cannot really remember
>>>> why we needed them. It is probably an updated version from an earlier
>>>> version (3 years earlier...).
>>>
>>> This is a good sign that it has to be explained in a comment.
>>> Or even better, it should not be copy-pasted.
>>>
>>>>>>>> +               instruction += 1;
>>>>>>>> +               patch_instruction(instruction, ppc_inst(PPC_RAW_NOP()));
>>>
>>> I believe that this is not enough. apply_relocate_add() does this:
>>>
>>> int apply_relocate_add(Elf64_Shdr *sechdrs,
>>> [...]
>>>                        struct module *me)
>>> {
>>> [...]
>>>         case R_PPC_REL24:
>>>                 /* FIXME: Handle weak symbols here --RR */
>>>                 if (sym->st_shndx == SHN_UNDEF ||
>>>                     sym->st_shndx == SHN_LIVEPATCH) {
>>> [...]
>>>                         if (!restore_r2(strtab + sym->st_name,
>>>                                         (u32 *)location + 1, me))
>>> [...]                           return -ENOEXEC;
>>>
>>> --->            if (patch_instruction((u32 *)location, ppc_inst(value)))
>>>                         return -EFAULT;
>>>
>>> , where restore_r2() does:
>>>
>>> static int restore_r2(const char *name, u32 *instruction, struct module *me)
>>> {
>>> [...]
>>>         /* ld r2,R2_STACK_OFFSET(r1) */
>>> --->    if (patch_instruction(instruction, ppc_inst(PPC_INST_LD_TOC)))
>>>                 return 0;
>>> [...]
>>> }
>>>
>>> In other words, apply_relocate_add() modifies two instructions:
>>>
>>>    + patch_instruction() called in restore_r2() writes into "location + 1"
>>>    + patch_instruction() called in apply_relocate_add() writes into "location"
>>>
>>> IMHO, we have to clear both.
>>>
>>> IMHO, we need to implement a function that reverts the changes done
>>> in restore_r2(). Also we need to revert the changes done in
>>> apply_relocate_add().
>>
>> I finally got time to read all the details again and recalled what
>> happened with the code.
>>
>> The failure happens when we
>> 1) call apply_relocate_add() on klp load (or module first load,
>>    if klp was loaded first);
>> 2) do nothing when the module is unloaded;
>> 3) call apply_relocate_add() on module reload, which fails.
>>
>> The failure happens at this check in restore_r2():
>>
>>         if (*instruction != PPC_RAW_NOP()) {
>>                 pr_err("%s: Expected nop after call, got %08x at %pS\n",
>>                         me->name, *instruction, instruction);
>>                 return 0;
>>         }
>>
>> Therefore, apply_relocate_add() only fails when "location + 1"
>> is not NOP. And to make it not fail, we only need to write NOP to
>> "location + 1" in clear_relocate_add().
>
> Yes, this should be enough to pass the existing check.
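
The apply / unload / reload sequence described above can be modelled in
user space. This is only a sketch under stated assumptions: the model_*
functions, the MODEL_* encodings and the branch words passed in are
hypothetical stand-ins for illustration, not the kernel's code, and the
clear step implements only the minimal "write NOP to location + 1"
variant, not a full undo of apply_relocate_add().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative encodings (assumptions for this model only). */
#define MODEL_NOP    0x60000000u /* ori r0,r0,0 */
#define MODEL_LD_TOC 0xe8410018u /* ld r2,24(r1), stand-in for PPC_INST_LD_TOC */

/* Models restore_r2(): the slot after the call must still hold a nop. */
int model_restore_r2(uint32_t *instruction)
{
	if (*instruction != MODEL_NOP) {
		fprintf(stderr, "Expected nop after call, got %08x\n",
			(unsigned int)*instruction);
		return 0;
	}
	*instruction = MODEL_LD_TOC;
	return 1;
}

/*
 * Models the R_PPC_REL24 case: patches "location + 1" first, then
 * "location". Returns 0 on success, -1 if the nop check fails.
 */
int model_apply_rel24(uint32_t *location, uint32_t branch)
{
	assert(location != NULL);
	if (!model_restore_r2(location + 1))
		return -1;
	location[0] = branch;
	return 0;
}

/* Minimal clear: restore only the nop, leave the branch word alone. */
void model_clear_rel24(uint32_t *location)
{
	assert(location != NULL);
	location[1] = MODEL_NOP;
}
```

Starting from a two-word buffer holding a branch followed by MODEL_NOP,
a first model_apply_rel24() returns 0, a second one without a clear in
between returns -1 (the reload failure), and calling model_clear_rel24()
first lets the next apply succeed again. The branch word at "location"
is left stale by this variant, which is the gap Petr points out.
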
>
>> IIUC, you want clear_relocate_add() to undo everything we did
>> in apply_relocate_add(); while I was writing clear_relocate_add()
>> to make the next apply_relocate_add() not fail.
>>
>> I agree that, based on the name, clear_relocate_add() should
>> undo everything done by apply_relocate_add(). But I am not sure how
>> to handle some cases. For example, how do we undo
>>
>>         case R_PPC64_ADDR32:
>>                 /* Simply set it */
>>                 *(u32 *)location = value;
>>                 break;
>>
>> Shall we just write zeros? I don't think this matters.
>
> I guess that it would be zeros as we do in x86_64.
>
>
>> I think this is the question we should answer first:
>> What shall clear_relocate_add() do?
>> 1) undo everything done by apply_relocate_add();
>> 2) only do things needed to make the next
>>    apply_relocate_add() succeed;
>> 3) something between 1) and 2).
>
> Good question.
>
> Hmm, the commit a443bf6e8a7674b86221f49 ("powerpc/modules: Add REL24
> relocation support of livepatch symbols") suggests that all symbols
> in the section SHN_LIVEPATCH have the type R_PPC_REL24. AFAIK, the
> kernel livepatches are the only user of the clear_relocate_add()
> feature.
>
> If the above is correct then it might be enough to clear only
> R_PPC_REL24 type. And it might be enough to warn when clear_relocate_add()
> is called for another type so that we know when the relocations
> were not cleared properly.
>
> Good question. We might need some input from people familiar
> with the architecture and creating the livepatches.
>

Adding Russell to the CC list, as he worked on some of the recent
ppc64le livepatch klp-relocation threads [1] [2].

Maybe it would be simpler to first organize a cleanup of the code, then
add the capability to undo the relocations? According to [2] and the
last comment on [3], it sounded like the Power folks had a "full"(er)
solution in mind depending on our requirements.

Finally, I'll try to finish my v6.1 rebase of the klp-convert patchset
this week.
That includes a bunch of kselftests that generate all manner of
klp-relocation types and sections. (More than I've ever seen out of
kpatch-build.)

[1] https://lore.kernel.org/linuxppc-dev/YX9UUBeudSUuJs01@xxxxxxxxxx/
[2] https://lore.kernel.org/linuxppc-dev/YxAc87dTmclHGCUy@xxxxxxxxxx/
[3] https://github.com/linuxppc/issues/issues/375

--
Joe
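
For reference, Petr's suggestion quoted above (clear only R_PPC_REL24
and warn on any other type) could look roughly like the user-space
sketch below. Everything named model_* or MODEL_* is a hypothetical
stand-in chosen for illustration; the struct layout, the word-index
offsets and the return value are assumptions, not the kernel's
interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_NOP 0x60000000u /* ori r0,r0,0 */

/* Stand-ins for the two relocation types named in the thread. */
enum model_reloc_type {
	MODEL_R_ADDR32 = 1,
	MODEL_R_REL24 = 10,
};

/* Simplified relocation entry: type plus a word index into the text. */
struct model_rela {
	enum model_reloc_type type;
	size_t offset;
};

/*
 * Clears REL24 entries by writing a nop to the word after the call and
 * warns about every other type. Returns the number of entries that
 * were left uncleared.
 */
unsigned int model_clear_relocs(uint32_t *text, size_t nwords,
				const struct model_rela *rela, size_t n)
{
	unsigned int skipped = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		switch (rela[i].type) {
		case MODEL_R_REL24:
			assert(rela[i].offset + 1 < nwords);
			text[rela[i].offset + 1] = MODEL_NOP;
			break;
		default:
			fprintf(stderr,
				"relocation type %d at word %zu not cleared\n",
				(int)rela[i].type, rela[i].offset);
			skipped++;
			break;
		}
	}
	return skipped;
}
```

Returning the count of skipped entries is a design choice of this
sketch: it gives the caller a way to notice that some relocations were
not cleared properly, which is the point of the proposed warning.
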