On Mon, May 13, 2019 at 12:43:18PM +0200, Petr Mladek wrote: > On Tue 2019-05-07 16:24:25, Josh Poimboeuf wrote: > > On Tue, May 07, 2019 at 04:28:47PM +0200, Petr Mladek wrote: > > > > > > > Also this check is effectively the same as the klp_have_reliable_stack() > > > > > > > check which is done in kernel/livepatch/core.c. So I think it would be > > > > > > > clearer and more consistent if the same check is done here: > > > > > > > > > > > > > > if (!klp_have_reliable_stack()) > > > > > > > return -ENOSYS; > > > > > > > > > > Huh, it smells with over engineering to me. > > > > > > > > How so? It makes the code more readable and the generated code should > > > > be much better because it becomes a build-time check. > > > > > > save_stack_trace_tsk_reliable() returns various error codes. > > > We catch a specific one because otherwise the message below > > > might be misleading. > > > > > > I do not see why we should prevent this error by calling > > > a custom hack: klp_have_reliable_stack()? > > > > I wouldn't call it a hack. It's a simple build-time check. > > > > > Regarding reliability. If anyone changes semantic of > > > save_stack_trace_tsk_reliable() error codes, they would likely > > > check if all users (one at the moment) handle it correctly. > > > > > > On the other hand, the dependency between the -ENOSYS > > > return value and klp_have_reliable_stack() is far from > > > obvious. > > > > I don't follow your point. > > We implement klp_have_reliable_stack() in livepatch subsystem. > It checks config options that defines behavior of the > stacktrace subsystem. > > We use the above livepatch-specific function to warn about > that a function from stacktrace subsustem will not work. > You even suggest to use it to ignore result from > the stacktrace subsystem. > > OK, using klp_have_reliable_stack() on both locations > would keep it consistent. > > My point is that the check itself is not reliable because > it is "hard" to maintain. I don't see how it would be "hard" to maintain. If an arch implements reliable stack tracing, but forgets to set HAVE_RELIABLE_STACKTRACE then they will notice the warning immediately when they try to livepatch. > Instead, I suggest to remove klp_have_reliable_stack() and use > the following in klp_enable_patch(). > > > if (stack_trace_save_tsk_reliable(current, NULL, 0) == -ENOSYS) { > pr_warn("This architecture doesn't have support for the livepatch consistency model.\n"); > pr_warn("The livepatch transition may never complete.\n"); > } It seems confusing to call that function where it isn't needed, since klp_enable_patch() isn't doing stack tracing yet at that location. Calling klp_have_reliable_stack() is more readable, and is exactly the check we want to make. > Also I suggest to remove the check in klp_check_stack() completely. > We will always print that the stack is not reliable but only when > the debug message is enabled. It is slightly misleading > message for -ENOSYS. But I doubt that it could cause much > troubles in reality. This situation should be really rare > and easy to debug. If you want to remove the specific check in klp_check_stack(), that's fine with me. I slightly prefer just reverting Kamalesh's commit, but I don't have a strong feeling either way. > > > If we want to discuss and fix this to the death. We should change > > > the return value from -ENOSYS to -EOPNOTSUPP. The reason > > > is the same as in the commit 375bfca3459db1c5596 > > > ("livepatch: core: Return EOPNOTSUPP instead of ENOSYS"). > > > > > > Note that EOPNOTSUPP is the same errno as ENOTSUP, see > > > man 3 errno. > > > > Sure, but that's a separate issue. > > I just wanted to show that we might spend even more time with > making this code briliant. > > > > > > But I think Miroslav's suggestion to revert 1d98a69e5cef would be even > > > > better. > > > > > > AFAIK, Miroslav wanted to point out that your opinion was inconsistent. > > > > How is my opinion inconsistent? I don't guarantee that I will always be consistent with my past self. My thinking may evolve (or devolve?) over time. And there's nothing wrong with that. But none of the below is actually inconsistent. > 1. You have already acked the removal of WARN_ON() in > klp_have_reliable_stack(), > see https://lkml.kernel.org/r/20190424155154.h62wc3nt7ib2phdy@treble > > You even suggested it, see > https://lkml.kernel.org/r/20190418135858.n3lzq2oxkj52m6bi@treble > > But you suggest to put it back now. Maybe I wasn't clear. If we're not planning on calling the weak version of the function, then the warning makes sense. > 2. You suggested to remove the warning when klp_check_stack() because > it was superfluous. There was a discussion about keeping > the check for -ENOSYS there and you did not react, see > https://lkml.kernel.org/r/20190424155532.3uyxyxwm4c5dqsf5@treble But then (I thought) Miroslav suggested reverting 1d98a69e5cef, which is a better idea. > You even acked the commit 1d98a69e5cef3aeb68bcef ("livepatch: > Remove reliable stacktrace check in klp_try_switch_task()"). > > And you suddenly want to revert it. The circumstances have changed since that commit: now we are going back to allowing non-reliable arches to load livepatches. > > Is there something wrong with the original approach, which was reverted > > with > > > > 1d98a69e5cef ("livepatch: Remove reliable stacktrace check in klp_try_switch_task()") > > > > ? > > > > As I mentioned, that has some advantages: > > > > - The generated code is simpler/faster because it uses a build-time > > check. > > > > - The code is more readable IMO. Especially if the check is done higher > > up the call stack by reverting 1d98a69e5cef. That way the arch > > support short-circuiting is done in the logical place, before doing > > any more unnecessary work. It's faster, but also, more importantly, > > it's less surprising. > > > > - It's also more consistent with other code, since the intent of this > > check is the same as the klp_have_reliable_stack() use in > > klp_enabled_patch(). > > > > If you disagree with those, please explain why. > > As I said. I think that it is less reliable because we check config > options of an unrelated subsystem. There's no harm in checking the config option of another subsystem. Livepatch very much relies on stacktrace. > Also I think that it is overengineered. > save_stack_trace_tsk_reliable() is able to tell when it failed. > This particular failure is superfast. IMHO, it is not worth such > an optimization. Sure, performance isn't a concern, but code simplicity, readability, and consistency certainly are. > In fact, it is a compile time check as well because the inline > from include/linux/stacktrace.h > > > > > PS: This is my last mail in the thread this week. I will eventually > > > return to it with a clear head next week. It is all nitpicking from > > > my POV and I have better things to do. > > > > I don't think it's helpful to characterize it as nitpicking. The > > details of the code are important. > > 1. You had issues with almost all my printk() messages, comments, > and commit messages. I know that my English is not perfect. > And that you might want to highlight another information. > But was is it really that bad? It's just code review. > 2. This entire patchset is about adding and removing messages > and checks. We have 3rd version and you are still not happy. > Not to say that you suggest something else each time. That's how review works. If the code improves with each iteration then how is that a problem? > Frankly, I do mind too much about this code path, which and how > many warnings are printed. I am not aware about any complains. > IMHO, only people adding support for new architecture > might go this way. > > I just wanted to switch the error to warning because Thomas > Gleixner wondered about unused code on s390 when refactoring > the stacktrace code. > > I really did not expect that I would spend hours/days on this. > I do not think that it is worth it. > > I consider most of the requests as nitpicking because > they requests extra work just because it looks like > a good idea. > > My take is that we should accept changes when they improve > the situation and go in the right direction. Further clean up > might be done later by anyone who does not have anything > more important on the plate or gets annoyed enough. > > Yes, you have many great ideas. But I am not a trained > monkey. And I do not know how to stop this when it is > looking endless. I'm just trying to help improve the code. That's how open source works. And it's my responsibility as a maintainer. If you're not open to review, don't bother posting the code in the first place. -- Josh