On Mon, 2023-06-05 at 11:11 +0300, Mike Rapoport wrote: > On Sun, Jun 04, 2023 at 10:52:44PM -0400, Steven Rostedt wrote: > > On Thu, 1 Jun 2023 16:54:36 -0700 > > Nadav Amit <nadav.amit@xxxxxxxxx> wrote: > > > > > > The way text_poke() is used here, it is creating a new writable > > > > alias > > > > and flushing it for *each* write to the module (like for each > > > > write of > > > > an individual relocation, etc). I was just thinking it might > > > > warrant > > > > some batching or something. > > > > I am not advocating to do so, but if you want to have many > > > efficient > > > writes, perhaps you can just disable CR0.WP. Just saying that if > > > you > > > are about to write all over the memory, text_poke() does not > > > provide > > > too much security for the poking thread. > > Heh, this is definitely and easier hack to implement :) I don't know the details, but previously there was some strong dislike of CR0.WP toggling. And now there is also the problem of CET. Setting CR0.WP=0 will #GP if CR4.CET is 1 (as it currently is for kernel IBT). I guess you might get away with toggling them both in some controlled situation, but it might be a lot easier to hack up then to be made fully acceptable. It does sound much more efficient though. > > > Batching does exist, which is what the text_poke_queue() thing > > does. > > For module loading text_poke_queue() will still be much slower than a > bunch > of memset()s for no good reason because we don't need all the > complexity of > text_poke_bp_batch() for module initialization because we are sure we > are > not patching live code. > > What we'd need here is a new batching mode that will create a > writable > alias mapping at the beginning of apply_relocate_*() and > module_finalize(), > then it will use memcpy() to that writable alias and will tear the > mapping > down in the end. It's probably only a tiny bit faster than keeping a separate writable allocation and text_poking it in at the end. > > Another option is to teach alternatives to update a writable copy > rather > than do in place changes like Song suggested. My feeling is that it > will be > more intrusive change though. You mean keeping a separate RW allocation and then text_poking() the whole thing in when you are done? That is what I was trying to say at the beginning of this thread. The other benefit is you don't make the intermediate loading states of the module, executable. I tried this technique previously [0], and I thought it was not too bad. In most of the callers it looks similar to what you have in do_text_poke(). Sometimes less, sometimes more. It might need enlightening of some of the stuff currently using text_poke() during module loading, like jump labels. So that bit is more intrusive, yea. But it sounds so much cleaner and well controlled. Did you have a particular trouble spot in mind? [0] https://lore.kernel.org/lkml/20201120202426.18009-5-rick.p.edgecombe@xxxxxxxxx/