Michal Hocko <mhocko@xxxxxxxxxx> writes:

> On Wed 25-03-20 17:20:40, Eric W. Biederman wrote:
>> Vlastimil Babka <vbabka@xxxxxxx> writes:
> [...]
>> > +	if (strncmp(param, "sysctl.", sizeof("sysctl.") - 1))
>> > +		return 0;
>>
>> Is there any way we can use a slash separated path.  I know
>> in practice there are not any sysctl names that don't have
>> a '.' in them but why should we artificially limit ourselves?
>
> Because this is the normal userspace interface? Why should it be any
> different from calling sysctl?
> [...]

Why should the kernel command line implement userspace whims?

I was thinking something like:

	"sysctl/kernel/max_lock_depth=2048"

That doesn't look too bad, and it makes things like reusing our kernel
internal helpers much easier.

Plus it suggests that we could do the same for sysfs files:

	"sysfs/kernel/fscaps=1"

And the code could be the same for both cases except for the filesystem
prefix.

>> Further it will be faster to look up the sysctls using the code from
>> proc_sysctl.c as it constructs an rbtree of all of the entries in
>> a directory.  The code might as well take advantage of that for large
>> directories.
>
> Sounds like a good fit for a follow-up patch to me. Let's make this
> as simple as possible for the initial version. But up to Vlastimil of course.

I would argue that reusing proc_sysctl.c:lookup_entry() should make the
code simpler and easier to reason about, especially given the bugs in
the first version with a sysctl path.

A clean separation between splitting the path into pieces and looking
up those pieces should make the code more robust.

On top of that, I want to get very far away from the incorrect idea
that you can have sysctls without compiling in proc support.  That is
not how the code works, and that is not how the code is tested.

It is also worth pointing out that:

	proc_mnt = kern_mount(&proc_fs_type);
	for_each_sysctl_cmdline() {
		loff_t pos = 0;
		...
		file = file_open_root(proc_mnt->mnt_root, proc_mnt, sysctl_path, O_WRONLY, 0);
		kernel_write(file, value, value_len, &pos);
	}
	kern_umount(proc_mnt);

is not an unreasonable implementation.

There are problems with a persistent mount of proc in that it forces
userspace not to use any proc mount options.  But a temporary mount of
proc to deal with command line options is not at all unreasonable.
Plus it looks like we can have kernel_write() do all of the kernel/user
buffer silliness.

> [...]
>
>> Hmm.  There is a big gotcha in here and I think it should be mentioned.
>> This code only works because no one has done set_fs(KERNEL_DS).  Which
>> means this only works with strings that are kernel addresses essentially
>> by mistake.  A big fat comment documenting why it is safe to pass in
>> kernel addresses to a function that takes a "char __user*" pointer
>> would be very good.
>
> Agreed

Eric
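
For illustration only, the slash-separated form proposed above
("sysctl/kernel/max_lock_depth=2048", "sysfs/kernel/fscaps=1") could be
split into its filesystem prefix, path and value roughly as follows.
This is a minimal user-space sketch, not code from the patch under
review: split_cmdline_param(), copy_piece() and struct cmdline_param are
hypothetical names, and the buffer sizes are arbitrary assumptions. It
covers only the "split the path into pieces" half; looking the pieces up
is left to whatever helper the kernel side ends up using.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: the three pieces of "fs/path=value". */
struct cmdline_param {
	char fs[16];    /* filesystem prefix: "sysctl" or "sysfs" */
	char path[128]; /* path below the prefix, e.g. "kernel/max_lock_depth" */
	char value[64]; /* text after the first '=', e.g. "2048" */
};

/* Copy len bytes of src into dst; fail if empty or too long for cap. */
static int copy_piece(char *dst, size_t cap, const char *src, size_t len)
{
	if (len == 0 || len >= cap)
		return -1;
	memcpy(dst, src, len);
	dst[len] = '\0';
	return 0;
}

/*
 * Split "sysctl/kernel/max_lock_depth=2048" into fs, path and value.
 * Returns 0 on success, -1 if the parameter is malformed (no '/',
 * no '=', '=' before the first '/', unknown prefix, or an empty piece).
 */
static int split_cmdline_param(const char *param, struct cmdline_param *out)
{
	const char *slash = strchr(param, '/');
	const char *eq = strchr(param, '=');

	if (!slash || !eq || eq < slash)
		return -1;
	if (copy_piece(out->fs, sizeof(out->fs), param, (size_t)(slash - param)))
		return -1;
	/* Only the two prefixes discussed in this thread are accepted. */
	if (strcmp(out->fs, "sysctl") != 0 && strcmp(out->fs, "sysfs") != 0)
		return -1;
	if (copy_piece(out->path, sizeof(out->path), slash + 1,
		       (size_t)(eq - slash - 1)))
		return -1;
	return copy_piece(out->value, sizeof(out->value), eq + 1,
			  strlen(eq + 1));
}
```

With this split, the only difference between the sysctl and sysfs cases
is the prefix, which would select the mount to open the path against.
A dotted name such as "sysctl.kernel.max_lock_depth=2048" is rejected
here because it contains no '/'.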