Thanks for reviewing! On 15-Sep 00:16, Yonghong Song wrote: > > > On 9/10/19 12:55 PM, KP Singh wrote: > > From: KP Singh <kpsingh@xxxxxxxxxx> > > This patch cannot apply cleanly. > > -bash-4.4$ git apply ~/p12.txt > error: patch failed: include/uapi/linux/bpf.h:2715 > error: include/uapi/linux/bpf.h: patch does not apply > error: patch failed: tools/include/uapi/linux/bpf.h:2715 > error: tools/include/uapi/linux/bpf.h: patch does not apply > -bash-4.4$ I am not sure why this is happening, I tried: git clone \ https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git && \ cd linux-next && \ git checkout -b review v5.3-rc6 && \ wget -P /tmp https://lore.kernel.org/patchwork/series/410101/mbox && \ git am /tmp/mbox and it worked. This seems to work too: patch -p1 < <file>.patch Can you try with "git am" please? > > > > > The helper returns the value of the environment variable in the buffer > > that is passed to it. If the var is set multiple times, the helper > > returns all the values as null separated strings. > > > > If the buffer is too short for these values, the helper tries to fill it > > the best it can and guarantees that the value returned in the buffer > > is always null terminated. After the buffer is filled, the helper keeps > > counting the number of times the environment variable is set in the > > envp. > > > > The return value of the helper is an u64 value which carries two pieces > > of information. > > > > * The upper 32 bits are a u32 value signifying the number of times > > the environment variable is set in the envp. > > Not sure how useful this 'upper 32' bit value is. What user expected to do? > > Another option is to have upper 32 bits encode the required buffer size > to hold all values. This may cause some kind of user space action, e.g., > to replace the program with new program with larger per cpu map value size? > The upper 32-bit value is actually an important part of the LSM's MAC policy. It allows the user to: - Return an -EPERM when if the environment variable is set more than once. - Log a warning (this is what we are doing in the example) so this is flagged as a potential malicious actor. > > * The lower 32 bits are a s32 value signifying the number of bytes > > written to the buffer or an error code. > > > Since the value of the environment variable can be very long and exceed > > what can be allocated on the BPF stack, a per-cpu array can be used > > instead: > > > > struct bpf_map_def SEC("maps") env_map = { > > .type = BPF_MAP_TYPE_PERCPU_ARRAY, > > .key_size = sizeof(u32), > > .value_size = 4096, > > .max_entries = 1, > > }; > > Could you use use map definition with SEC(".maps")? Sure, I added this example program in the commit message. Will update it to be more canonical. Thanks! > > > > > SEC("prgrm") > > int bpf_prog1(void *ctx) > > { > > u32 map_id = 0; > > u64 times_ret; > > s32 ret; > > char name[48] = "LD_PRELOAD"; > > Reverse Christmas tree coding style, here and other places? Will happily fix it. However, I did not find it mentioned in the style guide: https://www.kernel.org/doc/html/v4.10/process/coding-style.html https://elixir.bootlin.com/linux/v4.6/source/Documentation/CodingStyle Is there one specific to BPF? > > > > > char *map_value = bpf_map_lookup_elem(&env_map, &map_id); > > if (!map_value) > > return 0; > > > > // Read the lower 32 bits for the return value > > times_ret = krsi_get_env_var(ctx, name, 48, map_value, 4096); > > ret = times_ret & 0xffffffff; > > if (ret < 0) > > return ret; > > return 0; > > } > > > > Signed-off-by: KP Singh <kpsingh@xxxxxxxxxx> > > --- > > include/uapi/linux/bpf.h | 42 ++++++- > > security/krsi/ops.c | 129 ++++++++++++++++++++++ > > tools/include/uapi/linux/bpf.h | 42 ++++++- > > tools/testing/selftests/bpf/bpf_helpers.h | 3 + > > 4 files changed, 214 insertions(+), 2 deletions(-) > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > index 32ab38f1a2fe..a4ef07956e07 100644 > > --- a/include/uapi/linux/bpf.h > > +++ b/include/uapi/linux/bpf.h > > @@ -2715,6 +2715,45 @@ union bpf_attr { > > * **-EPERM** if no permission to send the *sig*. > > * > > * **-EAGAIN** if bpf program can try again. > > + * > > + * u64 krsi_get_env_var(void *ctx, char *name, char *buf, > > + * size_t name_len, size_t buf_len) > > This signature is not the same as the later > krsi_get_env_var(...) helper definition. > BPF_CALL_5(krsi_get_env_var, struct krsi_ctx *, ctx, char *, name, u32, > n_size, > char *, dest, u32, size) > I did this because the krsi_ctx is not exposed to the userspace and allows KRSI to modify the context without worrying about breaking userspace. That said, I could mark it as a (void *) here and cast it internally. I guess that would be better/cleaner? > > + * Description > > + * This helper can be used as a part of the > > + * process_execution hook of the KRSI LSM in > > + * programs of type BPF_PROG_TYPE_KRSI. > > + * > > + * The helper returns the value of the environment > > + * variable with the provided "name" for process that's > > + * going to be executed in the passed buffer, "buf". If the var > > + * is set multiple times, the helper returns all > > + * the values as null separated strings. > > + * > > + * If the buffer is too short for these values, the helper > > + * tries to fill it the best it can and guarantees that the value > > + * returned in the buffer is always null terminated. > > + * After the buffer is filled, the helper keeps counting the number > > + * of times the environment variable is set in the envp. > > + * > > + * Return: > > + * > > + * The return value of the helper is an u64 value > > + * which carries two pieces of information: > > + * > > + * The upper 32 bits are a u32 value signifying > > + * the number of times the environment variable > > + * is set in the envp. > > + * The lower 32 bits are an s32 value signifying > > + * the number of bytes written to the buffer or an error code: > > + * > > + * **-ENOMEM** if the kernel is unable to allocate memory > > + * for pinning the argv and envv. > > + * > > + * **-E2BIG** if the value is larger than the size of the > > + * destination buffer. The higher bits will still > > + * the number of times the variable was set in the envp. > > The -E2BIG is returned because buffer sizee is not big enough. > Another possible error code is -ENOSPC, which typically indicates > buffer size not big enough. Sure, I am fine with using either. > > > + * > > + * **-EINVAL** if name is not a NULL terminated string. > > */ > > #define __BPF_FUNC_MAPPER(FN) \ > > FN(unspec), \ > > @@ -2826,7 +2865,8 @@ union bpf_attr { > > FN(strtoul), \ > > FN(sk_storage_get), \ > > FN(sk_storage_delete), \ > > - FN(send_signal), > > + FN(send_signal), \ > > + FN(krsi_get_env_var), > > > > /* integer value in 'imm' field of BPF_CALL instruction selects which helper > > * function eBPF program intends to call > > diff --git a/security/krsi/ops.c b/security/krsi/ops.c > > index 1f4df920139c..1db94dfaac15 100644 > > --- a/security/krsi/ops.c > > +++ b/security/krsi/ops.c > > @@ -6,6 +6,8 @@ > > #include <linux/bpf.h> > > #include <linux/security.h> > > #include <linux/krsi.h> > > +#include <linux/binfmts.h> > > +#include <linux/highmem.h> > > > > #include "krsi_init.h" > > #include "krsi_fs.h" > > @@ -162,6 +164,131 @@ static bool krsi_prog_is_valid_access(int off, int size, > > return false; > > } > > > > +static char *array_next_entry(char *array, unsigned long *offset, > > + unsigned long end) > > +{ > > + char *entry; > > + unsigned long current_offset = *offset; > > + > > + if (current_offset >= end) > > + return NULL; > > + > > + /* > > + * iterate on the array till the null byte is encountered > > + * and check for any overflows. > > + */ > > + entry = array + current_offset; > > + while (array[current_offset]) { > > + if (unlikely(++current_offset >= end)) > > + return NULL; > > + } > > + > > + /* > > + * Point the offset to the next element in the array. > > + */ > > + *offset = current_offset + 1; > > + > > + return entry; > > +} > > + > > +static u64 get_env_var(struct krsi_ctx *ctx, char *name, char *dest, > > + u32 n_size, u32 size) > > +{ > > + s32 ret = 0; > > + u32 num_vars = 0; > > + int i, name_len; > > + struct linux_binprm *bprm = ctx->bprm_ctx.bprm; > > + int argc = bprm->argc; > > + int envc = bprm->envc; > > + unsigned long end = ctx->bprm_ctx.max_arg_offset; > > + unsigned long offset = bprm->p % PAGE_SIZE; > > why we need bprm->p % PAGE_SIZE instead of bprm->p? bprm->p points to the top of the memory and it's not an offset. The pinned buffer contains the pages for the (argv+env) and the brpm->p % PAGE_SIZE is the offset into the first page where the (argv+envv) starts. > > > + char *buf = ctx->bprm_ctx.arg_pages; > > + char *curr_dest = dest; > > + char *entry; > > + > > + if (unlikely(!buf)) > > + return -ENOMEM; > > + > > + for (i = 0; i < argc; i++) { > > + entry = array_next_entry(buf, &offset, end); > > + if (!entry) > > + return 0; > > + } > > + > > + name_len = strlen(name); > > + for (i = 0; i < envc; i++) { > > + entry = array_next_entry(buf, &offset, end); > > + if (!entry) > > + return 0; > > If the buf is "LD_PRELOAD=a.so\0LD_PRELOAD=b.so" and argc=0, > we may skip the first entry? I think I need to rename the "array_next_entry" function / document it better. The function updates the offset to the next location and the returns the entry at the current offset. So, in the first instance: // offset is the offset into the first page. entry = buf + offset; offset = <updated value to the next entry>. > > > > + > > + if (!strncmp(entry, name, name_len)) { > > + num_vars++; > > There helper permits n_size = 0 (ARG_CONST_SIZE_OR_ZERO). > in this case, name_len = 0, strncmp(entry, name, name_len) will be always 0. Thanks, you are right, It does not make sense to have name_len = 0. I will change it to ARG_CONST_SIZE. > > > + > > + /* > > + * There is no need to do further copying > > + * if the buffer is already full. Just count how many > > + * times the environment variable is set. > > + */ > > + if (ret == -E2BIG) > > + continue; > > + > > + if (entry[name_len] != '=') > > + continue; > > + > > + /* > > + * Move the buf pointer by name_len + 1 > > + * (for the "=" sign) > > + */ > > + entry += name_len + 1; > > + ret = strlcpy(curr_dest, entry, size); > > + > > + if (ret >= size) { > > + ret = -E2BIG; > > Here, we have a partial copy. Should you instead nullify (memset) it as > it is not really invalid one? The function does specify that the it will return a null terminated value even if an -E2BIG is returned so that user does get a truncated value. It's better to give the user some data. (I mentioned this in the documentation for the helper). > > > + continue; > > + } > > + > > + /* > > + * strlcpy just returns the length of the string copied. > > + * The remaining space needs to account for the added > > + * null character. > > + */ > > + curr_dest += ret + 1; > > + size -= ret + 1; > > + /* > > + * Update ret to be the current number of bytes written > > + * to the destination > > + */ > > + ret = curr_dest - dest; > > + } > > + } > > + > > + return (u64) num_vars << 32 | (u32) ret; > > +} > > + > > +BPF_CALL_5(krsi_get_env_var, struct krsi_ctx *, ctx, char *, name, u32, n_size, > > + char *, dest, u32, size) > > +{ > > + char *name_end; > > + > > + name_end = memchr(name, '\0', n_size); > > + if (!name_end) > > + return -EINVAL; > > + > > + memset(dest, 0, size); This memset ensures the buffer is zeroed out (incase the buffer is fully / partially empty). > > + return get_env_var(ctx, name, dest, n_size, size); > > +} > > + > > +static const struct bpf_func_proto krsi_get_env_var_proto = { > > + .func = krsi_get_env_var, > > + .gpl_only = true, > > + .ret_type = RET_INTEGER, > > + .arg1_type = ARG_PTR_TO_CTX, > > + .arg2_type = ARG_PTR_TO_MEM, > > + .arg3_type = ARG_CONST_SIZE_OR_ZERO, > > + .arg4_type = ARG_PTR_TO_UNINIT_MEM, > > + .arg5_type = ARG_CONST_SIZE_OR_ZERO, > > +}; > > + > > BPF_CALL_5(krsi_event_output, void *, log, > > struct bpf_map *, map, u64, flags, void *, data, u64, size) > > { > > @@ -192,6 +319,8 @@ static const struct bpf_func_proto *krsi_prog_func_proto(enum bpf_func_id > > return &bpf_map_lookup_elem_proto; > > case BPF_FUNC_get_current_pid_tgid: > > return &bpf_get_current_pid_tgid_proto; > > + case BPF_FUNC_krsi_get_env_var: > > + return &krsi_get_env_var_proto; > > case BPF_FUNC_perf_event_output: > > return &krsi_event_output_proto; > > default: > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h > > index 32ab38f1a2fe..a4ef07956e07 100644 > > --- a/tools/include/uapi/linux/bpf.h > > +++ b/tools/include/uapi/linux/bpf.h > > @@ -2715,6 +2715,45 @@ union bpf_attr { > > * **-EPERM** if no permission to send the *sig*. > > * > > * **-EAGAIN** if bpf program can try again. > > + * > > + * u64 krsi_get_env_var(void *ctx, char *name, char *buf, > > + * size_t name_len, size_t buf_len) > > Inconsistent helper definitions. As discussed above, I can change the BPF_CALL_5 declaration to have a (void *) and cast to the krsi_ctx in the helper itself. - KP > > > + * Description > > + * This helper can be used as a part of the > > + * process_execution hook of the KRSI LSM in > > + * programs of type BPF_PROG_TYPE_KRSI. > > + * > > + * The helper returns the value of the environment > > + * variable with the provided "name" for process that's > > + * going to be executed in the passed buffer, "buf". If the var > > + * is set multiple times, the helper returns all > > + * the values as null separated strings. > > + * > > + * If the buffer is too short for these values, the helper > > + * tries to fill it the best it can and guarantees that the value > > + * returned in the buffer is always null terminated. > > + * After the buffer is filled, the helper keeps counting the number > > + * of times the environment variable is set in the envp. > > + * > > + * Return: > > + * > > + * The return value of the helper is an u64 value > > + * which carries two pieces of information: > > + * > > + * The upper 32 bits are a u32 value signifying > > + * the number of times the environment variable > > + * is set in the envp. > > + * The lower 32 bits are an s32 value signifying > > + * the number of bytes written to the buffer or an error code: > > + * > > + * **-ENOMEM** if the kernel is unable to allocate memory > > + * for pinning the argv and envv. > > + * > > + * **-E2BIG** if the value is larger than the size of the > > + * destination buffer. The higher bits will still > > + * the number of times the variable was set in the envp. > > + * > > + * **-EINVAL** if name is not a NULL terminated string. > > */ > > #define __BPF_FUNC_MAPPER(FN) \ > > FN(unspec), \ > > @@ -2826,7 +2865,8 @@ union bpf_attr { > > FN(strtoul), \ > > FN(sk_storage_get), \ > > FN(sk_storage_delete), \ > > - FN(send_signal), > > + FN(send_signal), \ > > + FN(krsi_get_env_var), > > > > /* integer value in 'imm' field of BPF_CALL instruction selects which helper > > * function eBPF program intends to call > > diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h > > index f804f210244e..ecebdb772a9d 100644 > > --- a/tools/testing/selftests/bpf/bpf_helpers.h > > +++ b/tools/testing/selftests/bpf/bpf_helpers.h > > @@ -303,6 +303,9 @@ static int (*bpf_get_numa_node_id)(void) = > > static int (*bpf_probe_read_str)(void *ctx, __u32 size, > > const void *unsafe_ptr) = > > (void *) BPF_FUNC_probe_read_str; > > +static unsigned long long (*krsi_get_env_var)(void *ctx, > > + void *name, __u32 n_size, void *buf, __u32 size) = > > + (void *) BPF_FUNC_krsi_get_env_var; > > static unsigned int (*bpf_get_socket_uid)(void *ctx) = > > (void *) BPF_FUNC_get_socket_uid; > > static unsigned int (*bpf_set_hash)(void *ctx, __u32 hash) = > >