> Hi Elena, > > I'm super happy to help you with anything you want. We could talk on > zoom if that's a useful thing. Hi Dan, Thank you for such a prompt reply! Yes, I think it would be great to talk if possible, I will drop you a separate email on time/arrangements. I can explain in more details what we are trying to do, etc. I can merge this patch but it feels like > maybe it should be turned off by default. There should be some command > line argument for it. Yes, it is definitely should be off by default since it is way too noisy for any generic setting. But I think the pattern might become really simple once the actual taint propagation is implemented outside of it. > > On Tue, Mar 08, 2022 at 10:25:13AM +0200, Elena Reshetova wrote: > > check_host_input pattern helps identifying locations > > where a guest kernel can take a potentially malicious > > input from the untrusted host in a confidential computing > > threat model. The output of the pattern helps to > > facilitate manual code audit and cross verify fuzzing > > coverage. > > > > Signed-off-by: Elena Reshetova <elena.reshetova@xxxxxxxxx> > > --- > > check_host_input.c | 886 +++++++++++++++++++++++++++++++++++++++++++++ > > check_list.h | 2 + > > 2 files changed, 888 insertions(+) > > create mode 100644 check_host_input.c > > > > diff --git a/check_host_input.c b/check_host_input.c > > new file mode 100644 > > index 00000000..dc18ab34 > > --- /dev/null > > +++ b/check_host_input.c > > @@ -0,0 +1,886 @@ > > +/* > > + * Smatch pattern to facilitate the hardening of the Linux guest kernel > > + * for Confidential Cloud Computing threat model. > > + * In this model the Linux guest kernel cannot trust the values > > + * it obtains using low level IO functions because they can be provided > > + * by a potentially malicious host or VMM. Instead it needs to make > > + * sure the code that handles processing of such values is hardened, > > + * free of memory safety issues and other potential security issues. > > + * > > + * This smatch pattern helps to indentify such places. > > + * Currently it covers most of MSR, portIO, MMIO and cpuid reading primitives. > > + * The full list of covered functions is stored in host_input_funcs array. > > + * The output of the pattern can be used to facilitate code audit, as > > + * well as to verify that utilized fuzzing strategy can reach all the > > + * code paths that can take a low-level input from a potentially malicious host. > > + * > > + * When ran, the pattern produces two types of findings: errors and warnings. > > + * This is done to help prioritizing the issues for the manual code audit. > > + * However if time permits all locations reported by the pattern should be > checked. > > + * > > + * Written based on existing smatch patterns. > > + * > > + * Author: Elena Reshetova <elena.reshetova@xxxxxxxxx> > > + * Copyright (c) 2021-22, Intel Corporation > > + * > > + * This program is free software; you can redistribute it and/or modify > > + * it under the terms of the GNU General Public License version 2, as > > + * published by the Free Software Foundation. > > + * > > + */ > > +#include "smatch.h" > > +#include "smatch_slist.h" > > +#include "smatch_extra.h" > > +#include <math.h> > > + > > +static int my_id; > > + > > +STATE(local_variables); > > +STATE(local_from_host); > > +STATE(tainted_from_host); > > +STATE(called_funcs); > > My instinct is that these are too many states. Try to limit it to one > or two states per file and the code becomes a lot simpler. > > The "local_variables" state is not required. Yes, that's true probably, thank you for the pointer! > > static bool is_local(struct expression *expr) > { > expr = strip_expr(expr); > if (!expr) > return false; > if (expr->type != EXPR_SYMBOL) > return false; > /* FIXME: why not just check if MOD_AUTO is set instead? */ > if (expr->symbol->ctype.modifiers & (MOD_NONLOCAL | MOD_STATIC | > MOD_ADDRESSABLE)) > return false; > return true; > } > > called_funcs is not required either. You can use the "cur_func_sym" > global variable for that. Thank you! > > static int get_func_start_lineno(void) > { > if (!cur_func_sym) > return -1; > return cur_func_sym->pos.line; > } > > I've looked through this file and there are ways that it could probably > be updated to newer better APIs but it sounds like the other in progress > patch is the more interesting problem. Yes, this is my thinking also. I wanted to show this one because it has been our starting point and smth that we still use to verify and cross check both audit and fuzzing. But I would like to implement the base tainting externally and then add a simpler patterns for finding real things on top of it. Best Regards, Elena.