On Sat, Mar 20, 2021 at 04:34:06PM +0100, John Wood wrote:
> On Wed, Mar 17, 2021 at 07:57:10PM -0700, Kees Cook wrote:
> > On Sun, Mar 07, 2021 at 12:30:26PM +0100, John Wood wrote:
> > > @@ -74,7 +84,7 @@ static struct brute_stats *brute_new_stats(void)
> > >  {
> > >  	struct brute_stats *stats;
> > >  
> > > -	stats = kmalloc(sizeof(struct brute_stats), GFP_KERNEL);
> > > +	stats = kmalloc(sizeof(struct brute_stats), GFP_ATOMIC);
> > 
> > Why change this here? I'd just start with this in the patch that
> > introduces it.
> 
> To be coherent with the previous patch. There, the kmalloc could use
> GFP_KERNEL because the call was made outside of an atomic context. Now,
> with the new lock, it needs GFP_ATOMIC. So the question: if it
> ultimately needs GFP_ATOMIC, should the first patch use it even though
> it is not yet necessary?

It's probably not a big deal, but for me, I'd just do GFP_ATOMIC from
the start, maybe add a comment that says "some LSM hooks are called from
atomic context" or something.

> > >  	if (!stats)
> > >  		return NULL;
> > >  
> > > @@ -99,16 +109,17 @@ static struct brute_stats *brute_new_stats(void)
> > >   * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > >   * since the task_free hook can be called from an IRQ context during the
> > >   * execution of the task_alloc hook.
> > > + *
> > > + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> > > + *          held.
> > >   */
> > >  static void brute_share_stats(struct brute_stats *src,
> > >  			      struct brute_stats **dst)
> > >  {
> > > -	unsigned long flags;
> > > -
> > > -	spin_lock_irqsave(&src->lock, flags);
> > > +	spin_lock(&src->lock);
> > >  	refcount_inc(&src->refc);
> > >  	*dst = src;
> > > -	spin_unlock_irqrestore(&src->lock, flags);
> > > +	spin_unlock(&src->lock);
> > >  }
> > 
> > I still don't think any locking is needed here; the whole function can
> > go away, IMO.
> 
> In this case I think this is possible:
> 
> Scenario 1: cpu 1 writes the stats pointer and cpu 2 is navigating the
> processes tree reading the same stats pointer.
> 
> Scenario 2: cpu 1 is navigating the processes tree reading the stats
> pointer and the same stats pointer is written from an IRQ.
> 
> So, we need locking. Am I wrong?

But only the refcount is being incremented, yes? That doesn't need a
lock because it's already an atomic.
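In other words, with the spinlock gone, the helper reduces to nothing
but the increment; a minimal sketch (untested) of what would be left:

        static void brute_share_stats(struct brute_stats *src,
                                      struct brute_stats **dst)
        {
                /* refcount_t ops are atomic; no spinlock needed for this. */
                refcount_inc(&src->refc);
                *dst = src;
        }

At which point it could probably just be inlined at the call sites.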
> > >  /**
> > > @@ -126,26 +137,36 @@ static void brute_share_stats(struct brute_stats *src,
> > >   * this task and the new one being allocated. Otherwise, share the statistics
> > >   * that the current task already has.
> > >   *
> > > + * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> > > + * and brute_stats::lock since the task_free hook can be called from an IRQ
> > > + * context during the execution of the task_alloc hook.
> > > + *
> > >   * Return: -ENOMEM if the allocation of the new statistics structure fails. Zero
> > >   * otherwise.
> > >   */
> > >  static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
> > >  {
> > >  	struct brute_stats **stats, **p_stats;
> > > +	unsigned long flags;
> > >  
> > >  	stats = brute_stats_ptr(task);
> > >  	p_stats = brute_stats_ptr(current);
> > > +	write_lock_irqsave(&brute_stats_ptr_lock, flags);
> > >  
> > >  	if (likely(*p_stats)) {
> > >  		brute_share_stats(*p_stats, stats);
> > > +		write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > >  		return 0;
> > >  	}
> > >  
> > >  	*stats = brute_new_stats();
> > > -	if (!*stats)
> > > +	if (!*stats) {
> > > +		write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > >  		return -ENOMEM;
> > > +	}
> > >  
> > >  	brute_share_stats(*stats, p_stats);
> > > +	write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > >  	return 0;
> > >  }
> > 
> > I'd much prefer that whatever locking is needed be introduced in the
> > initial patch: this transformation just doubles the work to review. :)
> 
> So, IIUC, I need to introduce all the locks in the initial patch even
> if they are not yet necessary. Am I right?

I would find it easier to follow. Perhaps other reviewers would have a
different opinion.

> > > @@ -167,9 +188,9 @@ static int brute_task_alloc(struct task_struct *task, unsigned long clone_flags)
> > >   * only one task (the task that calls the execve function) points to the data.
> > >   * In this case, the previous allocation is used but the statistics are reset.
> > >   *
> > > - * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > > - * since the task_free hook can be called from an IRQ context during the
> > > - * execution of the bprm_committing_creds hook.
> > > + * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> > > + * and brute_stats::lock since the task_free hook can be called from an IRQ
> > > + * context during the execution of the bprm_committing_creds hook.
> > >   */
> > >  static void brute_task_execve(struct linux_binprm *bprm)
> > >  {
> > > @@ -177,24 +198,33 @@ static void brute_task_execve(struct linux_binprm *bprm)
> > >  	unsigned long flags;
> > >  
> > >  	stats = brute_stats_ptr(current);
> > > -	if (WARN(!*stats, "No statistical data\n"))
> > > +	read_lock_irqsave(&brute_stats_ptr_lock, flags);
> > > +
> > > +	if (WARN(!*stats, "No statistical data\n")) {
> > > +		read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > >  		return;
> > > +	}
> > >  
> > > -	spin_lock_irqsave(&(*stats)->lock, flags);
> > > +	spin_lock(&(*stats)->lock);
> > >  
> > >  	if (!refcount_dec_not_one(&(*stats)->refc)) {
> > >  		/* execve call after an execve call */
> > >  		(*stats)->faults = 0;
> > >  		(*stats)->jiffies = get_jiffies_64();
> > >  		(*stats)->period = 0;
> > > -		spin_unlock_irqrestore(&(*stats)->lock, flags);
> > > +		spin_unlock(&(*stats)->lock);
> > > +		read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > >  		return;
> > >  	}
> > >  
> > >  	/* execve call after a fork call */
> > > -	spin_unlock_irqrestore(&(*stats)->lock, flags);
> > > +	spin_unlock(&(*stats)->lock);
> > > +	read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > +
> > > +	write_lock_irqsave(&brute_stats_ptr_lock, flags);
> > >  	*stats = brute_new_stats();
> > >  	WARN(!*stats, "Cannot allocate statistical data\n");
> > > +	write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > >  }
> > 
> > Again, I don't see a need for locking -- this is just managing the
> > lifetime, which is entirely handled by the implicit locking of
> > "current" and the refcount_t.
> 
> Here I can see the same two scenarios noted before. So I think the
> locking is needed. Am I right?

> > >  /**
> > > @@ -204,9 +234,9 @@ static void brute_task_execve(struct linux_binprm *bprm)
> > >   * The statistical data that is shared between all the fork hierarchy processes
> > >   * needs to be freed when this hierarchy disappears.
> > >   *
> > > - * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > > - * since the task_free hook can be called from an IRQ context during the
> > > - * execution of the task_free hook.
> > > + * It's mandatory to disable interrupts before acquiring brute_stats_ptr_lock
> > > + * and brute_stats::lock since the task_free hook can be called from an IRQ
> > > + * context during the execution of the task_free hook.
> > >   */
> > >  static void brute_task_free(struct task_struct *task)
> > >  {
> > > @@ -215,17 +245,446 @@ static void brute_task_free(struct task_struct *task)
> > >  	bool refc_is_zero;
> > >  
> > >  	stats = brute_stats_ptr(task);
> > > -	if (WARN(!*stats, "No statistical data\n"))
> > > +	read_lock_irqsave(&brute_stats_ptr_lock, flags);
> > > +
> > > +	if (WARN(!*stats, "No statistical data\n")) {
> > > +		read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > >  		return;
> > > +	}
> > >  
> > > -	spin_lock_irqsave(&(*stats)->lock, flags);
> > > +	spin_lock(&(*stats)->lock);
> > >  	refc_is_zero = refcount_dec_and_test(&(*stats)->refc);
> > > -	spin_unlock_irqrestore(&(*stats)->lock, flags);
> > > +	spin_unlock(&(*stats)->lock);
> > > +	read_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > >  
> > >  	if (refc_is_zero) {
> > > +		write_lock_irqsave(&brute_stats_ptr_lock, flags);
> > >  		kfree(*stats);
> > >  		*stats = NULL;
> > > +		write_unlock_irqrestore(&brute_stats_ptr_lock, flags);
> > > +	}
> > > +}
> > 
> > Same; I would expect this to be simply:
> 
> No comment. I think I am missing something. I need to clarify the
> previous cases before working on the next ones. Sorry and thanks for
> the guidance.

Right -- so, there are a few concurrency cases you need to worry about,
AIUI:

1- stats lifetime (based on creation/death of tasks)
2- stats value being written vs read
3- stats values being written/read vs stats lifetime

Using refcount_t in the standard pattern (as you're doing) should
entirely cover "1".

Since the values read from stats are mostly independent, it should be
possible to use READ_ONCE() in the readers and WRITE_ONCE() under a
lock in the writers (this is case "2").

For "3", I think the implicit locking of "current" keeps you safe (as
in, the stats can't go away because "current" will always have a
reference on it).

I see two places where stats are written. One appears to be the
brute_task_execve() case where only 1 thread exists, so there's no lock
needed, and the other case is brute_update_crash_period(), which makes
sense to me to lock: two tasks might be sharing a stats as they crash.

Of course, I could easily be missing something here, but it looks like
much less locking is needed.

> > 	stats = brute_stats_ptr(task);
> > 	if (WARN_ON_ONCE(!*stats))
> > 		return;
> > 	if (refcount_dec_and_test(&(*stats)->refc)) {
> > 		kfree(*stats);
> > 		*stats = NULL;
> > 	}
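And to be explicit about case "2": once the writers use WRITE_ONCE()
under the lock, a reader doesn't need to take brute_stats::lock at all.
For example, brute_attack_running() (not quoted in this hunk -- the
body and threshold names below are made up, just to show the pattern)
could end up as simple as:

        /* Sketch: lockless reader; the threshold names are placeholders. */
        static bool brute_attack_running(struct brute_stats *stats)
        {
                u64 faults = READ_ONCE(stats->faults);
                u64 period = READ_ONCE(stats->period);

                return faults >= BRUTE_MIN_FAULTS &&
                       period < BRUTE_CRASH_PERIOD_THRESHOLD;
        }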
> > > +
> > > +/*
> > > + * BRUTE_EMA_WEIGHT_NUMERATOR - Weight's numerator of EMA.
> > > + */
> > > +static const u64 BRUTE_EMA_WEIGHT_NUMERATOR = 7;
> > > +
> > > +/*
> > > + * BRUTE_EMA_WEIGHT_DENOMINATOR - Weight's denominator of EMA.
> > > + */
> > > +static const u64 BRUTE_EMA_WEIGHT_DENOMINATOR = 10;
> > 
> > Should these be externally configurable (via sysfs)?
> 
> No problem. I think this is easier than locking :)

Heh, for the most part, yes. ;) Though I have my own nightmares[1]
about sysfs.
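FWIW, if you want to avoid most of that, plain module parameters would
be the cheapest way to expose these; a minimal sketch (untested, and
the variable names/permissions are just an example):

        /* Sketch: writable knobs for the EMA weight (weight = num / den). */
        static unsigned long long brute_ema_weight_numerator = 7;
        static unsigned long long brute_ema_weight_denominator = 10;
        module_param(brute_ema_weight_numerator, ullong, 0600);
        module_param(brute_ema_weight_denominator, ullong, 0600);

Even for a built-in LSM these show up under /sys/module/ and can be set
on the kernel command line.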
> > > +/**
> > > + * brute_update_crash_period() - Update the application crash period.
> > > + * @stats: Statistics that hold the application crash period to update.
> > > + * @now: The current timestamp in jiffies.
> > > + *
> > > + * The application crash period must be a value that is not prone to change due
> > > + * to spurious data and follows the real crash period. So, to compute it, the
> > > + * exponential moving average (EMA) is used.
> > > + *
> > > + * This kind of average defines a weight (between 0 and 1) for the new value to
> > > + * add and applies the remainder of the weight to the current average value.
> > > + * This way, some spurious data will not excessively modify the average and only
> > > + * if the new values are persistent, the moving average will tend towards them.
> > > + *
> > > + * Mathematically the application crash period's EMA can be expressed as
> > > + * follows:
> > > + *
> > > + *	period_ema = period * weight + period_ema * (1 - weight)
> > > + *
> > > + * If the operations are applied:
> > > + *
> > > + *	period_ema = period * weight + period_ema - period_ema * weight
> > > + *
> > > + * If the operands are ordered:
> > > + *
> > > + *	period_ema = period_ema - period_ema * weight + period * weight
> > > + *
> > > + * Finally, this formula can be written as follows:
> > > + *
> > > + *	period_ema -= period_ema * weight;
> > > + *	period_ema += period * weight;
> > > + *
> > > + * The statistics that hold the application crash period to update cannot be
> > > + * NULL.
> > > + *
> > > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > > + * since the task_free hook can be called from an IRQ context during the
> > > + * execution of the task_fatal_signal hook.
> > > + *
> > > + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> > > + *          held.
> > > + * Return: The last crash timestamp before updating it.
> > > + */
> > > +static u64 brute_update_crash_period(struct brute_stats *stats, u64 now)
> > > +{
> > > +	u64 current_period;
> > > +	u64 last_crash_timestamp;
> > > +
> > > +	spin_lock(&stats->lock);
> > > +	current_period = now - stats->jiffies;
> > > +	last_crash_timestamp = stats->jiffies;
> > > +	stats->jiffies = now;
> > > +
> > > +	stats->period -= brute_mul_by_ema_weight(stats->period);
> > > +	stats->period += brute_mul_by_ema_weight(current_period);
> > > +
> > > +	if (stats->faults < BRUTE_MAX_FAULTS)
> > > +		stats->faults += 1;
> > > +
> > > +	spin_unlock(&stats->lock);
> > > +	return last_crash_timestamp;
> > > +}
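(As an aside: brute_mul_by_ema_weight() isn't visible in this hunk.
Presumably, given the 7/10 weight above, it's something along these
lines -- a sketch, not the actual patch code:

        /* Sketch: value * 7 / 10 without intermediate overflow. */
        static u64 brute_mul_by_ema_weight(u64 value)
        {
                return mul_u64_u64_div_u64(value, BRUTE_EMA_WEIGHT_NUMERATOR,
                                           BRUTE_EMA_WEIGHT_DENOMINATOR);
        }

So with an EMA of 100 jiffies and a new sample of 1000, the update
yields 100 - 70 + 700 = 730.)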
> > Now *here* locking makes sense, and it only needs to be per-stat, not
> > global, since multiple processes may be operating on the same stat
> > struct. To make this more no-reader-locking-friendly, I'd also update
> > everything at the end, and use WRITE_ONCE():
> > 
> > 	u64 current_period, period;
> > 	u64 last_crash_timestamp;
> > 	u64 faults;
> > 
> > 	spin_lock(&stats->lock);
> > 	current_period = now - stats->jiffies;
> > 	last_crash_timestamp = stats->jiffies;
> > 
> > 	WRITE_ONCE(stats->period,
> > 		   stats->period - brute_mul_by_ema_weight(stats->period) +
> > 		   brute_mul_by_ema_weight(current_period));
> > 
> > 	if (stats->faults < BRUTE_MAX_FAULTS)
> > 		WRITE_ONCE(stats->faults, stats->faults + 1);
> > 
> > 	WRITE_ONCE(stats->jiffies, now);
> > 
> > 	spin_unlock(&stats->lock);
> > 	return last_crash_timestamp;
> > 
> > That way readers can (IIUC) safely use READ_ONCE() on jiffies and
> > faults without needing to hold the &stats->lock (unless they need
> > perfectly matching jiffies, period, and faults).
> 
> Thanks for the refactoring. I will work on it (if I can understand
> locking). :(

It may be worth reading Documentation/memory-barriers.txt which has
some more details.

> > > +/**
> > > + * brute_manage_fork_attack() - Manage a fork brute force attack.
> > > + * @stats: Statistical data shared by all the fork hierarchy processes.
> > > + * @now: The current timestamp in jiffies.
> > > + *
> > > + * For a correct management of a fork brute force attack it is only necessary to
> > > + * update the statistics and test if an attack is happening based on these data.
> > > + *
> > > + * The statistical data shared by all the fork hierarchy processes cannot be
> > > + * NULL.
> > > + *
> > > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > > + * since the task_free hook can be called from an IRQ context during the
> > > + * execution of the task_fatal_signal hook.
> > > + *
> > > + * Context: Must be called with interrupts disabled and brute_stats_ptr_lock
> > > + *          held.
> > > + * Return: The last crash timestamp before updating it.
> > > + */
> > > +static u64 brute_manage_fork_attack(struct brute_stats *stats, u64 now)
> > > +{
> > > +	u64 last_fork_crash;
> > > +
> > > +	last_fork_crash = brute_update_crash_period(stats, now);
> > > +	if (brute_attack_running(stats))
> > > +		print_fork_attack_running();
> > > +
> > > +	return last_fork_crash;
> > > +}
> > > +
> > > +/**
> > > + * brute_get_exec_stats() - Get the exec statistics.
> > > + * @stats: When this function is called, this parameter must point to the
> > > + *	   current process' statistical data. When this function returns, this
> > > + *	   parameter points to the parent process' statistics of the fork
> > > + *	   hierarchy that hold the current process' statistics.
> > > + *
> > > + * To manage a brute force attack that happens through the execve system call it
> > > + * is not possible to use the statistical data hold by this process due to these
> > > + * statistics disappear when this task is finished. In this scenario this data
> > > + * should be tracked by the statistics of a higher fork hierarchy (the hierarchy
> > > + * that contains the process that forks before the execve system call).
> > > + *
> > > + * To find these statistics the current fork hierarchy must be traversed up
> > > + * until new statistics are found.
> > > + *
> > > + * Context: Must be called with tasklist_lock and brute_stats_ptr_lock held.
> > > + */
> > > +static void brute_get_exec_stats(struct brute_stats **stats)
> > > +{
> > > +	const struct task_struct *task = current;
> > > +	struct brute_stats **p_stats;
> > > +
> > > +	do {
> > > +		if (!task->real_parent) {
> > > +			*stats = NULL;
> > > +			return;
> > > +		}
> > > +
> > > +		p_stats = brute_stats_ptr(task->real_parent);
> > > +		task = task->real_parent;
> > > +	} while (*stats == *p_stats);
> > > +
> > > +	*stats = *p_stats;
> > > +}
> > 
> > See Yama's task_is_descendant() for how to walk up the process tree
> > (and I think the process group stuff will save some steps too); you
> > don't need tasklist_lock held, just rcu_read_lock held, AIUI:
> > Documentation/RCU/listRCU.rst
> > 
> > And since you're passing this stats struct back up, and it would be
> > outside of the rcu read lock, you'd want to do a "get" on it first:
> > 
> > 	rcu_read_lock();
> > 	loop {
> > 		...
> > 	}
> > 	refcount_inc_not_zero(&(*p_stats)->refc);
> > 	rcu_read_unlock();
> > 
> > 	*stats = *p_stats;
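(Wrapped up as helpers, that pattern could look like this -- an
untested sketch, and the names are just an example:

        static struct brute_stats *get_brute_stats(struct brute_stats *stats)
        {
                /* Take the ref while the pointer is still guaranteed live. */
                if (!stats || !refcount_inc_not_zero(&stats->refc))
                        return NULL;
                return stats;
        }

        static void put_brute_stats(struct brute_stats *stats)
        {
                if (stats && refcount_dec_and_test(&stats->refc))
                        kfree(stats);
        }

Then every "get" has an obvious matching "put".)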
> Thanks for the suggestions. I will work on it for the next version.
> Anyway, in the first version Kees Cook and Jann Horn noted that some
> tasks could escape the rcu read lock and that alternate locking was
> needed.
> 
> Extract from the RFC:
> 
> [Kees Cook]
> Can't newly created processes escape this RCU read lock? I think this
> needs alternate locking, or something in the task_alloc hook that will
> block any new process from being created within the stats group.
> 
> [Jann Horn]
> Good point; the proper way to deal with this would probably be to take
> the tasklist_lock in read mode around this loop (with
> read_lock(&tasklist_lock) / read_unlock(&tasklist_lock)), which pairs
> with the write_lock_irq(&tasklist_lock) in copy_process(). Thanks to
> the fatal_signal_pending() check while holding the lock in
> copy_process(), that would be race-free - any fork() that has not yet
> inserted the new task into the global task list would wait for us to
> drop the tasklist_lock, then bail out at the fatal_signal_pending()
> check.
> 
> I think that this scenario is still possible. So the tasklist_lock is
> necessary. Am I right?

Oops, yeah, best to listen to Jann and past-me. :) Were these comments
about finding the parent or killing offenders?
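Either way, per Jann's note the locked walk would end up looking
something like this (an untested sketch of the idea, not the actual
patch code):

        /* Sketch: walk up under tasklist_lock; take a ref before unlocking. */
        static struct brute_stats *brute_get_exec_stats(void)
        {
                const struct task_struct *task = current;
                struct brute_stats **stats = brute_stats_ptr(task);
                struct brute_stats *found = NULL;

                read_lock(&tasklist_lock);
                while (task->real_parent) {
                        struct brute_stats **p_stats =
                                brute_stats_ptr(task->real_parent);

                        task = task->real_parent;
                        if (*p_stats != *stats) {
                                if (*p_stats &&
                                    refcount_inc_not_zero(&(*p_stats)->refc))
                                        found = *p_stats;
                                break;
                        }
                }
                read_unlock(&tasklist_lock);
                return found;
        }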
> > > +
> > > +	spin_lock(&(*stats)->lock);
> > > +	if (!(*stats)->faults)
> > > +		(*stats)->jiffies = last_fork_crash;
> > > +	spin_unlock(&(*stats)->lock);
> > > +
> > > +	brute_update_crash_period(*stats, now);
> > 
> > and then you can add:
> > 
> > 	if (refcount_dec_and_test(&(*stats)->refc))
> > 		kfree(*stats);
> > 
> > (or better yet, make that a helper) named something like
> > "put_brute_stats".
> 
> Sorry, but I don't understand why we need to free the stats here.
> What is the rationale behind this change?

Err, I think I may have quoted the wrong chunk of your patch! Sorry; I
was talking about the place where you did a free, I think? Disregard
this for now. :)

> > > +/**
> > > + * brute_manage_exec_attack() - Manage an exec brute force attack.
> > > + * @stats: Statistical data shared by all the fork hierarchy processes.
> > > + * @now: The current timestamp in jiffies.
> > > + * @last_fork_crash: The last fork crash timestamp before updating it.
> > > + *
> > > + * For a correct management of an exec brute force attack it is only necessary
> > > + * to update the exec statistics and test if an attack is happening based on
> > > + * these data.
> > > + *
> > > + * It is important to note that if the fork and exec crash periods are the same,
> > > + * the attack test is avoided. This allows that in a brute force attack that
> > > + * happens through the fork system call, the mitigation method does not act on
> > > + * the parent process of the fork hierarchy.
> > > + *
> > > + * The statistical data shared by all the fork hierarchy processes cannot be
> > > + * NULL.
> > > + *
> > > + * It's mandatory to disable interrupts before acquiring the brute_stats::lock
> > > + * since the task_free hook can be called from an IRQ context during the
> > > + * execution of the task_fatal_signal hook.
> > > + *
> > > + * Context: Must be called with interrupts disabled and tasklist_lock and
> > > + *          brute_stats_ptr_lock held.
> > > + */
> > > +static void brute_manage_exec_attack(struct brute_stats *stats, u64 now,
> > > +				     u64 last_fork_crash)
> > > +{
> > > +	int ret;
> > > +	struct brute_stats *exec_stats = stats;
> > > +	u64 fork_period;
> > > +	u64 exec_period;
> > > +
> > > +	ret = brute_update_exec_crash_period(&exec_stats, now, last_fork_crash);
> > > +	if (WARN(ret, "No exec statistical data\n"))
> > > +		return;
> > 
> > I think this should fail closed: if there's a stats processing error,
> > treat it as an attack.
> 
> Do you mean to trigger the mitigation of a brute force attack on this
> task? So, IIUC, you suggest that instead of generating warnings when
> there is no statistical data, we need to trigger the mitigation?
> Should this also be applied to the case where the allocation of a
> brute_stats structure fails?

Right -- it should be an impossible scenario that the stats are
_missing_. There is not an expected execution path in the kernel where
that could happen, so if you're testing for it (and correctly
generating a WARN), it should _also_ fail closed: an impossible case
has been found, so assume userspace is under attack. (Otherwise it
could serve as a bypass for an attacker who has found a way to navigate
a process into this state.)

-Kees

-- 
Kees Cook