On Sat, Mar 10, 2012 at 04:01:09PM -0800, Linus Torvalds wrote:
> I would in general suggest strongly against using exec_id for anything
> that involves files. It isn't designed for that, it's designed for the
> whole "check the parent exec_id" thing for ptrace, where that whole
> "pass things around to another process" approach doesn't work.

Actually, the original/historical purpose of the exec_id stuff was to
protect privileged parent processes (those having done a SUID/SGID exec)
from non-standard child exit signals, which could be set with clone().
I think we may want to audit the current implementation and see if it
still fully achieves the goal or maybe not (and fix it if not).

IIRC, 32 bits was considered enough because it was only the trusted
privileged parent process itself that could potentially cause a
wraparound. (I did not verify this conclusion now. It might be wrong.)

I include below pieces of the prototype implementation from
linux-2.2.12-ow6.tar.gz released in 1999. One notable difference from
the code that went into mainline kernels was that I only incremented the
counter on privileged execve(), and I additionally handled counter
wraparound. I am a bit concerned that a wraparound attack might be
possible on the code currently in mainline kernels, thereby allowing for
a bad exit signal to be sent to a privileged new parent program. Does
anything prevent the wraparound attack currently? (I did not check for
this yet, sorry.)

On exec:

+	bprm->priv_change = id_change || cap_raised;
+	if (bprm->priv_change) {
...
+		/*
+		 * Increment the privileged execution counter, so that our
+		 * old children know not to send bad exit_signal's to us.
+		 * Also, wait on the lock if there's an exit_signal being
+		 * sent to us now, to make sure it doesn't get sent to the
+		 * new privileged program.
+		 */
+		spin_lock_irqsave(&current->priv_lock, flags);
+		if (!++current->priv) {
+			struct task_struct *p;
+
+			/*
+			 * The counter can't really overflow with real-world
+			 * programs (and it has to be the privileged program
+			 * itself that causes the overflow), but we handle
+			 * this case anyway, just for correctness.
+			 */
+			read_lock(&tasklist_lock);
+			for_each_task(p) {
+				if (p->p_pptr == current) {
+					p->ppriv = 0;
+					current->priv = 1;
+				}
+			}
+			read_unlock(&tasklist_lock);
+		}
+		spin_unlock_irqrestore(&current->priv_lock, flags);

In task_struct:

+/* Privileged execution counters, for exit_signal permission checking */
+	spinlock_t priv_lock;
+	int priv, ppriv;

On fork() and clone():

+	spin_lock_init(&p->priv_lock);
+	p->priv = 0;
+	p->ppriv = current->priv;

Exit signal:

+	unsigned long flags = 0;
+	int locked = 0;
+
+	if (sig && sig != SIGCHLD) {
+		/*
+		 * Make sure our parent hasn't executed a privileged program
+		 * (such as, SUID) since we were born.
+		 *
+		 * We do some locking here to ensure that there's no race
+		 * between the check and actually sending the signal.
+		 * Currently, this is probably redundant as notify_parent()
+		 * is only used either with the big lock obtained, or with
+		 * the signal set to SIGCHLD.
+		 */
+		locked = 1;
+		spin_lock_irqsave(&tsk->p_pptr->priv_lock, flags);
+		if (tsk->p_pptr->priv != tsk->ppriv) {
+			spin_unlock_irqrestore(&tsk->p_pptr->priv_lock, flags);
+			locked = 0;
+			sig = 0;
+		}
+	}
...
+	if (locked)
+		spin_unlock_irqrestore(&tsk->p_pptr->priv_lock, flags);

IIRC, an equivalent of the above went upstream (with simplifications and
a variables rename by Alan) in 2.2.13, so that may be another "reference
implementation" to check against.

Alexander
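
For illustration only, the counter scheme in the excerpts above can be
modelled in userspace C roughly as follows. This is a sketch under
assumptions, not kernel code: struct task, model_fork(),
model_priv_exec() and model_exit_signal() are hypothetical names, all
locking is left out, and the counters are unsigned so that wraparound
is well defined in standard C.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the relevant task_struct fields. */
struct task {
	struct task *parent;	/* models p_pptr */
	unsigned int priv;	/* this task's privileged exec counter */
	unsigned int ppriv;	/* parent's counter as recorded at fork time */
};

/* Models fork()/clone(): the child records the parent's current counter. */
static void model_fork(struct task *parent, struct task *child)
{
	assert(parent != NULL && child != NULL);
	child->parent = parent;
	child->priv = 0;
	child->ppriv = parent->priv;
}

/*
 * Models a privileged execve(): bump the counter. On wraparound to 0,
 * reset each existing child's recorded value to 0 and set the counter
 * to 1, so that old children still see a mismatch.
 */
static void model_priv_exec(struct task *self, struct task *tasks[],
			    size_t ntasks)
{
	size_t i;

	if (++self->priv == 0) {
		for (i = 0; i < ntasks; i++) {
			if (tasks[i]->parent == self) {
				tasks[i]->ppriv = 0;
				self->priv = 1;
			}
		}
	}
}

/*
 * Models the exit signal check: returns the signal that would be sent,
 * or 0 if a non-SIGCHLD signal is suppressed because the parent has
 * done a privileged exec since the child was created.
 */
static int model_exit_signal(const struct task *child, int sig)
{
	if (sig && sig != SIGCHLD) {
		if (child->parent->priv != child->ppriv)
			return 0;
	}
	return sig;
}
```

In this model, a child forked before model_priv_exec() keeps its exit
signal only if that signal is SIGCHLD; any other value is dropped once
the parent's counter no longer matches the child's recorded copy.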