On 7/12/19 6:55 PM, Eric Blake wrote: > On 7/3/19 2:19 AM, Michal Privoznik wrote: >> When spawning a child process, between fork() and exec() we close >> all file descriptors and keep only those the caller wants us to >> pass onto the child. The problem is how we do that. Currently, we >> get the limit of opened files and then iterate through each one >> of them and either close() it or make it survive exec(). This >> approach is suboptimal (although, not that much in default >> configurations where the limit is pretty low - 1024). We have >> /proc where we can learn what FDs we hold open and thus we can >> selectively close only those. >> >> Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx> >> --- >> src/util/vircommand.c | 78 ++++++++++++++++++++++++++++++++++++++----- >> 1 file changed, 70 insertions(+), 8 deletions(-) > >> +# ifdef __linux__ >> +/* On Linux, we can utilize procfs and read the table of opened >> + * FDs and selectively close only those FDs we don't want to pass >> + * onto child process (well, the one we will exec soon since this >> + * is called from the child). */ >> +static int >> +virCommandMassCloseGetFDsLinux(virCommandPtr cmd ATTRIBUTE_UNUSED, >> + virBitmapPtr fds) >> +{ >> + DIR *dp = NULL; >> + struct dirent *entry; >> + const char *dirName = "/proc/self/fd"; >> + int rc; >> + int ret = -1; >> + >> + if (virDirOpen(&dp, dirName) < 0) >> + return -1; > > Unfortunately, POSIX says that opendir()/readdir() are not async-safe > (the list of async-safe functions is > https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04; > some implementations of opendir() call malloc(), and malloc() is > definitely not async-safe - if you ever fork() in one thread while > another thread is locked inside a malloc, your child process is > deadlocked if it has to malloc because the forked process no longer has > the thread to release the malloc lock). It also says readdir() is not > threadafe > (https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09) > > Therefore, opendir() (hidden in virDirOpen) and readdir() (hidden in > virDirRead) are generally unsafe to be used between fork() of a > multi-threaded app and the subsequent exec(). But maybe you are safe > because this code is only compiled on Linux? Are we absolutely sure this > can't deadlock, in spite of violating POSIX constraints on async-safety? > > http://austingroupbugs.net/view.php?id=696 points out some interesting > problems from the POSIX side - readdir_r() is absolutely broken, and > readdir() being marked as thread-unsafe may be too strict. But then > again, http://austingroupbugs.net/view.php?id=75 states > "The application shall not modify the structure to which the > return value of readdir() points, nor any storage areas pointed to > by pointers within the structure. The returned pointer, and > pointers within the structure, might be invalidated or the > structure or the storage areas might be overwritten by a > subsequent call to readdir() on the same directory stream. > They shall not be affected by a call to readdir() on a different > directory stream." > where it may be the intent that if opendir() doesn't take any locks or > other async-unsafe actions, using opendir() after fork() may be safe > after all (readdir on a DIR* that was opened in the parent is not, but > readdir() on a fresh DIR* opened in the child might be safe, even if not > currently strictly portable). D'oh. POSIX and its limitations. I could allocate the bitmap in the parent, sure. But avoiding opendir() is not possible (that's the whole point of this patch). While testing this - on Linux - I did not run into any deadlock, but maybe I was just lucky. On the other hand, this is going to run only on Linux, on anything else we'll still iterate over all FDs. I find it rather surprising (in a disturbing way) that there is no better solution to this problem than iterating over all FDs and closing them. One by one. Michal -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list