Hello, The changes (described below) are available in a series of commits on my fork of the nfs-utils repository at https://github.com/chadjoan/nfs-utils/commits/bug344-passwd Background: I reported a bug (#344) to the linux-nfs bugzilla earlier this month (https://bugzilla.linux-nfs.org/show_bug.cgi?id=344) where the idmapd service would fail to start on one of my machines and printed a "general protection fault" message in my system log: Aug 3 07:13:21 l-andianov rpc.idmapd[29843]: Setting log level to 7 Aug 3 07:13:21 l-andianov rpc.idmapd[29843]: libnfsidmap: using domain: ozymandias Aug 3 07:13:21 l-andianov rpc.idmapd[29843]: libnfsidmap: Realms list: 'OZYMANDIAS' Aug 3 07:13:21 l-andianov rpc.idmapd[29843]: libnfsidmap: processing 'Method' list Aug 3 07:13:21 l-andianov kernel: traps: rpc.idmapd[29843] general protection fault ip:7fb4bc75c0cd sp:7ffe56527540 error:0 in libc.so[7fb4bc748000+5f000] Aug 3 07:13:21 l-andianov rpc.idmapd[29843]: libnfsidmap: loaded plugin /usr/lib/libnfsidmap/static.so for method static Aug 3 07:13:21 l-andianov /etc/init.d/rpc.idmapd[29819]: ERROR: rpc.idmapd failed to start Aug 3 07:13:21 l-andianov /etc/init.d/nfs[29818]: ERROR: cannot start nfs as rpc.idmapd would not start Long-story-short: I investigated, found out it was because musl libc returns `-1` from `sysconf(_SC_GETPW_R_SIZE_MAX)` calls (which is allowable by POSIX standards), and idmapd's code wasn't written to handle this possibility, so a buffer overrun/overflow happened. Resolution: I have written commits that solve this problem wherever it might occur, so anywhere `sysconf` was used to size memory buffers before calling `get**nam_r` or `get***id_r`. It may also fix other bugs or inconsistencies that were found in the affected code, as well as improving the comprehensiveness of error detection. Highlights: - Memory for `get**nam_r` and `get***id_r` functions is no longer allocated based on the return value of sysconf(...), as it can return misleading values or -1 in this use-case. - Introduces passwd_query.c module that handles buffer allocation and adjustment for those functions, as well as other edge-cases. This does increase complexity, but it allows some memory management and exception handling complexity to be shifted away from feature and retrieval logic. - The passwd_query.c module was (manually) tested in isolation to get more coverage than what I would be able to obtain by running idmapd. There is a program for that (pwgrp_test.c) in this repository: https://github.com/chadjoan/passwd_query - Calls to `get**nam_r` and `get***id_r` are now very likely to use stack allocation instead of `malloc`, though this logic will fall back to `malloc` if the demands on buffer size are large (ex: absurdly long comments in passwd file or very long group member lists). - Furthermore, if `get**nam_r` and `get***id_r` still need more buffer, `passwd_query.c` will use a `realloc`-loop until the buffer is large enough. This results in an interface that does the same thing but never returns ERANGE - the error does not have to be handled because it does not happen. - Any code that called the `get**nam_r` and `get***id_r` will now log error messages for every possible error code that those functions return. In the event of a non-standard error, it will attempt to get error text from `strerror`, and information about what was being looked-up or mapped will still be printed. - In the `gums.c` plugin, `translate_to_uid` and `translate_to_gid` did not have the same exception handling logic. For example, `translate_to_gid` had a `realloc`-loop, while `translate_to_uid` did not. Since this change has them all use `passwd_query.c` functionality, they should be more robust and equally robust. - In some places the code calling `get**nam_r` and `get***id_r` would loop after EINTR error codes (retry on interrupt), while other places would not. These differences were not justified with comments, so I gave the EINTR retry looping behavior to everything that queries `passwd` or `group` information. I did this by convention instead of putting the functionality in `passwd_query.c`, so it should be pretty easy to back it off selectively if there are places where this shouldn't be done. - There was a potential memory leak in `regex.c`: `regex_getpwnam` would not free the `localname` string if it completed successfully (the abortive cases were fine, though). That can no longer happen, and it is written in a way that makes it clear that the `free()` will always happen if-and-only-if `malloc()` happened. It will also attempt to use a small-ish fixed-size stack allocated buffer whenever the string fits. - Since the code related to `get**nam_r` and `get***id_r` in `nss.c` was already being refactored to some extent, I made it trivial to see if `free()` was called if something was allocated. The `malloc()` call was hidden in other functions in this case, so this probably won't look as good as what happened in `regex.c`, but it should still help. As far as I can tell, this code didn't leak memory beforehand; so this particular refactoring just makes the code more self-documenting. I like to make things clear if I have any doubt. - This was done in a series of separate commits, each of which should leave the codebase in a working state. Process, testing, and commit layout: The changes to `static.c` and `nfsidmap.c` were done alongside the addition of `passwd_query.c`, and are effectively the starting point for this changeset. This is the stuff I could test easily on my own machine. I could not run the `regex.c` and `nss.c` plugins, but I did make sure they compiled, and each of those was given their own commits. The `gums.c` plugin was something I didn't have dependencies installed for, so compilation was difficult. I still managed to obtain some confidence with it by commenting out all of the code I didn't change and then compiling what I did change, and that worked. I also gave `gums.c` its own commit. I normally like to write unittests for my code so that the (regression) testing can be performed automatically by others, and give you greater confidence in the maintainability of the code. In this case, I think that this would require some kind of mock libc implementation. I couldn't find that and I don't have one. So for this changeset I fell back on testing my code by hand whenever I made changes. I hope this will suffice. Have a great day! - Chad