Jeff Layton wrote: > On Tue, 16 Feb 2010 14:24:09 -0500 > Peter Staubach <staubach@xxxxxxxxxx> wrote: > >> Jeff Layton wrote: >>> This is the second attempt at this patch. The main changes are that this >>> one doesn't set a floor value for the size of the group list. There are >>> also a few minor cleanups and comments added. >>> >>> If authunix_create_default() is called by a user with more than 16 >>> supplimental groups, it'll abort(), which causes the program to crash >>> and coredump. >>> >>> Fix it to handle this situation gracefully. Get the number of groups >>> that the user has first, and then allocate a big enough buffer to hold >>> them. Then, just don't let the lower function use more than the NGRPS >>> groups. >>> >>> Also fix up the error handling in this function so that it just returns >>> a NULL pointer on error and logs a message via warnx() instead of >>> calling abort(). >>> >>> Reported-by: Peter Engel <peter.engel@xxxxxxx> >>> Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> >>> --- >>> src/auth_unix.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++----- >>> 1 files changed, 56 insertions(+), 6 deletions(-) >>> >>> diff --git a/src/auth_unix.c b/src/auth_unix.c >>> index 71ca15d..a295e71 100644 >>> --- a/src/auth_unix.c >>> +++ b/src/auth_unix.c >>> @@ -49,6 +49,7 @@ >>> #include <stdlib.h> >>> #include <unistd.h> >>> #include <string.h> >>> +#include <errno.h> >>> >>> #include <rpc/types.h> >>> #include <rpc/xdr.h> >>> @@ -175,20 +176,69 @@ AUTH * >>> authunix_create_default() >>> { >>> int len; >>> + size_t bufsize = 0; >>> char machname[MAXHOSTNAMELEN + 1]; >>> uid_t uid; >>> gid_t gid; >>> - gid_t gids[NGRPS]; >>> + gid_t *gids = NULL; >>> + AUTH *auth; >>> + >>> + if (gethostname(machname, sizeof machname) == -1) { >>> + warnx("%s: gethostname() failed: %s", __func__, >>> + strerror(errno)); >>> + return NULL; >>> + } >>> >>> - if (gethostname(machname, sizeof machname) == -1) >>> - abort(); >>> machname[sizeof(machname) - 1] = 0; >>> uid = geteuid(); >>> gid = getegid(); >>> - if ((len = getgroups(NGRPS, gids)) < 0) >>> - abort(); >>> + >>> +retry: >>> + len = getgroups(0, NULL); >>> + if (len < 0) { >>> + warnx("%s: failed to get number of groups: %s", __func__, >>> + strerror(errno)); >>> + return NULL; >>> + } >>> + >>> + if (len == 0) >>> + goto no_groups; >>> + >>> + bufsize = len * sizeof(gid_t); >>> + gids = mem_alloc(bufsize); >>> + if (gids == NULL) { >>> + warnx("%s: memory allocation failed: %s", __func__, >>> + strerror(ENOMEM)); >>> + return NULL; >>> + } >>> + >>> + len = getgroups(len, gids); >>> + if (len < 0) { >>> + mem_free(gids, bufsize); >>> + /* >>> + * glibc equivalent routines mention that it's possible for >>> + * the number of groups to change between two getgroups calls. >>> + * If that happens, retry the whole thing again. >>> + */ >>> + if (len == -EINVAL) { >>> + gids = NULL; >>> + bufsize = 0; >>> + goto retry; >>> + } >>> + warnx("%s: failed to get group list: %s", __func__, >>> + strerror(errno)); >>> + return NULL; >>> + } >>> + >>> + /* AUTH_UNIX has a hard limit of NGRPS supplemental groups */ >>> + if (len > NGRPS) >>> + len = NGRPS; >>> + >>> +no_groups: >>> /* XXX: interface problem; those should all have been unsigned */ >>> - return (authunix_create(machname, uid, gid, len, gids)); >>> + auth = authunix_create(machname, uid, gid, len, gids); >>> + mem_free(gids, bufsize); >>> + return auth; >>> } >>> >>> /* >> This change to restrict the groups used to the first NGRPS >> groups is one that we have always avoided. It can be quite >> confusing to the user, to have an operation fail, but then >> to look and notice that the correct group is listed. >> >> Having the library abort seems odd and wrong, but this will >> also change semantics. Is there really a problem here, >> after all of these years, that must be addressed? >> > > Yes, we had a bug report against nfs-utils where someone was seeing > umount.nfs core dump because root was in >NGRPS groups. > > https://bugzilla.redhat.com/show_bug.cgi?id=565507 > > ...as Chuck points out, this is basically what glibc has done for > years. While it may be confusing for users, I think it's more > preferable than having the programs fail outright. > > What sort of behavior would you propose in its place? > Well, I do agree that a library routine should not call abort(). That's just pure and simple, wrong. That bug must be fixed. However, making a policy decision for the user also seems wrong. Just because glibc did something does not make it right. It does tend to make it a standard, so the semantic change argument becomes weaker, but this change was made to a different RPC library, so the application must have/be changed to use it, so all bets are off again. My inclination would be to return NULL and see who complains. ps -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html