On Tue, May 25, 2021 at 01:26:23AM +0800, Phi Nguyen wrote: > On 22/5/21 2:33 pm, Al Viro wrote: > > On Sat, May 22, 2021 at 01:34:29PM +0800, Nguyen Dinh Phi wrote: > > > This commit to fix the following sparse warning: > > > incorrect type in assignment (different address spaces) > > > expected void *[assigned] userdata > > > got void [noderef] __user *userdata > > > > > > Signed-off-by: Nguyen Dinh Phi <phind.uet@xxxxxxxxx> > > > --- > > > .../staging/vc04_services/interface/vchiq_arm/vchiq_arm.c | 5 ++++- > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c > > > index afbf01b7364c..2a4fc599f977 100644 > > > --- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c > > > +++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c > > > @@ -960,7 +960,10 @@ static int vchiq_irq_queue_bulk_tx_rx(struct vchiq_instance *instance, > > > current->pid); > > > userdata = &waiter->bulk_waiter; > > > } else { > > > - userdata = args->userdata; > > > + if (copy_from_user(userdata, args->userdata, sizeof(args->userdata))) { > > > > The contents of userdata (local variable of type void *) is uninitialized at that > > point. > Sorry, That was my mistake. > > > Just what do you think that call of copy_from_user() would do? > > Because according to the definition of struct vchiq_queue_bulk_transfer, the > args->userdata pointer is userspace address. That it is, but I'd asked "what would it do", not "why did it make sparse STFU"... Seriously, the first question should always be "what is it doing", not "what change would make given heuristics stop triggering"; after all, removing that assignment would also eliminate the warning. So would replacing it with if (((unsigned long)args->userdata) % 10000 == 6969) printk(KERN_INFO "address in decimal form is somewhat obscene"); for that matter. Folks, warnings are not oracles or entities to be appeased. Their role is different - they are heuristics pointing to the places that might (or might not) be worth looking into. "It looks odd, something might be mangled nearby, someone better take a look". THAT'S IT. They can't pass the Turing Test; all they can do is flag a funny-looking place for somebody who *can*. And (hopefully) provide some summary of what looks odd there. It's not hard to confuse sparse (or gcc, or clang, or whatever tool you are using) into not noticing anything odd. The thing one should never, ever do is the "I don't understand what's going on there, I don't understand what (if anything) I'd changed, but $TOOL doesn't complain anymore" kind of patches. I'm not familiar with the code in question, so all I can do is to look around and try to figure out what's going on. Let's see: The function in question is vchiq_irq_queue_bulk_tx_rx(). Warning complains about the value of args->userdata being stored in local variable (userdata), possibly mixing userland and kernel pointers. That sits in one of the branches cascade of ifs (happens when args->mode, whatever that might be, is neither VCHIQ_BULK_MODE_BLOCKING nor VCHIQ_BULK_MODE_WAITING, whatever those might be). In other branches the same local variable gets assigned &waiter->bulk_waiter (whatever that, etc.) and at least in one of those waiter had just been returned by kzalloc(), so it definitely can be pointing to kernel memory. In the other branch it's been looked up in some kind of list, and seeing that we dereference other members of the same structure in there, it would better be in kernel memory as well. OK, so far it looks genuine. What do we do with that pointer afterwards? status = vchiq_bulk_transfer(args->handle, NULL, args->data, args->size, userdata, args->mode, dir); Hmm... The value of args->mode had been passed along with it, so it's not impossible that the thing it calls acts differently depending upon that. Next question: where is that thing called and what does it get in args? case VCHIQ_IOC_QUEUE_BULK_TRANSMIT: case VCHIQ_IOC_QUEUE_BULK_RECEIVE: { struct vchiq_queue_bulk_transfer args; struct vchiq_queue_bulk_transfer __user *argp; enum vchiq_bulk_dir dir = (cmd == VCHIQ_IOC_QUEUE_BULK_TRANSMIT) ? VCHIQ_BULK_TRANSMIT : VCHIQ_BULK_RECEIVE; argp = (void __user *)arg; if (copy_from_user(&args, argp, sizeof(args))) { ret = -EFAULT; break; } ret = vchiq_irq_queue_bulk_tx_rx(instance, &args, dir, &argp->mode); } break; Uh-oh... At least here args->mode can be whatever the hell user has passed, and so can args->userdata. Another caller is if (copy_from_user(&args32, argp, sizeof(args32))) return -EFAULT; args = (struct vchiq_queue_bulk_transfer) { .handle = args32.handle, .data = compat_ptr(args32.data), .size = args32.size, .userdata = compat_ptr(args32.userdata), .mode = args32.mode, }; return vchiq_irq_queue_bulk_tx_rx(file->private_data, &args, dir, &argp->mode); and those two look like ioctl() and the variant of the same for 32bit processes. And yes, that's definitely meant to be a userland pointer. It does smell fishy, all right. Now, what is that pointer passed to and what's done with it there? <greps> Looks like it lives at drivers/staging/vc04_services/interface/vchiq_arm/vchiq_core.c:3055 Yup, we do look at the mode there: switch (mode) { case VCHIQ_BULK_MODE_NOCALLBACK: case VCHIQ_BULK_MODE_CALLBACK: break; case VCHIQ_BULK_MODE_BLOCKING: bulk_waiter = userdata; init_completion(&bulk_waiter->event); bulk_waiter->actual = 0; bulk_waiter->bulk = NULL; break; case VCHIQ_BULK_MODE_WAITING: bulk_waiter = userdata; bulk = bulk_waiter->bulk; goto waiting; default: goto error_exit; } So presumably only those 4 values are accepted. In the last two we copy the pointer to (local) bulk_waiter, and that's the cases where we had a kernel pointer passed to that sucker. In the other two we don't use it (yet) and bulk_waiter is... <checks> left NULL. Downstream from there we have this: bulk = &queue->bulks[BULK_INDEX(queue->local_insert)]; bulk->mode = mode; bulk->dir = dir; bulk->userdata = userdata; bulk->size = size; bulk->actual = VCHIQ_BULK_ACTUAL_ABORTED; then spew the pointer into syslog (presumably) in vchiq_log_info(vchiq_core_log_level, "%d: bt (%d->%d) %cx %x@%pad %pK", state->id, service->localport, service->remoteport, dir_char, size, &bulk->data, userdata); and that's it - we don't use that argument anymore. OK... What the hell is that 'bulk' thing, anyway? struct vchiq_bulk *bulk; Let's see... Declared with void *userdata, which isn't saying much; how is it used? Ow... Somebody has an unhealthy attraction to that identifier - a _lot_ of structures with member called that. Oh, well - only 60-odd instances of that word in the vicinity. And most of them refer to different structures... After trimming those: drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c:469: bulk->userdata = NULL; drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c:492: bulk->userdata = NULL; drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c:1017: waiter->bulk_waiter.bulk->userdata = NULL; drivers/staging/vc04_services/interface/vchiq_arm/vchiq_core.c:1276: waiter = bulk->userdata; drivers/staging/vc04_services/interface/vchiq_arm/vchiq_core.c:1295: reason, NULL, bulk->userdata); drivers/staging/vc04_services/interface/vchiq_arm/vchiq_core.c:3133: bulk->userdata = userdata; Sigh... OK, several of those simply store NULL there. NULL is NULL, userland or kernel... The last hit is the assignment quoted above that had brought that structure to our attention. And the two before that are places that look at it. OK... Around line 1276: if (bulk->mode == VCHIQ_BULK_MODE_BLOCKING) { struct bulk_waiter *waiter; spin_lock(&bulk_waiter_spinlock); waiter = bulk->userdata; if (waiter) { waiter->actual = bulk->actual; complete(&waiter->event); } spin_unlock(&bulk_waiter_spinlock); OK, that one wants the kernel pointer and looks like it gets one (MODE_BLOCKING is one of the cases where we store a kernel pointer there, so we should be fine, unless something changes bulk->mode somewhere else). ... and around line 1295 it's } else if (bulk->mode == VCHIQ_BULK_MODE_CALLBACK) { enum vchiq_reason reason = (bulk->dir == VCHIQ_BULK_TRANSMIT) ? ((bulk->actual == VCHIQ_BULK_ACTUAL_ABORTED) ? VCHIQ_BULK_TRANSMIT_ABORTED : VCHIQ_BULK_TRANSMIT_DONE) : ((bulk->actual == VCHIQ_BULK_ACTUAL_ABORTED) ? VCHIQ_BULK_RECEIVE_ABORTED : VCHIQ_BULK_RECEIVE_DONE); status = make_service_callback(service, reason, NULL, bulk->userdata); if (status == VCHIQ_RETRY) break; } Bugger... In this case it's "whatever the userland has passed" and the name of the function we are passing it to is interesting - it's entirely possible that this "userdata" thing is neither kernel nor userland pointer in this case. Or not... Let's check: static inline enum vchiq_status make_service_callback(struct vchiq_service *service, enum vchiq_reason reason, struct vchiq_header *header, void *bulk_userdata) { and we are dealing with the last argument. OK... status = service->base.callback(reason, header, service->handle, bulk_userdata); Passed as-is to whatever's that .callback() method is (another grep-friendly name, that...) <greps> drivers/staging/vc04_services/interface/vchiq_arm/vchiq_core.c:2379: service->base.callback = params->callback; Copies from the member of struct vchiq_service_params_kernel of the same name, and the rest are assignments to that: drivers/staging/vc04_services/bcm2835-audio/bcm2835-vchiq.c:129: .callback = audio_vchi_callback, drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c:834: .callback = service_callback, drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c:2234: .callback = vchiq_keepalive_vchiq_callback, drivers/staging/vc04_services/vchiq-mmal/mmal-vchiq.c:1870: .callback = service_callback, OK, so it's one of the 4 functions (both 'service_callback' are static in their files). Let's see... audio_vchi_callback() ignores its last argument completely and so does vchiq_keepalive_vchiq_callback(). vchiq_arm.c service_callback() is... ouch. OK, it gets spewed into syslog, then we have status = add_completion(instance, reason, NULL, user_service, bulk_userdata); and return add_completion(instance, reason, header, user_service, bulk_userdata); and add_completion() is doing what? completion->bulk_userdata = bulk_userdata; <obscenities> All right, what's _that_ about? It's in struct vchiq_completion_data_kernel { enum vchiq_reason reason; struct vchiq_header *header; void *service_userdata; void *bulk_userdata; }; and it's used only in /* * FIXME: address space mismatch, does bulk_userdata * actually point to user or kernel memory? */ user_completion.bulk_userdata = completion->bulk_userdata; if (vchiq_put_completion(args->buf, &user_completion, ret)) { if (ret == 0) ret = -EFAULT; break; } that put_completion thing being basically copy_to_user(). OK, so that looks like an opaque cookie identifying user request(?) and passed back to userland when some event happens. All right, assuming the genuine kernel pointers can't leak out that way... The last instance (mmal one): static enum vchiq_status service_callback(enum vchiq_reason reason, struct vchiq_header *header, unsigned int handle, void *bulk_ctx) and the last argument is used in bulk_receive_cb(instance, bulk_ctx); and bulk_abort_cb(instance, bulk_ctx); What. The. Hell? static void bulk_receive_cb(struct vchiq_mmal_instance *instance, struct mmal_msg_context *msg_context) { msg_context->u.bulk.status = 0; /* schedule the port callback */ schedule_work(&msg_context->u.bulk.work); } _that_ is definitely expecting a kernel pointer. And so does static void bulk_abort_cb(struct vchiq_mmal_instance *instance, struct mmal_msg_context *msg_context) { pr_err("%s: bulk ABORTED msg_context:%p\n", __func__, msg_context); msg_context->u.bulk.status = -EINTR; schedule_work(&msg_context->u.bulk.work); } Incidentally, other callers of make_service_callback() are passing NULL to it, apparenly ending up with NULL passed all the way down to those two. How does it manage to avoid oopsing? No, wait - it's actually even messier than that. There are several ways towards that assignment of bulk->userdata. vchiq_irq_queue_bulk_tx_rx() is not the only thing that calls vchiq_bulk_transfer(). There's also vchiq_bulk_transmit(), vchiq_bulk_receive() and vchiq_blocking_bulk_transfer(). The last one passes VCHIQ_BULK_MODE_BLOCKING as mode and kernel pointer as 'userdata'; that's an expected combination. vchiq_bulk_transmit() is called with VCHIQ_BULK_MODE_BLOCKING / NULL combination. And vchiq_bulk_receive() is used by that mmal thing (and nowhere else) and it gets VCHIQ_BULK_MODE_CALLBACK / msg_context (kernel pointer). vchiq_irq_queue_bulk_tx_rx() is not used by mmal at all. That's how it (presumably) manages to avoid stepping into oopsen there. So... Looks like the rules for userdata argument of vchiq_bulk_transfer() are * if mode is VCHIQ_BULK_MODE_BLOCKING or VCHIQ_BULK_MODE_WAITING, it's a (kernel) pointer to struct bulk_waiter * if mode is VCHIQ_BULK_MODE_NOCALLBACK, completely ignored * if mode is VCHIQ_BULK_MODE_CALLBACK, whatever the fuck is expected by service->base.callback when 'reason' is one of VCHIQ_BULK_{SEND,RECEIVE}_{ABORTED,DONE}; type safety? Wassat? Avoiding mismatches is up to whatever leads to call of vchiq_bulk_transfer()... For vchiq-mmal it's struct mmal_msg_context *; for vchiq_irq_queue_bulk_tx_rx() (ioctls) it's an opaque cookie supplied by userland and copied back on completion. And AFAICS nothing else can lead down that way... Oh, and 'userdata' is a fairly gross misnomer. Misannotated one in case of struct vchiq_queue_bulk_transfer. I would replace it with unsigned long, but it's much too late for that. Conclusions: * the code in question most definitely does smell fishy, along with rather wide area around it. As in "who the hell has dumped a truckload of herring on hot tarmac and let it sit there for days?" * that "userland pointer" is actually misannotated; as far as the kernel is concerned, it's an opaque 32bit or 64bit value that might or might not contain an address of some sort - the kernel never uses it for any kind of memory access. * no, we do *NOT* want to copy whatever it might be pointing to into the kernel space. Incidentally, if it *is* a pointer, we have no way to tell what it points to, so how much would we copy into the kernel space anyway? * vendor drivers suck. Looking into those can easily lead to acute nausea; if analysis starts to uncover something _that_ ugly, the best thing to do is to pass it to somebody else. This one is among the worse ones - usually the things are milder.