> > I think I should elaborate on the design underlying this submission; > > roughly, the present solution diverges from the 'generic' requestor > > mechanism you mentioned above in two main aspects: > > > > A) it 'moves' the ID removal into hv_compose_msi_msg() and other > > functions, > > Right. A key implication is that this patch allows the completion > function to be called multiple times, if Hyper-V were to be malicious > and send multiple responses with the same requestID. This is OK as > long as the completion functions are idempotent, which after looking, > I think they are in this driver. > > Furthermore, this patch allows the completion function to run anytime > between when the requestID is created and when it is deleted. This > patch creates the requestID just before calling vmbus_sendpacket(), > which is good. The requestID is deleted later in the various functions. > I saw only one potential problem, which is in new_pcichild_device(), > where the new hpdev is added to a global list before the requestID is > deleted. There's a window where the completion function could run > and update the probed_bar[] values asynchronously after the hpdev is > on the global list. I don't know if this is a problem or not, but it could > be prevented by deleting the requestID a little earlier in the function. > > > > > B) it adopts some ad-hoc locking scheme in the channel callback. > > > > AFAICT, such changes preserve the 'confidentiality' and correctness > > guarantees of the generic approach (modulo the issue discussed here > > with Saurabh). > > Yes, I agree, assuming the current functionality of the completion > functions. > > > > > These changes are justified by the bug/fix discussed in 2/2. For > > concreteness, consider a solution based on the VMbus requestor as > > reported at the end of this email. > > > > AFAICT, this solution can't fix the bug discussed in 2/2. Moreover > > (and looking back at (A-B)), we observe that: > > > > 1) locking in the channel callback is not quite as desired: we'd > > want a request_addr_callback_nolock() say and 'protected' it > > together with ->completion_func(); > > I'm not understanding this point. Could you clarify? Basically (on top of the previous diff): @@ -2700,6 +2725,7 @@ static void hv_pci_onchannelcallback(void *context) int ret; struct hv_pcibus_device *hbus = context; struct vmbus_channel *chan = hbus->hdev->channel; + struct vmbus_requestor *rqstor = &chan->requestor; u32 bytes_recvd; u64 req_id, req_addr; struct vmpacket_descriptor *desc; @@ -2713,6 +2739,7 @@ static void hv_pci_onchannelcallback(void *context) struct pci_dev_inval_block *inval; struct pci_dev_incoming *dev_message; struct hv_pci_dev *hpdev; + unsigned long flags; buffer = kmalloc(bufferlen, GFP_ATOMIC); if (!buffer) @@ -2747,8 +2774,10 @@ static void hv_pci_onchannelcallback(void *context) switch (desc->type) { case VM_PKT_COMP: - req_addr = chan->request_addr_callback(chan, req_id); + spin_lock_irqsave(&rqstor->req_lock, flags); + req_addr = __hv_pci_request_addr(chan, req_id); if (!req_addr || req_addr == VMBUS_RQST_ERROR) { + spin_unlock_irqrestore(&rqstor->req_lock, flags); dev_warn_ratelimited(&hbus->hdev->device, "Invalid request ID\n"); break; @@ -2758,6 +2787,7 @@ static void hv_pci_onchannelcallback(void *context) comp_packet->completion_func(comp_packet->compl_ctxt, response, bytes_recvd); + spin_unlock_irqrestore(&rqstor->req_lock, flags); break; case VM_PKT_DATA_INBAND: where I renamed request_addr_callback_nolock() to __hv_pci_request_addr() (this being as in vmbus_request_addr() but without the requestor lock). A "locked" callback would still be wanted and used in, e.g., the failure path of hv_ringbuffer_write(). > > 2) hv_compose_msi_msg() doesn't know the value of the request ID > > it has allocated (hv_compose_msi_msg() -> vmbus_sendpacket(); > > cf. also remove_request_id() in the current submission). > > Agreed. This would have to be addressed by adding another version of > vmbus_sendpacket() that returns the request ID. Indeed... Some care would be needed to make sure that that "ID removal" can't "race" with hv_pci_onchannelcallback() (which could have removed the ID now), but yes... > > Hope this helps clarify the problems at stake, and move forward to a > > 'final' solution... > > I think there's a reasonable way for the vmbus_next_request_id() > mechanism to solve the problem in Patch 2/2 (if a new version of > vmbus_sendpacket is added). To me, that mechanism seems safer > in that it restricts the completion function to running just once > per requestID. With this patch, we must remember that the > completion functions must remain idempotent. Fair enough. Thank you for bearing with me and patiently reviewing these matters. Working out the details... Thanks, Andrea