Hi 2020. november 19., csütörtök 0:06 keltezéssel, Maximilian Luz <luzmaximilian@xxxxxxxxx> írta: > [...] > >> +/** > >> + * ssam_nf_head_destroy() - Deinitialize the given notifier head. > >> + * @nh: The notifier head to deinitialize. > >> + */ > >> +static void ssam_nf_head_destroy(struct ssam_nf_head *nh) > >> +{ > >> + cleanup_srcu_struct(&nh->srcu); > >> +} > > > > I'm also wondering if there's any reason why these static one-liner functions are > > not explicitly marked inline. > > Isn't inline more of a linkage keyword at this point? I fully expect any > compiler to inline things like this automatically at the first hint of > an optimization flag. > I also expect the compiler to inline such functions even without the `inline` specifier. In retrospect I'm not sure why I actually made this comment, possibly because of my preference, but I believe it's fine either way, so my comment is moot, sorry. > [...] > >> + * error values via ssam_notifier_from_errno() to notifier values. > >> + * > >> + * Also note that any callback that could handle an event should return a value > >> + * with bit %SSAM_NOTIF_HANDLED set, indicating that the event does not go > >> + * unhandled/ignored. In case no registered callback could handle an event, > >> + * this function will emit a warning. > >> + * > >> + * In case a callback failed, this function will emit an error message. > >> + */ > >> +static void ssam_nf_call(struct ssam_nf *nf, struct device *dev, u16 rqid, > >> + struct ssam_event *event) > >> +{ > >> + struct ssam_nf_head *nf_head; > >> + int status, nf_ret; > >> + > >> + if (!ssh_rqid_is_event(rqid)) { > >> + dev_warn(dev, "event: unsupported rqid: 0x%04x\n", rqid); > > > > A small note, "%#04x" would insert the "0x" prefix. > > Oh neat, didn't know that. Should be "%#06x" though, right? > Yes, indeed, sorry, it should be "%#06x". > [...] > >> +static int ssam_controller_caps_load_from_acpi( > >> + acpi_handle handle, struct ssam_controller_caps *caps) > >> +{ > >> + u32 d3_closes_handle = false; > > > > Assinging a boolean like this to a `u32` looks very odd to me. > > The value returned by the corresponding DSM call is an integer, but > essentially only contains a bool value (zero vs. one). I thought using > false here explicitly for initialization helps clarify that. That also > makes it consistent with the "caps->d3_closes_handle = false" line > below. > Although I still find it pretty odd, your explanation makes sense to me. > >> +int ssam_request_sync_alloc(size_t payload_len, gfp_t flags, > >> + struct ssam_request_sync **rqst, > >> + struct ssam_span *buffer) > >> +{ > >> + size_t msglen = SSH_COMMAND_MESSAGE_LENGTH(payload_len); > >> + > >> + *rqst = kzalloc(sizeof(**rqst) + msglen, flags); > >> + if (!*rqst) > >> + return -ENOMEM; > >> + > >> + buffer->ptr = (u8 *)(*rqst + 1); > >> + buffer->len = msglen; > >> + > >> + return 0; > >> +} > > > > I think there is a bit of incosistency: sometimes you use ** pointer + return int, > > sometimes you return a pointer with potentially embedded errno. I think it would > > be better if you stuck with one or the other. > > My rule of thumb is: If it's one output parameter, use a pointer as > return value. If it's two or more output parameters (as in this case, > "buffer" is written to), use two pointer arguments and an int as return. > > I noticed that ssam_client_bind doesn't follow that scheme so I'll > change that. > I see, thanks for the explanation, what prompted me to make this comment was that, as you noted, there were functions with one output parameter that were not consistent. > [...] > >> + * Note that the packet completion callback is, in case of success and for a > >> + * sequenced packet, guaranteed to run on the receiver thread, thus providing > >> + * a way to reliably identify responses to the packet. The packet completion > >> + * callback is only run once and it does not indicate that the packet has > >> + * fully left the system (for this, one should rely on the release method, > >> + * triggered when the reference count of the packet reaches zero). In case of > >> + * re-submission (and with somewhat unlikely timing), it may be possible that > >> + * the packet is being re-transmitted while the completion callback runs. > >> + * Completion will occur both on success and internal error, as well as when > >> + * the packet is canceled. > > > > If I understand it correctly, it is possible that submission of a packet fails > > for the first time, but it's scheduled for resubmission, and this retransmission > > happens at the same time when the complete() callback is called. If that's the > > case, then the callback is called with an error condition, no? Thus it is possible > > that a packet is successfully submitted (for the second, third, etc. time), but the > > complete() callback receives notification about failure? Or am I missing something? > > Not sure I can follow you here. What do you mean by "fails for the first time"? > > A couple of notes before I try to explain the error cases: > > - Resubmission only happens on timeout or NAK. > > - Only sequenced packets can be resubmitted (unsequenced are not added > to the pending set, thus do not have a packet timeout and do not > react to NAKs). > > - Completion means the packet gets locked, which prevents it from being > resubmitted and retransmitted, and is removed from all collections. > > - Completion cannot be triggered by NAK. In case a NAK is received and > all tries have been exceeded, the packet still waits for the timeout > (or ACK, whatever happens first) one last time. > > The error cases: > > - serdev: Failure to send via serdev results in immediate completion > with error on the transmitter thread. Due to a NAK (or a _very_ > unlikely timeout), the packet can be queued for a second time at that > point, but, as completion runs on the transmitter thread in this > case, the packet is removed from the queue before the thread goes on > to get the next packet. > > - Timeout: On resubmission, the timeout of a packet is disabled. It is > (re-)armed directly before starting transmission on the transmitter > thread. > > What _could_ now happen is that, for some reason, the transmitter > thread waits the full timeout period between arming the timeout and > actually sending it (which I'd consider highly unlikely). Note that > "actually sending it" includes whatever asynchronous stuff the serdev > core does after serdev_device_write_buf() returns. > > In that case, the packet can be transmitted _after_ it has received a > timeout. So assuming this is the last try, then the packet can be > sent after completion with -ETIMEDOUT. > > I believe this is what you meant? > Yes, that's what I meant, thanks for the clarification and broader explanation about the things at play here. > Again, I consider this highly unlikely as this would require either > the scheduler to pause our thread for a full timeout period or > require the serdev core to basically do nothing for a full timeout > period. I'd argue this is a lot less likely than a dropped ACK packet > from the EC (which results in the same thing: EC receives data but we > think it didn't, causing a timeout error). > > Note that putting the timeout (re-)arming step after the last call to > serdev_device_write_buf() doesn't really help either, as that only > writes to the transport buffer. If we would want to guarantee that > this doesn't happen, we'd have to explicitly wait for the hardware to > finish transmitting, which essentially breaks the whole point of > buffering. > > Also I prefer to have the timeout cover the transmission part as > well. In the very least, this simplifies reasoning about it once I've > restricted it to be (re-)set under the pending lock only (as > mentioned in the message to Andy). > > - Finally, there is completion by cancellation or shutdown, for which > the packet can have been (or currently be) transmitted successfully > but completion returns with -ECANCELLED or -ESHUTDOWN, respectively. > This is the expected behavior in those cases. Again, the completion > procedure takes care of locking the packet and removing all > references from the collections. > > Also there's one success case, when we receive an ACK after a timeout > and during the re-transmission. Then the packet is completed with > success, but might still be transmitted once more. The EC should detect > that packet as re-submission via its packet ID and ignore it. > > >> [...] > > > >> +int ssh_ptl_rx_rcvbuf(struct ssh_ptl *ptl, const u8 *buf, size_t n) > >> +{ > >> + int used; > >> + > >> + if (test_bit(SSH_PTL_SF_SHUTDOWN_BIT, &ptl->state)) > >> + return -ESHUTDOWN; > >> + > >> + used = kfifo_in(&ptl->rx.fifo, buf, n); > > > > Isn't it possible that `n` is greater than the free space in the fifo? What > > happens then? > > If I've read the documentation of kfifo_in() right, it takes at most n > elements. If n is greater than the free space, it will only take as much > elements as it has space. The return value of kfifo_in() is the number > of elements actually consumed. We simply return that here (as elements = > bytes) and ultimately let the serdev/tty core take care of handling > that. > Ahh, sorry, you're right. > >> + if (used) > >> + ssh_ptl_rx_wakeup(ptl); > >> + > >> + return used; > >> +} > [...] > >> + ((struct ssh_frame *)sf.ptr)->len); > > > > Why isn't `get_unaligned_le16()` used here? (Or simply even `sp.len`.) > > This should actually be "sp.len + SSH_MESSAGE_LENGTH(0)", and also > format spezifier %zu with that as SSH_MESSAGE_LENGTH returns size_t. > Yes, indeed, sorry, I missed the `+ SSH_MESSAGE_LENGTH(0)` part. > >> + return -EMSGSIZE; > >> + } > >> [...] > >> + *frame = (struct ssh_frame *)sf.ptr; > > > > This also violates strict aliasing. > > Sure, the whole point of this function (and ssh_parse_command below) is > to get pointers to the data structures in the given buffer, i.e. create > aliases of the right types to the right places in the buffer. Not sure > how I'd do that differently, apart from copying. > > I think that this would even work properly (in practice) without setting > -fno-strict-aliasing, as all data at this point is effectively > read-only. Furthermore, after the call access is restricted to the > non-aliasing parts respectively (frame and payload), whereas the > underlying buffer they do alias isn't accessed for the rest of that > scope (i.e. until the other references don't exist any more). > > Anyway, as far as I know the kernel is built with -fno-strict-aliasing. > Please correct me if I'm wrong or relying on aliasing is frowned upon. > I don't believe it could cause issue here, I just wanted to note it. > [...] > >> +enum ssam_ssh_tc { > >> + /* Known SSH/EC target categories. */ > >> + // category 0x00 is invalid for EC use > >> + SSAM_SSH_TC_SAM = 0x01, // generic system functionality, real-time clock > >> + SSAM_SSH_TC_BAT = 0x02, // battery/power subsystem > >> + SSAM_SSH_TC_TMP = 0x03, // thermal subsystem > >> + SSAM_SSH_TC_PMC = 0x04, > >> + SSAM_SSH_TC_FAN = 0x05, > >> + SSAM_SSH_TC_PoM = 0x06, > >> + SSAM_SSH_TC_DBG = 0x07, > >> + SSAM_SSH_TC_KBD = 0x08, // legacy keyboard (Laptop 1/2) > >> + SSAM_SSH_TC_FWU = 0x09, > >> + SSAM_SSH_TC_UNI = 0x0a, > >> + SSAM_SSH_TC_LPC = 0x0b, > >> + SSAM_SSH_TC_TCL = 0x0c, > >> + SSAM_SSH_TC_SFL = 0x0d, > >> + SSAM_SSH_TC_KIP = 0x0e, > >> + SSAM_SSH_TC_EXT = 0x0f, > >> + SSAM_SSH_TC_BLD = 0x10, > >> + SSAM_SSH_TC_BAS = 0x11, // detachment system (Surface Book 2/3) > >> + SSAM_SSH_TC_SEN = 0x12, > >> + SSAM_SSH_TC_SRQ = 0x13, > >> + SSAM_SSH_TC_MCU = 0x14, > >> + SSAM_SSH_TC_HID = 0x15, // generic HID input subsystem > >> + SSAM_SSH_TC_TCH = 0x16, > >> + SSAM_SSH_TC_BKL = 0x17, > >> + SSAM_SSH_TC_TAM = 0x18, > >> + SSAM_SSH_TC_ACC = 0x19, > >> + SSAM_SSH_TC_UFI = 0x1a, > >> + SSAM_SSH_TC_USC = 0x1b, > >> + SSAM_SSH_TC_PEN = 0x1c, > >> + SSAM_SSH_TC_VID = 0x1d, > >> + SSAM_SSH_TC_AUD = 0x1e, > >> + SSAM_SSH_TC_SMC = 0x1f, > >> + SSAM_SSH_TC_KPD = 0x20, > >> + SSAM_SSH_TC_REG = 0x21, > >> +}; > > > > Is it known what these abbreviations stand for? Maybe I missed them? > > The comments state all we really know about these (through observation > and experimentation). The table itself has been extracted from the > Windows driver, but the abbreviations and values are all we're getting > from it. I see, thanks for the clarification. For some reason, I believed the "Known SSH/EC target categories" comment means that those are known, as in, it is known what they are for, etc., not just the abbreviation. > [...] Regards, Barnabás Pőcze