Thank you Hans for the scrutiny. Please find my comments inline. > -----Original Message----- > From: Hans de Goede <hdegoede@xxxxxxxxxx> > Sent: Wednesday, June 2, 2021 4:49 PM > To: K Naduvalath, Sumesh <sumesh.k.naduvalath@xxxxxxxxx>; > mgross@xxxxxxxxxxxxxxx; srinivas.pandruvada@xxxxxxxxxxxxxxx > Cc: Pandruvada, Srinivas <srinivas.pandruvada@xxxxxxxxx>; platform-driver- > x86@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Chinnu, Ganapathi > <ganapathi.chinnu@xxxxxxxxx>; Kumar, Nachiketa > <nachiketa.kumar@xxxxxxxxx> > Subject: Re: [PATCH 1/1] ishtp: Add support for Intel ishtp eclite driver > > Hi Sumesh, > > On 6/1/21 8:24 PM, K Naduvalath, Sumesh wrote: > > Thank you Hans for the review comments. Please find the reply inline - > > > >> -----Original Message----- > >> From: Hans de Goede <hdegoede@xxxxxxxxxx> > >> Sent: Monday, May 31, 2021 8:25 PM > >> To: K Naduvalath, Sumesh <sumesh.k.naduvalath@xxxxxxxxx>; > >> mgross@xxxxxxxxxxxxxxx; srinivas.pandruvada@xxxxxxxxxxxxxxx > >> Cc: Pandruvada, Srinivas <srinivas.pandruvada@xxxxxxxxx>; > >> platform-driver- x86@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; > >> Chinnu, Ganapathi <ganapathi.chinnu@xxxxxxxxx>; Kumar, Nachiketa > >> <nachiketa.kumar@xxxxxxxxx> > >> Subject: Re: [PATCH 1/1] ishtp: Add support for Intel ishtp eclite > >> driver > >> > >> Hi, > >> > >> Thank you, I've done a quick review, which has already spotted quite > >> a few issues. Note I will probably do a more thorough review later > >> which mind find more issues, but lets start with fixing the serious > >> issues which this review has found. > > <snip> > > >>> diff --git a/drivers/platform/x86/Kconfig > >>> b/drivers/platform/x86/Kconfig index 60592fb88e7a..cfa2cb150909 > >>> 100644 > >>> --- a/drivers/platform/x86/Kconfig > >>> +++ b/drivers/platform/x86/Kconfig > >>> @@ -1180,6 +1180,19 @@ config INTEL_CHTDC_TI_PWRBTN > >>> To compile this driver as a module, choose M here: the module > >>> will be called intel_chtdc_ti_pwrbtn. > >>> > >>> +config INTEL_ISHTP_ECLITE > >>> + tristate "Intel ISHTP eclite controller" > >>> + depends on INTEL_ISH_HID > >>> + depends on ACPI > >>> + help > >>> + This driver is for accessing the PSE(Programmable Service Engine), > >>> + an Embedded Controller like IP, using ISHTP(Integratd Sensor Hub > >>> + Transport Protocol) to get battery, thermal and UCSI (USB Type-C > >>> + Connector System Software Interface) related data from > >>> +the > >> platform. > >> > >> This text has all the same issues as the commit message. Also please > >> explain on what sort of systems this functionality is typically > >> found/used so that users will have a better idea if this is something > >> which they should enable on their systems. > >> > > > > This functionality is enabled for the first time for Intel's Elkhartlake > platform. > > Users who don't want to use discrete Embedded Controller on their > > platform, they can leverage the integrated solution of ECLite which is > > part of Elkhartlake's PSE subsystem. I'll add more text for the config item. > > Thank you for the info. > > <snip> > > >>> +static acpi_status > >>> +ecl_opregion_cmd_handler(u32 function, acpi_physical_address > address, > >>> + u32 bits, u64 *value64, > >>> + void *handler_context, void *region_context) { > >>> + struct ishtp_opregion_dev *opr_dev; > >>> + struct opregion_cmd *cmd; > >>> + > >>> + if (!region_context || !value64) > >>> + return AE_BAD_PARAMETER; > >>> + > >>> + if (function == ACPI_READ) > >>> + return AE_ERROR; > >>> + > >>> + opr_dev = (struct ishtp_opregion_dev *)region_context; > >>> + cmd = &opr_dev->opr_context.cmd_area; > >> > >> This is shared between all threads, we have had issues with sharing > >> opregion context between threads like this in the past. > >> > >> What is stopping ACPI code from trying to use the opr_context from > >> multiple threads at the same time, messing things up? > >> > > > > This will not happen. BIOS calls ACPI methods (cmd and data in this > > driver) in a SERIALIZED manner. BIOS issues another call only after finishing > the first one. > > You say that this will not happen, but the problem which I have with the > OpRegion API is that this may actually very well happen. There are no > guarantees of this not happening. We are relying on the AML code from the > BIOS adding the "Serialized" attribute to all functions accessing the OpRegion, > all that is necessary is one bug in the AML code and then this will happen. > > And it is typically very hard to get vendors to issue BIOS updates to fix things > like this :| > > I must honestly say that I find the whole design of the OpRegion API lacking. > There are other options to interface AML code, like e.g. modelling the ISHTP > as a generic serial bus, then a single OpRegion access can do the > following: > > 1. AML fills a buffer > 2. OpRegion call processes buffer and writes back results (status code + > data read if it is a read) to buffer > 3. AML processes buffer > > This is e.g. used in some Microsoft Surface 3 devices, see: > drivers/platform/surface/surface3_power.c > > This would make the calls more-or-less atomic, removing the need for the > ACPI Methods to all be Serialized. > > This would also allow the Opregion to return an error status code when the > EClite is not ready, instead of unregistering + reregistering the OpRegion, > which in itself is another source of possible races (see below). > > I assume that it is too late to change the OpRegion API now, so that we are > stuck with this ? > > I must say I'm disappointed about the quality of the OpRegion API design > here. APIs should be designed so that it is easy / natural to use them in a > correct way, where as this API seems to be dessgned in such a way that it is > easy to use it the wrong way. > > I really expect Intel to do better the next time the introduce a new OpRegion > for something. It would help greatly if Intel would send some rough sketches > of how the API is going to look like to the mailinglist, so that we can actually > correct issues like these, rather then the API already being set in stone and > that we end up having to live with a not so good API. > I understand your concern and will certainly take this feedback for the next gen/integration of the product. Similar approach is being discussed already for nextgen. But timeline is a concern now here for this. > >> I'm especially worried about the offset + data_len used in various > >> places, even if we add checks for this, this could be changed > >> underneath us by another thread. > > > > There are checks in BIOS for offset + data, but I'll add them in the driver as > well. > > There is no other thread accessing it. Flow below - > > > > ACPI method --> cmd --> > > <--end ACPI method <-- > > > > <here no other ACPI method will execute because of serialized method> > > > > ACPI method --> data--> > > <--end ACPI method <-- > > > >> > >> You should add a mutex to the opr_dev and lock it in this function so > >> that the cmd struct cannot be changed underneath us while we are > processing it. > >> > >> Note this does not fully protect against races like this: > >> > >> 1. Thread a sets offset > >> 2. Thread b sets a different offset > >> 3. Thread a writes ECL_ISH_READ to command. > > > > These race conditions won't occur. No structure elements are used from > > two paths simultaneously. Only element from the opregion structure > > used outside ACPI path is > > > > opr_dev->dsm_event_id from event_work workqueque. > > > > But this element not accessed from anywhere else including serialized ACPI > path. > > I understand that the intention is for all the ACPI calls to be serialized, but > nothing is guaranteeing this. All it requires is one method not being marked > as serialized... > I get your point here too. I'll fix in v2. > >> This will result in thread a getting the data at the offset specified > >> by thread b, rather then at the offset which it requested. But there > >> is nothing we can do about that, that needs to be solved with > synchronization at the AML level. > > > > There is ASL serialization defined, But are you suggesting to put > > locks for the fairness of coding? > > I don't want the kernel to read / write beyond the end of kernel-owned > memory because the following happens (this assumes one AML method is > missing the Serialized > attribute): > > 1. #define ECL_DATA_OPR_BUFLEN 384 > 2. Thread A sets offset to 0 > 3. Thread A sets len to 100 > 4. Thread A writes ECL_ISH_WRITE > 5. Thread A kernel checks (offset + len) < ECL_DATA_OPR_BUFLEN, this is ok > 6. Thread B sets offset to 300 7. Thread A kernel does > memcpy(message.payload, opr_dev->opr_context.data_area.data + > message.header.offset, message.header.data_len); > > 7. Will now end up reading 100 bytes from offset 300 of > opr_context.data_area.data, which is only 384 bytes big, which needless to > say is not good. > I see now that the offset + len are copied into message.header and then the > copied values are used. It is tempting to think that this thus can be fixed by > doing a check on the copied values, but the message struct is a local variable, > so the compiler will alias the 2 values and might decide to do the check on the > originals; and the compiler is also free to re-order things. So this making of a > local copy does not help. > > This is a classic time-of-check vs time-of-use problem. Since we cannot > _guarantee_ that all users of the OpRegion are properly using the ACPI > Serialized Method attribute, we need some other mechanism to ensure that > len + offset do not change between checking and consuming them. The most > straight forward solution here is to protect opr_context with a mutex so that > it cannot be changed by another thread while one call is in progress. Since all > accesses should be serialized at the AML level anyways this will not add extra > serialization, while it will help avoid AML bugs turning into out-of-bounds > memory accesses inside the kernel. > Sure. Mutex and length checks will be added in v2. > <snip> > > >>> +static acpi_status > >>> +ecl_opregion_data_handler(u32 function, acpi_physical_address > address, > >>> + u32 bits, u64 *value64, > >>> + void *handler_context, void *region_context) { > >>> + struct ishtp_opregion_dev *opr_dev; > >>> + unsigned int bytes = BITS_TO_BYTES(bits); > >>> + void *data_addr; > >>> + > >>> + if (!region_context || !value64) > >>> + return AE_BAD_PARAMETER; > >>> + > >>> + if (address + bytes > ECL_DATA_OPR_BUFLEN) > >>> + return AE_BAD_PARAMETER; > >>> + > >>> + opr_dev = (struct ishtp_opregion_dev *)region_context; > >>> + data_addr = &opr_dev->opr_context.data_area.data[address]; > >>> + > >>> + if (function == ACPI_READ) > >>> + memcpy(value64, data_addr, bytes); > >>> + else if (function == ACPI_WRITE) > >>> + memcpy(data_addr, value64, bytes); > >> > >> What if bits is not a multiple of 8? Then we have just overwritten a > >> bunch of bits which the caller did not request us to do. > >> > >> Since the caller specifies bits, this should really do a > >> read-modify-write of the last byte when there are any "leftover" > >> bits. ? What does the documentation say about this? > > > > The request will always be multiple of 8 bits as per ASL definition/docs. > > Ok. > > > >>> + else > >>> + return AE_BAD_PARAMETER; > >>> + > >>> + return AE_OK; > >>> +} > >>> + > >>> +static int acpi_opregion_init(struct ishtp_opregion_dev *opr_dev) { > >>> + acpi_status status; > >>> + struct acpi_device *adev; > >>> + > >>> + /* Find ECLite device and install opregion handlers */ > >>> + adev = acpi_dev_get_first_match_dev("INTC1035", NULL, -1); > >>> + if (!adev) { > >>> + dev_err(cl_data_to_dev(opr_dev), "eclite ACPI device not > >> found\n"); > >>> + return -ENODEV; > >>> + } > >>> + > >>> + opr_dev->acpi_handle = adev->handle; > >> > >> acpi_opregion_init() seem to get called on every resume, doing the > >> lookup is only necessary once, after that the cached value in > >> opr_dev->acpi_handle can be reused. > >> > >> More importantly this whole dance of unregistering + re-registering > >> the opregion seems unnecessary. You already have ish_link_ready in > >> case the opregion gets called before things are ready; and if the > >> opregion is called when the link is not ready, still having the > >> opregion handler in place allows you to log a sensible error about what is > going on which is what we want. > > > > Initial approach was same as you suggested ( No uninstall, just wait > > in ACPI Method). But after every resume, the driver gets ACPI write > > and read requests for FAN and thermal controls which fails because > > link is not ready. Also we can't wait_event_interruptible_timeout() in > > ACPI method until we get the link_ready( link can become ready much > > later until PSE fully boots up after every Sx. Anything else gets executed > after this wait_event timeout fail in ACPI method. > > We can't afford to miss any critical thermal/FAN related setting from > > standby/ hibernation resume. No requests are missed by registering > > opregion only after link_ready. > > I see; and since the OpRegion API does not allow returning a status code to > communicate the link-not-ready thing you need to do this unregister + re- > register dance instead. See this is what I meant above when I wrote that the > whole OpRegion API seems lacking. This could all have been solved by having > the OpRegion calls communicate back a status code. > > Now you are relying on communicating the availability of the link by calling > _REG instead. This means that we better hope that all users of the OpRegion > will correctly check the flag set by _REG, which in my experience with a lot of > AML code is something which is often forgotten, so this will be another > source of DSDT/AML bugs for AML code using this OpRegion <sigh>. > > Also note that this is racy too, on suspend we unregister the OpRegion, > calling _REG in the process. But _REG methods typically are not Serialized so > now the following may happen: > > 1. Thread A is executing a function using the EClite OpRegion 2. Thread A > checks the flag set by _REG, sees the region is available 3. Thread B is > executing suspend for the EClite device, unregister the > OpRegion, calling _REG to clear the flag 4. Thread A continues with actually > accessing the no longer available > OPRegion leading to an error because there is no Opregion handler > registered. > > So this means that _REG should be marked Serialized for this device, I > wonder if it actually is marked as such in the DSDT of your test hardware ? > > I think we might get away with this regardless because 3 probably takes a rw- > lock on the ACPI namespace which it can only get if no other threads are > executing any AML code (I'm speculating here I did not check), but this is yet > another example about how this entire approach is less then ideal. > I'll check again and come back on this with a better approach to protect from race. I'll incorporate the changes in v2 with better solution. And sure, these points are taken as positive feedback. Thanks. > <snip> > > >>> +static void ecl_ishtp_cl_deinit(struct ishtp_cl *ecl_ishtp_cl) { > >>> + ishtp_cl_unlink(ecl_ishtp_cl); > >>> + ishtp_cl_flush_queues(ecl_ishtp_cl); > >>> + ishtp_cl_free(ecl_ishtp_cl); > >>> +} > >>> + > >>> +static void ecl_ishtp_cl_reset_handler(struct work_struct *work) { > >>> + struct ishtp_opregion_dev *opr_dev; > >>> + struct ishtp_cl_device *cl_device; > >>> + struct ishtp_cl *ecl_ishtp_cl; > >>> + int rv; > >>> + int retry; > >>> + > >>> + opr_dev = container_of(work, struct ishtp_opregion_dev, > >> reset_work); > >>> + > >>> + opr_dev->ish_link_ready = false; > >>> + > >>> + cl_device = opr_dev->cl_device; > >>> + ecl_ishtp_cl = opr_dev->ecl_ishtp_cl; > >>> + > >>> + ecl_ishtp_cl_deinit(ecl_ishtp_cl); > >>> + > >>> + ecl_ishtp_cl = ishtp_cl_allocate(cl_device); > >>> + if (!ecl_ishtp_cl) > >>> + return; > >> > >> Is this whole freeing + re-alloc of the ISHTP client here really > >> necessary ? This seems like overkill. > > > > This is required. reset is an asynchronous notification from ISH (PSE > > in this case) firmware and current connection becomes stale and needs > > to be reinitialized. All ISHTP client drivers are implemented same way. > > eg. > > drivers/hid/intel-ish-hid/ishtp-hid-client.c > > drivers/hid/intel-ish-hid/ishtp-fw-loader.c > > Ok. > > <snip> > > >>> +static int ecl_ishtp_cl_probe(struct ishtp_cl_device *cl_device) { > >>> + struct ishtp_cl *ecl_ishtp_cl; > >>> + struct ishtp_opregion_dev *opr_dev; > >>> + int rv; > >>> + > >>> + opr_dev = devm_kzalloc(ishtp_device(cl_device), sizeof(*opr_dev), > >>> + GFP_KERNEL); > >>> + if (!opr_dev) > >>> + return -ENOMEM; > >>> + > >>> + ecl_ishtp_cl = ishtp_cl_allocate(cl_device); > >>> + if (!ecl_ishtp_cl) > >>> + return -ENOMEM; > >>> + > >>> + ishtp_set_drvdata(cl_device, ecl_ishtp_cl); > >>> + ishtp_set_client_data(ecl_ishtp_cl, opr_dev); > >>> + opr_dev->ecl_ishtp_cl = ecl_ishtp_cl; > >>> + opr_dev->cl_device = cl_device; > >>> + > >>> + init_waitqueue_head(&opr_dev->read_wait); > >>> + INIT_WORK(&opr_dev->event_work, ecl_acpi_invoke_dsm); > >>> + INIT_WORK(&opr_dev->reset_work, ecl_ishtp_cl_reset_handler); > >>> + > >>> + /* Initialize ish client device */ > >>> + rv = ecl_ishtp_cl_init(ecl_ishtp_cl); > >>> + if (rv) { > >>> + dev_err(cl_data_to_dev(opr_dev), "Client init failed\n"); > >>> + goto err_exit; > >>> + } > >>> + > >>> + dev_dbg(cl_data_to_dev(opr_dev), "eclite-ishtp client > >>> +initialised\n"); > >>> + > >>> + /* Register a handler for eclite fw events */ > >>> + ishtp_register_event_cb(cl_device, ecl_ishtp_cl_event_cb); > >>> + > >>> + ishtp_get_device(cl_device); > >> > >> This seems weird, normally in the device-model a driver being bound > >> to a device guarantees that that device cannot go away before the > >> remove callback of the driver is called. > >> > >> So it seems to me that this call + the put in both the err_exit and > >> ecl_ishtp_cl_remove() cases can be dropped. > >> > > This platform driver has two interfaces - a)ISH and b)ACPI. > > ISH initializes first and if successful we ishtp_get_device(). > > Then ACPI initializes after that. If ACPI init fails, driver gets > > cleaned with ISHTP as well thus ishtp_get_device() in probe's > > err_exit. Both interface must be Required and initialized for the > functionality. > > > > Do you still see a problem? > > I don't see a problem, but I don't think that the ishtp_get_device() here and > the 2 ishtp_put_device() calls below are necessary. > > This is the probe function for a ishtp_cl_driver as long as that driver is bound > to the cl_device (which gets passed as parameter to the probe()) the > cl_device can never go away, not until the matching remove function from > the ishtp_cl_driver has been called, so the get at probe + the put at remove > are not necessary, since the cl_device already must stick around for as long > as the driver is bound. > > Another way of looking at this is that the linux device-model/driver core > already does a get when it binds a driver to the device (and a put on > remove). > Ok. I'll cross check with ish gurus if there is any concern with this. I'll remove them if there isn't any. > > >>> + > >>> + opr_dev->ish_link_ready = true; > >>> + > >>> + /* Now find ACPI device and init opregion handlers */ > >>> + rv = acpi_opregion_init(opr_dev); > >>> + if (rv) { > >>> + dev_err(cl_data_to_dev(opr_dev), "ACPI opregion init > >> failed\n"); > >>> + > >>> + goto err_exit; > >>> + } > >>> + > >>> + /* Reprobe devices depending on ECLite - battery, fan, etc. */ > >>> + acpi_walk_dep_device_list(opr_dev->acpi_handle); > >>> + > >>> + return 0; > >>> +err_exit: > >>> + ishtp_set_connection_state(ecl_ishtp_cl, > >> ISHTP_CL_DISCONNECTING); > >>> + ishtp_cl_disconnect(ecl_ishtp_cl); > >>> + ecl_ishtp_cl_deinit(ecl_ishtp_cl); > >>> + > >>> + ishtp_put_device(cl_device); > >>> + > >>> + return rv; > >>> +} > >>> + > >>> +static int ecl_ishtp_cl_remove(struct ishtp_cl_device *cl_device) { > >>> + struct ishtp_cl *ecl_ishtp_cl = ishtp_get_drvdata(cl_device); > >>> + struct ishtp_opregion_dev *opr_dev = > >>> + ishtp_get_client_data(ecl_ishtp_cl); > >>> + > >>> + if (opr_dev->acpi_init_done) > >>> + acpi_opregion_deinit(opr_dev); > >>> + > >>> + cancel_work_sync(&opr_dev->reset_work); > >>> + cancel_work_sync(&opr_dev->event_work); > >>> + > >>> + ishtp_set_connection_state(ecl_ishtp_cl, > >> ISHTP_CL_DISCONNECTING); > >>> + ishtp_cl_disconnect(ecl_ishtp_cl); > >>> + ecl_ishtp_cl_deinit(ecl_ishtp_cl); > >>> + > >>> + ishtp_put_device(cl_device); > >>> + > >>> + return 0; > >>> +} > >>> + > >>> +static int ecl_ishtp_cl_reset(struct ishtp_cl_device *cl_device) { > >>> + struct ishtp_cl *ecl_ishtp_cl = ishtp_get_drvdata(cl_device); > >>> + struct ishtp_opregion_dev *opr_dev = > >>> + ishtp_get_client_data(ecl_ishtp_cl); > >>> + > >>> + schedule_work(&opr_dev->reset_work); > >>> + > >>> + return 0; > >>> +} > >>> + > >>> +static int ecl_ishtp_cl_suspend(struct device *device) { > >>> + struct ishtp_cl_device *cl_device = ishtp_dev_to_cl_device(device); > >>> + struct ishtp_cl *ecl_ishtp_cl = ishtp_get_drvdata(cl_device); > >>> + struct ishtp_opregion_dev *opr_dev = > >>> + ishtp_get_client_data(ecl_ishtp_cl); > >>> + > >>> + if (acpi_target_system_state() == ACPI_STATE_S0) > >>> + return 0; > >>> + > >>> + acpi_opregion_deinit(opr_dev); > >>> + ecl_ish_cl_enable_events(opr_dev, false); > >>> + > >>> + return 0; > >>> +} > >>> + > >>> +static int ecl_ishtp_cl_resume(struct device *device) { > >>> + /* A reset is expected to call after an Sx. At this point > >>> + * we are not sure if the link is up or not to restore anything, > >>> + * so do nothing in resume path > >>> + */ > >>> + return 0; > >> > >> This seems very wrong, this means that there are no resume ordering > >> guarantees meaning that drivers which are ordered to resume after > >> this driver, because they rely on the opregion may end up not being > >> able to use the opregion leading to all kind of issues. > >> > >> IMHO what should happen here is that this driver waits for the EClite > >> to become ready here. Which probably means that it itself should be > >> only resumed after the ISH HID driver is, but I assume that the ISH > >> device is a parent of this device, so that ordering should be correct > automatically. > > > >> > >> TBH the whole lets just not resume and do a reset instead and then > >> just tearing down the entire ISHTP client struct and setting up a new > >> one from scratch, just feels very wrong. At a minimum the teardown + > >> bringup needs to happen before the resume callback completes, but > >> ideally this would be replaced by a much lighter resume solution. > > > > ISH Firmware (the PSE subsystem) can boot up/reinitialize Every Sx > > based on usecase or sometimes PSE is always-on. So the resume path is > > asynchronous and unpredictable in this case. Re-initialization and > > clean up required if PSE also boot up every Sx and might take good > > amount of time (Host can come alive before PSE comes up). Thus using > asynchronous reset notification. > > Ok, so I guess we need to live with the ugly deregister + re-register OpRegion > dance for the devices where the PSE is shutdown during suspend. > > You also write: "sometimes PSE is always-on", what about that case, I assume > in this case there will be no reset after resume? So then unregistering the > OpRegion handler at suspend (and setting link_ready=false) will be wrong > since without the reset the OpRegion handler will never get reinstalled ? reset is called from bus after every resume. When PSE is always-on, the notification comes quicky, where as there reset comes bit later in case of new boot. > > Regards, > > Hans Thanks again for the comments. Let me know if you have further review comments. I'll work on v2.