On 9/29/20 1:04 PM, Greg Kroah-Hartman wrote: > On Tue, Sep 29, 2020 at 06:51:02PM +0100, Christoph Hellwig wrote: >> Independ of my opinion on the whole scheme that I shared last time, >> we really should not bloat struct device with function pointers. Christoph, thank you for sharing a bit more of your concerns and bringing Greg into the discussion. It's more productive. >> >> On Fri, Sep 25, 2020 at 11:19:18AM -0500, Tony Asleson wrote: >>> Function callback and function to be used to write a persistent >>> durable name to the supplied character buffer. This will be used to add >>> structured key-value data to log messages for hardware related errors >>> which allows end users to correlate message and specific hardware. >>> >>> Signed-off-by: Tony Asleson <tasleson@xxxxxxxxxx> >>> --- >>> drivers/base/core.c | 24 ++++++++++++++++++++++++ >>> include/linux/device.h | 4 ++++ >>> 2 files changed, 28 insertions(+) > > I can't find this patch anywhere in my archives, why was I not cc:ed on > it? It's a v5 and no one thought to ask the driver core > developers/maintainers about it??? You were CC'd into v3, ref. https://lore.kernel.org/linux-ide/20200714081750.GB862637@xxxxxxxxx/ I should have continued to CC you on it, sorry about that. > {sigh} > > And for log messages, what about the dynamic debug developers, why not > include them as well? Since when is this a storage-only thing? Hannes Reinecke has been involved in the discussion some and he's involved in dynamic debug AFAIK. If others have a need to identify a specific piece of hardware in a potential sea of identical hardware that is encountering errors and logging messages and can optionally be added and removed at run-time, then yes they should be included too. There is nothing with this patch series that is preventing any device from registering a callback which provides this information when asked. >>> diff --git a/include/linux/device.h b/include/linux/device.h >>> index 5efed864b387..074125999dd8 100644 >>> --- a/include/linux/device.h >>> +++ b/include/linux/device.h >>> @@ -614,6 +614,8 @@ struct device { >>> struct iommu_group *iommu_group; >>> struct dev_iommu *iommu; >>> >>> + int (*durable_name)(const struct device *dev, char *buff, size_t len); > > No, that's not ok at all, why is this even a thing? > > Who is setting this? Why can't the bus do this without anything > "special" needed from the driver core? I'm setting it in the different storage subsystems. The intent is that when we go through a common logging path eg. dev_printk you can ask the device what it's ID is so that it can be logged as structured data with the log message. I was trying to avoid having separate logging functions all over the place which enforces a consistent meta data to log messages, but maybe that would be better than adding a function pointer to struct device? My first patch tried adding a call back to struct device_type, but that ran into the issue where struct device_type is static const in a number of areas. Basically for any piece of hardware with a serial number, call this function to get it. That was the intent anyway. > We have a mapping of 'struct device' to a unique hardware device at a > specific point in time, why are you trying to create another one? I don't think I'm creating anything new. Can you clarify this a bit more? I'm trying to leverage what is already in place. > What is wrong with what we have today? I'm trying to figure out a way to positively identify which storage device an error belongs to over time. Logging how something is attached and not what is attached, only solves for right now, this point in time. Additionally when the only identifier is in the actual message itself, it makes user space break every time the message content changes. Also what we have today in dev_printk doesn't tag the meta-data consistently for journald to leverage. This patch series adds structured data to the log entry for positive identification. It doesn't change when the content of the log message changes. It's true over time and it's true if a device gets moved to a different system. > So this is a HARD NAK on this patch for now. Thank you for supplying some feedback and asking questions. I've been asking for suggestions and would very much like to have a discussion on how this issue is best solved. I'm not attached to what I've provided. I'm just trying to get towards a solution. We've looked at user space quite a bit and there is an inherit race condition with trying to fetch the unique hardware id for a message when it gets emitted from the kernel as udev rules haven't even run (assuming we even have the meta-data to make the association). The last thing people want to do is delay writing the log message to disk until the device it belongs to can be identified. Of course this patch series still has a window from where the device is first discovered by the kernel and fetches the needed vpd data from the device. Any kernel messages logged during that time have no id to associate with it. Thanks, Tony