On Tue, Jan 6, 2009 at 00:37, Marcel Holtmann <marcel@xxxxxxxxxxxx> wrote: >> > so I was playing with adding environment variables to devices in the >> > udev database for later easy enumeration. And I like having ID_VENDOR >> > and ID_MODEL available and would like to extend this even to PCI and >> > SDIO in the future, but why are all whitespace replaced with "_". >> >> > E: ID_VENDOR=Novatel_Wireless >> > E: ID_MODEL=Novatel_Wireless_HSUPA_Modem >> > E: ID_REVISION=0000 >> > E: ID_SERIAL=Novatel_Wireless_Novatel_Wireless_HSUPA_Modem_356846015115701 >> >> Yeah, all strings in usb_id are mangled to be used in symlink names. >> Like all the other *_id tools do too. When we started doing that, >> nobody thought we would ever use these strings for anything else than >> creating symlinks. :) > > for things like ID_SERIAL this makes sense to me. For ID_VENDOR and > ID_MODEL not so much. Does udev itself use these values anywhere? I don't think the usb_id keys are used, but the same key names are used directly from ata_id and scsi_id. > Or can > we have udev just do the stripping part and have *_id tools return the > plain values? We need to sanitize some chars in any case, and the question is how far you go here, do we allow $, backticks, backslash, quotes, and such things? It get's pretty difficult how to decide what "plaintext" is in that context. All the values are from completely untrusted sources. Everybody can put any string in a fake USB device, and we would copy these strings in the environment of processes/shell scripts running as root. That's why we need to be very careful here. >> Some of the values, like ATA and SCSI just fill the whole remaining >> string buffer with whitespaces, which need to be stripped too, to be >> used in symlinks. >> >> For stuff that needs to be reversible, like filesystem label strings, >> which can contain any char, need to be valid and safe chars to be a >> symlink, and we need to be able to translate the name back to the >> original value, we encode the string with url encoding, like: >> ID_FS_LABEL=/ >> ID_FS_LABEL_ENC=\x2f >> >> We need to sanitize the values at least for things like newline, >> quoting and other control chars, because most of these values are >> completely "untrusted userdata", and a fake device with a usb model >> string like "Foo\nBAR=1" we don't want to import unmangled. :) >> >> If you have an idea what we should do/change to make these values more >> useful, let me know. > > I agree that we need to do at least some kind of sanitization, but for a > lot of these values having the clear text strings available would be > really nice. Especially the ID_VENDOR and ID_MODEL strings. If udev is > having access to these string (like for USB or when creating unique > symlinks) anyway, then it makes no sense to not duplicate the logic to > these from the device. Yeah, but as mentioned, we need to be careful and will need to mangle quoting and newlines at least. Some vendor values are known to contain single quotes, which we probably should not copy unmangled to an environment key. > Do you ever defined a character set for the environment values? It would > be nice if it UTF-8, because then we can copy most values natively. We don't do any transcoding or anything else here, we allow plain ascii [a-zA-Z0-9] and "#+-.:=@_" and also valid utf8 to pass unmangled, everything else gets replaced or encoded. Should we add encoded versions of these values, to be safe? Like ID_VENDOR_ENC and ID_VENDOR_MODEL_ENC? The consumer of these strings would need to translate "%xx" back to the char value, like we do for the filesystem labels, and free-text uuids today. Kay -- To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html