On Mon, Jun 11, 2018 at 9:05 PM, brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx> wrote: > On Mon, Jun 11, 2018 at 03:47:43AM -0400, Eric Sunshine wrote: >> The word "translate" is very generic and is (at least in my mind) >> strongly associated with i18n/l10n, so the name test_translate() may >> be confusing for readers. Perhaps test_oid_lookup() or test_oid_get() >> or even just test_oid()? > > test_oid would be fine. One note is that this doesn't always produce > OIDs; sometimes it will produce other values, but as long as you don't > think that's too confusing, I'm fine with it. It was surprising to see it used for non-OID's (such as hash characteristics), but not hard to deal with. One could also view this as a generic key/value cache (not specific to OID's) with overriding super-key (the hash algorithm, in this case), which would allow for more generic name than test_oid(), but we don't have to go there presently. >> This is a very expensive lookup since it invokes a heavyweight command >> (perl, in this case) for *every* OID it needs to retrieve from the >> file. Windows users, especially, will likely not be happy about this. >> See below for an alternative. > > I agree perl would be expensive if it were invoked frequently, but > excepting SHA1-prerequisite tests, this function is invoked 32 times in > the entire testsuite. > > One of the reasons I chose perl was because we have a variety of cases > where we'll need spaces in values, and those tend to be complex in > shell. Can you give examples of cases in which values will contain spaces? It wasn't obvious from this patch series that such a need would arise. Are these values totally free-form? If not, some character (such as "_", "-", ".", etc.) could act as a stand-in for space. That shouldn't be too hard to handle. >> Here's what I had envisioned when reading your emails about OID lookup >> table functionality: >> >> --- >8 --- >> test_oid_cache () { >> while read tag rest >> do >> case $tag in \#*) continue ;; esac >> >> for x in $rest >> do >> k=${x%:*} >> v=${x#*:} >> if test "$k" = $test_hash_algo >> then >> eval "test_oid_$tag=$v" >> break >> fi >> done >> done >> } > > Using shell variables like this does have the downside that we're > restricted to only characters allowed in shell variables. That was > something I was trying to avoid, but it certainly isn't fatal. Is that just a general concern or do you have specific "weird" keys in mind? >> test_detect_hash() would detect the hash algorithm and record it >> instead of having to determine it each time an OID needs to be >> "translated". It probably would be called by test-lib.sh. > > We'll probably have to deal with multiple hashes in the future, > including for input and output, but this could probably be coerced to > handle that case. My original version of test_oid_cache() actually allowed for that by caching _all_ information from the tables rather than only values corresponding to $test_hash_algo. It looked like this: --- >8 --- test_oid_cache () { while read tag rest do case $tag in \#*) continue ;; esac for x in $rest do eval "test_oid_${tag}_${x%:*}=${x#*:}" done done } --- >8 --- The hash algorithm is incorporated into the cache variable name like this: "test_oid_hexsz_sha256" >> And, when specifying values from which to choose based upon hash >> algorithm: >> >> $(test_oid bored sha1:deadbeef NewHash:feedface) > > This syntax won't exactly be usable, because we have to deal with spaces > in values. It shouldn't be too much of a problem to just use > test_oid_cache at the top of the file, though. See above about possibly using a stand-in character for space. >> A nice property of how this implementation caches values is that you >> don't need test_oid() for really simple cases. You can just access the >> variable directly. For instance: $test_oid_hexsz > > Because we're going to need multiple hash support in the future, for > input, output, and on-disk, I feel like this is not a good direction for > us to go in the testsuite. Internally, those variable names are likely > to change. Indeed, this isn't a real selling point; I only just thought of it while composing the mail. Going through the API is more robust. (Although, see above how the revised test_oid_cache() incorporates the hash algorithm into the cache variable name.)