On 24.09.2019 8:21, Johannes Sixt wrote:
> What are we testing here? Is there some back-and-forth conversion going
> on, and are we testing that the conversion happens at all, or that the
> correct conversion/encoding is picked, or that the conversion that is
> finally chosen is correct? Why does it help to test more interesting
> chars (and would you not also regard codepoints outside the BMP the
> most interesting because they require surrogate codepoints in UTF-16)?
According to my understanding (I'm not the author of the test package), it is designed to verify that various encodings are properly supported by Git in the working tree. The new tests are designed to avoid any back-and-forth conversion, which actually happened in the previous UTF-16-LE-BOM test and in turn hid the fact that the test was buggy. Beyond that, the tests verify that if you request a particular encoding, you get exactly that encoding, which covers several potential problems at once.

> Why does it help to test more interesting chars (and would you not
> also regard codepoints outside the BMP the most interesting because
> they require surrogate codepoints in UTF-16)?

It helps to cover more potential problems. One could argue that converting Latin characters is mostly a matter of padding or dropping zero bytes, but that approach could never work for the characters I used. As for codepoints outside the BMP, I'm simply not experienced with them. If you are, you're welcome to further improve the tests I added.
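To illustrate the point (a Python sketch, not part of the patch, with example characters of my own choosing rather than the ones used in the tests): a Latin character survives a naive pad/drop-zero-bytes "conversion" to UTF-16, a BMP character does not, and a non-BMP character additionally requires a surrogate pair.

```python
latin = "A"      # U+0041: in UTF-16-LE this is just the ASCII byte plus a zero byte
bmp = "\u0416"   # U+0416 CYRILLIC CAPITAL LETTER ZHE: one 16-bit code unit, both bytes meaningful
non_bmp = "\U0001F600"  # U+1F600: outside the BMP, encoded as a surrogate pair

print(latin.encode("utf-16-le").hex())    # 4100
print(bmp.encode("utf-16-le").hex())      # 1604
print(non_bmp.encode("utf-16-le").hex())  # 3dd800de (high surrogate D83D, low surrogate DE00)
```

So a test that round-trips only Latin characters would pass even against a broken encoder that merely inserts and strips zero bytes, while the non-Latin characters catch that class of bug.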