Re: (new) non-ASCII filenames break unit tests on Linux

On 12/4/23 12:10, Michael Stahl wrote:
On 03/12/2023 12:59, Stephan Bergmann wrote:
For better or worse, the payload of LO "internal" file URLs is always considered to be a UTF-8 encoding of the actual system pathname.  It is *not* a byte-for-byte representation of the bytes that make up the Unix system pathname.
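The difference between a UTF-8 payload and a byte-for-byte payload can be illustrated with a small Python sketch. It does not use LibreOffice code: `pathname_to_url_payload` is a hypothetical helper, and the Latin-1 pathname is an assumed example chosen only because its bytes differ from their UTF-8 form.

```python
from urllib.parse import quote


def pathname_to_url_payload(path_bytes: bytes, fs_encoding: str) -> str:
    """Hypothetical helper: decode the system pathname with the system
    encoding, then percent-encode the UTF-8 form of the resulting text."""
    text = path_bytes.decode(fs_encoding)
    return quote(text.encode("utf-8"))


# Assumed example: a pathname as stored on a Latin-1 system (e-acute is byte 0xE9).
latin1_path = "/tmp/café.pdf".encode("latin-1")

# Payload as described above: UTF-8 encoding of the actual pathname.
payload = pathname_to_url_payload(latin1_path, "latin-1")

# What a byte-for-byte representation would look like instead.
byte_for_byte = quote(latin1_path)

print(payload)        # /tmp/caf%C3%A9.pdf
print(byte_for_byte)  # /tmp/caf%E9.pdf
```

The two results only coincide when the system encoding is itself UTF-8 (or the name is pure ASCII).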

What thus happens here is that the file UCP's call chain TaskManager::getv -> osl::DirectoryItem::get -> osl_getDirectoryItem -> osl::detail::convertUrlToPathname -> getSystemPathFromFileUrl -> decodeFromUtf8 -> convert -> UnicodeToTextConverter_Impl::convert -> rtl_convertUnicodeToText tries to translate the Unicode characters of "hybrid_writer_абв_αβγ.pdf" to osl_getThreadTextEncoding() == RTL_TEXTENCODING_ASCII_US, which doesn't work because ASCII has no representation of the Cyrillic and Greek letters.
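The failing step can be mimicked in Python as a rough analogy. Python's `ascii` codec stands in for RTL_TEXTENCODING_ASCII_US here, and `to_pathname` is a hypothetical helper; how rtl_convertUnicodeToText actually reports or substitutes unmappable characters depends on flags not shown in this thread.

```python
name = "hybrid_writer_абв_αβγ.pdf"


def to_pathname(text: str, thread_encoding: str, errors: str = "strict") -> bytes:
    """Hypothetical stand-in for the Unicode-to-pathname conversion step."""
    return text.encode(thread_encoding, errors=errors)


# Strict conversion: ASCII cannot represent the Cyrillic and Greek letters.
try:
    to_pathname(name, "ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Lossy conversion: each unmappable character becomes '?'.
lossy = to_pathname(name, "ascii", errors="replace")

print(strict_failed)  # True
print(lossy)          # b'hybrid_writer_???_???.pdf'
```

Either way the resulting byte sequence no longer names the file that exists on disk.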

In the "C" locale, every 8-bit value is valid, but only ASCII (<128) values are meaningful; the intent is that the application does not interpret filenames but uses them as-is, and replacing characters with '?' (as apparently happens here) looks wrong to me.

Probably there isn't yet an RTL_TEXTENCODING_C that behaves like this.

That's not the issue here (the issue is that "ASCII has no representation of the Cyrillic and Greek letters"), and the existing RTL_TEXTENCODING_UTF8 would do what you seek on that conversion step from a Unicode file URL payload to a byte sequence pathname.
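As a minimal sketch of that conversion step, again in Python rather than LibreOffice code: the payload is percent-decoded, interpreted as UTF-8, and then encoded with the target encoding. `url_payload_to_pathname` and the `/tmp/` prefix are assumptions for illustration; Python's `utf-8` and `ascii` codecs stand in for RTL_TEXTENCODING_UTF8 and RTL_TEXTENCODING_ASCII_US.

```python
from urllib.parse import quote, unquote_to_bytes


def url_payload_to_pathname(payload: str, target_encoding: str) -> bytes:
    """Hypothetical helper: file URL payload (UTF-8) -> pathname bytes."""
    text = unquote_to_bytes(payload).decode("utf-8")
    return text.encode(target_encoding)


original = "/tmp/hybrid_writer_абв_αβγ.pdf"
payload = quote(original)  # percent-encodes the UTF-8 bytes of the name

# With a UTF-8 target, every character has a byte representation.
utf8_path = url_payload_to_pathname(payload, "utf-8")

# With an ASCII target, the same payload cannot be converted.
try:
    url_payload_to_pathname(payload, "ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(utf8_path.decode("utf-8") == original)  # True
print(ascii_ok)                               # False
```

With a UTF-8 target the output bytes are exactly the percent-decoded payload, so for this step the conversion is lossless for any valid payload.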



