On Mon, 3 Jun 2019 at 10:07, esoteric escape wrote:
>
> Hello, I am on Windows OS where CHAR_BIT == 8.
>
> I am trying to understand whether this behavior is
> implementation-defined or not.
>
> I have this string in UTF-8, and I am trying to understand if it is
> implementation-defined:
>
> std::string s = u8"€";
>
> It's clear to me that char c = 0xC8 is implementation-defined, for
> these reasons:
> 1. char's signedness depends on the compiler.
> 2. If the value is beyond the representable range of char, say -128
> to 127, then again it is implementation-defined.

The char value is implementation-defined, but there will be some unique
value that corresponds to 0xC8 and can be unambiguously converted back
to (unsigned char)0xC8. For GCC (and in C++20) the conversion to char
is the obvious two's complement one, producing (char)-56.

> In the same way, I am trying to understand how the std::string case
> is handled, because it also uses char.
>
> So, € in UTF-8 means the byte sequence E2 82 AC in hex. If
> std::string uses, say, the signed version of char, don't those values
> fall beyond the representable range, and therefore become
> implementation-defined in std::string? Or is this case actually
> well-defined?

Both. The code is perfectly well-defined in C++11, C++14 and C++17. The
precise numerical values are implementation-defined, but there is a
one-to-one mapping from 8-bit UTF-8 code units to char values, and back
again (see the first sketch below).

N.B. In C++20 the code is ill-formed and won't compile, because the
type of u8"€" is const char8_t[4], which cannot be used to initialize a
std::string. You'd need to cast it to (const char*) or use std::u8string
instead (see the second sketch below).
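
To illustrate the round trip described above, here is a minimal sketch
(assuming GCC, or any compiler in C++17 mode where u8"..." still yields
plain char). It recovers the original UTF-8 code units by converting
back through unsigned char, and shows the two's complement conversion
of 0xC8:

#include <cstdio>
#include <string>

int main() {
    std::string s = u8"€";  // three UTF-8 code units: E2 82 AC

    for (char c : s) {
        // Converting each char back to unsigned char recovers the
        // original code unit, regardless of char's signedness.
        unsigned char u = static_cast<unsigned char>(c);
        std::printf("%02X ", u);  // prints: E2 82 AC
    }
    std::printf("\n");

    // Where char is signed (as on Windows and x86 Linux), 0xC8
    // converts to 0xC8 - 0x100 == -56; on targets with unsigned
    // char it stays 200.
    std::printf("%d\n", static_cast<int>(static_cast<char>(0xC8u)));
}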
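
And a sketch of the two C++20 workarounds mentioned above (assuming a
compiler in C++20 mode):

#include <cstring>
#include <string>

int main() {
    // std::string s = u8"€";     // ill-formed in C++20

    std::u8string u = u8"€";      // the natural C++20 type

    // Casting to const char* is allowed, because char may alias any
    // object representation:
    std::string s(reinterpret_cast<const char*>(u8"€"));

    // Both strings hold the same three bytes, E2 82 AC.
    return (s.size() == 3 && std::memcmp(s.data(), u.data(), 3) == 0)
               ? 0 : 1;
}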