Thanks! I see; yes, I'm speaking of C++17. Just to make sure I've grasped it, here is how I understand it:

1. In the std::string case, we care about the bits regardless of the values of the chars inside the std::string, and because the mapping is exact, it is well-defined.
2. In the case of a single char, the underlying bit representation changes if the value overflows char's range, and that is implementation-defined.

Say I decide to do this manually:

std::string s = "\xE2\x82\xAC";

Or:

char c[3];
c[0] = 0xE2;
c[1] = 0x82;
c[2] = 0xAC;

Then I suppose these cases are more like my #2 above than #1, true?

On Mon, Jun 3, 2019 at 2:52 PM Jonathan Wakely <jwakely.gcc@xxxxxxxxx> wrote:
> On Mon, 3 Jun 2019 at 10:07, esoteric escape wrote:
> >
> > Hello, I am on Windows, where CHAR_BIT == 8.
> >
> > I am trying to understand whether this behavior is implementation-defined
> > or not.
> >
> > I have this string in UTF-8, and I am trying to understand if it is
> > implementation-defined:
> >
> > std::string s = u8"€";
> >
> > It's clear to me that char c = 0xC8 is implementation-defined for these
> > reasons:
> > 1. char's signedness depends on the compiler.
> > 2. If the value is beyond the representable range of char, say -128 to
> > 127, then again it is implementation-defined.
>
> The char value is implementation-defined, but there will be some
> unique value that corresponds to 0xC8 and can be unambiguously
> converted to (unsigned char)0xC8. For GCC (and in C++20) the
> conversion to char is the obvious two's complement one, producing
> (char)-56.
>
> >
> > In the same way, I am trying to understand how the std::string case is
> > handled, because it also uses char.
> >
> > So, € in UTF-8 is the byte sequence E2 82 AC in hex. If std::string uses,
> > say, a signed char, don't those bytes fall outside the representable
> > range, making their values implementation-defined in std::string?
> > Or is this case actually well-defined?
>
> Both.
> The code is perfectly well-defined in C++11, C++14 and C++17.
> The precise numerical values are implementation-defined, but there is
> a one-to-one mapping from 8-bit UTF-8 code units to char values, and
> back again.
>
> N.B. In C++20 the code is ill-formed and won't compile, because the
> type of u8"€" is const char8_t[4], which cannot be used to initialize a
> std::string. You'd need to cast it to (const char*) or use
> std::u8string instead.
>
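To make the follow-up question concrete, here is a minimal C++ sketch (not from the thread) that builds the euro sign three ways: from a u8 literal, from hex escapes, and element by element in a char array. All helper names (euro_from_u8, euro_from_escapes, euro_from_char_array, code_unit, check_euro_variants) are hypothetical. It assumes CHAR_BIT == 8 and a compiler that converts out-of-range values to char using two's complement, as GCC does (and C++20 requires); in C++17 the exact char values are still up to the implementation, which is why the final check reads each element back through unsigned char. It writes u8"\u20AC" rather than u8"€" so the source file's encoding does not matter.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: U+20AC (euro sign) from a u8 literal.
// In C++17 u8"..." has type const char[N]; in C++20 it is const char8_t[N],
// so the cast (the "(const char*)" workaround from the reply) keeps both compiling.
std::string euro_from_u8() {
    return reinterpret_cast<const char*>(u8"\u20AC");
}

// Hypothetical helper: the same code units written with hex escapes.
std::string euro_from_escapes() {
    return "\xE2\x82\xAC";
}

// Hypothetical helper: the same code units stored one char at a time.
// If char is signed, 0xE2 (226) is out of range; before C++20 the result of
// this conversion is implementation-defined (GCC gives -30).
std::string euro_from_char_array() {
    char c[3];
    c[0] = static_cast<char>(0xE2);
    c[1] = static_cast<char>(0x82);
    c[2] = static_cast<char>(0xAC);
    return std::string(c, 3);
}

// Converting a char to unsigned char is always well-defined (reduction modulo
// 2^CHAR_BIT); with a two's complement char conversion it recovers the code unit.
unsigned code_unit(char ch) {
    return static_cast<unsigned char>(ch);
}

// Sketch of a check: all three spellings hold the same char values, and each
// element maps back to the UTF-8 bytes E2 82 AC.
void check_euro_variants() {
    const std::string a = euro_from_u8();
    assert(a.size() == 3);
    assert(a == euro_from_escapes());
    assert(a == euro_from_char_array());
    assert(code_unit(a[0]) == 0xE2u);
    assert(code_unit(a[1]) == 0x82u);
    assert(code_unit(a[2]) == 0xACu);
}
```

Calling check_euro_variants() from a main() built with -std=c++17 or -std=c++20 is expected to pass silently on GCC; the comparison through unsigned char is what the "one-to-one mapping ... and back again" in the reply relies on.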