On Mon, 3 Jun 2019 at 10:49, esoteric escape <manips88@xxxxxxxxx> wrote:
>
> Thanks! I see, yes speaking of C++17. Just to make sure I grasped it I'll say how I get it:
>
> 1. In the std::string's case, we care about bits regardless of the value of the chars inside std::string, so because mapping is precise that makes it well-defined.
> 2. In case of char, the underlying bit representation changes

On most implementations, no. The underlying bit representation is the same. 0xC8 as an unsigned char is 11001000 and as a char is also 11001000. What is implementation-defined is the *value* of 11001000 as a char. For signed char with GCC that value is (char)-56. For an unsigned char it's (char)200. On a one's complement system it would be (char)-55.

> if value overflows range and its implementation-defined.

There's no difference between #1 and #2. In both cases the UTF-8 encoding produces some (implementation-defined) char value for each UTF-8 code unit. If you want UTF-8 encoded data then that's what you get. The resulting chars will work perfectly well with anything that expects UTF-8 encoded data. The fact that some characters in the string might have negative values is irrelevant.

If converting the code unit to a char produces a negative value then that's what you get. You probably don't need to worry about it in any more detail than that. If you're using that string somewhere that expects UTF-8 then everything just works. If you're using it somewhere that expects 7-bit ASCII values then it might not work, but that's always true of UTF-8 data; it has nothing to do with whether char is signed or unsigned.

> Say, I decide to manually do this:
>
> std::string s = "\xE2\x82\xAC";
>
> Or,
>
> char c[3];
> c[0] = 0xE2;
> c[1] = 0x82;
> c[2] = 0xAC;
>
> Then, I suppose these cases will be more like my #2 above than #1, true?

Case #2 and #1 are the same.