Got it. Awesome Explanation.

On Mon, Jun 3, 2019 at 4:11 PM Jonathan Wakely <jwakely.gcc@xxxxxxxxx> wrote:

> On Mon, 3 Jun 2019 at 10:49, esoteric escape <manips88@xxxxxxxxx> wrote:
> >
> > Thanks! I see, yes speaking of C++17. Just to make sure I grasped it
> > I'll say how I get it:
> >
> > 1. In the std::string's case, we care about bits regardless of the value
> > of the chars inside std::string, so because mapping is precise that makes
> > it well-defined.
> > 2. In case of char, the underlying bit representation changes
>
> On most implementations, no. The underlying bit representation is the
> same. 0xC8 as an unsigned char is 11001000 and as a char is also
> 11001000. What is implementation-defined is the value of 11001000 as a
> char. For signed char with GCC that value is (char)-56. For an
> unsigned char it's (char)200. One a one's complement system
>
> > if value overflows range and its implementation-defined.
>
> There's no difference between #1 and #2. In both cases the UTF-8
> encoding produces some (implementation-defined) char value for each
> UTF-8 code unit. If you want UTF-8 encoded data then that's what you
> get. The resulting chars will work perfectly well with anything that
> expects UTF-8 encoded data. The fact that some characters in the
> string might have negative values is irrelevant. If converting the
> code unit to a char produces a negative value then that's what you
> get. You probably don't need to worry about it in any more detail than
> that.
>
> If you're using that string somewhere that expects UTF-8 then
> everything just works. If you're using it somewhere that expects 7-bit
> ASCII values then it might not work, but that's always true of UTF-8
> data, it has nothing to do with whether char is signed or unsigned.
> >
> > Say, I decide to manually do this:
> >
> > std::string s = "\xE2\x82\xAC";
> >
> > Or,
> >
> > char c[3];
> > c[0] = 0xE2;
> > c[1] = 0x82;
> > c[2] = 0xAC;
> >
> > Then, I suppose these cases will be more like my #2 above than #1, true?
>
> Case #2 and #1 are the same.
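
For the archive, here is a small sketch of the two points in the quoted reply: the bits of a code unit are the same whether it is held in a char or an unsigned char, and the string literal and the element-by-element array end up with identical contents. It assumes a mainstream two's complement implementation (for example GCC on x86); the function names are hypothetical, made up only for this illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: true if a code unit stored as char has the same
// object representation (bits) as the unsigned char it came from.
bool same_bits(unsigned char code_unit) {
    // Before C++20 the resulting value is implementation-defined when char
    // is signed; C++20 guarantees a modular conversion.
    char c = static_cast<char>(code_unit);
    return std::memcmp(&c, &code_unit, 1) == 0;
}

// The euro sign (U+20AC) written as a string literal with hex escapes.
std::string euro_from_literal() {
    return "\xE2\x82\xAC";
}

// The same three UTF-8 code units assigned one by one.
std::string euro_from_array() {
    char c[3];
    c[0] = static_cast<char>(0xE2);
    c[1] = static_cast<char>(0x82);
    c[2] = static_cast<char>(0xAC);
    return std::string(c, 3);
}

// Hypothetical self-check tying the pieces together.
void check_examples() {
    assert(same_bits(0xC8));
    assert(euro_from_literal() == euro_from_array());
    // Reading a code unit back through unsigned char recovers 0xE2
    // whether or not plain char is signed.
    assert(static_cast<unsigned char>(euro_from_literal()[0]) == 0xE2);
}
```

Calling check_examples() from main should pass silently on such an implementation. The explicit static_cast is only there to avoid conversion warnings; the plain assignments in the quoted code store the same bytes.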