Re: Implementation-defined behavior or not?

esoteric escape <manips88@xxxxxxxxx> · Mon, 3 Jun 2019 19:47:21 +0530

Got it. Awesome explanation.

On Mon, Jun 3, 2019 at 4:11 PM Jonathan Wakely <jwakely.gcc@xxxxxxxxx> wrote:

> On Mon, 3 Jun 2019 at 10:49, esoteric escape <manips88@xxxxxxxxx> wrote:
> >
> > Thanks! I see; yes, I'm speaking of C++17. Just to make sure I've
> > grasped it, here's how I understand it:
> >
> > 1. In the std::string case, we care about the bits regardless of the
> > values of the chars inside the std::string, and because the mapping is
> > exact, that makes it well-defined.
> > 2. In the case of char, the underlying bit representation changes
>
> On most implementations, no. The underlying bit representation is the
> same. 0xC8 as an unsigned char is 11001000 and as a char is also
> 11001000. What is implementation-defined is the value of 11001000 as a
> char. If char is signed, with GCC that value is (char)-56; if char is
> unsigned, it's (char)200. On a one's complement system
>
>
> > if the value overflows the range, and it's implementation-defined.
>
> There's no difference between #1 and #2. In both cases the UTF-8
> encoding produces some (implementation-defined) char value for each
> UTF-8 code unit. If you want UTF-8 encoded data then that's what you
> get. The resulting chars will work perfectly well with anything that
> expects UTF-8 encoded data. The fact that some characters in the
> string might have negative values is irrelevant. If converting the
> code unit to a char produces a negative value then that's what you
> get. You probably don't need to worry about it in any more detail than
> that.
>
> If you're using that string somewhere that expects UTF-8 then
> everything just works. If you're using it somewhere that expects 7-bit
> ASCII values then it might not work, but that's always true of UTF-8
> data; it has nothing to do with whether char is signed or unsigned.
>
>
> >
> > Say, I decide to manually do this:
> >
> > std::string s = "\xE2\x82\xAC";
> >
> > Or,
> >
> > char c[3];
> > c[0] = 0xE2;
> > c[1] = 0x82;
> > c[2] = 0xAC;
> >
> > Then I suppose these cases are more like my #2 above than #1, right?
>
> Cases #1 and #2 are the same.
>