On 24 August 2017 at 20:29, Brandon Casey <drafnel@xxxxxxxxx> wrote: > On Thu, Aug 24, 2017 at 9:52 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote: >> Brandon Casey <drafnel@xxxxxxxxx> writes: >> >>> Ah, you probably meant something like this: >>> >>> const char strbuf_slopbuf = '\0'; >>> >>> which gcc will apparently place in the read-only segment. I did not know that. >> >> Yes but I highly suspect that it would be very compiler dependent >> and not something the language lawyers would recommend us to rely >> on. > > I think compilers may have the option of placing variables that are > explicitly initialized to zero in the bss segment too, in addition to > those that are not explicitly initialized. So I agree that no one > should write code that depends on their variables being placed in one > segment or the other, but I could see someone using this behavior as > an additional safety check; kind of a free assert, aside from the > additional space in the .rodata segment. > >> My response was primarily to answer "why?" with "because we did not >> bother". The above is a mere tangent, i.e. "multiple copies of >> empty strings is a horrible implementation (and there would be a way >> to do it with a single instance)". > > Merely adding const to our current strbuf_slopbuf declaration does not > buy us anything since it will be allocated in r/w memory. i.e. it > would still be possible to modify it via the buf member of strbuf. So > you can't just do this: > > const char strbuf_slopbuf[1]; > > That's pretty much equivalent to what we currently have since it only > restricts modifying the contents of strbuf_slopbuf directly through > the strbuf_slopbuf variable, but it does not restrict modifying it > through a pointer to that object. > > Until yesterday, I was under the impression that the only way to > access data in the rodata segment was through a constant literal. So > my initial thought was that we could do something like: > > const char * const strbuf_slopbuf = ""; > > ..but that variable cannot be used in a static assignment like: > > struct strbuf foo = {0, 0, (char*) strbuf_slopbuf}; > > So it seemed like our only option was to use a literal "" everywhere > instead of a slopbuf variable _if_ we wanted to have the guarantee > that our "slopbuf" could not be modified. > > But what I learned yesterday, is that at least gcc/clang will place > the entire variable in the rodata segment if the variable is both > marked const _and_ initialized. > > i.e. this will be allocated in the .rodata segment: > > const char strbuf_slopbuf[1] = ""; > >>> #define STRBUF_INIT { .alloc = 0, .len = 0, .buf = (char*) &strbuf_slopbuf } >>> >>> respectively. Yeah, that's definitely preferable to a macro. >>> Something similar could be done in object.c. >> >> What is the main objective for doing this change? The "make sure we >> do not write into that slopbuf" assert() bothers you and you want to >> replace it with an address in the read-only segment? > > Actually nothing about the patch bothers me. The point of that patch > is to make sure we don't accidentally modify the slopbuf. I was just > looking for a way for the compiler to help out and wondering if there > was a reason we didn't attempt to do so in the first place. > > I think the main takeaway here is that I learned something yesterday > :-) I didn't actually intend to submit a patch for any of this, but > if anything useful came out of the discussion I thought Martin may > incorporate it into his patch if he wanted to. Thanks for interesting information. I also learned something new. :-) My first thought was, well, maybe someone writes '\0' to sb.buf[len]. That should intuitively be a no-op and "ok", but the documentation actually only says that it's safe to write to positions 0 .. len-1, so sb.buf[len] is supposedly not safe (no-op or not). Maybe a degenerate and rarely tested case of otherwise sane code could end up writing '\0' to slopbuf[0]. (Arguably strbuf_setlen should have been used instead.) I can see the value of placing the slopbuf in read-only memory, but I think that would be a follow-up patch with its own pros and cons. Martin