On 03/08/2015 12:23 AM, brian m. carlson wrote:
> This is a patch series to convert some of the relevant uses of unsigned
> char [20] to struct object_id.
>
> The goal of this series to improve type-checking in the codebase and to
> make it easier to move to a different hash function if the project
> decides to do that. This series does not convert all of the codebase,
> but only parts. I've dropped some of the patches from earlier (which no
> longer apply) and added others.
>
> Certain parts of the code have to be converted before others to keep the
> patch sizes small, maintainable, and bisectable, so functions and
> structures that are used across the codebase (e.g. struct object) will
> be converted later. Conversion has been done in a somewhat haphazard
> manner by converting modules with leaf functions and less commonly used
> structs first.
>
> Since part of the goal is to ease a move to a different hash function if
> the project decides to do that, I've adopted Michael Haggerty's
> suggestion of using variables named "oid", naming the structure member
> "sha1", and using GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ as the constants.
>
> I've been holding on to this series for a long time in hopes of
> converting more of the code before submitting, but realized that this
> deprived others of the ability to use the new structure and required me
> to rebase extremely frequently.
> [...]

I've added CC to several people who commented on v1.

I think this is a really interesting project and I hope that it works
out. In my opinion, the biggest risk (aside from the sheer amount of
work required) is the issue that was brought up on the mailing list when
you submitted v1 [1]:

Converting an arbitrary (unsigned char *) to a (struct object_id *) is
not allowed, because the alignment requirements of the latter might be
stricter than those of the former.
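
To make the alignment concern concrete, here is a minimal sketch of how
one could inspect it. It assumes that struct object_id is a bare wrapper
around a GIT_SHA1_RAWSZ-byte array; the two helper functions are
hypothetical and exist only for illustration. Many common ABIs will
report an alignment of 1 for such a struct, but the C standard leaves
that to the implementation, which is the whole problem.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define GIT_SHA1_RAWSZ 20

/* Assumed layout: nothing but the raw hash bytes. */
struct object_id {
	unsigned char sha1[GIT_SHA1_RAWSZ];
};

/* The struct may be padded, but it can never be smaller than the hash. */
static_assert(sizeof(struct object_id) >= GIT_SHA1_RAWSZ,
	      "object_id must hold a full raw SHA-1");

/* Hypothetical helper: alignment the compiler requires for the struct. */
size_t object_id_alignment(void)
{
	return alignof(struct object_id);
}

/*
 * Hypothetical helper: nonzero if p looks suitably aligned for a
 * struct object_id. Testing the integer value of a pointer is
 * implementation-defined, but it works on common flat-address targets.
 */
int is_aligned_for_oid(const void *p)
{
	return ((uintptr_t)p % alignof(struct object_id)) == 0;
}
```

If object_id_alignment() returns anything greater than 1 on some
platform, then a pointer into the middle of a byte buffer is not
guaranteed to pass is_aligned_for_oid(), and the cast is unsafe there.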
This means that if, for example, we are reading some data from disk or
from the network, and we expect the 20 bytes starting with byte number
17 to be a SHA-1 in binary format, we used to be able to do

    unsigned char *sha1 = buf + 17;

and use sha1 like any other SHA-1, without the need for any copying. But
we can't do the analogous

    struct object_id *oid = (struct object_id *)(buf + 17);

because the alignment is not necessarily correct.

So in a pure "struct object_id" world, I think we would be forced to
change such code to

    struct object_id oid;
    hashcpy(oid.sha1, buf + 17);

This uses additional memory and introduces copying overhead. Also, the
lifetime of oid might exceed the scope of a function, so oid might have
to be allocated on the heap and later freed. This adds more
computational overhead, more memory overhead, and more programming
effort to get it all right.

So as much as I like the idea of wrapping SHA-1s in objects, I think it
would be a good idea to first make sure that we can all agree on a plan
for dealing with situations like this. I can think of the following
possibilities:

1. Maybe code that needs to handle SHA-1s with screwy alignment does not
   exist, or maybe it does the copying anyway. I haven't actually
   checked.

2. Maybe somebody can prove that

       struct object_id *oid = (struct object_id *)(buf + 17);

   somehow *is* allowed by the C standard, or at least that it is
   sufficiently portable for our purposes.

3. We accept the additional runtime costs and development effort for the
   extra copies. To accept this approach, we would need some idea of
   which areas of the code will be affected, and some estimate of how
   much additional memory it would cost.

4. We continue to support working with SHA-1s declared to be
   (unsigned char *) in some performance-critical code, even as we
   migrate most other code to using SHA-1s embedded within a
   (struct object_id). This will cost some duplication of code.
   To accept this approach, we would need an idea of *how much* code
   duplication would be needed. E.g., how many functions will need both
   (unsigned char *) versions and (struct object_id *) versions?

5. We only make the change opportunistically. Whenever we find a
   function that needs to work with non-struct-aligned SHA-1s, we leave
   the function as-is rather than converting it or creating a second
   version that works with object_id objects. This approach would leave
   the codebase somewhat schizophrenic.

I'm not trying to dissuade you from this project, but I think that for
your project to have a chance of success, we need an answer to this
question. So let's confront it now rather than after you have invested
lots more time and/or gotten patches merged.

Michael

[1] http://thread.gmane.org/gmane.comp.version-control.git/248054

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
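
For reference, the copying approach from possibility 3 above could be
sketched roughly as follows. This is only an illustration under stated
assumptions: struct object_id is assumed to be a bare wrapper around the
raw bytes, hashcpy() here is a local stand-in for git's function of the
same name, and read_oid_at() is a hypothetical helper that does not
exist in the codebase.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GIT_SHA1_RAWSZ 20

/* Assumed layout: nothing but the raw hash bytes. */
struct object_id {
	unsigned char sha1[GIT_SHA1_RAWSZ];
};

/* Local stand-in for git's hashcpy(): copy one raw SHA-1. */
static void hashcpy(unsigned char *dst, const unsigned char *src)
{
	memcpy(dst, src, GIT_SHA1_RAWSZ);
}

/*
 * Hypothetical helper: copy the SHA-1 that starts at buf + offset into
 * *oid instead of casting the buffer pointer. Returns 0 on success, or
 * -1 if fewer than GIT_SHA1_RAWSZ bytes remain at that offset.
 */
int read_oid_at(struct object_id *oid, const unsigned char *buf,
		size_t len, size_t offset)
{
	assert(oid && buf);
	if (offset > len || len - offset < GIT_SHA1_RAWSZ)
		return -1;
	/* The copy is what buys us a correctly aligned object. */
	hashcpy(oid->sha1, buf + offset);
	return 0;
}
```

A caller would write read_oid_at(&oid, buf, len, 17) where the old code
wrote "unsigned char *sha1 = buf + 17;". The price is the 20-byte copy
plus storage for oid, which is exactly the overhead in question.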