On Fri, Jan 18, 2019 at 8:10 AM Duy Nguyen <pclouds@xxxxxxxxx> wrote:
>
> On Fri, Jan 18, 2019 at 8:04 PM Patrick Hogg <phogg@xxxxxxxxxxxx> wrote:
> >
> > On Fri, Jan 18, 2019 at 4:21 AM Duy Nguyen <pclouds@xxxxxxxxx> wrote:
> >>
> >> On Fri, Jan 18, 2019 at 9:28 AM Patrick Hogg <phogg@xxxxxxxxxxxx> wrote:
> >> >
> >> > ac77d0c37 ("pack-objects: shrink size field in struct object_entry",
> >> > 2018-04-14) added an extra usage of read_lock/read_unlock in the newly
> >> > introduced oe_get_size_slow for thread safety in parallel calls to
> >> > try_delta(). Unfortunately oe_get_size_slow is also used in serial
> >> > code, some of which is called before the first invocation of
> >> > ll_find_deltas. As such the read mutex is not guaranteed to be
> >> > initialized.
> >>
> >> This must be the SIZE() macros in type_size_sort(), isn't it? I think
> >> we hit the same problem (use of uninitialized mutex) in this same code
> >> not long ago. I wonder if there's any way we can reliably test and
> >> catch this.
> >
> > It was actually the SET_SIZE macro in check_object, at least for the repo at my company that hits this issue. I took a look at the call tree for oe_get_size_slow and found that it's used in many places outside of ll_find_deltas, so there are many potential call sites where this could crop up:
> >
> > [snip]
> >
>
> Ah, yes. I think the only problematic place is from prepare_pack().
> The single threaded access after ll_find_deltas() is fine because we
> never destroy mutexes.

I'm a bit confused: I see calls to pthread_mutex_destroy in cleanup_threaded_search. It's true that only prepare_packing_data(&to_pack) is called, and there is no cleanup of the to_pack instance in cmd_pack_objects (at least as far as I can see), but aren't the threaded_search mutexes destroyed?

>
> > (Sorry if this is redundant for those who know the code better)
>
> Actually it's me to say sorry.
> I apparently did not know the code flow
> good enough to prevent this problem in the first place.
>
> >> > Resolve this by splitting off the read mutex initialization from
> >> > init_threaded_search. Instead initialize (and clean up) the read
> >> > mutex in cmd_pack_objects.
> >>
> >> Maybe move the mutex to 'struct packing_data' and initialize it in
> >> prepare_packing_data(), so we centralize mutex at two locations:
> >> generic ones go there, command-specific mutexes stay here in
> >> init_threaded_search(). We could also move oe_get_size_slow() back to
> >> pack-objects.c (the one outside builtin/).
> >
> > I was already thinking that generic mutexes should be separated from command-specific ones (that's why I introduced init_read_mutex and cleanup_read_mutex, but that may well not be the right exposure). I'll try my hand at this tonight (just moving the mutex to struct packing_data and initializing it in prepare_packing_data; I'll leave large code moves to the experts) and see how it turns out.
>
> Yes, leave the code move for now. Bug fixes stay small and simple (and
> get merged faster)

I was looking at this and noticed that packing_data already has a lock mutex member. Perhaps I am missing something, but would it be appropriate to drop read_mutex altogether, make lock a recursive mutex, and use it in read_lock()/read_unlock() instead? (Or even call packing_data_lock/packing_data_unlock directly instead of read_lock/read_unlock? Strictly speaking it would then be a pack lock rather than a read lock, so the read_lock/read_unlock naming would no longer be accurate.)

I have a local change that moves read_mutex into the packing_data struct (and renames it to read_lock to be consistent with the "lock" member), but it seems redundant. (And the lock member is only used in oe_set_delta_size.)

> --
> Duy
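For illustration, here is a minimal standalone sketch of the idea above: one generic lock living in the packing data, initialized as a recursive mutex in the prepare step so that serial callers (which run before ll_find_deltas) and threaded callers both see an initialized mutex. It assumes plain POSIX threads only. struct packing_data_sketch, prepare_packing_data_sketch(), pack_lock_sketch(), get_size_slow_sketch(), nested_size_lookup_sketch() and clear_packing_data_sketch() are hypothetical stand-ins, not git's actual code; get_size_slow_sketch() only mimics the locking shape of oe_get_size_slow(), and the returned size is a placeholder.

```c
#define _XOPEN_SOURCE 700 /* expose PTHREAD_MUTEX_RECURSIVE under strict -std modes */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct packing_data: one generic, shared lock. */
struct packing_data_sketch {
	pthread_mutex_t lock;
	int lock_initialized;
};

/* Create a mutex that the owning thread may lock again without deadlocking. */
static int init_recursive_mutex_sketch(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;
	int ret = pthread_mutexattr_init(&attr);

	if (ret)
		return ret;
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
	if (!ret)
		ret = pthread_mutex_init(m, &attr);
	pthread_mutexattr_destroy(&attr);
	return ret;
}

/*
 * Analogue of prepare_packing_data(): it runs before any serial size
 * lookup or delta-search thread, so the lock is initialized before use.
 */
static void prepare_packing_data_sketch(struct packing_data_sketch *pdata)
{
	if (init_recursive_mutex_sketch(&pdata->lock)) {
		fprintf(stderr, "could not initialize pack lock\n");
		exit(1);
	}
	pdata->lock_initialized = 1;
}

static void pack_lock_sketch(struct packing_data_sketch *pdata)
{
	int ret = pthread_mutex_lock(&pdata->lock);
	assert(ret == 0);
	(void)ret;
}

static void pack_unlock_sketch(struct packing_data_sketch *pdata)
{
	int ret = pthread_mutex_unlock(&pdata->lock);
	assert(ret == 0);
	(void)ret;
}

/* Shaped like oe_get_size_slow(): takes the lock around the lookup. */
static unsigned long get_size_slow_sketch(struct packing_data_sketch *pdata)
{
	unsigned long size;

	pack_lock_sketch(pdata);
	size = 42; /* placeholder for the real object database lookup */
	pack_unlock_sketch(pdata);
	return size;
}

/*
 * Hypothetical caller that already holds the lock and then needs a slow
 * size lookup. Safe only because the mutex is recursive; relocking a
 * default mutex would deadlock or be undefined behavior.
 */
static unsigned long nested_size_lookup_sketch(struct packing_data_sketch *pdata)
{
	unsigned long size;

	pack_lock_sketch(pdata);
	size = get_size_slow_sketch(pdata);
	pack_unlock_sketch(pdata);
	return size;
}

/* Destroy the lock once, after all users (serial and threaded) are done. */
static void clear_packing_data_sketch(struct packing_data_sketch *pdata)
{
	if (pdata->lock_initialized) {
		pthread_mutex_destroy(&pdata->lock);
		pdata->lock_initialized = 0;
	}
}
```

Calling prepare_packing_data_sketch() once up front makes both get_size_slow_sketch() and nested_size_lookup_sketch() safe to call from serial code, and clear_packing_data_sketch() tears the lock down only at the very end. One trade-off of a recursive mutex is that it silently tolerates accidental re-entry instead of exposing it, so it is probably worth keeping its use narrow.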