Currently, when pushing a pack to the server that has specified a pack
size limit, we don't detect that we exceed that limit until we have
already generated (and started transmitting) that much pack data.

Ideally, we should be able to predict the approximate pack size _before_
we start generating and transmitting the pack data, and abort early if
the estimated pack size exceeds the pack size limit.

This patch tries to provide such an estimate: It looks at the objects
that are to be included in the pack, and for already-packed objects, it
assumes that their compressed in-pack size is a good estimate of how
much they will contribute to the pack currently being generated. This
assumption should be valid as long as the objects are reused as-is.

For loose objects that are to be included in the pack, we currently have
no good estimate as to how much they will contribute to the pack size.
Since it's better to underestimate (because an overestimation will
prevent us from sending a pack that might actually be within the pack
size limit), we don't include loose objects at all in the pack size
estimate. This makes the estimate somewhat useless in common workflows
(where the push happens before (most of) the pushed objects are packed).

The estimate is generated before the "Compressing" and "Writing" phases
of the push, so if the estimate exceeds the pack size limit, we abort
before sending any pack data to the server. If the estimate turns out to
be too low (e.g. because we're pushing many loose objects), there is
still code in place to abort the push when we reach the pack size limit
during transmission.

Signed-off-by: Johan Herland <johan@xxxxxxxxxxx>
---

I'm not really happy with excluding loose objects in the pack size
estimate. However, the size contributed by loose objects varies wildly
depending on whether a (good) delta is found. Therefore, any estimate
done at an early stage is bound to be wildly inaccurate.
We could maybe use some sort of absolute minimum size per object
instead, but I thought I should publish this version before spending
more time futzing with it...

A drawback of not including loose objects in the pack size estimate is
that pushing loose objects is a very common use case (most people push
more often than they 'git gc'). However, for the pack sizes that servers
are most likely to refuse (hundreds of megabytes), most of those objects
will probably already be packed anyway (e.g. by 'git gc --auto'), so I
still hope the pack size estimate will be useful when it really matters.


...Johan

 builtin/pack-objects.c |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e226053..c0c6a0a 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1141,23 +1141,46 @@ static int pack_offset_sort(const void *_a, const void *_b)
 			(a->in_pack_offset > b->in_pack_offset);
 }
 
+static unsigned long estimate_packed_size(const struct object_entry *entry)
+{
+	unsigned long ret;
+	if (entry->in_pack) {
+		/* Assume that all packed objects are reused as-is */
+		struct revindex_entry *revidx = find_pack_revindex(
+						entry->in_pack,
+						entry->in_pack_offset);
+		return revidx[1].offset - entry->in_pack_offset;
+	}
+	return 0;
+}
+
 static void get_object_details(void)
 {
 	uint32_t i;
 	struct object_entry **sorted_by_offset;
+	unsigned long sum_size;
 
 	sorted_by_offset = xcalloc(nr_objects, sizeof(struct object_entry *));
 	for (i = 0; i < nr_objects; i++)
 		sorted_by_offset[i] = objects + i;
 	qsort(sorted_by_offset, nr_objects, sizeof(*sorted_by_offset), pack_offset_sort);
 
+	if (pack_to_stdout && pack_size_limit)
+		sum_size = sizeof(struct pack_header) + 20; /* pack overhead */
+
 	for (i = 0; i < nr_objects; i++) {
 		struct object_entry *entry = sorted_by_offset[i];
 		check_object(entry);
 		if (big_file_threshold <= entry->size)
 			entry->no_try_delta = 1;
+		if (pack_to_stdout && pack_size_limit &&
+		    !entry->preferred_base)
+			sum_size += estimate_packed_size(entry);
 	}
 
+	if (pack_to_stdout && pack_size_limit && sum_size > pack_size_limit)
+		die("estimated pack size exceeds the pack size limit (%lu bytes)",
+		    pack_size_limit);
+
 	free(sorted_by_offset);
 }
-- 
1.7.5.rc1.3.g4d7b
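
For illustration only (this is not part of the patch): the estimation
rule described above can be sketched as a standalone C fragment. All
names below (struct sketch_entry, the sketch_* functions, the
SKETCH_PACK_OVERHEAD macro) are hypothetical stand-ins, not git's real
data structures. The sketch assumes that the in-pack size of an object
is the distance from its offset to the offset of the next object in the
same pack, that loose objects and preferred bases contribute nothing,
and that a limit of 0 means "no limit".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for git's struct object_entry; not the real layout. */
struct sketch_entry {
	int in_pack;          /* nonzero if the object already lives in a pack */
	int preferred_base;   /* nonzero if the object is not actually sent */
	uint64_t offset;      /* start of the object in its existing pack */
	uint64_t next_offset; /* start of the following object in that pack */
};

/* 12-byte pack header plus 20-byte SHA-1 trailer. */
#define SKETCH_PACK_OVERHEAD (12 + 20)

/* Estimated contribution of one object to the outgoing pack. */
static uint64_t sketch_estimate_entry(const struct sketch_entry *e)
{
	/* Loose objects and preferred bases are deliberately counted as 0. */
	if (e->preferred_base || !e->in_pack)
		return 0;
	/* Assume the packed object is reused as-is. */
	return e->next_offset - e->offset;
}

/* Lower-bound estimate of the total pack size, including overhead. */
static uint64_t sketch_estimate_pack(const struct sketch_entry *entries,
				     size_t n)
{
	uint64_t sum = SKETCH_PACK_OVERHEAD;
	size_t i;

	assert(entries != NULL || n == 0);
	for (i = 0; i < n; i++)
		sum += sketch_estimate_entry(&entries[i]);
	return sum;
}

/* Returns 1 if the estimate exceeds a nonzero limit, 0 otherwise. */
static int sketch_exceeds_limit(const struct sketch_entry *entries,
				size_t n, uint64_t limit)
{
	return limit != 0 && sketch_estimate_pack(entries, n) > limit;
}
```

Because loose objects count as zero, the result is a lower bound: a
"does not exceed" answer says nothing definite, which is why the
existing abort during transmission has to stay in place.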