Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:

> On Mon, May 16, 2011 at 9:39 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:
>>
>>> On Mon, May 16, 2011 at 7:30 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>>> One typical use of a large binary file is to hold a sparse on-disk hash
>>>> table with a lot of holes. Help preserving the holes with lseek().
>>>
>>> Should that be done only with big enough holes? Random zeros may
>>> increase the number of syscalls unnecessarily.
>>
>> I think that is a valid concern, but doesn't the code do that already?
>
> Ahh I see you only increase kept when the whole buf is zero. I was
> looking for an explicit threshold, but it's implicitly the buffer
> size.

Because the code works on 10k chunks and read_istream() does not give you
a short read, most of the time "kept" will only grow by contiguous runs of
NULs in 10k increments.  At the very end of the file, however, the code
can seek by less than the chunk size, as the check is done by comparing
holeto with readlen, not with sizeof(buf).

We might want to make sizeof(buf) a multiple of the typical filesystem
block size (e.g. 16k) to get better alignment.  Seeking to 10k and writing
2k on a filesystem with a 4k block size will only make two blocks of hole,
not two and a half, and we would be wasting a seek in that case.

Also we may want to omit seeking for the last chunk that is smaller than
the chunk size.  Like this...

 entry.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/entry.c b/entry.c
index e063e72..f751c60 100644
--- a/entry.c
+++ b/entry.c
@@ -137,18 +137,20 @@ static int streaming_write_entry(struct cache_entry *ce, char *path,
 		goto close_and_exit;
 
 	for (;;) {
-		char buf[10240];
+		char buf[1024 * 16];
 		ssize_t wrote, holeto;
 		ssize_t readlen = read_istream(st, buf, sizeof(buf));
 
 		if (!readlen)
 			break;
-		for (holeto = 0; holeto < readlen; holeto++)
-			if (buf[holeto])
-				break;
-		if (readlen == holeto) {
-			kept += holeto;
-			continue;
+		if (sizeof(buf) == readlen) {
+			for (holeto = 0; holeto < readlen; holeto++)
+				if (buf[holeto])
+					break;
+			if (readlen == holeto) {
+				kept += holeto;
+				continue;
+			}
 		}
 
 		if (kept && lseek(fd, kept, SEEK_CUR) == (off_t) -1)
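
For anyone who wants to play with the idea outside entry.c, here is a
rough standalone sketch of the same loop shape.  It is not the code in
the patch: write_sparse(), write_all(), all_zero() and CHUNK are
hypothetical names, the input is an in-memory buffer rather than an
istream, and finishing a trailing hole with ftruncate() is my own
assumption about one way to do it.  Only full CHUNK-sized runs of NULs
become seeks; a short final chunk is always written.

```c
#include <assert.h>
#include <fcntl.h>	/* open() flags, for callers */
#include <sys/types.h>
#include <unistd.h>

/* Assumed chunk size: a multiple of common 4k filesystem blocks. */
#define CHUNK ((size_t)16 * 1024)

/* Return 1 if all n bytes of buf are NUL, 0 otherwise. */
static int all_zero(const char *buf, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (buf[i])
			return 0;
	return 1;
}

/* Write all n bytes, retrying on short writes; -1 on error. */
static int write_all(int fd, const char *buf, size_t n)
{
	while (n) {
		ssize_t w = write(fd, buf, n);

		if (w <= 0)
			return -1;
		buf += w;
		n -= (size_t)w;
	}
	return 0;
}

/*
 * Hypothetical helper: copy len bytes from src to fd at the current
 * offset, turning each full CHUNK of NULs into a pending seek.
 * Returns 0 on success, -1 on error.  If nseeks is non-NULL it
 * receives the number of lseek() calls that were made.
 */
static int write_sparse(int fd, const char *src, size_t len, int *nseeks)
{
	off_t kept = 0;
	int seeks = 0;

	assert(fd >= 0);
	while (len) {
		size_t n = len < CHUNK ? len : CHUNK;

		if (n == CHUNK && all_zero(src, n)) {
			kept += (off_t)n;	/* defer: maybe a hole */
		} else {
			if (kept) {
				if (lseek(fd, kept, SEEK_CUR) == (off_t)-1)
					return -1;
				seeks++;
				kept = 0;
			}
			if (write_all(fd, src, n) < 0)
				return -1;
		}
		src += n;
		len -= n;
	}
	if (kept) {
		/* Trailing hole: nothing follows, so extend the file. */
		off_t end = lseek(fd, kept, SEEK_CUR);

		if (end == (off_t)-1 || ftruncate(fd, end) < 0)
			return -1;
		seeks++;
	}
	if (nseeks)
		*nseeks = seeks;
	return 0;
}
```

A caller would open() the destination and pass the descriptor along
with the data.  Whether the skipped ranges end up as real holes on disk
is up to the filesystem; the sketch only guarantees that the file
contents and length come out the same as a plain write.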