Re: [PATCH 08/11] streaming_write_entry(): support files with holes

Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:

> On Mon, May 16, 2011 at 9:39 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:
>>
>>> On Mon, May 16, 2011 at 7:30 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>>> One typical use of a large binary file is to hold a sparse on-disk hash
>>>> table with a lot of holes. Help preserving the holes with lseek().
>>>
>>> Should that be done only with big enough holes? Random zeros may
>>> increase the number of syscalls unnecessarily.
>>
>> I think that is a valid concern, but doesn't the code do that already?
>
> Ahh I see you only increase kept when the whole buf is zero. I was
> looking for an explicit threshold, but it's implicitly the buffer
> size.

Because the code works on 10k chunks and read_istream() does not give you
a short read, most of the time "kept" will only grow by a contiguous run of
NULs in 10k increments.  At the very end of the file, however, the code can
seek by less than the chunk size, as the check compares holeto with
readlen, not with sizeof(buf).
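
To illustrate the point about readlen, here is a minimal standalone sketch
of the pre-patch check. It is not the code in entry.c, and
hole_len_of_chunk() is a hypothetical name. Because the loop bound is the
length actually read, a short all-NUL chunk at the end of the file is
reported as a hole too:

```c
#include <stddef.h>

/*
 * Hypothetical helper mirroring the pre-patch scan: return len when
 * the first len bytes of buf are all NUL, otherwise 0.  The caller
 * passes the number of bytes actually read, not sizeof(buf).
 */
static size_t hole_len_of_chunk(const char *buf, size_t len)
{
	size_t holeto;

	for (holeto = 0; holeto < len; holeto++)
		if (buf[holeto])
			break;
	return holeto == len ? len : 0;
}
```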

We might want to make sizeof(buf) a multiple of the typical filesystem
block size (e.g. 16k, which is a multiple of 4k) to get better alignment.

Seeking to 10k and writing 2k on a filesystem with a 4k block size will
only make two blocks of hole, not two and a half, and we would be wasting
a seek in that case.
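
The arithmetic behind that can be sketched as below. This assumes the hole
starts at a block-aligned offset and that data is written right after it;
whether a filesystem really leaves those blocks unallocated is up to the
filesystem. Both function names are made up for illustration:

```c
#include <sys/types.h>

/* Whole blocks that fit inside a hole of hole_len bytes. */
static off_t whole_hole_blocks(off_t hole_len, off_t blksz)
{
	return hole_len / blksz;
}

/*
 * Bytes of the hole that spill into the next block.  That block also
 * receives the data written after the seek, so it has to be allocated
 * and these bytes save nothing.
 */
static off_t hole_tail_bytes(off_t hole_len, off_t blksz)
{
	return hole_len % blksz;
}
```

With a 4k block size, a 10240-byte hole gives 2 whole blocks plus a
2048-byte tail, while a 16384-byte hole gives 4 whole blocks and no tail.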

Also we may want to omit seeking for the last chunk that is smaller than
the chunk size.

Like this...

 entry.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/entry.c b/entry.c
index e063e72..f751c60 100644
--- a/entry.c
+++ b/entry.c
@@ -137,18 +137,20 @@ static int streaming_write_entry(struct cache_entry *ce, char *path,
 		goto close_and_exit;
 
 	for (;;) {
-		char buf[10240];
+		char buf[1024 * 16];
 		ssize_t wrote, holeto;
 		ssize_t readlen = read_istream(st, buf, sizeof(buf));
 
 		if (!readlen)
 			break;
-		for (holeto = 0; holeto < readlen; holeto++)
-			if (buf[holeto])
-				break;
-		if (readlen == holeto) {
-			kept += holeto;
-			continue;
+		if (sizeof(buf) == readlen) {
+			for (holeto = 0; holeto < readlen; holeto++)
+				if (buf[holeto])
+					break;
+			if (readlen == holeto) {
+				kept += holeto;
+				continue;
+			}
 		}
 
 		if (kept && lseek(fd, kept, SEEK_CUR) == (off_t) -1)
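
For anybody who wants to play with the idea outside of git, the loop can be
sketched on plain file descriptors roughly as follows. This is only an
illustration under assumptions, not what entry.c does: copy_sparse(),
read_full(), write_full() and CHUNK are hypothetical names, read_full()
stands in for read_istream() not returning short reads, and EINTR is not
handled.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK (16 * 1024)	/* assumed to be a multiple of the block size */

/* Read until buf is full or EOF; return bytes read, or -1 on error. */
static ssize_t read_full(int fd, char *buf, size_t len)
{
	size_t got = 0;

	while (got < len) {
		ssize_t n = read(fd, buf + got, len - got);
		if (n < 0)
			return -1;
		if (!n)
			break;
		got += (size_t)n;
	}
	return (ssize_t)got;
}

/* Write all len bytes; return 0 on success, -1 on error. */
static int write_full(int fd, const char *buf, size_t len)
{
	while (len) {
		ssize_t n = write(fd, buf, len);
		if (n <= 0)
			return -1;
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Copy "in" to "out", turning full all-NUL chunks into seeks. */
static int copy_sparse(int in, int out)
{
	char buf[CHUNK];
	off_t kept = 0;

	for (;;) {
		ssize_t holeto, readlen = read_full(in, buf, sizeof(buf));

		if (readlen < 0)
			return -1;
		if (!readlen)
			break;
		/* Only a full-sized chunk is considered for a hole. */
		if ((size_t)readlen == sizeof(buf)) {
			for (holeto = 0; holeto < readlen; holeto++)
				if (buf[holeto])
					break;
			if (readlen == holeto) {
				kept += holeto;
				continue;
			}
		}
		if (kept && lseek(out, kept, SEEK_CUR) == (off_t)-1)
			return -1;
		kept = 0;
		if (write_full(out, buf, (size_t)readlen))
			return -1;
	}
	/* A trailing hole needs one real byte so the file gets its full size. */
	if (kept && (lseek(out, kept - 1, SEEK_CUR) == (off_t)-1 ||
		     write_full(out, "", 1)))
		return -1;
	return 0;
}
```

The output descriptor has to be seekable and freshly created (or
truncated) for the skipped ranges to read back as NULs.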

