Re: [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Dec 13, 2021 at 09:00:55AM -0500, Derrick Stolee wrote:
> I think the root cause is that the object order can change when the
> preferred pack changes with the same set of pack-files. Suppose we
> added more complicated ways of deduplicating objects across the packs?
> Then whatever we include here based on preferred packs and mtimes
> would need to be updated to match.
>
> However, if we store the contents of the .rev file in the MIDX itself,
> then we don't need that extra layer of indirection.

Yeah, towards the end of the weekend I started to lean towards putting
the whole object order inside of the MIDX.

You're definitely right that the root of the problem is the object order
is stored separately from the MIDX itself. And we probably could get
pretty creative and figure out what the minimal amount of information
needed to fingerprint an object order is. But the amount of time I spent
thinking about each proposal did not leave me with the obvious sense
that there still weren't ways around that could change the object order
without changing the MIDX's checksum.

> I'm leaning towards keeping the contents of the PORD chunk as-is, but
> renaming it to something like OORD (for object order). Then, we can
> carefully transition from using the .rev file to reading this chunk.
> We will want to continue looking for the .rev file when this chunk does
> not exist.

Agreed. What do you think about a series that just writes the chunk but
does not read it? That would solve the bug in the immediate-term. Then
we can go back and transition from the .rev file to the new chunk.

Reading the revindex chunk over the .rev file is pretty easy, as
demonstrated by the patch below.

What I'm less certain about is how we should write tests to make sure
the old strategy works, too. Do we want to run the MIDX bitmap tests
twice in a separate mode each (one that writes .rev files and one that
writes chuns in the MIDX)? Something else? Thoughts welcome.

--- 8< ---

diff --git a/pack-revindex.c b/pack-revindex.c
index 70d0fbafcb..d59ae73198 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -301,6 +301,20 @@ int load_midx_revindex(struct multi_pack_index *m)
 	if (m->revindex_data)
 		return 0;

+	if (m->chunk_revindex) {
+		/*
+		 * If the MIDX `m` has a `RIDX` chunk, then use its contents for
+		 * the reverse index instead of trying to load a separate `.rev`
+		 * file.
+		 *
+		 * Note that we do *not* set `m->revindex_map` here, since we do
+		 * not want to accidentally call munmap() in the middle of the
+		 * MIDX.
+		 */
+		m->revindex_data = (const uint32_t *)m->chunk_revindex;
+		return 0;
+	}
+
 	get_midx_rev_filename(&revindex_name, m);

 	ret = load_revindex_from_disk(revindex_name.buf,



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux