Re: GSoC - Some questions on the idea of

On 3/30/2012 3:34 PM, Jeff King wrote:
On Fri, Mar 30, 2012 at 03:51:20PM -0400, Bo Chen wrote:

The sub-problems of "delta for large file" problem.

1 large file

Note that there are other problem areas with big files that can be
worked on, too. For example, some people want to store 100 gigabytes
in a repository.

I take it that you have in mind a 100G set of files comprised entirely
of big-files that cannot be logically separated into smaller submodules?

My understanding is that a main strategy for "big files" is to separate
your big-files logically into their own submodule(s) to keep them from
bogging down the not-big-file repo(s).

Is one of the goals of big-file support to let users plan submodules
around logical file groupings alone, without having to consider
big-file groupings? A big-file grouping is not necessarily a logical
grouping; it may be a technical subset of a logical grouping, split
out only because of big-file performance considerations.

IOW, is the goal of big-file support to make big files "just work", so
that users don't have to think about graphics files, binaries, etc.,
and can just treat them like everything else? Obviously, a 100G
database file will be a "big file" for the foreseeable future, but a
0.5G graphics file is not a "big file" generally speaking (as opposed
to git-speaking).

Because git is distributed, that means 100G in the repo database,
and 100G in the working directory, for a total of 200G.

I take it that you are implying that the 100G object-store size is
because binary files cannot be (or are not) compressed well?

People in this situation may want to be able to store part of the
repository database in a network-accessible location, trading some
of the convenience of being fully distributed for the space savings.
So another project could be designing a network-based alternate
object storage system.

I take it you are implying a local area network, with users' git repos
on workstations?

As for "network-based alternate objects" that are in fact on the
internet: they would first need to be cloned onto the local area
network. Or are you imagining that this would also work for alternate
objects hosted on the internet?

In some setups, users log in to a Linux server and keep all their repos
there. The "alternate objects" do not need to be network-based in that
case. They are "local", but local does not mean 20 people cloning the
alternate objects to their workstations. It means one copy of the
alternate objects, and twenty repos referencing that one copy.
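
To make that last setup concrete, here is a rough Python sketch of how
a repo can reference one shared object store through git's alternates
file ($GIT_DIR/objects/info/alternates, one object-directory path per
line). The function name add_alternate and the directory names are made
up for illustration. The sketch only writes the pointer file; it does
not create real repositories or check that the shared store actually
holds the objects a repo needs.

```python
import os
import tempfile


def add_alternate(repo_git_dir, shared_objects_dir):
    """Record shared_objects_dir in the repo's objects/info/alternates file.

    Hypothetical helper for illustration: it appends the absolute path of
    the shared object directory unless it is already listed, and returns
    the path of the alternates file.
    """
    info_dir = os.path.join(repo_git_dir, "objects", "info")
    os.makedirs(info_dir, exist_ok=True)
    alternates = os.path.join(info_dir, "alternates")

    # Keep any entries that are already there.
    entries = []
    if os.path.exists(alternates):
        with open(alternates) as f:
            entries = [line.strip() for line in f if line.strip()]

    path = os.path.abspath(shared_objects_dir)
    if path not in entries:
        entries.append(path)

    with open(alternates, "w") as f:
        f.write("\n".join(entries) + "\n")
    return alternates


if __name__ == "__main__":
    # Assumed layout: one shared store and twenty per-user repos on one server.
    with tempfile.TemporaryDirectory() as server:
        shared = os.path.join(server, "shared.git", "objects")
        os.makedirs(shared)
        for n in range(1, 21):
            git_dir = os.path.join(server, "user%02d" % n, "project", ".git")
            add_alternate(git_dir, shared)
        print("20 repos now point at", shared)
```

In practice "git clone --reference <repo>" writes this file for you.
The point is only that each of the twenty repos holds a one-line
pointer to the shared objects rather than its own copy of them; each
user's working directory still takes its own space.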

v/r,
neal
