Re: [GSoC14][RFC] Proposal Draft: Refactor tempfile handling

On 03/01/2014 10:04 PM, Brian Gesiak wrote:
> Hello all,
> 
> My name is Brian Gesiak. I'm a research student at the University of
> Tokyo, and I'm hoping to participate in this year's Google Summer of
> Code by contributing to Git. I'm a longtime user, first-time
> contributor--some of you may have noticed my "microproject"
> patches.[1][2]
> 
> I'd like to gather some information on one of the GSoC ideas posted on
> the ideas page. Namely, I'm interested in refactoring the way
> tempfiles are cleaned up.
> 
> The ideas page points out that while lock files are closed and
> unlinked[3] when the program exits[4], object pack files implement
> their own brand of temp file creation and deletion. This
> implementation doesn't share the same guarantees as lock files--it is
> possible that the program terminates before the temp file is
> unlinked.[5]
> 
> Lock file references are stored in a linked list. When the program
> exits, this list is traversed and each file is closed and unlinked. It
> seems to me that this mechanism is appropriate for temp files in
> general, not just lock files. Thus, my proposal would be to extract
> this logic into a separate module--tempfile.h, perhaps. Lock and
> object files would share the tempfile implementation.
> 
> That is, both object and lock temp files would be stored in a linked
> list, and all of these would be deleted at program exit.
> 
> I'm very enthused about this project--I think it has it all:
> 
> - Tangible benefits for the end-user
> - Reduced complexity in the codebase
> - Ambitious enough to be interesting
> - Small enough to realistically be completed in a summer
> 
> Please let me know if this seems like it would make for an interesting
> proposal, or if perhaps there is something I am overlooking. Any
> feedback at all would be appreciated. Thank you!

Hi Brian,

Thanks for your proposal.  I have a technical point that I think your
proposal should address:

Currently the linked list of lockfiles only grows, never shrinks.  Once
an object has been linked into the list, there is no way to remove it
again even after the lock has been released.  So if a lock needs to be
created dynamically at an arbitrary point in the code, its memory is
unavoidably leaked.

This hasn't been much of a problem in the past because (1) the number of
locks acquired/released during a Git invocation is reasonably small, and (2) a
lock object (even if it is already in the list) can be reused after the
lock has been released.  So there are many lock callsites that define
one static lock instance and use it over and over again.
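
A rough sketch of the pattern described above may help make it concrete.
All names here (tmp_record, record_activate, and so on) are hypothetical
and are not the code in lockfile.c; the sketch only assumes a grow-only
list whose entries carry an "active" flag and are cleaned up at exit.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names; this is not the code in lockfile.c. */
struct tmp_record {
    struct tmp_record *next;
    volatile sig_atomic_t active; /* nonzero while path still needs cleanup */
    int on_list;                  /* set on first use, never cleared */
    char path[4096];
};

static struct tmp_record *record_list;

/* Runs at exit: remove every file whose record is still active. */
static void remove_active_records(void)
{
    struct tmp_record *r;

    for (r = record_list; r; r = r->next) {
        if (r->active) {
            unlink(r->path);
            r->active = 0;
        }
    }
}

/* Mark r as guarding path. Returns 0 on success, -1 if path is too long. */
static int record_activate(struct tmp_record *r, const char *path)
{
    size_t len = strlen(path);

    assert(!r->active); /* a record guards one file at a time */
    if (len >= sizeof(r->path))
        return -1;
    memcpy(r->path, path, len + 1);
    if (!r->on_list) {
        if (!record_list)
            atexit(remove_active_records); /* first record ever linked */
        r->next = record_list;
        record_list = r;
        r->on_list = 1;
    }
    r->active = 1;
    return 0;
}

/* The caller has already renamed or deleted the file. */
static void record_release(struct tmp_record *r)
{
    r->active = 0; /* the record itself stays on the list */
}

/* Number of records ever linked in; it never decreases. */
static size_t record_list_length(void)
{
    size_t n = 0;
    struct tmp_record *r;

    for (r = record_list; r; r = r->next)
        n++;
    return n;
}
```

A callsite that keeps one static struct tmp_record can activate and
release it any number of times and the list stays at length one.  A
record obtained from malloc() can never be freed once it has been
activated, because the list still points at it.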

But I have a feeling that if we want to use a similar mechanism to
handle all temporary files (of which there can be more), then it would
be a good idea to lift this limitation.  It will require some care,
though, to make sure that record removal is done in a way that is
thread-safe and safe in the event of all expected kinds of process death.
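
One possible shape for record removal is sketched below.  It is only a
sketch under stated assumptions: the names (tmp_node, node_register,
node_unregister) are hypothetical, the program is assumed to be
single-threaded, and signals are blocked with sigprocmask() while the
list is modified so that a cleanup handler never walks a half-updated
list.  A threaded program would need more than this (for example a mutex
plus pthread_sigmask()), and nothing can be done about SIGKILL.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical record type; names are illustrative only. */
struct tmp_node {
    struct tmp_node *volatile next;
    char *path; /* owned by the node */
};

static struct tmp_node *volatile node_list;
static int handlers_installed;

/* Normal exit: delete whatever is still registered. */
static void cleanup_at_exit(void)
{
    struct tmp_node *n;

    for (n = node_list; n; n = n->next)
        unlink(n->path);
}

/* Signal death: uses only async-signal-safe calls, then re-raises. */
static void cleanup_from_signal(int signo)
{
    struct tmp_node *n;

    for (n = node_list; n; n = n->next)
        unlink(n->path);
    signal(signo, SIG_DFL);
    raise(signo);
}

static void install_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = cleanup_from_signal;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, NULL);
    sigaction(SIGTERM, &sa, NULL);
    atexit(cleanup_at_exit);
    handlers_installed = 1;
}

/* Block all signals, saving the previous mask in *old. */
static void block_signals(sigset_t *old)
{
    sigset_t all;

    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, old);
}

/* Register path for cleanup; returns NULL if allocation fails. */
static struct tmp_node *node_register(const char *path)
{
    size_t len = strlen(path) + 1;
    struct tmp_node *n = malloc(sizeof(*n));
    sigset_t old;

    if (!n || !(n->path = malloc(len))) {
        free(n);
        return NULL;
    }
    memcpy(n->path, path, len);
    if (!handlers_installed)
        install_handlers();
    block_signals(&old);
    n->next = node_list; /* node is complete before it becomes visible */
    node_list = n;
    sigprocmask(SIG_SETMASK, &old, NULL);
    return n;
}

/* Unlink n from the list and free it; the file is already gone or renamed. */
static void node_unregister(struct tmp_node *n)
{
    struct tmp_node *volatile *pp;
    sigset_t old;

    block_signals(&old);
    for (pp = &node_list; *pp; pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            break;
        }
    }
    sigprocmask(SIG_SETMASK, &old, NULL);
    free(n->path); /* safe: no handler can reach n any more */
    free(n);
}
```

The memory is freed only after the node has been unlinked and the signal
mask restored, so a handler that runs at that point walks a list that no
longer contains the node.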

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/