Re: Use tar to append?

"Mikkel L. Ellertson" <mikkel@xxxxxxxxxxxxxxxx> · Fri, 09 Mar 2007 14:57:20 -0600

Mike McCarty wrote:
> Mikkel L. Ellertson wrote:
> 
>> The problem is that the entire archive is compressed, and not the
> 
> [snip]
> 
> No, it is not. Please speak of what you know, rather than
> what you conjecture.[1]
> 
Are you saying that this is not how compressed tar archives are
created? If so, where did you get this? When you use the z option,
tar filters through gzip. If you use the j option, it filters through
bzip2. This is why you can convert a compressed tar archive to an
uncompressed archive with gzip or bzip2. For a large archive, this
takes a fair amount of time. When you want to add to the end of a
compressed archive, the entire archive has to be uncompressed, the
files added, and the result recompressed. This is also why it is so hard
to recover files from a damaged compressed archive that are after
the damaged section, or from a damaged tape built this way.
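
As a rough illustration of that uncompress / add / recompress cycle,
here is a minimal Python sketch using the standard tarfile and gzip
modules. The function name append_to_tar_gz is made up for this
example, and it assumes a gzip-compressed archive; it is not what tar
itself does internally, only the same three steps done by hand.

```python
import gzip
import io
import os
import shutil
import tarfile
import tempfile


def append_to_tar_gz(archive_path, member_name, data):
    """Append one in-memory file to a .tar.gz by rewriting the whole archive."""
    fd, plain_path = tempfile.mkstemp(suffix=".tar")
    os.close(fd)
    try:
        # Step 1: uncompress the entire archive to a plain tar file.
        with gzip.open(archive_path, "rb") as src, open(plain_path, "wb") as dst:
            shutil.copyfileobj(src, dst)

        # Step 2: append to the plain tar (append mode needs an
        # uncompressed archive).
        with tarfile.open(plain_path, "a") as tar:
            info = tarfile.TarInfo(name=member_name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

        # Step 3: compress the whole thing again, replacing the original.
        with open(plain_path, "rb") as src, gzip.open(archive_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    finally:
        os.remove(plain_path)
```

Note that every byte of the old archive passes through steps 1 and 3,
no matter how small the file being added is.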

Now, if you are restoring a file from the archive, instead of
adding to the archive, then you can stop uncompressing once you get
to the end of the file you are after.
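
The restore case can be sketched the same way. This assumes a
gzip-compressed archive and uses Python's tarfile module in its
forward-only stream mode ("r|gz"); extract_one is a hypothetical
helper name, not part of any library.

```python
import tarfile


def extract_one(archive_path, wanted):
    """Return the bytes of `wanted`, or None if it is not in the archive."""
    # "r|gz" reads the archive as a forward-only stream, much like a tape.
    with tarfile.open(archive_path, "r|gz") as tar:
        for member in tar:
            if member.name == wanted:
                # Found it: read the data and stop. Nothing after this
                # member is decompressed.
                return tar.extractfile(member).read()
    return None
```

Everything before the wanted file still has to be decompressed and
skipped over, so a file near the end of a large archive costs almost as
much as reading the whole thing.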

> I ran a test in which I created a tar archive, not compressed,
> but straight tar format. The size of the file was 3653601280 bytes.
> 
> I created another tar archive, also not compressed straight
> tar format. The size of that file was 10240 bytes. It took
> less than a second to create.
> 
> I then used "tar A" to append the tiny archive to the larger
> one, and the run time was 2:20; that is two minutes twenty
> seconds. During that time, my machine was approx. 80% wait state,
> and approx. 16% system state, per top.

Try the same thing with the archives compressed, and let me know how
long it takes.
> 
> It may be that, due to the format of a tar file, tar is
> architecturally constrained to do individual seeks for
> every file contained in the archive, and that doing
> thousands of seeks in this rather large file is necessarily
> time consuming. This archive has 34619 files in it. I watched
> the file system size using df, and found that it did not vary
> during this process, so I conjecture that this was seeks
> rather than copies.
> 
Well, tar was created for use with tape drives, with limited to no
seek capabilities. It may well be seeking from the start of the
archive for each file. I do know that when restoring one file from a
tape, it will start at the beginning of the tape archive, and search
through until it gets to the file it wants. This can be interesting
with multi-tape archives.
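
The reason for the search from the beginning is that a tar file has no
index: it is a run of 512-byte headers, each followed by that member's
data padded out to a multiple of 512 bytes. A minimal Python sketch of
walking those headers follows. walk_headers is a made-up name, and the
sketch assumes the plain octal size field and short names; it ignores
the long-name and large-file extensions.

```python
import io
import tarfile

BLOCK = 512  # tar stores everything in 512-byte blocks


def walk_headers(fileobj):
    """Yield (offset, name, size) for each entry, reading headers in order."""
    offset = 0
    while True:
        fileobj.seek(offset)
        header = fileobj.read(BLOCK)
        # A short read or an all-zero block marks the end of the archive.
        if len(header) < BLOCK or header == b"\0" * BLOCK:
            return
        # Name: first 100 bytes, NUL-terminated.
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8", "replace")
        # Size: 12 bytes at offset 124, octal digits (assumption: no
        # base-256 encoding, which some tars use for very large files).
        size = int(header[124:136].strip(b"\0 ") or b"0", 8)
        yield offset, name, size
        # Skip this header plus the data, rounded up to a whole block.
        offset += BLOCK + ((size + BLOCK - 1) // BLOCK) * BLOCK
```

To find where any one member lives, or where the archive ends, every
header before it has to be visited in turn. On a disk file that is one
seek per member; on a tape it means reading or spacing past all the
data as well.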

From the results you are seeing, it would appear that you are using
the wrong tool for the job you are trying to do.

Mikkel
-- 

  Do not meddle in the affairs of dragons,
for thou art crunchy and taste good with Ketchup!