[Bug 15910] zero-length files and performance degradation

https://bugzilla.kernel.org/show_bug.cgi?id=15910

--- Comment #8 from Guillem Jover <guillem@xxxxxxxxxxx>  2010-05-10 17:23:31 ---
(In reply to comment #5)
> Why not unpack all of the files as "foo.XXXXXX" (where XXXXXX is a
> mkstemp filename template), do a sync call (which on Linux is
> synchronous and won't return until all the files have been written),
> and only then rename the files? That's going to be the fastest and
> most efficient way to guarantee safety under Linux; the downside is
> that you need enough free space to store the old and the new files
> of the package simultaneously. But this is also a win, because it
> means you don't actually start overwriting files in a package until
> you know that the package installation is most likely going to
> succeed. (Well, it could fail in the postinstall script, but at
> least you don't have to worry about disk full errors.)
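
In rough C, the scheme quoted above would look something like the
sketch below. It is illustrative only, not dpkg code: error handling
is minimal, the payload is passed in as a buffer, and the final
sync()+rename() pass is shown as a comment.

  #include <limits.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  /* Write one file's contents into a mkstemp() temporary next to
   * 'path'; the generated name is returned in 'tmp'. */
  static int extract_to_temp(const char *path, const void *data,
                             size_t len, char tmp[PATH_MAX])
  {
      int fd;

      snprintf(tmp, PATH_MAX, "%s.XXXXXX", path);
      fd = mkstemp(tmp);
      if (fd < 0)
          return -1;
      if (write(fd, data, len) != (ssize_t)len) {
          close(fd);
          unlink(tmp);
          return -1;
      }
      if (close(fd) < 0) {
          unlink(tmp);
          return -1;
      }
      return 0;
  }

  /* After every file in the package has been extracted:
   *
   *     sync();                 on Linux, blocks until data is on disk
   *     for each (path, tmp):
   *         rename(tmp, path);  atomic replace of the old file
   */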

Ah, I forgot to mention that we also discussed using sync(), but the
problem, as you say, is that sync() is not portable, so we need the
deferred fsync() and rename() code anyway for unpacked files on
non-Linux systems. Another possible issue is that if there's been lots
of I/O in parallel or just before running dpkg, the sync() might take
much longer than expected; then again, depending on the implementation,
fsync() might show similar slowdowns anyway (though not if the other
I/O was on a different "disk" and file system).

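For comparison, the portable per-file sequence that the deferred
fsync() code boils down to is roughly the following sketch (again
illustrative, not the actual dpkg source; whether the descriptor is
kept open or reopened for the flush is an implementation detail, here
it is reopened):

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  /* Flush 'newpath' (already fully written, e.g. "foo.dpkg-new") to
   * disk, then atomically rename it over 'path'. */
  static int fsync_and_rename(const char *newpath, const char *path)
  {
      int fd;

      fd = open(newpath, O_RDONLY);
      if (fd < 0)
          return -1;
      if (fsync(fd) < 0) {            /* force data to stable storage */
          close(fd);
          return -1;
      }
      close(fd);
      return rename(newpath, path);   /* atomic replace */
  }
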
Regarding the downsides and wins you mention, they already apply to the
current implementation. As I mentioned before, dpkg has always supported
rolling back, by making a hardlinked backup of the old file as .dpkg-tmp,
extracting the new file as .dpkg-new and then doing an atomic rename()
over the current file; in case of error (from dpkg itself or the
appropriate maintainer script) it restores all the old file backups for
the package (either in the current run or in a subsequent dpkg run).
Only once the unpack stage has been successful does it remove the
backups in one pass. So the need for rollback already makes dpkg take
(approximately) twice the space per package, and thus there are no
unsafe overwrites that cannot be reverted (except for the zero-length
ones).
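
Stripped of all the cases the real code has to handle (directories,
symlinks, conffiles and so on), that backup/replace dance is roughly
this sketch:

  #include <errno.h>
  #include <limits.h>
  #include <stdio.h>
  #include <unistd.h>

  /* Take a hardlinked backup of the old file as "path.dpkg-tmp", then
   * atomically move the already-extracted "path.dpkg-new" over the
   * current file, restoring the backup on error. */
  static int replace_file(const char *path)
  {
      char tmp[PATH_MAX], newf[PATH_MAX];

      snprintf(tmp, sizeof(tmp), "%s.dpkg-tmp", path);
      snprintf(newf, sizeof(newf), "%s.dpkg-new", path);

      unlink(tmp);
      if (link(path, tmp) < 0 && errno != ENOENT)
          return -1;                /* ENOENT: no old file to back up */

      if (rename(newf, path) < 0) {
          rename(tmp, path);        /* roll back to the old file */
          return -1;
      }
      return 0;  /* the .dpkg-tmp backups are removed later in one pass */
  }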

I've now added the conditional code for Linux to do the sync() and then
rename() all files in one pass; it's just a few lines of code, thanks
to the deferred fsync() changes which are now in place. I'll request
some testing from ext4 users, and if it improves things and does not
make matters worse on ext3 and other file systems, then I guess we
might use that on Linux. It still looks like a workaround to me.
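
The Linux-conditional variant amounts to little more than the sketch
below (the file list handling is made up for illustration):

  #include <limits.h>
  #include <stdio.h>
  #include <unistd.h>

  /* Once every file of the package sits next to its target as
   * "path.dpkg-new", one sync() flushes them all and a single pass
   * renames them into place, with no per-file fsync(). */
  static void sync_then_rename_all(char *const *paths, size_t n)
  {
      char newf[PATH_MAX];
      size_t i;

      sync();   /* on Linux, returns only once the data is written */

      for (i = 0; i < n; i++) {
          snprintf(newf, sizeof(newf), "%s.dpkg-new", paths[i]);
          rename(newf, paths[i]);
      }
  }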

As a side remark, I don't think it's fair, though, that you complain
about application developers not doing the right thing when, at the
same time, you expect them not to use the proper portable tool for the
job, and you don't seem to see a problem in the fact that using it
implies a performance penalty on a file system that really needs it.
That there are lots of users willing to sacrifice safety for
performance tells me the penalty is significant enough. Isn't there
anything that could be improved to make the correct fsync()+rename()
case a bit faster? In this particular case the fsync() calls are
already batched after all writes have been performed.
