Re: regarding flush taking full inode locks

On 11/02/2012 06:44 AM, Pranith Kumar Karampuri wrote:
Hi Avati,
     As I recall, flush was made into a transaction that takes full-file inode locks for the sake of some optimization. I remember you saying that the optimization is no longer valid. Could you please tell us the history behind this? VMs are hanging in some cases because of the flush fop while self-heal is in progress, so we need to evaluate whether this full inode lock can be removed.

Pranith.

The reason flush() was made a transaction was twofold:

1. Before the piggyback-changelog optimization was introduced, there existed a similar optimization in a much cruder form, where the changelog would be written on the first write and cleared on close() [i.e., flush()].

2. write-behind (back then) was not intelligent enough to hold off issuing a flush downwards till previous writes were fulfilled.

So flush took a full-file inodelk in order to confirm that all previous writes were complete and that unsetting the changelog would be safe. It also helped with the other side of the problem: the next arriving write would wait until flush unlocked, and was then guaranteed to see itself as the first write (and would set the changelog again). (Yes, we were really using inodelk() to synchronize on an inode_ctx flag instead of a mutex.)
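
To make the locking argument concrete, here is a minimal sketch of why a full-file lock gives both guarantees. It is a toy, single-process model and not GlusterFS code: the RangeLocks class, its try_lock/unlock methods and the owner names are all hypothetical, and it assumes each write holds an exclusive lock on its own byte range while it is in flight.

```python
import math


class RangeLocks:
    """Toy exclusive byte-range lock table (hypothetical, not GlusterFS code)."""

    def __init__(self):
        self.held = []  # entries are (owner, start, end), with end exclusive

    def try_lock(self, owner, start, end):
        # Two half-open ranges overlap when each starts before the other ends.
        for _, s, e in self.held:
            if start < e and s < end:
                return False  # conflict: a real lock would block here
        self.held.append((owner, start, end))
        return True

    def unlock(self, owner):
        self.held = [h for h in self.held if h[0] != owner]


FULL_FILE = (0, math.inf)  # stands in for a lock covering the whole file

locks = RangeLocks()

# A write is in flight and holds its own range.
got_write1 = locks.try_lock("write-1", 0, 4096)

# Flush wants the whole file, so it conflicts with any in-flight write.
flush_early = locks.try_lock("flush", *FULL_FILE)

# Once the write completes, flush gets the lock: all earlier writes are done.
locks.unlock("write-1")
flush_late = locks.try_lock("flush", *FULL_FILE)

# A new write to any range must wait until flush unlocks.
write2_during_flush = locks.try_lock("write-2", 8192, 12288)
locks.unlock("flush")
write2_after_flush = locks.try_lock("write-2", 8192, 12288)
```

In this model the full-file range overlaps every possible write range, which is the only property the old scheme relied on: flush cannot proceed past an unfinished write, and no write can start while flush is clearing the changelog.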

That had a bunch of issues:

1. flush() is not reliable. Sometimes you don't get flushes. Sometimes you get them twice, or more. They only loosely correlate with close(), not strictly enough to base changelog-clearing behavior on.

2. It was too simplistic, and the code was such that handling any write failure was very messy; almost always the only changelog we would be left with was a FOOL-FOOL state (unless there was no failure at all).
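
The first issue can be illustrated with another toy model. It assumes a single boolean stands in for the pending changelog and another for the inode_ctx "first write seen" flag; the OldSchemeInode class and its counters are hypothetical names for illustration only and do not reflect the actual code.

```python
class OldSchemeInode:
    """Toy model of 'set changelog on first write, clear it on flush'."""

    def __init__(self):
        self.changelog_pending = False  # stands in for the on-disk changelog
        self.first_write_seen = False   # stands in for the inode_ctx flag
        self.full_file_locks = 0        # full-file locks taken by flush

    def write(self):
        # Only the first write after a flush marks the changelog.
        if not self.first_write_seen:
            self.first_write_seen = True
            self.changelog_pending = True

    def flush(self):
        # Every flush is a transaction under a full-file lock.
        self.full_file_locks += 1
        self.changelog_pending = False
        self.first_write_seen = False


# Expected case: exactly one flush arrives after the writes.
normal = OldSchemeInode()
normal.write()
normal.write()
normal.flush()

# Flush never arrives: the changelog stays marked although every write succeeded.
missed = OldSchemeInode()
missed.write()
missed.write()

# Flush arrives twice: the second one takes a full-file lock for nothing.
doubled = OldSchemeInode()
doubled.write()
doubled.flush()
doubled.flush()
```

Under these assumptions, a missing flush leaves the file looking as if it had pending changes, and a repeated flush pays for a full-file lock without changing anything.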

The current piggyback-changelog fixes all those issues. We don't need flush to be a transaction anymore.

We still need to synchronize flush() against lk() to maintain POSIX conformance, but I feel that is a separate (open) problem.

Avati


