On Wed, Nov 23, 2022 at 12:10 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Wed 16-11-22 18:24:06, Amir Goldstein wrote:
> > > > Why then give up on the POST_WRITE events idea?
> > > > Don't you think it could work?
> > >
> > > So as we are discussing, the POST_WRITE event is not useful when we want to
> > > handle crash safety. And if we have some other mechanism (like SRCU) which
> > > is able to guarantee crash safety, then what is the benefit of POST_WRITE?
> > > I'm not against POST_WRITE, I just don't see much value in it if we have
> > > another mechanism to deal with events straddling checkpoint.
> > >
> >
> > Not sure I follow.
> >
> > I think that crash safety can be achieved also with PRE/POST_WRITE:
> > - PRE_WRITE records an intent to write in persistent snapshot T
> >   and add to in-memory map of in-progress writes of period T
> > - When "checkpoint T" starts, new PRE_WRITES are recorded in both
> >   T and T+1 persistent snapshots, but event is added only to
> >   in-memory map of in-progress writes of period T+1
> > - "checkpoint T" ends when all in-progress writes of T are completed
>
> So maybe I miss something but suppose the situation I was mentioning few
> emails earlier:
>
> PRE_WRITE for F              -> F recorded as modified in T
> modify F
> POST_WRITE for F
>
> PRE_WRITE for F              -> ignored because F is already marked as
>                                 modified
>
>                              -> checkpoint T requested, modified files
>                                 reported, process modified files
> modify F
> --------- crash
>
> Now unless filesystem freeze or SRCU is part of checkpoint, we will never
> notify about the last modification to F. So I don't see how PRE +
> POST_WRITE alone can achieve crash safety...
>
> And if we use filesystem freeze or SRCU as part of checkpoint, then
> processing of POST_WRITE events does not give us anything new. E.g.
> synchronize_srcu() during checkpointing before handing out list of modified
> files makes sure all modifications to files for which PRE_MODIFY events
> were generated (and thus are listed as modified in checkpoint T) are
> visible for userspace.
>
> So am I missing some case where POST_WRITE would be more useful than SRCU?
> Because at this point I'd rather implement SRCU than POST_WRITE.
>

I tend to agree. Even if POST_WRITE can be done,
SRCU will be far better.

> > The trick with alternating snapshots "handover" is this
> > (perhaps I never explained it and I need to elaborate on the wiki [1]):
> >
> > [1] https://github.com/amir73il/fsnotify-utils/wiki/Hierarchical-Storage-Management-API#Modified_files_query
> >
> > The changed files query results need to include recorded changes in both
> > "finalizing" snapshot T and the new snapshot T+1 that was started in
> > the beginning of the query.
> >
> > Snapshot T MUST NOT be discarded until checkpoint/handover
> > is complete AND the query results that contain changes recorded
> > in T and T+1 snapshots have been consumed.
> >
> > When the consumer ACKs that the query results have been safely stored
> > or acted upon (I called this operation "bless" snapshot T+1) then and
> > only then can snapshot T be discarded.
> >
> > After snapshot T is discarded a new query will start snapshot T+2.
> > A changed files query result includes the id of the last blessed snapshot.
> >
> > I think this is more or less equivalent to the SRCU that you suggested,
> > but all the work is done in userspace at application level.
> >
> > If you see any problem with this scheme or don't understand it
> > please let me know and I will try to explain better.
>
> So until now I was imagining that query results will be returned like a one
> big memcpy. I.e. one off event where the "persistent log daemon" hands over
> the whole contents of checkpoint T to the client.
> Whatever happens with the returned data is the business of the client,
> whatever happens with the checkpoint T records in the daemon is the
> daemon's business. The model you seem to speak about here is somewhat
> different - more like readdir() kind of approach where client asks for
> access to checkpoint T data, daemon provides the data record by record
> (probably serving the data from its files on disk), and when the client
> is done and "closes" checkpoint T, daemon's records about checkpoint T
> can be erased. Am I getting it right?
>

Yes, something like that.
The query result (which is actually a recursive readdir) could be huge.
So it cannot really be returned as a blob, it must be streamed to consumers.

> This however seems somewhat orthogonal to the SRCU idea. SRCU essentially
> serves the only purpose - make sure that modifications to all files for
> which we have received PRE_WRITE event are visible in respective files.
>

Absolutely right.
Sorry for the noise, but at least you've learned one more thing
about my persistent change snapshots architecture ;-)

Thanks,
Amir.
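
P.S. For anyone following the thread, the snapshot handover quoted above
(query starts T+1, results cover T and T+1, "bless" discards T) can be
sketched roughly as below. This is only a toy in-memory model under
stated assumptions: the names SnapshotLog, record_change, query and bless
are hypothetical and not taken from fsnotify-utils, real snapshots would
be persistent, and a real query would stream records rather than return
a list.

```python
class SnapshotLog:
    """Toy in-memory model of alternating change snapshots (hypothetical names)."""

    def __init__(self):
        self.blessed = None           # id of the last blessed snapshot, if any
        self.current = 1              # snapshot that records new changes
        self.finalizing = None        # snapshot T being handed over, if any
        self.snapshots = {1: set()}   # snapshot id -> set of changed paths

    def record_change(self, path):
        # New changes always land in the newest snapshot.
        self.snapshots[self.current].add(path)

    def query(self):
        # The first query of a handover starts snapshot T+1; snapshot T is
        # kept. A repeated query (e.g. after a consumer crash) reuses the
        # same T and T+1 pair.
        if self.finalizing is None:
            self.finalizing = self.current
            self.current += 1
            self.snapshots[self.current] = set()
        # Results cover both the finalizing snapshot T and the new T+1.
        changed = self.snapshots[self.finalizing] | self.snapshots[self.current]
        return self.blessed, sorted(changed)

    def bless(self):
        # Consumer ACKs that the results were safely stored or acted upon:
        # only now may snapshot T be discarded.
        if self.finalizing is None:
            raise RuntimeError("no handover in progress")
        del self.snapshots[self.finalizing]
        self.blessed = self.current
        self.finalizing = None
```

In this model, a consumer that crashes between query() and bless() simply
queries again and still sees everything recorded in T, since T is dropped
only on bless(). Changes recorded in T+1 can be reported twice (once now,
once in the next query), which over-reports but does not lose anything.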