Andrew Sullivan wrote:
> On Thu, Apr 17, 2008 at 12:35:33AM +0800, Craig Ringer wrote:
>> That's subject to the same issues, because a transaction's
>> current_timestamp() is determined at transaction start.
>
> But clock_timestamp() (and its ancestors in Postgres) don't have that
> restriction.

True, as I noted later, but it doesn't help. AFAIK you can't guarantee
that multiple concurrent INSERTs will be committed in the same order
that their clock_timestamp() calls were evaluated.

Consequently, you can still have a situation where a record with a lower
timestamp becomes visible to readers after a record with a higher
timestamp has, and after the reader has already recorded the higher
timestamp as their cutoff.

> I dunno that it's enough for you, though, since you have
> visibility issues as well. You seem to want both the benefits of files and
> relational database transactions, and I don't think you can really have both
> at once without paying in reader complication.

Or writer complication. In the end, the idea that using a file based log
wouldn't have this problem is based on the implicit assumption that the
file based logging mechanism would provide some sort of serialization of
writes.

As POSIX requires the write() call to be thread safe, write() would be
doing its own internal locking (or doing clever lock-free queueing etc)
to ensure writes are serialized.

However, at least in Linux fairly recently, writes aren't serialized, so
you have to do it yourself. See:

http://lwn.net/Articles/180387/

In any case, logging to a file with some sort of writer serialization
isn't significantly different to logging to a database table outside
your transaction using some sort of writer serialization. Both
mechanisms must serialize writers to work. Both mechanisms must operate
outside the transactional rules of the transaction invoking the logging
operation in order to avoid serializing all operations in transactions
that write to the log on the log.
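
To make the ordering problem concrete, here's a minimal Python sketch. It
is a toy simulation, not PostgreSQL: Table, Row, poll() and the integer
clock are all hypothetical names invented for illustration, and the
"writer lock" in serialized() is assumed rather than implemented. It only
shows that a reader tracking a timestamp cutoff misses a row when commit
order differs from timestamp order, and doesn't when each writer's
timestamp and commit happen as one serialized step.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Row:
    id: str
    ts: int  # stand-in for clock_timestamp() evaluated during the INSERT


class Table:
    """Hypothetical toy table; rows are visible to readers only after commit."""

    def __init__(self):
        self._clock = count(1)   # monotonically increasing fake clock
        self.committed = []      # rows readers can see, in commit order

    def insert(self, row_id):
        # The timestamp is assigned now, but the row is not visible yet.
        return Row(row_id, next(self._clock))

    def commit(self, row):
        self.committed.append(row)


def poll(table, cutoff):
    """Reader: return ids of visible rows with ts > cutoff, plus the new cutoff."""
    rows = [r for r in table.committed if r.ts > cutoff]
    new_cutoff = max([cutoff] + [r.ts for r in rows])
    return [r.id for r in rows], new_cutoff


def unserialized():
    t = Table()
    a = t.insert("A")               # ts=1
    b = t.insert("B")               # ts=2
    t.commit(b)                     # B commits first
    seen, cutoff = poll(t, 0)       # reader sees B, cutoff becomes 2
    t.commit(a)                     # A (ts=1) becomes visible only now
    more, cutoff = poll(t, cutoff)  # nothing has ts > 2
    return seen + more              # A is never returned


def serialized():
    t = Table()
    seen, cutoff = [], 0
    for row_id in ("A", "B"):
        # Assumption: a writer lock is held across insert *and* commit,
        # so no other writer can take a timestamp in between.
        t.commit(t.insert(row_id))
        ids, cutoff = poll(t, cutoff)
        seen += ids
    return seen


print(unserialized())  # ['B']
print(serialized())    # ['A', 'B']
```

The second case only works because the timestamp is taken inside the
serialized section, which is the cost described above: every writer has
to queue on the log.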

> One way I can think of doing it is to write a seen_log that notes what the
> client has already seen with a timestamp of (say) 1 minute. Then you can
> say "go forward from this time excluding ids (ids here)".

It won't work with multiple concurrent writers. There is no guarantee
that an INSERT with a timestamp older than the one you just saw isn't
waiting to commit.

--
Craig Ringer