On Fri, Aug 14, 2009 at 4:21 PM, Tom Lane<tgl@xxxxxxxxxxxxx> wrote: > Jeff Janes <jeff.janes@xxxxxxxxx> writes: >> I apologize if it is bad form to respond to a message that is two >> months old, but I did not see this question answered elsewhere and >> thought it would be helpful to have it answered. This my rough >> understanding. Oracle never "takes" a snapshot, it computes one the >> fly, if and when it is needed. It maintains a structure of recently >> committed transactions, with the XID for when they committed. If a >> process runs into a tuple that is neither from the future nor from the >> deep past, it consults this structure to see if that transaction has >> committed, and if so whether it did so before or after the current >> query was started. The structure is partionable so it does not have >> one global lock to serialize on, and the lock is short as it only gets >> the info it needs, not the entire set of global open transactions. > > Are you sure it's partitionable? I don't have inside knowledge, but I'm pretty sure that that structure is partionable. Each data block has in its header a list of in-doubt transactions touching that block, and a link to where in the rollback/UNDO to find info on each one. The UNDO header knows that transaction's status. Of course there is still the global serialization on obtaining the SCN, but the work involved in obtaining that (other than fighting over the lock) is constant, it doesn't increase with the number of backends. Real Applications Clusters must have solved that somehow, I don't recall how. But I think it does make compromises, like in read committed mode a change made by another transaction might be invisible to your simple select statements for up to 3 seconds or so. I've never had the opportunity to play with a RAC. For all I know, the work of scanning ProcArray is trivial compared to the work of obtaining the lock, even if the array is large. If I had the talent to write and run stand alone programs that could attach themselves to the shared memory structure and then run my arbitrary code, I would test that out. > I've been told that Oracle's > transaction log is a serious scalability bottleneck. (But I think > I first heard that in 2001, so maybe they've improved it in recent > releases.) Well, something always has to be the bottleneck. Do you know at what degree of scaling that became a major issue? I don't think that there is a point of global serialization, other than taking SCNs, but if there is enough pair-wise fighting, it could still add up to a lot of contention. > We know that Postgres' WAL log is a bottleneck --- check > for recent discussions involving XLogInsert. Would these two be good places for me to start looking into that: http://archives.postgresql.org/pgsql-hackers/2009-06/msg01205.php http://archives.postgresql.org/pgsql-hackers/2009-06/msg01019.php Or is bulk-copy (but with WAL logging) to specific to apply findings to the general case? > But the WAL log is > only touched by read-write transactions, whereas in Oracle even > read-only transactions often have to go to the transaction log. That's true, but any given read only transaction shouldn't have to make heavy use of the transaction log just to decide if a transaction has committed or not. It should be able to look that up once and cache it for the rest of that subtran. Of course if it actually has to construct a consistent read from the UNDO on many different buffers due to the same interfering transaction, that is more work and more contention. Cheers, Jeff -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance