> I can accept faster in certain cases but if you say HUGELY faster, I
> would like to see some numbers.
Ok, first a specific case that actually came up at my work in the last
week.....
We've got a middleware messaging application we developed that we'll call a
'republisher'. It receives a stream of 'event' messages from an upstream
server and forwards them to a number of downstream servers that have
registered ('subscribed') with it. It keeps a queue for each downstream
subscriber, so if one is down, it will hold event messages for it. Not
all servers want all 'topics', so we only send each server the specific
event types it's interested in. It writes the incoming event stream to
a number of subscriber queue files, as well as a series of journal files
to track all this state. There is a queue for each incoming 'topic'; its
entries aren't cleared until the last subscriber on that topic has
confirmed delivery.
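Roughly, the bookkeeping looks something like this (just a sketch to give
the idea; the names and sizes here are made up, the real thing is more
involved):

#define MAX_TOPICS 32
#define MAX_SUBS   16

struct subscriber {
    char name[64];              /* downstream server identity */
    int  topics[MAX_TOPICS];    /* which event types it subscribed to */
    long confirmed_seq;         /* last event it confirmed delivery of */
};

struct topic_queue {
    char path[256];             /* on-disk queue file for this topic */
    long head_seq, tail_seq;    /* oldest unconfirmed / newest queued event */
    struct subscriber *subs[MAX_SUBS];  /* subscribers on this topic */
};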
{Before someone screams my-favorite-MQ-server, let me just say we HAD a
commercial messaging system doing this, and our production operations
staff is fed up with realtime problems that involve multinational vendor
finger-pointing, so we developed our own to replace it.}
On a typical dual Xeon Linux server running CentOS 4.4, with a simple
direct-connect disk, this republisher can easily handle 1000
messages/second using simple write(). However, when this process was
busy humming away under the production workload of 60-80 messages/sec
and the power was pulled or the server crashed (this has happened exactly
once so far, at a Thailand manufacturing facility, due to operator
error), it lost 2000+ manufacturing events that the downstream servers
couldn't easily recover; this was data in Linux's disk cache that hadn't
yet been committed to disk. So, the obvious solution is to call fsync()
on the various files after each 'event' has been processed, to ensure
each event is actually on disk before it's acknowledged.
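In code terms the fix is basically this (a simplified sketch -- the real
republisher touches several files per event, and qfd/jfd and the error
handling here are just illustrative):

#include <unistd.h>

/* Append one event to a subscriber queue file and the journal, and
   don't return success until both are on stable storage. */
int persist_event(int qfd, int jfd, const void *msg, size_t len)
{
    if (write(qfd, msg, len) != (ssize_t)len)   /* queue file */
        return -1;
    if (write(jfd, msg, len) != (ssize_t)len)   /* journal */
        return -1;

    /* Without these, the data can sit in Linux's page cache and be
       lost on a power pull; with them, it's on disk before we ack. */
    if (fsync(qfd) != 0 || fsync(jfd) != 0)
        return -1;

    return 0;
}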
However, if this republisher does an fsync() after each event, it slows
to like 50/second on a direct-connect disk. If it's run on a similar
server with a RAID controller that has battery-protected writeback cache
enabled, it can easily do 700-800/second. We need 100+/second, and would
prefer 200/second to have margin for catching up after data interruptions.
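If you want to reproduce those numbers yourself, something along these
lines is enough (a quick-and-dirty benchmark, not our actual code; the
512-byte record size and the file name are arbitrary):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
    const int nevents = 2000;
    char record[512];
    struct timeval t0, t1;
    int i, fd;

    memset(record, 'x', sizeof(record));
    fd = open("fsync_test.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    gettimeofday(&t0, NULL);
    for (i = 0; i < nevents; i++) {
        if (write(fd, record, sizeof(record)) != sizeof(record)) { perror("write"); return 1; }
        if (fsync(fd) != 0) { perror("fsync"); return 1; }  /* comment out to see the cached rate */
    }
    gettimeofday(&t1, NULL);

    {
        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("%d fsync'd writes in %.2f sec = %.0f events/sec\n",
               nevents, secs, nevents / secs);
    }
    close(fd);
    return 0;
}

Run it on a plain disk and then on a box with a battery-backed writeback
controller and the difference is hard to miss.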
Now, everything I've described above is a rather unusual application...
so let me present a far more common scenario...
Relational database management servers, like Oracle or PostgreSQL: when
the RDBMS does a 'commit' at the transaction's END;, the server HAS to fsync
its buffers to disk to maintain data integrity. With a writeback-cache
disk controller, the controller can acknowledge the writes as soon
as the data is in its cache, then write that data to disk at its
leisure. With software RAID, the server has to wait until ALL drives
of the RAID slice have seeked and completed the physical writes to
the disk. In a write-intensive database, where most of the read
data is cached in memory, this is a HUGE performance hit.
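To make that concrete, the commit path in any write-ahead-log style
database boils down to something like this (obviously not Oracle's or
PostgreSQL's actual code, just the shape of it; wal_fd and rec are
hypothetical names):

#include <unistd.h>

int commit_transaction(int wal_fd, const void *rec, size_t len)
{
    if (write(wal_fd, rec, len) != (ssize_t)len)  /* append commit record to the log */
        return -1;

    /* The client can't be told COMMIT succeeded until this returns.
       With a battery-backed writeback cache the controller answers as
       soon as the data hits its cache; with a bare disk or software
       RAID, it waits for every drive to seek and physically write. */
    if (fsync(wal_fd) != 0)
        return -1;

    return 0;   /* now it's safe to acknowledge the COMMIT */
}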