I have an Application with big write delays and need some help determining what may be causing them. The Application uses a proprietary MUMPS database, not a relational database manager like Oracle.

Let me first explain the architecture a little. There is a buffer pool in Shared Memory. When a user process needs a record, it first searches the Shared Memory buffer pool for the block it needs. If the block is not found, the process allocates a buffer in Shared Memory and reads the block from disk into that buffer. This way the block can be accessed by other processes, so they do not have to repeat the physical read. If the block is modified, the user process does not do the write itself; it just leaves the block in the buffer pool and a support process (the disk writer) writes it later. So a user process only has to wait on reads, not writes.

If the user process has to modify a block that has not been Before Imaged since a given point in time, the block is first copied to another buffer, which a second support process (the bil writer) writes to the Before Image Log. There are a limited number of buffers for this use. The user process also puts the transaction that caused the block change into a journal buffer, which a third support process (the jnl writer) writes out. There are likewise a limited number of journal buffers.

A little bit about the hardware (disk) layout. There is an HBA RAID array where the OS, swap and other file systems are located. The database data, BIL and JNL are stored directly on Logical Volumes in a Volume Group whose Physical Volumes are LUNs on a SAN. So there is no file system on the LVs, just direct reads and writes to the LV. Writes are done with the standard write system call, followed by a call to fdatasync, which makes the writer process wait until the block is truly on disk, or at least until the SAN has accepted it. The write and fdatasync together normally take less than 0.0004 seconds on average.

Now for the problem. Many times, when a large file (1 to 2 GB) is written to the local disk, the write/fdatasync times suffer big time, going from sub-millisecond to several seconds, and at times several minutes. When this happens, the limited number of BIL and JNL buffers fills up, and the Application user processes have to wait for those buffers to be written before they can complete a transaction. This makes it seem like the Application has locked up; basically it has, because it is waiting on a resource.

I don't understand how the I/O on the local disk is affecting the I/O going to the SAN. They use different HBAs, and the local-disk I/O is unrelated to the Application; that is, it does not use the Shared Memory the Application is using, so there should not be any memory page locking and such going on between them. The problem can be caused just by doing something like "cp /tmp/bigfile /var/tmp/bigfile", where /tmp and /var are on different file systems but both are in the same VG on the local disk, which is a different VG than the one the database LVs are in. Running "vmstat 5" while this is happening shows a few blocks being written, most likely the big file that was copied, which sits mostly in the kernel buffer cache, aging and being flushed to disk.

Most of these systems have 4 GB or more of Physical Memory, and when this happens there is still a good bit of free memory and nothing gets paged out, so I don't see it as an overall low-memory problem. Also, Shared Memory has been locked in Physical Memory: when this first showed up as a problem, I did some research and found that Shared Memory is one of the first things to get paged out, which is why it is now locked in Physical Memory, but that did not really help.
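In case it helps to make the write path concrete, below is a stripped-down sketch of the write/fdatasync pattern the writer processes use. The real code is inside the proprietary database, so the device path, block size and loop here are invented for illustration. Point it only at a scratch LV, because it overwrites the first blocks of whatever device it is given.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    #define BLKSZ 4096   /* block size invented for illustration */

    int main(int argc, char **argv)
    {
        /* The real writers open a raw LV; /dev/vgdb/lvjnl is a made-up
         * path.  Pass a SCRATCH device as argv[1] -- this program
         * overwrites the start of whatever it is given. */
        const char *dev = (argc > 1) ? argv[1] : "/dev/vgdb/lvjnl";
        char block[BLKSZ];
        struct timespec t0, t1;
        int fd, i;

        fd = open(dev, O_WRONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        memset(block, 0, sizeof block);

        for (i = 0; i < 30; i++) {
            clock_gettime(CLOCK_MONOTONIC, &t0);

            /* write() only hands the block to the kernel ... */
            if (write(fd, block, sizeof block) != (ssize_t)sizeof block) {
                perror("write");
                return 1;
            }
            /* ... fdatasync() is where the process waits until the block
             * is on stable storage (or at least accepted by the SAN). */
            if (fdatasync(fd) < 0) {
                perror("fdatasync");
                return 1;
            }

            clock_gettime(CLOCK_MONOTONIC, &t1);
            printf("write+fdatasync: %.6f sec\n",
                   (t1.tv_sec - t0.tv_sec) +
                   (t1.tv_nsec - t0.tv_nsec) / 1e9);
            sleep(1);
        }
        close(fd);
        return 0;
    }

On RHEL 5 it needs -lrt for clock_gettime (gcc -Wall -o synclat synclat.c -lrt). Running something like this against a scratch LV on the SAN VG while doing the cp on the local disk should show the same latency jump the Application sees.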
The Application is running on RHEL 5.6, and this happens on various hardware from Dell, HP and IBM, all using different local-disk RAID HBA controllers. I can provide the exact kernel version and other details if needed.

I know this has been long, but I hope you all will take the time to read it and be able to make some good suggestions as to what may be causing this problem.

-----
Jack Allen