On Mar 25, 2014, at 8:46 AM, Matthew Spilich wrote:
> Has anyone on the forum seen something similar?

I think I reported a similar phenomenon in my SIGMOD 2013 paper
("Latch-free data structures for DBMS: design, implementation, and
evaluation", http://dl.acm.org/citation.cfm?id=2463720).

> ----- 47245 -----
> 0x00000037392eb197 in semop () from /lib64/libc.so.6
> #0 0x00000037392eb197 in semop () from /lib64/libc.so.6
> #1 0x00000000005e0c87 in PGSemaphoreLock ()
> #2 0x000000000061e3af in LWLockAcquire ()
> #3 0x000000000060aa0f in ReadBuffer_common ()
> #4 0x000000000060b2e4 in ReadBufferExtended ()
> ...
> ----- 47257 -----
> 0x00000037392eb197 in semop () from /lib64/libc.so.6
> #0 0x00000037392eb197 in semop () from /lib64/libc.so.6
> #1 0x00000000005e0c87 in PGSemaphoreLock ()
> #2 0x000000000061e3af in LWLockAcquire ()
> #3 0x000000000060aa0f in ReadBuffer_common ()
> #4 0x000000000060b2e4 in ReadBufferExtended ()
> ...

These stack traces indicate heavy contention on the LWLocks that protect
buffers. What I observed in a similar situation was that there was also
heavy contention on the spinlocks that ensure mutual exclusion of the
LWLocks' internal status data. That contention resulted in a sudden
increase in CPU utilization, which is consistent with the following
description:

> At the time of the event, we see a spike in system CPU and load average,
> but we do not see a corresponding spike in disk reads or writes which
> would indicate IO load.

If the cause of the problem is the same as what I observed, a possible
immediate countermeasure is to increase the value of NUM_BUFFER_PARTITIONS,
defined in src/include/storage/lwlock.h, from 16 to, for example, 128 or
256, and rebuild the binary.

# Using the latch-free buffer manager proposed in my paper would take a
# long time, since it has not been incorporated upstream.

--
Takashi Horikawa, Ph.D.
Knowledge Discovery Research Laboratories,
NEC Corporation.
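P.S. For concreteness, this is roughly the one-line change I have in mind.
It is a sketch from memory, not a tested patch, and the surrounding context
in src/include/storage/lwlock.h may differ slightly between PostgreSQL
versions:

    /* Number of partitions of the shared buffer mapping hashtable */
    #define NUM_BUFFER_PARTITIONS  128    /* default is 16 */

Because this is a compile-time constant, the server binaries must be rebuilt
after editing the header for the change to take effect.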