Re: pg_dump: Error message from server: lost synchronization with server: got message type "d",

I wrote:
> Hm.  Given that the message type and length seem perfectly reasonable,
> I suspect this must actually represent an out-of-memory condition within
> pg_dump (*not* on the server end).  But you'd have to be running it on a
> toy box, or with a rather silly ulimit, for 6MB to be a problem...

BTW, how old is your pg_dump (or really, libpq)?  I wonder if you are
hitting this bug in some form:

Author: Tom Lane <tgl@xxxxxxxxxxxxx>
Branch: master Release: REL9_4_BR [2f557167b] 2014-05-07 21:39:13 -0400
Branch: REL9_3_STABLE Release: REL9_3_5 [b4f9c93ce] 2014-05-07 21:38:38 -0400
Branch: REL9_2_STABLE Release: REL9_2_9 [f7672c8ce] 2014-05-07 21:38:41 -0400
Branch: REL9_1_STABLE Release: REL9_1_14 [86888054a] 2014-05-07 21:38:44 -0400
Branch: REL9_0_STABLE Release: REL9_0_18 [77e662827] 2014-05-07 21:38:47 -0400
Branch: REL8_4_STABLE Release: REL8_4_22 [664ac3de7] 2014-05-07 21:38:50 -0400

    Avoid buffer bloat in libpq when server is consistently faster than client.
    
    If the server sends a long stream of data, and the server + network are
    consistently fast enough to force the recv() loop in pqReadData() to
    iterate until libpq's input buffer is full, then upon processing the last
    incomplete message in each bufferload we'd usually double the buffer size,
    due to supposing that we didn't have enough room in the buffer to finish
    collecting that message.  After filling the newly-enlarged buffer, the
    cycle repeats, eventually resulting in an out-of-memory situation (which
    would be reported misleadingly as "lost synchronization with server").
    Of course, we should not enlarge the buffer unless we still need room
    after discarding already-processed messages.
    
    This bug dates back quite a long time: pqParseInput3 has had the behavior
    since perhaps 2003, getCopyDataMessage at least since commit 70066eb1a1ad
    in 2008.  Probably the reason it's not been isolated before is that in
    common environments the recv() loop would always be faster than the server
    (if on the same machine) or faster than the network (if not); or at least
    it wouldn't be slower consistently enough to let the buffer ramp up to a
    problematic size.  The reported cases involve Windows, which perhaps has
    different timing behavior than other platforms.
    
    Per bug #7914 from Shin-ichi Morita, though this is different from his
    proposed solution.  Back-patch to all supported branches.
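
To make the feedback loop in that commit message concrete, here is a toy
model of it. This is only a sketch, not libpq's code: the function name,
the fixed message length, and the 16384-byte starting buffer are all
assumptions chosen for illustration. It assumes the server is always fast
enough to fill the buffer, and compares enlarging the buffer without
accounting for already-processed bytes against checking the space needed
after discarding them.

```python
def final_buffer_size(total_bytes, msg_len, buf_size=16384, compact_first=False):
    """Toy model of an input buffer fed by a consistently fast server.

    start/end delimit the unprocessed bytes in the buffer. All names and
    sizes here are hypothetical; this is not libpq's implementation.
    """
    start = end = 0
    remaining = total_bytes
    while True:
        # Reader: left-justify pending data, then fill the buffer completely,
        # which is what happens when the server always outruns the client.
        end -= start
        start = 0
        got = min(buf_size - end, remaining)
        end += got
        remaining -= got

        # Parser: consume every complete message currently in the buffer.
        start += ((end - start) // msg_len) * msg_len

        if remaining == 0:
            return buf_size

        if end > start:
            # An incomplete message sits at the tail of the buffer.
            # Old behavior: measure the space needed from the buffer start,
            # counting bytes that have already been processed.
            needed = start + msg_len
            if compact_first:
                # Fixed behavior: processed bytes can be discarded first,
                # so only the message itself has to fit.
                needed -= start
            while needed > buf_size:
                buf_size *= 2


if __name__ == "__main__":
    for compact in (False, True):
        size = final_buffer_size(100_000_000, 100, compact_first=compact)
        print(f"compact_first={compact}: final buffer size {size} bytes")
```

In this model the buffer doubles on every bufferload when compact_first
is False, even though no single message is larger than 100 bytes, while
with compact_first=True it never grows past its starting size.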

			regards, tom lane


-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


