Re: PMTU discovery behaviour


2017-09-21 14:01 GMT+03:00 Neil Horman <nhorman@xxxxxxxxxxxxx>:
> On Wed, Sep 20, 2017 at 02:02:45PM +0300, Peter Salin wrote:
>> 2017-09-19 20:09 GMT+03:00 Neil Horman <nhorman@xxxxxxxxxxxxx>:
>> > On Mon, Sep 11, 2017 at 03:44:57PM +0300, Peter Salin wrote:
>> >> Hi,
>> >>
>> >> I encountered some strange PMTUD-related behaviour that I need help
>> >> understanding.
>> >>
>> >> Setup:
>> >>
>> >> +-----------+        +---+        +--------+
>> >> | 10.0.0.10 |--------| X |--------|10.0.0.3|
>> >> +-----------+        +---+        +--------+
>> >>
>> >> A one-to-many socket is set up at 10.0.0.10. Two instances of the
>> >> lksctp sctp_darn application are run at 10.0.0.3, listening on ports
>> >> 8001 and 8002. 10.0.0.3 was also set up to generate ICMP frag needed
>> >> messages for incoming messages over 600 bytes. The same issue also
>> >> occurs when a router on the path is set up to generate the ICMP
>> >> message instead.
>> >>
>> >> Test 1:
>> >> Two associations were connected from 10.0.0.10 to 10.0.0.3, one to
>> >> port 8001 and another one to 8002. Then an oversized message was sent
>> >> on the association to 8001, triggering ICMP generation. When checking
>> >> the MTU reported in the spinfo_mtu field of SCTP_GET_PEER_ADDR_INFO,
>> >> the association now reports 600. The association to 8002 reports 1500
>> >> until traffic is sent on it, at which point it also adjusts to 600,
>> >> which I think makes sense since the destination IP is the same. When
>> >> reopening the associations, the value of 600 is remembered for about
>> >> 10 minutes, which I also think makes sense since
>> >> net.ipv4.route.mtu_expires is 600 (seconds).
>> >>
>> >> Test 2:
>> >> Again the same two associations were connected to 10.0.0.3, but in
>> >> addition a third association was attempted to a non-existent IP;
>> >> this attempt fails with a timeout after a while. After that, an
>> >> oversized, ICMP-triggering message was again sent to 8001. Now the
>> >> behaviour is different from before. The association to 8001 reports a
>> >> spinfo_mtu of 600, but only briefly; it does not stay at 600 for 10
>> >> minutes. In addition, the spinfo_mtu of the association to 8002 never
>> >> changes; it stays at the original 1500.
>> >>
>> >> The only difference between the two tests is the attempt to connect to
>> >> a non-responding IP at the beginning of test 2. Any ideas why the
>> >> behaviour changes? Is this a bug, or is there some other reason for
>> >> this?
>> >>
>> >> I have attached the sample application used for reproducing this.
>> >>
>> >> BR,
>> >> -Peter
>> >>
>> > Hey, apologies for the delay on this, I've had it in my reader for days and kept
>> > meaning to respond, but kept getting sidetracked.
>> >
>> > At first glance, this sounds incorrect.  Each association (or rather each
>> > transport) maintains its own MTU, and the association reflects the MTU of the
>> > active transport. Given that each transport holds its own dst cache entry, I
>> > have a hard time seeing how one transport's MTU changes might leak to another.
>> >
>> > But that's not really what's happening here.  By your description, the active
>> > transport on the established association isn't updating its pathmtu, which
>> > should happen in response to receiving the ICMP_FRAG_NEEDED message.
>> >
>> > I know you've provided the reproducer below, and I appreciate that, but I don't
>> > have the cycles to set this up at the moment.  Could you tell me whether, during the
>> > second test, after you attempt to connect to the fake IP address and then send
>> > the large message that should trigger the frag needed message, that large
>> > message gets retransmitted and eventually arrives at the peer host?  If so, that
>> > suggests that the SCTP stack:
>> >
>> > a) receives the frag needed message
>> > and
>> > b) resends the packet at the lower frag point
>> >
>> > That in turn suggests we just have some internal reporting error in which we
>> > don't update the association's PMTU with the active transport's.
>> >
>> > Let me know the answer to that question and it will give me some places to start
>> > looking.
>> > Neil
>> >
>> Thanks for responding. In response to your question, the first large
>> message does get retransmitted without the Don't Fragment bit set. I
>> modified the test a bit to also send further messages after the first
>> one. Those messages are indeed fragmented according to the limit of
>> the ICMP message. I have attached a PCAP trace and SCTP debug logs in
>> case that helps here.
>>
>> I also tried sending a large message on the other association after
>> the large message on the first association had been sent. For test 2,
>> that message was not fragmented even though the ICMP had already been
>> received for the first association. After the second association also
>> received an ICMP, it adjusted to use the lower MTU for subsequent
>> messages. In test 1, by contrast, a large message sent on the second
>> association was fragmented automatically, starting with the very first
>> message.
>>
>> Also, after stopping and rerunning test 2, the MTU would always be
>> reset to 1500, whereas in test 1 the lower limit would still be in
>> effect for a new run. So it seems that in test 2 the lower MTU is known
>> only within each association, whereas in test 1 the lower MTU also
>> gets stored deeper down?
>>
>> BR,
>> -Peter
>>
> So, from what I can see, your included tcpdump only shows the first part of what
> you are describing.  That is to say that it sends a large data chunk on an
> association that gets an ICMP frag needed response, after which the pmtu is
> lowered and smaller message fragments are sent, which is good (i.e. working as
> designed).
>
> I don't see anything in the tcpdump relating to the remainder of your test,
> showing failed fragmentation.  Can you include that please?
>
> Neil
Yes, please find attached traces that include sending on the other
association after receiving the first ICMP.

BR,
-Peter
>
>> >> ------ ver_linux output ------
>> >> Linux esalipe-test 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11
>> >> 21:17:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>> >>
>> >> GNU C                   5.4.0
>> >> GNU Make                4.1
>> >> Binutils                2.26.1
>> >> Util-linux              2.27.1
>> >> Mount                   2.27.1
>> >> Module-init-tools       22
>> >> E2fsprogs               1.42.13
>> >> Xfsprogs                4.3.0
>> >> Linux C Library         2.23
>> >> Dynamic linker (ldd)    2.23
>> >> Linux C++ Library       6.0.21
>> >> Procps                  3.3.10
>> >> Net-tools               1.60
>> >> Kbd                     1.15.5
>> >> Console-tools           1.15.5
>> >> Sh-utils                8.25
>> >> Udev                    229
>> >> Modules Loaded          ablk_helper aes_x86_64 aesni_intel
>> >> async_memcpy async_pq async_raid6_recov async_tx async_xor autofs4
>> >> binfmt_misc btrfs  crc32_pclmul crct10dif_pclmul cryptd floppy
>> >> gf128mul ghash_clmul ni_intel glue_helper hid hid_generic ib_addr
>> >> ib_cm ib_core ib_iser ib_mad ib_sa input_leds irqbypass iscsi_tcp
>> >> iw_cm joydev kvm kvm_intel libcrc32c libiscsi libiscsi_tcp linear lrw
>> >> multipath parport parport_pc ppdev psmouse raid0 raid1 raid10 raid456
>> >> raid6_pq rdma_cm scsi_transport_iscsi sctp serio_raw usbhid xor
>> >
>> >>
>> >> #include <cstdint>
>> >> #include <cstring>
>> >> #include <ctime>
>> >> #include <iomanip>
>> >> #include <iostream>
>> >> #include <string>
>> >>
>> >> #include <errno.h>
>> >> #include <unistd.h>
>> >> #include <arpa/inet.h>
>> >> #include <net/if.h>
>> >> #include <netinet/in.h>
>> >> #include <netinet/sctp.h>
>> >> #include <sys/ioctl.h>
>> >> #include <sys/socket.h>
>> >>
>> >> using namespace std;
>> >>
>> >> static const int ERROR_BUFLEN = 64;
>> >> static const char* SCTP_INTERFACE_NAME = "ens4";
>> >>
>> >> static string data100 = "01234567890123456789012345678901234567890123456789"
>> >>   "01234567890123456789012345678901234567890123456789";
>> >> static string data1000 = "01234567890123456789012345678901234567890123456789"
>> >>   "01234567890123456789012345678901234567890123456789"
>> >>   "01234567890123456789012345678901234567890123456789"
>> >>   "01234567890123456789012345678901234567890123456789"
>> >>   "01234567890123456789012345678901234567890123456789"
>> >>   "01234567890123456789012345678901234567890123456789"
>> >>   "01234567890123456789012345678901234567890123456789"
>> >>   "01234567890123456789012345678901234567890123456789"
>> >>   "01234567890123456789012345678901234567890123456789"
>> >>   "01234567890123456789012345678901234567890123456789"
>> >>   "01234567890123456789012345678901234567890123456789"
>> >>   "01234567890123456789012345678901234567890123456789"
>> >>   "01234567890123456789012345678901234567890123456789"
>> >>   "01234567890123456789012345678901234567890123456789"
>> >>   "01234567890123456789012345678901234567890123456789"
>> >>   "01234567890123456789012345678901234567890123456789"
>> >>   "01234567890123456789012345678901234567890123456789"
>> >>   "01234567890123456789012345678901234567890123456789"
>> >>   "01234567890123456789012345678901234567890123456789"
>> >>   "01234567890123456789012345678901234567890123456789";
>> >>
>> >> void printError(const string& msg, const string& funcName) {
>> >>   char errorMessage[ERROR_BUFLEN] {};
>> >>   char* errMsg = ::strerror_r(errno, errorMessage,
>> >>                             sizeof(errorMessage));
>> >>
>> >>   cerr << "::" << funcName << ": " << msg << ": " << errMsg << endl;
>> >> }
>> >>
>> >> int createSocket() {
>> >>   int sockFd = socket (AF_INET,
>> >>                      SOCK_SEQPACKET,
>> >>                      IPPROTO_SCTP);
>> >>   if (sockFd == -1) {
>> >>     printError("Creation of socket failed", __FUNCTION__);
>> >>     return -1;
>> >>   }
>> >>
>> >>   // Enable address reuse
>> >>   int enable = 1;
>> >>   int err = setsockopt(sockFd,
>> >>                      SOL_SOCKET,
>> >>                      SO_REUSEADDR,
>> >>                      &enable,
>> >>                      sizeof(enable));
>> >>
>> >>   if (err) {
>> >>     printError("Error setting socket option SO_REUSEADDR", __FUNCTION__);
>> >>     close(sockFd);
>> >>     return -1;
>> >>   }
>> >>
>> >>   // Configure SCTP
>> >>   sctp_initmsg initmsg{};
>> >>   initmsg.sinit_num_ostreams = 3;
>> >>   initmsg.sinit_max_instreams = 3;
>> >>   initmsg.sinit_max_attempts = 2;
>> >>   initmsg.sinit_max_init_timeo = 0;
>> >>
>> >>   err = setsockopt(sockFd,
>> >>                  IPPROTO_SCTP,
>> >>                  SCTP_INITMSG,
>> >>                  &initmsg,
>> >>                  sizeof(initmsg));
>> >>
>> >>   if (err) {
>> >>     printError("Configuring SCTP socket failed", __FUNCTION__);
>> >>     close(sockFd);
>> >>     return -1;
>> >>   }
>> >>
>> >>   struct sctp_paddrparams paddr_params{};
>> >>   memset(&paddr_params, 0, sizeof(paddr_params));
>> >>   socklen_t size_of_sctp_paddr_params = sizeof(paddr_params);
>> >>   paddr_params.spp_flags = SPP_HB_ENABLE | SPP_PMTUD_ENABLE | SPP_SACKDELAY_ENABLE;
>> >>
>> >>   err = setsockopt(sockFd,
>> >>                  IPPROTO_SCTP,
>> >>                  SCTP_PEER_ADDR_PARAMS,
>> >>                  &paddr_params,
>> >>                  size_of_sctp_paddr_params);
>> >>
>> >>   if (err) {
>> >>     printError("Configuring SCTP params failed", __FUNCTION__);
>> >>     close(sockFd);
>> >>     return -1;
>> >>   }
>> >>
>> >>   return sockFd;
>> >> }
>> >>
>> >> bool bindSocket(const int sockFd, const int localPort) {
>> >>   // Get IP of ethernet interface
>> >>   string localAddress = "";
>> >>   ifreq ifr{};
>> >>   ifr.ifr_addr.sa_family = AF_INET;
>> >>   strncpy(ifr.ifr_name, SCTP_INTERFACE_NAME, IFNAMSIZ - 1);
>> >>   const int ioctlStatus = ioctl(sockFd,
>> >>                               SIOCGIFADDR,
>> >>                               &ifr);
>> >>
>> >>   if (ioctlStatus == -1) {
>> >>     printError("Failed to get local address", __FUNCTION__);
>> >>     close(sockFd);
>> >>     return false;
>> >>   }
>> >>
>> >>   char ipAddrBuffer[INET_ADDRSTRLEN] {};
>> >>   inet_ntop(AF_INET,
>> >>           &reinterpret_cast<sockaddr_in*>(&(ifr.ifr_addr))->sin_addr,
>> >>           ipAddrBuffer,
>> >>           sizeof(ipAddrBuffer));
>> >>
>> >>   localAddress.assign(ipAddrBuffer);
>> >>
>> >>   // Bind to found ip address
>> >>   sockaddr_in serv_addr{};
>> >>   serv_addr.sin_family = AF_INET;
>> >>   inet_pton(AF_INET,
>> >>           localAddress.c_str(),
>> >>           &serv_addr.sin_addr);
>> >>   serv_addr.sin_port = htons(localPort);
>> >>
>> >>   if (bind(sockFd,
>> >>          reinterpret_cast<sockaddr*>(&serv_addr),
>> >>          sizeof(serv_addr))) {
>> >>     printError("Failed to bind socket to local address", __FUNCTION__);
>> >>     localAddress.clear();
>> >>     close(sockFd);
>> >>     return false;
>> >>   }
>> >>
>> >>   cout << "Local endpoint successfully bound to local address: " << localAddress << endl;
>> >>
>> >>   return true;
>> >> }
>> >>
>> >> bool openAssociation(const int sockFd,
>> >>                    const string &remoteAddress,
>> >>                    std::uint16_t remotePort) {
>> >>
>> >>   sockaddr_in address{};
>> >>   address.sin_family = AF_INET;
>> >>   inet_pton(AF_INET, remoteAddress.c_str(), &address.sin_addr);
>> >>   address.sin_port = htons(remotePort);
>> >>
>> >>   int connectError = connect(sockFd,
>> >>                            reinterpret_cast<sockaddr *>(&address),
>> >>                            sizeof(address));
>> >>   if (connectError) {
>> >>     printError("Error connecting association", __FUNCTION__);
>> >>     return false;
>> >>   }
>> >>
>> >>   cout << "Association connected to address: " << remoteAddress << ":" << remotePort << endl;
>> >>   return true;
>> >> }
>> >>
>> >> void sendReq(const int sockFd,
>> >>            const string& remoteAddress,
>> >>            const uint16_t remotePort,
>> >>            const std::string& data)
>> >> {
>> >>
>> >>   struct sockaddr_in remoteAddr {};
>> >>   remoteAddr.sin_family = AF_INET;
>> >>   remoteAddr.sin_port = htons(remotePort);
>> >>
>> >>   uint32_t payloadProtId = 7;
>> >>   uint16_t streamId = 0;
>> >>   uint32_t dataLength = data.size();
>> >>   sockaddr* servaddr = reinterpret_cast<sockaddr*>(&remoteAddr);
>> >>   inet_pton(AF_INET, remoteAddress.c_str(), &remoteAddr.sin_addr);
>> >>
>> >>   cout << "Sending SCTP req to " << remoteAddress << ":" << remotePort;
>> >>   cout << ", len=" << dataLength << endl;
>> >>
>> >>   const int bytesSent = sctp_sendmsg(sockFd,
>> >>                                    data.c_str(),
>> >>                                    (size_t)dataLength,
>> >>                                    servaddr,
>> >>                                    sizeof(sockaddr_in),
>> >>                                    htonl(payloadProtId),
>> >>                                    SCTP_ADDR_OVER,
>> >>                                    streamId,
>> >>                                    200,
>> >>                                    0);
>> >>
>> >>   if (bytesSent == -1) {
>> >>     printError("SCTP send failed", __FUNCTION__);
>> >>   }
>> >>
>> >>   return;
>> >> }
>> >>
>> >> sctp_assoc_t getSocketAssociationId(const int sockFd,
>> >>                                   const string &remoteIpAddress,
>> >>                                   std::uint16_t remotePort)
>> >>
>> >> {
>> >>   sockaddr_in socket_address_in{};
>> >>
>> >>   socket_address_in.sin_family = AF_INET;
>> >>   socket_address_in.sin_port = htons(remotePort);
>> >>   inet_pton(AF_INET, remoteIpAddress.c_str(), &socket_address_in.sin_addr);
>> >>
>> >>   struct sockaddr *socket_address = reinterpret_cast<sockaddr*>(&socket_address_in);
>> >>   socklen_t salen = sizeof(socket_address_in);  // size of the sockaddr_in, not of a pointer
>> >>
>> >>   struct sctp_paddrinfo peer_address_info{};
>> >>   socklen_t size_of_sctp_paddrinfo = sizeof peer_address_info;
>> >>   std::memcpy(&peer_address_info.spinfo_address, socket_address, salen);
>> >>
>> >>   const int sctpOptInfoError = sctp_opt_info(sockFd,
>> >>                                            0,
>> >>                                            SCTP_GET_PEER_ADDR_INFO,
>> >>                                            &peer_address_info,
>> >>                                            &size_of_sctp_paddrinfo);
>> >>
>> >>   if (sctpOptInfoError) {
>> >>     printError("Failed to get association id", __FUNCTION__);
>> >>   }
>> >>
>> >>   return peer_address_info.spinfo_assoc_id;
>> >> }
>> >>
>> >> std::uint32_t getAssociationPathMtu(const int sockFd,
>> >>                                   const string &remoteIpAddress,
>> >>                                   const std::uint16_t remotePort) {
>> >>   sockaddr_in socket_address_in{};
>> >>
>> >>   socket_address_in.sin_family = AF_INET;
>> >>   socket_address_in.sin_port = htons(remotePort);
>> >>   inet_pton(AF_INET, remoteIpAddress.c_str(), &socket_address_in.sin_addr);
>> >>
>> >>   struct sockaddr *socket_address = reinterpret_cast<sockaddr*>(&socket_address_in);
>> >>   socklen_t salen = sizeof(socket_address_in);  // size of the sockaddr_in, not of a pointer
>> >>
>> >>   struct sctp_paddrinfo peer_address_info{};
>> >>   socklen_t size_of_sctp_paddrinfo = sizeof(peer_address_info);
>> >>   std::memcpy(&peer_address_info.spinfo_address, socket_address, salen);
>> >>
>> >>   sctp_assoc_t sctpAssociationId = getSocketAssociationId(sockFd, remoteIpAddress, remotePort);
>> >>
>> >>   const int sctpOptInfoError = sctp_opt_info(sockFd, sctpAssociationId,
>> >>                                            SCTP_GET_PEER_ADDR_INFO,
>> >>                                            &peer_address_info, &size_of_sctp_paddrinfo);
>> >>   if (sctpOptInfoError) {
>> >>     printError("Failed to get pmtu", __FUNCTION__);
>> >>   }
>> >>
>> >>   auto t = std::time(nullptr);
>> >>   auto tm = *std::localtime(&t);
>> >>   std::cout << std::put_time(&tm, "%H:%M:%S ") << remoteIpAddress << ":" << remotePort;
>> >>   cout << " currently has a PMTU of " << peer_address_info.spinfo_mtu << endl;
>> >>
>> >>   return peer_address_info.spinfo_mtu;
>> >> }
>> >>
>> >> void test1(const string& data) {
>> >>   int localPort = 2944;
>> >>   string remoteIp1 = "10.0.0.3";
>> >>   uint16_t remotePort1 = 8001;
>> >>   uint16_t remotePort2 = 8002;
>> >>
>> >>   int sockFd = createSocket();
>> >>   if (sockFd == -1 || !bindSocket(sockFd, localPort)) {
>> >>     return;
>> >>   }
>> >>
>> >>   cout << "### Test 1: 2 assocs" << endl;
>> >>
>> >>   openAssociation(sockFd, remoteIp1, remotePort1);
>> >>   openAssociation(sockFd, remoteIp1, remotePort2);
>> >>
>> >>   getAssociationPathMtu(sockFd, remoteIp1, remotePort1);
>> >>   getAssociationPathMtu(sockFd, remoteIp1, remotePort2);
>> >>
>> >>   sendReq(sockFd, remoteIp1, remotePort1, data);
>> >>   for (int i = 0; i < 10; i++) {
>> >>     sleep(10);
>> >>     getAssociationPathMtu(sockFd, remoteIp1, remotePort1);
>> >>     getAssociationPathMtu(sockFd, remoteIp1, remotePort2);
>> >>   }
>> >> }
>> >>
>> >> void test2(const string& data) {
>> >>   int localPort = 2944;
>> >>   string remoteIp1 = "10.0.0.3";
>> >>   uint16_t remotePort1 = 8001;
>> >>   uint16_t remotePort2 = 8002;
>> >>   string remoteIpFake = "10.52.96.204";
>> >>   uint16_t remotePortFake = 3239;
>> >>
>> >>   int sockFd = createSocket();
>> >>   if (sockFd == -1 || !bindSocket(sockFd, localPort)) {
>> >>     return;
>> >>   }
>> >>
>> >>   cout << "### Test 2: 2 assocs + 1 unreachable assoc" << endl;
>> >>
>> >>   openAssociation(sockFd, remoteIp1, remotePort1);
>> >>   openAssociation(sockFd, remoteIp1, remotePort2);
>> >>   openAssociation(sockFd, remoteIpFake, remotePortFake);
>> >>
>> >>   getAssociationPathMtu(sockFd, remoteIp1, remotePort1);
>> >>   getAssociationPathMtu(sockFd, remoteIp1, remotePort2);
>> >>
>> >>   sendReq(sockFd, remoteIp1, remotePort1, data);
>> >>   for (int i = 0; i < 10; i++) {
>> >>     sleep(10);
>> >>     getAssociationPathMtu(sockFd, remoteIp1, remotePort1);
>> >>     getAssociationPathMtu(sockFd, remoteIp1, remotePort2);
>> >>   }
>> >> }
>> >>
>> >>
>> >> int main(int argc, char** argv) {
>> >>   string testNr = "1";
>> >>   if (argc >= 2) {
>> >>     testNr = argv[1];
>> >>   }
>> >>   // Any second argument selects the 100-byte payload instead of 1000 bytes.
>> >>   const string& testData = (argc >= 3) ? data100 : data1000;
>> >>
>> >>   if (testNr == "1") {
>> >>     test1(testData);
>> >>   } else {
>> >>     test2(testData);
>> >>   }
>> >>
>> >>   return 0;
>> >> }
>> >
>
>

Attachment: test2_traces_send_on_both_assocs.tar.gz
Description: GNU Zip compressed data

