Short story, 'memmem()' is a gnuish, nonstandard function. I wanted to provide a generic fallback in some code. So, lets borrow it somewhere else: First looking at git's compat/memmem.c which I found was suboptimal with roughly O(M*N) performance (albeit ok for that case since it was just a generic fallback). Well, second taking a look at glibc's source surprised me, there is the same code as in git. I somewhat expected a faster implementation from a generic C library. That thought and done, the code is attached to this mail. The algorithm is similar to the Rabin/Karp method for string searches but uses a weaker and simpler additive rolling hash. The result is still somewhat like O(M+N) for most cases (There may be corner cases where it not that good, but its really hard to imagine those). Anyways, it is always faster than the current implementation, in my tests with common data (parsing multipart/form-data cgi responses) it yields approx 20%-100% improvements and with little artificial cheating (having strings in haystack which only differ at the last char of needle) the improvements are much better (500%, since is another complexity class). The fact is that the old implementation does a brute force search which has corner cases which are quite easy to hit (repeating symbol sequences, small set of possible symbols) while for my implementation the corner cases are much more rare and even then, it will still perform better than the old implementation. The code is not much more complex as the old one, not the original 'Rabin/Karp' algorithm because that would require a efficient modulo operation with a prime number which might be slower on some platforms (just a guess, I didn't even tried, the performance satisfies me perfectly). Other search algorithms like 'Knuth/Morris/Pratt' or 'Boyer/More' are more complex to implement and require O(Needle) extra memory for common implementations, which reserve them for special purposes imo. I just wanted to keep it simple but still considerably better than the current one. Feel free to use the code in glibc and/or git. Christian
/* Copyright (C) 1991,92,93,94,96,97,98,2000,2004 Free Software Foundation, Inc. This file is part of the GNU C Library. The GNU C Library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. The GNU C Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with the GNU C Library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA. */ #include <stddef.h> #include <string.h> #ifndef _LIBC # define __builtin_expect(expr, val) (expr) #endif #undef memmem /* Return the first occurrence of NEEDLE in HAYSTACK. */ void * memmem (haystack, haystack_len, needle, needle_len) const void *haystack; size_t haystack_len; const void *needle; size_t needle_len; { /* not really Rabin-Karp, just using additive hashing */ char* haystack_ = (char*)haystack; char* needle_ = (char*)needle; int hash = 0; /* this is the static hash value of the needle */ int hay_hash = 0; /* rolling hash over the haystack */ char* last; size_t i; if (haystack_len < needle_len) return NULL; if (!needle_len) return haystack_; /* initialize hashes */ for (i = needle_len; i; --i) { hash += *needle_++; hay_hash += *haystack_++; } /* iterate over the haystack */ haystack_ = (char*)haystack; needle_ = (char*)needle; last = haystack_+(haystack_len - needle_len + 1); for (; haystack_ < last; ++haystack_) { if (__builtin_expect(hash == hay_hash, 0) && *haystack_ == *needle_ && /* prevent calling memcmp, was a optimization from existing glibc */ !memcmp (haystack_, needle_, needle_len)) return haystack_; /* roll the hash */ hay_hash -= *haystack_; hay_hash += *(haystack_+needle_len); } return NULL; }