Hi Jan,

If I can suggest, have a look at:
http://engineering.linkedin.com/performance/optimizing-linux-memory-management-low-latency-high-throughput-databases
where LinkedIn ended up disabling some of the newer kernel features to
prevent memory thrashing - search for "Transparent Huge Pages". RHEL7 now
has these disabled by default - LinkedIn are using GraphDB, which is a
log-structured system.

Paul

On 09/09/2015 10:54, "ceph-devel-owner@xxxxxxxxxxxxxxx on behalf of Jan
Schermer" <ceph-devel-owner@xxxxxxxxxxxxxxx on behalf of jan@xxxxxxxxxxx>
wrote:

>I looked at THP before. It comes enabled on RHEL6, and on our KVM hosts it
>merges a lot (~300GB of hugepages on a 400GB KVM footprint).
>I am probably going to disable it and see whether it introduces any
>problems for me - the most important gain here is better utilization of
>the processor's memory lookup table (the TLB), since huge pages
>considerably lower the number of entries needed. Not sure how it affects
>different workloads - the HPC folks should have a good idea? I can only
>evaluate the effect on OSDs and KVM, but the problem is that going over
>the cache limit even by a tiny bit can have a huge impact -
>theoretically...
>
>This issue sounds strange, though. THP should kick in and defrag/remerge
>the pages that are part-empty. Maybe it's just not aggressive enough?
>Does the "free" memory show up as used (part of the RSS of the process
>using the page)? I guess not, because there might be more processes with
>memory in the same hugepage.
>
>This might actually partially explain the pagecache problem I mentioned
>here about a week ago (slow OSD startup) - maybe kswapd is what has to do
>the work and defrag the pages when memory pressure is high!
>
>I'll try to test it somehow; hopefully then there will be cake.
>
>Jan
>
>> On 09 Sep 2015, at 07:08, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
>>
>> There is a tracker here:
>>
>> https://github.com/jemalloc/jemalloc/issues/243
>> "Improve interaction with transparent huge pages"
>>
>>
>> ----- Original Message -----
>> From: "aderumier" <aderumier@xxxxxxxxx>
>> To: "Sage Weil" <sweil@xxxxxxxxxx>
>> Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>> Sent: Wednesday 9 September 2015 06:37:22
>> Subject: Re: [ceph-users] jemalloc and transparent hugepage
>>
>>>> Is this something we can set with mallctl[1] at startup?
>>
>> I don't think that's possible.
>>
>> Transparent hugepages are managed by the kernel, not by jemalloc.
>>
>> (But a simple "echo never > /sys/kernel/mm/transparent_hugepage/enabled"
>> in an init script is enough.)
>>
>> ----- Original Message -----
>> From: "Sage Weil" <sweil@xxxxxxxxxx>
>> To: "aderumier" <aderumier@xxxxxxxxx>
>> Cc: "Mark Nelson" <mnelson@xxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "Somnath Roy" <somnath.roy@xxxxxxxxxxx>
>> Sent: Wednesday 9 September 2015 04:07:59
>> Subject: Re: [ceph-users] jemalloc and transparent hugepage
>>
>> On Wed, 9 Sep 2015, Alexandre DERUMIER wrote:
>>>>> Have you noticed any performance difference with tp=never?
>>>
>>> No difference.
>>>
>>> I think hugepages could speed up big memory sets like 100-200GB, but
>>> for 1-2GB there is no noticeable difference.
>>
>> Is this something we can set with mallctl[1] at startup?
>>
>> sage
>>
>> [1] http://www.canonware.com/download/jemalloc/jemalloc-latest/doc/jemalloc.html
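For reference, the "echo never" that Alexandre mentions above only applies
until the next reboot. A minimal sketch of checking the setting and making
the change persistent - the sysfs paths are the standard kernel ones, but
the rc.local hook and the RHEL6 path note are assumptions on my part, so
adapt them to your init system:

    # Check the current setting; the active value is shown in brackets.
    cat /sys/kernel/mm/transparent_hugepage/enabled
    cat /sys/kernel/mm/transparent_hugepage/defrag

    # Disable THP (and its background defrag) at runtime.
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/defrag

    # To make it persistent, the same lines can go into /etc/rc.local
    # (or an equivalent init/systemd hook), e.g.:
    if [ -w /sys/kernel/mm/transparent_hugepage/enabled ]; then
        echo never > /sys/kernel/mm/transparent_hugepage/enabled
        echo never > /sys/kernel/mm/transparent_hugepage/defrag
    fi

    # Alternatively, boot with transparent_hugepage=never on the kernel
    # command line. On RHEL6 the sysfs path may be
    # /sys/kernel/mm/redhat_transparent_hugepage instead.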
>>
>>> ----- Original Message -----
>>> From: "Mark Nelson" <mnelson@xxxxxxxxxx>
>>> To: "aderumier" <aderumier@xxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>> Cc: "Somnath Roy" <somnath.roy@xxxxxxxxxxx>
>>> Sent: Wednesday 9 September 2015 01:49:35
>>> Subject: Re: [ceph-users] jemalloc and transparent hugepage
>>>
>>> Excellent investigation, Alexandre! Have you noticed any performance
>>> difference with tp=never?
>>>
>>> Mark
>>>
>>> On 09/08/2015 06:33 PM, Alexandre DERUMIER wrote:
>>>> I have done a small benchmark with tcmalloc and jemalloc, with
>>>> transparent hugepage=always|never.
>>>>
>>>> For tcmalloc, there is no difference.
>>>> But for jemalloc, the difference is huge (around 25% lower RSS with
>>>> tp=never).
>>>>
>>>> jemalloc 3.6.0+tp=never uses about 10% more RSS memory than tcmalloc.
>>>>
>>>> jemalloc 4.0+tp=never uses almost the same RSS memory as tcmalloc!
>>>>
>>>> I haven't monitored memory usage during recovery, but I think it
>>>> should help there too.
>>>>
>>>>
>>>> tcmalloc 2.1 tp=always
>>>> ----------------------
>>>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>>>> root 67746 120 1.0 1531220 671152 ? Ssl 01:18 0:43 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
>>>> root 67764 144 1.0 1570256 711232 ? Ssl 01:18 0:51 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
>>>>
>>>> root 68363 220 0.9 1522292 655888 ? Ssl 01:19 0:46 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
>>>> root 68381 261 1.0 1563396 702500 ? Ssl 01:19 0:55 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
>>>>
>>>> root 68963 228 1.0 1519240 666196 ? Ssl 01:20 0:31 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
>>>> root 68981 268 1.0 1564452 694352 ? Ssl 01:20 0:37 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
>>>>
>>>>
>>>> tcmalloc 2.1 tp=never
>>>> ---------------------
>>>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>>>> root 69560 144 1.0 1544968 677584 ? Ssl 01:21 0:20 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
>>>> root 69578 167 1.0 1568620 704456 ? Ssl 01:21 0:23 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
>>>>
>>>> root 70156 164 0.9 1519680 649776 ? Ssl 01:21 0:16 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
>>>> root 70174 214 1.0 1559772 692828 ? Ssl 01:21 0:19 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
>>>>
>>>> root 70757 202 0.9 1520376 650572 ? Ssl 01:22 0:20 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
>>>> root 70775 236 1.0 1560644 694088 ? Ssl 01:22 0:23 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
>>>>
>>>>
>>>> jemalloc 3.6 tp=always
>>>> ----------------------
>>>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>>>> root 92005 46.1 1.4 2033864 967512 ? Ssl 01:00 0:04 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>> root 92027 45.5 1.4 2021624 963536 ? Ssl 01:00 0:04 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>>
>>>> root 92703 191 1.5 2138724 1002376 ? Ssl 01:02 1:16 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>> root 92721 183 1.5 2126228 986448 ? Ssl 01:02 1:13 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>>
>>>> root 93366 258 1.4 2139052 984132 ? Ssl 01:03 1:09 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>> root 93384 250 1.5 2126244 990348 ? Ssl 01:03 1:07 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>>
>>>>
>>>> jemalloc 3.6 tp=never
>>>> ---------------------
>>>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>>>> root 93990 238 1.1 2105812 762628 ? Ssl 01:04 1:16 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>> root 94033 263 1.1 2118288 781768 ? Ssl 01:04 1:18 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>>
>>>> root 94656 266 1.1 2139096 781392 ? Ssl 01:05 0:58 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>> root 94674 257 1.1 2126316 760632 ? Ssl 01:05 0:56 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>>
>>>> root 95317 297 1.1 2135044 780532 ? Ssl 01:06 0:35 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>> root 95335 284 1.1 2112016 760972 ? Ssl 01:06 0:34 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>>
>>>>
>>>> jemalloc 4.0 tp=always
>>>> ----------------------
>>>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>>>> root 100275 198 1.3 1784520 880288 ? Ssl 01:14 0:45 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>> root 100320 239 1.1 1793184 760824 ? Ssl 01:14 0:47 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>>
>>>> root 100897 200 1.3 1765780 891256 ? Ssl 01:15 0:50 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>> root 100942 245 1.1 1817436 746956 ? Ssl 01:15 0:53 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>>
>>>> root 101517 196 1.3 1769904 877132 ? Ssl 01:16 0:33 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>> root 101562 258 1.1 1805172 746532 ? Ssl 01:16 0:36 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>>
>>>>
>>>> jemalloc 4.0 tp=never
>>>> ---------------------
>>>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>>>> root 98362 87.8 1.0 1841748 678848 ? Ssl 01:10 0:53 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>> root 98405 97.0 1.0 1846328 699620 ? Ssl 01:10 0:56 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>>
>>>> root 99018 233 1.0 1812580 698848 ? Ssl 01:12 0:30 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>> root 99036 226 1.0 1822344 677420 ? Ssl 01:12 0:29 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>>
>>>> root 99666 281 1.0 1814640 696420 ? Ssl 01:13 0:33 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>> root 99684 266 1.0 1835676 676768 ? Ssl 01:13 0:32 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
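The listings above are raw "ps aux" snapshots. For anyone who wants to
collect the same numbers over time, a minimal sketch - it only assumes the
daemons show up under the process name ceph-osd; ps reports RSS and VSZ in
kB:

    # Sample memory and CPU usage of all ceph-osd daemons every 30 seconds.
    while true; do
        date
        ps -C ceph-osd -o pid,rss,vsz,pcpu,args
        sleep 30
    done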
>>>>
>>>> ----- Original Message -----
>>>> From: "aderumier" <aderumier@xxxxxxxxx>
>>>> To: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>>> Sent: Tuesday 8 September 2015 21:42:35
>>>> Subject: [ceph-users] jemalloc and transparent hugepage
>>>>
>>>> Hi,
>>>> I have found an interesting article about jemalloc and transparent
>>>> hugepages:
>>>>
>>>> https://www.digitalocean.com/company/blog/transparent-huge-pages-and-alternative-memory-allocators/
>>>>
>>>> It could be great to see whether disabling transparent hugepages helps
>>>> lower jemalloc memory usage.
>>>>
>>>> Regards,
>>>>
>>>> Alexandre
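To see how much of an OSD's resident memory is actually backed by
transparent hugepages before and after flipping the setting, something
like the following should work - a rough sketch; AnonHugePages is the
standard field in /proc/meminfo and /proc/<pid>/smaps, while the pgrep
pattern is an assumption about the daemon name:

    # System-wide THP usage.
    grep AnonHugePages /proc/meminfo

    # Per ceph-osd process: total anonymous memory backed by huge pages (kB).
    for pid in $(pgrep -x ceph-osd); do
        printf "osd pid %s: " "$pid"
        awk '/AnonHugePages/ {sum += $2} END {print sum " kB"}' /proc/$pid/smaps
    done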
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html