This is great, thank you!

Jan

> On 09 Sep 2015, at 12:37, HEWLETT, Paul (Paul) <paul.hewlett@xxxxxxxxxxxxxxxxxx> wrote:
>
> Hi Jan
>
> If I can suggest that you look at:
>
> http://engineering.linkedin.com/performance/optimizing-linux-memory-management-low-latency-high-throughput-databases
>
> where LinkedIn ended up disabling some of the newer kernel features to
> prevent memory thrashing. Search for "Transparent Huge Pages".
>
> RHEL7 now has these disabled by default - LinkedIn are using GraphDB, which
> is a log-structured system.
>
> Paul
>
> On 09/09/2015 10:54, "ceph-devel-owner@xxxxxxxxxxxxxxx on behalf of Jan
> Schermer" <ceph-devel-owner@xxxxxxxxxxxxxxx on behalf of jan@xxxxxxxxxxx>
> wrote:
>
>> I looked at THP before. It comes enabled on RHEL6, and on our KVM hosts it
>> merges a lot (~300GB of hugepages on a 400GB KVM footprint).
>> I am probably going to disable it and see whether it introduces any problems
>> for me - the most important gain here is better utilization of the
>> processor's memory lookup table (the TLB), since hugepages considerably
>> lower the number of entries. Not sure how it affects different workloads -
>> the HPC guys should have a good idea? I can only evaluate the effect on OSDs
>> and KVM, but the problem is that going over the cache limit even by a tiny
>> bit can have a huge impact - theoretically...
>>
>> This issue sounds strange, though. THP should kick in and defrag/re-merge
>> the pages that are partly empty. Maybe it's just not aggressive enough?
>> Does the "free" memory show up as used (as part of the RSS of the process
>> using the page)? I guess not, because there might be more processes with
>> memory in the same hugepage.
>>
>> This might actually partially explain the pagecache problem I mentioned here
>> about a week ago (slow OSD startup); maybe kswapd is what has to do the work
>> and defrag the pages when memory pressure is high!
>>
>> I'll try to test it somehow; hopefully then there will be cake.
>>
>> Jan
>>
>>> On 09 Sep 2015, at 07:08, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
>>>
>>> There is a tracker for this here:
>>>
>>> https://github.com/jemalloc/jemalloc/issues/243
>>> "Improve interaction with transparent huge pages"
>>>
>>> ----- Original Message -----
>>> From: "aderumier" <aderumier@xxxxxxxxx>
>>> To: "Sage Weil" <sweil@xxxxxxxxxx>
>>> Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>> Sent: Wednesday, 9 September 2015 06:37:22
>>> Subject: Re: [ceph-users] jemalloc and transparent hugepage
>>>
>>>>> Is this something we can set with mallctl[1] at startup?
>>>
>>> I don't think that's possible.
>>>
>>> Transparent hugepages are managed by the kernel, not by jemalloc.
>>>
>>> (But a simple "echo never >
>>> /sys/kernel/mm/transparent_hugepage/enabled" in an init script is enough.)
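
A minimal sketch of applying that at every boot, e.g. from rc.local or an init
script that runs before the OSDs start. The paths are the stock upstream/RHEL7
ones; on RHEL6 the same knobs live under /sys/kernel/mm/redhat_transparent_hugepage
instead:

#!/bin/sh
# stop the kernel from handing out new transparent hugepages
echo never > /sys/kernel/mm/transparent_hugepage/enabled
# and stop allocation-time defragmentation/compaction for THP as well
echo never > /sys/kernel/mm/transparent_hugepage/defrag
# verify: the active value is the one shown in brackets,
# e.g. "always madvise [never]"
cat /sys/kernel/mm/transparent_hugepage/enabled
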
>>> ----- Original Message -----
>>> From: "Sage Weil" <sweil@xxxxxxxxxx>
>>> To: "aderumier" <aderumier@xxxxxxxxx>
>>> Cc: "Mark Nelson" <mnelson@xxxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "Somnath Roy" <somnath.roy@xxxxxxxxxxx>
>>> Sent: Wednesday, 9 September 2015 04:07:59
>>> Subject: Re: [ceph-users] jemalloc and transparent hugepage
>>>
>>> On Wed, 9 Sep 2015, Alexandre DERUMIER wrote:
>>>>>> Have you noticed any performance difference with tp=never?
>>>>
>>>> No difference.
>>>>
>>>> I think hugepages could speed up big memory sets like 100-200GB, but for
>>>> 1-2GB there is no noticeable difference.
>>>
>>> Is this something we can set with mallctl[1] at startup?
>>>
>>> sage
>>>
>>> [1] http://www.canonware.com/download/jemalloc/jemalloc-latest/doc/jemalloc.html
>>>
>>>> ----- Original Message -----
>>>> From: "Mark Nelson" <mnelson@xxxxxxxxxx>
>>>> To: "aderumier" <aderumier@xxxxxxxxx>, "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>>> Cc: "Somnath Roy" <somnath.roy@xxxxxxxxxxx>
>>>> Sent: Wednesday, 9 September 2015 01:49:35
>>>> Subject: Re: [ceph-users] jemalloc and transparent hugepage
>>>>
>>>> Excellent investigation, Alexandre! Have you noticed any performance
>>>> difference with tp=never?
>>>>
>>>> Mark
>>>>
>>>> On 09/08/2015 06:33 PM, Alexandre DERUMIER wrote:
>>>>> I have done a small benchmark with tcmalloc and jemalloc, with transparent
>>>>> hugepage=always|never.
>>>>>
>>>>> For tcmalloc there is no difference,
>>>>> but for jemalloc the difference is huge (RSS is around 25% lower with
>>>>> tp=never).
>>>>>
>>>>> jemalloc 3.6 + tp=never still uses about 10% more RSS memory than tcmalloc;
>>>>> jemalloc 4.0 + tp=never uses almost the same RSS memory as tcmalloc!
>>>>>
>>>>> I haven't monitored memory usage during recovery, but I think it should
>>>>> help there too.
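
For anyone who wants to reproduce the numbers below, a rough sketch of how the
same RSS figures can be collected, together with how much of each OSD's memory
is actually backed by transparent hugepages. This assumes the daemons are
named ceph-osd and that pgrep is available:

for pid in $(pgrep -x ceph-osd); do
    ps -o pid=,vsz=,rss=,args= -p "$pid"
    # sum the per-mapping AnonHugePages counters from smaps (values are in kB)
    awk '/^AnonHugePages/ {sum += $2} END {print "  AnonHugePages:", sum + 0, "kB"}' /proc/"$pid"/smaps
done
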
>>>>>
>>>>> tcmalloc 2.1 tp=always
>>>>> ----------------------
>>>>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>>>>> root 67746 120 1.0 1531220 671152 ? Ssl 01:18 0:43 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
>>>>> root 67764 144 1.0 1570256 711232 ? Ssl 01:18 0:51 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
>>>>>
>>>>> root 68363 220 0.9 1522292 655888 ? Ssl 01:19 0:46 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
>>>>> root 68381 261 1.0 1563396 702500 ? Ssl 01:19 0:55 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
>>>>>
>>>>> root 68963 228 1.0 1519240 666196 ? Ssl 01:20 0:31 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
>>>>> root 68981 268 1.0 1564452 694352 ? Ssl 01:20 0:37 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
>>>>>
>>>>> tcmalloc 2.1 tp=never
>>>>> ---------------------
>>>>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>>>>> root 69560 144 1.0 1544968 677584 ? Ssl 01:21 0:20 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
>>>>> root 69578 167 1.0 1568620 704456 ? Ssl 01:21 0:23 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
>>>>>
>>>>> root 70156 164 0.9 1519680 649776 ? Ssl 01:21 0:16 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
>>>>> root 70174 214 1.0 1559772 692828 ? Ssl 01:21 0:19 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
>>>>>
>>>>> root 70757 202 0.9 1520376 650572 ? Ssl 01:22 0:20 /usr/bin/ceph-osd --cluster=ceph -i 0 -f
>>>>> root 70775 236 1.0 1560644 694088 ? Ssl 01:22 0:23 /usr/bin/ceph-osd --cluster=ceph -i 1 -f
>>>>>
>>>>> jemalloc 3.6 tp=always
>>>>> ----------------------
>>>>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>>>>> root 92005 46.1 1.4 2033864 967512 ? Ssl 01:00 0:04 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>>> root 92027 45.5 1.4 2021624 963536 ? Ssl 01:00 0:04 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>>>
>>>>> root 92703 191 1.5 2138724 1002376 ? Ssl 01:02 1:16 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>>> root 92721 183 1.5 2126228 986448 ? Ssl 01:02 1:13 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>>>
>>>>> root 93366 258 1.4 2139052 984132 ? Ssl 01:03 1:09 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>>> root 93384 250 1.5 2126244 990348 ? Ssl 01:03 1:07 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>>>
>>>>> jemalloc 3.6 tp=never
>>>>> ---------------------
>>>>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>>>>> root 93990 238 1.1 2105812 762628 ? Ssl 01:04 1:16 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>>> root 94033 263 1.1 2118288 781768 ? Ssl 01:04 1:18 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>>>
>>>>> root 94656 266 1.1 2139096 781392 ? Ssl 01:05 0:58 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>>> root 94674 257 1.1 2126316 760632 ? Ssl 01:05 0:56 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>>>
>>>>> root 95317 297 1.1 2135044 780532 ? Ssl 01:06 0:35 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>>> root 95335 284 1.1 2112016 760972 ? Ssl 01:06 0:34 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>>>
>>>>> jemalloc 4.0 tp=always
>>>>> ----------------------
>>>>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>>>>> root 100275 198 1.3 1784520 880288 ? Ssl 01:14 0:45 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>>> root 100320 239 1.1 1793184 760824 ? Ssl 01:14 0:47 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>>>
>>>>> root 100897 200 1.3 1765780 891256 ? Ssl 01:15 0:50 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>>> root 100942 245 1.1 1817436 746956 ? Ssl 01:15 0:53 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>>>
>>>>> root 101517 196 1.3 1769904 877132 ? Ssl 01:16 0:33 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>>> root 101562 258 1.1 1805172 746532 ? Ssl 01:16 0:36 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>>>
>>>>> jemalloc 4.0 tp=never
>>>>> ---------------------
>>>>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>>>>> root 98362 87.8 1.0 1841748 678848 ? Ssl 01:10 0:53 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>>> root 98405 97.0 1.0 1846328 699620 ? Ssl 01:10 0:56 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>>>
>>>>> root 99018 233 1.0 1812580 698848 ? Ssl 01:12 0:30 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>>> root 99036 226 1.0 1822344 677420 ? Ssl 01:12 0:29 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
>>>>>
>>>>> root 99666 281 1.0 1814640 696420 ? Ssl 01:13 0:33 /usr/bin/ceph-osd --cluster=ceph -i 5 -f
>>>>> root 99684 266 1.0 1835676 676768 ? Ssl 01:13 0:32 /usr/bin/ceph-osd --cluster=ceph -i 4 -f
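
Related to the earlier question about whether THP defragmentation is aggressive
enough, a quick way to look at the system-wide picture and at how much collapsing
khugepaged is actually doing (all paths are the standard procfs/sysfs ones on a
THP-enabled kernel):

# total anonymous memory currently backed by transparent hugepages
grep AnonHugePages /proc/meminfo
# current THP and defrag policies (active value shown in brackets)
cat /sys/kernel/mm/transparent_hugepage/enabled /sys/kernel/mm/transparent_hugepage/defrag
# khugepaged activity counters and scan-rate knobs
grep . /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed \
       /sys/kernel/mm/transparent_hugepage/khugepaged/full_scans \
       /sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan \
       /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs
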
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "aderumier" <aderumier@xxxxxxxxx>
>>>>> To: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>>>>> Sent: Tuesday, 8 September 2015 21:42:35
>>>>> Subject: [ceph-users] jemalloc and transparent hugepage
>>>>>
>>>>> Hi,
>>>>> I have found an interesting article about jemalloc and transparent
>>>>> hugepages:
>>>>>
>>>>> https://www.digitalocean.com/company/blog/transparent-huge-pages-and-alternative-memory-allocators/
>>>>>
>>>>> It would be great to see whether disabling transparent hugepages helps
>>>>> lower jemalloc's memory usage.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Alexandre