On Fri, Jan 25, 2013 at 10:07 AM, Andrey Korolyov <andrey@xxxxxxx> wrote:
> Sorry, I wrote too little yesterday because I was sleepy.
> That's obviously cache pressure, since dropping caches made these
> errors disappear for a long period. I'm not very familiar with the
> kernel's memory management, but shouldn't the kernel first try to
> allocate memory on the second node, if that is not prohibited by the
> process's cpuset, and only then report an allocation failure (as far
> as I can see, only node 0 is involved in the failures)? I really have
> no idea how NUMA awareness comes into play for the osd daemons.

Hi Andrey,

You said that the allocation failure doesn't occur if you flush caches,
but the kernel should evict pages from the cache as needed so that the
osd can allocate more memory (unless they're dirty, but it doesn't look
like you have many dirty pages in this case). It looks like you have
plenty of reclaimable pages as well.

Does the osd remain running after that error occurs?

I wonder if you see the same error if you do a long write-intensive
workload on the local disk for the osd in question, maybe
dd if=/dev/zero of=/data/osd.0/foo

-sam

>
> On Fri, Jan 25, 2013 at 2:42 AM, Andrey Korolyov <andrey@xxxxxxx> wrote:
>> Hi,
>>
>> Those traces happen only under constant heavy writes and seem to be
>> very rare. OSD processes do not consume more memory after this event,
>> and no peaks are distinguishable in the monitoring. I was able to
>> catch it after four hours of constant writes on the cluster.
>>
>> http://xdel.ru/downloads/ceph-log/allocation-failure/
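
One way to run that test and watch for the failure at the same time
might be something like the following. The block size, count, and the
/data/osd.0/foo path are only placeholders (pick a size that comfortably
exceeds RAM so the page cache actually comes under pressure), and the
per-node commands assume the numactl package is installed:

    # sustained buffered write to the osd's local disk (~200 GB here;
    # adjust count so the total clearly exceeds RAM)
    dd if=/dev/zero of=/data/osd.0/foo bs=1M count=200000

    # in a second terminal, watch dirty/reclaimable pages and the free
    # pages per order on each zone
    watch -n 5 'grep -E "MemFree|Dirty|Writeback" /proc/meminfo; cat /proc/buddyinfo'

    # check whether the allocation failure shows up again
    dmesg | grep -i "page allocation failure"

    # per-node allocation counters and free memory, to see whether
    # node 1 is being used at all
    numastat
    numactl --hardware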