On Fri, Jan 25, 2013 at 10:07 AM, Andrey Korolyov <andrey@xxxxxxx> wrote:
> Sorry, I wrote too little yesterday because I was sleepy.
> That's obviously cache pressure, since dropping caches made these
> errors disappear for a long period. I'm not very familiar with the
> kernel's memory management, but shouldn't the kernel first try to
> allocate memory on the second node, if that is not prohibited by the
> process's cpuset, and only then report an allocation failure (as far
> as I can see, only node 0 is involved in the failures)? I really have
> no idea how NUMA awareness comes into play for the osd daemons.

Hi Andrey,

You said that the allocation failure doesn't occur if you flush caches,
but the kernel should evict pages from the cache as needed so that the
osd can allocate more memory (unless they're dirty, but it doesn't look
like you have many dirty pages in this case). It looks like you have
plenty of reclaimable pages as well.

Does the osd remain running after that error occurs?

I wonder if you see the same error if you do a long write-intensive
workload on the local disk for the osd in question, maybe
dd if=/dev/zero of=/data/osd.0/foo

-sam

>
> On Fri, Jan 25, 2013 at 2:42 AM, Andrey Korolyov <andrey@xxxxxxx> wrote:
>> Hi,
>>
>> Those traces happen only under constant heavy writes and seem to be
>> very rare. OSD processes do not consume more memory after this event,
>> and no peaks are distinguishable in the monitoring. I was able to
>> catch it after four hours of constant writes on the cluster.
>>
>> http://xdel.ru/downloads/ceph-log/allocation-failure/
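
One way to run that test and watch for the failure at the same time
might be something like the following. The block size, count, and the
/data/osd.0/foo path are only placeholders (pick a size that comfortably
exceeds RAM so the page cache actually comes under pressure), and the
per-node commands assume the numactl package is installed:

    # sustained buffered write to the osd's local disk (~200 GB here;
    # adjust count so the total clearly exceeds RAM)
    dd if=/dev/zero of=/data/osd.0/foo bs=1M count=200000

    # in a second terminal, watch dirty/reclaimable pages and the free
    # pages per order on each zone
    watch -n 5 'grep -E "MemFree|Dirty|Writeback" /proc/meminfo; cat /proc/buddyinfo'

    # check whether the allocation failure shows up again
    dmesg | grep -i "page allocation failure"

    # per-node allocation counters and free memory, to see whether
    # node 1 is being used at all
    numastat
    numactl --hardware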