Re: Cephfs/Hadoop/HBase

Investigating further, there seems to be a large number of inodes with
caps, many of which are actually unlinked from the filesystem.
2013-05-10 13:04:11.270365 7f2d7f349700  2 mds.0.cache
check_memory_usage total 306000, rss 90624, heap 143444, malloc 53053
mmap 0, baseline 131152, buffers 0, max 1048576, 3012 / 3537 inodes
have caps, 4222 caps, 1.19367 caps per inode

The number of inodes with caps is steadily increasing, and keeping
pace with the total number of inodes.
The majority of them seem to be ending up in this state:

2013-05-10 12:38:41.871531 7f2d8154f700 10 mds.0.locker  wanted pFscr -> Fc
2013-05-10 12:38:41.871535 7f2d8154f700 10 mds.0.locker _do_cap_update
dirty - issued pAsXsFsxcrwb wanted Fc on [inode
 1000000f573 [2,head] ~mds0/stray8/1000000f573 auth v271004 s=2639959
nl=0 n(v0 b2639959 1=1+0) (ifile excl) (iversion
 lock) cr={789359=0-67108864@1}
caps={789359=pAsXsFsxcrwb/Fc@10},l=789359 | ptrwaiter=0 request=0
lock=0 caps=1 dirty=
1 authpin=0 0x66e0580]

After that they don't seem to appear in the logfile again; they just get
forgotten about (presumably until the client actually drops them from its
cache).
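
For anyone wanting to watch that growth, something like the following
could be used to pull the relevant numbers out of the MDS log. It's a
rough sketch only; the log path is an example and the regex just follows
the check_memory_usage line format quoted above:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Rough sketch: scan an MDS log for check_memory_usage lines and print
    // how many inodes currently hold client caps versus the cache total,
    // so the steady growth can be watched over time. The log path is an
    // example; the regex follows the line format quoted above.
    public class CapsGrowth {
        private static final Pattern CAPS = Pattern.compile(
            "^(\\S+ \\S+).*check_memory_usage.*?(\\d+) / (\\d+) inodes have caps, (\\d+) caps");

        public static void main(String[] args) throws IOException {
            String logFile = args.length > 0 ? args[0] : "/var/log/ceph/ceph-mds.0.log";
            try (BufferedReader in = new BufferedReader(new FileReader(logFile))) {
                String line;
                while ((line = in.readLine()) != null) {
                    Matcher m = CAPS.matcher(line);
                    if (m.find()) {
                        // group(1) = timestamp, group(2)/group(3) = inodes
                        // with caps / total inodes, group(4) = cap count
                        System.out.printf("%s  %s/%s inodes have caps (%s caps)%n",
                                m.group(1), m.group(2), m.group(3), m.group(4));
                    }
                }
            }
        }
    }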

I've just found this bug report though: http://tracker.ceph.com/issues/3601
Looks like that may be the same issue...

Mike

On 10 May 2013 08:31, Mike Bryant <mike.bryant@xxxxxxxxx> wrote:
> Mhm, if that were the case I would expect it to be deleting things over time.
> On one occasion, for example, the data pool reached 160GB after 3 or 4 days,
> with a reported usage in cephfs of 12GB. Within minutes of my stopping the
> clients, the data pool dropped by over 140GB.
> I suspect the filehandles aren't being closed correctly, somewhere within
> hbase, hadoop-cephfs.jar, or the ceph java bindings.
> Adding some debug to the cephfs jar, I was able to match up every open with
> either a currently existing file in cephfs or a matching close.
> I think I need some debug from the mds to find out why it's keeping the
> objects alive, but I'm not sure which messages I should be looking at.
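>
> Here's a sketch of the kind of open/close tracking meant above. The class
> and method names are illustrative, not part of hadoop-cephfs.jar or the
> ceph java bindings; the calls would be wired in wherever those libraries
> open and close files:
>
>     import java.util.Map;
>     import java.util.concurrent.ConcurrentHashMap;
>
>     // Illustrative sketch of open/close tracking: record every open and
>     // close so any descriptor without a matching close stands out. None
>     // of these names come from hadoop-cephfs.jar or the ceph java
>     // bindings.
>     public final class OpenFileTracker {
>         private static final Map<Integer, String> OPEN_FDS =
>                 new ConcurrentHashMap<Integer, String>();
>
>         private OpenFileTracker() {}
>
>         public static void recordOpen(int fd, String path) {
>             OPEN_FDS.put(fd, path);
>             System.err.println("cephfs-debug: open  fd=" + fd + " path=" + path);
>         }
>
>         public static void recordClose(int fd) {
>             String path = OPEN_FDS.remove(fd);
>             System.err.println("cephfs-debug: close fd=" + fd
>                     + (path == null ? " (no matching open!)" : " path=" + path));
>         }
>
>         // Call periodically, or from a shutdown hook, to list descriptors
>         // that were opened but never closed.
>         public static void dumpStillOpen() {
>             for (Map.Entry<Integer, String> e : OPEN_FDS.entrySet()) {
>                 System.err.println("cephfs-debug: still open fd=" + e.getKey()
>                         + " path=" + e.getValue());
>             }
>         }
>     }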
>
> Mike
>
>
> On 9 May 2013 19:31, Noah Watkins <noah.watkins@xxxxxxxxxxx> wrote:
>>
>> Mike,
>>
>> I'm guessing that HBase is creating and deleting its blocks, but that the
>> deletes are delayed:
>>
>>   http://ceph.com/docs/master/dev/delayed-delete/
>>
>> which would explain the correct reporting at the file system level, but
>> not in the actual 'data' pool. I'm not as familiar with this level of
>> detail, so I've copied Greg, who can probably answer easily.
>>
>> Thanks,
>> Noah
>>
>> On May 9, 2013, at 4:29 AM, Mike Bryant <mike.bryant@xxxxxxxxx> wrote:
>>
>> > Hi,
>> > I'm experimenting with running hbase using the hadoop-ceph java
>> > filesystem implementation, and I'm having an issue with space usage.
>> >
>> > With the hbase daemons running, the amount of data in the 'data' pool
>> > grows continuously, at a much higher rate than expected. Doing a du
>> > or ls -lh on a mounted copy shows a usage of ~16GB, but the data pool
>> > has grown to consume ~160GB at times. When I restart the daemons, the
>> > data pool shrinks rapidly shortly thereafter; if I restart all of them,
>> > it comes down to match the actual space usage.
>> >
>> > My current hypothesis is that the MDS isn't deleting the objects for
>> > some reason, possibly because there's still an open filehandle?
>> >
>> > My question is: how can I get a report from the MDS on which objects
>> > aren't visible from the filesystem, why it hasn't deleted them yet, and
>> > what open filehandles there are?
>> >
>> > Cheers
>> > Mike
>> >
>> > --
>> > Mike Bryant | Systems Administrator | Ocado Technology
>> > mike.bryant@xxxxxxxxx | 01707 382148 | www.ocado.com
>> >
>>
>
>
>
> --
> Mike Bryant | Systems Administrator | Ocado Technology
> mike.bryant@xxxxxxxxx | 01707 382148 | www.ocado.com
>



--
Mike Bryant | Systems Administrator | Ocado Technology
mike.bryant@xxxxxxxxx | 01707 382148 | www.ocado.com

-- 
Notice:  This email is confidential and may contain copyright material of 
Ocado Limited (the "Company"). Opinions and views expressed in this message 
may not necessarily reflect the opinions and views of the Company.

If you are not the intended recipient, please notify us immediately and 
delete all copies of this message. Please note that it is your 
responsibility to scan this message for viruses.

Company reg. no. 3875000.

Ocado Limited
Titan Court
3 Bishops Square
Hatfield Business Park
Hatfield
Herts
AL10 9NE
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



