Re: [PATCH] Expose Ceph data location information to Hadoop

On Tue, 30 Nov 2010, Noah Watkins wrote:
> > After writing this code, I do like seeing the words "scrap" and "JNI" 
> > so close in the same sentence.  That's more up to the Hadoop 
> > community, though; I don't know how well-accepted JNA is in their code 
> > base.
> One solution might be a lazily populated sysfs interface for retrieving 
> object information for a given file, circumventing the problem Java has 
> calling IOCTLs. But that's another conversation.

Yeah, not pretty.  A shared library wouldn't really help here either, 
right?  And a command-line tool means additional overhead.  JNI (or 
equivalent) calling an ioctl seems like the most appropriate tool.

> > The ioctl struct ceph_ioctl_dataloc already returns the primary copy's 
> > object offset for an input file offset, though I think it would be a 
> > little more useful if it included replica offsets.
> I can submit a patch for this. Sage, I remember you mentioning that 
> reading from replicas might pose (scalability?) problems. Any thoughts 
> on this?

There are two things.  First, we'd need a DATALOC_V2 ioctl that would 
return locations for all replicas of the object.  Is Hadoop smart about 
scheduling jobs on the best replica?
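
To make the first point concrete, here is a rough sketch of the kind of
result a DATALOC_V2-style call could hand to a job scheduler.  Everything
in it is an assumption for illustration: ReplicaLocation, its fields, and
hosts_for_scheduling() are hypothetical names, not an existing Ceph or
Hadoop interface, and the "primary first" ordering is just one plausible
preference.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReplicaLocation:
    """Hypothetical entry in a DATALOC_V2-style reply (not a real Ceph struct)."""
    osd: int        # id of the OSD holding this copy
    host: str       # node that OSD runs on
    primary: bool   # True for the primary copy


def hosts_for_scheduling(replicas: List[ReplicaLocation]) -> List[str]:
    """Return candidate hosts for a task, primary first, without duplicates."""
    # sorted() is stable, so non-primary replicas keep their reported order.
    ordered = sorted(replicas, key=lambda r: not r.primary)
    hosts: List[str] = []
    for replica in ordered:
        if replica.host not in hosts:
            hosts.append(replica.host)
    return hosts
```

A scheduler that only understands "list of hosts per block" could consume
the returned list directly; whether it would actually prefer one replica
over another is the open question.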

The second part is being able to read from them.  In general, sending any/all 
reads to a random replica does bad things to your cache.  In principle, 
it's possible, though, at least when a file is only opened for read (at 
that point all replicas are known consistent on disk).  Someone suggested 
on IRC a while back that in such a case we have a check to read from a 
non-primary replica if that replica happens to be on the local node.  That 
sort of optimization would work in this case.  A number of changes in the 
OSD and client will be needed, but nothing too invasive (I think!).
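
The check described above could look roughly like the following.  This is
only a sketch of the decision, assuming the client already knows which
hosts hold copies of the object and whether the file is opened only for
read; choose_read_target() and its parameters are hypothetical names, and
none of the real OSD or client changes are shown.

```python
from typing import Sequence


def choose_read_target(replica_hosts: Sequence[str],
                       local_host: str,
                       opened_read_only: bool) -> str:
    """Pick the host to read an object from.

    replica_hosts[0] is assumed to be the primary.  A non-primary replica
    is used only when the file is opened for read only and that replica
    lives on the local node; otherwise reads go to the primary.
    """
    if not replica_hosts:
        raise ValueError("object has no known replicas")
    primary = replica_hosts[0]
    if opened_read_only and local_host in replica_hosts:
        return local_host
    return primary
```

Reads still default to the primary, so the cache behaviour for ordinary
clients is unchanged; only a reader co-located with a replica takes the
local path.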

Before going to the trouble, though, I want to make sure we'll really 
benefit from all of that...

sage

