Hi Sage, Community,

I am unable to use 2 directories to direct data to 2 different pools. I did
the following experiment: I created 2 pools, "host" & "ghost", to separate
data placement.

-------------------------------------------------- // crushmap file --------------------------------------------------
# begin crush map

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 pool
type 7 ghost

# buckets
host hemantone-mirror-virtual-machine {
        id -6           # do not change unnecessarily
        # weight 1.000
        alg straw
        hash 0          # rjenkins1
        item osd.2 weight 1.000
}
host hemantone-virtual-machine {
        id -7           # do not change unnecessarily
        # weight 1.000
        alg straw
        hash 0          # rjenkins1
        item osd.1 weight 1.000
}
rack one {
        id -2           # do not change unnecessarily
        # weight 2.000
        alg straw
        hash 0          # rjenkins1
        item hemantone-mirror-virtual-machine weight 1.000
        item hemantone-virtual-machine weight 1.000
}
ghost hemant-virtual-machine {
        id -4           # do not change unnecessarily
        # weight 1.000
        alg straw
        hash 0          # rjenkins1
        item osd.0 weight 1.000
}
ghost hemant-mirror-virtual-machine {
        id -5           # do not change unnecessarily
        # weight 1.000
        alg straw
        hash 0          # rjenkins1
        item osd.3 weight 1.000
}
rack two {
        id -3           # do not change unnecessarily
        # weight 2.000
        alg straw
        hash 0          # rjenkins1
        item hemant-virtual-machine weight 1.000
        item hemant-mirror-virtual-machine weight 1.000
}
pool default {
        id -1           # do not change unnecessarily
        # weight 4.000
        alg straw
        hash 0          # rjenkins1
        item one weight 2.000
        item two weight 2.000
}

# rules
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step take one
        step chooseleaf firstn 0 type host
        step emit
}
rule metadata {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step take one
        step chooseleaf firstn 0 type host
        step emit
}
rule rbd {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step take one
        step chooseleaf firstn 0 type host
        step emit
}
rule forhost {
        ruleset 3
        type replicated
        min_size 1
        max_size 10
        step take default
        step take one
        step chooseleaf firstn 0 type host
        step emit
}
rule forghost {
        ruleset 4
        type replicated
        min_size 1
        max_size 10
        step take default
        step take two
        step chooseleaf firstn 0 type ghost
        step emit
}

# end crush map
------------------------------------------------------------------------------------------------------------------------

1) Set the replication factor to 2 and the crush rules accordingly
   (the "host" pool got crush_ruleset = 3 & the "ghost" pool got crush_ruleset = 4).
2) Mounted the filesystem on two directories using
   "mount.ceph 10.72.148.245:6789:/ /home/hemant/x" &
   "mount.ceph 10.72.148.245:6789:/ /home/hemant/y".
3) Then ran "ceph mds add_data_pool 5" & "ceph mds add_data_pool 6"
   (the pool ids here are host = 5, ghost = 6).
4) "cephfs /home/hemant/x set_layout --pool 5 -c 1 -u 4194304 -s 4194304" &
   "cephfs /home/hemant/y set_layout --pool 6 -c 1 -u 4194304 -s 4194304".

PROBLEM:

$ cephfs /home/hemant/x show_layout
layout.data_pool:     6
layout.object_size:   4194304
layout.stripe_unit:   4194304
layout.stripe_count:  1

$ cephfs /home/hemant/y show_layout
layout.data_pool:     6
layout.object_size:   4194304
layout.stripe_unit:   4194304
layout.stripe_count:  1

Both directories are using the same pool to place data, even after I
specified separate pools with the "cephfs" command. Please help me figure
this out.

-
Hemant Surale.
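For reference, this is how I plan to double-check where new objects actually
land. It is only a sketch: the pool names/ids are the ones from above, the
file names test_x/test_y are just placeholders, and note that both mount
commands point at the cephfs root ":/", so /home/hemant/x and /home/hemant/y
may in fact be the same cephfs directory (in which case the second set_layout
would simply overwrite the first):

# confirm each pool really carries the intended crush_ruleset (3 vs 4)
$ ceph osd dump | grep pool

# a directory layout only applies to files created after set_layout,
# so write a fresh test file under each mount point ...
$ dd if=/dev/zero of=/home/hemant/x/test_x bs=1M count=4
$ dd if=/dev/zero of=/home/hemant/y/test_y bs=1M count=4

# ... then list each pool to see where the objects ended up
$ rados -p host ls
$ rados -p ghost ls

# per-file layout/location can also be inspected directly
$ cephfs /home/hemant/x/test_x show_layout
$ cephfs /home/hemant/y/test_y show_location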
On Thu, Nov 29, 2012 at 3:45 PM, hemant surale <hemant.surale@xxxxxxxxx> wrote:
>>> does 'ceph mds dump' list pool 3 in the data_pools line?
>
> Yes. It lists the desired poolids I wanted to put data in.
>
>
> ---------- Forwarded message ----------
> From: hemant surale <hemant.surale@xxxxxxxxx>
> Date: Thu, Nov 29, 2012 at 2:59 PM
> Subject: Re: OSD daemon changes port no
> To: Sage Weil <sage@xxxxxxxxxxx>
>
>
> I used a slightly different version of the "cephfs" command, as
> "cephfs /home/hemant/a set_layout --pool 3 -c 1 -u 4194304 -s 4194304" and
> "cephfs /home/hemant/b set_layout --pool 5 -c 1 -u 4194304 -s 4194304".
>
>
> Now the command didn't show any error, but when I put data into dirs "a" & "b",
> which should ideally go to different pools, it is not working as of now.
> Is what I am doing even possible (using 2 dirs pointing to 2 different
> pools for data placement)?
>
>
>
> -
> Hemant Surale.
>
> On Tue, Nov 27, 2012 at 10:21 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>> On Tue, 27 Nov 2012, hemant surale wrote:
>>> I did "mkdir a", "chmod 777 a". So dir "a" is "/home/hemant/a".
>>> Then I used "mount.ceph 10.72.148.245:/ /ho
>>>
>>> root@hemantsec-virtual-machine:/home/hemant# cephfs /home/hemant/a
>>> set_layout --pool 3
>>> Error setting layout: Invalid argument
>>
>> does 'ceph mds dump' list pool 3 in the data_pools line?
>>
>> sage
>>
>>>
>>> On Mon, Nov 26, 2012 at 9:56 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>> > On Mon, 26 Nov 2012, hemant surale wrote:
>>> >> While I was using "cephfs" the following error was observed -
>>> >> ------------------------------------------------------------------------------------------------
>>> >> root@hemantsec-virtual-machine:~# cephfs /mnt/ceph/a --pool 3
>>> >> invalid command
>>> >
>>> > Try
>>> >
>>> >    cephfs /mnt/ceph/a set_layout --pool 3
>>> >
>>> > (set_layout is the command)
>>> >
>>> > sage
>>> >
>>> >> usage: cephfs path command [options]*
>>> >> Commands:
>>> >>    show_layout    -- view the layout information on a file or dir
>>> >>    set_layout     -- set the layout on an empty file,
>>> >>                      or the default layout on a directory
>>> >>    show_location  -- view the location information on a file
>>> >> Options:
>>> >>    Useful for setting layouts:
>>> >>    --stripe_unit, -u:   set the size of each stripe
>>> >>    --stripe_count, -c:  set the number of objects to stripe across
>>> >>    --object_size, -s:   set the size of the objects to stripe across
>>> >>    --pool, -p:          set the pool to use
>>> >>
>>> >>    Useful for getting location data:
>>> >>    --offset, -l:        the offset to retrieve location data for
>>> >>
>>> >> ------------------------------------------------------------------------------------------------
>>> >> It may be a silly question, but I am unable to figure it out.
>>> >>
>>> >> :(
>>> >>
>>> >>
>>> >> On Wed, Nov 21, 2012 at 8:59 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>> >> > On Wed, 21 Nov 2012, hemant surale wrote:
>>> >> >> > Oh I see.  Generally speaking, the only way to guarantee separation is to
>>> >> >> > put them in different pools and distribute the pools across different sets
>>> >> >> > of OSDs.
>>> >> >>
>>> >> >> Yeah, that was the correct approach, but I found a problem doing so at an
>>> >> >> abstract level, i.e. when I put a file inside the mounted dir
>>> >> >> "/home/hemant/cephfs" (mounted using the "mount.ceph" cmd). At that
>>> >> >> point ceph is anyway going to use the default pool "data" to store the files
>>> >> >> (the files were striped into different objects and then sent to the
>>> >> >> appropriate osds).
>>> >> >> So how do I tell ceph to use different pools in this case?
>>> >> >>
>>> >> >> Goal: separate read and write operations, where reads will be served
>>> >> >> from one group of OSDs and writes are directed to another group of OSDs.
>>> >> >
>>> >> > First create the other pool,
>>> >> >
>>> >> >    ceph osd pool create <name>
>>> >> >
>>> >> > and then adjust the CRUSH rule to distribute to a different set of OSDs
>>> >> > for that pool.
>>> >> >
>>> >> > To allow cephfs to use it,
>>> >> >
>>> >> >    ceph mds add_data_pool <poolid>
>>> >> >
>>> >> > and then:
>>> >> >
>>> >> >    cephfs /mnt/ceph/foo --pool <poolid>
>>> >> >
>>> >> > will set the policy on the directory such that new files beneath that
>>> >> > point will be stored in a different pool.
>>> >> >
>>> >> > Hope that helps!
>>> >> > sage
>>> >> >
>>> >> >>
>>> >> >> -
>>> >> >> Hemant Surale.
>>> >> >>
>>> >> >>
>>> >> >> On Wed, Nov 21, 2012 at 12:33 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>> >> >> > On Wed, 21 Nov 2012, hemant surale wrote:
>>> >> >> >> It's a little confusing question, I believe.
>>> >> >> >>
>>> >> >> >> Actually there are two files, X & Y. When I am reading X from its
>>> >> >> >> primary, I want to make sure a simultaneous write of Y goes to
>>> >> >> >> any OSD except the primary OSD for X (from where my current read is
>>> >> >> >> being served).
>>> >> >> >
>>> >> >> > Oh I see.  Generally speaking, the only way to guarantee separation is to
>>> >> >> > put them in different pools and distribute the pools across different sets
>>> >> >> > of OSDs.  Otherwise, it's all (pseudo)random and you never know.  Usually,
>>> >> >> > they will be different, particularly as the cluster size increases, but
>>> >> >> > sometimes they will be the same.
>>> >> >> >
>>> >> >> > sage
>>> >> >> >
>>> >> >> >>
>>> >> >> >> -
>>> >> >> >> Hemant Surale.
>>> >> >> >>
>>> >> >> >> On Wed, Nov 21, 2012 at 11:50 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>> >> >> >> > On Wed, 21 Nov 2012, hemant surale wrote:
>>> >> >> >> >> >> and one more thing how can it be possible to read from one osd and
>>> >> >> >> >> >> then simultaneous write to direct on other osd with less/no traffic?
>>> >> >> >> >> >
>>> >> >> >> >> > I'm not sure I understand the question...
>>> >> >> >> >>
>>> >> >> >> >> Scenario:
>>> >> >> >> >>    I have written file X.txt on some osd which is the primary for file
>>> >> >> >> >> X.txt (a direct write operation using the rados cmd).
>>> >> >> >> >>    Now, while a read on file X.txt is in progress, can I make sure
>>> >> >> >> >> the simultaneous write request is directed to another osd using
>>> >> >> >> >> crushmaps or some other way?
>>> >> >> >> >
>>> >> >> >> > Nope.  The object location is based on the name.  Reads and writes go to
>>> >> >> >> > the same location so that a single OSD can serialize requests.  That means,
>>> >> >> >> > for example, that a read that follows a write returns the just-written
>>> >> >> >> > data.
>>> >> >> >> >
>>> >> >> >> > sage
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> >> Goal of task:
>>> >> >> >> >>    Trying to avoid read-write clashes as much as possible to
>>> >> >> >> >> achieve faster I/O operations, although CRUSH selects the osd for data
>>> >> >> >> >> placement based on a pseudo-random function. Is it possible?
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> -
>>> >> >> >> >> Hemant Surale.
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> On Tue, Nov 20, 2012 at 10:15 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>> >> >> >> >> > On Tue, 20 Nov 2012, hemant surale wrote:
>>> >> >> >> >> >> Hi Community,
>>> >> >> >> >> >>    I have a question about the port number used by the ceph-osd daemon. I
>>> >> >> >> >> >> observed traffic (inter-osd communication while data ingest happened)
>>> >> >> >> >> >> on port 6802, and then some time later, when I ingested a second file
>>> >> >> >> >> >> after some delay, port no 6804 was used. Is there any specific reason
>>> >> >> >> >> >> for the port no to change here?
>>> >> >> >> >> >
>>> >> >> >> >> > The ports are dynamic.  Daemons bind to a random (6800-6900) port on
>>> >> >> >> >> > startup and communicate on that.  They discover each other via the
>>> >> >> >> >> > addresses published in the osdmap when the daemon starts.
>>> >> >> >> >> >
>>> >> >> >> >> >> and one more thing how can it be possible to read from one osd and
>>> >> >> >> >> >> then simultaneous write to direct on other osd with less/no traffic?
>>> >> >> >> >> >
>>> >> >> >> >> > I'm not sure I understand the question...
>>> >> >> >> >> >
>>> >> >> >> >> > sage
>>> >> >> >> >> --
>>> >> >> >> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> >> >> >> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> >> >> >> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
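For completeness (and to save scrolling back through the quoted thread), here
is the whole sequence for the "ghost" pool condensed into one place, in case
it makes the mistake easier to spot. This is only a sketch: the pg count is a
placeholder, and the "host" pool was set up the same way with ruleset 3 and
pool id 5:

# create the pool and tie it to its own CRUSH rule and replication factor
$ ceph osd pool create ghost 128
$ ceph osd pool set ghost crush_ruleset 4
$ ceph osd pool set ghost size 2

# allow cephfs to use it as a data pool (6 = pool id of "ghost")
$ ceph mds add_data_pool 6

# set the default layout on the directory; only files created under it
# afterwards should land in that pool
$ cephfs /home/hemant/y set_layout --pool 6 -c 1 -u 4194304 -s 4194304
$ cephfs /home/hemant/y show_layout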