Hi Sage, Community,

I am unable to use 2 directories to direct data to 2 different pools. I did
the following experiment: I created 2 pools, "host" & "ghost", to separate
data placement.

-------------------------------------------------- // crushmap file --------------------------------------------------
# begin crush map

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 pool
type 7 ghost

# buckets
host hemantone-mirror-virtual-machine {
        id -6           # do not change unnecessarily
        # weight 1.000
        alg straw
        hash 0          # rjenkins1
        item osd.2 weight 1.000
}
host hemantone-virtual-machine {
        id -7           # do not change unnecessarily
        # weight 1.000
        alg straw
        hash 0          # rjenkins1
        item osd.1 weight 1.000
}
rack one {
        id -2           # do not change unnecessarily
        # weight 2.000
        alg straw
        hash 0          # rjenkins1
        item hemantone-mirror-virtual-machine weight 1.000
        item hemantone-virtual-machine weight 1.000
}
ghost hemant-virtual-machine {
        id -4           # do not change unnecessarily
        # weight 1.000
        alg straw
        hash 0          # rjenkins1
        item osd.0 weight 1.000
}
ghost hemant-mirror-virtual-machine {
        id -5           # do not change unnecessarily
        # weight 1.000
        alg straw
        hash 0          # rjenkins1
        item osd.3 weight 1.000
}
rack two {
        id -3           # do not change unnecessarily
        # weight 2.000
        alg straw
        hash 0          # rjenkins1
        item hemant-virtual-machine weight 1.000
        item hemant-mirror-virtual-machine weight 1.000
}
pool default {
        id -1           # do not change unnecessarily
        # weight 4.000
        alg straw
        hash 0          # rjenkins1
        item one weight 2.000
        item two weight 2.000
}

# rules
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step take one
        step chooseleaf firstn 0 type host
        step emit
}
rule metadata {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step take one
        step chooseleaf firstn 0 type host
        step emit
}
rule rbd {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step take one
        step chooseleaf firstn 0 type host
        step emit
}
rule forhost {
        ruleset 3
        type replicated
        min_size 1
        max_size 10
        step take default
        step take one
        step chooseleaf firstn 0 type host
        step emit
}
rule forghost {
        ruleset 4
        type replicated
        min_size 1
        max_size 10
        step take default
        step take two
        step chooseleaf firstn 0 type ghost
        step emit
}

# end crush map
------------------------------------------------------------------------------------------------------------------------

1) Set the replication factor to 2 and the crush rules accordingly
   (the "host" pool got crush_ruleset = 3 & the "ghost" pool got crush_ruleset = 4).
2) Mounted the filesystem on two directories using
   "mount.ceph 10.72.148.245:6789:/ /home/hemant/x" &
   "mount.ceph 10.72.148.245:6789:/ /home/hemant/y".
3) Then ran "ceph mds add_data_pool 5" & "ceph mds add_data_pool 6"
   (the pool ids here are host = 5, ghost = 6).
4) "cephfs /home/hemant/x set_layout --pool 5 -c 1 -u 4194304 -s 4194304" &
   "cephfs /home/hemant/y set_layout --pool 6 -c 1 -u 4194304 -s 4194304".

PROBLEM:

$ cephfs /home/hemant/x show_layout
layout.data_pool:     6
layout.object_size:   4194304
layout.stripe_unit:   4194304
layout.stripe_count:  1

$ cephfs /home/hemant/y show_layout
layout.data_pool:     6
layout.object_size:   4194304
layout.stripe_unit:   4194304
layout.stripe_count:  1

Both directories are using the same pool to place data, even after I
specified separate pools with the "cephfs" command. Please help me figure
this out.

-
Hemant Surale.
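For reference, this is how I plan to double-check where new objects actually
land. It is only a sketch: the pool names/ids are the ones from above, the
file names test_x/test_y are just placeholders, and note that both mount
commands point at the cephfs root ":/", so /home/hemant/x and /home/hemant/y
may in fact be the same cephfs directory (in which case the second set_layout
would simply overwrite the first):

# confirm each pool really carries the intended crush_ruleset (3 vs 4)
$ ceph osd dump | grep pool

# a directory layout only applies to files created after set_layout,
# so write a fresh test file under each mount point ...
$ dd if=/dev/zero of=/home/hemant/x/test_x bs=1M count=4
$ dd if=/dev/zero of=/home/hemant/y/test_y bs=1M count=4

# ... then list each pool to see where the objects ended up
$ rados -p host ls
$ rados -p ghost ls

# per-file layout/location can also be inspected directly
$ cephfs /home/hemant/x/test_x show_layout
$ cephfs /home/hemant/y/test_y show_location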
On Thu, Nov 29, 2012 at 3:45 PM, hemant surale <hemant.surale@xxxxxxxxx> wrote:
>>> does 'ceph mds dump' list pool 3 in the data_pools line?
>
> Yes. It lists the desired poolids I wanted to put data in.
>
>
> ---------- Forwarded message ----------
> From: hemant surale <hemant.surale@xxxxxxxxx>
> Date: Thu, Nov 29, 2012 at 2:59 PM
> Subject: Re: OSD daemon changes port no
> To: Sage Weil <sage@xxxxxxxxxxx>
>
>
> I used a slightly different version of the "cephfs" command, as
> "cephfs /home/hemant/a set_layout --pool 3 -c 1 -u 4194304 -s 4194304" and
> "cephfs /home/hemant/b set_layout --pool 5 -c 1 -u 4194304 -s 4194304".
>
>
> Now the command didn't show any error, but when I put data into dirs "a" & "b",
> which should ideally go to different pools, it is not working as of now.
> Is what I am doing even possible (using 2 dirs pointing to 2 different
> pools for data placement)?
>
>
>
> -
> Hemant Surale.
>
> On Tue, Nov 27, 2012 at 10:21 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>> On Tue, 27 Nov 2012, hemant surale wrote:
>>> I did "mkdir a", "chmod 777 a". So dir "a" is "/home/hemant/a".
>>> Then I used "mount.ceph 10.72.148.245:/ /ho
>>>
>>> root@hemantsec-virtual-machine:/home/hemant# cephfs /home/hemant/a
>>> set_layout --pool 3
>>> Error setting layout: Invalid argument
>>
>> does 'ceph mds dump' list pool 3 in the data_pools line?
>>
>> sage
>>
>>>
>>> On Mon, Nov 26, 2012 at 9:56 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>> > On Mon, 26 Nov 2012, hemant surale wrote:
>>> >> While I was using "cephfs" the following error was observed -
>>> >> ------------------------------------------------------------------------------------------------
>>> >> root@hemantsec-virtual-machine:~# cephfs /mnt/ceph/a --pool 3
>>> >> invalid command
>>> >
>>> > Try
>>> >
>>> >    cephfs /mnt/ceph/a set_layout --pool 3
>>> >
>>> > (set_layout is the command)
>>> >
>>> > sage
>>> >
>>> >> usage: cephfs path command [options]*
>>> >> Commands:
>>> >>    show_layout    -- view the layout information on a file or dir
>>> >>    set_layout     -- set the layout on an empty file,
>>> >>                      or the default layout on a directory
>>> >>    show_location  -- view the location information on a file
>>> >> Options:
>>> >>    Useful for setting layouts:
>>> >>    --stripe_unit, -u:   set the size of each stripe
>>> >>    --stripe_count, -c:  set the number of objects to stripe across
>>> >>    --object_size, -s:   set the size of the objects to stripe across
>>> >>    --pool, -p:          set the pool to use
>>> >>
>>> >>    Useful for getting location data:
>>> >>    --offset, -l:        the offset to retrieve location data for
>>> >>
>>> >> ------------------------------------------------------------------------------------------------
>>> >> It may be a silly question, but I am unable to figure it out.
>>> >>
>>> >> :(
>>> >>
>>> >>
>>> >> On Wed, Nov 21, 2012 at 8:59 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>> >> > On Wed, 21 Nov 2012, hemant surale wrote:
>>> >> >> > Oh I see.  Generally speaking, the only way to guarantee separation is to
>>> >> >> > put them in different pools and distribute the pools across different sets
>>> >> >> > of OSDs.
>>> >> >>
>>> >> >> Yeah, that was the correct approach, but I found a problem doing so at an
>>> >> >> abstract level, i.e. when I put a file inside the mounted dir
>>> >> >> "/home/hemant/cephfs" (mounted using the "mount.ceph" cmd). At that
>>> >> >> point ceph is anyway going to use the default pool "data" to store the files
>>> >> >> (the files were striped into different objects and then sent to the
>>> >> >> appropriate osds).
>>> >> >> So how do I tell ceph to use different pools in this case?
>>> >> >>
>>> >> >> Goal: separate read and write operations, where reads will be served
>>> >> >> from one group of OSDs and writes are directed to another group of OSDs.
>>> >> >
>>> >> > First create the other pool,
>>> >> >
>>> >> >    ceph osd pool create <name>
>>> >> >
>>> >> > and then adjust the CRUSH rule to distribute to a different set of OSDs
>>> >> > for that pool.
>>> >> >
>>> >> > To allow cephfs to use it,
>>> >> >
>>> >> >    ceph mds add_data_pool <poolid>
>>> >> >
>>> >> > and then:
>>> >> >
>>> >> >    cephfs /mnt/ceph/foo --pool <poolid>
>>> >> >
>>> >> > will set the policy on the directory such that new files beneath that
>>> >> > point will be stored in a different pool.
>>> >> >
>>> >> > Hope that helps!
>>> >> > sage
>>> >> >
>>> >> >>
>>> >> >> -
>>> >> >> Hemant Surale.
>>> >> >>
>>> >> >>
>>> >> >> On Wed, Nov 21, 2012 at 12:33 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>> >> >> > On Wed, 21 Nov 2012, hemant surale wrote:
>>> >> >> >> It's a little confusing question, I believe.
>>> >> >> >>
>>> >> >> >> Actually there are two files, X & Y. When I am reading X from its
>>> >> >> >> primary, I want to make sure a simultaneous write of Y goes to
>>> >> >> >> any OSD except the primary OSD for X (from where my current read is
>>> >> >> >> being served).
>>> >> >> >
>>> >> >> > Oh I see.  Generally speaking, the only way to guarantee separation is to
>>> >> >> > put them in different pools and distribute the pools across different sets
>>> >> >> > of OSDs.  Otherwise, it's all (pseudo)random and you never know.  Usually,
>>> >> >> > they will be different, particularly as the cluster size increases, but
>>> >> >> > sometimes they will be the same.
>>> >> >> >
>>> >> >> > sage
>>> >> >> >
>>> >> >> >>
>>> >> >> >> -
>>> >> >> >> Hemant Surale.
>>> >> >> >>
>>> >> >> >> On Wed, Nov 21, 2012 at 11:50 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>> >> >> >> > On Wed, 21 Nov 2012, hemant surale wrote:
>>> >> >> >> >> >> and one more thing how can it be possible to read from one osd and
>>> >> >> >> >> >> then simultaneous write to direct on other osd with less/no traffic?
>>> >> >> >> >> >
>>> >> >> >> >> > I'm not sure I understand the question...
>>> >> >> >> >>
>>> >> >> >> >> Scenario:
>>> >> >> >> >>    I have written file X.txt on some osd which is the primary for file
>>> >> >> >> >> X.txt (a direct write operation using the rados cmd).
>>> >> >> >> >>    Now, while a read on file X.txt is in progress, can I make sure
>>> >> >> >> >> the simultaneous write request is directed to another osd using
>>> >> >> >> >> crushmaps or some other way?
>>> >> >> >> >
>>> >> >> >> > Nope.  The object location is based on the name.  Reads and writes go to
>>> >> >> >> > the same location so that a single OSD can serialize requests.  That means,
>>> >> >> >> > for example, that a read that follows a write returns the just-written
>>> >> >> >> > data.
>>> >> >> >> >
>>> >> >> >> > sage
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> >> Goal of task:
>>> >> >> >> >>    Trying to avoid read-write clashes as much as possible to
>>> >> >> >> >> achieve faster I/O operations, although CRUSH selects the osd for data
>>> >> >> >> >> placement based on a pseudo-random function. Is it possible?
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> -
>>> >> >> >> >> Hemant Surale.
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> On Tue, Nov 20, 2012 at 10:15 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>> >> >> >> >> > On Tue, 20 Nov 2012, hemant surale wrote:
>>> >> >> >> >> >> Hi Community,
>>> >> >> >> >> >>    I have a question about the port number used by the ceph-osd daemon. I
>>> >> >> >> >> >> observed traffic (inter-osd communication while data ingest happened)
>>> >> >> >> >> >> on port 6802, and then some time later, when I ingested a second file
>>> >> >> >> >> >> after some delay, port no 6804 was used. Is there any specific reason
>>> >> >> >> >> >> for the port no to change here?
>>> >> >> >> >> >
>>> >> >> >> >> > The ports are dynamic.  Daemons bind to a random (6800-6900) port on
>>> >> >> >> >> > startup and communicate on that.  They discover each other via the
>>> >> >> >> >> > addresses published in the osdmap when the daemon starts.
>>> >> >> >> >> >
>>> >> >> >> >> >> and one more thing how can it be possible to read from one osd and
>>> >> >> >> >> >> then simultaneous write to direct on other osd with less/no traffic?
>>> >> >> >> >> >
>>> >> >> >> >> > I'm not sure I understand the question...
>>> >> >> >> >> >
>>> >> >> >> >> > sage
>>> >> >> >> >> --
>>> >> >> >> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> >> >> >> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> >> >> >> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
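For completeness (and to save scrolling back through the quoted thread), here
is the whole sequence for the "ghost" pool condensed into one place, in case
it makes the mistake easier to spot. This is only a sketch: the pg count is a
placeholder, and the "host" pool was set up the same way with ruleset 3 and
pool id 5:

# create the pool and tie it to its own CRUSH rule and replication factor
$ ceph osd pool create ghost 128
$ ceph osd pool set ghost crush_ruleset 4
$ ceph osd pool set ghost size 2

# allow cephfs to use it as a data pool (6 = pool id of "ghost")
$ ceph mds add_data_pool 6

# set the default layout on the directory; only files created under it
# afterwards should land in that pool
$ cephfs /home/hemant/y set_layout --pool 6 -c 1 -u 4194304 -s 4194304
$ cephfs /home/hemant/y show_layout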