RE: ceph osd on shared storage

Thanks, Greg!
I completely forgot about the CRUSH map point; a failure domain at the OSD level will not be acceptable :-(

Regards
Somnath

-----Original Message-----
From: Gregory Farnum [mailto:gfarnum@xxxxxxxxxx] 
Sent: Friday, May 13, 2016 11:34 AM
To: Somnath Roy
Cc: nick@xxxxxxxxxx; ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: ceph osd on shared storage

On Fri, May 13, 2016 at 7:23 AM, Somnath Roy <Somnath.Roy@xxxxxxxxxxx> wrote:
> Yeah, probably..
> The whole reason I am forced to think this way is that people (not
> familiar with Ceph) are asking why, if you have fully shared storage, a
> node failure triggers recovery when the storage itself is fine, which I
> believe makes sense.. :-)

This does make some sense, but Ceph is really designed for shared-nothing hardware. So anybody selling a shared-disk system with Ceph probably wants to implement this stuff, but none of the upstream management is designed to be friendly for it (except for the portability of OSDs!).

I imagine you'd designate CRUSH maps in terms of backing drives instead of OSD hosts, so that moving the drives doesn't change the mappings at all. And then do as suggested with the process management.
-Greg
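
As a rough sketch of what that drive-keyed layout might look like, assuming the stock "chassis" bucket type stands in for a drive shelf (the shelf names, OSD ids, and weights below are made up, and "osd crush update on start = false" is needed in ceph.conf so OSDs don't re-home themselves under a host bucket at startup):

#!/usr/bin/env python3
"""Sketch: key the CRUSH hierarchy on drive shelves rather than hosts, so an
OSD that is remounted on a different host keeps its CRUSH location.
Shelf names, OSD ids, and weights are illustrative only; requires
'osd crush update on start = false' in ceph.conf."""
import subprocess

def ceph(*args):
    # Thin wrapper around the ceph CLI; raises if a command fails.
    subprocess.run(["ceph", *args], check=True)

# One "chassis" bucket per physical drive shelf in the shared array.
shelves = {
    "shelf1": ["osd.0", "osd.1"],
    "shelf2": ["osd.2", "osd.3"],
}

for shelf, osds in shelves.items():
    ceph("osd", "crush", "add-bucket", shelf, "chassis")
    ceph("osd", "crush", "move", shelf, "root=default")
    for osd in osds:
        # Place each OSD directly under its shelf; no host bucket is involved,
        # so moving the drive to another host leaves the map untouched.
        ceph("osd", "crush", "create-or-move", osd, "1.0",
             "root=default", f"chassis={shelf}")

# Replicate across shelves instead of hosts; pools then point at this rule.
ceph("osd", "crush", "rule", "create-simple", "by-shelf", "default", "chassis")

Whether shelf buckets are an acceptable failure domain depends on how independent the shelves actually are, which is the concern raised at the top of this thread.
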

>
> -----Original Message-----
> From: Nick Fisk [mailto:nick@xxxxxxxxxx]
> Sent: Friday, May 13, 2016 1:43 AM
> To: Somnath Roy; ceph-devel@xxxxxxxxxxxxxxx
> Subject: RE: ceph osd on shared storage
>
>
>
>> -----Original Message-----
>> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel- 
>> owner@xxxxxxxxxxxxxxx] On Behalf Of Somnath Roy
>> Sent: 13 May 2016 03:36
>> To: ceph-devel@xxxxxxxxxxxxxxx
>> Subject: ceph osd on shared storage
>>
>> Hi,
>> I have a storage array that is shared between, say, 4 hosts. Each host
>> can see all the drives, so every host tries to mount the OSDs configured on those drives.
>> The end result is not good.
>> I want a specific OSD to come up on a specific host even though every
>> host can see all the drives in the chassis. Is there any way in the Ceph
>> deployment scripts to address this?
>> This would be very helpful for a shared-storage model, in the following way.
>>
>> 1. Today, if we zone (HW or SW) a set of OSDs to a particular host,
>> then when that host goes down its OSDs become inaccessible and
>> recovery kicks off even though the storage is just fine.
>>
>> 2. But in the shared model we could have an external agent that detects
>> the host failure and makes the same OSDs come up on another available host.
>>
>> 3. Once the faulty host is replaced, the same OSDs can go back to the old host.
>>
>> 4. This will save a lot of the time the cluster would otherwise spend on recovery.
>>
>> I know some dev effort is required, but does this sound sane and
>> worth the effort?
>> Any feedback is much appreciated.
>
> To me it sounds like you would stop using udev to auto-mount the disks and instead rely on something like Pacemaker to mount the FS and start the OSDs. Without Pacemaker controlling the fencing, there are probably too many things that can go wrong.
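
To make that concrete, below is a minimal sketch of the takeover step such an agent, or a Pacemaker-managed resource on a standby host, would have to perform: confirm the primary host is gone, mount the OSD's data partition from the shared array locally, and start the ceph-osd unit. The hostname, device path, and OSD id are placeholders, and proper fencing of the failed host is assumed to have happened before the mount.

#!/usr/bin/env python3
"""Sketch of a single-OSD takeover on shared storage. Hostname, device path,
and OSD id are placeholders; fencing of the failed host is assumed."""
import os
import subprocess

OSD_ID = "0"                          # hypothetical OSD id
OSD_DEV = "/dev/mapper/shared-osd0"   # hypothetical shared LUN/partition
PRIMARY_HOST = "node1"                # host that normally runs this OSD

def host_is_down(host):
    # Crude liveness check; a real agent would rely on corosync/Pacemaker
    # membership (plus STONITH) rather than ping.
    result = subprocess.run(["ping", "-c", "1", "-W", "2", host],
                            stdout=subprocess.DEVNULL)
    return result.returncode != 0

def take_over_osd(osd_id, dev):
    mountpoint = "/var/lib/ceph/osd/ceph-%s" % osd_id
    os.makedirs(mountpoint, exist_ok=True)
    # Mount the OSD data partition from the shared array on this host.
    subprocess.run(["mount", dev, mountpoint], check=True)
    # Start the OSD here; the cluster only cares that osd.<id> is up,
    # not which host happens to be running it.
    subprocess.run(["systemctl", "start", "ceph-osd@%s" % osd_id], check=True)

if __name__ == "__main__":
    if host_is_down(PRIMARY_HOST):
        take_over_osd(OSD_ID, OSD_DEV)

Under Pacemaker the same sequence would typically be expressed as a Filesystem resource plus a systemd-managed ceph-osd@<id> resource with ordering and colocation constraints, with STONITH guaranteeing the failed host has actually released the disk before the mount happens.
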
>
>>
>> Thanks & Regards
>> Somnath