Re: rbd locking and handling broken clients

On Thu, Jun 14, 2012 at 1:41 AM, Greg Farnum <greg@xxxxxxxxxxx> wrote:
> On Wednesday, June 13, 2012 at 1:37 PM, Florian Haas wrote:
>> Greg,
>>
>> My understanding of Ceph code internals is far too limited to comment on
>> your specific points, but allow me to ask a naive question.
>>
>> Couldn't you be stealing a lot of ideas from SCSI-3 Persistent
>> Reservations? If you had server-side (OSD) persistence of information of
>> the "this device is in use by X" type (where anything other than X would
>> get an I/O error when attempting to access data), and you had a manual,
>> authenticated override akin to SCSI PR preemption, plus key
>> registration/exchange for that authentication, then you would at least
>> have to have the combination of a misbehaving OSD plus a malicious
>> client for data corruption. A non-malicious but merely broken client
>> probably wouldn't be enough on its own.
>>
>> Clearly I may be totally misguided, as Ceph is fundamentally
>> decentralized and SCSI isn't, but if PR-ish behavior comes even close to
>> what you're looking for, grabbing those ideas would look better to me
>> than designing your own wheel.
>
> Yeah, the problem here is exactly that Ceph (and RBD) are fundamentally decentralized. :)

True, but as a general comment I'd posit that "X is not exactly like Y,
therefore nothing that applies to X can apply to Y" is a fallacy. :)

> I'm not familiar with the SCSI PR mechanism either, but it looks to me like it deals in entirely local information — the equivalent with RBD would require performing a locking operation on every object in the RBD image before you accessed it. We could do that, but then opening an image would take time linear in its size… :(
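
To put that cost in numbers: with the default 4 MB RBD objects, a 1 TB
image is 262,144 objects, so opening it would mean issuing roughly a
quarter of a million lock operations before the first byte of I/O.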

Well, you would make this configurable and optional, wouldn't you? Kind
of like how no one forces people to use PRs on SCSI LUs. When it is
enabled, though, taking a performance hit on open sounds like a
reasonable price to pay for not shredding data. TANSTAAFL.
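
For what it's worth, here's a rough sketch of what PR-ish semantics could
look like if they were exposed at the image level rather than per object:
register a cookie, "reserve" the whole image with an exclusive lock on
open, and preempt only via an explicit, authenticated override. This is
purely illustrative -- the pool/image names, the cookie, and the preempt
policy are assumptions, and it presumes image-level advisory-lock calls of
roughly this shape in the Python rbd bindings, not something to rely on
as-is:

#!/usr/bin/env python
# Illustrative sketch only: a whole-image advisory "reservation" in the
# spirit of SCSI-3 PR. Pool/image names, the cookie, and the decision of
# when to preempt are made up for the example; this is not a proposal
# for an actual RBD interface.
import rados
import rbd

POOL = 'rbd'            # assumed pool name
IMAGE = 'test-image'    # assumed image name
COOKIE = 'client-A'     # this client's "registration key"

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx(POOL)
image = rbd.Image(ioctx, IMAGE)
try:
    # "Reserve": take an exclusive advisory lock on the whole image.
    image.lock_exclusive(COOKIE)
except rbd.ImageBusy:
    # Someone else holds the reservation. The PR "preempt" analogue
    # would be a manual, authenticated override that inspects the
    # current holder and breaks the lock only when an operator says so.
    lockers = image.list_lockers() or {}
    for client, cookie, addr in lockers.get('lockers', []):
        print('image reserved by %s (cookie %s, addr %s)'
              % (client, cookie, addr))
        # image.break_lock(client, cookie)   # explicit preempt step
else:
    # Reservation held: I/O would go here. Release when done.
    image.unlock(COOKIE)
finally:
    image.close()
    ioctx.close()
    cluster.shutdown()

Of course this only gives advisory semantics; the "I/O error for anyone
but X" enforcement I described would still need OSD-side support, which
is presumably the hard part you're pointing at.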

Again, this is just my poorly informed two cents. :)

Cheers,
Florian