Re: crash of osd using cephfs jewel 10.2.2, and corruption

Peter Maloney <peter.maloney@xxxxxxxxxxxxxxxxxxxx> · Wed, 21 Sep 2016 20:00:29 +0200

It seems the trigger for the problem is this:
> 24.9141130d 10000000527.00000000 [write 0~242] snapc 1=[] ondisk+write
> e320)
>    -40> 2016-09-20 20:38:02.007942 708f67bbd700  0
> filestore(/var/lib/ceph/osd/ceph-0) write couldn't open
> 24.32_head/#24:4d11884b:::10000000504.00000000:head#: (24) Too many
> open files
>    -39> 2016-09-20 20:38:02.007759 708f673ae700  0
> filestore(/var/lib/ceph/osd/ceph-0) write couldn't open

(and the compressed log is 23MB... do you really want it still?)

So I can understand the osd has little choice but to give up, but
corruption is not okay.

So I raised the open files limit to 10000 soft, 13000 hard, and it seems
fine now. (The default was only 1024.) Grsecurity kernels actually
enforce these things, so maybe that's why nobody else noticed this
corruption problem.
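
To confirm that a raised limit actually applies to the running osd and not just to the shell, something like the following Python sketch could help. It assumes a Linux /proc filesystem; the function names are made up for illustration and are not part of ceph.

```python
import resource


def parse_max_open_files(limits_text):
    """Extract (soft, hard) from the text of a /proc/<pid>/limits file.

    Values are returned as strings because the kernel may print
    "unlimited" instead of a number. Returns None if the line is absent.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return soft, hard
    return None


def read_process_nofile(pid):
    """Read the open files limit of another process (assumes Linux /proc)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())


def current_nofile():
    """Return (soft, hard) RLIMIT_NOFILE of the calling process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Calling read_process_nofile() with the pid of the ceph-osd process shows the limit the daemon really runs with, which can differ from what limits.conf suggests if the daemon was started before the change or by an init system with its own settings.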

BTW, since the symptom is similar, let me mention another bug I found in
hammer 0.94.9. I was playing around with cephfs and `ceph osd
blacklist add ...` and blacklisted a client that had written some files.
Whatever I did next (kill -9, umount -l, unblacklist right away, etc.),
the client showed non-corrupt files until its cache was flushed (umount
or sysctl vm.drop_caches=3); after that, the files I wrote were all
nulls, even though no osds crashed. I couldn't reproduce that in jewel
10.2.2, though. So that's 2 ways to cause this, which means it should be
fixed by something other than just the limits.conf adjustment.
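
For anyone trying to spot this symptom after a cache flush, here is a minimal Python sketch that reports whether a file consists only of NUL bytes. The helper name is hypothetical, and it only detects the "all nulls" case described above, not other kinds of corruption.

```python
def is_all_nulls(path, chunk_size=1 << 20):
    """Return True if the file is non-empty and every byte is NUL."""
    seen_data = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # End of file: all-null only if at least one byte was read.
                return seen_data
            seen_data = True
            if chunk.count(b"\0") != len(chunk):
                return False
```

Run it over the copied files only after dropping caches or remounting, since the client cache can still hold good copies until then.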

Peter

On 09/21/16 13:02, Samuel Just wrote:
> Looks like the OSD didn't like an error return it got from the
> underlying fs.  Can you reproduce with
>
> debug filestore = 20
> debug osd = 20
> debug ms = 1
>
> on the osd and post the whole log?
> -Sam
>
> On Wed, Sep 21, 2016 at 12:10 AM, Peter Maloney
> <peter.maloney@xxxxxxxxxxxxxxxxxxxx> wrote:
>> Hi,
>>
>> I created a one disk osd with data and separate journal on the same lvm
>> volume group just for test, one mon, one mds on my desktop.
>>
>> I managed to crash the osd just by mounting cephfs and doing cp -a of
>> the linux-stable git tree into it. It crashed after copying 2.1G which
>> only covers some of the .git dir and none of the rest. And then when I
>> killed ceph-mds and restarted the osd and mds, ceph -s said something
>> about the pgs being stuck or unclean or something, and the computer
>> froze. :/ After booting again, everything is fine, and the problem was
>> reproducible the same way... just copying the files again. [But after
>> writing this mail, I can't seem to cause it as easily again... copying
>> again works, but sha1sum doesn't, even if I drop caches.]
>>
>> Also reading seems to do the same.
>>
>> And then I tried adding a 2nd osd (also from lvm, with osd and journal
>> on same volume group). And that seemed to stop the crashing, but not
>> sure about corruption. I guess the corruption was on the cephfs but RAM
>> had good copies or something, so rebooting, etc. is what made the
>> corruption appear? (I tried to reproduce, but couldn't... didn't try
>> killing daemons.)
>>
