Dear Ilya,
Thank you very much for your reply. Here is some more information about the problem.
On Thu, Dec 18, 2014 at 12:09 AM, Ilya Dryomov <ilya.dryomov@xxxxxxxxxxx> wrote:
On Tue, Dec 16, 2014 at 6:19 AM, Cyan Cheng <cheng.1986@xxxxxxxxx> wrote:
> Dear All,
>
> We have set up ceph and used it for about one year already.
>
> Here is a summary of the setup. We used 3 servers to run ceph.
>
> cs02, cs03, cs04
>
> Here is how we set up ceph:
>
> 1. We created several OSDs on these three servers, using commands like:
>
>> ceph-deploy osd create cs02:/dev/sdc …. cs03:/dev/… cs04:/dev/….
>
> 2. And have created MDS on cs02:
>
>> ceph-deploy mds create ilab-cs02
>
> 3. After that, we have created a RADOS block device on cs02 by
>
>> rbd create rbd-research --size 10240000
>
> 4. Then mapped rbd-research
>
>> sudo tbd map rbd-research --pool rbd
I assume that's "rbd map". Mapping images on the same physical box
that's also running OSDs works in general but isn't a very good idea.
Yes, we did run "rbd map", not "tbd map"; that was a typo in my email.
We will map the image from other servers once we have resolved this critical situation with our system.
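For reference, this is roughly what we plan to do on a separate client host once things are stable again (we have not run this yet, and the exact hosts and paths below are just placeholders):

> sudo apt-get install ceph-common                                # rbd CLI and kernel client tools
> sudo scp cs02:/etc/ceph/ceph.conf /etc/ceph/                    # copy the cluster config from a monitor host
> sudo scp cs02:/etc/ceph/ceph.client.admin.keyring /etc/ceph/    # copy the client keyring
> sudo rbd map rbd-research --pool rbd                            # map the image on the client instead of an OSD box
> sudo mount /dev/rbd/rbd/rbd-research /mnt/retinadata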
>
> 5. Then make file system
>
>> sudo mkfs.ext4 /dev/rbd/rbd/rbd-research
>
> 6. Then mkdir and mount the rbd by adding this line to /etc/fstab
>
> /dev/rbd/rbd/rbd-research /mnt/retinadata ext4 defaults,users 0 2
>
> 7. Then mount
>
>> mount /mnt/retinadata
>
> It worked reliably until recently, when our servers were accidentally
> powered off.
>
> After power came back, cs03 and cs04 booted up automatically, while cs02
> did not. There was a message shown on cs02 saying something like "not able
> to mount /mnt/retinadata, device not found, press S to ignore and continue
> booting, press M to configure manually". We selected S and booted up the
> system.
>
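A side note on step 6 above: we suspect the boot-time complaint happens simply because the /dev/rbd/... device only exists after the image has been mapped, so a plain "defaults" entry in /etc/fstab cannot be mounted during boot. We are considering something like the following (not tested on our side yet, so treat it as a sketch):

# hypothetical /etc/ceph/rbdmap entry, so the rbdmap init script maps the image at boot
rbd/rbd-research id=admin,keyring=/etc/ceph/ceph.client.admin.keyring

# hypothetical /etc/fstab entry: noauto so boot does not fail when the device
# is missing; we would then mount /mnt/retinadata after the image is mapped
/dev/rbd/rbd/rbd-research /mnt/retinadata ext4 defaults,users,noauto,_netdev 0 2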
> Then we found that /mnt/retinadata was not mounted and the rbd device at
> /dev/rbd/rbd/rbd1 was not there.
>
> We mapped the rbd image once again with:
>
>> sudo tbd map rbd-research --pool rbd
>
> Then we were able to mount /mnt/retinadata
>
> But the result we have now is:
> 1. All the file system structures are there.
> 2. All the files are zero bytes in size.
All files or just those that you (your system) were working with at the
time of the power reset?
Actually we have had this zero-byte-file problem twice; let me explain in more detail:
1. The first time was caused by a power-off of all three servers. Part of our files were affected (e.g. 1700 out of 2000 zip files were zero bytes, while the whole postgres database folder was not affected), but the affected files were not only the ones the system was working on at that moment.
After noticing that files had been lost, we created a new image (rbd-research2), copied the files over from the old image (rbd-research), and let the system run on rbd-research2 again (roughly the commands sketched below).
2. After running for a few days, server cs02 went down; at that time some users were uploading images for processing. We cold-rebooted cs02, and after mapping the image again, all the files in rbd-research2 were affected.
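For reference, the migration to rbd-research2 was done roughly like this (the second mount point below is just a placeholder, and the details are from memory):

> rbd create rbd-research2 --size 10240000
> sudo rbd map rbd-research2 --pool rbd
> sudo mkfs.ext4 /dev/rbd/rbd/rbd-research2
> sudo mkdir -p /mnt/retinadata2
> sudo mount /dev/rbd/rbd/rbd-research2 /mnt/retinadata2
> sudo rsync -a /mnt/retinadata/ /mnt/retinadata2/    # copy whatever survived on the old image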
>
> Could anybody help on this issue? Thank you very much in advance.
>
> Some more information. We tried to reboot cs02 again. And we see a full
> screen of error message like:
>
> [44038.215233] libceph: connect 192.168.1.31:6789 socket error on write
> [44038.215308] libceph: mon1 192.1.168.31:6789 error -101
> libceph: connect 192.168.1.41:6812 error -101
> libceph: osd22 192.168.1.41:6812 socket error on write
That's "Network is unreachable" so probably something's wrong with your
network.
BTW, the IPs of these servers are:
cs02: 192.168.1.21
cs03: 192.168.1.31
cs04: 192.168.1.41
cs03 and cs04 were alive at the time cs02 was rebooted. No connection problems are reported when we check the cluster status with
> ceph -s
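To rule out a network problem on cs02 itself, this is roughly what we plan to check from cs02 once it is up again (just a sketch, using the IPs listed above):

> ping -c 3 192.168.1.31    # can cs02 reach cs03?
> ping -c 3 192.168.1.41    # can cs02 reach cs04?
> ip route                  # is there a route to the 192.168.1.0/24 network?
> ceph -s                   # cluster health as seen from cs02
> ceph osd tree             # are any OSDs marked down?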
Thank you very much again, Ilya.