Re: rbd map issues: no such file or directory (ENOENT) AND map wrong image

[re-adding ceph-users so others can benefit from the archives]

On 08/12/2013 07:18 PM, PJ wrote:
2013/8/13 Josh Durgin <josh.durgin@xxxxxxxxxxx>:
On 08/12/2013 10:19 AM, PJ wrote:

Hi All,

Before going into the issue description, here are our hardware configurations:
- Physical machines * 3: each has quad-core CPU * 2, 64+ GB RAM, and HDD * 12
(500GB ~ 1TB per drive; 1 for the system, 11 for OSDs). The ceph OSDs run on
the physical machines.
- Each physical machine runs 5 virtual machines. One VM is a ceph MON
(i.e. 3 MONs in total); the other 4 VMs provide either iSCSI or FTP/NFS
services.
- Physical machines and virtual machines run the same software stack:
Ubuntu 12.04 + kernel 3.6.11, ceph v0.61.7


The issues we ran into are:

1. Right after ceph installation, creating a pool, then creating an image and
mapping it works fine. But if we leave the whole environment unused for more
than half a day, the same process (create pool -> create image -> map image)
returns the error: no such file or directory (ENOENT). Once the issue occurs,
it can easily be reproduced with the same process. However, the issue may
disappear if we wait 10+ minutes after pool creation. Rebooting the system
also avoids it.
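For reference, the commands we run for that process are roughly the following
(pool and image names here are just placeholders):

   ceph osd pool create testpool 128                   # create pool
   rbd create testimage --pool testpool --size 1024    # create a 1 GB image
   rbd map testimage --pool testpool                   # this step returns ENOENT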


This sounds similar to http://tracker.ceph.com/issues/5925 - and
your case suggests it may be a monitor bug, since that test is userspace
and you're using the kernel client. Could you reproduce
this with logs from your monitors from the time of pool creation to
after the map fails with ENOENT, and these log settings on all mons:

debug ms = 1
debug mon = 20
debug paxos = 10
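
(Assuming the usual /etc/ceph/ceph.conf layout, these go in the [mon] section
on each monitor host, e.g.:

   [mon]
       debug ms = 1
       debug mon = 20
       debug paxos = 10

and then restart the monitors so the settings take effect.)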

If you could attach those logs to the bug or otherwise make them
available that'd be great.


We will add these settings to gather the logs. By the way, we try to
avoid this issue by using only the default pool (rbd). Will the logs still
be useful in that case?

No, the case I'm interested in is when the 'rbd map' fails because
there's a new pool.


I have straces of a successful and a failed run, logged on the same virtual
machine (the one that provides FTP/NFS):
success: https://www.dropbox.com/s/u8jc4umak24kr1y/rbd_done.txt
failed: https://www.dropbox.com/s/ycuupmmrlc4d0ht/rbd_failed.txt


Unfortunately these won't tell us much since the kernel is doing all the
work with rbd map.


2. The second issue: after creating two images (AAA and BBB) under one pool
(xxx), if we map AAA with "rbd -p xxx map AAA", the command succeeds but it
shows BBB under /dev/rbd/xxx/. "rbd showmapped" shows that AAA of
pool xxx is mapped. I am not sure which one is really mapped, because
both images are empty. This issue is hard to reproduce, but once it happens
the entries under /dev/rbd/ are messed up.


That sounds very strange, since 'rbd showmapped' and the udev rule that
creates the /dev/rbd/pool/image symlinks use the same data source -
/sys/bus/rbd/N/name. This sounds like a race condition where sysfs is
being read (and reading stale memory) before the kernel finishes
populating it. Could you file this in the tracker?
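In the meantime, when it happens you can cross-check what the kernel actually
mapped by reading sysfs directly (assuming the device id is 0; use whatever id
rbd showmapped reports):

   cat /sys/bus/rbd/devices/0/pool    # pool of the mapped image
   cat /sys/bus/rbd/devices/0/name    # image name the kernel mapped
   rbd showmapped                     # should agree with the values above

If the symlink under /dev/rbd/ disagrees with those, that would point at the
sysfs race rather than a bad map.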

I will file it in the tracker.

Checking whether it still occurs in Linux 3.10 would be great too. It doesn't
seem possible with the current code.


Does "current code" mean Linux kernel 3.10 or 3.6?

Current code in 3.10 doesn't look like this issue is possible, unless
I'm missing something. There's been a lot of refactoring since 3.6
though, so it's possible the bug was fixed accidentally.

One more question, not about the rbd map issues. Our usage is to map one
rbd device and mount it in several places (on one virtual machine) for
iSCSI, FTP and NFS. Does that cause any problem for ceph operation?


If it's read-only everywhere, it's fine, but otherwise you'll run into
problems unless you've got something on top of rbd managing access to
it, like ocfs2. You could use nfs on top of one rbd device, but having
multiple nfs servers on top of the same rbd device won't work unless
they can coordinate with each other. The same applies to iscsi and ftp.
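For example, the single-nfs-server case would look roughly like this (device
names, paths, and the export network are just illustrative):

   # on the one VM that maps the image
   rbd map AAA --pool xxx
   mkfs.ext4 /dev/rbd0                # first use only
   mount /dev/rbd0 /export/data

   # export it over NFS; other machines only touch the data via this export
   echo '/export/data 192.168.0.0/24(rw,sync,no_subtree_check)' >> /etc/exports
   exportfs -ra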


If the target rbd device is mapped on only one virtual machine, we format it
as ext4 and mount it in two places:
   mount /dev/rbd0 /nfs --> for nfs server usage
   mount /dev/rbd0 /ftp  --> for ftp server usage
The nfs and ftp servers run on the same virtual machine. Will the file system
(ext4) handle the simultaneous access from nfs and ftp?

I doubt that'll work perfectly even on a normal disk, although rbd should
behave the same in this case. Consider what happens when the same files are
modified at once by the ftp and nfs servers. You could safely run ftp on an
nfs client on a different machine, though.
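For instance (hostname and paths are made up):

   # on a different machine: mount the nfs export and run the ftp server on it
   mount -t nfs nfs-server:/export/data /srv/ftp
   # point the ftp daemon's root at /srv/ftp, so every write goes through
   # the single nfs server and its one ext4 mount of the rbd device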
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


