Re: rados rm: device or resource busy

Brad Hubbard <bhubbard@xxxxxxxxxx> · Fri, 9 Jun 2017 10:13:47 +1000

I can reproduce this.

The key is to look at the debug logging on the primary OSD.

2017-06-09 09:30:14.776355 7f9cf26a4700 20 <cls>
/home/brad/working/src/ceph3/src/cls/lock/cls_lock.cc:247: lock_op
2017-06-09 09:30:14.776359 7f9cf26a4700 20 <cls>
/home/brad/working/src/ceph3/src/cls/lock/cls_lock.cc:162: requested
lock_type=exclusive fail_if_exists=1
2017-06-09 09:30:14.776363 7f9cf26a4700 10 osd.0 pg_epoch: 10 pg[0.6(
v 10'31 (0'0,10'31] local-lis/les=8/9 n=2 ec=1/1 lis/c 8/8 les/c/f
9/9/0 8/8/4) [0,1,2] r=0 lpr=8 crt=10'28 lcod 10'30 mlcod 10'27
active+clean] do_osd_op 0:6d521d9c:::testfile.0000000000000000:head
[getxattr lock.striper.lock]
2017-06-09 09:30:14.776372 7f9cf26a4700 10 osd.0 pg_epoch: 10 pg[0.6(
v 10'31 (0'0,10'31] local-lis/les=8/9 n=2 ec=1/1 lis/c 8/8 les/c/f
9/9/0 8/8/4) [0,1,2] r=0 lpr=8 crt=10'28 lcod 10'30 mlcod 10'27
active+clean] do_osd_op  getxattr lock.striper.lock
2017-06-09 09:30:14.776383 7f9cf26a4700 15
filestore(/home/brad/working/src/ceph3/build/dev/osd0) getattr
0.6_head/#0:6d521d9c:::testfile.0000000000000000:head#
'_lock.striper.lock'
2017-06-09 09:30:14.776408 7f9cf26a4700 10
filestore(/home/brad/working/src/ceph3/build/dev/osd0) getattr
0.6_head/#0:6d521d9c:::testfile.0000000000000000:head#
'_lock.striper.lock' = 126
2017-06-09 09:30:14.776419 7f9cf26a4700 20 <cls>
/home/brad/working/src/ceph3/src/cls/lock/cls_lock.cc:189: cannot take
lock on object, conflicting tag
2017-06-09 09:30:14.776422 7f9cf26a4700 10 osd.0 pg_epoch: 10 pg[0.6(
v 10'31 (0'0,10'31] local-lis/les=8/9 n=2 ec=1/1 lis/c 8/8 les/c/f
9/9/0 8/8/4) [0,1,2] r=0 lpr=8 crt=10'28 lcod 10'30 mlcod 10'27
active+clean] method called response length=0
2017-06-09 09:30:14.776432 7f9cf26a4700 10 osd.0 pg_epoch: 10 pg[0.6(
v 10'31 (0'0,10'31] local-lis/les=8/9 n=2 ec=1/1 lis/c 8/8 les/c/f
9/9/0 8/8/4) [0,1,2] r=0 lpr=8 crt=10'28 lcod 10'30 mlcod 10'27
active+clean]  dropping ondisk_read_lock
2017-06-09 09:30:14.776445 7f9cf26a4700 20 osd.0 pg_epoch: 10 pg[0.6(
v 10'31 (0'0,10'31] local-lis/les=8/9 n=2 ec=1/1 lis/c 8/8 les/c/f
9/9/0 8/8/4) [0,1,2] r=0 lpr=8 crt=10'28 lcod 10'30 mlcod 10'27
active+clean]  op order client.4122 tid 1 (first)
2017-06-09 09:30:14.776453 7f9cf26a4700 20 osd.0 pg_epoch: 10 pg[0.6(
v 10'31 (0'0,10'31] local-lis/les=8/9 n=2 ec=1/1 lis/c 8/8 les/c/f
9/9/0 8/8/4) [0,1,2] r=0 lpr=8 crt=10'28 lcod 10'30 mlcod 10'27
active+clean] execute_ctx update_log_only -- result=-16
2017-06-09 09:30:14.776468 7f9cf26a4700 20 osd.0 pg_epoch: 10 pg[0.6(
v 10'31 (0'0,10'31] local-lis/les=8/9 n=2 ec=1/1 lis/c 8/8 les/c/f
9/9/0 8/8/4) [0,1,2] r=0 lpr=8 crt=10'28 lcod 10'30 mlcod 10'27
active+clean] record_write_error r=-16
2017-06-09 09:30:14.776478 7f9cf26a4700 10 osd.0 pg_epoch: 10 pg[0.6(
v 10'31 (0'0,10'31] local-lis/les=8/9 n=2 ec=1/1 lis/c 8/8 les/c/f
9/9/0 8/8/4) [0,1,2] r=0 lpr=8 crt=10'28 lcod 10'30 mlcod 10'27
active+clean] submit_log_entries 10'32 (0'0) error
0:6d521d9c:::testfile.0000000000000000:head by client.4122.0:1
0.000000 -16
2017-06-09 09:30:14.776490 7f9cf26a4700 10 osd.0 pg_epoch: 10 pg[0.6(
v 10'31 (0'0,10'31] local-lis/les=8/9 n=2 ec=1/1 lis/c 8/8 les/c/f
9/9/0 8/8/4) [0,1,2] r=0 lpr=8 crt=10'28 lcod 10'30 mlcod 10'27
active+clean] new_repop: repgather(0x565246704a80 10'32 rep_tid=33
committed?=0 applied?=0 r=-16)
2017-06-09 09:30:14.776502 7f9cf26a4700 10 osd.0 pg_epoch: 10 pg[0.6(
v 10'31 (0'0,10'31] local-lis/les=8/9 n=2 ec=1/1 lis/c 8/8 les/c/f
9/9/0 8/8/4) [0,1,2] r=0 lpr=8 crt=10'28 lcod 10'30 mlcod 10'27
active+clean] merge_new_log_entries 10'32 (0'0) error
0:6d521d9c:::testfile.0000000000000000:head by client.4122.0:1
0.000000 -16
2017-06-09 09:30:14.776514 7f9cf26a4700 20 update missing, append
10'32 (0'0) error    0:6d521d9c:::testfile.0000000000000000:head by
client.4122.0:1 0.000000 -16

Specifically, this line:

/home/brad/working/src/ceph3/src/cls/lock/cls_lock.cc:189: cannot take
lock on object, conflicting tag

That message is logged at the line linked below, where you will notice
the code returns EBUSY, which is error code 16, "Device or resource busy"
(hence result=-16 in the log).

https://github.com/badone/ceph/blob/wip-ceph_test_admin_socket_output/src/cls/lock/cls_lock.cc#L189
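
As a quick sanity check on the error number, here is a minimal Python
sketch (not part of Ceph) that maps the result=-16 seen in the log to its
symbolic errno name. The helper name errno_name is made up for
illustration, and the mapping is whatever the local platform defines.

```python
import errno
import os


def errno_name(result):
    """Return the symbolic errno name for a negative result code, e.g. -16."""
    # errno.errorcode maps positive errno numbers to names such as "EBUSY".
    return errno.errorcode.get(abs(result), "UNKNOWN")


if __name__ == "__main__":
    # -16 is the result logged by execute_ctx and record_write_error.
    print(errno_name(-16), "-", os.strerror(16))
```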

To remove the existing parts of the file, you should be able to list the
pool with "rados --pool testpool ls" and then remove each listed object
belonging to "testfile" individually, without --striper, so that rados
operates on the underlying objects directly.

Example:
$ rados --pool testpool ls
testfile.0000000000000004
testfile.0000000000000001
testfile.0000000000000000
testfile.0000000000000003
testfile.0000000000000005
testfile.0000000000000002

$ rados --pool testpool rm testfile.0000000000000000
$ rados --pool testpool rm testfile.0000000000000001
...
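
If there are many pieces, the per-object commands can be generated rather
than typed. The following is only a sketch: it assumes the pieces are
named "<name>." followed by 16 lowercase hex digits, as in the listing
above, and the function names stripe_objects and rm_commands are
hypothetical. It prints the commands as a dry run instead of running them.

```python
import re
import shlex


def stripe_objects(listing, name):
    """Pick the objects that look like pieces of `name` from `rados ls` output."""
    # Assumption: pieces are named "<name>." plus 16 lowercase hex digits.
    pattern = re.compile(re.escape(name) + r"\.[0-9a-f]{16}")
    names = (line.strip() for line in listing.splitlines())
    return sorted(n for n in names if pattern.fullmatch(n))


def rm_commands(pool, objects):
    """Build one plain (non-striper) `rados rm` command line per object."""
    return ["rados --pool {} rm {}".format(shlex.quote(pool), shlex.quote(obj))
            for obj in objects]


if __name__ == "__main__":
    # Sample listing; in practice, paste the output of `rados --pool testpool ls`.
    listing = """testfile.0000000000000004
testfile.0000000000000001
otherfile
testfile.0000000000000000
"""
    for cmd in rm_commands("testpool", stripe_objects(listing, "testfile")):
        print(cmd)  # dry run: review the commands before pasting into a shell
```

Printing rather than executing keeps the destructive step manual, which
seems prudent when unrelated objects share the pool.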

Please open a tracker issue for this so it can be investigated further.

On Fri, Jun 9, 2017 at 1:43 AM, Jan Kasprzak <kas@xxxxxxxxxx> wrote:
>         Hello,
>
> David Turner wrote:
> : How long have you waited?
>
>         About a day.
>
> : I don't do much with rados objects directly.  I usually use RBDs and
> : cephfs.  If you just need to clean things up, you can delete the pool and
> : recreate it since it looks like it's testing.  However this is probably a
> : prime time to figure out how to get past this in case it happens in the
> : future in production.
>
>         Yes. This is why I am asking now.
>
> -Yenya
>
> : On Thu, Jun 8, 2017 at 11:04 AM Jan Kasprzak <kas@xxxxxxxxxx> wrote:
> : > I have created a RADOS striped object using
> : >
> : > $ dd someargs | rados --pool testpool --striper put testfile -
> : >
> : > and interrupted it in the middle of writing. Now I cannot remove this
> : > object:
> : >
> : > $ rados --pool testpool --striper rm testfile
> : > error removing testpool>testfile: (16) Device or resource busy
> : >
> : > How can I tell CEPH that the writer is no longer around and does not
> : > come back, so that I can remove the object "testfile"?
>
> --
> | Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
> | http://www.fi.muni.cz/~kas/                         GPG: 4096R/A45477D5 |
>> That's why this kind of vulnerability is a concern: deploying stuff is  <
>> often about collecting an obscene number of .jar files and pushing them <
>> up to the application server.                          --pboddie at LWN <
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Cheers,
Brad