Discussion:
[PATCH] [SCSI] sr: Fix multi-drive performance by using per-device mutexes
Otto Meta
2013-01-01 14:20:10 UTC
Permalink
The single mutex for the sr module, introduced as a BKL replacement,
globally serialises all sr ioctls, which hurts multi-drive performance.

This patch replaces sr_mutex with per-device mutexes in struct scsi_cd,
allowing concurrent ioctls on different sr devices.

Signed-off-by: Otto Meta <***@sister-shadow.de>
---

This issue was discussed before, so here are some relevant archived messages:

sr_mutex was introduced as a BKL replacement:
"[PATCH 7/7] block: autoconvert trivial BKL users to private mutex"
https://lkml.org/lkml/2010/9/14/338

The resulting performance penalty was first mentioned here along with an
easy to reproduce demonstration, but apparently received no attention:
"scsi/sr - Impossible to write CDs/DVDs simultaneously (since BKL removal ?)"
https://lkml.org/lkml/2011/10/29/117

A while later, the problem was discussed in this thread:
"[PATCH] [SCSI] sr: fix multi-drive performance, remove BKL replacement"
https://lkml.org/lkml/2012/2/28/230

More discussion in this thread:
"Burning multiple DVDs at one time"
https://lkml.org/lkml/2012/3/15/558

A short summary of the discussion:
- The driver-global mutex serialises all sr ioctls because it's not released
while the calling process sleeps, unlike the BKL.
- This hurts when using multiple drives, such as during burning or raw reads.
- Removing the mutex completely fixes the performance penalty but exposes
some data to race conditions.
- Arnd Bergmann had an idea similar to mine, namely replacing sr_mutex
with a per-device mutex inside of cdrom_device_info. James Bottomley and
Stefan Richter agreed with that:
https://lkml.org/lkml/2012/2/28/312

After running into the issue myself while using the CD recovery program
dvdisaster, I believe the following patch fixes the issue and should
provide the same safety as the previous sr_mutex.

Unlike Arnd's suggestion, this solution places the mutex in struct scsi_cd
and thus only affects the sr driver, not the cdrom driver.

sr.h and sr.c haven't seen any changes since at least 3.0.57 and I've
tested the patch with both 3.2.35 and 3.8-rc1.


diff -Naurp linux-3.8-rc1.orig/drivers/scsi/sr.c linux-3.8-rc1.fixed/drivers/scsi/sr.c
--- linux-3.8-rc1.orig/drivers/scsi/sr.c 2012-12-22 02:19:00.000000000 +0100
+++ linux-3.8-rc1.fixed/drivers/scsi/sr.c 2013-01-01 12:56:47.127203406 +0100
@@ -75,7 +75,6 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_WORM);
CDC_CD_R|CDC_CD_RW|CDC_DVD|CDC_DVD_R|CDC_DVD_RAM|CDC_GENERIC_PACKET| \
CDC_MRW|CDC_MRW_W|CDC_RAM)

-static DEFINE_MUTEX(sr_mutex);
static int sr_probe(struct device *);
static int sr_remove(struct device *);
static int sr_done(struct scsi_cmnd *);
@@ -511,24 +510,24 @@ static int sr_block_open(struct block_de
struct scsi_cd *cd;
int ret = -ENXIO;

- mutex_lock(&sr_mutex);
cd = scsi_cd_get(bdev->bd_disk);
if (cd) {
+ mutex_lock(&cd->lock);
ret = cdrom_open(&cd->cdi, bdev, mode);
+ mutex_unlock(&cd->lock);
if (ret)
scsi_cd_put(cd);
}
- mutex_unlock(&sr_mutex);
return ret;
}

static int sr_block_release(struct gendisk *disk, fmode_t mode)
{
struct scsi_cd *cd = scsi_cd(disk);
- mutex_lock(&sr_mutex);
+ mutex_lock(&cd->lock);
cdrom_release(&cd->cdi, mode);
+ mutex_unlock(&cd->lock);
scsi_cd_put(cd);
- mutex_unlock(&sr_mutex);
return 0;
}

@@ -540,7 +539,7 @@ static int sr_block_ioctl(struct block_d
void __user *argp = (void __user *)arg;
int ret;

- mutex_lock(&sr_mutex);
+ mutex_lock(&cd->lock);

/*
* Send SCSI addressing ioctls directly to mid level, send other
@@ -570,7 +569,7 @@ static int sr_block_ioctl(struct block_d
ret = scsi_ioctl(sdev, cmd, argp);

out:
- mutex_unlock(&sr_mutex);
+ mutex_unlock(&cd->lock);
return ret;
}

@@ -660,6 +659,8 @@ static int sr_probe(struct device *dev)
if (!disk)
goto fail_free;

+ mutex_init(&cd->lock);
+
spin_lock(&sr_index_lock);
minor = find_first_zero_bit(sr_index_bits, SR_DISKS);
if (minor == SR_DISKS) {
@@ -722,6 +723,7 @@ static int sr_probe(struct device *dev)

fail_put:
put_disk(disk);
+ mutex_destroy(&cd->lock);
fail_free:
kfree(cd);
fail:
@@ -958,6 +960,8 @@ static void sr_kref_release(struct kref

put_disk(disk);

+ mutex_destroy(&cd->lock);
+
kfree(cd);
}

diff -Naurp linux-3.8-rc1.orig/drivers/scsi/sr.h linux-3.8-rc1.fixed/drivers/scsi/sr.h
--- linux-3.8-rc1.orig/drivers/scsi/sr.h 2012-12-22 02:19:00.000000000 +0100
+++ linux-3.8-rc1.fixed/drivers/scsi/sr.h 2013-01-01 12:56:47.127203406 +0100
@@ -19,6 +19,7 @@

#include <linux/genhd.h>
#include <linux/kref.h>
+#include <linux/mutex.h>

#define MAX_RETRIES 3
#define SR_TIMEOUT (30 * HZ)
@@ -49,6 +50,7 @@ typedef struct scsi_cd {
bool ignore_get_event:1; /* GET_EVENT is unreliable, use TUR */

struct cdrom_device_info cdi;
+ struct mutex lock;
/* We hold gendisk and scsi_device references on probe and use
* the refs on this kref to decide when to release them */
struct kref kref;
--
Otto Meta
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Otto Meta
2013-01-03 23:11:29 UTC
Permalink
Post by Otto Meta
The single mutex for the sr module, introduced as a BKL replacement,
globally serialises all sr ioctls, which hurts multi-drive performance.
This patch replaces sr_mutex with per-device mutexes in struct scsi_cd,
allowing concurrent ioctls on different sr devices.
Unfortunately it wasn't as easy as that. The patch seems to introduce
a race condition that corrupts a drive's state under certain circumstances.

When two drives (e.g. sr0 and sr1) are attached to the same IDE cable, one
drive has its door locked, which will usually be the case after any operation
on the drive with inserted media (and whenever it feels like it, even with
dev.cdrom.lock=0), and the other drive is unlocked, then executing

$ eject sr0 & eject sr1

will eject the unlocked drive and the locked drive will return

eject: unable to eject, last error: Inappropriate ioctl for device


Other drivers down the road probably don't expect concurrent ioctls, so this
patch cannot be applied safely at this time. Sorry about the noise.

For the record: Tested with kernels 3.2.35 and 3.8.0-rc1, using IDE CD/DVD
drives connected via the drivers ata_piix and pata_pdc202xx_old.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Stefan Richter
2013-01-04 19:50:27 UTC
Permalink
Post by Otto Meta
Post by Otto Meta
The single mutex for the sr module, introduced as a BKL replacement,
globally serialises all sr ioctls, which hurts multi-drive performance.
This patch replaces sr_mutex with per-device mutexes in struct scsi_cd,
allowing concurrent ioctls on different sr devices.
Unfortunately it wasn't as easy as that. The patch seems to introduce
a race condition that corrupts a drive's state under certain circumstances.
When two drives (e.g. sr0 and sr1) are attached to the same IDE cable, one
drive has its door locked, which will usually be the case after any operation
on the drive with inserted media (and whenever it feels like it, even with
dev.cdrom.lock=0), and the other drive is unlocked, then executing
$ eject sr0 & eject sr1
will eject the unlocked drive and the locked drive will return
eject: unable to eject, last error: Inappropriate ioctl for device
Other drivers down the road probably don't expect concurrent ioctls, so this
patch cannot be applied safely at this time. Sorry about the noise.
For the record: Tested with kernels 3.2.35 and 3.8.0-rc1, using IDE CD/DVD
drives connected via the drivers ata_piix and pata_pdc202xx_old.
As yo may have seen in the mailinglist archive, when Wakko and I tested
with sr_mutex removed without any replacement, we were not able to trigger
any race condition. However, we certainly did not attempt this very
particular test (two drives on the same PATA cable, one locked and one
unlocked, and "eject" called on both of them at the same time). I wonder
if this is a PATA idiosyncrasy.

I will see whether I can do some tests tomorrow. I can easily test master
and slave PATA drives on a single cable behind a PATA-to-1394 bridge; but
testing two drives on a single cable behind a PATA-to-PCI controller would
be a bit more involved because the case of my PATA-equipped Linux PC is
rather cramped.
--
Stefan Richter
-=====-===-= ---= --=--
http://arcgraph.de/sr/
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Wakko Warner
2013-01-04 21:26:46 UTC
Permalink
Post by Stefan Richter
Post by Otto Meta
Post by Otto Meta
The single mutex for the sr module, introduced as a BKL replacement,
globally serialises all sr ioctls, which hurts multi-drive performance.
This patch replaces sr_mutex with per-device mutexes in struct scsi_cd,
allowing concurrent ioctls on different sr devices.
Unfortunately it wasn't as easy as that. The patch seems to introduce
a race condition that corrupts a drive's state under certain circumstances.
When two drives (e.g. sr0 and sr1) are attached to the same IDE cable, one
drive has its door locked, which will usually be the case after any operation
on the drive with inserted media (and whenever it feels like it, even with
dev.cdrom.lock=0), and the other drive is unlocked, then executing
$ eject sr0 & eject sr1
will eject the unlocked drive and the locked drive will return
eject: unable to eject, last error: Inappropriate ioctl for device
Other drivers down the road probably don't expect concurrent ioctls, so this
patch cannot be applied safely at this time. Sorry about the noise.
For the record: Tested with kernels 3.2.35 and 3.8.0-rc1, using IDE CD/DVD
drives connected via the drivers ata_piix and pata_pdc202xx_old.
As yo may have seen in the mailinglist archive, when Wakko and I tested
with sr_mutex removed without any replacement, we were not able to trigger
any race condition. However, we certainly did not attempt this very
particular test (two drives on the same PATA cable, one locked and one
unlocked, and "eject" called on both of them at the same time). I wonder
if this is a PATA idiosyncrasy.
I will see whether I can do some tests tomorrow. I can easily test master
and slave PATA drives on a single cable behind a PATA-to-1394 bridge; but
testing two drives on a single cable behind a PATA-to-PCI controller would
be a bit more involved because the case of my PATA-equipped Linux PC is
rather cramped.
I myself have not tried this specific test. I do not have any systems with
2 PATA DVDroms on the same cable. The only system I have that has PATA, the
drives are on seperate cables with nothing else on the cable. My other
system is all SATA. I have no other systems with more than 1 CD/DVD drive.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Otto Meta
2013-01-04 23:05:13 UTC
Permalink
Post by Stefan Richter
As yo may have seen in the mailinglist archive, when Wakko and I tested
with sr_mutex removed without any replacement, we were not able to trigger
any race condition. However, we certainly did not attempt this very
particular test (two drives on the same PATA cable, one locked and one
unlocked, and "eject" called on both of them at the same time). I wonder
if this is a PATA idiosyncrasy.
Reading from two devices connected to different PATA cables or controllers
via ioctl is working quite fine with the patch, so that seems to be in line
with your and Wakko's findings.

Running the eject test on the same device combination seems to be mostly
fine. I'm saying "mostly" because sometimes drives stay locked when I expect
them to be unlocked, but I haven't been able to reproduce that reliably.
A few times drives even stayed locked with the tray ejected(!), i.e. I
couldn't close the drive via its eject button and had to run eject -t.
As dev.cdrom.lock=0 seems to be ignored even without the patch, I'm not sure
whether the locking issue is caused or just amplified by the patch.

Another note regarding the eject test with two devices connected to the same
PATA cable: If both drives are unlocked, they eject fine, but sequentially.
Something seems to be trying to prevent interference between the two drives,
but it's not working properly once unlocking is involved.
--
Otto Meta
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Loading...