Discussion:
list_for_each_entry_safe() regarded as unsafe
Alan Stern
2005-06-09 16:27:02 UTC
Mike and whoever else may be interested:

The scsi_forget_host() and __scsi_remove_target() routines (in scsi_scan.c
and scsi_sysfs.c) contain these lines respectively:

list_for_each_entry_safe(starget, tmp, &shost->__targets, siblings) {

list_for_each_entry_safe(sdev, tmp, &shost->__devices, siblings) {

Neither loop is truly safe because they release shost->host_lock to do the
actual removals. I've just seen a couple of different oopses caused when
__scsi_remove_target() was called during scanning. Details available if
you want them.
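
To illustrate the hazard, here is a minimal userspace sketch (not kernel code). The _safe iterators save the next entry before the loop body runs, which protects only against the body removing the current entry. Once the lock is dropped, nothing protects the saved entry. The struct node type, the on_list flag, and both function names are hypothetical stand-ins, and the "concurrent" removal is simulated inline rather than done by a second thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a list entry. */
struct node {
	struct node *next;
	int id;
	bool on_list;	/* false once unlinked; real code would have freed it */
};

/* Unlink n from the singly linked list starting at *head, if present. */
void unlink_node(struct node **head, struct node *n)
{
	struct node **pp;

	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == n) {
			*pp = n->next;
			n->on_list = false;
			return;
		}
	}
}

/*
 * Walk the list the way a "_safe" iterator does: remember the next
 * entry before handling the current one.  While the first entry is
 * handled "with the lock dropped", a simulated concurrent remover
 * unlinks the victim.  Returns true if the walk then steps onto an
 * entry that is no longer on the list.
 */
bool safe_walk_hits_stale_entry(struct node **head, struct node *victim)
{
	struct node *cur, *tmp;
	bool lock_dropped_once = false;

	assert(head != NULL);
	for (cur = *head; cur != NULL; cur = tmp) {
		tmp = cur->next;	/* saved while the lock is held */
		if (!cur->on_list)
			return true;	/* kernel code would touch freed memory */
		if (!lock_dropped_once) {
			/* lock dropped: another thread removes the victim */
			unlink_node(head, victim);
			lock_dropped_once = true;
		}
	}
	return false;
}
```

If the victim happens to be the entry saved in tmp, the walk lands on a stale entry. If it is any other entry, the walk completes normally, which is why the problem shows up only intermittently.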

I don't know what the best way is to fix this. Even if scsi_forget_host()
acquired the host's scan_mutex, that wouldn't be enough to guarantee the
__targets and __devices lists won't change, would it? And it might cause
interference with other pathways.

Maybe it's best simply to avoid using list_for_each_entry_safe, as in
the example below:

Alan Stern


Index: usb-2.6/drivers/scsi/scsi_sysfs.c
===================================================================
--- usb-2.6.orig/drivers/scsi/scsi_sysfs.c
+++ usb-2.6/drivers/scsi/scsi_sysfs.c
@@ -653,17 +653,19 @@ void __scsi_remove_target(struct scsi_ta
 {
 	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
 	unsigned long flags;
-	struct scsi_device *sdev, *tmp;
+	struct scsi_device *sdev;

 	spin_lock_irqsave(shost->host_lock, flags);
 	starget->reap_ref++;
-	list_for_each_entry_safe(sdev, tmp, &shost->__devices, siblings) {
+restart:
+	list_for_each_entry(sdev, &shost->__devices, siblings) {
 		if (sdev->channel != starget->channel ||
 		    sdev->id != starget->id)
 			continue;
 		spin_unlock_irqrestore(shost->host_lock, flags);
 		scsi_remove_device(sdev);
 		spin_lock_irqsave(shost->host_lock, flags);
+		goto restart;
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	scsi_target_reap(starget);
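
The restart idea in the patch can be sketched in userspace as follows. This is an illustration under simplifying assumptions, not the kernel code: struct dev, unlink_dev(), and remove_target_devices() are hypothetical names, locking is indicated only by comments, and the also_gone parameter simulates one entry being removed by another thread while the lock is dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device on the host's list. */
struct dev {
	struct dev *next;
	int channel;
	int id;
};

/* Unlink d from the list starting at *head; no-op if it is not there. */
void unlink_dev(struct dev **head, struct dev *d)
{
	struct dev **pp;

	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == d) {
			*pp = d->next;
			return;
		}
	}
}

/*
 * Remove every device matching (channel, id).  After each removal the
 * walk restarts from the list head, so no pointer saved before the
 * lock was dropped is ever followed.  Returns how many devices this
 * loop removed itself.
 */
int remove_target_devices(struct dev **head, int channel, int id,
			  struct dev *also_gone)
{
	struct dev *d;
	int removed = 0;

	assert(head != NULL);
restart:
	for (d = *head; d != NULL; d = d->next) {
		if (d->channel != channel || d->id != id)
			continue;
		/* lock would be dropped here */
		unlink_dev(head, d);
		if (also_gone != NULL) {
			/* simulated concurrent removal by another thread */
			unlink_dev(head, also_gone);
			also_gone = NULL;
		}
		removed++;
		/* lock would be retaken here */
		goto restart;
	}
	return removed;
}
```

The cost is that the walk is quadratic in the worst case, which should matter little for lists as short as these.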


-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Mike Anderson
2005-06-09 21:59:15 UTC
Post by Alan Stern
The scsi_forget_host() and __scsi_remove_target() routines (in scsi_scan.c
and scsi_sysfs.c) contain these lines respectively:
list_for_each_entry_safe(starget, tmp, &shost->__targets, siblings) {
list_for_each_entry_safe(sdev, tmp, &shost->__devices, siblings) {
Neither loop is truly safe because they release shost->host_lock to do the
actual removals. I've just seen a couple of different oopses caused when
__scsi_remove_target() was called during scanning. Details available if
you want them.
Well, we need an updated scsi_host state model that would prevent scanning
while we are removing the host. I believe that if the oopses in
__scsi_remove_target were prevented, there may be some other oopses showing
up as the host started going away.
Post by Alan Stern
I don't know what the best way is to fix this. Even if scsi_forget_host()
acquired the host's scan_mutex, that wouldn't be enough to guarantee the
__targets and __devices lists won't change, would it? And it might cause
interference with other pathways.
Yes if scsi_forget_host acquired the scan_mutex it would deadlock when
scsi_remove_device acquired it later on in the call stack.
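
A toy model of that self-deadlock, for illustration only. The toy_mutex type and all function names are hypothetical. Instead of blocking forever the way a real non-recursive mutex would, toy_lock() returns -1 when the mutex is already held, so the problem can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive mutex: records only whether it is held. */
struct toy_mutex {
	bool held;
};

/* Returns 0 on success, -1 where a real mutex would block forever. */
int toy_lock(struct toy_mutex *m)
{
	if (m->held)
		return -1;
	m->held = true;
	return 0;
}

void toy_unlock(struct toy_mutex *m)
{
	assert(m->held);
	m->held = false;
}

struct toy_mutex scan_mutex;

/* Stand-in for scsi_remove_device(), which takes scan_mutex itself. */
int remove_device(void)
{
	if (toy_lock(&scan_mutex) != 0)
		return -1;	/* caller already holds it: self-deadlock */
	/* ... removal work would go here ... */
	toy_unlock(&scan_mutex);
	return 0;
}

/* Stand-in for a scsi_forget_host() that also takes scan_mutex. */
int forget_host_holding_scan_mutex(void)
{
	int ret;

	if (toy_lock(&scan_mutex) != 0)
		return -1;
	ret = remove_device();	/* tries to take scan_mutex a second time */
	toy_unlock(&scan_mutex);
	return ret;
}
```

Calling remove_device() on its own succeeds. Calling it through forget_host_holding_scan_mutex() fails at the nested lock attempt.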
Post by Alan Stern
Maybe it's best simply to avoid using list_for_each_entry_safe, as in
.. snip ..
+	list_for_each_entry(sdev, &shost->__devices, siblings) {
 		if (sdev->channel != starget->channel ||
 		    sdev->id != starget->id)
 			continue;
 		spin_unlock_irqrestore(shost->host_lock, flags);
 		scsi_remove_device(sdev);
 		spin_lock_irqsave(shost->host_lock, flags);
+		goto restart;
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	scsi_target_reap(starget);
Since we are not guaranteed that scsi_remove_device will remove the device
from the list (i.e. the release may not be called after an unexpected
disconnect), you may get stuck on the same device for a bit.
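
One possible way around this, sketched in userspace under stated assumptions: mark each device before the lock is dropped and skip marked devices after a restart, so a device that stays on the list is visited only once. The removing and sticky fields and both function names are hypothetical, not taken from the SCSI code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device on the host's list. */
struct dev {
	struct dev *next;
	int channel;
	int id;
	bool removing;	/* hypothetical "removal already started" mark */
	bool sticky;	/* models a device that stays on the list */
};

/* Stand-in for scsi_remove_device(): unlinks d unless it is sticky. */
void remove_dev(struct dev **head, struct dev *d)
{
	struct dev **pp;

	if (d->sticky)
		return;
	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == d) {
			*pp = d->next;
			return;
		}
	}
}

/*
 * Restarting removal loop that skips devices it has already handled.
 * Returns the number of remove_dev() calls made.
 */
int remove_target_devs(struct dev **head, int channel, int id)
{
	struct dev *d;
	int calls = 0;

	assert(head != NULL);
restart:
	for (d = *head; d != NULL; d = d->next) {
		if (d->channel != channel || d->id != id || d->removing)
			continue;
		d->removing = true;	/* set while the lock is still held */
		/* lock would be dropped here */
		remove_dev(head, d);
		calls++;
		/* lock would be retaken here */
		goto restart;
	}
	return calls;
}
```

Without the removing check, a sticky device would match again after every restart and the loop would never finish.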

-andmike
--
Michael Anderson
***@us.ibm.com

Alan Stern
2005-06-09 23:19:13 UTC
Post by Mike Anderson
Well, we need an updated scsi_host state model that would prevent scanning
while we are removing the host. I believe that if the oopses in
__scsi_remove_target were prevented, there may be some other oopses showing
up as the host started going away.
More than that is needed -- you have to guarantee that two threads won't
try to add or remove a target or device to the same host at the same time.
Post by Mike Anderson
Post by Alan Stern
I don't know what the best way is to fix this. Even if scsi_forget_host()
acquired the host's scan_mutex, that wouldn't be enough to guarantee the
__targets and __devices lists won't change, would it? And it might cause
interference with other pathways.
Yes if scsi_forget_host acquired the scan_mutex it would deadlock when
scsi_remove_device acquired it later on in the call stack.
How about not acquiring the scan_mutex in scsi_remove_device, and
insisting that the caller hold it instead? There aren't that many places
where it gets called. In fact, one of those places (an error pathway in
scsi_sysfs_add_sdev) looks like it already will cause a deadlock.

Then it would be necessary also to have scanning threads check whether the
host is in the process of removal. This means that scsi_forget_host will
have to change the host state somehow. What do you think would be the
best way to mark a host as being removed?

On the plus side, neither forget_host nor remove_target would need to
acquire the host_lock, because holding the scan_mutex would already
guarantee the necessary exclusion.
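
A rough userspace sketch of that arrangement, with every name hypothetical: scanning and removal both run with the scan mutex held, removal sets a "being removed" bit in the host state, and scanning checks the bit before adding anything. The mutex is modelled by a plain flag so the sketch stays single-threaded and deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state bit marking a host that is going away. */
#define HOST_REMOVING_BIT 0

/* Hypothetical stand-in for the host structure. */
struct host {
	unsigned long state;
	int ndevices;
	bool scan_mutex_held;	/* models holding the host's scan_mutex */
};

/* Scanning path: add a device unless the host is being removed. */
int scan_add_device(struct host *h)
{
	int ret = -1;

	assert(!h->scan_mutex_held);
	h->scan_mutex_held = true;	/* take scan_mutex */
	if (!(h->state & (1UL << HOST_REMOVING_BIT))) {
		h->ndevices++;
		ret = 0;
	}
	h->scan_mutex_held = false;	/* release scan_mutex */
	return ret;
}

/* Removal path: mark the host, then drop all of its devices. */
void forget_host(struct host *h)
{
	assert(!h->scan_mutex_held);
	h->scan_mutex_held = true;	/* take scan_mutex */
	h->state |= 1UL << HOST_REMOVING_BIT;
	h->ndevices = 0;	/* list walk needs no host_lock in this model */
	h->scan_mutex_held = false;	/* release scan_mutex */
}
```

Because the bit is set and tested under the same mutex, a scan that starts after forget_host() has run cannot add a device to the dying host.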

Alan Stern

Brian King
2005-06-10 13:39:58 UTC
Post by Alan Stern
Post by Mike Anderson
Well, we need an updated scsi_host state model that would prevent scanning
while we are removing the host. I believe that if the oopses in
__scsi_remove_target were prevented, there may be some other oopses showing
up as the host started going away.
More than that is needed -- you have to guarantee that two threads won't
try to add or remove a target or device to the same host at the same time.
Post by Mike Anderson
Post by Alan Stern
I don't know what the best way is to fix this. Even if scsi_forget_host()
acquired the host's scan_mutex, that wouldn't be enough to guarantee the
__targets and __devices lists won't change, would it? And it might cause
interference with other pathways.
Yes if scsi_forget_host acquired the scan_mutex it would deadlock when
scsi_remove_device acquired it later on in the call stack.
How about not acquiring the scan_mutex in scsi_remove_device, and
insisting that the caller hold it instead? There aren't that many places
where it gets called. In fact, one of those places (an error pathway in
scsi_sysfs_add_sdev) looks like it already will cause a deadlock.
scsi_remove_device is an exported symbol, so requiring the caller to obtain
the scan_mutex prior to calling it would not work. A __scsi_remove_device
could be created, however, which would not grab the scan_mutex so that scsi
core could do the right thing.
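
The split could look roughly like this in a userspace toy model. All names are hypothetical: remove_device_locked() plays the role of the proposed __scsi_remove_device, and the toy mutex only records whether it is held.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive mutex: records only whether it is held. */
struct toy_mutex {
	bool held;
};

int toy_lock(struct toy_mutex *m)
{
	if (m->held)
		return -1;	/* a real mutex would block forever here */
	m->held = true;
	return 0;
}

void toy_unlock(struct toy_mutex *m)
{
	assert(m->held);
	m->held = false;
}

struct toy_mutex scan_mutex;
int devices_removed;

/* Internal variant: the caller must already hold scan_mutex. */
void remove_device_locked(void)
{
	assert(scan_mutex.held);
	devices_removed++;
}

/* Exported variant: takes scan_mutex itself, as external callers expect. */
int remove_device(void)
{
	if (toy_lock(&scan_mutex) != 0)
		return -1;
	remove_device_locked();
	toy_unlock(&scan_mutex);
	return 0;
}

/* Core code can hold scan_mutex across the loop without deadlocking. */
int forget_host(int ndevices)
{
	int i;

	if (toy_lock(&scan_mutex) != 0)
		return -1;
	for (i = 0; i < ndevices; i++)
		remove_device_locked();
	toy_unlock(&scan_mutex);
	return 0;
}
```

External callers keep the existing behaviour of remove_device(), and only core code that already holds the mutex uses the locked variant.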
--
Brian King
eServer Storage I/O
IBM Linux Technology Center
Alan Stern
2005-06-10 15:26:11 UTC
Post by Brian King
Post by Alan Stern
How about not acquiring the scan_mutex in scsi_remove_device, and
insisting that the caller hold it instead? There aren't that many places
where it gets called. In fact, one of those places (an error pathway in
scsi_sysfs_add_sdev) looks like it already will cause a deadlock.
scsi_remove_device is an exported symbol, so requiring the caller to obtain
the scan_mutex prior to calling it would not work. A __scsi_remove_device
could be created, however, which would not grab the scan_mutex so that scsi
core could do the right thing.
Okay.

How should a host be marked to indicate it's being removed? Add another
bit to shost_state?

Alan Stern
