[Bug 81861] New: mvsas.ko v0.8.16 error messages and kernel crashes attaching 4 SATA drives to specific HP SAS expander ports

Discussion:

b***@bugzilla.kernel.org

2014-08-07 17:33:26 UTC

https://bugzilla.kernel.org/show_bug.cgi?id=81861

Bug ID: 81861
Summary: mvsas.ko v0.8.16 error messages and kernel crashes
attaching 4 SATA drives to specific HP SAS expander
ports
Product: SCSI Drivers
Version: 2.5
Kernel Version: 3.16.0-031600rc6
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: blocking
Priority: P1
Component: Other
Assignee: scsi_drivers-***@kernel-bugs.osdl.org
Reporter: linux-***@crashplan.pro
Regression: No

The issues are (1) error messages and (2) kernel crashes when attaching 4
drives (1 SFF SAS cable) to specific ports of a SAS expander.

The issue has only been tested with an HP SAS expander (PMC Sierra PM8005 chip)
running firmware 2.08. This expander has 36/4=9 SAS ports:
1 port of type SFF-8088, labelled 1C on the PCB.
8 ports of type SFF-8087, labelled 2C through 9C on the PCB.
Port “1C” is connected to a Supermicro SAS2LP-MV8 (Marvell 88SE9485 based chip);
lspci output is inserted below.

The issue is not always identical. When attaching the 4 drives to different
port numbers on the expander, this is what happens, in this order:
2C, 3C, 4C = ok
5C = error
6C, 7C, 8C = kernel crash
9C = error

After that first run from port 2C through 9C, the issue seems more random:
9C = kernel crash
4C = kernel crash
3C = error
9C = error
7C = kernel crash
3C = error
2C = ok
4C = kernel crash
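
The two runs above can be tallied per port to make the pattern easier to see.
This is only a minimal Python sketch: the RUNS list is transcribed from the
lists above, and tally() is a hypothetical helper name, not part of any tool.

```python
from collections import Counter, defaultdict

# Outcomes transcribed from the two runs listed above, in the order reported.
RUNS = [
    ("2C", "ok"), ("3C", "ok"), ("4C", "ok"),
    ("5C", "error"),
    ("6C", "crash"), ("7C", "crash"), ("8C", "crash"),
    ("9C", "error"),
    # second, less systematic run
    ("9C", "crash"), ("4C", "crash"), ("3C", "error"), ("9C", "error"),
    ("7C", "crash"), ("3C", "error"), ("2C", "ok"), ("4C", "crash"),
]


def tally(runs):
    """Return {port: Counter(outcome -> count)} for a list of (port, outcome)."""
    per_port = defaultdict(Counter)
    for port, outcome in runs:
        per_port[port][outcome] += 1
    return per_port


if __name__ == "__main__":
    for port, counts in sorted(tally(RUNS).items()):
        print(port, dict(counts))
```

With this data, 2C is the only port that came out "ok" on every attempt.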

The “error message” on ports 5C and 9C is:
scsi 5:0:4:0: Failed to get diagnostic page 0x8000002
scsi 5:0:4:0: Failed to bind enclosure -19
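
The -19 in "Failed to bind enclosure -19" looks like a negated errno value,
which on Linux would be ENODEV ("No such device"). A small Python sketch for
looking such codes up; describe_errno() is a hypothetical helper name.

```python
import errno
import os


def describe_errno(code):
    """Map a negative kernel-style return code to (symbolic name, message)."""
    number = abs(code)
    return errno.errorcode.get(number, "UNKNOWN"), os.strerror(number)


if __name__ == "__main__":
    name, message = describe_errno(-19)
    print(name, "-", message)
```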

====
Most testing is done with Ubuntu 14.04.1 running Ubuntu’s supplied mainline
kernel 3.16.0-rc6.
# modprobe -v mvsas
insmod /lib/modules/3.16.0-031600rc6-generic/kernel/drivers/scsi/scsi_transport_sas.ko
insmod /lib/modules/3.16.0-031600rc6-generic/kernel/drivers/scsi/libsas/libsas.ko
insmod /lib/modules/3.16.0-031600rc6-generic/kernel/drivers/scsi/mvsas/mvsas.ko
=============
Other tested kernels, with similar results
=============
kernel Mainline 3.16-20140724
kernel Ubuntu 3.13.11
kernel Ubuntu 3.13.0-24
kernel Ubuntu 3.12.25
kernel Ubuntu 2.6.32 = no SAS expander detected -> no further testing
=============
No drives attached to expander
============
# lsscsi
[4:0:0:0] disk ATA OCZ-VERTEX 1.3 /dev/sda
[5:0:0:0] enclosu HP HP SAS EXP Card 2.08 -
============
With 4 drives (brown#4) attached to expander port 2C
============
# lsscsi
[4:0:0:0] disk ATA OCZ-VERTEX 1.3 /dev/sda
[6:0:0:0] disk ATA Hitachi HDS5C302 AAB0 /dev/sdb
[6:0:1:0] disk ATA Hitachi HDS5C302 AAB0 /dev/sdc
[6:0:2:0] disk ATA Hitachi HDS5C302 AAB0 /dev/sdd
[6:0:3:0] disk ATA Hitachi HDS5C302 AAB0 /dev/sde
[6:0:4:0] enclosu HP HP SAS EXP Card 2.08 -
============
With 4 drives (brown#4) attached to expander port 3C
============
# lsscsi
[4:0:0:0] disk ATA OCZ-VERTEX 1.3 /dev/sda
[6:0:4:0] enclosu HP HP SAS EXP Card 2.08 -
[6:0:5:0] disk ATA Hitachi HDS5C302 AAB0 /dev/sdb
[6:0:6:0] disk ATA Hitachi HDS5C302 AAB0 /dev/sdc
[6:0:7:0] disk ATA Hitachi HDS5C302 AAB0 /dev/sdd
[6:0:8:0] disk ATA Hitachi HDS5C302 AAB0 /dev/sde
============
With 4 drives (brown#4) attached to expander port 4C
============
# lsscsi
[4:0:0:0] disk ATA OCZ-VERTEX 1.3 /dev/sda
[6:0:4:0] enclosu HP HP SAS EXP Card 2.08 -
[6:0:9:0] disk ATA Hitachi HDS5C302 AAB0 /dev/sdb
[6:0:10:0] disk ATA Hitachi HDS5C302 AAB0 /dev/sdc
[6:0:11:0] disk ATA Hitachi HDS5C302 AAB0 /dev/sdd
[6:0:12:0] disk ATA Hitachi HDS5C302 AAB0 /dev/sde
============
With 4 drives (brown#4) attached to expander port 5C
============
scsi 5:0:4:0: Failed to get diagnostic page 0x8000002
scsi 5:0:4:0: Failed to bind enclosure -19
# lsscsi
[4:0:0:0] disk ATA OCZ-VERTEX 1.3 /dev/sda
[5:0:4:0] enclosu HP HP SAS EXP Card 2.08 -
============
With 4 drives (brown#4) attached to expander port 6C
============
Kernel crash (data from OCR-ed screenshot):
[ 263.190030] R13: ffff88020e837808 R14: ffff88021b4a0080 R15: ffff880036c11200
[ 269.130052] FS: 00007f9ef5abb740(0000) GS:ffff88021b200000(0000) knlGS:0000000000000000
[ 269.190074] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 269.190091] CR2: 00007f9ef5ac2000 CR3: 000000020fbd8000 CR4: 00000000000407f0
[ 269.190111] Stack:
[ 269.190118] 0000000000000000 0000000000000002 ffff88021f5f7f08 dead000000200200
[ 269.190145] ffff88020d1037b0 0000000000000046 ffff88020eb81e38 ffffffff811b06ae
[ 269.190171] ffff88020e837798 ffff88020d69b140 ffff88020d1037b0 ffff88020d100000
[ 269.190197] Call Trace:
[ 269.190210] [<ffffffff811b06ae>] ? dma_pool_alloc+0xce/0x100
[ 269.190229] [<ffffffffc06e44ab>] mvs_task_prep+0x58b/0x620 [mvsas]
[ 269.190248] [<ffffffffc06e45a8>] mvs_task_exec.isra.14+0x68/0xf0 [mvsas]
[ 269.190269] [<ffffffffc06e5149>] mvs_queue_command+0x39/0x40 [mvsas]
[ 269.190291] [<ffffffffc06d48ab>] sas_ata_qc_issue+0x28b/0x2d0 [libsas]
[ 269.190312] [<ffffffff8153102f>] ata_qc_issue+0x18f/0x2d0
[ 269.190331] [<ffffffff81537dc0>] ? ata_scsi_rw_xlat+0x230/0x230
[ 269.190349] [<ffffffff81535fe4>] ata_scsi_translate+0xb4/0x1b0
[ 269.190369] [<ffffffff81539aa1>] ata_sas_queuecmd+0x121/0x2b0
[ 269.190389] [<ffffffffc06d387f>] sas_queuecommand+0x20f/0x280 [libsas]
[ 269.190409] [<ffffffff8150d6ce>] scsi_dispatch_cmd+0xce/0x280
[ 269.190428] [<ffffffff81515dd2>] scsi_request_fn+0x372/0x490
[ 269.190447] [<ffffffff813541c7>] __blk_run_queue+0x37/0x50
[ 269.190465] [<ffffffff8135305f>] __elv_add_request+0xef/0x310
[ 269.190483] [<ffffffff8135e123>] blk_execute_rq_nowait+0xb3/0x190
[ 269.190504] [<ffffffff811c2653>] ? kmem_cache_alloc_node+0x1e3/0x200
[ 269.190523] [<ffffffff8135e28d>] blk_execute_rq+0x8d/0x160
[ 269.190542] [<ffffffff812f8bf8>] ? security_capable+0x18/0x20
[ 269.190561] [<ffffffff81079e10>] ? ns_capable+0x30/0x60
[ 269.190578] [<ffffffff81079ed7>] ? capable+0x17/0x20
[ 269.191191] [<ffffffff81369b85>] ? blk_verify_command+0x25/0x70
[ 269.191806] [<ffffffff8136a1d8>] sg_io+0x168/0x2c0
[ 269.192422] [<ffffffff8136a557>] scsi_cmd_ioctl+0x227/0x520
[ 269.193030] [<ffffffff81198bfb>] ? __handle_mm_fault+0x1db/0x360
[ 269.193631] [<ffffffff8136a89e>] scsi_cmd_blk_ioctl+0x4e/0x60
[ 269.194231] [<ffffffff81520ab7>] sd_ioctl+0xd7/0x160
[ 269.194810] [<ffffffff81366b9e>] blkdev_ioctl+0xde/0x810
[ 269.195373] [<ffffffff810a8ead>] ? vtime_account_user+0x5d/0x70
[ 269.195921] [<ffffffff812152d0>] block_ioctl+0x40/0x50
[ 269.196449] [<ffffffff811f1805>] do_vfs_ioctl+0x75/0x2c0
[ 269.196966] [<ffffffff810247b5>] ? syscall_trace_enter+0x165/0x280
[ 269.197475] [<ffffffff81168835>] ? context_tracking_user_enter+0x25/0x30
[ 269.197972] [<ffffffff811f1ae1>] SyS_ioctl+0x91/0xb0
[ 269.198458] [<ffffffff817913bf>] tracesys+0xe1/0xe6
[ 269.198930] Code: 00 00 48 8b 0c c8 0f 84 a7 02 00 00 44 89 c0 41 b9 00 10 00 00 48 8d 34 80 48 8d 04 70 48 8d b4 c3 b8 55 02 00 8b 43 58 89 46 1c <8b> 89 54 02 00 00 44 89 c0 8b 7b 58 0d 00 00 00 70 4c 8b 53 48
[ 269.200019] RIP [<ffffffffc06e35a0>] mvs_task_prep_ata+0x80/0x3a0 [mvsas]
[ 269.200534] RSP <ffff88020e837738>
============
With 4 drives (brown#4) attached to expander port 7C
============
Kernel crash (from OCR-ed screenshot):
[ 38.934484] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 38.934501] CR2: 0000000000000254 CR3: 0000000001c12000 CR4: 00000000000407e0
[ 38.934522] Stack:
[ 38.934529] ffff88021b214400 ffff880200000000 0000000000000282 0000000000000000
[ 38.934556] ffff8800d4c03618 0000000000000046 ffff8800d5b01e38 ffffffff811b06ae
[ 38.934582] ffff88021b214400 ffff88020d65e140 ffff8800d4c03618 ffff8800d4c00000
[ 38.934608] Call Trace:
[ 38.934619] [<ffffffff811b06ae>] ? dma_pool_alloc+0xce/0x100
[ 38.934638] [<ffffffffc03c04ab>] mvs_task_prep+0x58b/0x620 [mvsas]
[ 38.934659] [<ffffffff810a29e6>] ? ttwu_do_activate.constprop.111+0x66/0x70
[ 38.934682] [<ffffffffc03c05a8>] mvs_task_exec.isra.14+0x68/0xf0 [mvsas]
[ 38.934703] [<ffffffffc03c1149>] mvs_queue_command+0x39/0x40 [mvsas]
[ 38.934725] [<ffffffffc03a88ab>] sas_ata_qc_issue+0x28b/0x2d0 [libsas]
[ 38.934747] [<ffffffff8153102f>] ata_qc_issue+0x18f/0x2d0
[ 38.934764] [<ffffffff81531468>] ata_exec_internal_sg+0x2f8/0x5d0
[ 38.934783] [<ffffffff815317b2>] ata_exec_internal+0x72/0xb0
[ 38.934802] [<ffffffff81531faa>] ata_do_dev_read_id+0x2a/0x30
[ 38.934821] [<ffffffffc03a84b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 38.934843] [<ffffffff815321f5>] ata_dev_read_id+0x245/0x460
[ 38.934861] [<ffffffff8153e99c>] ? ata_eh_reset+0x24c/0xe20
[ 38.934878] [<ffffffff8153d8f8>] ata_eh_revalidate_and_attach+0x198/0x3a0
[ 38.934899] [<ffffffff8153fd69>] ata_eh_recover+0x599/0x7e0
[ 38.934917] [<ffffffff81534200>] ? sata_print_link_status+0xc0/0xc0
[ 38.934937] [<ffffffffc03a84b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 38.934959] [<ffffffff81534750>] ? sata_std_hardreset+0x50/0x50
[ 38.934978] [<ffffffffc03a84b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 38.935618] [<ffffffff81534750>] ? sata_std_hardreset+0x50/0x50
[ 38.936257] [<ffffffffc03a84b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 38.936905] [<ffffffff81540742>] ata_do_eh+0x52/0xc0
[ 38.937538] [<ffffffff81534200>] ? sata_print_link_status+0xc0/0xc0
[ 38.938163] [<ffffffff815407f7>] ata_std_error_handler+0x47/0x80
[ 38.938783] [<ffffffff8153b8f8>] ? ata_eh_handle_port_resume+0x38/0x160
[ 38.939410] [<ffffffff8154041b>] ata_scsi_port_error_handler+0x39b/0x5a0
[ 38.940024] [<ffffffffc03a82c5>] async_sas_ata_eh+0x55/0x90 [libsas]
[ 38.940621] [<ffffffff8109a89b>] async_run_entry_fn+0x3b/0x140
[ 38.941201] [<ffffffff8108c6ff>] process_one_work+0x17f/0x4c0
[ 38.941767] [<ffffffff8108d46b>] worker_thread+0x11b/0x3f0
[ 38.942320] [<ffffffff8108d350>] ? create_and_start_worker+0x80/0x80
[ 38.942864] [<ffffffff81094479>] kthread+0xc9/0xe0
[ 38.943398] [<ffffffff810943b0>] ? flush_kthread_worker+0xb0/0xb0
[ 38.943927] [<ffffffff817910fc>] ret_from_fork+0x7c/0xb0
[ 38.944443] [<ffffffff810943b0>] ? flush_kthread_worker+0xb0/0xb0
[ 38.944956] Code: 00 00 48 8b 0c c8 0f 84 a7 02 00 00 44 89 c0 41 b9 00 10 00 00 48 8d 34 80 48 8d 04 70 48 8d b4 c3 b8 55 02 00 8b 43 58 89 46 1c <8b> 89 54 02 00 00 44 89 c0 8b 7b 58 0d 00 00 00 70 4c 8b 53 48
[ 38.946132] RIP [<ffffffffc03bf5a0>] mvs_task_prep_ata+0x80/0x3a0 [mvsas]
[ 38.946688] RSP <ffff88020d7bb7c8>
============
With 4 drives (brown#4) attached to expander port 8C
============
Kernel crash (text from OCR-ed screenshot):
[ 335.117520] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 335.117537] CR2: 00007fff5S6452C0 CR3: 0000000001c12000 CR4: 00000000000407e0
[ 335.117557] Stack:
[ 335.117565] ffff88021b214400 ffff880200000000 0000000000000282 74737572745f7374
[ 335.117591] ffff8800d5b03618 0000000000000046 ffff88020f301e38 ffffffff811b06ae
[ 335.117617] ffff88021b214400 ffff8800d4bda280 ffff8800d5b03618 ffff8800d5b00000
[ 335.117644] Call Trace:
[ 335.117656] [<ffffffff811b06ae>] ? dma_pool_alloc+0xce/0x100
[ 335.117676] [<ffffffffc03fb4ab>] mvs_task_prep+0x58b/0x620 [mvsas]
[ 335.117697] [<ffffffff810a29e6>] ? ttwu_do_activate.constprop.111+0x66/0x70
[ 335.117720] [<ffffffffc03fb5a8>] mvs_task_exec.isra.14+0x68/0xf0 [mvsas]
[ 335.117741] [<ffffffffc03fc149>] mvs_queue_command+0x39/0x40 [mvsas]
[ 335.117764] [<ffffffffc03e38ab>] sas_ata_qc_issue+0x28b/0x2d0 [libsas]
[ 335.117786] [<ffffffff8153102f>] ata_qc_issue+0x18f/0x2d0
[ 335.117804] [<ffffffff81531468>] ata_exec_internal_sg+0x2f8/0x5d0
[ 335.117823] [<ffffffff815317b2>] ata_exec_internal+0x72/0xb0
[ 335.117842] [<ffffffff81531faa>] ata_do_dev_read_id+0x2a/0x30
[ 335.117861] [<ffffffffc03e34b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 335.117883] [<ffffffff815321f5>] ata_dev_read_id+0x245/0x460
[ 335.117901] [<ffffffff8153e99c>] ? ata_eh_reset+0x24c/0xe20
[ 335.117919] [<ffffffff8153d8f8>] ata_eh_revalidate_and_attach+0x198/0x3a0
[ 335.117940] [<ffffffff8153fd69>] ata_eh_recover+0x599/0x7e0
[ 335.117959] [<ffffffff81534200>] ? sata_print_link_status+0xc0/0xc0
[ 335.117979] [<ffffffffc03e34b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 335.118001] [<ffffffff81534750>] ? sata_std_hardreset+0x50/0x50
[ 335.118019] [<ffffffffc03e34b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 335.118041] [<ffffffff81534750>] ? sata_std_hardreset+0x50/0x50
[ 335.118061] [<ffffffffc03e34b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 335.118083] [<ffffffff81540742>] ata_do_eh+0x52/0xc0
[ 335.118709] [<ffffffff81534200>] ? sata_print_link_status+0xc0/0xc0
[ 335.119338] [<ffffffff815407f7>] ata_std_error_handler+0x47/0x80
[ 335.119970] [<ffffffff8153b8f8>] ? ata_eh_handle_port_resume+0x38/0x160
[ 335.120600] [<ffffffff8154041b>] ata_scsi_port_error_handler+0x39b/0x5a0
[ 335.121215] [<ffffffffc03e32c5>] async_sas_ata_eh+0x55/0x90 [libsas]
[ 335.121812] [<ffffffff8109a89b>] async_run_entry_fn+0x3b/0x140
[ 335.122394] [<ffffffff8108c6ff>] process_one_work+0x17f/0x4c0
[ 335.122963] [<ffffffff81776ba8>] ? maybe_create_worker+0xbb/0x1c5
[ 335.123520] [<ffffffff8108d46b>] worker_thread+0x11b/0x3f0
[ 335.124064] [<ffffffff8108d350>] ? create_and_start_worker+0x80/0x80
[ 335.124605] [<ffffffff81094479>] kthread+0xc9/0xe0
[ 335.125133] [<ffffffff810943b0>] ? flush_kthread_worker+0xb0/0xb0
[ 335.125654] [<ffffffff817910fc>] ret_from_fork+0x7c/0xb0
[ 335.126169] [<ffffffff810943b0>] ? flush_kthread_worker+0xb0/0xb0
[ 335.126685] Code: 00 00 48 8b 0c c8 0f 84 a7 02 00 00 44 89 c0 41 b9 00 10 00 00 48 8d 34 80 48 8d 04 70 48 8d b4 c3 b8 55 02 00 8b 43 58 89 46 1c <8b> 89 54 02 00 00 44 89 c0 8b 7b 58 0d 00 00 00 70 4c 8b 53 48
[ 335.127858] RIP [<ffffffffc03fa5a0>] mvs_task_prep_ata+0x80/0x3a0 [mvsas]
[ 335.128415] RSP <ffff8800d60237c8>
============
With 4 drives (brown#4) attached to expander port 9C
============
scsi 5:0:4:0: Failed to get diagnostic page 0x8000002
scsi 5:0:4:0: Failed to bind enclosure -19
# lsscsi
[4:0:0:0] disk ATA OCZ-VERTEX 1.3 /dev/sda
[5:0:4:0] enclosu HP HP SAS EXP Card 2.08 -
============
With 4 drives (brown#4) attached to expander port 9C [a second time]
============
Kernel crash (text from screen OCR):
[ 35.957789] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 35.957806] CR2: 00007f6c3faf8000 CR3: 0000000001c12000 CR4: 00000000000407f0
[ 35.957826] Stack:
[ 35.957833] ffff88021b314400 ffff880200000000 0000000000000282 eb3377d73948ca01
[ 35.957860] ffff88020ed037b0 0000000000000046 ffff88020ec01e38 ffffffff811b06ae
[ 35.957885] ffff88021b314400 ffff88020d66ddc0 ffff88020ed037b0 ffff88020ed00000
[ 35.957912] Call Trace:
[ 35.957924] [<ffffffff811b06ae>] ? dma_pool_alloc+0xce/0x100
[ 35.957944] [<ffffffffc05d14ab>] mvs_task_prep+0x58b/0x620 [mvsas]
[ 35.957965] [<ffffffff810a29e6>] ? ttwu_do_activate.constprop.111+0x66/0x70
[ 35.957987] [<ffffffffc05d15a8>] mvs_task_exec.isra.14+0x68/0xf0 [mvsas]
[ 35.958008] [<ffffffffc05d2149>] mvs_queue_command+0x39/0x40 [mvsas]
[ 35.958030] [<ffffffffc05b98ab>] sas_ata_qc_issue+0x28b/0x2d0 [libsas]
[ 35.958052] [<ffffffff8153102f>] ata_qc_issue+0x18f/0x2d0
[ 35.958069] [<ffffffff81531468>] ata_exec_internal_sg+0x2f8/0x5d0
[ 35.958089] [<ffffffff815317b2>] ata_exec_internal+0x72/0xb0
[ 35.958107] [<ffffffff81531faa>] ata_do_dev_read_id+0x2a/0x30
[ 35.958126] [<ffffffffc05b94b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 35.958148] [<ffffffff815321f5>] ata_dev_read_id+0x245/0x460
[ 35.958166] [<ffffffff8153e99c>] ? ata_eh_reset+0x24c/0xe20
[ 35.958185] [<ffffffff8153d8f8>] ata_eh_revalidate_and_attach+0x198/0x3a0
[ 35.958205] [<ffffffff8153fd69>] ata_eh_recover+0x599/0x7e0
[ 35.958223] [<ffffffff81534200>] ? sata_print_link_status+0xc0/0xc0
[ 35.958243] [<ffffffffc05b94b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 35.958265] [<ffffffff81534750>] ? sata_std_hardreset+0x50/0x50
[ 35.958283] [<ffffffffc05b94b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 35.958305] [<ffffffff81534750>] ? sata_std_hardreset+0x50/0x50
[ 35.958324] [<ffffffffc05b94b0>] ? sas_ata_internal_abort+0x120/0x120 [libsas]
[ 35.958346] [<ffffffff81540742>] ata_do_eh+0x52/0xc0
[ 35.958971] [<ffffffff81534200>] ? sata_print_link_status+0xc0/0xc0
[ 35.959600] [<ffffffff815407f7>] ata_std_error_handler+0x47/0x80
[ 35.960231] [<ffffffff8153b8f8>] ? ata_eh_handle_port_resume+0x38/0x160
[ 35.960861] [<ffffffff8154041b>] ata_scsi_port_error_handler+0x39b/0x5a0
[ 35.961475] [<ffffffffc05b92c5>] async_sas_ata_eh+0x55/0x90 [libsas]
[ 35.962071] [<ffffffff8109a89b>] async_run_entry_fn+0x3b/0x140
[ 35.962652] [<ffffffff8108c6ff>] process_one_work+0x17f/0x4c0
[ 35.963218] [<ffffffff81776ba8>] ? maybe_create_worker+0xbb/0x1c5
[ 35.963775] [<ffffffff8108d46b>] worker_thread+0x11b/0x3f0
[ 35.964319] [<ffffffff8108d350>] ? create_and_start_worker+0x80/0x80
[ 35.964858] [<ffffffff81094479>] kthread+0xc9/0xe0
[ 35.965385] [<ffffffff810943b0>] ? flush_kthread_worker+0xb0/0xb0
[ 35.965904] [<ffffffff817910fc>] ret_from_fork+0x7c/0xb0
[ 35.966418] [<ffffffff810943b0>] ? flush_kthread_worker+0xb0/0xb0
[ 35.966932] Code: 00 00 48 8b 0c c8 0f 84 a7 02 00 00 44 89 c0 41 b9 00 10 00 00 48 8d 34 80 48 8d 04 70 48 8d b4 c3 b8 55 02 00 8b 43 58 89 46 1c <8b> 89 54 02 00 00 44 89 c0 8b 7b 58 0d 00 00 00 70 4c 8b 53 48
[ 35.968100] RIP [<ffffffffc05d05a0>] mvs_task_prep_ata+0x80/0x3a0 [mvsas]
[ 35.968656] RSP <ffff8800d4b077c8>
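
To compare the call traces above without reading them line by line, the frames
can be pulled out with a regular expression. A minimal Python sketch, assuming
the "[<address>] symbol+0xoffset/0xsize [module]" layout seen in these dumps;
parse_frames() and FRAME_RE are hypothetical names, and the sample lines are
copied from the trace above. Frames printed with a leading "?" are treated
here as entries the kernel could not verify.

```python
import re

# Matches e.g. "[<ffffffffc05d2149>] mvs_queue_command+0x39/0x40 [mvsas]"
# and the "? symbol" form used for frames the kernel is unsure about.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

SAMPLE = """\
[ 35.957924] [<ffffffff811b06ae>] ? dma_pool_alloc+0xce/0x100
[ 35.958008] [<ffffffffc05d2149>] mvs_queue_command+0x39/0x40 [mvsas]
[ 35.958030] [<ffffffffc05b98ab>] sas_ata_qc_issue+0x28b/0x2d0 [libsas]
[ 35.958069] [<ffffffff81531468>] ata_exec_internal_sg+0x2f8/0x5d0
"""


def parse_frames(text):
    """Return one dict per stack frame found in an oops call trace."""
    frames = []
    for match in FRAME_RE.finditer(text):
        frames.append({
            "symbol": match.group("symbol"),
            "offset": int(match.group("offset"), 16),
            "module": match.group("module"),  # None for core-kernel symbols
            "reliable": match.group("unreliable") is None,
        })
    return frames


if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        if frame["reliable"]:
            print(frame["module"] or "kernel", frame["symbol"], hex(frame["offset"]))
```

Comparing only the reliable frames makes it easier to see that the 7C, 8C and
second 9C crashes share the same path through libsas and mvsas, while the 6C
crash comes in through an ioctl.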
============
# lspci -nn -s 01: -vv
01:00.0 RAID bus controller [0104]: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller [1b4b:9485] (rev 03)
Subsystem: Marvell Technology Group Ltd. Device [1b4b:9480]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 16
Region 0: Memory at f0540000 (64-bit, non-prefetchable) [size=128K]
Region 2: Memory at f0500000 (64-bit, non-prefetchable) [size=256K]
Expansion ROM at f0560000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [140 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Kernel driver in use: mvsas
============

A Highpoint Rocket 2720SGL controller (also a Marvell 9485 based chip as far as
I know) ran with the identical SAS expander, disk drives and power supply
without errors/crashes using the Highpoint 4.0.0.1528N driver (mv94xx.ko) on
Debian 6.0.6/kernel 2.6.32-46.

--
You are receiving this mail because:
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

b***@bugzilla.kernel.org

2014-08-07 20:29:52 UTC

b***@bugzilla.kernel.org

2014-08-08 08:19:37 UTC

https://bugzilla.kernel.org/show_bug.cgi?id=81861

--- Comment #1 from linux-***@crashplan.pro ---
After setting up netconsole using <https://wiki.ubuntu.com/Kernel/Netconsole>
and enabling the kernel boot parameters debug and ignore_loglevel, more kernel
crash log lines are available:
============
[ 77.094783] mvsas 0000:01:00.0: mvsas: driver version 0.8.16
[ 77.095405] mvsas 0000:01:00.0: mvsas: PCI-E x8, Bandwidth Usage: 5.0 Gbps
[ 83.881049] scsi5 : mvsas
[ 83.883157] sas: phy-5:4 added to port-5:0, phy_mask:0x1 (50014380182cf0e6)
[ 83.883190] /home/apw/COD/linux/drivers/scsi/mvsas/mv_sas.c 1218:set wide
port phy map 1
[ 83.893532] sas: phy1 matched wide port0
[ 83.893558] sas: phy-5:5 added to port-5:0, phy_mask:0x3 (50014380182cf0e6)
[ 83.893580] /home/apw/COD/linux/drivers/scsi/mvsas/mv_sas.c 1218:set wide
port phy map 3
[ 83.913447] sas: phy2 matched wide port0
[ 83.913468] sas: phy-5:6 added to port-5:0, phy_mask:0x7 (50014380182cf0e6)
[ 83.913491] /home/apw/COD/linux/drivers/scsi/mvsas/mv_sas.c 1218:set wide
port phy map 7
[ 83.943257] sas: phy3 matched wide port0
[ 83.943274] sas: phy-5:7 added to port-5:0, phy_mask:0xf (50014380182cf0e6)
[ 83.943294] /home/apw/COD/linux/drivers/scsi/mvsas/mv_sas.c 1218:set wide
port phy map f
[ 83.982994] sas: DOING DISCOVERY on port 0, pid:6
[ 83.984660] sas: ex 50014380182cf0e6 phy00:D:0 attached: 0000000000000000
(no device)
[ 83.985256] sas: ex 50014380182cf0e6 phy01:D:0 attached: 0000000000000000
(no device)
[ 83.985851] sas: ex 50014380182cf0e6 phy02:D:0 attached: 0000000000000000
(no device)
[ 83.986372] sas: ex 50014380182cf0e6 phy03:D:0 attached: 0000000000000000
(no device)
[ 83.986933] sas: ex 50014380182cf0e6 phy04:D:0 attached: 0000000000000000
(no device)
[ 83.987488] sas: ex 50014380182cf0e6 phy05:D:0 attached: 0000000000000000
(no device)
[ 83.988086] sas: ex 50014380182cf0e6 phy06:D:0 attached: 0000000000000000
(no device)
[ 83.988603] sas: ex 50014380182cf0e6 phy07:D:0 attached: 0000000000000000
(no device)
[ 83.989197] sas: ex 50014380182cf0e6 phy08:D:0 attached: 0000000000000000
(no device)
[ 83.989766] sas: ex 50014380182cf0e6 phy09:D:0 attached: 0000000000000000
(no device)
[ 83.990300] sas: ex 50014380182cf0e6 phy10:D:0 attached: 0000000000000000
(no device)
[ 83.990872] sas: ex 50014380182cf0e6 phy11:D:0 attached: 0000000000000000
(no device)
[ 83.991401] sas: ex 50014380182cf0e6 phy12:D:0 attached: 0000000000000000
(no device)
[ 83.991978] sas: ex 50014380182cf0e6 phy13:D:0 attached: 0000000000000000
(no device)
[ 83.992515] sas: ex 50014380182cf0e6 phy14:D:0 attached: 0000000000000000
(no device)
[ 83.993098] sas: ex 50014380182cf0e6 phy15:D:0 attached: 0000000000000000
(no device)
[ 83.993625] sas: ex 50014380182cf0e6 phy16:D:0 attached: 0000000000000000
(no device)
[ 83.994213] sas: ex 50014380182cf0e6 phy17:D:0 attached: 0000000000000000
(no device)
[ 83.994785] sas: ex 50014380182cf0e6 phy18:D:0 attached: 0000000000000000
(no device)
[ 83.995316] sas: ex 50014380182cf0e6 phy19:D:0 attached: 0000000000000000
(no device)
[ 83.995890] sas: ex 50014380182cf0e6 phy20:D:0 attached: 0000000000000000
(no device)
[ 83.996432] sas: ex 50014380182cf0e6 phy21:D:0 attached: 0000000000000000
(no device)
[ 83.996998] sas: ex 50014380182cf0e6 phy22:D:0 attached: 0000000000000000
(no device)
[ 83.997540] sas: ex 50014380182cf0e6 phy23:D:0 attached: 0000000000000000
(no device)
[ 83.998189] sas: ex 50014380182cf0e6 phy24:U:A attached: 5005043011ab0000
(host)
[ 83.998812] sas: ex 50014380182cf0e6 phy25:U:A attached: 5005043011ab0000
(host)
[ 83.999386] sas: ex 50014380182cf0e6 phy26:U:A attached: 5005043011ab0000
(host)
[ 84.000012] sas: ex 50014380182cf0e6 phy27:U:A attached: 5005043011ab0000
(host)
[ 84.000575] sas: ex 50014380182cf0e6 phy28:S:0 attached: 0000000000000000
(no device)
[ 84.001581] sas: ex 50014380182cf0e6 phy29:S:0 attached: 0000000000000000
(no device)
[ 84.002561] sas: ex 50014380182cf0e6 phy30:S:0 attached: 0000000000000000
(no device)
[ 84.003550] sas: ex 50014380182cf0e6 phy31:S:0 attached: 0000000000000000
(no device)
[ 84.004573] sas: ex 50014380182cf0e6 phy32:S:9 attached: 50014380182cf0e0
(stp)
[ 84.005580] sas: ex 50014380182cf0e6 phy33:S:9 attached: 50014380182cf0e1
(stp)
[ 84.006543] sas: ex 50014380182cf0e6 phy34:S:9 attached: 50014380182cf0e2
(stp)
[ 84.007442] sas: ex 50014380182cf0e6 phy35:S:9 attached: 50014380182cf0e3
(stp)
[ 84.008136] sas: ex 50014380182cf0e6 phy36:D:A attached: 50014380182cf0e5
(host+target)
[ 84.009969] sas: DONE DISCOVERY on port 0, pid:6, result:0
[ 84.010274] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[ 84.010569] sas: ata6: end_device-5:0:32: dev error handler
[ 84.010873] sas: ata7: end_device-5:0:33: dev error handler
[ 84.011160] sas: ata8: end_device-5:0:34: dev error handler
[ 84.011424] sas: ata9: end_device-5:0:35: dev error handler
[ 84.164663] general protection fault: 0000 [#1] SMP
[ 84.164897] Modules linked in: mvsas libsas scsi_transport_sas ppdev
intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel i915 kvm
crct10dif_pclmul drm_kms_helper crc32_pclmul drm ghash_clmulni_intel cryptd
i2c_algo_bit lpc_ich mei_me microcode mei serio_raw soc_button_array video
parport_pc mac_hid netconsole configfs lp parport psmouse ahci libahci r8169
mii
[ 84.165752] CPU: 0 PID: 1008 Comm: kworker/u4:5 Not tainted
3.16.0-031600rc6-generic #201407210035
[ 84.166027] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H81
Pro BTC, BIOS P1.50 02/14/2014
[ 84.166325] Workqueue: events_unbound async_run_entry_fn
[ 84.166630] task: ffff880036d5ef60 ti: ffff8800d4b34000 task.ti:
ffff8800d4b34000
[ 84.166953] RIP: 0010:[<ffffffffc028e5a0>] [<ffffffffc028e5a0>]
mvs_task_prep_ata+0x80/0x3a0 [mvsas]
[ 84.167364] RSP: 0018:ffff8800d4b377c8 EFLAGS: 00010097
[ 84.167714] RAX: 000000000000002c RBX: ffff88020f200000 RCX:
dead000000200200
[ 84.168078] RDX: ffff88020f2037b0 RSI: ffff88020f2255b8 RDI:
ffff88020f200000
[ 84.168451] RBP: ffff8800d4b37838 R08: 0000000000000000 R09:
0000000000001000
[ 84.168834] R10: 0000000000000000 R11: ffff88020f2255b0 R12:
ffff88020fbab640
[ 84.169228] R13: ffff8800d4b37898 R14: ffff88021b4a0000 R15:
ffff880036f19a00
[ 84.169628] FS: 0000000000000000(0000) GS:ffff88021b200000(0000)
knlGS:0000000000000000
[ 84.170044] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 84.170467] CR2: 00007f0031fbf000 CR3: 0000000001c12000 CR4:
00000000000407f0
[ 84.170907] Stack:
[ 84.171345] ffff88021b314400 ffff880200000000 0000000000000282
dead000000200200
[ 84.171818] ffff88020f2037b0 0000000000000046 ffff88020cd81e38
ffffffff811b06ae
[ 84.172300] ffff88021b314400 ffff88020fbab640 ffff88020f2037b0
ffff88020f200000
[ 84.172791] Call Trace:
[ 84.173280] [<ffffffff811b06ae>] ? dma_pool_alloc+0xce/0x100
[ 84.173785] [<ffffffffc028f4ab>] mvs_task_prep+0x58b/0x620 [mvsas]
[ 84.174298] [<ffffffff810a29e6>] ? ttwu_do_activate.constprop.111+0x66/0x70
[ 84.174823] [<ffffffffc028f5a8>] mvs_task_exec.isra.14+0x68/0xf0 [mvsas]
[ 84.175358] [<ffffffffc0290149>] mvs_queue_command+0x39/0x40 [mvsas]
[ 84.175901] [<ffffffffc02778ab>] sas_ata_qc_issue+0x28b/0x2d0 [libsas]
[ 84.176446] [<ffffffff8153102f>] ata_qc_issue+0x18f/0x2d0
[ 84.176997] [<ffffffff81531468>] ata_exec_internal_sg+0x2f8/0x5d0
[ 84.177554] [<ffffffff815317b2>] ata_exec_internal+0x72/0xb0
[ 84.178113] [<ffffffff81531faa>] ata_do_dev_read_id+0x2a/0x30
[ 84.178673] [<ffffffffc02774b0>] ? sas_ata_internal_abort+0x120/0x120
[libsas]
[ 84.179245] [<ffffffff815321f5>] ata_dev_read_id+0x245/0x460
[ 84.179825] [<ffffffff8153e99c>] ? ata_eh_reset+0x24c/0xe20
[ 84.180409] [<ffffffff8153d8f8>] ata_eh_revalidate_and_attach+0x198/0x3a0
[ 84.181002] [<ffffffff810cd4d1>] ? vprintk_emit+0x1b1/0x560
[ 84.181598] [<ffffffff8153fd69>] ata_eh_recover+0x599/0x7e0
[ 84.182200] [<ffffffff81534200>] ? sata_print_link_status+0xc0/0xc0
[ 84.182809] [<ffffffffc02774b0>] ? sas_ata_internal_abort+0x120/0x120
[libsas]
[ 84.183427] [<ffffffff81534750>] ? sata_std_hardreset+0x50/0x50
[ 84.184037] [<ffffffffc02774b0>] ? sas_ata_internal_abort+0x120/0x120
[libsas]
[ 84.184710] [<ffffffff81534750>] ? sata_std_hardreset+0x50/0x50
[ 84.185323] [<ffffffffc02774b0>] ? sas_ata_internal_abort+0x120/0x120
[libsas]
[ 84.185945] [<ffffffff81540742>] ata_do_eh+0x52/0xc0
[ 84.186574] [<ffffffff81534200>] ? sata_print_link_status+0xc0/0xc0
[ 84.187213] [<ffffffff815407f7>] ata_std_error_handler+0x47/0x80
[ 84.187850] [<ffffffff8153b8f8>] ? ata_eh_handle_port_resume+0x38/0x160
[ 84.188473] [<ffffffff8154041b>] ata_scsi_port_error_handler+0x39b/0x5a0
[ 84.189081] [<ffffffffc02772c5>] async_sas_ata_eh+0x55/0x90 [libsas]
[ 84.189673] [<ffffffff8109a89b>] async_run_entry_fn+0x3b/0x140
[ 84.190248] [<ffffffff8108c6ff>] process_one_work+0x17f/0x4c0
[ 84.190812] [<ffffffff81776ba8>] ? maybe_create_worker+0xbb/0x1c5
[ 84.191364] [<ffffffff8108d46b>] worker_thread+0x11b/0x3f0
[ 84.191910] [<ffffffff8108d350>] ? create_and_start_worker+0x80/0x80
[ 84.192446] [<ffffffff81094479>] kthread+0xc9/0xe0
[ 84.192971] [<ffffffff810943b0>] ? flush_kthread_worker+0xb0/0xb0
[ 84.193495] [<ffffffff817910fc>] ret_from_fork+0x7c/0xb0
[ 84.194015] [<ffffffff810943b0>] ? flush_kthread_worker+0xb0/0xb0
[ 84.194534] Code: 00 00 48 8b 0c c8 0f 84 a7 02 00 00 44 89 c0 41 b9 00 10
00 00 48 8d 34 80 48 8d 04 70 48 8d b4 c3 b8 55 02 00 8b 43 58 89 46 1c <8b> 89
54 02 00 00 44 89 c0 8b 7b 58 0d 00 00 00 70 4c 8b 53 48
[ 84.195858] RIP [<ffffffffc028e5a0>] mvs_task_prep_ata+0x80/0x3a0 [mvsas]
[ 84.196412] RSP <ffff8800d4b377c8>
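
A sketch of how the faulting instruction in the trace above can be read,
assuming the bytes starting at the <8b> marker in the Code: line form a plain
MOV load with a 32-bit displacement and no prefix. decode_mov_load() and
is_canonical() are hypothetical helper names, and the decoder handles only
this one encoding (opcode 8b, ModRM mod=10, no SIB byte); it is not a general
disassembler. The RCX value is copied from the register dump above.

```python
import struct

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Bytes at RIP, taken from the Code: line (starting at the <8b> marker).
FAULTING = bytes.fromhex("8b8954020000")
# RCX as shown in the register dump of the same oops.
RCX = 0xDEAD000000200200


def decode_mov_load(code):
    """Decode `8b /r` (MOV r32, r/m32) with mod=10 (disp32) and no SIB byte.

    Returns (destination register, base register, displacement).
    """
    if len(code) < 6 or code[0] != 0x8B:
        raise ValueError("not the MOV encoding this sketch understands")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:
        raise ValueError("only mod=10 without SIB is handled")
    (disp,) = struct.unpack_from("<i", code, 2)  # little-endian signed disp32
    return REG32[reg], REG64[rm], disp


def is_canonical(addr):
    """True if bits 63..47 are all equal (48-bit x86-64 addressing)."""
    top = addr >> 47
    return top == 0 or top == 0x1FFFF


if __name__ == "__main__":
    dest, base, disp = decode_mov_load(FAULTING)
    target = (RCX + disp) & 0xFFFFFFFFFFFFFFFF
    print(f"mov {hex(disp)}(%{base}),%{dest}")
    print(f"effective address {target:#x}, canonical: {is_canonical(target)}")
```

Under those assumptions the instruction loads from RCX+0x254. RCX holds
dead000000200200, which appears to match the kernel's list poison value
(LIST_POISON2) on x86-64, so the resulting address would be non-canonical,
which fits the general protection fault reported here. The 7C dump in the
original report shows CR2: 0000000000000254, which would fit the same load
with RCX equal to 0.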

b***@bugzilla.kernel.org

2014-08-08 08:24:01 UTC

b***@bugzilla.kernel.org

2014-08-08 08:34:10 UTC

https://bugzilla.kernel.org/show_bug.cgi?id=81861

--- Comment #2 from linux-***@crashplan.pro ---
Created attachment 145681
--> https://bugzilla.kernel.org/attachment.cgi?id=145681&action=edit
Dmesg output from boot
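
For a dmesg capture like this one, the libsas discovery lines (as seen in
comment #1) can be summarised by attached device type. A minimal Python
sketch; attached_phys() and PHY_RE are hypothetical names, the group names
"routing" and "rate" are only labels chosen here for the two single-character
fields, and the sample lines are copied from comment #1, including the mail
line wrap before the parenthesised type.

```python
import re
from collections import defaultdict

# "\s+" before the parenthesis also tolerates the line wrap added by mail.
PHY_RE = re.compile(
    r"sas: ex (?P<expander>[0-9a-f]{16}) "
    r"phy(?P<phy>\d+):(?P<routing>\w):(?P<rate>\w) "
    r"attached: (?P<attached>[0-9a-f]{16})\s+\((?P<kind>[^)]+)\)"
)

SAMPLE = """\
[ 83.997540] sas: ex 50014380182cf0e6 phy23:D:0 attached: 0000000000000000
(no device)
[ 83.998189] sas: ex 50014380182cf0e6 phy24:U:A attached: 5005043011ab0000
(host)
[ 84.000012] sas: ex 50014380182cf0e6 phy27:U:A attached: 5005043011ab0000
(host)
[ 84.003550] sas: ex 50014380182cf0e6 phy31:S:0 attached: 0000000000000000
(no device)
[ 84.004573] sas: ex 50014380182cf0e6 phy32:S:9 attached: 50014380182cf0e0
(stp)
[ 84.007442] sas: ex 50014380182cf0e6 phy35:S:9 attached: 50014380182cf0e3
(stp)
[ 84.008136] sas: ex 50014380182cf0e6 phy36:D:A attached: 50014380182cf0e5
(host+target)
"""


def attached_phys(log):
    """Group expander phy numbers by the attached-device type in the log."""
    by_kind = defaultdict(list)
    for match in PHY_RE.finditer(log):
        by_kind[match.group("kind")].append(int(match.group("phy")))
    return dict(by_kind)


if __name__ == "__main__":
    for kind, phys in attached_phys(SAMPLE).items():
        print(f"{kind}: phys {phys}")
```

Run against the full log, this shows at a glance which expander phys report
the host, the SATA (stp) drives, or nothing at all for a given cable position.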

b***@bugzilla.kernel.org

2014-08-12 20:09:15 UTC

Permalink

https://bugzilla.kernel.org/show_bug.cgi?id=81861

--- Comment #3 from linux-***@crashplan.pro ---
Because Ubuntu doesn't provide debug symbols for their mainline kernel builds
<http://comments.gmane.org/gmane.linux.ubuntu.devel.kernel.general/40661> I am
reverting back to their kernel version 3.13.0-24.46

That results in a kernel crash on port 8C:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000255

Full output:
[ 25.212661] mvsas 0000:01:00.0: mvsas: driver version 0.8.16
[ 25.212703] mvsas 0000:01:00.0: enabling device (0000 -> 0002)
[ 25.213249] mvsas 0000:01:00.0: mvsas: PCI-E x8, Bandwidth Usage: 5.0 Gbps
[ 31.994771] scsi5 : mvsas
[ 31.995530] sas: phy-5:0 added to port-5:0, phy_mask:0x1 (50014380182cf0e6)
[ 31.995564] /build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c 1218:set
wide port phy map 1
[ 32.005672] sas: phy1 matched wide port0
[ 32.005695] sas: phy-5:1 added to port-5:0, phy_mask:0x3 (50014380182cf0e6)
[ 32.005720] /build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c 1218:set
wide port phy map 3
[ 32.025591] sas: phy2 matched wide port0
[ 32.025611] sas: phy-5:2 added to port-5:0, phy_mask:0x7 (50014380182cf0e6)
[ 32.025635] /build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c 1218:set
wide port phy map 7
[ 32.055410] sas: phy3 matched wide port0
[ 32.055427] sas: phy-5:3 added to port-5:0, phy_mask:0xf (50014380182cf0e6)
[ 32.055452] /build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c 1218:set
wide port phy map f
[ 32.095144] sas: DOING DISCOVERY on port 0, pid:127
[ 32.096843] sas: ex 50014380182cf0e6 phy00:D:0 attached: 0000000000000000
(no device)
[ 32.097408] sas: ex 50014380182cf0e6 phy01:D:0 attached: 0000000000000000
(no device)
[ 32.097917] sas: ex 50014380182cf0e6 phy02:D:0 attached: 0000000000000000
(no device)
[ 32.098503] sas: ex 50014380182cf0e6 phy03:D:0 attached: 0000000000000000
(no device)
[ 32.099044] sas: ex 50014380182cf0e6 phy04:D:0 attached: 0000000000000000
(no device)
[ 32.099628] sas: ex 50014380182cf0e6 phy05:D:0 attached: 0000000000000000
(no device)
[ 32.100205] sas: ex 50014380182cf0e6 phy06:D:0 attached: 0000000000000000
(no device)
[ 32.100739] sas: ex 50014380182cf0e6 phy07:D:0 attached: 0000000000000000
(no device)
[ 32.101310] sas: ex 50014380182cf0e6 phy08:D:0 attached: 0000000000000000
(no device)
[ 32.101840] sas: ex 50014380182cf0e6 phy09:D:0 attached: 0000000000000000
(no device)
[ 32.102412] sas: ex 50014380182cf0e6 phy10:D:0 attached: 0000000000000000
(no device)
[ 32.102959] sas: ex 50014380182cf0e6 phy11:D:0 attached: 0000000000000000
(no device)
[ 32.103545] sas: ex 50014380182cf0e6 phy12:D:0 attached: 0000000000000000
(no device)
[ 32.104128] sas: ex 50014380182cf0e6 phy13:D:0 attached: 0000000000000000
(no device)
[ 32.104661] sas: ex 50014380182cf0e6 phy14:D:0 attached: 0000000000000000
(no device)
[ 32.105273] sas: ex 50014380182cf0e6 phy15:D:0 attached: 0000000000000000
(no device)
[ 32.105781] sas: ex 50014380182cf0e6 phy16:D:0 attached: 0000000000000000
(no device)
[ 32.106385] sas: ex 50014380182cf0e6 phy17:D:0 attached: 0000000000000000
(no device)
[ 32.106904] sas: ex 50014380182cf0e6 phy18:D:0 attached: 0000000000000000
(no device)
[ 32.107486] sas: ex 50014380182cf0e6 phy19:D:0 attached: 0000000000000000
(no device)
[ 32.108020] sas: ex 50014380182cf0e6 phy20:D:0 attached: 0000000000000000
(no device)
[ 32.108605] sas: ex 50014380182cf0e6 phy21:D:0 attached: 0000000000000000
(no device)
[ 32.109183] sas: ex 50014380182cf0e6 phy22:D:0 attached: 0000000000000000
(no device)
[ 32.109714] sas: ex 50014380182cf0e6 phy23:D:0 attached: 0000000000000000
(no device)
[ 32.110357] sas: ex 50014380182cf0e6 phy24:U:A attached: 5005043011ab0000
(host)
[ 32.110929] sas: ex 50014380182cf0e6 phy25:U:A attached: 5005043011ab0000
(host)
[ 32.111558] sas: ex 50014380182cf0e6 phy26:U:A attached: 5005043011ab0000
(host)
[ 32.112181] sas: ex 50014380182cf0e6 phy27:U:A attached: 5005043011ab0000
(host)
[ 32.112774] sas: ex 50014380182cf0e6 phy28:S:9 attached: 50014380182cf0dc
(stp)
[ 32.113366] sas: ex 50014380182cf0e6 phy29:S:9 attached: 50014380182cf0dd
(stp)
[ 32.113934] sas: ex 50014380182cf0e6 phy30:S:9 attached: 50014380182cf0de
(stp)
[ 32.114557] sas: ex 50014380182cf0e6 phy31:S:9 attached: 50014380182cf0df
(stp)
[ 32.115138] sas: ex 50014380182cf0e6 phy32:S:0 attached: 0000000000000000
(no device)
[ 32.115654] sas: ex 50014380182cf0e6 phy33:S:0 attached: 0000000000000000
(no device)
[ 32.116198] sas: ex 50014380182cf0e6 phy34:S:0 attached: 0000000000000000
(no device)
[ 32.116711] sas: ex 50014380182cf0e6 phy35:S:0 attached: 0000000000000000
(no device)
[ 32.117003] sas: ex 50014380182cf0e6 phy36:D:A attached: 50014380182cf0e5
(host+target)
[ 32.118398] sas: DONE DISCOVERY on port 0, pid:127, result:0
[ 32.118435] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[ 32.118465] sas: ata6: end_device-5:0:28: dev error handler
[ 32.119140] sas: ata7: end_device-5:0:29: dev error handler
[ 32.119333] sas: ata8: end_device-5:0:30: dev error handler
[ 32.119368] sas: ata9: end_device-5:0:31: dev error handler
[ 32.271218] BUG: unable to handle kernel NULL pointer dereference at
0000000000000255
[ 32.271791] IP: [<ffffffffa02d381e>] mvs_task_prep+0x72e/0xd50 [mvsas]
[ 32.272365] PGD 0
[ 32.272928] Oops: 0000 [#1] SMP
[ 32.273480] Modules linked in: mvsas libsas scsi_transport_sas hid_generic
usbhid hid x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd i915 drm_kms_helper
serio_raw lpc_ich mei_me mei drm i2c_algo_bit netconsole configfs lp parport
video mac_hid psmouse ahci libahci r8169 mii
[ 32.275388] CPU: 0 PID: 54 Comm: kworker/u4:1 Not tainted 3.13.0-24-generic
#47-Ubuntu
[ 32.276028] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H81
Pro BTC, BIOS P1.80 07/21/2014
[ 32.276745] Workqueue: events_unbound async_run_entry_fn
[ 32.277389] task: ffff88020fe6afe0 ti: ffff8802136aa000 task.ti:
ffff8802136aa000
[ 32.278032] RIP: 0010:[<ffffffffa02d381e>] [<ffffffffa02d381e>]
mvs_task_prep+0x72e/0xd50 [mvsas]
[ 32.278691] RSP: 0018:ffff8802136ab8c0 EFLAGS: 00010097
[ 32.279337] RAX: 000000000000002c RBX: 0000000000000001 RCX:
0000000000000000
[ 32.279980] RDX: 0000000000000000 RSI: ffff8800d8c255b8 RDI:
ffff8800d8c00000
[ 32.280619] RBP: ffff8802136ab958 R08: ffff8800d8c03618 R09:
ffff8800363a0000
[ 32.281246] R10: ffff880212977600 R11: 0000000000000000 R12:
ffff8800d8c00000
[ 32.281861] R13: 0000000000000000 R14: ffff8800d8c03618 R15:
ffff88020f8dedc0
[ 32.282474] FS: 0000000000000000(0000) GS:ffff88021f200000(0000)
knlGS:0000000000000000
[ 32.283082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 32.283679] CR2: 0000000000000255 CR3: 0000000002c0e000 CR4:
00000000000407f0
[ 32.284278] Stack:
[ 32.284880] ffff88020fe6afe0 ffff880212977200 ffff8802136ab8e0
ffffffff81719ee9
[ 32.285520] ffff880212977600 ffff8800363a0000 ffff8800d8c03618
ffff8800d8c255b0
[ 32.286167] ffff8800d8c02678 0000000000000000 00000001d8c00008
ffff8800d8c255b8
[ 32.286821] Call Trace:
[ 32.287473] [<ffffffff81719ee9>] ? schedule+0x29/0x70
[ 32.288144] [<ffffffffa02d3e9d>] mvs_task_exec.isra.13+0x5d/0xe0 [mvsas]
[ 32.288832] [<ffffffffa02d49dc>] mvs_queue_command+0x30c/0x320 [mvsas]
[ 32.289530] [<ffffffff811a013f>] ? kmem_cache_free+0xef/0x120
[ 32.290232] [<ffffffff8119f692>] ? kmem_cache_alloc+0x132/0x140
[ 32.290942] [<ffffffffa028601d>] ? sas_alloc_task+0x1d/0x40 [libsas]
[ 32.291662] [<ffffffffa028fcab>] sas_ata_qc_issue+0x24b/0x290 [libsas]
[ 32.292392] [<ffffffff814f7762>] ata_qc_issue+0x172/0x380
[ 32.293128] [<ffffffff814f7c23>] ata_exec_internal_sg+0x2b3/0x570
[ 32.293875] [<ffffffff814f7f3a>] ata_exec_internal+0x5a/0xa0
[ 32.294624] [<ffffffff814f8334>] ata_dev_read_id+0x274/0x550
[ 32.295380] [<ffffffffa028f8f0>] ? sas_ata_printk+0x80/0x80 [libsas]
[ 32.296148] [<ffffffff81505bab>] ata_eh_recover+0x74b/0x1310
[ 32.296923] [<ffffffff810bcfe8>] ? console_unlock+0x208/0x400
[ 32.297707] [<ffffffff814facd0>] ? ata_phys_link_online+0x30/0x30
[ 32.298503] [<ffffffffa028f8f0>] ? sas_ata_printk+0x80/0x80 [libsas]
[ 32.299367] [<ffffffff814fae50>] ? ata_phys_link_offline+0x30/0x30
[ 32.300179] [<ffffffffa028f8f0>] ? sas_ata_printk+0x80/0x80 [libsas]
[ 32.301001] [<ffffffff814fae50>] ? ata_phys_link_offline+0x30/0x30
[ 32.301826] [<ffffffffa028f8f0>] ? sas_ata_printk+0x80/0x80 [libsas]
[ 32.302661] [<ffffffff81507299>] ata_do_eh+0x49/0xc0
[ 32.303503] [<ffffffff814facd0>] ? ata_phys_link_online+0x30/0x30
[ 32.304357] [<ffffffff8150734e>] ata_std_error_handler+0x3e/0x80
[ 32.305215] [<ffffffff81506dba>] ata_scsi_port_error_handler+0x56a/0x940
[ 32.306086] [<ffffffffa02900aa>] async_sas_ata_eh+0x4a/0x80 [libsas]
[ 32.306963] [<ffffffff81091517>] async_run_entry_fn+0x37/0x130
[ 32.307849] [<ffffffff810838a2>] process_one_work+0x182/0x450
[ 32.308735] [<ffffffff81084641>] worker_thread+0x121/0x410
[ 32.309629] [<ffffffff81084520>] ? rescuer_thread+0x3e0/0x3e0
[ 32.310530] [<ffffffff8108b312>] kthread+0xd2/0xf0
[ 32.311437] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
[ 32.312351] [<ffffffff817263fc>] ret_from_fork+0x7c/0xb0
[ 32.313255] [<ffffffff8108b240>] ? kthread_create_on_node+0x1d0/0x1d0
[ 32.314160] Code: 63 92 a0 02 00 00 41 80 b8 84 00 00 00 7f 48 8b 80 58 01
00 00 48 8b 1c d0 0f 84 a0 05 00 00 41 8b 44 24 58 48 8b 75 c0 89 46 1c <8b> 8b
54 02 00 00 be 00 10 00 00 41 8b 54 24 58 49 8b 44 24 48
[ 32.316308] RIP [<ffffffffa02d381e>] mvs_task_prep+0x72e/0xd50 [mvsas]
[ 32.317292] RSP <ffff8802136ab8c0>
[ 32.318278] CR2: 0000000000000255
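
As a cross-check, the bytes that follow the <8b> marker in the "Code:" line above can be decoded by hand. The block below is only a minimal sketch, not a general x86 decoder: it handles the single "8b /r" form with a 32-bit displacement and no SIB byte or REX prefix, and the names decode_mov_disp32 and struct mov_disp32 are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mov_disp32 {
    unsigned reg;   /* destination register number (ModRM.reg); 1 = ecx */
    unsigned base;  /* base register number (ModRM.rm); 3 = rbx */
    uint32_t disp;  /* 32-bit displacement, stored little-endian */
};

/*
 * Decode "8b /r" (mov r/m32 -> r32) for one addressing form only:
 * ModRM.mod == 2 (disp32 follows) and ModRM.rm != 4 (no SIB byte).
 * Returns 0 on success, -1 if the bytes use any other form.
 */
int decode_mov_disp32(const uint8_t *insn, size_t len, struct mov_disp32 *out)
{
    assert(insn != NULL && out != NULL);
    if (len < 6 || insn[0] != 0x8b)
        return -1;

    uint8_t modrm = insn[1];
    if ((modrm >> 6) != 2 || (modrm & 7) == 4)
        return -1;

    out->reg = (modrm >> 3) & 7;
    out->base = modrm & 7;
    out->disp = (uint32_t)insn[2] | ((uint32_t)insn[3] << 8) |
                ((uint32_t)insn[4] << 16) | ((uint32_t)insn[5] << 24);
    return 0;
}
```

Fed the bytes 8b 8b 54 02 00 00, this yields register 1 (ecx), base 3 (rbx) and displacement 0x254, i.e. mov 0x254(%rbx),%ecx. With RBX = 1 in the register dump, the access lands at 0x255, which is the CR2 value.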

b***@bugzilla.kernel.org

2014-08-12 22:02:09 UTC

Permalink

https://bugzilla.kernel.org/show_bug.cgi?id=81861

--- Comment #4 from linux-***@crashplan.pro ---
Trying to debug mvs_task_prep with the help of the tutorial at
<http://www.opensourceforu.com/2011/01/understanding-a-kernel-oops/>.

# cat /sys/module/mvsas/sections/.init.text
0xffffffffa00c8000

# cd /lib/modules/3.13.0-24-generic/kernel/drivers/scsi/mvsas

# gdb mvsas.ko

(gdb) add-symbol-file
/usr/lib/debug/lib/modules/3.13.0-24-generic/kernel/drivers/scsi/mvsas/mvsas.ko
0xffffffffa00c8000

(gdb) disassemble mvs_task_prep

Hex to decimal: 0x72e = <+1838>

0xffffffffa00ca81e <+1838>: mov 0x254(%rbx),%ecx
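
The hex-to-decimal step can also be done mechanically. A small sketch, assuming the location string has the "symbol+0xOFF/0xSIZE" shape printed in the oops; parse_oops_location is a hypothetical helper name, not something from the kernel or gdb:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Split "symbol+0xOFF/0xSIZE" into its parts.
 * sym must have room for 64 bytes. Returns 0 on success, -1 otherwise.
 */
int parse_oops_location(const char *loc, char sym[64],
                        unsigned *off, unsigned *size)
{
    assert(loc != NULL && sym != NULL && off != NULL && size != NULL);
    /* %x accepts an optional "0x" prefix, so the offsets parse directly. */
    return sscanf(loc, "%63[^+]+%x/%x", sym, off, size) == 3 ? 0 : -1;
}
```

For "mvs_task_prep+0x72e/0xd50" this gives offset 1838, the <+1838> that gdb prints in its disassembly.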

Thanks to the trick from
<https://blogs.oracle.com/ksplice/entry/8_gdb_tricks_you_should>
(gdb) set substitute-path /build/buildd /home/user/src

(gdb) list *0xffffffffa00ca81e
0xffffffffa00ca81e is in mvs_task_prep
(/build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c:471).
Line number 466 out of range;
/build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c has 306 lines.

I guess my gdb version 7.7 has a line counting bug according to
<https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=730630>

A manual approach using
<http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-trusty.git;a=blob;f=drivers/scsi/mvsas/mv_sas.c;h=6c1f223a8e1d335fa7c86a374e470e666e848906;hb=HEAD>:

467 slot = &mvi->slot_info[tag];
468 slot->tx = mvi->tx_prod;
469 del_q = TXQ_MODE_I | tag |
470 (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
471 (MVS_PHY_ID << TXQ_PHY_SHIFT) |
472 (mvi_dev->taskfileset << TXQ_SRS_SHIFT);
473 mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);

This suggests that "(MVS_PHY_ID << TXQ_PHY_SHIFT)" is the offending code.

How should that be patched?

b***@bugzilla.kernel.org

2014-08-21 18:35:45 UTC

Permalink

https://bugzilla.kernel.org/show_bug.cgi?id=81861

Alan <***@lxorguk.ukuu.org.uk> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@lxorguk.ukuu.org.uk
Kernel Version|3.16.0-031600rc6.x86_64 |3.16.0

--- Comment #5 from Alan <***@lxorguk.ukuu.org.uk> ---
That's not a sensible resolution; it can't be faulting on that line.

b***@bugzilla.kernel.org

2014-08-22 12:13:42 UTC

Permalink

https://bugzilla.kernel.org/show_bug.cgi?id=81861

--- Comment #6 from linux-***@crashplan.pro ---
When connecting just a single 4-drive group to one of the good ports (for
example 2C) of the external PCIe expander card:
cold boot = doesn't detect any of the 4 PUIS drives
warm boot = does detect all 4 PUIS drives

When powering up using the warm boot method, neither smartctl nor sg_ses seems
to report any errors.

However, this cold boot issue might be separate from the kernel crash.
According to the debug messages, a "Set Features" (0xEF) command is sent first.
My guess is that it issues subcommand 0x07: spin up media.

The "Identify Device" (0xEC) command is only sent later.

If I read the Hitachi specification correctly, the spin-up (Set Features)
should be sent after "Identify Device". For this Hitachi HDS5C3020BLE630, word 2
of the Identify Device data (# sg_sat_identify -v /dev/sdb) is "738c" (hex),
which according to page 127 of the HGST specification means "Need Set Feature
for spin-up after power-up Identify Device is complete".

Is there a boot parameter (or a similar way) to load the mvsas driver without
sending the "Set Features" (0xEF) command?
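
For reference, the word 2 check described above can be sketched as follows. The four values are the "specific configuration" codes of IDENTIFY DEVICE word 2 as I understand the ATA command set; the enum and function names are invented for this illustration and are not the libata ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IDENTIFY DEVICE word 2 ("specific configuration") values. */
enum ata_specific_config {
    CFG_SPINUP_ID_INCOMPLETE    = 0x37c8, /* needs Set Features spin-up, data incomplete */
    CFG_SPINUP_ID_COMPLETE      = 0x738c, /* needs Set Features spin-up, data complete */
    CFG_NO_SPINUP_ID_INCOMPLETE = 0x8c73, /* no spin-up needed, data incomplete */
    CFG_NO_SPINUP_ID_COMPLETE   = 0xc837  /* no spin-up needed, data complete */
};

/* True when the drive waits for a Set Features spin-up after power-up. */
bool ata_needs_spinup(uint16_t word2)
{
    return word2 == CFG_SPINUP_ID_INCOMPLETE ||
           word2 == CFG_SPINUP_ID_COMPLETE;
}

/* True when the returned Identify Device data is already complete. */
bool ata_identify_complete(uint16_t word2)
{
    return word2 == CFG_SPINUP_ID_COMPLETE ||
           word2 == CFG_NO_SPINUP_ID_COMPLETE;
}
```

For the "738c" reported by this drive, both functions return true: the drive waits for the spin-up command, and its Identify Device data is already complete.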

b***@bugzilla.kernel.org

2014-08-22 12:16:07 UTC

Permalink

https://bugzilla.kernel.org/show_bug.cgi?id=81861

--- Comment #7 from linux-***@crashplan.pro ---
Created attachment 147751
--> https://bugzilla.kernel.org/attachment.cgi?id=147751&action=edit
smartctl -a /dev/sdb (HDS5C3020BLE630)

b***@bugzilla.kernel.org

2014-08-22 12:17:30 UTC

Permalink

https://bugzilla.kernel.org/show_bug.cgi?id=81861

--- Comment #8 from linux-***@crashplan.pro ---
Comment on attachment 145681
--> https://bugzilla.kernel.org/attachment.cgi?id=145681
Dmesg output from boot

This is without loading the mvsas kernel module.

b***@bugzilla.kernel.org

2014-08-22 13:19:17 UTC

Permalink

https://bugzilla.kernel.org/show_bug.cgi?id=81861

--- Comment #9 from linux-***@crashplan.pro ---
re: That's not a sensible resolution, it can't be faulting on that line.

Another try with a newer version of the gdb-minimal package (Ubuntu
7.7-0ubuntu3.2 from trusty-proposed) gives identical results: address <+1838>
maps to line 471 in mv_sas.c, which points to "(MVS_PHY_ID << TXQ_PHY_SHIFT) |".

# cat /sys/module/mvsas/sections/.init.text
0xffffffffa01c2000

(gdb) add-symbol-file
/usr/lib/debug/lib/modules/3.13.0-24-generic/kernel/drivers/scsi/mvsas/mvsas.ko
0xffffffffa01c2000
add symbol table from file
"/usr/lib/debug/lib/modules/3.13.0-24-generic/kernel/drivers/scsi/mvsas/mvsas.ko"
at
.text_addr = 0xffffffffa01c2000

0xffffffffa01c481e <+1838>: mov 0x254(%rbx),%ecx

(gdb) list *0xffffffffa01c481e
0xffffffffa01c481e is in mvs_task_prep
(/build/buildd/linux-3.13.0/drivers/scsi/mvsas/mv_sas.c:471).
466 }
467 slot = &mvi->slot_info[tag];
468 slot->tx = mvi->tx_prod;
469 del_q = TXQ_MODE_I | tag |
470 (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
471 (MVS_PHY_ID << TXQ_PHY_SHIFT) |
472 (mvi_dev->taskfileset << TXQ_SRS_SHIFT);
473 mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);
474
475 if (task->data_dir == DMA_FROM_DEVICE)

b***@bugzilla.kernel.org

2014-08-22 14:05:25 UTC

Permalink

https://bugzilla.kernel.org/show_bug.cgi?id=81861

--- Comment #10 from linux-***@crashplan.pro ---
Another test round to see whether the crash differs between cold and warm
boot:
5C + cold boot = mvs_task_prep+0x72e/0xd50 [mvsas]
5C + warm boot = mvs_task_prep+0x72e/0xd50 [mvsas]
6C + cold boot = mvs_task_prep+0x72e/0xd50 [mvsas]
6C + warm boot = mvs_task_prep+0x72e/0xd50 [mvsas]
7C + cold boot = mvs_task_prep+0x72e/0xd50 [mvsas]
7C + warm boot = mvs_task_prep+0x72e/0xd50 [mvsas]
8C + cold boot = mvs_task_prep+0x72e/0xd50 [mvsas]
8C + warm boot = mvs_task_prep+0x72e/0xd50 [mvsas]
9C + cold boot = mvs_task_prep+0x72e/0xd50 [mvsas]
9C + warm boot = mvs_task_prep+0x72e/0xd50 [mvsas]

In cases 6C, 7C and 9C the r8169 NIC doesn't come up after the first automatic
reboot after cold boot ("Waiting for network configuration..." and "Waiting up
to 60 more seconds for network configuration..."), but does come up after the
second automatic reboot after cold boot [reproducible=yes].

b***@bugzilla.kernel.org

2014-08-22 17:00:19 UTC

Permalink

https://bugzilla.kernel.org/show_bug.cgi?id=81861

--- Comment #11 from linux-***@crashplan.pro ---
Created attachment 147771
--> https://bugzilla.kernel.org/attachment.cgi?id=147771&action=edit
sg_ses PCIe port expander card output

b***@bugzilla.kernel.org

2014-08-22 17:36:22 UTC

Permalink

https://bugzilla.kernel.org/show_bug.cgi?id=81861

--- Comment #12 from Alan <***@lxorguk.ukuu.org.uk> ---
0xffffffffa01c481e <+1838>: mov 0x254(%rbx),%ecx

is loading an offset from something. It can't be line 471.

It could be line 472, or it could be 468, but the offset looks way too big to be
either unless it's been optimised somewhat. The line mapping is not always
entirely accurate.

At this point what might be useful is to add lines between them and rebuild,
i.e.

printk("[");
467 slot = &mvi->slot_info[tag];
printk("%d ", tag);
468 slot->tx = mvi->tx_prod;
printk("%p ", slot);
469 del_q = TXQ_MODE_I | tag |
470 (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
471 (MVS_PHY_ID << TXQ_PHY_SHIFT) |
472 (mvi_dev->taskfileset << TXQ_SRS_SHIFT);
printk("%d", mvi->tx_prod]);
473 mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);
printk("]\n");

and try again. When it dies just before the oops you should have lines of the
form

[num num num]

the final one of which is incomplete. Where it ends tells us where it died and
the values may even give
us a guess at why. If the final [ .. ] sequence is complete then it crashed
somewhere else in the routine and gdb is confused.
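
Reading the last breadcrumb line can be sketched like this. It is only an illustration of the idea above: classify_trace_line and struct trace_line are hypothetical names, and the function assumes the trace bracket is the last '[' on the line (so a leading dmesg timestamp bracket is skipped).

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

struct trace_line {
    int fields;     /* number of values printed after the opening '[' */
    bool complete;  /* true if the closing ']' was reached */
};

/* Count the values printed after the last '[' and note whether ']' follows. */
struct trace_line classify_trace_line(const char *line)
{
    struct trace_line t = { 0, false };
    assert(line != NULL);

    const char *s = strrchr(line, '[');
    if (s == NULL)
        return t;
    s++;

    while (*s) {
        while (*s && isspace((unsigned char)*s))
            s++;
        if (*s == ']') {
            t.complete = true;
            break;
        }
        if (*s == '\0')
            break;
        t.fields++;
        while (*s && !isspace((unsigned char)*s) && *s != ']')
            s++;
    }
    return t;
}
```

With the printk placement shown above, a final line holding two values and no closing bracket would mean the tag and slot prints ran and the fault hit before tx_prod was printed, i.e. somewhere in the del_q expression.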

b***@bugzilla.kernel.org

2014-08-23 20:04:10 UTC

Permalink

https://bugzilla.kernel.org/show_bug.cgi?id=81861

--- Comment #13 from linux-***@crashplan.pro ---
It dies between printing the second and the third variable:

[ 30.455440] sas: DONE DISCOVERY on port 0, pid:128, result:0
[ 30.455502] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[ 30.455534] sas: ata6: end_device-5:0:20: dev error handler
[ 30.455744] sas: ata7: end_device-5:0:21: dev error handler
[ 30.456186] sas: ata8: end_device-5:0:22: dev error handler
[ 30.456367] sas: ata9: end_device-5:0:23: dev error handler
[ 30.611146] [0 ffff8800d8e255b8 44]
[ 30.611959] [0 ffff8800d8e255b8 46]
[ 30.612511] [2 ffff8800d8e25668
[ 30.612537] BUG: unable to handle kernel NULL pointer dereference at
0000000000000255
[ 30.613511] IP: [<ffffffffa022c872>] mvs_task_prep+0x782/0xdd0 [mvsas]
[ 30.614003] PGD 0
[ 30.614486] Oops: 0000 [#1] SMP
[ 30.614967] Modules linked in: mvsas(OF) libsas scsi_transport_sas
x86_pkg_temp_thermal intel_powerclamp hid_generic coretemp usbhid kvm_intel
i915 kvm hid crct10dif_pclmul drm_kms_helper crc32_pclmul ghash_clmulni_intel
cryptd drm netconsole configfs i2c_algo_bit serio_raw mei_me lpc_ich mei lp
video mac_hid parport psmouse r8169 mii ahci libahci
[ 30.616702] CPU: 0 PID: 6 Comm: kworker/u4:0 Tainted: GF O
3.13.0-35-generic #62-Ubuntu
[ 30.617279] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H81
Pro BTC, BIOS P1.80 07/21/2014
[ 30.617853] Workqueue: events_unbound async_run_entry_fn
[ 30.618426] task: ffff8802139b0000 ti: ffff8802139ae000 task.ti:
ffff8802139ae000
[ 30.619007] RIP: 0010:[<ffffffffa022c872>] [<ffffffffa022c872>]
mvs_task_prep+0x782/0xdd0 [mvsas]
[ 30.619604] RSP: 0018:ffff8802139af8c0 EFLAGS: 00010096
[ 30.620188] RAX: ffff8800d8e03618 RBX: 0000000000000002 RCX:
0000000000002ace
[ 30.620779] RDX: 00000000000064e6 RSI: 0000000000000046 RDI:
0000000000000046
[ 30.621363] RBP: ffff8802139af958 R08: 0000000000000086 R09:
0000000000000426
[ 30.621941] R10: ffff880213bf4098 R11: 0000000000000001 R12:
0000000000000001
[ 30.622508] R13: ffff8800d8e00000 R14: ffff8800d8e03618 R15:
ffff88007f912500
[ 30.623068] FS: 0000000000000000(0000) GS:ffff88021f200000(0000)
knlGS:0000000000000000
[ 30.623649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 30.624238] CR2: 0000000000000255 CR3: 0000000002c0e000 CR4:
00000000000407f0
[ 30.624844] Stack:
[ 30.625450] ffffffff8109d415 ffff88021f314440 ffff88021f314440
ffff88021f314440
[ 30.626097] ffff88020f97064c ffff8800d8e01e38 ffff880211d6fe00
ffff8800d8e25660
[ 30.626752] ffff88007f740080 ffff8800d8e02678 0000000181098129
ffff8800d8e25668
[ 30.627413] Call Trace:
[ 30.628072] [<ffffffff8109d415>] ? sched_clock_cpu+0xb5/0x100
[ 30.628753] [<ffffffffa022cf1d>] mvs_task_exec.isra.13+0x5d/0xe0 [mvsas]
[ 30.629450] [<ffffffffa022da5c>] mvs_queue_command+0x30c/0x320 [mvsas]
[ 30.630155] [<ffffffff811a2362>] ? kmem_cache_alloc+0x1b2/0x1e0
[ 30.630867] [<ffffffffa020c787>] ? sas_free_task+0x37/0x40 [libsas]
[ 30.631593] [<ffffffffa0215cab>] sas_ata_qc_issue+0x24b/0x290 [libsas]
[ 30.632326] [<ffffffff814fe742>] ata_qc_issue+0x172/0x380
[ 30.633067] [<ffffffff814fec03>] ata_exec_internal_sg+0x2b3/0x570
[ 30.633817] [<ffffffff814fef1a>] ata_exec_internal+0x5a/0xa0
[ 30.634570] [<ffffffff814ff314>] ata_dev_read_id+0x274/0x550
[ 30.635332] [<ffffffffa02158f0>] ? sas_ata_printk+0x80/0x80 [libsas]
[ 30.636166] [<ffffffff8150cbab>] ata_eh_recover+0x74b/0x1310
[ 30.636938] [<ffffffff81501cb0>] ? ata_phys_link_online+0x30/0x30
[ 30.637721] [<ffffffffa02158f0>] ? sas_ata_printk+0x80/0x80 [libsas]
[ 30.638512] [<ffffffff81501e30>] ? ata_phys_link_offline+0x30/0x30
[ 30.639314] [<ffffffffa02158f0>] ? sas_ata_printk+0x80/0x80 [libsas]
[ 30.640119] [<ffffffff81501e30>] ? ata_phys_link_offline+0x30/0x30
[ 30.640933] [<ffffffffa02158f0>] ? sas_ata_printk+0x80/0x80 [libsas]
[ 30.641758] [<ffffffff8150e299>] ata_do_eh+0x49/0xc0
[ 30.642588] [<ffffffff81501cb0>] ? ata_phys_link_online+0x30/0x30
[ 30.643425] [<ffffffff8150e34e>] ata_std_error_handler+0x3e/0x80
[ 30.644271] [<ffffffff8150ddba>] ata_scsi_port_error_handler+0x56a/0x940
[ 30.645128] [<ffffffffa02160aa>] async_sas_ata_eh+0x4a/0x80 [libsas]
[ 30.645996] [<ffffffff81091657>] async_run_entry_fn+0x37/0x130
[ 30.646871] [<ffffffff810839d2>] process_one_work+0x182/0x450
[ 30.647750] [<ffffffff810847c1>] worker_thread+0x121/0x410
[ 30.648638] [<ffffffff810846a0>] ? rescuer_thread+0x430/0x430
[ 30.649534] [<ffffffff8108b4a2>] kthread+0xd2/0xf0
[ 30.650429] [<ffffffff8108b3d0>] ? kthread_create_on_node+0x1c0/0x1c0
[ 30.651321] [<ffffffff8172ecbc>] ret_from_fork+0x7c/0xb0
[ 30.652211] [<ffffffff8108b3d0>] ? kthread_create_on_node+0x1c0/0x1c0
[ 30.653102] Code: 03 47 23 a0 31 c0 e8 62 b7 4e e1 48 8b 4d c0 41 8b 45 58
48 c7 c7 07 47 23 a0 89 41 1c 48 89 ce 31 c0 e8 46 b7 4e e1 48 8b 45 d0 <41> 8b
8c 24 54 02 00 00 41 bc 00 10 00 00 41 8b 75 58 48 c7 c7
[ 30.655215] RIP [<ffffffffa022c872>] mvs_task_prep+0x782/0xdd0 [mvsas]
[ 30.656195] RSP <ffff8802139af8c0>
[ 30.657163] CR2: 0000000000000255

b***@bugzilla.kernel.org

2014-08-23 20:06:31 UTC

Permalink

https://bugzilla.kernel.org/show_bug.cgi?id=81861

--- Comment #14 from linux-***@crashplan.pro ---
By the way:

printk("%d", mvi->tx_prod]);

was changed to:

printk("%d", mvi->tx_prod);

The square bracket after tx_prod was removed.

b***@bugzilla.kernel.org

2014-08-23 22:12:23 UTC

Permalink

https://bugzilla.kernel.org/show_bug.cgi?id=81861

--- Comment #15 from linux-***@crashplan.pro ---
Created attachment 147881
--> https://bugzilla.kernel.org/attachment.cgi?id=147881&action=edit
Ubuntu Linux/x86_64 3.13.0-35-generic Kernel Configuration

This kernel configuration was used to build both the patched and unpatched
mvsas.ko

b***@bugzilla.kernel.org

2014-09-23 21:56:05 UTC

Permalink

https://bugzilla.kernel.org/show_bug.cgi?id=81861

--- Comment #16 from linux-***@crashplan.pro ---
When dumping, one by one, the constants/variables used in:
469 del_q = TXQ_MODE_I | tag |
470 (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
471 (MVS_PHY_ID << TXQ_PHY_SHIFT) |
472 (mvi_dev->taskfileset << TXQ_SRS_SHIFT);

using the prepended statements:
printk("slot=%p ", slot);
printk(KERN_INFO "TXQ_MODE_I=%d ", TXQ_MODE_I);
printk(KERN_INFO "tag=%d ", tag);
printk(KERN_INFO "TXQ_CMD_STP=%d ", TXQ_CMD_STP);
printk(KERN_INFO "TXQ_CMD_SHIFT=%d ", TXQ_CMD_SHIFT);
printk(KERN_INFO "MVS_PHY_ID=%d ", MVS_PHY_ID);
printk(KERN_INFO "TXQ_PHY_SHIFT=%d ", TXQ_PHY_SHIFT);
del_q = TXQ_MODE_I | tag |
(TXQ_CMD_STP << TXQ_CMD_SHIFT) |
(MVS_PHY_ID << TXQ_PHY_SHIFT) |
(mvi_dev->taskfileset << TXQ_SRS_SHIFT);

the kernel crashes after printing "TXQ_CMD_SHIFT", i.e. when trying to output
the value of "MVS_PHY_ID":
[ 529.113152] sas: DONE DISCOVERY on port 0, pid:133, result:0
[ 529.114313] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
[ 529.115460] sas: ata7: end_device-6:0:28: dev error handler
[ 529.115522] sas: ata8: end_device-6:0:29: dev error handler
[ 529.118706] sas: ata9: end_device-6:0:30: dev error handler
[ 529.119840] sas: ata10: end_device-6:0:31: dev error handler
[ 529.271634] [mvi=ffff8800d3680000, mvi_dev=ffff8800d36836a0 tag=0
slot=ffff8800d36a55b8
[ 529.271753] TXQ_MODE_I=268435456 tag=0
[ 529.272679] TXQ_CMD_STP=3 TXQ_CMD_SHIFT=29
[ 529.273618] MVS_PHY_ID=32768 TXQ_PHY_SHIFT=12 tx_prod=44]
[ 529.276091] [mvi=ffff8800d3680000, mvi_dev=ffff8800d3683618 tag=1
slot=ffff8800d36a5610
[ 529.276207] TXQ_MODE_I=268435456 tag=1
[ 529.277095] TXQ_CMD_STP=3 TXQ_CMD_SHIFT=29
[ 529.278038] MVS_PHY_ID=1 TXQ_PHY_SHIFT=12 tx_prod=46]
[ 529.280271] [mvi=ffff8800d3680000, mvi_dev=ffff8800d3683618 tag=1
slot=ffff8800d36a5610
[ 529.280385] TXQ_MODE_I=268435456 tag=1
[ 529.281445] TXQ_CMD_STP=3 TXQ_CMD_SHIFT=29
[ 529.282562] MVS_PHY_ID=1 TXQ_PHY_SHIFT=12 tx_prod=48]
[ 529.284894] [mvi=ffff8800d3680000, mvi_dev=ffff8800d36837b0 tag=2
slot=ffff8800d36a5668
[ 529.285010] TXQ_MODE_I=268435456 tag=2
[ 529.286248] TXQ_CMD_STP=3 TXQ_CMD_SHIFT=29
[ 529.287555] BUG: unable to handle kernel NULL pointer dereference at
0000000000000257
[ 529.290225] IP: [<ffffffffa02888bb>] mvs_task_prep+0x7cb/0xe50 [mvsas]
[ 529.291686] PGD 0
[ 529.293141] Oops: 0000 [#1] SMP
[ 529.294630] Modules linked in: mvsas(OF) libsas scsi_transport_sas
x86_pkg_temp_thermal intel_powerclamp coretemp kvm crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel cryptd serio_raw lpc_ich i915 mei_me mei
drm_kms_helper video netconsole drm configfs mac_hid i2c_algo_bit psmouse r8169
ahci mii libahci

Any suggestions why accessing "MVS_PHY_ID" leads to the kernel NULL pointer
dereference oops?

Elliott, Robert (Server Storage)

2014-09-24 00:32:06 UTC

Permalink

-----Original Message-----
Sent: Tuesday, 23 September, 2014 4:56 PM
Subject: [Bug 81861] Oops by mvsas v0.8.16: sas: ataX: end_device-Y:0:Z: dev
error handler -> general protection fault, RIP: mvs_task_prep_ata+0x80/0x3a0
https://bugzilla.kernel.org/show_bug.cgi?id=81861

1. Although MVS_PHY_ID looks like a constant, it's really not:
#define MVS_PHY_ID (1U << sas_phy->id)

2. This fault:
[ 32.271218] BUG: unable to handle kernel NULL pointer dereference at 0000000000000255
(although 255 looks like a decimal number 0xff, it's really hex 0x255)

at this line:
0xffffffffa01c481e <+1838>: mov 0x254(%rbx),%ecx

implies that rbx contains 1, so 0x254 + 1 = 0x255.

3. pahole drivers/scsi/mvsas/mv_sas.o
shows there are two structures with fields at offset 596:
* asd_sas_phy.id
* asd_sas_port.sas_addr[8]

4. objdump -drS drivers/scsi/mvsas/mv_sas.o
shows only a few lines with 0x254(%something), one of which
is the del_q line you've identified:

mvs_task_prep_ata(struct mvs_info *mvi, struct mvs_task_exec_info *tei):
struct sas_ha_struct *sha = mvi->sas;
struct sas_task *task = tei->task;
struct domain_device *dev = task->dev;
struct sas_phy *sphy = dev->phy;
struct asd_sas_phy *sas_phy = sha->sas_phy[sphy->number];

...
del_q = TXQ_MODE_I | tag |
(TXQ_CMD_STP << TXQ_CMD_SHIFT) |
(MVS_PHY_ID << TXQ_PHY_SHIFT) |
(mvi_dev->taskfileset << TXQ_SRS_SHIFT);
mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);

MVS_PHY_ID =
sas_phy->id =
sha->sas_phy[sphy->number] =
mvi->sas->sas_phy[dev->phy->number] =
mvi->sas->sas_phy[task->dev->phy->number]->id
mvi->sas->sas_phy[tei->task->dev->phy->number]->id

Looking at the offsets reported by pahole, that means:
%rdi->56->344[%rsi->0->0->56->688]->254

mvi->sas->sas_phy is a pointer to a pointer:
struct sas_ha_struct {
...
struct asd_sas_phy * * sas_phy; /* 344 8 */

You might look for somewhere that could accidentally
be setting sas_phy[something] to a for loop index,
with a typecast hiding the problem from the compiler.
Or, the phy->number value being passed might be
out of range; if there were discovery errors, something
might not have been initialized like this function expects.
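
The arithmetic in point 2 can be written out as a small sketch. The struct below is only a stand-in that places an id field at offset 0x254 (596), matching the pahole output quoted above; it is not the real struct asd_sas_phy, and both function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in layout: only the offset of `id` matters (assumed 0x254 = 596). */
struct fake_sas_phy {
    unsigned char pad[0x254];
    int id;
};

/* Address touched when ->id is read through a bogus pointer value. */
uintptr_t id_fault_address(uintptr_t bogus_ptr)
{
    return bogus_ptr + offsetof(struct fake_sas_phy, id);
}

/* Inverse: recover the bogus pointer value from the CR2 fault address. */
uintptr_t bogus_ptr_from_cr2(uintptr_t cr2)
{
    assert(cr2 >= offsetof(struct fake_sas_phy, id));
    return cr2 - offsetof(struct fake_sas_phy, id);
}
```

A pointer value of 1 gives fault address 0x255, as in the oops. Run the other way, the CR2 of 0x257 from the later report would correspond to a pointer value of 3, which looks more like a small index than an address and would fit either suspicion above.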

---
Rob Elliott HP Server Storage


b***@bugzilla.kernel.org

2014-09-26 07:04:54 UTC

Permalink

https://bugzilla.kernel.org/show_bug.cgi?id=81861

Leon Woestenberg <***@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmail.com

--- Comment #17 from Leon Woestenberg <***@gmail.com> ---

With TXQ_PHY_SHIFT being 12 and TXQ_CMD_SHIFT being 29, the one-hot PHY
encoding seems to occupy bits 12 through 27 inclusive (bit 28 is TXQ_MODE_I,
printed as 268435456 in the debug output).

I.e. 16 bits, so 16 PHY IDs, are supported.

The value handed to the controller seems to be a fixed 32-bit register, so
this looks like a hardware limitation rather than a software driver limitation.

469 del_q = TXQ_MODE_I | tag |
470 (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
471 (MVS_PHY_ID << TXQ_PHY_SHIFT) |
472 (mvi_dev->taskfileset << TXQ_SRS_SHIFT);
printk("%d", mvi->tx_prod]);
473 mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);

Remaining question: how is this supposed to work with port expanders where PHY
IDs exceed 16?
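
To make the bit layout concrete, here is a minimal sketch that builds the word from the constants printed in the debug output of comment #16. It leaves out the taskfileset term because TXQ_SRS_SHIFT is not shown in this report, and it says nothing about how the controller splits the bits between 12 and 28; make_del_q and phy_bit_collides are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Values as printed by the debug output in this report. */
#define TXQ_MODE_I      268435456u  /* equals 1u << 28 */
#define TXQ_MODE_I_BIT  28
#define TXQ_CMD_STP     3u
#define TXQ_CMD_SHIFT   29
#define TXQ_PHY_SHIFT   12

/* Build del_q without the taskfileset term (deliberately omitted here). */
uint32_t make_del_q(uint32_t tag, unsigned phy_id)
{
    assert(phy_id < 32);
    uint32_t phy_mask = (uint32_t)1 << phy_id;  /* what MVS_PHY_ID expands to */
    return TXQ_MODE_I | tag |
           (TXQ_CMD_STP << TXQ_CMD_SHIFT) |
           (phy_mask << TXQ_PHY_SHIFT);
}

/* True when the shifted one-hot PHY bit reaches the TXQ_MODE_I bit or above. */
int phy_bit_collides(unsigned phy_id)
{
    return phy_id + TXQ_PHY_SHIFT >= TXQ_MODE_I_BIT;
}
```

With tag 1 and MVS_PHY_ID = 1 (phy id 0), as in the second debug block, this produces 0x70001001. A phy id of 16 would shift onto bit 28 and overlap TXQ_MODE_I, which is the limit the question above is about.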

Thanks to Rob Elliott (HP Server Storage) for an extensive debug report by
e-mail, which I have copied verbatim:

---
1. Although MVS_PHY_ID looks like a constant, it's really not:
#define MVS_PHY_ID (1U << sas_phy->id)

2. This fault:
[ 32.271218] BUG: unable to handle kernel NULL pointer dereference at
0000000000000255
(although 255 looks like the decimal value of 0xff, it's really hex 0x255)

at this line:
0xffffffffa01c481e <+1838>: mov 0x254(%rbx),%ecx

implies that rbx contains 1, so 0x254 + 1 = 0x255.

3. pahole drivers/scsi/mvsas/mv_sas.o
shows there are two structures with fields at offset 596:
* asd_sas_phy.id
* asd_sas_port.sas_addr[8]

4. objdump -drS drivers/scsi/mvsas/mv_sas.o
shows only a few lines with 0x254(%something), one of which
is the del_q line you've identified:

mvs_task_prep_ata(struct mvs_info *mvi, struct mvs_task_exec_info *tei):
struct sas_ha_struct *sha = mvi->sas;
struct sas_task *task = tei->task;
struct domain_device *dev = task->dev;
struct sas_phy *sphy = dev->phy;
struct asd_sas_phy *sas_phy = sha->sas_phy[sphy->number];

...
del_q = TXQ_MODE_I | tag |
(TXQ_CMD_STP << TXQ_CMD_SHIFT) |
(MVS_PHY_ID << TXQ_PHY_SHIFT) |
(mvi_dev->taskfileset << TXQ_SRS_SHIFT);
mvi->tx[mvi->tx_prod] = cpu_to_le32(del_q);

MVS_PHY_ID = (1U << sas_phy->id), where
sas_phy->id =
sha->sas_phy[sphy->number]->id =
mvi->sas->sas_phy[dev->phy->number]->id =
mvi->sas->sas_phy[task->dev->phy->number]->id =
mvi->sas->sas_phy[tei->task->dev->phy->number]->id

Looking at the offsets reported by pahole, that means:
%rdi->56->344[%rsi->0->0->56->688]->0x254

mvi->sas->sas_phy is a pointer to a pointer:
struct sas_ha_struct {
...
struct asd_sas_phy * * sas_phy; /* 344 8 */

You might look for somewhere that could accidentally
be setting sas_phy[something] to a for loop index,
with a typecast hiding the problem from the compiler.
Or, the phy->number value being passed might be
out of range; if there were discovery errors, something
might not have been initialized like this function expects.

Rob Elliott HP Server Storage
---
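
The address arithmetic in points 2 and 3 of the quoted report can be reproduced with a small stand-alone sketch. The struct below is a mock, not the real struct asd_sas_phy: its padding is sized only so that id sits at the offset 596 (0x254) that pahole reported, and the names mock_asd_sas_phy and id_load_address are invented for illustration. It computes the address a load of phy->id would touch without actually dereferencing anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock stand-in for struct asd_sas_phy; only the offset of `id` matters.
 * The padding size is an assumption chosen to match pahole's offset 596. */
struct mock_asd_sas_phy {
	unsigned char other_fields[596];
	int id;
};

/* Compile-time check that the mock reproduces the reported offset. */
static_assert(offsetof(struct mock_asd_sas_phy, id) == 0x254,
	      "id must sit at offset 0x254 (596)");

/* Hypothetical helper: address that a load of phy->id would access,
 * given the raw value stored in the sas_phy[] slot. Nothing is
 * dereferenced, so it is safe to call with bogus pointer values. */
static inline uintptr_t id_load_address(uintptr_t phy_pointer_value)
{
	return phy_pointer_value + offsetof(struct mock_asd_sas_phy, id);
}
```

id_load_address(1) gives 0x255, the faulting address from the oops, which is consistent with sas_phy[...] holding the value 1 instead of a valid pointer; a plain NULL entry would have faulted at 0x254.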

b***@bugzilla.kernel.org

2014-10-19 15:56:21 UTC

Permalink

https://bugzilla.kernel.org/show_bug.cgi?id=81861

linux-***@crashplan.pro changed:

What |Removed |Added
----------------------------------------------------------------------------
Kernel Version|3.16.0 |3.17.1

--- Comment #18 from linux-***@crashplan.pro ---
Even after flashing the SAS2LP-MV8's firmware from version 4.0.0.1800 to
version 4.0.0.1812, the mvs_task_prep_ata+0x80/0x3a0 [mvsas] kernel oops
persists on these kernels:

1. "Linux ubuntu25 3.17.1-031701-generic #201410150735 SMP Wed Oct 15 11:36:31
UTC 2014 x86_64 x86_64 x86_64 GNU/Linux" and
2. "Linux ubuntu25 3.17.0-999-generic #201410182205 SMP Sun Oct 19 02:06:22 UTC
2014 x86_64 x86_64 x86_64 GNU/Linux"