Promise VTrak, two SCSI ports, two hosts, drives split between the channels. Disks drop off the hosts spontaneously and for no apparent reason, sometimes on one host, sometimes on both. The stable period lasts anywhere from a couple of minutes to over half a year, depending on how the stars align.
The fix is to replace the SCSI card with any LSI one; there is some incompatibility with either the Adaptec hardware or its drivers.
Judging by Google, this bug has been around for about 10 years.
---------------------------------------------------------------
kernel: [125252.989174] sd 3:0:1:2: [sdc] Attempting to queue an ABORT message:CDB: 0x28 0x0 0x6 0x61 0xd0 0x21 0x0 0x0 0x20 0x0
kernel: [125252.989737] scsi3: At time of recovery, card was not paused
kernel: [125252.989746] >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
kernel: [125252.989750] scsi3: Dumping Card State at program address 0x3 Mode 0x33
kernel: [125252.989754] Card was paused
kernel: [125252.992295] <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
kernel: [125252.992339] (scsi3:A:1:2): Device is disconnected, re-queuing SCB
kernel: [125252.992370] scsi3: Recovery code sleeping
kernel: [125252.992404] (scsi3:A:1:2): Abort Tag Message Sent
kernel: [125252.992587] (scsi3:A:1:2): SCB 18 - Abort Completed.
kernel: [125252.992632] Recovery SCB completes
kernel: [125252.992644] scsi3: device overrun (status 9) on 0:1:2
kernel: [125252.992649] found == 0x1
kernel: [125252.992713] Recovery code awake
kernel: [125252.992835] sd 3:0:1:2: [sdc] Attempting to queue a TARGET RESET message:CDB: 0x28 0x0 0x6 0x61 0xd0 0x21 0x0 0x0 0x20 0x0
kernel: [125252.992856] scsi3: Device reset code sleeping
kernel: [125252.992886] (scsi3:A:1:2): Bus Device Reset Message Sent
kernel: [125253.008886] Recovery SCB completes
kernel: [125253.008985] scsi3: Device reset returning 0x2002
kernel: [125273.021071] sd 3:0:1:2: [sdc] Attempting to queue an ABORT message:CDB: 0x0 0x0 0x0 0x0 0x0 0x0
kernel: [125273.021633] scsi3: At time of recovery, card was not paused
kernel: [125273.021643] >>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
kernel: [125273.021647] scsi3: Dumping Card State at program address 0x2d Mode 0x33
kernel: [125273.021651] Card was paused
kernel: [125283.026398] (scsi3:A:1:2): Device is disconnected, re-queuing SCB
kernel: [125283.026432] scsi3: Recovery code sleeping
kernel: [125283.026491] (scsi3:A:1:2): Abort Tag Message Sent
kernel: [125283.026761] (scsi3:A:1:2): SCB 18 - Abort Completed.
kernel: [125283.026811] Recovery SCB completes
kernel: [125283.026844] scsi3: device overrun (status 9) on 0:1:2
kernel: [125283.026853] Recovery code awake
kernel: [125283.026855] found == 0x1
kernel: [125283.026868] sd 3:0:1:2: Device offlined - not ready after error recovery
kernel: [125283.026874] sd 3:0:1:2: Device offlined - not ready after error recovery
kernel: [125283.026900] sd 3:0:1:2: [sdc] Unhandled error code
kernel: [125283.026906] sd 3:0:1:2: [sdc] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
kernel: [125283.026916] sd 3:0:1:2: [sdc] CDB: Read(10): 28 00 06 61 d0 21 00 00 20 00
kernel: [125283.026936] end_request: I/O error, dev sdc, sector 856588552
kernel: [125283.027080] sd 3:0:1:2: [sdc] Unhandled error code
kernel: [125283.027085] sd 3:0:1:2: [sdc] Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK
kernel: [125283.027092] sd 3:0:1:2: [sdc] CDB: Read(10): 28 00 06 61 d0 41 00 00 20 00
kernel: [125283.027109] end_request: I/O error, dev sdc, sector 856588808
kernel: [125283.027191] sd 3:0:1:2: rejecting I/O to offline device
kernel: [125283.033087] sd 3:0:1:2: rejecting I/O to offline device
---------------------------------------------------------------
On the Promise side everything looks fine. The only trace is a series of "SCSI Bus reset detected for SCSI chann..." entries in its log.
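The two failed reads in the log can be cross-checked by hand. Below is a minimal Python sketch that decodes a READ(10) CDB. The opcode is 0x28, bytes 2-5 hold a 32-bit big-endian LBA, and bytes 7-8 hold a 16-bit transfer length. The sketch then converts the LBA into the 512-byte sector number printed by `end_request`. The helper names are hypothetical.

The 4096-byte logical block size is an assumption inferred from the numbers, not something the log states. The LBAs in the CDBs are exactly 1/8 of the sector numbers the kernel reports.

```python
import struct
from typing import Tuple

KERNEL_SECTOR_SIZE = 512  # end_request reports positions in 512-byte units


def decode_read10(cdb_hex: str) -> Tuple[int, int]:
    """Return (lba, transfer_length_in_blocks) for a READ(10) CDB given as hex."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    (lba,) = struct.unpack(">I", cdb[2:6])     # bytes 2-5: logical block address
    (blocks,) = struct.unpack(">H", cdb[7:9])  # bytes 7-8: transfer length
    return lba, blocks


def to_kernel_sector(lba: int, logical_block_size: int) -> int:
    """Convert a device LBA into the 512-byte sector number the kernel logs."""
    return lba * logical_block_size // KERNEL_SECTOR_SIZE


if __name__ == "__main__":
    # Assumption: the VTrak LUN uses 4096-byte logical blocks (inferred, not logged).
    BLOCK_SIZE = 4096
    for cdb in ("28 00 06 61 d0 21 00 00 20 00", "28 00 06 61 d0 41 00 00 20 00"):
        lba, blocks = decode_read10(cdb)
        print(f"LBA {lba}, {blocks} blocks ({blocks * BLOCK_SIZE // 1024} KiB), "
              f"kernel sector {to_kernel_sector(lba, BLOCK_SIZE)}")
```

Under that assumption, both failing commands map exactly onto the reported sectors 856588552 and 856588808. Each one requested 32 blocks, which is 128 KiB.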