Wednesday, August 15, 2012

Solaris 10. Enterprise T2000 xt_sync: timeout and kernel panic.

An Enterprise T2000 running Solaris 10 was updated with an extensive patch set, including kernel patches.

After the system rebooted, we started getting kernel panics and the system would not boot.

 Loading: /platform/SUNW,SPARC-Enterprise-T2000/kernel/sparcv9/unix  
 Loading: /platform/sun4v/kernel/sparcv9/unix  
 SunOS Release 5.10 Version Generic_147440-12 64-bit  
 Copyright (c) 1983, 2012, Oracle and/or its affiliates. All rights reserved.  
 os-io Cross trap sync timeout: at cpu_sync.xword[1]: 0x1010panic: failed to stop cpu8  
 panic: failed to stop cpu9  
 panic: failed to stop cpu10  
 panic: failed to stop cpu11  
 panic: failed to stop cpu12  
 panic: failed to stop cpu13  
 panic: failed to stop cpu14  
 panic: failed to stop cpu15  
   
 panic[cpu1]/thread=2a100bbfca0: xt_sync: timeout  

One patch in particular was causing the problem: kernel patch 147440-12. I found the following note online, which led me to it.

 [Patch]  
  147440-02 or newer SunOS 5.10: Solaris kernel patch  
   
 [Note]  
  If you apply this patch to a system with firmware version older than   
  6.4.6, system panic occurs during booting and the following message is   
  output.  
   
  <Example of message>  
   ----------------------------------------------------------------------  
   panic[cpu14]/thread=2a1027f5ca0: xt_sync: timeout  
   ----------------------------------------------------------------------  
   
  <Environment>  
   This problem occurs on the following models.  
    - Sun Fire T1000/T2000  
    - SPARC Enterprise T1000/T2000  
   
  <Conditions>  
   Firmware version is older than 6.4.6. (*1)  
   
   *1) To confirm, execute the showhost command on ALOM.  
    In the following example, firmware version is 6.7.11.  
   
     <Example of command execution>  
      sc> showhost  
      SPARC-Enterprise-T1000 System Firmware 6.7.11 2010/10/12 12:34  
      Host flash versions:  
       OBP 4.30.4.b 2010/07/09 13:43  
       Hypervisor 1.7.3.c 2010/07/09 15:14  
       POST 4.30.4.b 2010/07/09 14:25  

Running showhost on our T2000 revealed that we were on firmware 6.3.x, which is older than the 6.4.6 minimum given in the note. The system wasn't bootable, so I booted into single-user mode from a CD-ROM (boot cdrom -s).
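The version comparison described in the note can be scripted if you have several machines to check. The following is a minimal sketch, not an Oracle-supplied tool: it assumes you have saved the text of `sc> showhost` to a file or pipe, and the function names (`fw_version`, `version_lt`, `check_firmware`) are made up for this example. It also assumes a POSIX awk; on Solaris 10 itself that likely means using nawk or /usr/xpg4/bin/awk instead of /usr/bin/awk.

```shell
#!/bin/sh
# Sketch: decide whether the firmware reported by ALOM's showhost is older
# than the 6.4.6 minimum quoted in the patch note. All function names here
# are hypothetical helpers, not part of Solaris or ALOM.

# Read saved showhost output on stdin and print the first
# "System Firmware X.Y.Z" version found (prints nothing if none).
fw_version() {
    sed -n 's/.*System Firmware \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeed (exit 0) if version $1 is strictly older than version $2.
# Compares up to three dot-separated numeric fields; missing fields count as 0.
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        for (i = 1; i <= 3; i++) {
            if (x[i] + 0 < y[i] + 0) exit 0
            if (x[i] + 0 > y[i] + 0) exit 1
        }
        exit 1   # versions are equal, so not older
    }'
}

# Read showhost output on stdin and report whether a firmware update is
# needed before applying kernel patch 147440-02 or newer.
check_firmware() {
    required="6.4.6"
    current=$(fw_version)
    if [ -z "$current" ]; then
        echo "could not find a System Firmware line"
        return 2
    fi
    if version_lt "$current" "$required"; then
        echo "$current: older than $required, update the firmware first"
    else
        echo "$current: meets the $required minimum"
    fi
}

# Example using the showhost line quoted in the note above.
printf 'SPARC-Enterprise-T1000 System Firmware 6.7.11 2010/10/12 12:34\n' | check_firmware
```

Comparing field by field as numbers matters here: a plain string comparison would rank 6.10.0 below 6.4.6.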

I mounted an NFS share containing the updated firmware, patch 139434-09, which you can obtain from Oracle.

Next I ran the following commands:

 # ./sysfwdownload Sun_System_Firmware-6_7_12-SPARC_Enterprise_T2000.bin  
 .......... (10%).......... (20%).......... (30%).......... (41%)..........   
 (51%).......... (61%).......... (71%).......... (82%).......... (92%)........ (100%)  
 Download completed successfully  
   
 # init 0  

You should now be back at the 'ok' prompt. Next, on the ALOM:

 sc> poweroff  
 SC Alert: SC Request to Power Off Host.  
   
 SC Alert: Host system has shut down.  
   
 sc> setkeyswitch -y normal  
 sc> flashupdate -s 127.0.0.1  
 sc> resetsc  
   

When the SC comes back, you should be at the latest revision. We booted the system, and it came up without errors.

2 comments:

  1. Heya buddy, I'm having trouble getting access to the firmware in question, and I've run into this same issue while installing Sol10 on a T2000. Could I please have the firmware file that you used?

    I don't have a CUA account with ORACLE either. darren@wisecorp.co.uk

    Replies
    1. Just seeing this since I haven't kept up with this blog - I no longer have access to the firmware either =(
