
Thursday, November 1, 2012

Old systems without sed in-place editing

I have a habit of using sed -i to edit files in place. If you come across a system with an older version of sed that lacks in-place editing, here is a quick way to do the same thing with perl instead.

 perl -pi -e 's/\n/\r\n/g' file.txt

The example above converts Linux line endings to Windows line endings (file.txt is a placeholder for the file you want to edit). This came up while writing my previous post about using base64 to attach ASCII text files.

Simple enough.

Mailing attachments. No UUencode; no mailx -a or mutt

On a lot of older systems you probably have uuencode for adding attachments to mail / mailx from a *nix system. If your system is missing uuencode or doesn't have a mail client that supports attachments, you can try base64 instead. Use the base64 command if it is available; if not, you can use 'openssl base64'.
Below is a snippet that works and gets passed directly to sendmail. In this case the attachment was an ASCII text file.
The parentheses run everything in a subshell, so the whole block of text gets passed to sendmail and parsed correctly. Without the headers, the recipient would just see a base64 blob of text.

 ( echo "to: $recipients"
  echo "subject: Report for rsync for from_leads on `date`"
  echo "mime-version: 1.0"
  # multipart/mixed is the usual type for a body plus attachments
  echo "content-type: multipart/mixed; boundary=xBoundaryStringx"
  echo
  echo "--xBoundaryStringx"
  echo "content-type: text/plain"
  # a blank line separates the part headers from the part body
  echo
  echo "Body Text"
  echo "Body Text"
  echo "--xBoundaryStringx"
  echo "content-type: text/plain; charset=us-ascii; name=$filename"
  echo "content-transfer-encoding: base64"
  echo
  openssl base64 < "$filename"
  # the closing boundary marks the end of the multipart message
  echo "--xBoundaryStringx--" ) | sendmail -t -i

This is what I had to use to send log attachments from a RHEL 5.8 system that had no mail client supporting attachments, did not have uuencode, and was not allowed to have any further packages installed.
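
If you need this in more than one script, the same idea can be wrapped in a function that writes the message to stdout, so you can inspect it before piping it to sendmail -t -i. This is only a sketch: build_mail, its arguments, and the example address are hypothetical names. It uses multipart/mixed, the usual MIME type for a body plus attachments, and includes the blank line after each part's headers and the closing boundary that MIME expects.

```shell
# Emit a MIME message with one base64 attachment on stdout.
# build_mail and its arguments are hypothetical names for this sketch.
build_mail() {
  recipients=$1
  subject=$2
  attachment=$3
  boundary="xBoundaryStringx"

  echo "to: $recipients"
  echo "subject: $subject"
  echo "mime-version: 1.0"
  echo "content-type: multipart/mixed; boundary=$boundary"
  echo
  echo "--$boundary"
  echo "content-type: text/plain"
  echo
  echo "Body Text"
  echo "--$boundary"
  echo "content-type: text/plain; charset=us-ascii; name=$(basename "$attachment")"
  echo "content-transfer-encoding: base64"
  echo
  # Fall back to openssl when the base64 command is missing.
  if command -v base64 >/dev/null 2>&1; then
    base64 < "$attachment"
  else
    openssl base64 < "$attachment"
  fi
  echo "--$boundary--"
}

# Demo with a throwaway log file; nothing is actually sent.
logfile=$(mktemp)
printf 'rsync finished\n' > "$logfile"
message=$(build_mail "admin@example.com" "Report" "$logfile")
echo "$message"
rm -f "$logfile"
```

To actually send it, replace the command substitution with: build_mail "$recipients" "$subject" "$filename" | sendmail -t -i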

Thursday, September 13, 2012

Sabertooth P67 w/ SSD and Stop 0XF4

I upgraded my desktop system at home (a gaming PC) with a Samsung 830 series 128GB SSD. Before I even tried to install Windows on the drive, just having it connected in the case caused issues. The system would randomly blue screen with a stop 0XF4. I started digging around and couldn't really find a definitive answer.

I looked over the specs on my motherboard and realized I had two different SATA III controllers: the Intel and the Marvell. I never had issues in the past, so I never thought to really check. I just knew the brown and grey ports were SATA III. Apparently the Marvell controller is inferior to the Intel one. I moved the SATA cables off the Marvell controller and onto the Intel-controlled ports, then disabled the Marvell controller in the BIOS since I wouldn't be using it. The system is currently running stable with zero issues.

Now my Linux OS will have an additional TB for storage.

Thursday, August 16, 2012

Solaris 10; SVM mirrors. Maintenance, last-erred

A server that had not been in use recently is about to be put back into service. We wanted to check everything out on it, and this was the status of its mirrors. I brought the box down to single-user mode to investigate further.

 # metastat -c  
 d106       m  22GB d16 d26 (resync-66%)  
   d16     s  22GB c1t0d0s6  
   d26     s  22GB c1t1d0s6 (resyncing)  
 d9        m  52GB d19 (maint) d29 (maint)  
   d19     s  52GB c1t2d0s6  
   d29     s  52GB c1t3d0s6  
 d8        m  16GB d18 (maint) d28 (maint)  
   d18     s  16GB c1t2d0s1  
   d28     s  16GB c1t3d0s1  
 d4        m  10GB d14 d24 (maint)  
   d14     s  10GB c1t0d0s4  
   d24     s  10GB c1t1d0s4 (maint)  
 d0        m  20GB d20 (maint) d10 (maint)  
   d20     s  20GB c1t1d0s0 (resyncing)  
   d10     s  20GB c1t0d0s0 (last-erred)  
 d1        m  16GB d11 d21  
   d11     s  16GB c1t0d0s1  
   d21     s  16GB c1t1d0s1

Right before I captured this, I had already run the following in single-user mode, which is why d106 shows as resyncing:

 metareplace -e d106 c1t1d0s6

I also did the following commands (waiting for each one to finish syncing before the next):

 # metareplace -e d4 c1t1d0s4  
 d4: device c1t1d0s4 is enabled  
 # metasync d8  
 # metasync d9
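
With this many mirrors, it helps to filter the metastat -c listing down to the slices that still need attention. The awk filter below is a rough sketch that runs against a few lines copied from the output above. On the real box you would pipe metastat -c into it instead. The variable names are made up.

```shell
# A few lines copied from the metastat -c output earlier in the post.
metastat_sample='d4        m  10GB d14 d24 (maint)
  d14     s  10GB c1t0d0s4
  d24     s  10GB c1t1d0s4 (maint)
d0        m  20GB d20 (maint) d10 (maint)
  d20     s  20GB c1t1d0s0 (resyncing)
  d10     s  20GB c1t0d0s0 (last-erred)'

# Keep only submirror lines (type "s") flagged maint or last-erred,
# and print the metadevice, the slice, and the state.
needs_work=$(printf '%s\n' "$metastat_sample" |
  awk '$2 == "s" && /\((maint|last-erred)\)/ { print $1, $4, $5 }')
echo "$needs_work"
```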

I wanted to get the other mirrors out of maintenance in case I had to replace a disk. Now all I was left with was mirror d0. iostat -E shows soft and hard errors on sd1, which is d10.

 # iostat -E  
 sd1    Soft Errors: 6 Hard Errors: 1131 Transport Errors: 0  
 Vendor: FUJITSU Product: MAY2073RCSUN72G Revision: 0401 Serial No: 6643S13H5W  
 Size: 73.41GB <73407865856 bytes>  
 Media Error: 969 Device Not Ready: 0 No Device: 162 Recoverable: 6  
 Illegal Request: 1 Predictive Failure Analysis: 0
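
On a box with many disks, the same check can be scripted. This is a small sketch, assuming the iostat -E layout shown above. It parses a sample copied from that output rather than calling iostat, and the variable names are made up.

```shell
# Sample copied from the iostat -E output above; on a live Solaris
# system you would pipe "iostat -E" into the awk command instead.
iostat_sample='sd1    Soft Errors: 6 Hard Errors: 1131 Transport Errors: 0
Vendor: FUJITSU Product: MAY2073RCSUN72G Revision: 0401 Serial No: 6643S13H5W'

# On each summary line, find the "Hard Errors: N" field and print
# the device name and count when N is greater than zero.
failing=$(printf '%s\n' "$iostat_sample" |
  awk '/Hard Errors:/ {
         for (i = 1; i <= NF; i++)
           if ($i == "Hard" && $(i + 2) + 0 > 0) print $1, $(i + 2)
       }')
echo "$failing"
```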

Metastat of the mirror itself:

 # metastat d0  
 d0: Mirror  
   Submirror 0: d20  
    State: Needs maintenance  
   Submirror 1: d10  
    State: Needs maintenance  
   Pass: 1  
   Read option: roundrobin (default)  
   Write option: parallel (default)  
   Size: 41945472 blocks (20 GB)  
 d20: Submirror of d0  
   State: Needs maintenance  
   Invoke: metasync d0  
   Size: 41945472 blocks (20 GB)  
   Stripe 0:  
     Device   Start Block Dbase    State Reloc Hot Spare  
     c1t1d0s0     0   No    Resyncing  Yes  
 d10: Submirror of d0  
   State: Needs maintenance  
   Invoke: after replacing "Maintenance" components:  
         metareplace d0 c1t0d0s0 <new device>  
   Size: 41945472 blocks (20 GB)  
   Stripe 0:  
     Device   Start Block Dbase    State Reloc Hot Spare  
     c1t0d0s0     0   No   Last Erred  Yes

Submirror d10 is 'Last Erred' because it was the last valid copy of the volume before needing maintenance. Submirror d20 is stuck trying to resync, but it cannot complete. My only choice was to attempt to fix the errors on the disk. I went through /var/adm/messages and found errors from disk c1t0d0, but only from the root slice, s0, which belongs to the d0 mirror.

I opted to try an analyze read on just that section of the disk. First, to get the sectors of the slice:

 # prtvtoc /dev/rdsk/c1t0d0s2  
 * /dev/rdsk/c1t0d0s2 partition map  
 *  
 * Dimensions:  
 *   512 bytes/sector  
 *   424 sectors/track  
 *   24 tracks/cylinder  
 *  10176 sectors/cylinder  
 *  14089 cylinders  
 *  14087 accessible cylinders  
 *  
 * Flags:  
 *  1: unmountable  
 * 10: read-only  
 *  
 * Unallocated space:  
 *    First   Sector  Last  
 *    Sector   Count  Sector  
 *      0 33560448 33560447  
 *  
 *                        First    Sector      Last  
 * Partition Tag Flags   Sector     Count    Sector Mount Directory  
     0        2   00   33560448  41945472  75505919  
     1        3   01          0  33560448  33560447  
     2        5   00          0 143349312 143349311  
     3        0   00   75505920     91584  75597503  
     4        7   00   75597504  20972736  96570239  
     6        0   00   96570240  46687488 143257727  
     7        0   00  143257728     91584 143349311
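
The start and end blocks for a slice can also be pulled out with awk, which doubles as a sanity check: the last sector should equal first sector + sector count - 1. A minimal sketch against two partition lines copied from the output above (variable names are made up):

```shell
# Two partition lines copied from the prtvtoc output above.
# On Solaris this would come from: prtvtoc /dev/rdsk/c1t0d0s2
vtoc_sample='     0        2   00   33560448  41945472  75505919
     1        3   01          0  33560448  33560447'

slice=0
# Skip the "*" comment lines, match the slice number, and compute
# last = first + count - 1 as a cross-check of the Last Sector column.
range=$(printf '%s\n' "$vtoc_sample" |
  awk -v s="$slice" '$1 !~ /^\*/ && $1 == s { printf "%d %d\n", $4, $4 + $5 - 1 }')
echo "$range"
```

For slice 0 that is 33560448 + 41945472 - 1 = 75505919, which matches the Last Sector column.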

Next, I went into format, selected the disk and chose analyze. Using the sector information above for slice 0, I entered the following:

 analyze> setup  
 Analyze entire disk[yes]? no  
 Enter starting block number[0, 0/0/0]: 33560448  
 Enter ending block number[143349311, 14086/23/423]: 75505919  
 Loop continuously[no]? no  
 Enter number of passes[2]:  
 Repair defective blocks[yes]?  
 Stop after first error[no]?  
 Use random bit patterns[no]?  
 Enter number of blocks per transfer[126]: 1  
 Verify media after formatting[yes]?  
 Enable extended messages[no]?  
 Restore defect list[yes]?  
 Restore disk label[yes]?  
 analyze> read  
 Ready to analyze (won't harm SunOS). This takes a long time,  
 but is interruptable with CTRL-C. Continue? y

And here I sit...waiting for this analyze to finish.

Wednesday, August 15, 2012

Solaris 10. Enterprise T2000 xt_sync; timeout and kernel panic.

An Enterprise T2000 running Solaris 10 was updated with an extensive patch set, including kernel patches.

After the system rebooted, we started getting kernel panics and the system would not boot up.

 Loading: /platform/SUNW,SPARC-Enterprise-T2000/kernel/sparcv9/unix  
 Loading: /platform/sun4v/kernel/sparcv9/unix  
 SunOS Release 5.10 Version Generic_147440-12 64-bit  
 Copyright (c) 1983, 2012, Oracle and/or its affiliates. All rights reserved.  
 os-io Cross trap sync timeout: at cpu_sync.xword[1]: 0x1010panic: failed to stop cpu8  
 panic: failed to stop cpu9  
 panic: failed to stop cpu10  
 panic: failed to stop cpu11  
 panic: failed to stop cpu12  
 panic: failed to stop cpu13  
 panic: failed to stop cpu14  
 panic: failed to stop cpu15  
   
 panic[cpu1]/thread=2a100bbfca0: xt_sync: timeout

One patch in particular was causing the problem: 147440-12. I found this snippet online, which led me to 147440-12.

 [Patch]  
  147440-02 or newer SunOS 5.10: Solaris kernel patch  
   
 [Note]  
  If you apply this patch to a system with firmware version older than   
  6.4.6, system panic occurs during booting and the following message is   
  output.  
   
  <Example of message>  
   ----------------------------------------------------------------------  
   panic[cpu14]/thread=2a1027f5ca0: xt_sync: timeout  
   ----------------------------------------------------------------------  
   
  <Environment>  
   This problem occurs on the following models.  
    - Sun Fire T1000/T2000  
    - SPARC Enterprise T1000/T2000  
   
  <Conditions>  
   Firmware version is older than 6.4.6. (*1)  
   
   *1) To confirm, execute the showhost command on ALOM.  
    In the following example, firmware version is 6.7.11.  
   
     <Example of command execution>  
      sc> showhost  
      SPARC-Enterprise-T1000 System Firmware 6.7.11 2010/10/12 12:34  
      Host flash versions:  
       OBP 4.30.4.b 2010/07/09 13:43  
       Hypervisor 1.7.3.c 2010/07/09 15:14  
       POST 4.30.4.b 2010/07/09 14:25
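
If you want to check this condition before patching a batch of machines, a dotted version comparison is enough. The helper below is a sketch: version_lt is a hypothetical function name, and the version string is typed in by hand from the System Firmware line of showhost.

```shell
# version_lt A B: succeed when dotted version A is older than B.
# Hypothetical helper; compares the first three numeric fields.
version_lt() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    for (i = 1; i <= 3; i++) {
      if (x[i] + 0 < y[i] + 0) exit 0
      if (x[i] + 0 > y[i] + 0) exit 1
    }
    exit 1   # equal versions are not "older"
  }'
}

# Version as printed on the "System Firmware" line of showhost.
firmware="6.7.11"
if version_lt "$firmware" "6.4.6"; then
  verdict="update firmware before patching"
else
  verdict="firmware ok"
fi
echo "$firmware: $verdict"
```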

Running showhost on our T2000 revealed we had firmware 6.3.x. Our system wasn't bootable, so I booted into single-user mode from a CD-ROM (boot cdrom -s).

I mounted an NFS share containing the updated firmware, patch 139434-09, which you can obtain from Oracle.

Next I ran the following commands:

 # ./sysfwdownload Sun_System_Firmware-6_7_12-SPARC_Enterprise_T2000.bin  
 .......... (10%).......... (20%).......... (30%).......... (41%)..........   
 (51%).......... (61%).......... (71%).......... (82%).......... (92%)........ (100%)  
 Download completed successfully  
   
 # init 0

You should now be back at the 'ok' prompt. Next, on the ALOM:

 sc> poweroff  
 SC Alert: SC Request to Power Off Host.  
   
 SC Alert: Host system has shut down.  
   
 sc> setkeyswitch -y normal  
 sc> flashupdate -s 127.0.0.1  
 sc> resetsc

When the SC comes back, you should be at the latest revision. We booted the system, and it came up without errors.

Monday, August 13, 2012

Migrating old servers...

Migrating an older server to a newer Linux distro that uses SELinux? If it's an FTP server set up with chroot jails to /home/ directories, you will need to set the following:

/usr/sbin/setsebool -P ftp_home_dir 1

Friday, August 10, 2012

bash: event not found

There are tons of small differences between bash and sh. Some you will never encounter. The one I hit every so often is !, the exclamation point. It is also the logical not operator.

We have a script that runs Sun's explorer and tells explorer not to run nbu and fma, designated by !nbu,!fma.

The script is an sh script. I had typed the command by hand on another machine that was missing the script, but kept getting the error 'event not found.' It took me a few seconds to figure out what was happening. The ! has special meaning in the bash shell, and in some cases it has to be escaped depending on its usage.

! can be used for history expansion. Example: 400 commands ago, at number 398 in your history, you ran something quite complex. You could retype it, copy it, or just run it again with !398. So when I typed the command for explorer with !nbu,!fma, bash tried to find those events in the history, and they don't exist.

Be mindful when using this. Say a few days ago you were in /tmp/some-directory and did sudo rm -rf ./ to remove some things. That's now on line 189 of your history. If you're now in the /etc directory and accidentally run '!189', bad things will happen.

Give the bash manual a read regarding history expansion.
http://www.gnu.org/software/bash/manual/bashref.html#Bash-History-Facilities
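
For reference, here are a few ways to get a literal ! past an interactive bash. This is a sketch: the variable names are made up, and only the !nbu,!fma string comes from the script mentioned above.

```shell
# Single quotes stop bash from treating ! as a history reference.
explorer_args='!nbu,!fma'
echo "$explorer_args"

# A backslash before each ! works too.
escaped=\!nbu,\!fma
echo "$escaped"

if [ -n "${BASH_VERSION:-}" ]; then
  # Turn history expansion off entirely for this shell.
  set +H
  # Or keep it, but have bash load the expanded line for review
  # instead of running it straight away (applies when readline is in use).
  shopt -s histverify
fi
```

The histverify option is a cheap safety net for the accidental !189 case: the expanded command is put back on the prompt and you have to press Enter a second time to run it.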