A server that had not been in use recently is about to be put back into service. We wanted to check everything on it first, and this was the status of its mirrors. I brought the box down to single-user mode to investigate further.
# metastat -c
d106 m 22GB d16 d26 (resync-66%)
d16 s 22GB c1t0d0s6
d26 s 22GB c1t1d0s6 (resyncing)
d9 m 52GB d19 (maint) d29 (maint)
d19 s 52GB c1t2d0s6
d29 s 52GB c1t3d0s6
d8 m 16GB d18 (maint) d28 (maint)
d18 s 16GB c1t2d0s1
d28 s 16GB c1t3d0s1
d4 m 10GB d14 d24 (maint)
d14 s 10GB c1t0d0s4
d24 s 10GB c1t1d0s4 (maint)
d0 m 20GB d20 (maint) d10 (maint)
d20 s 20GB c1t1d0s0 (resyncing)
d10 s 20GB c1t0d0s0 (last-erred)
d1 m 16GB d11 d21
d11 s 16GB c1t0d0s1
d21 s 16GB c1t1d0s1
Right before I captured this, I had already run the following in single-user mode, which is why d106 shows as resyncing:
# metareplace -e d106 c1t1d0s6
I also ran the following commands, waiting for each resync to finish before starting the next:
# metareplace -e d4 c1t1d0s4
d4: device c1t1d0s4 is enabled
# metasync d8
# metasync d9
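The wait between commands could be scripted along these lines. This is only a minimal sketch: `wait_for_resync` is a hypothetical helper name, the 60-second poll interval is arbitrary, and it assumes that `metastat -c <mirror>` prints the word "resync" for as long as a resync is running, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: block until the given SVM mirror stops reporting a resync.
# Assumption: `metastat -c <mirror>` prints "resync" while a resync is in progress.
wait_for_resync() {
    mirror=$1
    interval=${2:-60}   # seconds between polls (arbitrary default)
    while metastat -c "$mirror" | grep resync >/dev/null; do
        sleep "$interval"
    done
}

# Example usage (commented out so the sketch has no side effects):
# metareplace -e d4 c1t1d0s4 && wait_for_resync d4
```

Polling `metastat` keeps the helper independent of whichever command started the resync.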
I wanted to get the other mirrors out of maintenance in case I had to replace a disk. That left only mirror d0. iostat -E shows soft and hard errors on sd1, which is the disk behind submirror d10.
# iostat -E
sd1 Soft Errors: 6 Hard Errors: 1131 Transport Errors: 0
Vendor: FUJITSU Product: MAY2073RCSUN72G Revision: 0401 Serial No: 6643S13H5W
Size: 73.41GB <73407865856 bytes>
Media Error: 969 Device Not Ready: 0 No Device: 162 Recoverable: 6
Illegal Request: 1 Predictive Failure Analysis: 0
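On a box with many disks, the failing device can be picked out of this output automatically. The following is a minimal sketch, assuming the summary line has the layout shown above (device name first, then "Hard Errors:" followed by a count). `disks_with_hard_errors` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read `iostat -E` output on stdin and print each device
# whose "Hard Errors:" count is non-zero, as "<device> <count>".
# Assumes summary lines of the form:
#   sd1 Soft Errors: 6 Hard Errors: 1131 Transport Errors: 0
disks_with_hard_errors() {
    awk '{
        for (i = 1; i < NF - 1; i++) {
            if ($i == "Hard" && $(i + 1) == "Errors:" && $(i + 2) + 0 > 0) {
                print $1, $(i + 2)
            }
        }
    }'
}

# Example usage on a live system:
# iostat -E | disks_with_hard_errors
```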
Metastat of the mirror itself:
# metastat d0
d0: Mirror
Submirror 0: d20
State: Needs maintenance
Submirror 1: d10
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 41945472 blocks (20 GB)
d20: Submirror of d0
State: Needs maintenance
Invoke: metasync d0
Size: 41945472 blocks (20 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t1d0s0 0 No Resyncing Yes
d10: Submirror of d0
State: Needs maintenance
Invoke: after replacing "Maintenance" components:
metareplace d0 c1t0d0s0 <new device>
Size: 41945472 blocks (20 GB)
Stripe 0:
Device Start Block Dbase State Reloc Hot Spare
c1t0d0s0 0 No Last Erred Yes
Submirror d10 is 'Last Erred' because it held the last valid copy of the volume before it needed maintenance. Submirror d20 is stuck trying to resync but cannot complete. My only choice was to attempt to fix the errors on the disk. I went through /var/adm/messages and found errors from disk c1t0d0, but only on the root slice, s0, which belongs to the d0 mirror.
I opted to try an analyze read on just that section of the disk. First, to get the sectors of the slice:
# prtvtoc /dev/rdsk/c1t0d0s2
* /dev/rdsk/c1t0d0s2 partition map
*
* Dimensions:
* 512 bytes/sector
* 424 sectors/track
* 24 tracks/cylinder
* 10176 sectors/cylinder
* 14089 cylinders
* 14087 accessible cylinders
*
* Flags:
* 1: unmountable
* 10: read-only
*
* Unallocated space:
* First Sector Last
* Sector Count Sector
* 0 33560448 33560447
*
* First Sector Last
* Partition Tag Flags Sector Count Sector Mount Directory
0 2 00 33560448 41945472 75505919
1 3 01 0 33560448 33560447
2 5 00 0 143349312 143349311
3 0 00 75505920 91584 75597503
4 7 00 75597504 20972736 96570239
6 0 00 96570240 46687488 143257727
7 0 00 143257728 91584 143349311
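The slice boundaries can also be pulled out of this output with a small filter instead of being read off by eye. This is a sketch only: `slice_range` is a hypothetical helper, and it assumes the data rows use the column order shown above (partition, tag, flags, first sector, sector count, last sector). The last sector is recomputed as first + count - 1, which for slice 0 gives 33560448 + 41945472 - 1 = 75505919, matching the table.

```shell
#!/bin/sh
# Hypothetical helper: read `prtvtoc` output on stdin and print the first and
# last sector of the given partition number, as "<first> <last>".
# Assumed column order: Partition Tag Flags First-Sector Sector-Count Last-Sector
# Note: the old Solaris /usr/bin/awk may not accept -v; nawk does.
slice_range() {
    awk -v part="$1" '
        /^\*/ { next }                        # skip the commented header lines
        $1 == part { print $4, $4 + $5 - 1 }  # last = first + count - 1
    '
}

# Example usage on a live system:
# prtvtoc /dev/rdsk/c1t0d0s2 | slice_range 0
```

The two numbers printed are the starting and ending block values to give to the analyze setup.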
Next, I went into format, selected the disk and chose analyze. Using the sector information above for slice 0, I entered the following:
analyze> setup
Analyze entire disk[yes]? no
Enter starting block number[0, 0/0/0]: 33560448
Enter ending block number[143349311, 14086/23/423]: 75505919
Loop continuously[no]? no
Enter number of passes[2]:
Repair defective blocks[yes]?
Stop after first error[no]?
Use random bit patterns[no]?
Enter number of blocks per transfer[126]: 1
Verify media after formatting[yes]?
Enable extended messages[no]?
Restore defect list[yes]?
Restore disk label[yes]?
analyze> read
Ready to analyze (won't harm SunOS). This takes a long time,
but is interruptable with CTRL-C. Continue? y
And here I sit...waiting for this analyze to finish.
Stumbled upon this post. I am in the exact same situation. Can you please post the results and what you did? This would help me immensely.