Wednesday, 13 January 2016

Troubleshooting with ILOM When Exalytics Hangs


It has been a few years since Exalytics was released, and many OBIEE and EPM application administrators have faced this particular situation: the Exalytics machine hangs, meaning its IP address is no longer reachable.

Best suggestion: open an SR with Oracle Support if you face any issue with Exalytics.

Why?

There are not many blog posts or OTN questions about what to do when the Exalytics server itself has an issue. Exalytics is a powerful machine that can host many applications and deliver great performance, so it is always better to get help from Oracle Support for any troubleshooting.

Sharing my experience here.

As the title of this post says, there are instances when Exalytics won't respond even though the machine is still in the ON state.

The best way to troubleshoot this is to use the ILOM IP address configured for the Exalytics machine.

(This applies only if the ILOM IP address is reachable.)
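
One quick way to test that condition from a workstation is to try a TCP connection to the ILOM's SSH port. The sketch below is only an illustration: the function name `ilom_reachable`, the example address in the usage note, and the choice of port 22 (the usual SSH port, which your ILOM may not be using) are all assumptions.

```python
import socket


def ilom_reachable(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 22 is assumed here because the ILOM CLI is normally reached over SSH;
    adjust it if your service processor listens elsewhere.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, `ilom_reachable("192.0.2.10")` (a placeholder address) returns a boolean. A False result only says the port did not answer from where you ran the check; it does not prove that the service processor is down.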

Log in to the ILOM address (using PuTTY).

Take an ILOM snapshot using the CLI:

-> set /SP/diag/snapshot dataset=normal 

Set the transfer method (tftp, ftp, ftps, sftp, http or https; the available protocols depend on the installed system firmware), then run the ILOM snapshot and save it to the target location:
-> set /SP/diag/snapshot dump_uri=[tftp|ftp|sftp|scp|http|https]://[username:password@server_ip_or_server_hostname]/[file-path] 
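
If you prepare these two commands in advance (for a runbook, say), a small helper can assemble them from their parts. This is a minimal sketch, not an Oracle tool: `build_snapshot_commands` and its parameters are hypothetical names, the protocol list is copied from the command template above and may not match what your firmware accepts, and credentials are inserted verbatim with no escaping of special characters.

```python
# Protocols as listed in the dump_uri template; your firmware may accept fewer.
ALLOWED_PROTOCOLS = ("tftp", "ftp", "sftp", "scp", "http", "https")


def build_snapshot_commands(protocol, host, path,
                            username=None, password=None, dataset="normal"):
    """Return the two ILOM CLI command strings used to take a snapshot."""
    if protocol not in ALLOWED_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {protocol!r}")

    # Optional "user:password@" prefix, inserted exactly as given (assumption:
    # no characters that would need escaping inside a URI).
    credentials = ""
    if username is not None:
        credentials = username
        if password is not None:
            credentials += ":" + password
        credentials += "@"

    uri = f"{protocol}://{credentials}{host}/{path.lstrip('/')}"
    return [
        f"set /SP/diag/snapshot dataset={dataset}",
        f"set /SP/diag/snapshot dump_uri={uri}",
    ]
```

Calling `build_snapshot_commands("sftp", "10.0.0.5", "/tmp/snapshots", "admin", "secret")` yields the two `set` lines ready to paste at the `->` prompt. The host, path and credentials in that call are made-up values.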

To confirm whether the snapshot is still running or has completed:
-> show /SP/diag/snapshot 


/SP/diag/snapshot 
Targets: 

Properties: 
dataset = normal 
dump_uri = (Cannot show property) 
encrypt_output = false 
result = Collecting data into 
sftp://admin:*****@127.0.0.1/<folder>/<machinename>_<datetime>.zip 
Snapshot Complete. 
Done. 


Commands: 
cd 
set 
show 
Make sure the snapshot has completed before doing any other troubleshooting.
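
If you save the text of `show /SP/diag/snapshot` from your terminal session, the completion check can be scripted. The sketch below assumes the output has the same shape as the listing above, with the `result` property spilling over several lines; the function names are hypothetical, and the only completion marker it looks for is the "Snapshot Complete" text shown in that listing.

```python
import re

# A property line looks like "name = value"; continuation lines do not match.
PROPERTY_RE = re.compile(r"^(\w+)\s*=\s*(.*)$")
SECTION_HEADERS = ("Targets:", "Properties:", "Commands:")


def snapshot_result(show_output):
    """Return the 'result' property as one string, joined across lines."""
    section = None
    in_result = False
    collected = []
    for raw in show_output.splitlines():
        line = raw.strip()
        if line in SECTION_HEADERS:
            section = line
            in_result = False
            continue
        if section != "Properties:" or not line:
            continue
        match = PROPERTY_RE.match(line)
        if match:
            # A new property starts; remember whether it is 'result'.
            in_result = match.group(1) == "result"
            if in_result:
                collected.append(match.group(2))
        elif in_result:
            # Continuation of a multi-line 'result' value.
            collected.append(line)
    return " ".join(collected)


def snapshot_is_complete(show_output):
    """True once the result text contains the 'Snapshot Complete' marker."""
    return "Snapshot Complete" in snapshot_result(show_output)
```

Paste the captured output into a string and pass it to `snapshot_is_complete`. If it returns False, run the `show` command again after a while and re-check before moving on.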

The ILOM snapshot is very important for post-incident analysis of why the issue occurred on the Exalytics machine in the first place.

After the snapshot completes successfully, check the status of /SYS to confirm that the system is still powered ON even though you cannot reach it.

-> show /SYS 

/SYS 
Targets: 
MB 
SP 
PS0 
PS1 
FB 
DBP 
INTSW 
VPS 
T_AMB 
OK 
SERVICE 
LOCATE 
PS_FAULT 
FAN_FAULT 
TEMP_FAULT 
CPU_FAULT 
MEMORY_FAULT 

Properties: 
type = Host System 
ipmi_name = /SYS 
product_name = 
product_part_number =  
product_serial_number = 
product_manufacturer = Oracle Corporation 
fault_state = OK 
clear_fault_action = (none) 
power_state = ON

Commands: 
cd 
reset 
set 
show 
start 
stop 
If power_state is ON, as shown in the output above, the Exalytics base machine is still powered on but cannot be reached. If it is OFF, the machine is shut down.
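
Reading that property out of saved `show /SYS` output is a one-line pattern match. This is just a sketch with a hypothetical function name, and it assumes the `power_state = ON` layout shown in the listing above.

```python
import re


def power_state(show_sys_output):
    """Return the power_state value in upper case, or None if it is absent."""
    match = re.search(r"^\s*power_state\s*=\s*(\S+)\s*$",
                      show_sys_output, re.MULTILINE)
    # Normalise case so "On", "ON" and "on" compare equal.
    return match.group(1).upper() if match else None
```

A return value of None means the text did not contain a power_state line at all, which usually indicates that the wrong output was captured.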

If the state is OFF, you can start the system directly with start /SYS.

If the machine is in the ON state, first try to stop it, either from the CLI or from the console:

stop /SYS 

If this doesn't bring the machine to power_state = OFF, try the force option:

stop -force /SYS

This forces the system to power off immediately.

stop -force /SYS 
Are you sure you want to immediately stop /SYS (y/n)? y 
Stopping /SYS immediately 
Check the status again with show /SYS. If power_state shows OFF, go ahead and run start /SYS.

This powers the system back on and brings it online. You can then start the applications.
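
The stop, force-stop and start sequence above boils down to a small decision rule. The sketch below expresses it as a function; the name `next_ilom_command` and its parameters are hypothetical, and it assumes you have already established that the host is hung, since it will suggest stopping a machine that reports ON.

```python
def next_ilom_command(power_state, graceful_stop_tried=False):
    """Suggest the next ILOM CLI command for a hung host.

    power_state         -- value read from 'show /SYS' ("ON" or "OFF")
    graceful_stop_tried -- True if 'stop /SYS' was already issued without effect
    """
    state = power_state.strip().upper()
    if state == "OFF":
        # Host is down: power it back on.
        return "start /SYS"
    if state == "ON":
        # Try a normal stop first; fall back to the forced stop only after that.
        return "stop -force /SYS" if graceful_stop_tried else "stop /SYS"
    raise ValueError(f"unexpected power_state: {power_state!r}")
```

The function only returns the command text. The command itself still has to be entered at the ILOM prompt, and the forced stop asks for a y/n confirmation as shown in the transcript above.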

If you would rather not restart using commands, you can use the ILOM web interface to power-cycle the machine instead.

Refer to https://docs.oracle.com/cd/E20815_01/html/E20819/gjfcb.html for how to power-cycle using the web interface.

After reboot:

Check that all the services are running properly.
Mount any filesystems if needed.
Collect the OSWatcher logs.
Collect a sosreport from Linux.

Upload all of this information to Oracle Support to get the root cause and a permanent resolution.



Note:

- All the commands are case-sensitive.
- Sometimes multiple stop and start attempts are required to get the machine online.

References :

https://docs.oracle.com/cd/E19569-01/820-1188-12/core_ilom_appa.html 
https://docs.oracle.com/cd/E19902-01/html/821-1611/givdc.html
https://docs.oracle.com/cd/E19140-01/html/821-0284/gkbtw.html
https://docs.oracle.com/cd/E19121-01/sf.x2250/820-4592-12/ilom_rem_console.html


The steps above are the common troubleshooting I follow when I hit this issue. Please note that the same steps might not work for you, so it is better to check with Oracle Support right away.
