It has been a few years since Exalytics was released, and many OBIEE and EPM application administrators have faced the situation where the Exalytics machine hangs, meaning the machine's IP address is not reachable.
Best suggestion: open an SR with Oracle Support if you face any issue with Exalytics.
Why?
There are not many blogs or questions on OTN about what to do when an Exalytics server has an issue. Exalytics is a powerful machine that can host many applications and deliver great performance, so it is always better to get help from Oracle Support for any troubleshooting.
Sharing my experience here.
As the blog post says, there are instances when the Exalytics machine won't respond even though it is still powered ON.
The best troubleshooting approach here is to use the ILOM IP address configured for the Exalytics machine (provided the ILOM IP address is reachable).
Log in to the ILOM address (using PuTTY).
Take an ILOM snapshot using the CLI:
-> set /SP/diag/snapshot dataset=normal
Set the transfer method (tftp, ftp, ftps, sftp, http or https; the available protocols depend on the installed system firmware), then run the ILOM snapshot and save it:
-> set /SP/diag/snapshot dump_uri=[tftp|ftp|sftp|scp|http|https]://[username:password@server_ip_or_server_hostname]/[file-path]
To confirm whether the snapshot is still running or has completed:
-> show /SP/diag/snapshot
/SP/diag/snapshot
Targets:
Properties:
dataset = normal
dump_uri = (Cannot show property)
encrypt_output = false
result = Collecting data into
sftp://admin:*****@127.0.0.1/<folder>/<machinename>_<datetime>.zip
Snapshot Complete.
Done.
Commands:
cd
set
show
Make sure the snapshot has completed before doing any other troubleshooting.
The ILOM snapshot is very important for post-incident analysis of why the issue occurred on the Exalytics machine in the first place.
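If you capture the output of show /SP/diag/snapshot into a text file or script, the completion check can be automated. The following is only a minimal sketch: the function name snapshot_complete is hypothetical, and it assumes the firmware prints the phrase "Snapshot Complete" as in the sample output above. Other firmware versions may word the result differently.

```python
def snapshot_complete(show_output: str) -> bool:
    """Return True if the captured `show /SP/diag/snapshot` text reports completion.

    Assumption: completion is signalled by the phrase "Snapshot Complete",
    as in the sample output in this post.
    """
    # Collapse whitespace and ignore case so wrapped lines still match.
    normalised = " ".join(show_output.lower().split())
    return "snapshot complete" in normalised
```

While the function returns False, the snapshot is still being collected (or has failed), so hold off on stopping or starting /SYS.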
After the snapshot completes successfully, check the status of /SYS to confirm that the system is still powered ON even though you cannot reach it.
-> show /SYS
/SYS
Targets:
MB
SP
PS0
PS1
FB
DBP
INTSW
VPS
T_AMB
OK
SERVICE
LOCATE
PS_FAULT
FAN_FAULT
TEMP_FAULT
CPU_FAULT
MEMORY_FAULT
Properties:
type = Host System
ipmi_name = /SYS
product_name =
product_part_number =
product_serial_number =
product_manufacturer = Oracle Corporation
fault_state = OK
clear_fault_action = (none)
power_state = ON
Commands:
cd
reset
set
show
start
stop
If power_state is ON, as in the output above, the Exalytics base machine is still powered on but cannot be reached. If it is OFF, the machine is shut down.
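Reading power_state by eye is fine for a one-off, but if you save the show /SYS output during an incident it can help to extract the properties programmatically. This is a rough sketch only: parse_properties and power_state are hypothetical helper names, and the code assumes properties appear as "name = value" lines, as in the output above.

```python
from typing import Dict, Optional


def parse_properties(show_output: str) -> Dict[str, str]:
    """Collect `name = value` pairs from captured ILOM `show` output."""
    props: Dict[str, str] = {}
    for line in show_output.splitlines():
        # partition() splits on the first '=' only; lines without '=' are skipped.
        name, sep, value = line.partition("=")
        if sep:
            props[name.strip()] = value.strip()
    return props


def power_state(show_output: str) -> Optional[str]:
    """Return the power_state value (for example "ON"), or None if absent."""
    return parse_properties(show_output).get("power_state")
```

Target names such as MB or PS0 and the Commands list contain no "=", so they are ignored by the parser.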
If it is in the OFF state, you can start the system directly using:
start /SYS
But when the machine is in the ON state, first try to stop it, either from the CLI or from the console:
stop /SYS
If this doesn't stop the machine (that is, power_state does not change to OFF), try the force option, which powers the system off forcefully:
-> stop -force /SYS
Are you sure you want to immediately stop /SYS (y/n)? y
Stopping /SYS immediately
Check the status again with show /SYS. If power_state shows OFF, go ahead and run:
start /SYS
This will power on the system and bring it back online. You can then start the applications.
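The stop, force-stop and start sequence above amounts to a small decision rule, which can be sketched as follows. The function next_command and its graceful_stop_tried flag are hypothetical names for illustration, and the code only returns the ILOM command to type next; it does not connect to ILOM or run anything.

```python
def next_command(power_state: str, graceful_stop_tried: bool = False) -> str:
    """Suggest the next ILOM CLI command for a hung host, per the steps in this post."""
    state = power_state.strip().upper()
    if state == "OFF":
        # Host is powered off: power it back on.
        return "start /SYS"
    if state == "ON":
        # Host is on but unreachable: graceful stop first, force only as a fallback.
        return "stop -force /SYS" if graceful_stop_tried else "stop /SYS"
    raise ValueError(f"unexpected power_state: {power_state!r}")
```

For example, next_command("ON") suggests stop /SYS, and next_command("ON", graceful_stop_tried=True) suggests stop -force /SYS. Take the ILOM snapshot before acting on either suggestion.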
If you would rather not restart using commands and prefer to power cycle from the ILOM web interface, refer to
https://docs.oracle.com/cd/E20815_01/html/E20819/gjfcb.html
After reboot:
Check that all the services are running properly.
Mount any filesystems if needed.
Collect the OSWatcher logs.
Collect the Linux sosreport.
Upload all the information to Oracle Support and get the root cause so that a permanent resolution can be found.
Note:
- All the commands are case sensitive.
- Sometimes multiple stop and start attempts are required to get the machine online.
References :
https://docs.oracle.com/cd/E19569-01/820-1188-12/core_ilom_appa.html
https://docs.oracle.com/cd/E19902-01/html/821-1611/givdc.html
https://docs.oracle.com/cd/E19140-01/html/821-0284/gkbtw.html
https://docs.oracle.com/cd/E19121-01/sf.x2250/820-4592-12/ilom_rem_console.html
The above are the common troubleshooting steps I would follow when I get this issue. Please note that the same steps might not work for you, so it is better to check with Oracle Support right away.