Page 3 - Contents; Preface
Contents
Preface vii
1. Initial Inspection of the Server 1
    Service Troubleshooting Flowchart 1
    Gathering Service Information 2
    System Inspection 3
    Troubleshooting Power Problems 3
    Externally Inspecting the Server 3
    Internally Inspecting the Server 4
2. Using SunVTS Diagnostic Software 7
    Running ...
Page 4 - Event Logs and POST Codes
Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008
    Uncorrectable DIMM Errors 12
    Correctable DIMM Errors 14
    BIOS DIMM Error Messages 15
    DIMM Fault LEDs 15
    Isolating and Correcting DIMM ECC Errors 18
A. Event Logs and POST Codes 21
    Viewing Event Logs 21
    Power-On Self-Test (POS...
Page 5 - Handling of Uncorrectable Errors; Index
    Handling of Uncorrectable Errors 53
    Handling of Correctable Errors 56
    Handling of Parity Errors (PERR) 59
    Handling of System Errors (SERR) 61
    Handling Mismatching Processors 63
    Hardware Error Handling Summary 64
Index 69
Page 7 - Before You Read This Document
Preface The Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide contains information and procedures for using available tools to diagnose problems with the servers. Before You Read This Document It is important that you review the safety guidelines in the Sun Fire X4140, X4240, and X4440 ...
Page 8 - Related Documentation
Related Documentation The document set for the Sun Fire X4140, X4240, and X4440 Servers is described in the Where To Find Sun Fire X4140, X4240, and X4440 Servers Documentation sheet that is packed with your system. You can...
Page 9 - Typographic Conventions; Third-Party Web Sites
Typographic Conventions Third-Party Web Sites Sun™ is not responsible for the availability of third-party web sites mentioned in this document. Sun does not endorse and is not responsible or liable for any content, advertising, products, or other materials that are available on or through ...
Page 10 - Sun Welcomes Your Comments
Sun Welcomes Your Comments Sun is interested in improving its documentation and welcomes your comments and suggestions. You can submit your comments by going to: http://www.sun.com/hwdocs/feedback Please include the title and ...
Page 11 - Initial Inspection of the Server; Service Troubleshooting Flowchart
Chapter 1 Initial Inspection of the Server This chapter includes the following topics: ■ “Service Troubleshooting Flowchart” on page 1 ■ “Gathering Service Information” on page 2 ■ “System Inspection” on page 3 Service Troubleshooting Flowchart Use the following flowchart as a guideline for ...
Page 12 - Gathering Service Information; Collect information about the following items:
Gathering Service Information The first step in determining the cause of a problem with the server is to gather information from the service-call paperwork or the onsite personnel. Use the following general guideline steps when...
Page 13 - System Inspection; Troubleshooting Power Problems; Externally Inspecting the Server
System Inspection Controls that have been improperly set and cables that are loose or improperly connected are common causes of problems with hardware components. Troubleshooting Power Problems ■ If the server will power on, skip this section and go to “Ex...
Page 14 - Internally Inspecting the Server
Internally Inspecting the Server To perform a visual inspection of the internal system: 1. Choose a method for shutting down the server from main power mode to standby power mode. See FIGURE 1-1 and FIGURE 1-2. ■ Graceful sh...
Page 15 - Verify that there are no loose or improperly seated components.
FIGURE 1-2 X4440 Server Front Panel 2. Remove the server cover. For instructions on removing the server cover, refer to your server’s service manual. 3. Inspect the internal status indicator LEDs. These can indicate component malfunction. For the LED loca...
Page 17 - Using SunVTS Diagnostic Software; Running SunVTS Diagnostic Tests
Chapter 2 Using SunVTS Diagnostic Software This chapter contains information about the SunVTS™ diagnostic software tool. Running SunVTS Diagnostic Tests The servers are shipped with a Bootable Diagnostics CD that contains the Sun Validation Test Suite (SunVTS) software. SunVTS provides a comp...
Page 18 - SunVTS Documentation; Requirements
■ QLogic Host Bus Adapter Test (qlctest) ■ RAM Test (ramtest) ■ Serial Port Test (serialtest) ■ System Test (systest) ■ Tape Drive Test (tapetest) ■ Universal Serial Board Test (usbtest) ■ Virtual Memory Test (vmemtest) SunVT...
Page 19 - Using the Bootable Diagnostics CD; change the BIOS setting for boot-device priority.
Using the Bootable Diagnostics CD To use the diagnostics CD to perform diagnostics: 1. With the server powered on, insert the CD into the DVD-ROM drive. 2. Reboot the server, and press F2 during the start of the reboot so that you can change the BIOS sett...
Page 20 - Print the log file –; If you want to save the log files
■ Solaris system message log is a log of all the general Solaris events logged by syslogd. The path name of this log file is /var/adm/messages. a. Click the Log button. The Log file window is displayed. b. Specify the log ...
Page 21 - Troubleshooting DIMM Problems; DIMM Population Rules
Chapter 3 Troubleshooting DIMM Problems This chapter describes how to detect and correct problems with the server’s Dual Inline Memory Modules (DIMMs). It includes the following sections: ■ “DIMM Population Rules” on page 11 ■ “DIMM Replacement Policy” on page 12 ■ “How DIMM Errors Are Hand...
Page 22 - DIMM Replacement Policy; Uncorrectable DIMM Errors
DIMM Replacement Policy Replace a DIMM when one of the following events takes place: ■ The DIMM fails memory testing under BIOS due to Uncorrectable Memory Errors (UCEs). ■ UCEs occur and investigation shows that the errors o...
Page 24 - Correctable DIMM Errors
The lines in the display start with event numbers (in hex), followed by a description of the event. TABLE 3-1 describes the contents of the display: Correctable DIMM Errors If a DIMM has 24 or more correctable errors in 24 ho...
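The correctable-error threshold mentioned above can be pictured as a sliding-window count. The following is only a minimal sketch, not the server's actual logic: the count of 24 comes from this guide, but the 24-hour window is an assumption (the source sentence is truncated), and the function name `exceeds_ce_threshold` and its inputs are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta

# The count of 24 is taken from the guide. The 24-hour window is an
# assumption, because the source sentence is truncated.
CE_THRESHOLD = 24
CE_WINDOW = timedelta(hours=24)


def exceeds_ce_threshold(timestamps, threshold=CE_THRESHOLD, window=CE_WINDOW):
    """Return True if any window of length `window` contains at least
    `threshold` correctable-error timestamps for a single DIMM."""
    recent = deque()
    for ts in sorted(timestamps):
        recent.append(ts)
        # Drop events that are `window` or more older than the current one.
        while ts - recent[0] >= window:
            recent.popleft()
        if len(recent) >= threshold:
            return True
    return False
```

The timestamps would have to be collected per DIMM from whatever log the operating system provides; how to obtain them is outside the scope of this sketch.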
Page 25 - BIOS DIMM Error Messages; DIMM Fault LEDs
to view ECC errors ■ Linux: The HERD utility can be used to manage DIMM errors in Linux. See the x64 Servers Utilities Reference Manual for details. ■ If HERD is installed, it copies messages from /dev/mcelog to /var/log/messages. ■ If HERD is not installe...
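When HERD has copied machine-check messages into /var/log/messages, a short script can pull out the lines of interest. This is a rough sketch under stated assumptions: the keywords in `ECC_PATTERN` are guesses, because this guide does not show the exact message text, and the helper names `find_ecc_lines` and `scan_log` are hypothetical.

```python
import re

# Hypothetical keywords. The exact message text written by HERD or the
# kernel is not given in this guide, so adjust the pattern as needed.
ECC_PATTERN = re.compile(r"\b(ECC|machine check|mce)\b", re.IGNORECASE)


def find_ecc_lines(lines, pattern=ECC_PATTERN):
    """Return (line_number, text) pairs for lines that match the pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


def scan_log(path="/var/log/messages"):
    """Scan a log file. Reading it usually requires root privileges."""
    with open(path, errors="replace") as handle:
        return find_ecc_lines(handle)
```

Calling `scan_log()` with no arguments reads the Linux path named above; pass a different path to scan a saved copy of the log.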
Page 28 - and remove the cover.
FIGURE 3-2 DIMMs and LEDs on Mezzanine Board Isolating and Correcting DIMM ECC Errors If your log files report an ECC error or a problem with a DIMM, complete the steps below until you can isolate the fault. In this example, t...
Page 30 - Power on the server and run the diagnostics test again.
11. Power on the server and run the diagnostics test again. 12. Review the log file. If the tests identify the same error, the problem is in the CPU, not the DIMMs.
Page 31 - Viewing Event Logs
Appendix A Event Logs and POST Codes This appendix contains information about the BIOS event log, the BMC system event log, the power-on self-test (POST), and console redirection. It contains the following sections: ■ “Viewing Event Logs” on page 21 ■ “Power-On Self-Test (POST)” on page 25 V...
Page 33 - c. From the Event Logging Details screen, select View Event Log.
b. From the Advanced Settings screen, select Event Log Configuration. The Advanced Menu Event Logging Details screen is displayed. c. From the Event Logging Details screen, select View Event Log. All unread events are displayed. 4. View the BMC system event lo...
Page 34 - If the problem with the server is not evident, continue with
c. From the IPMI 2.0 Configuration screen, select View BMC System Event Log. The log takes about 60 seconds to generate, then it is displayed on the screen. 5. If the problem with the server is not evident, continue with “Us...
Page 35 - How BIOS POST Memory Testing Works
Power-On Self-Test (POST) The system BIOS provides a rudimentary power-on self-test. The basic devices required for the server to operate are checked, memory is tested, the LSI 1064 disk controller and attached disks are probed and enumerated, and the two Intel ...
Page 36 - Redirecting Console Output; root
Redirecting Console Output Use the following instructions to access the service processor and redirect the console output so that the BIOS POST codes can be read. 1. Initialize the BIOS Setup utility by pressing the F2 key wh...
Page 38 - Changing POST Options
Changing POST Options These instructions are optional, but you can use them to change the operations that the server performs during POST testing. To change POST options: 1. Initialize the BIOS Setup utility by pressing the F...
Page 40 - Wait for F1 if Error –; Interrupt 19 Capture –; Default Boot Order –
■ Boot Num-Lock – This option is On by default (keyboard Num-Lock is turned on during boot). If you set this to Off, the keyboard Num-Lock is not turned on during boot. ■ Wait for F1 if Error – This option is disabled by defa...
Page 47 - Status Indicator LEDs; External Status Indicator LEDs
Appendix B Status Indicator LEDs This appendix contains information about the locations and behavior of the LEDs on the server. It describes the external LEDs that can be viewed on the outside of the server and the internal LEDs that can be viewed only with the main cover removed. External S...
Page 49 - Internal Status Indicator LEDs
Hard Drive LEDs FIGURE B-3 Hard Drive LEDs Internal Status Indicator LEDs The server has internal status indicators on the motherboard and on the mezzanine board. For motherboard locations, see FIGURE B-4. For mezzanine board locations, see FIGURE B-5. ■ The DIM...
Page 54 - Making a Serial Connection to the SP
Making a Serial Connection to the SP To make a serial connection to the SP: 1. Connect a serial cable from the RJ-45 Serial Management port on the server to a terminal device. 2. Press ENTER on the terminal device to establish a...
Page 55 - Viewing ILOM SP Event Logs; a. Type the IP address of the server’s SP into your web browser.
Viewing ILOM SP Event Logs Events are notifications that occur in response to some actions. The IPMI system event log (SEL) provides status information about the server’s hardware and software to the ILOM software, which di...
Page 57 - Click OK to clear all entries in the log.; Interpreting Event Log Time Stamps
After you have selected a category of event, the Event Log table is updated with the specified events. The fields in the Event Log are described in TABLE C-1. 4. To clear the event log, click the Clear Event Log button. A ...
Page 59 - component information, continue with
2. From the System Information tab, select Components. The Replaceable Component Information page is displayed. See FIGURE C-2. FIGURE C-2 Replaceable Component Information Page 3. Select a component from the drop-down li...
Page 60 - Viewing Sensors
Viewing Sensors This section describes how to view the server temperature, voltage, and fan sensor readings. For a complete list of sensors, see Appendix D. To view sensor readings: 1. Log in to the SP as Administrator or Op...
Page 61 - Click a sensor to display its thresholds.
FIGURE C-3 Sensor Readings Page 3. Click the Refresh button to update the sensor readings to their current status. 4. Click a sensor to display its thresholds. A display of properties and values appears. See the example in...
Page 62 - information, continue with
FIGURE C-4 Sensor Details Page 5. If the problem with the server is not evident after viewing sensor readings information, continue with “Running SunVTS Diagnostic Tests” on page 7.
Page 63 - Error Handling
Appendix D Error Handling This appendix contains information about how the servers process and log errors. See the following sections: ■ “Handling of Uncorrectable Errors” on page 53 ■ “Handling of Correctable Errors” on page 56 ■ “Handling of Parity Errors (PERR)” on page 59 ■ “Handling of...
Page 66 - Handling of Correctable Errors
Handling of Correctable Errors This section lists facts and considerations about how the server handles correctable errors. ■ During BIOS POST: ■ The BIOS polls the MCK registers. ■ The BIOS logs to DMI. ■ The BIOS logs to th...
Page 73 - Handling Mismatching Processors
Handling Mismatching Processors This section lists facts and considerations about how the server handles mismatching processors. ■ The BIOS performs a complete POST. ■ The BIOS displays a report of any mismatching CPUs, as shown in the following example: ■ No SEL or DMI ev...