Page 5 - Table of Contents; Section 1
IB6054601-00 D Page v
Table of Contents
Section 1 Introduction
1.1 Who Should Read this Guide, 1-1
1.2 How this Guide is Organized, 1-1
1.3 Overview ...
Page 6 - Table of Contents; Section 3
InfiniPath User Guide Version 2.0
2.10 Performance and Management Tips, 2-17
2.10.1 Remove Unneeded Services, 2-17
2.10.2 Disable Powersaving Features ...
Page 7 - Table of Contents
3.11 Debugging MPI Programs, 3-20
3.11.1 MPI Errors, 3-2...
Page 8 - Table of Contents
C.4.5 OpenFabrics Load Errors If ib_ipath Driver Load Fails, C-10
C.4.6 InfiniPath ib_ipath Initialization Failure, C-11
C.4.7 MPI Job Failures Due to Initialization Problems ...
Page 11 - Introduction; Who Should Read this Guide
Section 1 Introduction
This chapter describes the objectives, intended audience, and organization of the InfiniPath User Guide. The InfiniPath User Guide is intended to give the end users of an InfiniPath cluster what they need to know to use it. In this case, end users are under...
Page 12 - Overview; Use the OpenSM component of OpenFabrics.; Switches
■ Appendix E Glossary of technical terms
■ Index
In addition, the InfiniPath Install Guide contains information on InfiniPath hardware and software installation.
1.3 Overview
The material in this documentation pertains to an InfiniPath cluster. T...
Page 13 - What’s New in this Release
NOTE: OpenFabrics was known as OpenIB until March 2006. All relevant references to OpenIB in this documentation have been updated to reflect this change. See the OpenFabrics website at http://www.openfabrics.org for more information on...
Page 14 - Supported Distributions and Kernels; IBM Power systems run only with the SLES 10 distribution.
Support for multiple versions of MPI has been added. You can use a different version of MPI and achieve the high-bandwidth and low-latency performance that is standard with InfiniPath MPI. Also included is expanded operating sy...
Page 15 - Software Components
1.8 Software Components
The software provided with the InfiniPath Interconnect product consists of:
■ InfiniPath driver (including OpenFabrics)
■ InfiniPath Ethernet emulation
■ InfiniPath libraries
■ InfiniPath utilities, configuration, and ...
Page 16 - InfiniPath Install Guide
NOTE: 32-bit OpenFabrics programs using the verbs interfaces are not supported in this InfiniPath release, but will be supported in a future release.
1.9 Conventions Used in this Document
This Guide uses these typographical conv...
Page 17 - Readme file
■ Readme file
The Troubleshooting Appendix for installation, InfiniPath and OpenFabrics administration, and MPI issues is located in the InfiniPath User Guide. Visit the QLogic support Web site for documentation and the lates...
Page 19 - Section 2; InfiniPath Cluster Administration; The InfiniPath driver; Installed Layout
Section 2 InfiniPath Cluster Administration
This chapter describes what the cluster administrator needs to know about the InfiniPath software and system administration.
2.1 Introduction
The InfiniPath driver ib_ipath, layered Ethernet driver ipath_ether, OpenSM, and other module...
Page 20 - Memory Footprint
MPI include files are in:
/usr/include
MPI programming examples and source for several MPI benchmarks are in:
/usr/share/mpich/examples
InfiniPath utility programs, as well as MPI utilities and benchmarks, are installed in:
/u...
Page 22 - Configuration and Startup; ACPI needs to be enabled
This breaks down to a memory footprint of 331 MB per node, as follows:
2.4 Configuration and Startup
2.4.1 BIOS Settings
A properly configured BIOS is required. The BIOS settings, which are stored in non-volatile memo...
Page 23 - InfiniPath Driver Startup
You can check and adjust these BIOS settings using the BIOS Setup Utility. For specific instructions on how to do this, follow the hardware documentation that came with your system.
2.4.2 InfiniPath Driver Startup
T...
Page 25 - Configuration on Fedora and RHEL4
You must create a network device configuration file for the layered Ethernet device on the InfiniPath adapter. This configuration file will resemble the configuration files for the other Ethernet devices on the node...
Page 27 - infinipath
Step 3 is applicable only to SLES 10; it is required because SLES 10 uses a newer version of the udev subsystem. NOTE: The MAC address (media access control address) is a unique identifier attached to most forms of ...
Page 29 - OpenFabrics Configuration and Startup; Configuring the IPoIB Network Interface
6. To verify that the configuration files are correct, you will normally now be able to run the commands:
# ifup eth2
# ifconfig eth2
Note that it may be necessary to reboot the system before the configuration chan...
Page 30 - ping; OpenSM; opensm; on; chkconfig; opensmd
To verify the configuration, type:
# ifconfig ib0
The output from this command should be similar to this:
ib0 Link encap:InfiniBand HWaddr 00:00:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 inet addr:10.1.1...
Page 31 - SRP
and you can stop it again like this:
# /etc/init.d/opensmd stop
If you wish to pass any arguments to the OpenSM program, modify the file:
/etc/init.d/opensmd
and add the arguments to the "OP...
Page 32 - restart
To disable the driver on the next system boot, use the command (as root):
# chkconfig infinipath off
NOTE: This does not stop and unload the driver, if it is already loaded. You can start, stop, ...
Page 33 - Software Status
If there is output, check the output of the following command to determine whether it is configured:
$ /sbin/ifconfig -a
Finally, if you need to find which InfiniPath and OpenFabrics modules are ...
Page 35 - Process Limitation with
0zwxSL7GP1nEyFk9wAxCrXv3xPKxQaezQKs+KL95FouJvJ4qrSxxHdd1NYNR0D avEBVQgCaspgWvWQ8cL 0aUQmTbggLrtD9zETVU5PCgRlQL6I3Y5sCCHuO7/UvTH9nneCg==
Change the file to mode 600 when finished editing.
4. On each node, the ...
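The instruction to change the file to mode 600 (read and write for the owner only) can also be carried out from a provisioning program rather than the shell. A minimal C sketch using the POSIX chmod() and stat() calls; the helper names set_mode_600 and has_mode_600 are hypothetical, and the path is a parameter because the file being edited is not named in this excerpt.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Equivalent of "chmod 600 path": owner read/write, no other access.
   Returns 0 on success, -1 on failure (the reason is printed). */
int set_mode_600(const char *path)
{
    assert(path != NULL);
    if (chmod(path, S_IRUSR | S_IWUSR) != 0) {
        perror(path);
        return -1;
    }
    return 0;
}

/* Returns 1 if the permission bits are exactly 0600, 0 if they differ,
   and -1 if the file cannot be examined. */
int has_mode_600(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (st.st_mode & 07777) == (S_IRUSR | S_IWUSR);
}
```

Running has_mode_600() on each node after editing is a quick way to confirm the permissions were applied everywhere.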
Page 36 - Disable Powersaving Features
nodes. Since these are presumed to be specialized computing appliances, they do not need many of the service daemons normally running on a general Linux computer. Following are several groups constituting a mi...
Page 37 - Balanced Processor Power
For SUSE 9.3 and 10.0, run this command as root:
# /sbin/chkconfig --level 12345 powersaved off
After running either of these commands, the system will need to be rebooted for these changes to take effect. 2.1...
Page 38 - Homogeneous Nodes
2.10.6 Hyper-Threading
If you are using Intel processors that support Hyper-Threading, it is recommended that Hyper-Threading be turned off in the BIOS. This will provide more consistent performance. You can check and ...
Page 39 - Performance and Management Tips
00: LID=0x30 MLID=0x0 GUID=00:11:75:00:00:07:11:97 Serial: 1236070407
Note that ipath_control will report whether the installed adapter is the QHT7040, QHT7140, or the QLE7140. It will also report whether th...
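When checking many nodes, it can help to extract the LID and GUID from the ipath_control status line programmatically. A minimal C sketch, assuming the line has exactly the layout of the sample above (unit number, LID=, MLID=, GUID=, Serial:); the struct and function names are hypothetical, and other releases may format this output differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of one ipath_control status line (layout assumed from the sample). */
struct ipath_status {
    unsigned unit;          /* leading "00:" unit number      */
    unsigned lid;           /* LID=0x30                        */
    unsigned mlid;          /* MLID=0x0                        */
    char guid[24];          /* 8 hex pairs + 7 colons + NUL    */
    unsigned long serial;   /* Serial: 1236070407              */
};

/* Returns 0 if all five fields were parsed, -1 otherwise. */
int parse_ipath_status(const char *line, struct ipath_status *out)
{
    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    /* %x accepts an optional 0x prefix, so "0x30" parses as 48. */
    int n = sscanf(line, "%u: LID=%x MLID=%x GUID=%23s Serial: %lu",
                   &out->unit, &out->lid, &out->mlid, out->guid, &out->serial);
    return n == 5 ? 0 : -1;
}
```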
Page 40 - Customer Acceptance Utility
$Id: kernel.org InfiniPath Release 2.0 $ $Date: 2006-09-15-04:16 $
/lib/modules/2.6.16.21-0.8-smp/updates/ipath.ko:
$Id: kernel.org InfiniPath Release 2.0 $ $Date: 2006-09-15-04:20 $
NOTE: ident is in the optional ...
Page 43 - Using InfiniPath MPI; InfiniPath MPI
Section 3 Using InfiniPath MPI
This chapter provides information on using InfiniPath MPI. Examples are provided for compiling and running MPI programs.
3.1 InfiniPath MPI
QLogic’s implementation of the MPI standard is derived from the MPICH reference implementation Version 1.2.6. ...
Page 44 - An Example C Program; mpicc
These examples assume that:
■ Your cluster administrator has properly installed InfiniPath MPI and the PathScale compilers.
■ Your cluster’s policy allows you to use the mpirun script directly, without having to submit the job to ...
Page 45 - Examples Using Other Languages
Here ./cpi designates the executable of the example program in the working directory. The -np parameter to mpirun defines the number of processes to be used in the parallel computation. Now try it with four processes:
$ mpirun -n...
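The -np value is the number of processes that share the work. As an illustration, assuming the cpi example follows MPICH's classic cpi.c, it estimates pi by midpoint-rule integration of 4/(1+x^2) over [0,1], with each process summing every np-th interval and the partial sums then being combined (with MPI_Reduce in the real program). The serial C sketch below simulates that split with a loop so it runs without MPI; the function names are hypothetical and this is not the shipped source.

```c
#include <assert.h>

/* Partial sum for one process ("rank") out of nprocs: the midpoint rule
   for the integral of 4/(1+x^2) over [0,1] with n intervals, where this
   rank handles intervals rank+1, rank+1+nprocs, rank+1+2*nprocs, ... */
double cpi_partial_sum(int n, int rank, int nprocs)
{
    assert(n > 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    double h = 1.0 / (double)n;
    double sum = 0.0;
    for (int i = rank + 1; i <= n; i += nprocs) {
        double x = h * ((double)i - 0.5);   /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    return h * sum;
}

/* Serial stand-in for the reduction step: add every rank's piece. */
double cpi_estimate(int n, int nprocs)
{
    double pi = 0.0;
    for (int rank = 0; rank < nprocs; rank++)
        pi += cpi_partial_sum(n, rank, nprocs);
    return pi;
}
```

With n = 10000 the estimate agrees with pi to better than 1e-8 whether one process or four share the intervals; only the division of work changes with -np.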
Page 46 - Configuring MPI Programs for InfiniPath MPI
and run it with:
$ mpirun -np 2 -m mpihosts ./pi3f90
The C++ program hello++.cc is a parallel processing version of the traditional “Hello, World” program. Notice that this version makes use of the external C bi...
Page 47 - InfiniPath MPI Details
You may instead need to pass arguments to configure directly, similar to this:
$ ./configure -cc=mpicc -fc=mpif77 -c++=mpicxx -c++linker=mpicxx
Sometimes you may need to edit a Makefile to achieve this result, adding l...
Page 48 - ForwardAgent yes
The process is shown in the following steps:
1. Create a key pair. Use the default file name, and be sure to enter a passphrase.
$ ssh-keygen -t rsa
2. Enter a passphrase for your key pair when prompted. Note that the key agent does...
Page 49 - Compiling and Linking; The 3.0 compiler release will support the GNU 4.x compiler
3.5.2 Compiling and Linking
These scripts invoke the compiler and linker for programs in each of the respective languages, and take care of referring to the correct include files and libraries in each case.
mpicc mpicxx mpif77 mpif...
Page 50 - To Use Another Compiler; gfortran
line options. See the PathScale compiler documentation and the man pages for pathcc and pathf90 for complete information on its options. See the corresponding documentation for any other compiler/linker you may call for its options....
Page 51 - Compiler and Linker Variables; Cross-compilation Issues
To use the Intel compiler for Fortran90/Fortran95 programs, use:
$ mpif90 -f90=ifort .....
$ mpif95 -f95=ifort .....
Usage for other compilers will be similar to the examples above, substituting the options following -cc, -CC, -f...
Page 52 - cc; Running MPI Programs; stdout
The current workaround for this is to compile on a supported and compatible distribution, then run the executable on one of the systems that uses the GNU 4.x compilers and environment.
■ To run on FC4 or FC5, install FC3 or RHEL4/C...
Page 53 - node; machines
program-name will generally be the pathname to the executable MPI program. If the MPI program resides in the current directory and the current directory is not in your search path, then program-name must begin with ‘./’, such as: ...
Page 54 - Console I/O in MPI Programs; stdin; Environment for Node Programs
programs will be started on that host before using the next entry in the mpihosts file. If the full mpihosts file is processed, and there are still more processes requested, processing starts again at the start of the file. You hav...
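The placement rule described above (fill an entry's count, move to the next entry, wrap to the top of the file when more processes remain) amounts to a round-robin over mpihosts entries. A minimal C sketch, assuming the per-entry counts have already been parsed into an array and every count is at least 1; assign_ranks is a hypothetical name, not part of mpirun.

```c
#include <assert.h>
#include <stddef.h>

/* Place np ranks on mpihosts entries: counts[i] processes go to entry i
   before moving to entry i+1, and after the last entry placement wraps
   back to the first. On return, host_of_rank[r] is rank r's entry. */
void assign_ranks(const int *counts, size_t nentries, int np, int *host_of_rank)
{
    assert(counts != NULL && nentries > 0 && np >= 0);
    assert(np == 0 || host_of_rank != NULL);

    size_t entry = 0;
    int used = 0;                 /* ranks already placed on this entry */
    for (int rank = 0; rank < np; rank++) {
        assert(counts[entry] > 0);
        host_of_rank[rank] = (int)entry;
        if (++used == counts[entry]) {
            used = 0;
            entry = (entry + 1) % nentries;   /* wrap to the top of the file */
        }
    }
}
```

For entries with counts {2, 1} and -np 5, ranks 0 and 1 land on the first host, rank 2 on the second, and ranks 3 and 4 wrap back to the first.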
Page 55 - Environment for Multiple Versions of InfiniPath or MPI
LD_LIBRARY_PATH, and other environment variables for the node programs through the use of the -rcfile option of mpirun:
$ mpirun -np n -m mpihosts -rcfile mpirunrc program
In the absence of this option, mpirun checks to see if a f...
Page 56 - Multiprocessor Nodes
3.5.9 Multiprocessor Nodes
Another command line option, -ppn, instructs mpirun to assign a fixed number p of node programs to each node, as it distributes the n instances among the nodes:
$ mpirun -np n -m mpihosts -ppn p program-...
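With -ppn p, the first p node programs share one node, the next p share the next node, and so on. A minimal C sketch of that block placement, assuming ranks are placed in launch order, node by node; node_for_rank is hypothetical, and it returns -1 when a rank does not fit in nnodes * p slots because how mpirun handles that case is not covered here.

```c
#include <assert.h>

/* Node index for `rank` when each node receives ppn node programs:
   ranks 0..ppn-1 on node 0, ranks ppn..2*ppn-1 on node 1, and so on.
   Returns -1 if the rank does not fit on nnodes nodes. */
int node_for_rank(int rank, int ppn, int nnodes)
{
    assert(rank >= 0 && ppn > 0 && nnodes > 0);
    int node = rank / ppn;
    return node < nnodes ? node : -1;
}
```

Under this rule, -np 2 -ppn 1 always puts the two processes on different nodes, which is what the two-node benchmarks in Appendix A rely on.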
Page 57 - gdb
-verbose  Print diagnostic messages from mpirun itself. Can be useful in troubleshooting. Default: Off
-version, -v  Print MPI version. Default: Off
-help, -h  Print mpirun help message. Default: Off
-rcfile node-shell-script  Startup ...
Page 59 - Using Other MPI Implementations; PATH; MPI Over uDAPL
-statsfile file-prefix  Specifies an alternate file to receive the output from the -print-stats option. Default: stderr
3.6 Using Other MPI Implementations
Support for multiple MPI implementations has been added. You can use a different version of MPI a...
Page 60 - Linux File I/O in MPI Programs
3.8.1 MPD Description
The Multi-Purpose Daemon (MPD) was developed by Argonne National Laboratory (ANL), as part of the MPICH-2 system. While the ANL MPD had certain advantages over the use of their mpirun (faster launching, better cleanu...
Page 61 - InfiniPath MPI and Hybrid MPI/OpenMP Applications; funneled thread
accessed via some network file system, typically NFS. Parallel programs usually need to have some data in files to be shared by all of the processes of an MPI job. Node programs may also use non-shared, ...
Page 62 - Debugging MPI Programs; appendix D; Using Debuggers
may be desirable to run multiple MPI processes and multiple OpenMP threads per node. The number of OpenMP threads is typically controlled by the OMP_NUM_THREADS environment variable in the .mpirunrc file. This may be used to adjus...
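An OpenMP runtime reads OMP_NUM_THREADS itself, but a hybrid program sometimes wants the same value, for example to size per-thread buffers. A minimal C sketch, assuming the variable holds a single positive integer (as it would when exported from .mpirunrc); the helper names are hypothetical, and the cores-divided-by-ranks policy in threads_per_rank is an illustrative assumption, not InfiniPath behavior.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Thread count taken from OMP_NUM_THREADS; returns fallback when the
   variable is unset, empty, malformed, or not a positive int. */
int omp_threads_from_env(int fallback)
{
    const char *s = getenv("OMP_NUM_THREADS");
    if (s == NULL || *s == '\0')
        return fallback;
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v <= 0 || v > INT_MAX)
        return fallback;
    return (int)v;
}

/* One simple split of a node's cores among the MPI ranks on that node. */
int threads_per_rank(int cores_per_node, int ranks_per_node)
{
    assert(cores_per_node > 0 && ranks_per_node > 0);
    int t = cores_per_node / ranks_per_node;
    return t > 0 ? t : 1;     /* never fewer than one thread per rank */
}
```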
Page 63 - InfiniPath MPI Limitations
Symbolic debugging is easier than machine language debugging. To enable symbolic debugging, you must have compiled with the -g option to mpicc so that the compiler will have included symbol tables in the compiled object code. T...
Page 65 - Appendix A; Benchmark Programs; Benchmark 1: Measuring MPI Latency Between Two Nodes; latency for a message of given size
Appendix A Benchmark Programs
Several MPI performance measurement programs are installed from the mpi-benchmark RPM. This Appendix describes these useful benchmarks and how to run them. These programs are based on code from the group of Dr. Dhabaleswar K. Panda at the Network-Base...
Page 66 - Benchmark 2: Measuring MPI Bandwidth Between Two Nodes
This benchmark always involves just two node programs. You can run it with the command:
$ mpirun -np 2 -ppn 1 -m mpihosts osu_latency
The -ppn 1 option is needed to be certain that the two communicating...
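Ping-pong latency tests such as osu_latency time many round trips between the two ranks and report half of the average round-trip time as the one-way latency. A minimal C sketch of that arithmetic; the function name is hypothetical.

```c
#include <assert.h>

/* One-way latency in microseconds: elapsed_sec covers `iterations`
   round trips, and each round trip is two one-way messages. */
double one_way_latency_us(double elapsed_sec, long iterations)
{
    assert(elapsed_sec >= 0.0 && iterations > 0);
    return elapsed_sec * 1e6 / (2.0 * (double)iterations);
}
```

For example, 1000 round trips taking 2 ms in total correspond to a one-way latency of 1 microsecond.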
Page 67 - Benchmark 3: Messaging Rate Microbenchmarks
MPI_Isend function, while the receiving node consumes them as quickly as it can using the non-blocking MPI_Irecv, and then returns a zero-length acknowledgement when all of the set has been received. You can run ...
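For a windowed test like the one just described, the message rate is the number of messages sent divided by the elapsed time, and bandwidth follows by multiplying by the message size. A minimal C sketch of that arithmetic, assuming 1 MB = 10^6 bytes (the convention the OSU benchmarks use) and ignoring the zero-length acknowledgements; the function names are hypothetical.

```c
#include <assert.h>

/* Messages per second when each of `iterations` rounds posts `window`
   non-blocking sends and the whole run takes elapsed_sec. */
double messages_per_sec(long window, long iterations, double elapsed_sec)
{
    assert(window > 0 && iterations > 0 && elapsed_sec > 0.0);
    return (double)window * (double)iterations / elapsed_sec;
}

/* Payload bandwidth in MB/s (1 MB = 10^6 bytes); the zero-length
   acknowledgements carry no payload and are not counted. */
double bandwidth_mb_per_sec(long msg_bytes, long window, long iterations,
                            double elapsed_sec)
{
    assert(msg_bytes >= 0);
    return (double)msg_bytes * messages_per_sec(window, iterations, elapsed_sec) / 1e6;
}
```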
Page 69 - Benchmark 4: Measuring MPI Latency in Host Rings
A.4 Benchmark 4: Measuring MPI Latency in Host Rings
The program mpi_latency can be used to measure latency in a ring of hosts. Its syntax is a bit different from Benchmark 1 in that it takes command line ar...
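In a ring, each host sends to the next host and receives from the previous one, with the last host wrapping around to the first. A minimal C sketch of that neighbor arithmetic; the function names are hypothetical and this is not taken from the mpi_latency source.

```c
#include <assert.h>

/* Rank that `rank` sends to in a ring of `size` processes. */
int ring_next(int rank, int size)
{
    assert(size > 0 && rank >= 0 && rank < size);
    return (rank + 1) % size;
}

/* Rank that `rank` receives from; rank 0 receives from the last rank. */
int ring_prev(int rank, int size)
{
    assert(size > 0 && rank >= 0 && rank < size);
    return (rank - 1 + size) % size;
}
```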
Page 71 - Appendix B; Integration with a Batch Queuing System; A Batch Queuing Script; Allocating Resources
Appendix B Integration with a Batch Queuing System
Most cluster systems use some kind of batch queuing system as an orderly way to provide users with access to the resources they need to meet their job’s performance requirements. One of the tasks of the cluster administrator is to...
Page 73 - Simple Process Management; fuser; stats
by mpirun. Each line consists of a node name, a colon, and the number of processes to start on that node. NOTE: This is one of two formats that the file may use. See section 3.5.6 for more information.
B.1.3 Simp...
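The "node name, colon, process count" format above is easy to parse when a batch script or tool needs to check a generated mpihosts file. A minimal C sketch; parse_mpihosts_line is hypothetical, and treating a line without a colon as a count of 1 is an assumption made for this sketch (section 3.5.6 describes the file's other format).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Parse one mpihosts line of the form "name:count", e.g. "node-01:4".
   Copies the node name (at most namelen-1 chars) into name and stores
   the process count. Returns 0 on success, -1 on a malformed line. */
int parse_mpihosts_line(const char *line, char *name, size_t namelen, int *count)
{
    assert(line != NULL && name != NULL && namelen > 0 && count != NULL);

    size_t len = strcspn(line, ":\r\n");   /* name ends at ':' or end of line */
    if (len == 0 || len >= namelen)
        return -1;
    memcpy(name, line, len);
    name[len] = '\0';

    if (line[len] != ':') {                 /* no explicit count (assumed 1) */
        *count = 1;
        return 0;
    }

    const char *num = line + len + 1;
    char *end;
    errno = 0;
    long v = strtol(num, &end, 10);
    if (errno != 0 || end == num || v <= 0 || v > INT_MAX)
        return -1;
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;                              /* allow trailing whitespace */
    if (*end != '\0')
        return -1;
    *count = (int)v;
    return 0;
}
```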
Page 74 - Lock Enough Memory on Nodes When Using SLURM
The following command will terminate all processes using the InfiniPath interconnect:
# /sbin/fuser -k /dev/ipath
For more information, see the man pages for fuser(1) and lsof(8). NOTE: Run t...
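This page's heading concerns having enough locked memory on the nodes. A process can check its own limit with getrlimit() and Linux's RLIMIT_MEMLOCK, the value the shell shows with ulimit -l (which bash reports in kilobytes). A minimal C sketch; memlock_at_least is hypothetical, and since the amount a given job needs is not stated in this excerpt, the caller supplies it in bytes.

```c
#include <sys/resource.h>

/* Check this process's locked-memory soft limit (RLIMIT_MEMLOCK).
   Returns 1 if it is unlimited or at least `needed` bytes, 0 if it is
   smaller, and -1 if the limit cannot be read. */
int memlock_at_least(rlim_t needed)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return rl.rlim_cur >= needed ? 1 : 0;
}
```

Because the limit is inherited from whatever started the process, running such a check inside the batch job (rather than in an interactive login shell) shows the value the MPI node programs will actually see.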
Page 75 - Appendix C; Troubleshooting; Troubleshooting InfiniPath Adapter Installation
Appendix C Troubleshooting
This Appendix describes some of the existing provisions for diagnosing and fixing problems. The sections are organized in the following order:
■ C.1 “Troubleshooting InfiniPath adapter installation”
■ C.2 “BIOS settings”
■ C.3 “Software installation issu...
Page 77 - MTRR Mapping and Write Combining
C.2.1 MTRR Mapping and Write Combining
MTRR (Memory Type Range Registers) is used by the InfiniPath driver to enable write combining to the InfiniPath on-chip transmit buffers. This improves write bandwidth to the InfiniPath chip by writing mult...
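On Linux, the MTRR regions currently programmed can be read from /proc/mtrr, where each line ends with the memory type (for example write-back or write-combining). A minimal C sketch that counts write-combining regions in text with that layout; the function name is hypothetical, and whether the InfiniPath region appears there depends on the driver and BIOS, so treat a zero count only as a hint to review the MTRR setting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count lines marked write-combining in /proc/mtrr-style text, e.g.
   "reg01: base=0xd0000000 (3328MB), size= 256MB, count=1: write-combining" */
int count_write_combining(FILE *mtrr)
{
    assert(mtrr != NULL);
    char line[256];
    int n = 0;
    while (fgets(line, sizeof line, mtrr) != NULL)
        if (strstr(line, "write-combining") != NULL)
            n++;
    return n;
}
```

On a live system the stream would come from fopen("/proc/mtrr", "r").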
Page 78 - Incorrect MTRR Mapping Causes Unexpected Low Bandwidth; Change Setting for Mapping Memory
C.2.3 Incorrect MTRR Mapping Causes Unexpected Low Bandwidth
The MTRR Mapping setting described in the previous section can also cause unexpectedly low bandwidth if it is set incorrectly. The setting should look like this:
MTRR Mapping [Di...
Page 79 - Software Installation Issues
C.3 Software Installation Issues
This section covers issues related to software installation.
C.3.1 OpenFabrics Dependencies
You need to install sysfsutils for your distribution before installing the OpenFabrics RPMs, as there are...
Page 80 - Installing Newer Drivers from Other Distributions
In older distributions, such as RHEL4, the 32-bit glibc will be contained in the libgcc RPM. The RPM will be named similarly to:
libgcc-3.4.3-9.EL4.i386.rpm
In newer distributions, glibc is packaged as its own RPM, named glibc. The 32-bit glibc will be nam...
Page 81 - Installing for Your Distribution; Kernel and Initialization Issues
8. Reload all modules by using this command (as root):
# /etc/init.d/infinipath start
An alternate mechanism can be used, if provided as part of your alternate installation.
9. Run an OpenFabrics test program, such as ibstatus...
Page 82 - Follow the links to the download page.
C.4.1 Kernel Needs CONFIG_PCI_MSI=y
If the InfiniPath driver is being compiled on a machine without CONFIG_PCI_MSI=y configured, you will get a compilation error similar to this:
ib_ipath/ipath_driver.c:46:2: #error "Infini...
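Whether a kernel was built with CONFIG_PCI_MSI=y can be checked in its .config file before compiling the driver; many distributions install a copy as /boot/config-<kernel version>, though the location varies. A minimal C sketch; kernel_has_pci_msi is hypothetical and accepts only the exact line CONFIG_PCI_MSI=y (a disabled option appears as "# CONFIG_PCI_MSI is not set").

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if a kernel .config-style stream contains the exact line
   "CONFIG_PCI_MSI=y", 0 otherwise. */
int kernel_has_pci_msi(FILE *config)
{
    assert(config != NULL);
    char line[512];
    while (fgets(line, sizeof line, config) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* drop the newline */
        if (strcmp(line, "CONFIG_PCI_MSI=y") == 0)
            return 1;
    }
    return 0;
}
```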
Page 83 - Driver Load Fails Due to Unsupported Kernel; InfiniPath Interrupts Not Working; Normal output will look similar to this:
NOTE: This problem has been fixed in the 2.6.17 kernel.org kernel.
C.4.3 Driver Load Fails Due to Unsupported Kernel
If you try to load the InfiniPath driver on a kernel that InfiniPath software does not support, the load fail...
Page 84 - OpenFabrics Load Errors If ib_ipath Driver Load Fails
A zero count in all CPU columns means that no interrupts have been delivered to the processor. Possible causes are:
■ Booting the Linux kernel with ACPI (Advanced Configuration and Power Interface) disabled on the boot command...
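The check described above (a zero count in every CPU column) can be scripted against a line copied from /proc/interrupts. A minimal C sketch; interrupts_all_zero is hypothetical, the caller passes the number of CPU columns (the count of CPUn headings on the first line of /proc/interrupts), and the sample line in the comment is illustrative rather than actual InfiniPath output.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Examine one /proc/interrupts line, for example (illustrative only)
       " 169:          0          0   IO-APIC-level  ib_ipath"
   Returns 1 if the first ncpus counts after the "IRQ:" label are all
   zero, 0 if any is non-zero, and -1 if the line cannot be parsed. */
int interrupts_all_zero(const char *line, int ncpus)
{
    assert(line != NULL && ncpus > 0);
    const char *p = strchr(line, ':');   /* skip the IRQ number */
    if (p == NULL)
        return -1;
    p++;
    for (int cpu = 0; cpu < ncpus; cpu++) {
        char *end;
        unsigned long long v = strtoull(p, &end, 10);
        if (end == p)
            return -1;                   /* fewer counts than CPUs */
        if (v != 0)
            return 0;                    /* interrupts were delivered */
        p = end;
    }
    return 1;
}
```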
Page 85 - InfiniPath ib_ipath Initialization Failure; MPI Job Failures Due to Initialization Problems
C.4.6 InfiniPath ib_ipath Initialization Failure
There may be cases where ib_ipath was not properly initialized. Symptoms of this may show up in error messages from an MPI job or another program. Here is a sample command and ...
Page 86 - The environment variable $IBPATH should be set to
C.5 OpenFabrics Issues
This section covers items related to OpenFabrics, including OpenSM.
C.5.1 Stop OpenSM Before Stopping/Restarting InfiniPath
OpenSM must be stopped before stopping or restarting InfiniPath. If not, e...
Page 87 - Broken Intermediate Link
C.6.1 Broken Intermediate Link
Sometimes message traffic passes through the fabric while other traffic appears to be blocked. In this case, MPI jobs fail to run. In large cluster configurations, switches may be attached to othe...
Page 89 - Compiler/Linker Mismatch; Compiler Can’t Find Include, Module or Library Files
On a SLES 10 system, you would need:
■ compat-libstdc++ (for FC3)
■ compat-libstdc++5 (for SLES 10)
Depending upon the application, you may need to use the -Wl,-Bstatic option to use the static versions of some libraries.
C.8...
Page 90 - Compiling on Development Nodes; Specifying the Run-time Library Path
For the examples in Section C.8.5 below, we assume that these new locations are:
/path/to/devel (for mpi-devel-*)
/path/to/libs (for mpi-libs-*)
C.8.5 Compiling on Development Nodes
If the mpi-devel-* RPM is installed with the...
Page 91 - Run Time Errors With Different MPI Implementations
The above compiler command ensures that the program will run using this path on any machine. For the second option, we change the file /etc/ld.so.conf on the compute nodes rather than using the -Wl,-rpath, option when compiling...
Page 94 - Extending MPI Modules
^
pathf95-389 pathf90: ERROR BORDERS, File = communicate.F, Line = 407, Column = 18
No specific match can be found for the generic subprogram call "MPI_RECV".
If it is necessary to use a non-standard argument list, it is...
Page 95 - However, some care must be taken. One should only do this if:; Lock Enough Memory on Nodes When Using a Batch Queuing System
integer count, datatype, root, comm, ierror
! Call the Fortran 77 style implicit interface to "mpi_bcast"
external mpi_bcast
call mpi_bcast(buffer, count, datatype, root, comm, ierror)
end subroutine additional_mpi_bcas...
Page 96 - Messages from the InfiniPath Library
If this file is not present or the node has not been rebooted after the infinipath RPM has been installed, a failure message similar to this will be generated:
$ mpirun -m ~/tmp/sm -np 2 mpi_latency 1000 1000000
node-00:1.ipath...
Page 97 - In these cases you can try to reboot, then call Support
Found unknown timer type type
unknown frame type type
recv done: available_tids now n, but max is m (freed p)
cancel recv available_tids now n, but max is m (freed %p)
[n] Src lid error: sender: x, exp send: y
Frame receive fro...
Page 98 - MPI Messages
The following message indicates that a node program may not be processing incoming packets, perhaps due to a very high system load:
eager array full after overflow, flushing (head h, tail t)
The following indicates an invalid In...
Page 100 - quiescence detected
There is no route to any host:
$ mpirun -np 2 -m ~/tmp/q mpi_latency 100 100
ssh: connect to host <nodename> port 22: No route to host
ssh: connect to host <nodename> port 22: No route to host
MPIRUN: All node progra...
Page 101 - Driver and Link Error Messages Reported by MPI Programs; syslog
$ mpirun -np 2 -m ~/tmp/q -q 60 mpi_latency 1000000 1000000
MPIRUN: MPI progress Quiescence Detected after 9000 seconds.
MPIRUN: 2 out of 2 ranks showed no MPI send or receive progress.
MPIRUN: Per-rank details are the followin...
Page 102 - MPI Stats; Eager
C.8.13 MPI Stats
Using the -print-stats option to mpirun will result in a listing to stderr of various MPI statistics. Here is example output for the -print-stats option when used with an 8-rank run of the HPCC benchmark. Messag...
Page 103 - Useful Programs and Files for Debugging
C.9 Useful Programs and Files for Debugging
The most useful programs and files for debugging are listed in the sections below. Many of these programs and files have been discussed elsewhere in the documentation: this i...
Page 104 - Summary of Useful Programs and Files; A shell script
C.9.3 Summary of Useful Programs and Files
Useful programs and files are summarized in the table below. Descriptions for some of the programs and files follow. Check man pages for more information on the programs. Table...
Page 105 - Example contents are:
C.9.4 boardversion
It may be useful to keep track of the current version of the installed software. You can check the version of the installed InfiniPath software by looking in:
/sys/bus/pci/drivers/ib_ipath/00/boardve...
Page 106 - ident is in the optional rcs RPM, and is not always installed.
C.9.5 ibstatus
This program displays basic information on the status of InfiniBand devices that are currently in use when the OpenFabrics modules are loaded.
C.9.6 ibv_devinfo
This program displays information about Inf...
Page 112 - version
C.9.17 strings
The command strings can also be used. Its format is as follows:
$ strings /usr/lib/libinfinipath.so.4.0 | grep Date:
will produce output like this:
$Date: 2006-09-15 04:07 Release2.0 InfiniPath $
NOTE: st...
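The same $Date: stamp can be extracted from a binary without the strings utility. A minimal C sketch of what strings | grep Date: does for the first match: collect runs of at least four printable characters (the default minimum length of strings) and return the first run that contains "Date:". find_date_string is hypothetical; runs longer than the output buffer are truncated, so a stamp beyond that point would be missed.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Scan a binary stream for printable runs (length >= 4) and copy the
   first one containing "Date:" into out. Returns 1 if found, 0 if not. */
int find_date_string(FILE *f, char *out, size_t outlen)
{
    assert(f != NULL && out != NULL && outlen > 1);
    size_t len = 0;
    for (;;) {
        int c = fgetc(f);
        if (c != EOF && (isprint(c) || c == '\t')) {
            if (len < outlen - 1)
                out[len++] = (char)c;    /* extend the current run */
            continue;
        }
        /* A non-printable byte or end of file ends the current run. */
        out[len] = '\0';
        if (len >= 4 && strstr(out, "Date:") != NULL)
            return 1;
        len = 0;
        if (c == EOF)
            return 0;
    }
}
```

Opening /usr/lib/libinfinipath.so.4.0 in binary mode ("rb") and passing the stream in would mirror the command shown above.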
Page 113 - Appendix D; Recommended Reading; References for MPI; Using MPI; Reference and Source for SLURM
Appendix D Recommended Reading
Reference material for further reading is provided here.
D.1 References for MPI
The MPI Standard specification documents.
http://www.mpi-forum.org/docs
The MPICH implementation of MPI and its documentation.
http://www-unix.mcs.anl.gov/mpi/mpich/
The ...
Page 114 - Clusters; Beowulf Cluster Computing with; Rocks; Extensive documentation on installing Rocks and custom Rolls.
D.6 Clusters
Gropp, William, Ewing Lusk, and Thomas Sterling, Beowulf Cluster Computing with Linux, Second Edition, 2003, MIT Press, ISBN 0-262-69292-9.
D.7 Rocks
Extensive documentation on installing Rocks and custom Rolls.
http://www.rocksclusters...
Page 115 - Appendix E; Glossary
Appendix E Glossary
A glossary is provided below for technical terms used in the documentation.
bandwidth  The rate at which data can be transmitted. This represents the capacity of the network connection. Theoretical peak bandwidth is fixed, but the effective bandwidth, the ideal...
Page 121 - Index
Index
A
ACPI, enabling C-9
B
Batch queuing for MPI jobs B-1 – B-4
Benchmarking
  MPI bandwidth A-2 – A-3
  MPI latency measurement A-1 – A-2
  MPI latency measurement in host rings A-5
C
Compiling MPI programs
  compiler and linker variables 3-9
  scripts for invoking compiler and linke...