Page 3 - Recovery and Restart Guide
CICS Transaction Server for z/OS Version 4 Release 1 Recovery and Restart Guide SC34-7012-01
Page 5 - Contents; Preface
Contents
Preface (vii)
What this book is about (vii)
Who should read this book (vii)
What you need to know to understand this book (vii)
How to use this book (vii)
Changes in CICS Transaction Server for z/OS, Version 4 Rele...
Page 7 - Chapter 13. Programming for recovery 141
Input extrapartition data sets (134)
Output extrapartition data sets (135)
Using post-initialization (PLTPI) programs (135)
Recovery for temporary storage (135)
Backward recovery (135)
Forward recovery (136)
Recovery for Web s...
Page 9 - What this book is about; Who should read this book; vii
Preface What this book is about This book contains guidance about determining your CICS® recovery and restart needs, deciding which CICS facilities are most appropriate, and implementing your design in a CICS region. The information in this book is generally restricted to a single CICS region. For in...
Page 10 - viii
viii CICS TS for z/OS 4.1: Recovery and Restart Guide
Page 11 - ix
Changes in CICS Transaction Server for z/OS, Version 4 Release 1 For information about changes that have been made in this release, please refer to What's New in the information center, or the following publications: v CICS Transaction Server for z/OS What's New v CICS Transaction Server for z/OS Upg...
Page 13 - Part 1. CICS recovery and restart concepts
Part 1. CICS recovery and restart concepts It is very important that a transaction processing system such as CICS can restart and recover following a failure. This section describes some of the basic concepts of the recovery and restart facilities provided by CICS. © Copyright IBM Corp. 1982, 2010 1
Page 15 - Chapter 1. Recovery and restart facilities; Maintaining the integrity of data; resources; Logging changes
Chapter 1. Recovery and restart facilities Problems that occur in a data processing system could be failures with communication protocols, data sets, programs, or hardware. These problems are potentially more severe in online systems than in batch systems, because the data is processed in an unpredicta...
Page 16 - quiesce; SET DSNAME RETRY
In general, forward recovery is applicable to data set failures, or failures in similar data resources, which cause data to become unusable because it has been corrupted or because the physical storage medium has been damaged. Minimizing the effect of failures An online system should limit the effect ...
Page 17 - Recoverable resources
Another way is to shut down CICS with an immediate shutdown and perform the forward recovery, after which a CICS emergency restart performs the backward recovery. Recoverable resources In CICS, a recoverable resource is any resource with recorded recovery information that can be recovered by backout. T...
Page 18 - Dynamic transaction backout; Emergency restart backout
v In the event of an emergency restart, when CICS backs out all those transactions that were in-flight at the time of the CICS failure (emergency restart backout). Although these occur in different situations, CICS uses the same backout process in each case. CICS does not distinguish between dynamic b...
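The idea of a single backout process, shared by dynamic transaction backout and emergency restart backout, can be illustrated with a small undo-log sketch. This is only a simplified model, not how CICS is implemented: the in-memory `store`, the `undo_log` list, and the function names `update` and `back_out` are all hypothetical, and the sketch assumes that recovery information takes the form of before-images recorded ahead of each change.

```python
def update(store, undo_log, uow_id, key, value):
    """Record the before-image of a record, then apply the change."""
    # store.get(key) is None when the record did not exist before the change.
    undo_log.append((uow_id, key, store.get(key)))
    store[key] = value


def back_out(store, undo_log, uow_id):
    """Restore before-images for one unit of work, newest change first."""
    for entry_uow, key, before in reversed(undo_log):
        if entry_uow != uow_id:
            continue  # leave changes made by other units of work alone
        if before is None:
            store.pop(key, None)  # the record was added by this unit of work
        else:
            store[key] = before
    return store
```

The same `back_out` routine would serve whether it is called while the region is running (one failed transaction) or during restart (every unit of work that was in-flight), which is the point the text makes about the two situations sharing one process.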
Page 19 - CICS forward recovery; Forward recovery of CICS data sets
The recovery manager also drives: v The backout processing for any units of work that were in a backout-failed state at the time of the CICS failure v The commit processing for any units of work that had not finished commit processing at the time of failure (for example, for resource definitions that ...
Page 20 - Forward recovery for non-VSAM resources; Failures that require CICS recovery processing
Forward recovery journal names are of the form DFHJnn where nn is a number in the range 1–99 and is obtained from the forward recovery log id (FWDRECOVLOG) in the FILE resource definition. In this case, CICS creates a journal entry for the forward recovery log, which can be mapped by a JOURNALMODEL re...
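The naming rule described here can be sketched in a few lines. The function name `forward_recovery_journal_name` is hypothetical, and the zero-padding to two digits is an assumption inferred from the two-character `nn` in the pattern.

```python
def forward_recovery_journal_name(fwdrecovlog: int) -> str:
    """Build a DFHJnn journal name from a FWDRECOVLOG id in the range 1-99."""
    if not 1 <= fwdrecovlog <= 99:
        raise ValueError("FWDRECOVLOG id must be in the range 1-99")
    # Assumption: single-digit ids are padded to two digits (DFHJ01, not DFHJ1).
    return f"DFHJ{fwdrecovlog:02d}"
```

For example, a FILE definition with a forward recovery log id of 7 would map to the journal name `DFHJ07` under this assumption.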
Page 21 - XCF/MRO partner failures
2. If the failure occurs during the execution of a CICS syncpoint, where the conversation is with another resource manager (perhaps in another CICS region), CICS handles the resynchronization. This is described in the CICS Intercommunication Guide. If the link fails and is later reestablished, CICS a...
Page 22 - CICS recovery processing following a transaction failure; CICS recovery processing following a system failure; system log; emergency restart
When the operator replies to IXC402D, the CICS interregion communication program, DFHIRP, is notified and the suspended tasks are abended, and MRO connections closed. Until the reply is issued to IXC402D, an INQUIRE CONNECTION command continues to show connections to regions in the failed MVS as in serv...
Page 25 - Chapter 2. Resource recovery in CICS; Units of work; Shunted units of work
Chapter 2. Resource recovery in CICS Before you begin to plan and implement resource recovery in CICS, you should understand the concepts involved, including units of work, logging and journaling. Units of work When resources are being changed, there comes a point when the changes are complete and do ...
Page 26 - Locks; Active and retained states for locks
v Working storage v Any LU6.2 sessions v Any LU6.1 links v Any MRO links The resources CICS retains include: v Locks on recoverable data. If the unit of work is shunted indoubt, all locks are retained. If it is shunted because of a commit- or backout-failure, only the locks on the failed resources are...
Page 27 - Synchronization points
When a lock is first acquired, it is an active lock. It remains an active lock until successful completion of the unit of work, when it is released, or is converted into a retained lock if the unit of work fails, or for a CICS or SMSVSAM failure: v If a unit of work fails, RLS VSAM or the CICS enqueue...
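The lock transitions described above can be summarized as a small state machine. This is a minimal sketch that covers only the transitions out of the active state named in the text; the enum and function names are hypothetical, and what later happens to a retained lock is deliberately not modeled.

```python
from enum import Enum


class LockState(Enum):
    ACTIVE = "active"
    RETAINED = "retained"
    RELEASED = "released"


class Event(Enum):
    UOW_COMPLETED = "unit of work completed successfully"
    UOW_FAILED = "unit of work failed"
    CICS_FAILURE = "CICS failure"
    SMSVSAM_FAILURE = "SMSVSAM failure"


def next_state(state: LockState, event: Event) -> LockState:
    """Return the lock state after an event, for locks that are currently active."""
    if state is not LockState.ACTIVE:
        return state  # transitions out of retained or released are not modeled here
    if event is Event.UOW_COMPLETED:
        return LockState.RELEASED  # successful completion releases the lock
    return LockState.RETAINED      # any of the failure events converts it to retained
```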
Page 28 - Examples of synchronization points; not
– EXEC CICS CREATE CONNECTION COMPLETE – EXEC CICS DISCARD CONNECTION – EXEC CICS DISCARD TERMINAL A UOW that does not change a recoverable resource has no meaningful effect for the CICS recovery mechanisms. Nonrecoverable resources are never backed out. A unit of work can also be ended by backout, w...
Page 29 - CICS recovery manager
CICS recovery manager The recovery manager ensures the integrity and consistency of resources (such as files and databases) both within a single CICS region and distributed over interconnected systems in a network. Figure 3 on page 18 shows the resource managers and their resources with which the CICS...
Page 30 - Managing the state of each unit of work; TD; Lo
v Managing the state, and controlling the execution, of each UOW v Coordinating UOW-related changes during syncpoint processing for recoverable resources v Coordinating UOW-related changes during restart processing for recoverable resources v Coordinating recoverable conversations to remote nodes v Te...
Page 31 - Coordinating updates to local resources
v Notification that the resource is not available, requiring temporary suspension (shunting) of the UOW v Notification that the resource is available, enabling retry of shunted UOWs v Notification that a connection is reestablished, and can deliver a commit or rollback (backout) decision v Syncpoint r...
Page 32 - Coordinating updates in distributed units of work; Managing indoubt units of work
others. This can happen, for example, if two data sets are updated and the UOW has to be backed out, and the following happens: v One resource backs out successfully v While committing this successful backout, the commit fails v The other resource fails to back out These events leave one data set com...
Page 33 - Resynchronization after system or connection failure; CICS system log; Information recorded on the system log
Resynchronization after system or connection failure Units of work that fail while in an indoubt state remain shunted until the indoubt state can be resolved following successful resynchronization with the coordinator. Resynchronization takes place automatically when communications are next establishe...
Page 34 - System activity keypoints; AKPFREQ; Forward recovery logs
CICS also writes “backout-failed” records to the system log if a failure occurs in backout processing of a VSAM data set during dynamic backout or emergency restart backout. Records on the system log are used for cold, warm, and emergency restarts of a CICS region. The only type of start for which the ...
Page 37 - Chapter 3. Shutdown and restart recovery; Normal shutdown processing; shutdown assist transaction
Chapter 3. Shutdown and restart recovery CICS can shut down normally or abnormally and this affects the way that CICS restarts after it shuts down. CICS can stop executing as a result of: v A normal (warm) shutdown initiated by a CEMT, or EXEC CICS, PERFORM SHUT command v An immediate shutdown initiat...
Page 38 - and; Second quiesce stage
v The DFHCESD program started by the CICS-supplied transaction, CESD, attempts to purge and back out long-running tasks using increasingly stronger methods (see “The shutdown assist transaction” on page 30). v Tasks that are automatically initiated are run if they start before the second quiesce stage....
Page 39 - Warm keypoints
this indicator to determine the type of startup it is to perform. See “How the state of the CICS region is reconstructed” on page 34. v CICS writes warm keypoint records to: – The global catalog for terminal control and profiles – The CICS system log for all other resources. See “Warm keypoints.” v CIC...
Page 40 - Flushing journal buffers; SET JOURNAL FLUSH; SHUT IMMEDIATE; WRITE; Immediate shutdown processing (PERFORM SHUTDOWN IMMEDIATE); PERFORM IMMEDIATE not recommended
Flushing journal buffers During a successful normal shutdown, CICS calls the log manager domain to flush all journal buffers, ensuring that all journal records are written to their corresponding MVS system logger log streams. During an immediate shutdown, the call to the log manager domain is bypassed...
Page 41 - Shutdown requested by the operating system
2. If the default shutdown assist transaction CESD is run, it allows as many tasks as possible to commit or back out cleanly, but within a shorter time than that allowed on a normal shutdown. See “The shutdown assist transaction” on page 30 for more information about CESD, which runs the CICS-supplied ...
Page 42 - Uncontrolled termination; The shutdown assist transaction
The next initialization of CICS must be an emergency restart, in order to preserve data integrity. An emergency restart is ensured if the next initialization of CICS specifies START=AUTO. This is because the recovery manager’s type-of-restart indicator is set to “emergency-restart-needed” during init...
Page 43 - Cataloging CICS resources; Global catalog
You are recommended always to use the CESD shutdown-assist transaction when shutting down your CICS regions. You can use the DFHCESD program “as is”, or use the supplied source code as the basis for your own customized version (CICS supplies versions in assembler, COBOL, and PL/I). For more information...
Page 44 - Local catalog
– File control recovery blocks (only if a SHCDS NONRLSUPDATEPERMITTED command has been used). – Transient data queue definitions – Dump table information – Interval control elements and automatic initiate descriptors at shutdown – APPC connection information so that relevant values can be restored duri...
Page 46 - How the state of the CICS region is reconstructed
and therefore recovery of the most recent units of work cannot be carried out. However, data might be missing from any part of the system log and CICS cannot identify what is missing. CICS cannot examine the log and determine exactly what data is missing, because the log data might appear consistent in...
Page 47 - Overriding the type of start indicator; About this task; Warm restart; GRPLIST; Emergency restart; Initialization during emergency restart
Overriding the type of start indicator The operation of the recovery manager's control record can be modified by running the recovery manager utility program, DFHRMUTL. About this task This can set an autostart record that determines the type of start CICS is to perform, effectively overriding the typ...
Page 48 - Recovery of data during an emergency restart; Cold start; START; An initial start of CICS
performs the recovery process for work that was in-flight when the previous run of CICS was abnormally terminated. Recovery of data during an emergency restart During the final stage of emergency restart, the recovery manager uses the system log data to drive backout processing for any units of work t...
Page 49 - Dynamic RLS restart; INQUIRE UOWDSNFAIL
You can do this by specifying START=INITIAL as a system initialization parameter, or by running the recovery manager's utility program (DFHRMUTL) to override the type of start indicator to force an initial start. See the CICS Operations and Utilities Guide for information about the DFHRMUTL utility pr...
Page 50 - Recovery with VTAM persistent sessions; MNPS, multinode persistent sessions; Running with persistent sessions support
Recovery with VTAM persistent sessions With VTAM persistent sessions support, if CICS fails or undergoes immediate shutdown (by means of a PERFORM SHUTDOWN IMMEDIATE command), VTAM holds the CICS LU-LU sessions in recovery-pending state, and they can be recovered during startup by a newly starting CIC...
Page 51 - Situations in which sessions are not reestablished; PSDINT
During an emergency restart of CICS, CICS restores those sessions pending recovery from the CICS global catalog and the CICS system log to an in-session state. This process of persistent sessions recovery takes place when CICS opens its VTAM ACB. With multinode persistent sessions support, if VTAM or z...
Page 52 - Situations in which VTAM does not retain sessions; Running without persistent sessions support
v If CICS determines that it cannot recover the session without unbinding and rebinding it. The result in each case is as if CICS has restarted following a failure without VTAM persistent sessions support. In some other situations APPC sessions are unbound. For example, if a bind was in progress at the...
Page 53 - PSTYPE
You can then start further CICS regions with or without persistent sessions support as appropriate, provided that you do not exceed the limit for the number of regions that do have persistent sessions support. If you specify NOPS (no persistent session support) for the PSTYPE system initialization par...
Page 55 - Part 2. Recovery and restart processes
Part 2. Recovery and restart processes You can add your own processing to the CICS recovery and restart processes. This part contains the following sections: v Chapter 4, “CICS cold start,” on page 45 v Chapter 5, “CICS warm restart,” on page 53 v Chapter 6, “CICS emergency restart,” on page 61 v Ch...
Page 57 - Chapter 4. CICS cold start; Starting CICS with the START=COLD parameter
Chapter 4. CICS cold start This section describes the CICS startup processing specific to a cold start. It covers the two forms of cold start: v “Starting CICS with the START=COLD parameter” v “Starting CICS with the START=INITIAL parameter” on page 50 Starting CICS with the START=COLD parameter STA...
Page 58 - Files; VSAM
– CICS requests the SMSVSAM server, if connected, to release all RLS retained locks. – CICS does not rebuild the non-RLS retained locks. v CICS requests the SMSVSAM server to clear the RLS sharing control status for the region. v CICS does not restore the dump table, which may contain entries control...
Page 59 - Data tables; Temporary storage; Temporary storage data sharing server; Transient data
specify on the GRPLIST system initialization parameter. The CSD file definition is built and installed from the CSDxxxx system initialization parameters. Data tables As for VSAM file definitions. BDAM File definitions are installed from file control table entries, specified by the FCT system initiali...
Page 61 - Monitoring and statistics; Terminal control resources; VTAM devices; Committing and cataloging resources installed from the CSD; Single resource install
If you define new resource definitions and install them dynamically, ensure the group containing the resources is added to the appropriate group list. Monitoring and statistics The initial status of CICS monitoring is determined by the monitoring system initialization parameters (MN and MNxxxx). The...
Page 62 - Installable set install; Distributed transaction resources; Dump table
Installable set install The following VTAM terminal control resources are committed in installable sets: v Connections and their associated sessions v Pipeline terminals: all the terminal definitions sharing the same POOL name If one definition in an installable set fails, the set fails. However, each i...
Page 65 - Chapter 5. CICS warm restart; Rebuilding the CICS state after a normal shutdown
Chapter 5. CICS warm restart This section describes the CICS startup processing specific to a warm restart. If you specify START=AUTO, which is the recommended method, CICS determines which type of start to perform using information retrieved from the recovery manager's control record in the global ca...
Page 66 - Data set name blocks; Reconnecting to SMSVSAM for RLS access; lost; Recreating non-RLS retained locks
Files File control information from the previous run is recovered from information recorded in the CICS catalog only. File resource definitions for VSAM and BDAM files, data tables, and LSR pools are installed from the global catalog, including any definitions that were added dynamically during the pre...
Page 68 - Transactions; No autoinstall for programs
v All intrapartition TD queues are initialized empty. v The queue resource definitions are installed from the global catalog, but they are not updated by any log records or keypoint data. They are always installed enabled. This option is intended for use when initiating remote site recovery (see Chapt...
Page 69 - Autoinstall for programs; Start requests
Autoinstall for programs If program autoinstall is enabled (PGAIPGM=ACTIVE), program, mapset, and partitionset resource definitions are installed from the CSD only if they were cataloged; otherwise they are installed at first reference by the autoinstall process. All definitions installed from the CSD...
Page 70 - CSD-defined resource definitions; Same TCT as last run
Journal names and journal models The CICS log manager restores the journal name and journal model definitions from the global catalog. Journal name entries contain the names of the log streams used in the previous run, and the log manager reconnects to these during the warm restart. Terminal control re...
Page 71 - Different TCT from last run; URIMAP definitions and virtual hosts
v Different TCT from last run. CICS installs the TCT only, and does not apply the warm keypoint information, effectively making this a cold start for these devices. Note: CICS TS for z/OS, Version 4.1 supports only remote TCAM terminals, that is, the only TCAM terminals you can define are those attac...
Page 73 - Chapter 6. CICS emergency restart; Recovering after a CICS failure; Recovering information from the system log
Chapter 6. CICS emergency restart This section describes the CICS startup processing specific to an emergency restart. If you specify START=AUTO, CICS determines what type of start to perform using information retrieved from the recovery manager’s control record in the global catalog. If the type-of-...
Page 74 - Effect of delayed recovery on PLTPI processing; Other backout processing
Any non-RLS locks associated with in-flight (and other failed) transactions are acquired as active locks for the tasks attached to perform the backouts. This means that, if any new transaction attempts to access non-RLS data that is locked by a backout task, it waits normally rather than receiving the ...
Page 79 - Chapter 7. Automatic restart management; Restrictions; CICS ARM processing
Chapter 7. Automatic restart management CICS uses the automatic restart manager (ARM) component of MVS to increase the availability of your systems. MVS automatic restart management is a sysplex-wide integrated automatic restart mechanism that performs the following tasks: v Restarts an MVS subsystem ...
Page 80 - Registering with ARM; Before you begin; Waiting for predecessor subsystems; wait predecessor; De-registering from ARM
If CICS is restarted by ARM with the same persistent JCL, CICS forces START=AUTO to ensure data integrity. Registering with ARM To register with ARM, you must implement automatic restart management on the MVS images that the CICS workload is to run on. You must also ensure that the CICS startup JCL use...
Page 81 - CICS restart JCL and parameters; XRF
CANCEL, CICS de-registers from ARM before terminating, because if CICS remained registered, an automatic restart would probably encounter the same error condition. For other error situations, CICS does not de-register, and automatic restarts follow. To control the number of restarts, specify in your AR...
Page 82 - CICS START options; Workload policies
CICS START options You are recommended to specify START=AUTO, which causes a warm start after a normal shutdown and an emergency restart after failure. You are also recommended always to use the same JCL, even if it specifies START=COLD or START=INITIAL, to ensure that CICS restarts correctly when rest...
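The effect of START=AUTO described here can be sketched as a small decision function. This is a simplified model only: the function name `resolve_start` and the boolean `last_shutdown_normal` are hypothetical stand-ins for the type-of-restart indicator that the recovery manager keeps in the global catalog, and cases such as a DFHRMUTL override or an ARM-forced START=AUTO are not modeled.

```python
def resolve_start(start_param: str, last_shutdown_normal: bool) -> str:
    """Return the kind of start performed for a given START value (simplified)."""
    param = start_param.upper()
    if param == "AUTO":
        # Warm start after a normal shutdown, emergency restart after a failure.
        return "warm" if last_shutdown_normal else "emergency"
    if param in ("COLD", "INITIAL"):
        return param.lower()  # explicit values are taken as given in this sketch
    raise ValueError(f"unsupported START value: {start_param!r}")
```

The design point is that with START=AUTO the operator does not have to know how the previous run ended; the recorded indicator decides.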
Page 83 - Automatic restart of CICS data-sharing servers
The COVR transaction To ensure that CICS reconnects to VTAM in the event of a VTAM abend, CICS keeps retrying the OPEN VTAM ACB using a time-delay mechanism via the non-terminal transaction COVR. After CICS has completed clean-up following the VTAM failure, it invokes the CICS open VTAM retry (COVR) tr...
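A time-delay retry of the kind described for COVR can be sketched generically as follows. Nothing here reflects the actual COVR implementation: `open_acb` is a hypothetical callable that returns True when the open succeeds, and the delay and attempt limit are placeholder parameters, because the real intervals are not stated in this section.

```python
import time


def retry_open(open_acb, delay_seconds, max_attempts, sleep=time.sleep):
    """Call open_acb until it succeeds, pausing between attempts.

    Returns the number of the successful attempt, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if open_acb():
            return attempt
        if attempt < max_attempts:
            sleep(delay_seconds)  # wait before the next retry
    return None
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.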
Page 84 - Waiting on events during initialization; Server initialization parameters for ARM support; Server commands for ARM support; ARMREGISTERED
You can also restart a server explicitly using either the server command CANCEL RESTART=YES, or the MVS command CANCEL jobname,ARMRESTART. By default, the server uses an ARM element type of SYSCICSS, and an ARM element identifier of the form DFHxxnn_poolname where xx is the server type (XQ, CF or...
Page 85 - Chapter 8. Unit of work recovery and abend processing; SET TASK PURGE; Unit of work recovery
Chapter 8. Unit of work recovery and abend processing A number of different events can cause the abnormal termination of transactions in CICS. These events include: v A transaction ABEND request issued by a CICS management module. v A program check or operating system abend (this is trapped by CICS a...
Page 86 - Transaction backout; single
See “Commit-failed recovery” on page 83. Backout-failed A unit of work fails while backing out updates to file control recoverable resources. (The concept of backout-failed applies in principle to any resource that performs backout recovery, but CICS file control is the only resource manager to provide...
Page 87 - BDAM files and VSAM ESDS files:
terminating transaction takes place immediately. Therefore, it does not cause any active locks to be converted into retained locks. In the case of a CICS region abend, in-flight tasks have to wait to be backed out when CICS is restarted, during which time the locks are retained to protect uncommitted r...
Page 88 - Intrapartition transient data; logically recoverable; Physically recoverable; are; Auxiliary temporary storage; PROTECT
Intrapartition transient data Intrapartition destinations specified as logically recoverable are restored by transaction backout. Read and write pointers are restored to what they were before the transaction failure occurred. Physically recoverable queues are recovered on warm and emergency restarts....
Page 89 - START with nonrecoverable data (no PROTECT); recoverable; Restart of started transactions:
intended for the started task, but does not back out the START request itself. Thus the new task will start at its specified time, but the data will not be available to the started task, to which CICS will return a NOTFND condition in response to the RETRIEVE command. START with recoverable data (PROT...
Page 90 - EXEC CICS CANCEL; SEND MAP
Table 1. Effect of RESTART option on started transactions (continued)
Description of non-terminal START command | Events | Effect of RESTART(YES) | Effect of RESTART(NO)
Specifies nonrecoverable data | Started task abends without retrieving its data | Transaction is restarted with its data still available, up to n ¹ ...
Page 91 - Backout-failed recovery
Backout-failed recovery Backout failure support is currently provided only by CICS file control. If backout to a VSAM data set fails for any reason, CICS performs the following processing: v Invokes the backout failure global user exit program at XFCBFAIL, if this exit is enabled. If the user exit pro...
Page 92 - Disposition of data sets after backout failures
Transient data All updates to logically recoverable intrapartition queues are managed in main storage until syncpoint, or until a buffer must be flushed because all buffers are in use. TD always commits forwards; therefore, TD can never suffer a backout failure on DFHINTRA. Retrying backout-failed unit...
Page 95 - Commit-failed recovery
This situation can be resolved only by deleting the rival record with the duplicate key value. Lock structure full error The backout required VSAM to acquire a lock for internal processing, but it was unable to do so because the RLS lock structure was full. This error can occur only for VSAM data sets ...
Page 96 - Indoubt failure recovery; indoubt
distinguishes between a commit failure where recoverable work was performed, and one for which only repeatable read locks were held. Indoubt failure recovery The CICS recovery manager is responsible for maintaining the state of each unit of work in a CICS region. For example, typical events that cause...
Page 97 - READQ; WRITEQ; DELETEQ; Investigating an indoubt failure
reads against VSAM data sets and has made no updates to other resources, it is safe to force the unit of work using the SET DSNAME or SET UOW commands. CICS saves enough information about the unit of work to allow it to be either committed or backed out when the indoubt unit of work is unshunted when ...
Page 100 - cache set
Recovery from failures associated with the coupling facility This topic deals with recovery from failures arising from the use of the coupling facility, and which affect CICS units of work. It covers: v SMSVSAM cache structure failures v SMSVSAM lock structure failures (lost locks) v Connection failu...
Page 101 - Lost locks recovery; Rebuilding the lock structure; Notifying CICS of SMSVSAM restart
CICS recovers after a cache failure automatically. There is no need for manual intervention (other than the prerequisite action of resolving the underlying cause of the cache failure). Lost locks recovery The failure of a coupling facility lock structure that cannot be rebuilt by VSAM creates the lost ...
Page 102 - Performing lost locks recovery for failed units of work
region that was not sharing the data set at the time the lost locks condition occurred, and on RLS access requests issued by any new units of work in CICS regions that were sharing the data set. Performing lost locks recovery for failed units of work Lost locks recovery requires that any units of work...
Page 103 - Connection failure to a coupling facility cache structure; Connection failure to a coupling facility lock structure
simultaneously all data sets in use when the lock structure fails, each data set can be restored to service individually as soon as all its sharing CICS regions have completed lost locks recovery. Connection failure to a coupling facility cache structure If connection to a coupling facility cache stru...
Page 104 - HANDLE ABEND
Recovery from the failure of a sysplex is just the equivalent of multiple MVS failure recoveries. Transaction abend processing If, during transaction abend processing, another abend occurs and CICS continues, there is a risk of a transaction abend loop and further processing of a resource that has lost...
Page 105 - Abnormal termination of a task; Transaction restart
The exit code then executes as an extension of the abending task, and runs at the same level as the program that issued the HANDLE ABEND command that activated the exit. After any program-level abend exit code has been executed, the next action depends on how the exit code ends: v If the exit code ends...
Page 106 - SYNCPOINT ROLLBACK; Actions taken at transaction failure; Processing operating system abends and program checks
1. CICS invokes DFHREST only when RESTART(YES) is specified in a transaction’s resource definition. 2. Ensure that resources used by restartable transactions, such as files, temporary storage, and intrapartition transient data queues, are defined as recoverable. 3. When transaction restart occurs, a n...
Page 109 - Chapter 9. Communication error processing; Terminal error processing
Chapter 9. Communication error processing The types of communication error that can occur include terminal error processing and intersystem communication failures. Terminal error processing There are two main CICS programs that participate in terminal error processing. These are the node error program...
Page 110 - Intersystem communication failures
The TEP is entered once for each terminal error, and therefore should be designed to process only one error for each invocation. Intersystem communication failures An intersystem communication failure can be caused by the failure of a CICS region, or the remote system to which it is connected. A netwo...
Page 111 - Part 3. Implementing recovery and restart
Part 3. Implementing recovery and restart This part describes the way you implement recovery and restart for CICS regions.
Page 113 - Chapter 10. Planning aspects of recovery; Application design considerations; online
Chapter 10. Planning aspects of recovery When you are planning aspects of recovery, you must consider your applications, system definitions, internal documentation, and test plans. Application design considerations Think about recoverability as early as possible during the application design stages. T...
Page 114 - Validate the recovery requirements statement
Question 5: If a data set becomes unusable, should all applications be terminated while recovery is performed? If degraded service to any application must be preserved while recovery of the data set takes place, you will need to include procedures to do this. Question 6: Which of the files to be updat...
Page 115 - Designing the end user’s restart procedure; End user’s standby procedures
Before any design or programming work begins, all interested parties should agree on the statement, including: v Those responsible for business management v Those responsible for data management v Those who are to use the application, including the end users, and those responsible for computer and onli...
Page 116 - SRT; Resource definitions for recovery; System log streams and general log streams
v If a user’s printer becomes unusable (because of hardware or communication problems), consider the use of alternatives, such as the computer center’s printer, as a standby. Security Decide the security procedures for an emergency restart or a break in communications. For example, when confidential ...
Page 117 - Temporary storage table; Documentation and test plans
and general log data to log streams defined to the MVS system logger. For more information, see Chapter 11, “Defining system and general log streams,” on page 107. Files For VSAM files defined to be accessed in RLS mode, define the recovery attributes in the ICF catalog, using IDCAMS. For VSAM files de...
Page 119 - Chapter 11. Defining system and general log streams
Chapter 11. Defining system and general log streams All CICS system logging and journaling is controlled by the CICS log manager, which uses MVS system logger log streams to store its output. About this task CICS logging and journaling can be divided into four broad types of activity: System logging ...
Page 120 - Defining log streams to MVS; Defining system log streams
System log streams These are used by the CICS log manager and the CICS recovery manager exclusively for unit of work recovery purposes. Each system log is unique to a CICS region, and must not be merged with any other system log. General log streams These are used by the CICS log manager for all other...
Page 121 - Specifying a JOURNALMODEL resource definition; Without a JOURNALMODEL definition; With a JOURNALMODEL definition
CICS log manager connects to its log stream automatically during system initialization, unless it is defined as TYPE(DUMMY) in a CICS JOURNALMODEL resource definition. Although the CICS system log is logically a single log stream, it is written to two physical log streams: a primary and a second...
Page 122 - Model log streams for CICS system logs; Example; Recovery considerations
Model log streams for CICS system logs If CICS fails to connect to its system log streams because they have not been defined, CICS attempts to have them created dynamically using model log streams. To create a log stream dynamically, CICS must specify to the MVS system logger all the log stream attrib...
Page 124 - EXEC CICS ASSIGN INITPARM; Activity keypointing
Varying the model log stream name: To balance log streams across log structures when you use model log streams, you must customize the model log stream names. You cannot achieve the distribution of log streams shown in this scenario using the CICS default model name. About this task You can use an XLGSTRM glo...
Page 125 - Keeping system log data to a minimum
work. With this information, CICS continues reading backwards, but this time reading only the records for units of work that are identified in the activity keypoint. Reading continues until CICS has read all the records for the units of work identified by the activity keypoint. This process means that c...
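The backward read described above can be pictured with a small model. The following Python sketch is illustrative only and is not CICS code: the record layout (`kind`, `uow`, `inflight`) and the function name `scan_backwards` are assumptions made for this example. It also assumes that, before an activity keypoint is found, every record is collected, and that a unit of work has been fully read once its start record is reached.

```python
# Hypothetical model of a backward log scan; names and record layout are
# assumptions for illustration, not the CICS log format.
SAMPLE_LOG = [
    {"kind": "start", "uow": "A"},
    {"kind": "update", "uow": "A"},
    {"kind": "start", "uow": "B"},
    {"kind": "update", "uow": "B"},            # B completed before the keypoint
    {"kind": "keypoint", "inflight": ["A"]},   # only A was in flight here
    {"kind": "start", "uow": "C"},
    {"kind": "update", "uow": "C"},
    {"kind": "update", "uow": "A"},
]


def scan_backwards(log):
    """Return the records a backward scan would collect, newest first."""
    collected = []
    pending = None  # None until an activity keypoint has been found
    for record in reversed(log):
        if pending is None:
            if record["kind"] == "keypoint":
                # From here on, read only the units of work it identifies.
                pending = set(record["inflight"])
                if not pending:
                    break
                continue
            collected.append(record)
        elif record.get("uow") in pending:
            collected.append(record)
            if record["kind"] == "start":
                # All records for this unit of work have now been read.
                pending.discard(record["uow"])
                if not pending:
                    break
    return collected
```

In this model the records for unit of work B are never collected, because B is not named in the keypoint, and the scan stops as soon as the start record for A has been read.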
Page 126 - Moving units of work to the secondary log:
v If a system log stream exceeds the primary storage space allocated, it spills onto secondary storage. (For a definition of primary and secondary storage, see the CICS Transaction Server for z/OS Installation Guide.) The resulting I/O can adversely affect system performance. v If the interval betwe...
Page 127 - Writing user-recovery data; Retrieving user records from the system log:; Avoiding retention periods on the system log
Writing user-recovery data About this task You should write only recovery-related records to the system log stream. You can do this using the commands provided by the application programming interface (API) or the exit programming interfaces (XPI). This is important because user recovery records are pr...
Page 128 - Long-running transactions; Defining forward recovery log streams
About this task The dddd value specifies the minimum number of days for which data is to be retained on the log. You are strongly recommended not to use the system log for records that need to be kept. Any log and journal data that needs to be preserved should be written to a general log stream. See ...
Page 129 - What to do next; Model log streams for CICS general logs
2. Define a general log stream for forward recovery data. If you do not define a general log stream, CICS attempts to create a log stream dynamically. See “Model log streams for CICS general logs” for details. 3. Decide how you want to merge forward recovery data from different CICS regions into one or...
Page 130 - Merging data on shared general log streams; Defining the log of logs
Merging data on shared general log streams Unlike system log streams, which are unique to one CICS region, general log streams can be shared between many CICS regions. This means that you can merge forward recovery data from a number of CICS regions onto the same forward recovery log stream. About this...
Page 131 - Log of logs failure
About this task The CICS-supplied group, DFHLGMOD, includes a JOURNALMODEL for the log of logs, called DFHLGLOG, which has a log stream name of &USERID..CICSVR.DFHLGLOG. Note that &USERID resolves to the CICS region userid, and if your CICS regions run under different RACF user IDs, the DFHLGLOG...
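To see how the &USERID symbol affects the resulting log stream name, consider the following Python sketch. It is a model for illustration, not CICS code. It assumes that the first period after a symbol ends the symbol name and is not copied into the result, which is why the template contains two periods. The function name `resolve_stream_name` and the region user IDs CICSHA01 and CICSHA02 are hypothetical.

```python
import re


def resolve_stream_name(template, symbols):
    """Substitute &SYMBOL. references in a log stream name template.

    Assumption for this sketch: a period that directly follows a symbol
    terminates the symbol and is consumed by the substitution.
    """
    def substitute(match):
        return symbols[match.group(1)]

    return re.sub(r"&([A-Z]+)\.?", substitute, template)


# Two regions running under different (hypothetical) user IDs resolve the
# same template to two different log stream names.
name_1 = resolve_stream_name("&USERID..CICSVR.DFHLGLOG", {"USERID": "CICSHA01"})
name_2 = resolve_stream_name("&USERID..CICSVR.DFHLGLOG", {"USERID": "CICSHA02"})
```

Under these assumptions, regions that run under different user IDs do not share a single log of logs unless the template is changed to a name that does not depend on the user ID.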
Page 132 - Effect of daylight saving time changes; CEMT PERFORM RESET; Time stamping log and journal records
v In a format compatible with utility programs written for versions of CICS that use the log manager for logging and journaling. See the CICS Operations and Utilities Guide for more information about using the LOGR SSI to access log stream data, and for sample JCL. If you plan to write your own utili...
Page 133 - Offline utility program, DFHJUP
Operating a recovery process that is independent of time-stamps in the system log data ensures that CICS can restart successfully after an abnormal termination, even if the failure occurs shortly after local time has been put back. Offline utility program, DFHJUP Changing the local time forward has no...
Page 135 - Recovery for transactions; local
Chapter 12. Defining recoverability for CICS-managed resources This section describes what to do to ensure that you can recover the resources controlled by CICS on behalf of your application programs. About this task It covers the recoverability aspects of the various resources as follows: v “Recovery...
Page 136 - Indoubt options for distributed transactions
SPURGE({NO|YES}) This option indicates whether the transaction is initially system-purgeable; that is, whether CICS can purge the transaction as a result of: v Expiry of a deadlock timeout (DTIMOUT) delay interval v A CEMT, or EXEC CICS, SET TASK(id) PURGE|FORCEPURGE command. The default is SPURGE(NO...
Page 137 - Recovery for files; file; data set; VSAM files; Sharing data sets with batch jobs
Recovery for files A CICS file is a logical view of a physical data set, defined to CICS in a file resource definition with an 8-character file name. A CICS file is associated with a VSAM or BDAM data set by one of the following: v By dynamic allocation, where the data set name is predefined on the ...
Page 138 - Forward recovery; Backward recovery; Defining files as recoverable resources
Forward recovery For VSAM files, you can use a forward recovery utility, such as CICSVR, when online backout processing has failed as a result of some physical damage to the data set. For forward recovery: v Create backup copies of data sets. v Record after-images of file changes in a forward recovery...
Page 139 - NONRLSRECOV; VSAM files accessed in non-RLS mode; BACKUPTYPE
uses the ICF catalog entry recovery attributes instead of the FILE resource. To force CICS to use the FILE resource attributes instead of the catalog, set the NONRLSRECOV system initialization parameter to FILEDEF. v You define the recovery attributes for BDAM files in file entries in the file control...
Page 140 - VSAM files accessed in RLS mode; NONE; UNDO; Inquiring on recovery attributes:
VSAM files accessed in RLS mode If you specify file definitions that open a data set in RLS mode, specify the recovery options in the ICF catalog. The recovery options on the CICS file resource definitions (RECOVERY, FWDRECOVLOG, and BACKUPTYPE) are ignored if the file definition specifies RLS access. ...
Page 141 - BDAM files; The CSD data set; Overriding open failures at the XFCNREC global user exit
INQUIRE DSNAME command returns values from the VSAM base cluster block (BCB). However, because base cluster block (BCB) recovery values are not set until the first open, if you issue an INQUIRE DSNAME command before the first file is opened, CICS returns NOTAPPLIC for RECOVSTATUS. BDAM files You can sp...
Page 142 - INQUIRE DSNAME RECOVSTATUS; SET DSNAME REMOVE; CICS responses to file open requests
About this task If you use XFCNREC to suppress open failures that are a result of inconsistencies in the backout settings, CICS issues a message to warn you that the integrity of the data set can no longer be guaranteed. Any INQUIRE DSNAME RECOVSTATUS command that is issued from this point onward will...
Page 143 - Implementing forward recovery with user-written utilities; Recovery for intrapartition transient data
- File is defined with RECOVERY(ALL): the open fails. – Base cluster has RECOVERY(ALL): - File is defined with RECOVERY(NONE): the open fails. - File is defined with RECOVERY(BACKOUTONLY): the open fails. - File is defined with RECOVERY(ALL): the open proceeds unless FWDRECOVLOG specifies a different ...
Page 144 - extrapartition; Logical recovery
For more information about allocation and space requirements, see the CICS System Definition Guide.) For extrapartition transient data considerations, see “Recovery for extrapartition transient data” on page 134. You must specify the name of every intrapartition transient data queue that you want to...
Page 145 - No recovery
Making intrapartition TD physically recoverable can be useful in the case of some CICS queues. For example, after a CICS failure, you might choose to restart CICS as quickly as possible, and then look for the cause of the failure. By specifying queues such as CSMT as intrapartition and physically recov...
Page 146 - Recovery for extrapartition transient data; Input extrapartition data sets; immediately
Recovery for extrapartition transient data CICS does not recover extrapartition data sets. If you depend on extrapartition data, you will need to develop procedures to recover data for continued execution on restart following either a controlled or an uncontrolled shutdown of CICS. There are two areas...
Page 147 - Output extrapartition data sets; Recovery for temporary storage
Output extrapartition data sets The recovery of output extrapartition data sets is somewhat different from the recovery of input data sets. For a tape output data set, use a new output tape on restart. You can then use the previous output tape if you need to recover information recorded before terminat...
Page 149 - EXEC CICS SYNCPOINT; Results; Defining local queues in a service provider; Procedure
About this task CICS uses Business Transaction Services (BTS) to ensure that persistent messages are recovered in the event of a CICS system failure. For this to work correctly, follow these steps: Procedure 1. Use IDCAMS to define the local request queue and repository file to MVS. You must specify a...
Page 150 - Persistent message processing; Error processing
2. For each local request queue, define a QLOCAL object. Use the following command: DEFINE QLOCAL('queuename') DESCR('description') PROCESS(processname) INITQ('initiation_queue') TRIGGER TRIGTYPE(FIRST) TRIGDATA('default_target_service') BOTHRESH(nnn) BOQNAME('requeuename') where: v queuen...
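If you define many local request queues, you might generate the DEFINE QLOCAL command text from a template. The following Python sketch shows one way to do this. The function name `define_qlocal` and all the sample values are hypothetical; only the keywords shown in the command above are used, and the sketch does not validate the values against WebSphere MQ naming rules.

```python
def define_qlocal(queuename, description, processname, initiation_queue,
                  default_target_service, backout_threshold, requeuename):
    """Build the DEFINE QLOCAL command text for one local request queue."""
    return (
        f"DEFINE QLOCAL('{queuename}') "
        f"DESCR('{description}') "
        f"PROCESS({processname}) "
        f"INITQ('{initiation_queue}') "
        "TRIGGER TRIGTYPE(FIRST) "
        f"TRIGDATA('{default_target_service}') "
        f"BOTHRESH({backout_threshold}) "
        f"BOQNAME('{requeuename}')"
    )


# Sample values are placeholders chosen for this example only.
command = define_qlocal(
    queuename="ORDERS.REQUEST",
    description="Order service request queue",
    processname="ORDERPROC",
    initiation_queue="CICS.INITQ",
    default_target_service="/orders/default",
    backout_threshold=3,
    requeuename="ORDERS.REQUEUE",
)
```

Generating the text in this way keeps the trigger attributes (TRIGGER and TRIGTYPE(FIRST)) identical across queues, so that only the queue-specific values vary.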
Page 151 - RESET ACQPROCESS; RUN ASYNC; MQ GET
not usable, message DFHPI0117 is issued, and CICS continues without BTS, using the existing channel-based container mechanism. If a CICS failure occurs before the Web service starts or completes processing, BTS recovery ensures that the process is rescheduled when CICS is restarted. If the Web service...
Page 153 - Chapter 13. Programming for recovery; Designing applications for recovery; application; Splitting the application into transactions
Chapter 13. Programming for recovery When you are designing your application programs, you can include recovery facilities that are provided by CICS; for example, you can use global user exits for backout recovery. This section covers the following topics: v “Designing applications for recovery” v “Pr...
Page 155 - SAA-compatible applications; Program design; committed
SAA-compatible applications The resource recovery element of the Systems Application Architecture® (SAA) common programming interface (CPI) provides an alternative to the standard CICS application program interface (API) if you need to implement SAA-compatible applications. The resource recovery faci...
Page 156 - one; Processing dialogs with users; Conversational processing
committed in one unit of work, but the transaction is to continue with one or more units of work for further processing. 3. Where file or database updates must be kept in step, make sure that your application does them in the same unit of work. This approach ensures that those updates will all be comm...
Page 157 - Mechanisms for passing data between transactions; Main storage areas; CICS recoverable resources
back out only the updates made during that individual step; the application is responsible for restarting at the appropriate point in the conversation, which might involve recreating a screen format. However, other tasks might try to update the database between the time when update information is accep...
Page 158 - Transient data queues; logically; User files and DL/I and DB2 databases; Designing to avoid transaction deadlocks
v Data tables (user-maintained) v Coupling facility data tables CICS can return all these resources to their status at the beginning of an in-flight unit of work if a task ends abnormally. Temporary storage (auxiliary) You can use a temporary storage item to communicate between transactions. (For thi...
Page 159 - Implications of interval control START requests
Procedure v Arrange for all transactions to access files in a sequence agreed in advance. This could be a suitable subject for installation standards. Be extra careful if you allow updates through multiple paths. v Enforce explicit installation enqueueing standards so that all applications do the follo...
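The first guideline, accessing resources in a sequence agreed in advance, is a general technique that is not specific to CICS. The following Python sketch illustrates it with ordinary thread locks standing in for file locks. The helper name `ordered_access` and the file names FILEA and FILEB are hypothetical, and alphabetical order is used here as the agreed sequence purely as an example of an installation standard.

```python
import threading
from contextlib import contextmanager


@contextmanager
def ordered_access(locks, names):
    """Acquire the named locks in one agreed order (alphabetical here).

    Because every caller acquires in the same order, two callers can never
    each hold a lock that the other is waiting for.
    """
    order = sorted(set(names))
    acquired = []
    try:
        for name in order:
            locks[name].acquire()
            acquired.append(name)
        yield order
    finally:
        # Release in reverse order of acquisition.
        for name in reversed(acquired):
            locks[name].release()


# Hypothetical resources for this example.
FILE_LOCKS = {"FILEA": threading.Lock(), "FILEB": threading.Lock()}
```

A caller that asks for FILEB and then FILEA still acquires FILEA first, so it cannot deadlock with a caller that asks for the same two files in the opposite order.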
Page 160 - own; Implications of automatic task initiation (TD trigger level); Implications of presenting large amounts of data to the user; Terminal paging through BMS; SEND PAGE BMS; Using transient data queues
The abend processing should analyze the cause of failure as far as possible, and restart the task if appropriate. Ensure that either the user or the master terminal operator can take appropriate action to repeat the updates. You could, for example, allow the user to reinitiate the task. An alternative ...
Page 161 - Managing transaction and system failures; Transaction failures
About this task Such queuing can be done on a transient data queue associated with a terminal. A special transaction, triggered when the terminal is available, can then format and present the data. For recovery and restart purposes: v The transient data queue should be specified as logically recoverab...
Page 162 - HANDLE ABEND commands; EXEC CICS SYNCPOINT ROLLBACK command; EXEC CICS ABEND
For example, if file input and output errors occur (where the default action is merely to abend the task), you might want to inform the master terminal operator, who can decide to terminate CICS, especially if one of the files is critical to the application. Your installation might have standards relat...
Page 163 - System failures; Handling abends and program level abend exits
v DTB takes place only after program level abend exits (if any) have attempted cleanup or logical recovery. Transaction restart after DTB For each transaction where DTB is specified, consider also specifying automatic transaction restart. For example, for transactions that access DL/I databases (and ar...
Page 164 - Processing the IOERR condition
v Send a message to the terminal operator if, for example, you believe that the abend is due to bad input data. Information that is available to a program-level exit routine or program includes the following: Command Information provided ADDRESS TWA The address of the TWA ASSIGN ABCODE The current CIC...
Page 165 - START TRANSID commands; START TRANSID; PL/I programs and error handling
START TRANSID commands In a transaction that uses the START TRANSID command to start other transactions, you must maintain logical data integrity. You can maintain data integrity by following these guidelines: 1. Always use the PROTECT option of the START TRANSID command. This ensures that if the STA...
Page 166 - Implicit locking for files; Nonrecoverable files
About this task There are two forms of locking: 1. The implicit locking functions performed by CICS (or the access method) whenever your transactions issue a request to change data. These are described under: v “Implicit locking for files” v “Implicit enqueuing on logically recoverable TD destinations...
Page 167 - Recoverable files
Recoverable files For VSAM or BDAM files designated as recoverable, the duration of the locking action is extended. For VSAM files, the extended locking is on the updated record only, not the whole control interval. READ WRITE UPDATE ====== Locking ===== during update (See Note below) Task A SOT READ ...
Page 169 - Implicit enqueuing on logically recoverable TD destinations; WRITEQ TD; Implicit enqueuing on recoverable temporary storage queues
The backout fails because a duplicate key is detected in the AIX indicated by message DFHFC4701, with a failure code of X'F0'. There is no locking on the AIX key to prevent the second task taking the key before the end of the first task’s unit of work. If there is an application requirement for this...
Page 170 - EXEC CICS ENQ RESOURCE
enqueuing on temporary storage queues where concurrently executing tasks can read and change queue(s) with the same temporary storage identifier. (See “Explicit enqueuing (by the application programmer).”) Temporary storage control commands that invoke implicit enqueuing are: v WRITEQ TS v DELETEQ TS ...
Page 171 - Possibility of transaction deadlock; transaction
After a task has issued an ENQ RESOURCE(data-area) command, any other task that issues an ENQ RESOURCE command with the same data-area parameter is suspended until the first task issues a matching DEQ RESOURCE(data-area) command, or until its unit of work ends. Note: Enqueueing on more than one resourc...
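The behavior described above can be summarized in a small model. The following Python sketch is not CICS code and does not suspend anything: the class name `EnqueueModel` and its methods are hypothetical, and `enq` simply returns False where a real second task would be suspended. It shows that a resource is released either by a matching DEQ from the owning task or when the owner's unit of work ends.

```python
class EnqueueModel:
    """Toy model of explicit enqueues held by tasks on named resources."""

    def __init__(self):
        self._owner = {}  # resource name -> task that holds the enqueue

    def enq(self, task, resource):
        """Return True if the task holds the enqueue after the call.

        False means the resource is held by another task; in this model
        that stands for the point at which the caller would be suspended.
        """
        owner = self._owner.get(resource)
        if owner is not None and owner != task:
            return False
        self._owner[resource] = task
        return True

    def deq(self, task, resource):
        """Release the enqueue if this task is the owner."""
        if self._owner.get(resource) == task:
            del self._owner[resource]

    def end_unit_of_work(self, task):
        """Release every enqueue still held by the task."""
        held = [r for r, t in self._owner.items() if t == task]
        for resource in held:
            del self._owner[resource]
```

For example, if task T1 enqueues on ACCOUNT, an enqueue by task T2 on ACCOUNT does not succeed until T1 issues the matching dequeue or its unit of work ends.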
Page 173 - TBEXITS; XRCINIT exit
Procedure v Enable them in PLT programs in the first part of PLT processing. v Specify them on the system initialization parameter, TBEXITS. This takes the form TBEXITS=(name1,name2,name3,name4,name5,name6), where name1, name2, name3, name4, name5, and name6 are the names of your global user exi...
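The TBEXITS form shown above can be produced by a short helper if you generate system initialization parameters from a script. The following Python sketch is an illustration under stated assumptions: the function name `tbexits_parameter` and the sample program names are hypothetical, the sketch requires all six names to be supplied, and it checks only that each name is 1 to 8 characters long.

```python
def tbexits_parameter(exit_names):
    """Format six exit program names as TBEXITS=(name1,...,name6)."""
    names = list(exit_names)
    if len(names) != 6:
        # Simplification for this sketch: every position must be supplied.
        raise ValueError("this sketch expects exactly six exit program names")
    for name in names:
        if not 1 <= len(name) <= 8:
            raise ValueError(f"program name {name!r} must be 1 to 8 characters")
    return "TBEXITS=(" + ",".join(names) + ")"


# Placeholder program names for this example only.
parameter = tbexits_parameter(
    ["EXITA", "EXITB", "EXITC", "EXITD", "EXITE", "EXITF"]
)
```

The order of the names matters because each position corresponds to a particular exit; the sketch preserves the order in which the names are passed.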
Page 174 - Coding transaction backout exits
XFCLDEL global user exit XFCLDEL is invoked when backing out a unit of work that performed a write operation to a VSAM ESDS, or a BDAM data set. XFCBOVER global user exit XFCBOVER is invoked whenever CICS is about to decide not to back out an uncommitted update, because the record could have been updat...
Page 175 - The CICS-supplied PEP
Chapter 14. Using a program error program (PEP) The program error program (PEP) gains control after all program-level ABEND exit code has executed and after dynamic transaction backout has been performed. About this task There is only one program error program for the whole region. Procedure 1. Decid...
Page 176 - Your own PEP
7. The CICS transaction failure program, DFHTFP, links to DFHPEP before transaction backout is performed. This means resources used by the abending transaction may not have been released. DFHPEP needs to be aware of this, and might need logic to handle resources that are still locked. 8. Do not use the...
Page 177 - Omitting the PEP
When you have corrected the error, you can re-enable the relevant installed transaction definition to allow terminals to use it. You can also disable transaction identifiers when transactions are not to be accepted for application-dependent reasons, and can enable them again later. The CICS Resource De...
Page 179 - Quiescing RLS data sets
Chapter 15. Resolving retained locks on recoverable resources This section describes how you can locate and resolve retained locks that are preventing access to resources, either by CICS transactions or by batch jobs. About this task Although the main emphasis in this section is on how you can switch ...
Page 180 - The RLS quiesce and unquiesce functions; Illustration of the quiesce flow across two CICS regions
The RLS quiesce and unquiesce functions The RLS quiesce and unquiesce functions are initiated by a CICS command in one region, and propagated by the VSAM RLS quiesce interface to other CICS regions in the sysplex. When these functions are complete, the ICF catalog shows the quiesce state of the target ...
Page 182 - Other quiesce interface functions; Non-BWO data set backup start
(4a) SMSVSAM uses the coupling facility to propagate the request to the other SMSVSAM servers in the sysplex. 5. The CICS RLS quiesce exit program schedules a CICS region task (CFQR) to perform asynchronously the required quiesce actions in that CICS region. 6. When CICS has closed all open RLS ACBs f...
Page 184 - Switching from RLS to non-RLS access mode; always; Exception for read-only operations
Lost locks recovery complete A quiesce interface function initiated by VSAM. VSAM takes action associated with a sphere having completed lost locks recovery on all CICS regions that were sharing the data set. SMSVSAM invokes the CICS RLS quiesce exit program in each region that is registered with an SM...
Page 185 - What can prevent a switch to non-RLS access mode?
Note: If your file definitions specify an LSR pool id that is built dynamically by CICS, consider using the RLSTOLSR system initialization parameter. v Open the files in non-RLS read-only mode in CICS. v Concurrently, run batch non-RLS. v When batch work is finished: – Close the read-only non-RLS mode f...
Page 186 - Investigating which retained locks are held and why
The remainder of this topic on switching to non-RLS access mode describes the options that are available if you need to switch to non-RLS mode and are prevented from doing so by retained locks. Resolving retained locks before opening data sets in non-RLS mode VSAM sets an ‘RLS-in-use’ indicator in the ...
Page 187 - INQUIRE DSNAME; Indoubt failure
About this task However, it does know about the uncommitted changes that are protected by such locks, and why the changes have not yet been committed successfully. CICS uses this information to help you resolve any retained locks that are preventing you from switching to non-RLS access mode. INQUIRE DS...
Page 188 - Commit failure; SHCDS LIST subcommands; Resolving retained locks and preserving data integrity
v Commit failure, where a unit of work has failed during the commit action. The commit action may be either to commit the changes made by a completed unit of work, or to commit the successful backout of a unit of work. This failure is caused by a failure of the SMSVSAM server, which is returned as RL...
Page 189 - Choosing data availability over data integrity
4. If a unit of work has been shunted with a different CAUSE and REASON, review the descriptions of these values in the INQUIRE UOWDSNFAIL command to determine what action to take to allow the shunted unit of work to complete. Choosing data availability over data integrity There may be times when you...
Page 190 - The batch-enabling sample programs; CEMT command examples
Diagnostic messages DFHFC3003 and DFHFC3010 are issued for each log record. If a data set has both indoubt-failed and other (backout- or commit-) failed units of work, deal with the indoubt UOWs first, using SET DSNAME UOWACTION, because this might result in other failures which can then be cleared by...
Page 193 - Post-batch processing
v Do not use DENYNONRLSUPDATE if you run non-RLS work after specifying PERMITNONRLSUPDATE. The permit status is automatically reset by the CICS regions that hold retained locks when they open the data set in RLS mode. Post-batch processing After a non-RLS program has been permitted to override retaine...
Page 194 - Coupling facility data table retained locks
Coupling facility data table retained locks Recoverable coupling facility data table records can be the subject of retained locks, like any other recoverable CICS resource that is updated in a unit of work that subsequently fails. A recoverable CFDT supports indoubt and backout failures. If a unit of ...
Page 195 - Procedure for moving a data set with retained locks; Using the REPRO method
Chapter 16. Moving recoverable data sets that have retained locks There may be times when you need to re-define a VSAM data set by creating a new data set and moving the data from the old data set to the new data set. About this task For example, you might need to do this to make a data set larger. In...
Page 196 - SHCDS FRSETRR
The following access method services examples assume that CICS.DATASET.A needs to be redefined and the data moved to a data set named CICS.DATASET.B, which is then renamed: DEFINE CLUSTER (NAME(CICS.DATASET.B) ... REPRO INDATASET(CICS.DATASET.A) OUTDATASET(CICS.DATASET.B) DELETE CICS.DATASET.A ALTER C...
Page 197 - Using the EXPORT and IMPORT functions
This makes the data set unavailable while the move from old to new is in progress, and also allows the following unbind operation to succeed. 4. Issue the SHCDS FRUNBIND subcommand to unbind any retained locks against the old data set. For example: SHCDS FRUNBIND(CICS.DATASET.A) This enables SMSVSAM t...
Page 198 - Rebuilding alternate indexes
v Create a new empty data set into which the copy is to be restored, and use IMPORT to copy the data from the exported version of the data set to the new empty data set. v Use SHCDS FRSETRR to mark the original data set as being under maintenance. v Use SHCDS FRUNBIND to unbind the locks from the orig...
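The ordering of the commands in a move matters: this chapter states that the data set must be marked as under maintenance before the unbind can succeed, and the forward recovery procedure states that the new data set must carry the old name before the bind can succeed. The following Python sketch lists one ordering of the REPRO method that respects those two constraints. It is an assumption-laden illustration, not the documented procedure: the function name `move_commands` is hypothetical, the new cluster is assumed to be defined already (its DEFINE CLUSTER attributes are not shown here), and the final SHCDS FRRESETRR step and the exact position of the REPRO step are assumptions.

```python
def move_commands(old, new):
    """Return one possible command sequence for moving a data set.

    Assumes the new cluster already exists. The relative order of
    FRSETRR/FRUNBIND and of ALTER/FRBIND follows the text; the remaining
    placement is an assumption made for this sketch.
    """
    return [
        f"REPRO INDATASET({old}) OUTDATASET({new})",  # copy the data
        f"SHCDS FRSETRR({old})",      # mark as under maintenance
        f"SHCDS FRUNBIND({old})",     # unbind retained locks from the old data set
        f"DELETE {old}",              # remove the old data set
        f"ALTER {new} NEWNAME({old})",  # give the new data set the old name
        f"SHCDS FRBIND({old})",       # re-bind the locks to the renamed data set
        f"SHCDS FRRESETRR({old})",    # assumed final step: end maintenance state
    ]


commands = move_commands("CICS.DATASET.A", "CICS.DATASET.B")
```

Check any generated sequence against the full procedure for your release before you run it, because a step that is out of order can leave the retained locks unusable for the new data set.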
Page 199 - Chapter 17. Forward recovery procedures; Forward recovery of data sets accessed in RLS mode
Chapter 17. Forward recovery procedures If a data set that is being used by CICS fails, perhaps because of physical damage to a disk, you can recover the data by performing forward recovery of the data set. About this task Your forward recovery procedures can be based either on your own, or an ISV-sup...
Page 200 - Recovery of data set with volume still available
Recovery of data set with volume still available The procedure described here is necessary to preserve any retained locks that are held by SMSVSAM against the data in the old data set. Unless you follow all the steps of this procedure, the locks will not be valid for the new data set, with potential lo...
Page 201 - Recovery of data set with loss of volume
9. Alter the new data set name Use access method services to rename the new data set to the name of the old data set. ALTER CICS.DATASETB NEWNAME(CICS.DATASETA) You must give the restored data set the name of the old data set to enable the following bind operation to succeed. 10. Issue the FRBIND subc...
Page 202 - Volume recovery procedure using CFVOL QUIESCE
There are several methods you can use to recover data sets after the loss of a volume. Whichever method you use (whether a volume restore, a logical data set recovery, or a combination of both), you need to ensure SMSVSAM puts data sets into a lost locks state to protect data integrity. This means that...
Page 204 - Example of recovery using data set backup:
This clears the SMSVSAM CFVOL-QUIESCED state and allows SMSVSAM RLS access to the volume. CICS ensures that access is not allowed to the data sets that will eventually be forward recovered, but the volume is available for other data sets. 6. Run data set forward recovery jobs. The following two example...
Page 207 - Recoverlocks; Nolostlocks
work. Assuming that all CICS regions are active, and there are no indoubt UOWs, lost locks processing, for all data sets except the ones on the failed volume, should complete quickly. 9. In this example, CEMT INQUIRE UOWDSNFAIL on CICS region ADSWA01D showed UOW failures only for the RLSADSW.VF04D.TELL...
Page 208 - Example of recovery using volume backup:
waits for indoubt resolution before allowing general access to the data set. In such a situation you can still release the locks immediately, using the SET DSNAME command, although in most cases you will lose data integrity. See “Lost locks recovery” on page 89 for more information about resolving indo...
Page 209 - Catalog recovery
ROUTE *ALL,VARY SMS,SMSVSAM,TERMINATESERVER 8. When all SMSVSAM servers were down, we deleted the IGWLOCK00 lock structure with the MVS command: VARY SMS,SMSVSAM,FORCEDELETELOCKSTRUCTURE 9. We restarted the SMSVSAM servers with the MVS command: ROUTE *ALL,VARY SMS,SMSVSAM,ACTIVE CICS was informed dur...
Page 210 - Forward recovery of data sets accessed in non-RLS mode; Create a new data set; Procedure for failed RLS mode forward recovery operation
that before running SHCDS CFREPAIR, the restored user catalog must be import connected to the master catalog on all systems (see the “Recovering Shared Catalogs” topic in DFSMS/MVS Managing Catalogs). Forward recovery of data sets accessed in non-RLS mode For data sets accessed in non-RLS mode, use t...
Page 213 - Procedure for failed non-RLS mode forward recovery operation
Procedure for failed non-RLS mode forward recovery operation If you are not successful in applying all the forward recovery log data to a restored backup, you are forced to abandon the forward recovery, and revert to your most recent full backup. However, during its recovery processing, CICS assumes t...
Page 215 - BWO and concurrent copy; Concurrent copy dump; BWO dump; BWO dump using concurrent copy; BWO and backups
Chapter 18. Backup-while-open (BWO) The BWO facility, together with other system facilities and products, allows you to take a backup copy of a VSAM data set while it remains open for update. Many CICS applications depend on their data sets being open for update over a long period of time. Normally, y...
Page 216 - BWO requirements
forward-recovery logs. Long-running transactions, automated teller machines, and continuously available applications require the database to be up and running when the backup is being taken. The concurrent copy function used along with BWO by DFSMSdss allows backups to be taken with integrity even when...
Page 217 - Hardware requirements; Which data sets are eligible for BWO; VSAM control interval or control area split
Hardware requirements The concurrent copy function is supported by the IBM® 3990 Model 3 with the extended platform and the IBM 3990 Model 6 control units. Which data sets are eligible for BWO You can use BWO only for: v Data sets that are on SMS-managed storage and that have an integrated catalog f...
Page 218 - How you request BWO; Specifying BWO using access method services; TYPECICS
How you request BWO You can define files as eligible for BWO in one of two ways. Procedure Decide which method you want to use for data sets: v If your data set is accessed in RLS mode, you must define the BWO option in the ICF catalog. Defining BWO in the ICF catalog requires DFSMS 1.3. v If your da...
Page 219 - Specifying BWO on CICS file resource definitions
v But if you specify BWO(TYPECICS), and the PTF has not been applied, and you have not specified LOG(ALL) and a forward recovery log stream name, BWO processing for RLS remains disabled for such files. To achieve BWO for the file, you must either: – apply the PTF, – or specify LOG(ALL) and a forward reco...
Page 220 - Removing BWO attributes; CEMT SET FILE CLOSED; ALTER NULLIFY BWO; Systems administration; Batch jobs
Removing BWO attributes If you want to remove BWO attributes from your data sets, you must follow the correct procedure to avoid problems when taking subsequent backups. Procedure 1. Close the VSAM data set either by shutting down CICS normally or issuing the command CEMT SET FILE CLOSED. Do not per...
Page 221 - BWO processing
After an uncontrolled or immediate shutdown, further BWO backups might be taken by DFSMShsm, because the BWO status in the ICF catalog is not reset. These backups should be discarded; only the non-BWO backups taken at the end of the batch window should be used during forward recovery, together with the...
Page 222 - File opening; First file opened in non-RLS mode against a cluster
Each of these operations is discussed in the following sections. File opening Different processing is done for each of the three cases when a file is opened for an update. The following processing takes place: v First file opened for update against a cluster v Subsequent file opened for update agains...
Page 224 - Restriction for VSAM upgrade set
v If the file was defined with BACKUPTYPE(STATIC) and the ICF catalog indicates that the data set is already ineligible for BWO, the ICF catalog is not updated. However, if the ICF catalog indicates that the data set is currently eligible for BWO, IGWABWO makes it ineligible for BWO and sets the recover...
Page 225 - Shutdown and restart; Controlled shutdown; Data set backup and restore
Shutdown and restart The way CICS closes files is determined by whether the shutdown is controlled, immediate, or uncontrolled. Controlled shutdown During a controlled shutdown, CICS closes all open files defined in the FCT. This ensures that, for files that are open for update and eligible for BWO, t...
Page 226 - VSAM access method services
When you use DFSMShsm, you still use DFSMSdss as the data mover. You can specify this using the DFSMShsm SETSYS command: SETSYS DATAMOVER(DSS) The DFSMS processing at the start of backup is dependent on the DFSMS release level. For releases before DFSMS 1.2, DFSMSdss first checks the BWO attributes in...
Page 227 - Data set restore; Forward recovery logging; Data sets
DFSMSdfp must now disallow the pending change to ‘BWO enabled’ (and DFSMSdss must fail the backup) because, if the split did not finish before the end of the backup, the invalid backup would not be discarded. v From ‘BWO disabled and VSAM split occurred’ to ‘BWO enabled’. This state change could be att...
Page 229 - Recovering VSAM spheres with AIXs
The forward recovery utility should ALLOCATE, with DISP=OLD, the data set that is to be recovered. This prevents other jobs from accessing a back-level data set and ensures that data managers such as CICS are not still using the data set. Before the data set is opened, the forward recovery utility should s...
Page 230 - An assembler program that calls DFSMS callable services
An assembler program that calls DFSMS callable services
*ASM XOPTS(CICS,NOEPILOG,SP)
*
* A program that can be run as a CICS transaction to Read and Set
* the BWO Indicators and BWO Recovery Point via DFSMS Callable
* Services (IGWABWO).
*
* Invoke the program via a CICS transaction as follows:
*
* Rxxx...
Page 235 - Chapter 19. Disaster recovery; Why have a disaster recovery plan?
Chapter 19. Disaster recovery If your CICS system is normally available about 99 percent of the time, it would be wise to look at your disaster recovery plan. The same pressure that drives high availability drives the need for timely and current disaster recovery. You must plan what level of disaster ...
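To put the 99 percent availability figure in perspective, the short calculation below converts an availability percentage into expected downtime per year. It is a sketch that assumes a 365-day year of round-the-clock service; the function and constant names are illustrative and do not come from the guide.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year of continuous service


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the hours per year a system is expected to be unavailable."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (100.0 - availability_percent) / 100.0


print(round(annual_downtime_hours(99.0), 1))  # 87.6
```

Under these assumptions, 99 percent availability still leaves roughly 87.6 hours of outage a year, which is the kind of figure to weigh against the cost of a disaster recovery plan.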
Page 236 - Disaster recovery testing
acceptable. If you are located in an area prone to hurricanes or earthquakes, for example, a disaster recovery site next door would be pointless. When you are planning for disaster recovery, consider the cost of being unable to operate your business for a period of time. You have to consider the numbe...
Page 237 - Six tiers of solutions for off-site recovery; Tier 1 - physical removal
v How critical and sensitive your business processes are: the more critical they are, the more frequently testing may be required. Six tiers of solutions for off-site recovery One blueprint for recovery planning describes a scheme consisting of six tiers of off-site recoverability (tiers 1-6), with a ...
Page 239 - Tier 2 - physical removal with hot site; Tier 3 - electronic vaulting
Tier 1 Tier 1 provides a very basic level of disaster recovery. You will lose data in the disaster, perhaps a considerable amount. However, tier 1 allows you to recover and provide some form of service at low cost. You must assess whether the loss of data and the time taken to restore a service will pr...
Page 240 - Tier 0–3 solutions
The advantage of tier 3 is that you should be able to provide a service to your users quite rapidly. You must assess whether the loss of data will prevent your company from continuing in business. Figure 20 summarizes the tier 3 solution. Tier 3 is similar to tier 2. The difference is that data is ele...
Page 241 - Tier 4 - active secondary site
The advantage of these methods is their low cost. The disadvantages of these methods are: v Recovery is slow, and it can take days or weeks to recover. v Any recovery is incomplete, because any updates made after the point-in-time backup are lost. v Disaster recovery is risky, and difficulties in tes...
Page 243 - Tier 6 - minimal to zero data loss
v Cost is higher than for the tier 1 to 3 solutions, because you need dedicated hardware, software, and communication links. Tier 5 - two-site, two-phase commit A tier 5 solution is appropriate for a custom-designed recovery plan with special applications. Because these applications must be designed t...
Page 245 - Tier 4–6 solutions
support the XRC DFSMS/MVS host, and one for the recovery 3990, this allows a total of 86 km (53.4 miles) between the 3990s. If you use channel extenders with XRC, there is no limit on the distance between your primary and remote site. For RRDF there is no limit to the distance between the primary and ...
Page 246 - Disaster recovery and high availability
Disaster recovery and high availability This topic describes the tier 6 solutions for high availability and data currency when recovering from a disaster. Peer-to-peer remote copy (PPRC) and extended remote copy (XRC) PPRC and XRC are both 3990-6 hardware solutions that provide data currency to seconda...
Page 247 - Use PPRC for high value transactions
v IMS write-ahead data set (WADS) and IMS online log data set (OLDS) v ACBLIB for IMS v Bootstrap data set (BSDS), the catalog and the directory for DB2 v DB2 logs v Any essential non-database volumes CICS applications can use non-DASD storage for processing data. If your application depends on this...
Page 248 - Remote Recovery Data Facility
where there is a high volume of transactions, but each transaction is typically less than 200 dollars in value. Other benefits of PPRC and XRC PPRC or XRC may eliminate the need for disaster recovery backups to be taken at the primary site, or to be taken at all. PPRC allows you to temporarily suspend...
Page 249 - Choosing between RRDF and 3990-6 solutions; Disaster recovery personnel considerations
between the primary and secondary sites is interrupted. Remote logging is only as effective as the currency of the data that is sent off-site. RRDF transports log stream data to a remote location in real time, within seconds of the log operation at the primary site. When the RRDF address space at the r...
Page 250 - Returning to your primary site; Disaster recovery facilities
You should ensure that a senior manager is designated as the disaster recovery manager. The recovery manager must make the final decision whether to switch to a remote site, or to try to rebuild the local system (this is especially true if you have adopted a solution that does not have a warm or hot st...
Page 252 - Final summary
If a disaster occurs at the primary site, your disaster recovery procedures should include recovery of VSAM data sets at the designated remote recovery site. You can then emergency restart the CICS regions at the remote site so that they can back out any uncommitted data. Special support is needed for R...
Page 255 - Notices
Notices This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area...
Page 256 - Trademarks
Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee. The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Programm...
Page 259 - Accessibility
Accessibility Accessibility features help a user who has a physical disability, such as restricted mobility or limited vision, to use software products successfully. You can perform most tasks required to set up, run, and maintain your CICS system in one of these ways: v using a 3270 emulator logged o...