z/OS Diagnostics & Debugging


This course provides attendees with an effective and systematic approach to z/OS problem diagnosis. The course examines the z/OS software environment through the Recovery Termination Manager (RTM), the 'clean-up' function of MVS, and its ABEND concept. It also discusses the various reports a z/OS system produces when failures occur (messages, dumps, traces, etc.), and covers system ABENDs and how to analyze the information the system provides when they occur.
Attendees will learn how to identify system problems promptly, improving system availability. The course focuses on a debugging methodology using IPCS, and practical workshops give attendees the opportunity to debug system problems in realistic situations.

This course is also available 'on demand' (minimum 2 students) for public presentations or for one-company, on-site presentations.

What you will learn

On successful completion of this course you will be able to:

  • understand what MVS's Recovery Termination Manager (RTM) does when programs fail
  • understand the concept of a user ABEND
  • analyze user ABEND situations
  • resolve user ABEND situations
  • report problems and communicate with applications personnel and systems programmers
  • understand the concept of an ABEND
  • analyze ABEND situations
  • use the appropriate diagnostic procedure for each type of dump
  • identify the failing operating system component in standalone and SVC dumps
  • use various operating system data-gathering facilities such as system traces, LOGREC, and SLIP
  • locate information in various manuals that is critical to problem resolution
  • use the available tools to resolve common system ABENDs without dumps.

Who Should Attend

This course is suitable for all Systems Programmers working in the zSeries Server environment.

Prerequisites

To benefit from this course, participants need the ability to read Assembler code and familiarity with z/OS internal operations and data areas, including the concept of control block chaining. These prerequisites can be met by completing the courses Using z/OS Assembler, z/OS System Anatomy Part 1 - zArchitecture and z/OS System Anatomy Part 2 - z/OS Infrastructure & Services.

Duration

5 days

Fee (per attendee)

£2250 (ex VAT)

Course Code

ZDIA

Contents

Recovery Basics

Normal Program Termination; EXIT (SVC 3); abnormal program termination; Program Checks; system forced ABEND; program ABEND; why abnormal termination?; logical application error; program incomplete; application detected software error; system detected software error; hardware detected software error; PC FLIH and ABENDs; hardware detected software error example; Program Checks in the Supervisor; hardware problems; RTM actions; recovery; Functional Recovery Routines (FRRs); Extended Specify Task Abnormal Exit (ESTAE); system breakdown; software problem types; review questions.

z/OS Error Reporting & Dumps

System error reporting; MVS dumps; Stand-Alone Dump (SADUMP); SVC dumps; user ABEND dumps; SYSUDUMP; SYSABEND; SYSMDUMP; CEEDUMP; generating a user ABEND dump; system generated ABEND dump; snap dumps; symptom dumps; review questions.

ABEND Analysis

What is an ABEND?; the MVS ABEND service; why ABEND?; allows for recovery routines; task termination; tasks in an Address Space; how RTM is invoked; program checks; ABEND; how to trigger an ABEND; ABEND macro and SVC 13; CALLRTM macro; why not normal end?; application detected software errors; system detected software errors; all the system ABEND codes; where do you see the ABEND codes?; the NOTIFY message; the SYSLOG; the job log; the symptom dump; ABEND dumps; SVC dumps; Stand-Alone dumps; the symptom dump in the SYSLOG; the symptom dump in the job log; explanations of ABEND and reason codes; IBM z/OS manuals on the web; Quickref and similar tools; analysis approach; examples of ABEND code explanation; system messages - a good information source; system message prefix; message level; standard message types; alternative message types; message identifier and MVS components; examples of system messages; explanation of system messages; common system ABEND codes; system ABEND code numbers; common SVCs and their macros; the x22 codes - caused by outside events; the x13 codes - OPEN problems; other x13 codes; example of S013-18; 806 - Program not found; sequence of events; example of S806-04; 804, 80A, 878 and DC2 - virtual storage problems; the Virtual Address Space; "above the bar"; traditional address space areas; the need for managing virtual storage; storage for the program code; storage obtained outside the program; Virtual Storage requests; limitations on Virtual Storage; ABEND and reason codes; requests for storage below 2 GB (GETMAIN and STORAGE OBTAIN); requests for storage above 2 GB (IARV64 GETSTOR); the REGION limit; the effects of different REGION values; example of ABEND S822; the MEMLIMIT parameter; example of ABEND SDC2; the 0Cx codes; the Program Check Interrupt; running RTM1; PC FLIH and ABENDs; the meaning of Program Checks; common ABENDs from Program Checks; ABEND S0C4; Storage Protect Keys; virtual address protection; reasons for translation exceptions; address truly invalid; address valid - new area; address valid - old area; other S0Cx ABENDs; PIC 0001 Operation Exception (ABEND S0C1); PIC 0002 Privileged Operation Exception (ABEND S0C2); PIC 0007 Data Exception (ABEND S0C7); the S0E0 and 0Dx codes; miscellaneous problems; problems with translations; Linkage Stack problems; the Sx37 and SB14 codes; Sx37; EOV processing; how disk data sets are allocated; Physical Sequential (PS) data sets; problems when allocating a PS data set; initial allocation; primary allocation failure; data set full; no secondary allocation (SD37-04); secondary allocations (SB37-04); example of unavailable primary allocation; example of SD37-04; message IEC031I; example of ABEND SB37-04; message IEC030I; Partitioned Data Sets (PDS); problems when allocating a PDS; initial allocation; data set full; no secondary allocation (SD37-04); secondary allocations (SE37-04); directory full (SB14-0C); example of ABEND SE37-04; message IEC032I; example of ABEND SB14; message IEC217I; Partitioned Data Sets Extended (PDSE); problems when allocating a PDSE; summary of common system ABEND codes; other ABEND codes; MVS system codes (Sxxx); user ABEND codes (Uxxxx).
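The ABEND Analysis topics distinguish MVS system codes (Sxxx) from user ABEND codes (Uxxxx), and trace S0Cx ABENDs back to program interruption codes (PICs). The short Python sketch below illustrates both conventions. It is an illustration only: the function names are hypothetical, it assumes the completion code arrives as a 24-bit value laid out like the SDWA completion-code field (system code in the high 12 bits, user code in the low 12 bits), and it covers only PICs 0001-000F plus the segment- and page-translation exceptions (0010, 0011) that surface as S0C4.

```python
# Hedged sketch: render ABEND completion codes in Sxxx / Uxxxx notation and
# map program interruption codes (PICs) to S0Cx ABEND codes.
# Assumptions (illustrative, not an IBM interface): the completion code is a
# 24-bit integer laid out like the SDWA completion-code field (high 12 bits =
# system code, low 12 bits = user code), and the PIC has already been
# stripped of any PER or other flag bits.


def format_completion_code(cc: int) -> str:
    """Return 'Sxxx' (hex) for a system code or 'Uxxxx' (decimal) for a user code."""
    if not 0 <= cc <= 0xFFFFFF:
        raise ValueError("completion code must fit in 24 bits")
    system_code = (cc >> 12) & 0xFFF  # high 12 bits
    user_code = cc & 0xFFF            # low 12 bits, 0-4095
    if system_code:
        return f"S{system_code:03X}"  # system codes are shown in hex
    return f"U{user_code:04d}"        # user codes are shown in decimal


def abend_for_pic(pic: int) -> str:
    """Map a program interruption code to the ABEND code RTM reports."""
    if 0x01 <= pic <= 0x0F:
        # PICs 0001-000F surface as S0C1-S0CF (e.g. 0007 Data -> S0C7).
        return f"S0C{pic:X}"
    if pic in (0x10, 0x11):
        # Segment- and page-translation exceptions surface as S0C4,
        # with the PIC as the reason code.
        return "S0C4"
    raise ValueError(f"PIC {pic:04X} is outside this sketch's table")


if __name__ == "__main__":
    print(format_completion_code(0x806000))  # S806
    print(format_completion_code(0x000FA0))  # U4000
    print(abend_for_pic(0x07))               # S0C7
    print(abend_for_pic(0x11))               # S0C4
```

Note that the system code is shown in hexadecimal while the user code is shown in decimal, which is why U4000 corresponds to X'FA0' in the low 12 bits.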

Interactive Problem Control System

Control block/data area; Information sources; Control block header; Control block data area map; Cross reference table; Fields and subfields; Field redefinitions; Control block chaining; Finding control blocks; The Prefix Area (PSA); The new Prefix Area (PSA); Dump types; IPCS introduction: what is IPCS?, What makes up IPCS?; Getting started with IPCS - Primary Option Menu; Default values selection; Primary Option Menu; Data entry panel; Pointer stack panel; Getting around in IPCS browse; IPCS subcommand entry panel; IPCS command output display; IPCS LIST command; Indirect addressing; Displaying Control Blocks; Creating SYMBOLS: Dump Directory; Additional Useful Commands; Dump analysis panel; Component Data Analysis Panel; STATUS; Analysis commands; Dump Management panel.

Recovery & Termination

MVS's recovery management; RMS; What does RTM do?; Interrupt types; Anatomy of an Interrupt; RTM - The Big Picture; How is RTM invoked?; Normal termination; Abnormal termination - problem types; Program check; Software 'Abend'; Abnormal termination - recovery; Recovery routines; RTM status information; ESPIE environment; ESPIE processing; ESTAE recovery routines; ESTAE environment; STAE Control Blocks (SCB); ESTAE processing; Percolation; Functional Recovery Routines; FRR environment; FRR stacks; RTM2WA; SDWA; Variable Recording Area; Interpreting the SDWA; Interpreting the Variable Recording Area; Logrec detail reports.

Request Block Analysis

Address space structures; RB loss of control; Linkage stacks; RB analysis procedure; Linkage Stack analysis; General analysis; RB analysis.

System Trace

Starting the System Trace; Formatting the Trace; Sequence of events; Interpreting Traces; System Trace tips.

SVC Dump Analysis Approach

Generating SVC dumps; Dump Analysis and Elimination; Types of SVC Dump; Problem resolution overview; Dump TITLE; SDWA; History; RTM2WA; Other dump types.

Multi-Processor Environments

Tightly coupled processing; Prefixing; Processor coexistence; Processor STATUS; Work In Progress; Interrupt information.

Locks

The problem; An example of what can go wrong; Serialization via LOCKS; Lock varieties; Locking Hierarchy; Locking Mechanics (SPIN); Spin Loop Identification; Spin Lock Holder; Local/CML Locks; Locking Mechanics; Global Suspend Locks ANALYZE; Locks Held; Locking Mechanics (CPU LOCK); SPIN lock summary; SUSPEND lock summary.

Dispatcher

What does it mean to be dispatched?; Where does the Dispatcher run?; Dispatchable units of work; Who calls the Dispatcher?; Special exits; Service Request Block routines; Service Request Block (SRB); SRB example - IOS post; Service Request Block (SRB); Suspended Service Request Block (SSRB); SRB priorities; SRB scheduling with IEAMSCHD; SRB enclaves; Dispatcher queues; Scheduling service requests; Address spaces; ASCB/ASXB contents; Finding work within an address space - tasks; TCB contents; TCB chaining; Address space task structure; Serialization with Intersect; Dispatcher indicators; Global problem determination; Global indicators - SRB queues.

Address Space Control

Cross Memory Services; XMS instructions; PC & PT/PR; XMS authorisation; Primary, Secondary & Home modes; Access Register mode; SSAR; Access Register Translation (ART); Access lists; ALETs.

SAD Analysis Approach

Big picture; Dump environments; When should a SADUMP be taken?; Pre-SADUMP considerations; Taking a Stand-Alone Dump; Stand-Alone Dump analysis path selection; Disabled Wait analysis path; Enabled Wait analysis path; Enabled Running analysis path; Disabled Running analysis path.

Input/Output Supervisor

IOS drivers; Performing I/O; I/O flow; IOS analysis - high level; Active I/O analysis; IOS failure analysis.

Real Storage Manager

Types of storage; Dynamic Address Translation; Identifying the STD; Managing real storage; RSM high level check; Detailed analysis - high fixed page utilisation; Detailed analysis - other problems; History - Component Trace.
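Dynamic Address Translation, listed under the Real Storage Manager topics, treats a 64-bit virtual address as a set of table indexes. As a rough illustration, based on our reading of the z/Architecture Principles of Operation and using a hypothetical helper name, the sketch below splits an address into its region-first, region-second, region-third, segment, page and byte indexes of 11, 11, 11, 11, 8 and 12 bits. It does not walk any translation tables; it only shows which index each part of the address supplies.

```python
# Hedged sketch: split a 64-bit z/Architecture virtual address into the
# indexes DAT uses. Field widths follow our reading of the Principles of
# Operation; split_virtual_address is a hypothetical helper, not an IBM API.

FIELDS = (
    # (name, shift, width in bits), from the leftmost field to the rightmost
    ("region_first_index", 53, 11),
    ("region_second_index", 42, 11),
    ("region_third_index", 31, 11),
    ("segment_index", 20, 11),  # one segment = 1 MB
    ("page_index", 12, 8),      # one page = 4 KB
    ("byte_index", 0, 12),
)


def split_virtual_address(va: int) -> dict:
    """Return each DAT index field of a 64-bit virtual address."""
    if not 0 <= va < 1 << 64:
        raise ValueError("virtual address must fit in 64 bits")
    return {name: (va >> shift) & ((1 << width) - 1)
            for name, shift, width in FIELDS}


if __name__ == "__main__":
    # 0x80000000 is the 2 GB "bar": the first address whose
    # region-third index is non-zero.
    print(split_virtual_address(0x80000000))
```

Addresses below the 2 GB bar have all three region indexes equal to zero, which is one way to see why "above the bar" storage involves region tables while traditional address space areas do not.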

Auxiliary Storage Manager

Paging a frame to a slot; ASM high level check; Detailed analysis: what is the problem?, who is affected?


© RSM Technology 2021