z/OS System Problems Resolved - Quickly and Effectively


When system problems occur, companies often automatically send the dumps to IBM without looking at them first. However, as we all know, z/OS today is almost invariably a multi-vendor software environment.
This course teaches you the vital (yet simple) techniques that will quickly glean maximum information from the diagnostic data. This lets you identify the vendor product responsible for the problem and, ultimately, verify the vendor's diagnosis.

This course describes and explains what can go wrong in an IBM z Systems environment, and what you can do about it as an operator or systems programmer. It looks at failure situations from many points of view, including hardware problems and the software environment.

The software environment is further examined by looking at the Recovery Termination Manager (RTM) - the 'cleaning-up' function of z/OS - and its ABEND concept. All the different reports that come out of a z/OS system in conjunction with failures (messages, dumps, traces, etc.) are also discussed. The most common reasons for system ABENDs (and how you can analyze the information coming out of the system when they occur) are also covered.

The course is available for exclusive one-company presentations and for live presentation over the Internet, via the Virtual Classroom Environment service.

Virtual Classroom Environment dates

UK Start Times

23 June 2025

What do I need?

  • webcam
  • headphones with microphone
  • sufficient bandwidth, at least 1.5 Mb/s in each direction.

What you will learn

On successful completion of this course you will be able to:

  • identify which software component caused the problem
  • identify the vendor responsible for the problem
  • glean the most relevant diagnostic information in the minimum time
  • use the appropriate diagnostic procedure for each type of dump
  • identify the failing operating system component in standalone and SVC dumps
  • use various operating system data-gathering facilities such as system traces, LOGREC, and SLIP
  • locate information in various manuals that is critical to problem resolution
  • develop a methodology to speedily extract the required information for resolving a problem situation.

Who Should Attend

This course is designed for all those in technical support and operations who are responsible for problem determination.

Prerequisites

To benefit from this course, participants need both the ability to read Assembler code and familiarity with z/OS internal operations and data areas (including the concept of control block chaining). These prerequisites can be met by completing the RSM courses Using z/OS Assembler, z/OS System Anatomy Part 1 and z/OS System Anatomy Part 2.

Duration

3 days

Fee (per attendee)

£1890 (ex VAT)

This includes free online 24/7 access to course notes.

Hard copy course notes are available on request from rsmshop@rsm.co.uk at £50.00 plus carriage per set.

Course Code

MEDG

Contents

Interactive Problem Control System

Control block/data area; Information sources; Control block header; Control block data area map; Cross reference table; Fields and subfields; Field redefinitions; Control block chaining; Finding control blocks; The Prefix Area (PSA); The new Prefix Area (PSA); Dump types; IPCS introduction: what is IPCS?, What makes up IPCS?; Getting started with IPCS - Primary Option Menu; Default values selection; Primary Option Menu; Data entry panel; Pointer stack panel; Getting around in IPCS browse; IPCS subcommand entry panel; IPCS command output display; IPCS LIST command; Indirect addressing; Displaying Control Blocks; Creating SYMBOLS: Dump Directory; Additional Useful Commands; Dump analysis panel; Component Data Analysis Panel; STATUS; Analysis commands; Dump Management panel.

Recovery & Termination

MVS's recovery management; RMS; What does RTM do?; Interrupt types; Anatomy of an Interrupt; RTM - The Big Picture; How is RTM invoked?; Normal termination; Abnormal termination - problem types; Program check; Software 'Abend'; Abnormal termination - recovery; Recovery routines; RTM status information; ESPIE environment; ESPIE processing; ESTAE recovery routines; ESTAE environment; STAE Control Blocks (SCB); ESTAE processing; Percolation; Functional Recovery Routines; FRR environment; FRR stacks; RTM2WA; SDWA; Variable Recording Area; Interpreting the SDWA; Interpreting the Variable Recording Area; Logrec detail reports.

Request Block Analysis

Address space structures; RB loss of control; Linkage stacks; RB analysis procedure; Linkage Stack analysis; General analysis; RB analysis.

System Trace

Starting the System Trace; Formatting the Trace; Sequence of events; Interpreting Traces; System Trace tips.

SVC Dump Analysis Approach

Generating SVC dumps; Dump Analysis and Elimination; Types of SVC Dump; Problem resolution overview; Dump TITLE; SDWA; History; RTM2WA; Other dump types.

Multi Processor Environments

Tightly coupled processing; Prefixing; Processor coexistence; Processor STATUS; Work In Progress; Interrupt information.

Locks

The problem; An example of what can go wrong; Serialization via LOCKS; Lock varieties; Locking Hierarchy; Locking Mechanics (SPIN); Spin Loop Identification; Spin Lock Holder; Local/CML Locks; Locking Mechanics; Global Suspend Locks ANALYZE; Locks Held; Locking Mechanics (CPU LOCK); SPIN lock summary; SUSPEND lock summary; ANALYZE.

Dispatcher

What does it mean to be dispatched?; Where does the dispatcher run?; Dispatchable units of work; Who calls the Dispatcher?; Special exits; Service Request Block routines; Service Request Block (SRB); SRB example - IOS post; Service Request Block (SRB); Suspended Service Request Block (SSRB); SRB priorities; SRB scheduling with IEAMSCHD; SRB enclaves; Dispatcher queues; Scheduling service requests; Address spaces; ASCB/ASXB contents; Finding work within an address space - tasks; TCB contents; TCB chaining; Address space task structure; Serialization with Intersect; Dispatcher indicators; Global problem determination; Global indicators - SRB queues.

Address Space Control

Cross Memory Services; XMS instructions; PC & PT/PR; XMS authorisation; Primary, Secondary & Home modes; Access Register mode; SSAR; Access Register Translation (ART); Access lists; ALETs.

SAD Analysis Approach

Big picture; Dump environments; When should a SADUMP be taken?; Pre SADUMP considerations; Taking a Standalone Dump; Stand Alone Dump analysis path selection; Disabled Wait analysis path; Enabled Wait Analysis path; Enabled Running Analysis path; Disabled Running Analysis path.

Input/Output Supervisor

IOS drivers; Performing I/O; I/O flow; IOS analysis - high level; Active I/O analysis; IOS failure analysis.

Real Storage Manager

Types of storage; Dynamic Address Translation; Identifying The STD; Managing real storage; RSM high level check; Detailed analysis - high fixed page utilisation; Detailed analysis - other problems; History - Component Trace.

Auxiliary Storage Manager

Paging a frame to a slot; ASM high level check; Detailed analysis: what is the problem?, who is affected?


What the students say

Really good fun whilst still being challenging and relevant. I feel a lot more confident with IPCS - and where to start looking in dumps.

Senior Network Systems Programmer

JPMorgan Chase & Co.

© RSM Technology 2024