VSAM Tutorial
File types Basics:
Files are broadly classified into two types:
1) Sequential files.
2) Direct Access files.
------------------------------------------------------------------------------------------------------------
Sequential Files:
Sequential records are accessed serially.
This means that to read a record all the preceding records must be read.
Sequential files can be of two types:
1) Ordered Sequential.
2) Unordered Sequential.
In unordered sequential files (normal sequential files) it is practical to read records from the file and to add records at the end of the file (OPEN the file in EXTEND mode). Records are stored in the order in which they are added to the file.
In these files it is not practical to delete or update a record.
In an ordered sequential file, records are arranged in order of some key field or fields. When we want to insert, delete or amend a record we must preserve the ordering. Note that it is the programmer's responsibility to preserve the order; the system does nothing to preserve it.
The only way to do this is to create a new file.
Ø For insertions or updates, the new file will contain the inserted or updated record.
Ø For deletions, the deleted record will be missing from the new file.
The drawback of this technique is that we need to read the entire file and write it to another file. Thus if 10 records are to be inserted into a file of 10000 records, 10000 reads and 10010 writes will be performed. This technique is therefore highly disk intensive.
To insert a record in an ordered Sequential file:
1) All the records with a key value less than the record to be inserted
must be read and then written to the new file.
2) Then the record to be inserted must be written to the new file.
3) Finally, the remaining records must be written to the new file.
To delete a record in an ordered Sequential file:
1) All the records with a key value less than the record to be deleted must be read and then written to the new file.
2) When the record to be deleted is encountered it is not written to the
new file.
3) Finally, all the remaining records must be written to the new file.
To amend a record in an ordered Sequential file:
1) All the records with a key value less than the record to be amended
must be read and then written to the new file.
2) Then the record to be amended must be read, the amendments applied to it, and the amended record written to the new file.
3) Finally, all the remaining records must be written to the new file.
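The three procedures above can be sketched as follows. This is only an illustration in Python, not COBOL: in-memory lists stand in for the old and new files, records are assumed to be (key, data) tuples, and the function names are hypothetical.

```python
def insert_record(old_records, new_record, key=lambda r: r[0]):
    """Copy old_records to a new file, placing new_record in key order."""
    new_file = []
    inserted = False
    for record in old_records:              # every old record is read once
        if not inserted and key(record) > key(new_record):
            new_file.append(new_record)     # write the inserted record
            inserted = True
        new_file.append(record)             # copy the old record across
    if not inserted:                        # new key is the highest in the file
        new_file.append(new_record)
    return new_file


def delete_record(old_records, target_key, key=lambda r: r[0]):
    """Copy every record except the one whose key equals target_key."""
    return [r for r in old_records if key(r) != target_key]


def amend_record(old_records, target_key, amend, key=lambda r: r[0]):
    """Copy every record, applying amend() to the one with target_key."""
    return [amend(r) if key(r) == target_key else r for r in old_records]


old_file = [(10, "Ann"), (20, "Bob"), (40, "Dee")]
print(insert_record(old_file, (30, "Cy")))
print(delete_record(old_file, 20))
print(amend_record(old_file, 40, lambda r: (r[0], "Dora")))
```

In every case the whole old file is read and a complete new file is produced, which is why the cost does not depend on how few records are actually changed.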
Disadvantages of Sequential files:
Ø Sequential files are very slow to update when the hit rate is low, because the entire file must be read and then written to a new file just to update a few records. The hit rate is the proportion of the records in the file that are actually affected by an update run.
Ø For ordered sequential files, preserving the order is a manual task.
Advantages of Sequential files:
Ø Most efficient when the hit rate is high. No need for record position to be calculated and no indexes required.
Ø Efficient storage, as the file contains only the data component.
Ø Space from deleted records is recovered.
Ø Sequential files may be stored on serial media such as magnetic tape.
Direct Access Files:
The problem with sequential files, having to read all the preceding records to reach a particular record, is addressed by direct access files.
COBOL supports two kinds of direct access file organizations:
1) Relative and
2) Indexed.
Relative Files:
· A Relative file makes use of the Relative Record Number (RRN), which indicates the position of the record counted from the start of the file.
· A Relative file may be visualized as a one-dimensional table stored on disk, where the Relative Record Number is the index into the table.
· Relative files support sequential access by allowing the active records to be read one after another.
· Relative files support only one key, i.e. the Relative Record Number (RRN).
· The key must be numeric.
· It must take a value between 1 and the current highest Relative Record Number.
· Enough room is pre-allocated to the file to contain records with Relative Record Numbers between 1 and the highest record number. For example, if a record is inserted with 10000 as the RRN, then space for 10000 records will be allocated even if the other 9999 slots are empty.
· Relative datasets support update, delete and insert operations.
· To access a record in a Relative file, a Relative Record Number must be provided. The position of the record is determined from the record number that we provide, the start position of the file and the size of a record.
· Because the file management system only has to make a few calculations to find the record position, the Relative file organization is the faster of the two direct access file organizations available in COBOL.
· This implies that for Relative files the record length should be fixed.
· To read, insert, delete or update a record directly, the Relative Record Number of the record must be placed in the key area and then the operation must be applied to the file using statements like WRITE, REWRITE, DELETE etc.
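The position calculation described above can be sketched as follows. This is a minimal illustration in Python rather than COBOL; the 20-byte record length, the zero start offset and the helper names are assumptions made for the example, and an in-memory byte buffer stands in for the disk file.

```python
import io

RECORD_LENGTH = 20   # assumed fixed record size in bytes
FILE_START = 0       # assumed byte offset of record number 1


def record_offset(rrn, record_length=RECORD_LENGTH, file_start=FILE_START):
    """Byte position of a record, derived only from its Relative Record Number."""
    if rrn < 1:
        raise ValueError("Relative Record Numbers start at 1")
    return file_start + (rrn - 1) * record_length


def write_record(f, rrn, text):
    """Place a record directly in the slot that belongs to rrn."""
    f.seek(record_offset(rrn))
    f.write(text.encode("ascii").ljust(RECORD_LENGTH)[:RECORD_LENGTH])


def read_record(f, rrn):
    """Fetch the record in slot rrn without reading any other record."""
    f.seek(record_offset(rrn))
    return f.read(RECORD_LENGTH).decode("ascii").rstrip()


disk = io.BytesIO()                  # stands in for the file on disk
write_record(disk, 3, "third record")
write_record(disk, 1, "first record")
print(read_record(disk, 3))
print(len(disk.getvalue()))          # room for slots 1 to 3 now exists
```

Writing record 3 first forces room for slots 1 and 2 to exist as well, which mirrors the pre-allocation behaviour described in the list above.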
Disadvantages of using Relative Files:
· Wasteful organization as far as space is concerned. The file will be allocated enough room to hold records from 1 to the highest Relative Record Number used, even if only a few records have actually been written to the file. If the first record written to the file has a Relative Record Number of 10,000, then room for that many records is allocated to the file.
· Relative files cannot recover the space from deleted records. When a record is deleted in a Relative file, it is simply marked as deleted, but the space that used to be occupied by the record is still allocated to the file. So if a Relative file is 560K in size when full, it will still be 560K when you have deleted half the records.
· The single key is limiting, because it is often the case that we need to access the file on more than one key.
· With Relative files the key must be numeric. Hence we cannot use this organization if the key we need to access the file by is not numeric.
· The fact that the key must be in the range 1 to the highest key value, and that the file system allocates space for all the records between 1 and the highest Relative Record Number used, imposes severe constraints on the key. For instance, even though the StudentId is numeric we couldn't use it as a key, because the file system would allocate space for records from 1 to the highest StudentId written to the file. Suppose the highest StudentId written to the file was 9876543. The file system would allocate space for 9,876,543 records.
· Because Relative files are direct access files, they must be stored on direct access media such as hard or floppy disks. They cannot be stored on magnetic tape.
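The space cost described in this list is simple arithmetic, sketched below in Python. The 80-byte record length and the figure of 5,000 active students are assumptions chosen only for illustration; the function names are hypothetical.

```python
def allocated_bytes(highest_rrn, record_length):
    """Space reserved for slots 1..highest_rrn, whether or not they are used."""
    return highest_rrn * record_length


def wasted_fraction(active_records, highest_rrn):
    """Share of the allocated slots that hold no active record."""
    return 1 - active_records / highest_rrn


RECORD_LENGTH = 80          # assumed record size in bytes
HIGHEST_STUDENT_ID = 9876543
ACTIVE_STUDENTS = 5000      # assumed number of records actually written

print(allocated_bytes(HIGHEST_STUDENT_ID, RECORD_LENGTH))   # 790123440
print(wasted_fraction(ACTIVE_STUDENTS, HIGHEST_STUDENT_ID))
```

Under these assumptions roughly 790 MB would be reserved for a file whose useful content is about 400 KB.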
Advantages of Relative files:
· This is the fastest direct access organization.
· Does not make use of an index structure.
· Relative files allow sequential access to the records in the file.
Indexed Files:
· Indexed files may have up to 255 keys.
· Keys can be alphanumeric or numeric.
· There is one primary key, and it must be unique.
· It is possible to read an Indexed file sequentially on any of its keys (the primary key or an alternate key). The primary key and the alternate keys are part of the record.
· The key upon which the data records are ordered is called the primary key. The other keys are called alternate keys.
· Records in the Indexed file are sequenced on ascending primary key.
· For each of the alternate keys specified in an Indexed file, an alternate index is built.
· As well as allowing direct access to records on the primary key or any of the 254 alternate keys, indexed files may also be processed sequentially.
· When processed sequentially, the records may be read in ascending order on the primary key or on any of the alternate keys.
· Since the data records are held in ascending primary key sequence, it is easy to see how the file may be accessed sequentially on the primary key. It is not quite so obvious how sequential access on the alternate keys is achieved.
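One common way to picture sequential access on an alternate key is an alternate index that holds (alternate key, primary key) pairs in alternate key order. The Python sketch below illustrates that general idea only; it is not a description of how any particular COBOL runtime or VSAM release builds its indexes, and the sample records and function names are invented for the example.

```python
import bisect

# Data records held in ascending primary key (student id) sequence.
records = {
    1001: {"id": 1001, "name": "Murphy"},
    1002: {"id": 1002, "name": "Adams"},
    1003: {"id": 1003, "name": "Kelly"},
}

# Alternate index: (alternate key, primary key) pairs kept in sorted order.
alt_index = sorted((rec["name"], pk) for pk, rec in records.items())


def read_sequentially_by_name():
    """Walk the alternate index in order and fetch each record by primary key."""
    return [records[pk] for _, pk in alt_index]


def read_direct_by_name(name):
    """Binary-search the alternate index, then fetch by primary key."""
    pos = bisect.bisect_left(alt_index, (name,))
    if pos < len(alt_index) and alt_index[pos][0] == name:
        return records[alt_index[pos][1]]
    return None


print([r["name"] for r in read_sequentially_by_name()])
print(read_direct_by_name("Kelly"))
```

The data records never move: reading "in name order" is really reading the alternate index in order and following each entry back to the record through its primary key.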
Disadvantages:
· Because Indexed files achieve direct access by traversing a number of levels of index, this is the slowest direct access organization.
· Indexed files require more storage than other file organizations, since they hold the data records plus an index structure for the primary key and for each alternate index.
· As access to the index structures is also involved, I/O is comparatively higher.
· Space from deleted records is only partially recovered.
· Because Indexed files are direct access files, they must be stored on direct access media such as hard or floppy disks. They cannot be stored on magnetic tape.
Advantages:
· Indexed files can have multiple alphanumeric keys, and only the primary key has to be unique.
· An indexed file may be read sequentially on any of its keys.
---------------------------------------------------------------------------------------------------------
Virtual Storage Access Method (VSAM)
Ø VSAM is a data management system introduced by IBM in the 1970s.
Ø VSAM was, by several accounts, intended to replace all of the earlier data management systems in use by IBM's operating systems.
Ø Access Method Services is the single, general-purpose utility that is used to manipulate VSAM components.
VSAM provides three types of datasets (clusters):
- Key Sequenced Data Set (KSDS)
Each record is identified for access by specifying its key value, the part of the data record that uniquely identifies the record from other records in the dataset.
- Entry Sequenced Data Set (ESDS)
Each record is identified for access by specifying its physical location: the byte address of the first data byte of the record relative to the beginning of the dataset.
- Relative Record Data Set (RRDS)
Each record is identified for access by specifying its record number: the sequence number relative to the first record in the dataset.
VSAM datasets are frequently referred to as clusters.
Ø A KSDS cluster consists of two physical parts, an index component and a data component.
Ø ESDS and RRDS clusters consist of only a single component, the data component.
KSDS Cluster Components (Indexed File)
· Each record contains a key field, which occurs in the same relative position in each record.
· Records are stored in logical sequence based upon their key field value.
· The index component of the KSDS cluster contains the list of key values for the records in the cluster, with pointers to the corresponding records.
· Records can be accessed:
o Sequentially, in order by the key value.
o Directly, by supplying the key value.
· Records can be deleted or added at any point within a KSDS cluster. The other records are reorganized accordingly.
ESDS Cluster Components (Sequential Files)
· The records in an ESDS cluster are stored in the order in which they are entered into the dataset.
· Access is primarily sequential: unless the address of a record is known, all the preceding records must be read to reach it.
· Each record is referenced by its relative byte address (RBA). In an ESDS dataset of 100-byte records, the RBA of the first record is 0, the RBA of the second record is 100, and the RBA of the third record is 200. The RBA is 4 bytes in length.
· The records in an ESDS may be accessed sequentially, in order by RBA value, or directly, by supplying the RBA of the desired record.
· Records may not be deleted from an ESDS cluster, and they may only be added (appended) to the end of the dataset.
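The RBA arithmetic in the list above can be sketched with a small append-only model. This is a simplified Python illustration, not VSAM itself: it treats the dataset as one continuous run of bytes and ignores control interval boundaries and control information, and the class and method names are hypothetical.

```python
class EntrySequencedFile:
    """Toy append-only dataset that hands back the RBA of each record."""

    def __init__(self):
        self._data = bytearray()

    def append(self, record):
        """Add a record at the end and return its relative byte address."""
        rba = len(self._data)        # offset of the first byte of the record
        self._data += record
        return rba

    def read_at(self, rba, length):
        """Direct access: fetch `length` bytes starting at the given RBA."""
        return bytes(self._data[rba:rba + length])


def rba_of(record_number, record_length):
    """RBA of the n-th record (1-based) when all records have the same length."""
    return (record_number - 1) * record_length


esds = EntrySequencedFile()
rbas = [esds.append(bytes([i]) * 100) for i in range(3)]
print(rbas)                          # [0, 100, 200]
print(rba_of(3, 100))                # 200
```

There is deliberately no delete method: as the list above notes, records can only be appended.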
RRDS Cluster Components
· The records in an RRDS cluster are stored in fixed-length slots.
· Each record is referenced by the number of its slot.
· The records in an RRDS cluster may be accessed sequentially, in relative record number order, or directly, by supplying the relative record number of the desired record.
· The records of an RRDS cluster must be of fixed length.
· When a record is inserted, an empty slot is used.
· When a record is deleted, its slot is left empty and becomes available for reuse.
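The slot behaviour in the list above can be modelled in a few lines. The Python sketch below is only an illustration of the idea: a list stands in for the fixed-length slots, None marks an empty slot, and the class and method names are invented for the example.

```python
class RelativeRecordFile:
    """Toy RRDS: a fixed number of slots addressed by relative record number."""

    def __init__(self, slot_count):
        self.slots = [None] * slot_count      # None marks an empty slot

    def _index(self, rrn):
        if not 1 <= rrn <= len(self.slots):
            raise IndexError("relative record number out of range")
        return rrn - 1

    def insert(self, rrn, record):
        """Insert into an empty slot; an occupied slot is an error."""
        i = self._index(rrn)
        if self.slots[i] is not None:
            raise KeyError("slot %d is already occupied" % rrn)
        self.slots[i] = record

    def delete(self, rrn):
        """Empty the slot; the slot itself stays in the file for reuse."""
        self.slots[self._index(rrn)] = None

    def read(self, rrn):
        """Direct access by relative record number."""
        return self.slots[self._index(rrn)]

    def read_sequential(self):
        """Active records in relative record number order."""
        return [(i + 1, r) for i, r in enumerate(self.slots) if r is not None]


rrds = RelativeRecordFile(5)
rrds.insert(2, "B")
rrds.insert(4, "D")
rrds.delete(2)
rrds.insert(2, "B2")                          # the freed slot is reused
print(rrds.read_sequential())
```

The number of slots never changes when records are deleted, which matches the point that the space is left free rather than returned.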
What is a Control Interval
Ø In non-VSAM data management methods, the unit of data that is moved between memory and the storage device is the block.
Ø In VSAM, the unit of data that is transferred in each physical I/O operation is the control interval. When a VSAM dataset is loaded, control intervals are created and records are written into them.
Ø With KSDS clusters, the entire control interval is usually not filled. Some percentage of free space is left available for expansion. This can be controlled using the FREESPACE parameter on the DEFINE CLUSTER command.
Ø With ESDS clusters, each control interval is completely filled before records are written into the next control interval in sequence.
Ø With RRDS clusters, control intervals are filled with fixed-length slots, each containing either an active record or a dummy record. A dummy record acts as a placeholder for a record that has not yet been inserted. Slots containing dummy records are available for use when new records are added to the dataset.
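The effect of leaving free space in each control interval during a load can be sketched as follows. This is a deliberately simplified Python model: the round 1000-byte control interval, the 100-byte records and the function name are assumptions for illustration, and the control information that a real control interval also carries is ignored.

```python
def load_control_intervals(records, ci_size, record_length, free_percent):
    """Group records into control intervals, leaving free_percent of each unused."""
    usable = ci_size * (100 - free_percent) // 100    # bytes that may be filled
    per_ci = max(1, usable // record_length)          # records loaded per interval
    return [records[i:i + per_ci] for i in range(0, len(records), per_ci)]


records = list(range(1, 21))                          # 20 records, keys 1..20

ksds_like = load_control_intervals(records, 1000, 100, free_percent=20)
esds_like = load_control_intervals(records, 1000, 100, free_percent=0)

print([len(ci) for ci in ksds_like])                  # [8, 8, 4]
print([len(ci) for ci in esds_like])                  # [10, 10]
```

With 20 percent free space each interval is loaded with 8 records instead of 10, so the same data needs more intervals but later insertions have room to land near their neighbours.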
Control Areas
Control intervals are grouped together into control areas. The rules used for filling and writing control areas are similar to those which apply for control intervals.
Ø For ESDS and RRDS clusters, control areas are filled with control intervals that contain records.
Ø For KSDS clusters, some of the control intervals in each control area may consist entirely of free space that can be used for dataset expansion.
VSAM Catalogs
· With non-VSAM datasets we have the option of keeping the dataset uncataloged. However, this is not an option with VSAM datasets: a VSAM dataset has to be cataloged.
· VSAM maintains its own catalog, which is itself a KSDS cluster, in which catalog entries describing VSAM clusters are recorded. The same VSAM catalog may also be used to contain the catalog entries for non-VSAM datasets.
· More recently, however, a newer type of catalog, the Integrated Catalog Facility (ICF), is used.
Master Catalog:
· Every system that uses VSAM has one, and only one, master catalog. The master catalog contains entries about system datasets and VSAM structures used to manage the operation of VSAM.
· In most computer systems, the Systems Programming staff will have created user catalogs, which are cataloged in the master catalog; all other users of the computer system will only be allowed to catalog datasets in those user catalogs.
· The master catalog is created during the System Generation process and usually resides on the System Residence volume. The master catalog "owns" all other VSAM resources in a computer system.
· The master catalog is the "VSAM King": it is directly in charge of the VSAM resources in the system.
User Catalogs
· A user catalog is a catalog created to contain entries about application-specific datasets.
· The information defining a user catalog is stored in a catalog entry in the master catalog. A production system might have any number of user catalogs, with the datasets cataloged in a specific user catalog related by application type.
What is the relation between the catalog and volume ownership:
· If a DASD volume contains a VSAM catalog (master or user), the catalog must be the first VSAM object stored on that volume.
· A VSAM catalog owns the volume on which it resides. A catalog can also own other volumes; however, those volumes cannot also contain other VSAM catalogs.
· All the VSAM objects that are defined on a volume containing a VSAM catalog must be cataloged in the catalog residing on that volume.
What is a VSAM data space:
· After we create a catalog on a volume, and before we start creating VSAM clusters, we need to create one or more data spaces. A data space is an area of the direct access storage device that is exclusively allocated for VSAM use.
· Each volume has a VTOC (Volume Table of Contents). In the VTOC this space is marked as allocated to a dataset, so that the space is not available for allocation to any other use, either VSAM or non-VSAM.
Unique Clusters
It is possible to create VSAM clusters out of unallocated space on direct access storage. This type of cluster has a designation of UNIQUE and essentially consists of a separate data space which is utilized completely by the cluster created within it. From a data management viewpoint, it is not a good idea to create unique VSAM clusters