Saturday, 11 February 2012

File Types and VSAM basics


VSAM Tutorial
File Types Basics:
Files are broadly classified into two types:
1)     Sequential files.
2)     Direct Access files.
------------------------------------------------------------------------------------------------------------
Sequential Files:
Records in a sequential file are accessed serially. This means that to read a record, all the preceding records must be read first.

Sequential files can be of two types:
1)     Ordered Sequential.
2)     Unordered Sequential.

In unordered sequential files (normal sequential files) it is practical to read records from the file and to add records at the end of the file (OPEN the file in EXTEND mode). Records are stored in the order in which they are added to the file.
In these files it is not practical to delete or update a record.

In an ordered sequential file, records are arranged in the order of some key field or fields. When we want to insert, delete or amend a record we must preserve the ordering. Note that it is the programmer's responsibility to preserve the order; the system does nothing to preserve it.
The only way to do this is to create a new file.

Ø  For insertions or updates, the new file will contain the inserted or updated record.

Ø  For deletions, the deleted record will be missing from the new file.
The drawback of this technique is that we need to read the entire file and write it to another file. If 10 records are to be inserted into a file of 10,000 records, then 10,000 reads and 10,010 writes will be performed. This technique is therefore highly disk intensive.

To insert a record in an ordered Sequential file:
1)     All the records with a key value less than the record to be inserted must be read and then written to the new file.
2)     Then the record to be inserted must be written to the new file.
3)     Finally, the remaining records must be written to the new file.

To delete a record in an ordered Sequential file:
1)     All the records with a key value less than the record to be deleted must be read and then written to the new file.
2)     When the record to be deleted is encountered it is not written to the new file.
3)     Finally, all the remaining records must be written to the new file.

To amend a record in an ordered Sequential file:
1)     All the records with a key value less than the record to be amended must be read and then written to the new file.
2)     Then the record to be amended must be read, the amendments applied to it, and the amended record written to the new file.
3)     Finally, all the remaining records must be written to the new file.
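
The three procedures above share one pattern: copy the old file to a new file in key order, changing only the affected records on the way. A minimal Python sketch of that pattern follows. The function name `apply_updates`, the use of `(key, data)` tuples as records and the in-memory lists standing in for files are all assumptions made for illustration; a real COBOL program would READ from one file and WRITE to another.

```python
def apply_updates(old_records, inserts=(), deletes=(), amendments=None):
    """Copy an ordered sequential 'file' to a new one, applying changes in one pass.

    old_records: list of (key, data) tuples in ascending key order (the old file).
    inserts:     (key, data) tuples to add.
    deletes:     keys of records to drop.
    amendments:  dict mapping key -> replacement data.
    Returns (new_records, reads, writes).
    """
    amendments = amendments or {}
    deletes = set(deletes)
    pending = sorted(inserts)        # inserts must be applied in key order too
    new_records = []
    reads = writes = 0
    i = 0                            # position in the pending inserts

    for key, data in old_records:    # every old record is read exactly once
        reads += 1
        # Write any inserts whose key sorts before the current old record.
        while i < len(pending) and pending[i][0] < key:
            new_records.append(pending[i])
            writes += 1
            i += 1
        if key in deletes:
            continue                 # deleted: simply not copied
        if key in amendments:
            data = amendments[key]   # amended: copy the changed version
        new_records.append((key, data))
        writes += 1

    # Inserts with keys higher than every old record go at the end.
    for rec in pending[i:]:
        new_records.append(rec)
        writes += 1
    return new_records, reads, writes


old = [(k, f"rec{k}") for k in range(10, 60, 10)]   # keys 10, 20, 30, 40, 50
new, reads, writes = apply_updates(
    old, inserts=[(25, "new")], deletes=[40], amendments={10: "changed"}
)
```

Even though only three records are affected here, all five old records are read and the whole new file is written, which is the cost described earlier for low hit rates.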

Disadvantages of Sequential files:

Ø  Sequential files are very slow to update when the hit rate is low, because the entire file must be read and then written to a new file just to update a few records. The hit rate is the proportion of records in the file that are actually affected by the update.

Ø  For ordered sequential files, preserving the order is a manual task.

Advantages of Sequential files:

Ø  Most efficient when the hit rate is high. There is no need to calculate record positions and no indexes are required.

Ø  Efficient storage, as the file contains only the data component. No indexes are required.

Ø  Space from deleted records is recovered. 

Ø  Sequential files may be stored on serial media such as magnetic tape.


Direct Access Files:

Direct access files address the main problem with sequential files: having to read all the preceding records to reach a particular record.

COBOL supports two kinds of direct access file organizations –
1)     Relative and
2)     Indexed.

Relative Files:
·        Relative file makes use of the Relative Record Number which actually indicates the record number from the start of the file.

·        A Relative file may be visualized as a one dimension table stored on disk, where the Relative Record Number is the index into the table. 

·        Relative files support sequential access by allowing the active records to be read one after another.

·        Relative files support only one key, i.e. the Relative Record Number (RRN).

·        The key must be numeric.

·        It must take a value between 1 and the current highest Relative Record Number.
·        Enough room is pre-allocated to the file to contain records with Relative Record Numbers between 1 and the highest record number. For example, if a record is inserted with an RRN of 10000, then space for 10000 records will be allocated even if the other 9999 slots are empty.

·        Relative files support update, delete and insert operations.

http://www.csis.ul.ie/cobol/course/Resources/pics/I-DFIntroFig1.gif
·        To access a record in a Relative file, a Relative Record Number must be provided. The position of the record is determined from the record number, the start position of the file and the size of a record.

·        Because the file management system only has to make a few calculations to find the record position, the Relative file organization is the faster of the two direct access file organizations available in COBOL.
·        Because the position is calculated from the record size, each record in a Relative file must occupy a slot of fixed length.

·        To read, insert, delete or update a record directly, the Relative Record Number of the record must be placed in the key area and then the operation must be applied to the file using verbs such as READ, WRITE, REWRITE and DELETE.
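
The position calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 20-byte slot size, the helper names `record_offset`, `write_record` and `read_record`, and the in-memory `io.BytesIO` object standing in for the disk file are all hypothetical, and a real file system keeps extra status information for each slot.

```python
import io

RECORD_LENGTH = 20   # assumed fixed slot size in bytes


def record_offset(rrn, record_length=RECORD_LENGTH, file_start=0):
    """Byte position of the slot for a Relative Record Number (1-based)."""
    if rrn < 1:
        raise ValueError("Relative Record Numbers start at 1")
    return file_start + (rrn - 1) * record_length


def write_record(f, rrn, text):
    # Pad or truncate the text so that it fills exactly one slot.
    slot = text.encode("ascii").ljust(RECORD_LENGTH)[:RECORD_LENGTH]
    f.seek(record_offset(rrn))
    f.write(slot)


def read_record(f, rrn):
    f.seek(record_offset(rrn))
    return f.read(RECORD_LENGTH).decode("ascii").rstrip()


disk = io.BytesIO()                  # stands in for the file on disk
write_record(disk, 3, "third record")
file_size = len(disk.getvalue())     # room for slots 1 and 2 now exists as well
```

Writing only record 3 leaves the file three slots long, which is the space behaviour discussed for Relative files: room exists for every slot up to the highest Relative Record Number used.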

Disadvantages of using Relative Files:

·        Wasteful organization as far as space is concerned. The file will be allocated enough room to hold records from 1 to the highest Relative Record Number used, even if only a few records have actually been written to the file. If the first record written to the file has a Relative Record Number of 10,000, then room for that many records is allocated to the file.

·        Relative files cannot recover the space from deleted records. When a record is deleted in a Relative file, it is simply marked as deleted, but the space that used to be occupied by the record is still allocated to the file. So if a Relative file is 560K in size when full, it will still be 560K when you have deleted half the records.

·        The single key is limiting because it is often the case that we need to access the file on more than one key.

·        With Relative files the key must be numeric. Hence we cannot use this organization if the key we need to access the file by is not numeric.
·        The fact that the key must be in the range 1 to the highest key value, and that the file system allocates space for all the records between 1 and the highest Relative Record Number used, imposes severe constraints on the key. For instance, even though the StudentId is numeric we couldn't use it as a key, because the file system would allocate space for records from 1 to the highest StudentId written to the file. Suppose the highest StudentId written to the file was 9876543. The file system would allocate space for 9,876,543 records.

·        Because Relative files are direct access files, they must be stored on direct access media such as hard or floppy disks. They cannot be stored on magnetic tape.

Advantages of Relative files:
·        This is the fastest direct access organization.

·        Does not require an index structure.

·         Relative files allow sequential access to the records in the file.

Indexed Files:

·        Indexed files may have up to 255 keys.

·        Keys can be alphanumeric and numeric.

·        There is exactly one primary key, and it must be unique.

·        It is possible to read an Indexed file sequentially on any of its keys (the primary key or an alternate key).

·        The primary key and the alternate keys are part of the record.
·        The key upon which the data records are ordered is called the primary key. The other keys are called alternate keys.

·        Records in the Indexed file are sequenced on ascending primary key.
·        For each of the alternate keys specified in an Indexed file, an alternate index is built. 

·        As well as allowing direct access to records on the primary key or any of the 254 alternate keys, indexed files may also be processed sequentially.
·        When processed sequentially, the records may be read in ascending order on the primary key or on any of the alternate keys.

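·        Since the data records are held in ascending primary key sequence, it is easy to see how the file may be accessed sequentially on the primary key. It is not quite so obvious how sequential access on the alternate keys is achieved: the alternate index built for each alternate key holds the key values in ascending order, each pointing to its data record, so reading the index in order yields the records in alternate key order.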
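
To make the role of the alternate index more concrete, here is a small Python sketch. It is a simplified model, not how any particular file system stores its indexes: the student ids and names are made up, the dictionary `data` stands in for the data records, and `alt_index` is a plain sorted list of (alternate key, primary key) pairs.

```python
import bisect

# Data records keyed by the primary key (a hypothetical StudentId).
data = {
    1003: "Murphy",
    1001: "Walsh",
    1002: "Byrne",
}

primary_order = sorted(data)   # records are sequenced on ascending primary key

# Alternate index on the name: sorted (alternate key, primary key) pairs.
alt_index = sorted((name, sid) for sid, name in data.items())


def read_sequential_primary():
    return [(sid, data[sid]) for sid in primary_order]


def read_sequential_alternate():
    # Walk the alternate index in order and fetch each record it points to.
    return [(sid, data[sid]) for _, sid in alt_index]


def read_direct_alternate(name):
    # Binary search for the first index entry with this alternate key.
    i = bisect.bisect_left(alt_index, (name,))
    matches = []
    while i < len(alt_index) and alt_index[i][0] == name:
        sid = alt_index[i][1]
        matches.append((sid, data[sid]))
        i += 1
    return matches   # a list, because alternate keys need not be unique
```

Reading through `alt_index` returns Byrne, Murphy, Walsh, while reading on the primary key returns ids 1001, 1002, 1003. The extra index lookup before each data access is also why indexed access involves more I/O than relative access.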

Disadvantages:

·        As Indexed files achieve direct access by traversing a number of levels of index, this is the slowest direct access organization.

·        Indexed files require more storage than other file organizations, since they hold the data records plus an index structure for the primary key and for each alternate key.
·        As access to the index structures is also involved, comparatively more I/O is required.
·        Space from deleted records is only partially recovered.

·        Because Indexed files are direct access files, they must be stored on direct access media such as hard or floppy disks. They cannot be stored on magnetic tape.

Advantages:                                

·        Indexed files can have multiple alphanumeric keys and only the primary key has to be unique.

·        An indexed file may be read sequentially on any of its keys.
---------------------------------------------------------------------------------------------------------
Virtual Storage Access Method (VSAM):
Ø  VSAM is a data management system introduced by IBM in the 1970s.

Ø  VSAM was, by several accounts, intended to replace all of the earlier data management systems in use by IBM's operating systems. 

Ø  Access Method Services is the single, general-purpose utility that is used to manipulate VSAM components.

VSAM provides three types of datasets (clusters):
  • Key Sequenced Data Set (KSDS)
Each record is identified for access by specifying its key value, the part of the data record that uniquely identifies the record among the other records in the dataset.
  • Entry Sequenced Data Set (ESDS)
Each record is identified for access by specifying its physical location - the byte address of the first data byte of each record in relationship to the beginning of the dataset.
  • Relative Record Data Set (RRDS)
Each record is identified for access by specifying its record number - the sequence number relative to the first record in the dataset. 
VSAM datasets are frequently referred to as clusters. 

Ø  A KSDS cluster consists of two physical parts, an index component, and a data component. 
Ø  ESDS and RRDS clusters consist of only a single component, the data component.

KSDS Cluster Components (Indexed Files)
·        Each record contains a key field, which occurs in the same relative position in each record.
·        Records are stored in the logical sequence based upon their key field value.
·        The index component of the KSDS cluster contains the list of key values for the records in the cluster with pointers to the corresponding records.
·        Records can be accessed:
o   Sequentially in order by the Key value.
o   Directly by supplying the key value.
·        Records can be deleted or added at any point within a KSDS cluster; VSAM keeps the remaining records in key sequence.
ESDS Cluster Components (Sequential Files)
·        The records in an ESDS cluster are stored in the order in which they are entered into the dataset.
·        Access is normally sequential, i.e. to read a particular record all the preceding records are read first.
·        Each record is referenced by its relative byte address (RBA). In an ESDS dataset of 100-byte records, the RBA of the first record is 0, the RBA of the second record is 100 and the RBA of the third record is 200. An RBA is 4 bytes in length.
·        The records in an ESDS may be accessed sequentially, in order by RBA value, or directly, by supplying the RBA of the desired record.
·        Records may not be deleted from an ESDS cluster, and they may only be added (appended) to the end of the dataset.
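
The RBA arithmetic for an ESDS can be sketched as follows. This is a deliberately simplified Python model: the class name `EntrySequenced` and its methods are hypothetical, the records live in one contiguous in-memory buffer, and control interval boundaries and control information, which affect the RBAs in a real VSAM dataset, are ignored.

```python
class EntrySequenced:
    """Toy model of an ESDS: append-only storage addressed by RBA."""

    def __init__(self):
        self._data = bytearray()

    def append(self, record: bytes) -> int:
        """Add a record at the end and return its relative byte address."""
        rba = len(self._data)        # the next record starts where the data ends
        self._data += record
        return rba

    def read(self, rba: int, length: int) -> bytes:
        """Direct access: fetch `length` bytes starting at the given RBA."""
        return bytes(self._data[rba:rba + length])


esds = EntrySequenced()
rbas = [esds.append(bytes([65 + n]) * 100) for n in range(3)]   # three 100-byte records
second = esds.read(rbas[1], 100)
```

The model has no delete or insert-in-the-middle operation, which mirrors the rule that records can only be appended to an ESDS.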
RRDS Cluster Components
·        The records in an RRDS cluster are stored in fixed length slots. 
·        Each record is referenced by the number of its slot.
·        The records in an RRDS cluster may be accessed sequentially, in relative record number order, or directly, by supplying the relative record number of the desired record.
·        The records of an RRDS cluster must be of fixed length. 
·        When a record is inserted, an empty slot is used.
·        When a record is deleted, its slot is left empty and becomes available for reuse.

What is a Control Interval

Ø  In non-VSAM data management methods, the unit of data that is moved between memory and the storage device is defined by the block. 
Ø  In VSAM, the unit of data that is transferred in each physical I/O operation is defined as a control interval.
When a VSAM dataset is loaded, control intervals are created and records are written into them. 
Ø  With KSDS clusters, the entire control interval is usually not filled.  Some percentage of free space is left available for expansion.  This can be controlled using the FREESPACE parameter on the DEFINE CLUSTER command.
Ø  With ESDS clusters, each control interval is completely filled before records are written into the next control interval in sequence.
Ø  With RRDS clusters, control intervals are filled with fixed-length slots, each containing either an active record or a dummy record. A dummy record acts as a placeholder for a record that has not yet been inserted. Slots containing dummy records are available for use when new records are added to the dataset.

Control Areas

Control intervals are grouped together into control areas.
The rules used for filling and writing control areas are similar to those which apply for control intervals. 
Ø   For ESDS and RRDS clusters, control areas are filled with control intervals that contain records. 
Ø  For KSDS clusters, some of the control intervals in each control area may consist entirely of free space that can be used for dataset expansion.
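
The effect of free space at load time can be illustrated with a rough Python model. The function `load_ksds` and its parameters are hypothetical, and the model counts whole records per control interval; real VSAM works in bytes, reserves room for control information and applies its own rounding rules, so treat the numbers only as a picture of the idea.

```python
def load_ksds(record_count, records_per_ci, cis_per_ca, ci_free_pct, ca_free_pct):
    """Return one list per control area, holding the record count of each CI."""
    # Records actually loaded into a CI once the CI free space is reserved.
    usable_per_ci = max(1, records_per_ci * (100 - ci_free_pct) // 100)
    # CIs actually loaded in a CA once the CA free space is reserved.
    loaded_cis_per_ca = max(1, cis_per_ca * (100 - ca_free_pct) // 100)

    areas = []
    remaining = record_count
    while remaining > 0:
        area = []
        for ci in range(cis_per_ca):
            if ci < loaded_cis_per_ca and remaining > 0:
                n = min(usable_per_ci, remaining)
                remaining -= n
            else:
                n = 0                # free CI, kept for later expansion
            area.append(n)
        areas.append(area)
    return areas


# KSDS-style load: 20% of each CI and 25% of each CA left free.
ksds_layout = load_ksds(20, records_per_ci=10, cis_per_ca=4, ci_free_pct=20, ca_free_pct=25)

# ESDS-style load: no free space, each CI is filled before the next one.
esds_layout = load_ksds(20, records_per_ci=10, cis_per_ca=4, ci_free_pct=0, ca_free_pct=0)
```

In this model the KSDS-style load spreads the 20 records over three partly filled control intervals and leaves the fourth one empty, while the ESDS-style load packs them into two full control intervals.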

VSAM Catalogs

·        With non-VSAM datasets we have the option of keeping the dataset uncataloged. This is not an option with VSAM datasets: every VSAM dataset has to be cataloged.
·        VSAM maintains its own catalog, which is itself a KSDS cluster, into which catalog entries describing VSAM clusters are recorded.  The same VSAM catalog may also be used to contain the catalog entries for non-VSAM datasets.
·        More recently, a newer type of catalog, the Integrated Catalog Facility (ICF), has come into use.

Master Catalog:

·        Every system that uses VSAM has one, and only one, master catalog.  The master catalog contains entries about system datasets and VSAM structures used to manage the operation of VSAM.
·        In most computer systems, the Systems Programming staff will have created user catalogs, which are cataloged in the master catalog; all other users of the computer system will only be allowed to catalog datasets in those user catalogs.
·        The master catalog is created during the System Generation process and usually resides on the System Residence volume.  The master catalog "owns" all other VSAM resources in a computer system.
·        The master catalog is the "VSAM King": it is directly in charge of all VSAM resources.

User Catalogs

·        A user catalog is a catalog created to contain entries about application specific datasets. 
·        The information defining a user catalog is stored into a catalog entry in the master catalog.  A production system might have any number of user catalogs, with the datasets cataloged in a specific user catalog related by application type.

What is the Relation between the Catalog and Volume ownership:

·        If a DASD volume contains a VSAM catalog (master or user), the catalog must be the first VSAM object stored on that volume.
·        A VSAM catalog owns the volume on which it resides. A catalog can also own other volumes; however, those volumes cannot also contain other VSAM catalogs. 
·        All the VSAM objects that are defined on a volume containing a VSAM catalog must be cataloged in the catalog residing on that volume. 

What is VSAM dataspace:

·        After we create a catalog on a volume and before we start creating VSAM clusters, we need to create one or more data spaces. A data space is an area of the direct access storage device that is exclusively allocated for VSAM use.
·        Each volume has a VTOC (Volume Table of Contents). In the VTOC this space is marked as allocated to a dataset, so that it is not available for allocation to any other use, either VSAM or non-VSAM.

Unique Clusters

It is possible to create VSAM clusters out of unallocated space on direct access storage. This type of cluster has a designation of UNIQUE and essentially consists of a separate data space which is used entirely by the cluster created within it. From a data management viewpoint, it is not a good idea to create unique VSAM clusters.