Tux on VM

Last updated on:
Sunday, July 06, 2008

Linux for Big Iron

Notes on using an STK Iceberg/IBM RVA on Linux zSeries

This information was contributed by Jim Sibley on July 19, 2002.

We have been successfully using the STK Iceberg/IBM RVA as DASD storage for Linux on the S/390-zSeries in LPAR and VM environments in our software lab since July, 2000. It behaves much like any other DASD device with a few exceptions, which I discuss here.

The STK Iceberg, marketed by IBM as the RVA, is a large scale direct access data storage device using the IBM OS (MVS/VM/zOS/zVM) Extended Count Key Data (ECKD) recording format. It provides hardware data compression to minimize the size of the "backend" or real disk for data recording and RAID5 protection for the recorded data. It appears to Linux as a 3990 controller and the volumes are most often 3390-3 images, but may be other 3390 sizes or 3380 images. The information I mention below is for 3390-3 images.

Only one volume on the RVA we used is an MVS volume, kept for MVS IXFP communication and reporting from an MVS LPAR. All others are formatted for Linux using the ext2 file system. The MVS IXFP LISTCFG DEV report was used to show the backend space occupied.

There are three areas, though, to which we have to pay close attention:
  • Linux data may be less compressible than general MVS or VM data.
  • Space usage needs to be monitored continually. Linux has no software interface for free space collection; all formatted tracks are recorded on the RVA, whether they are "empty" or not; and removed data is not erased but merely requeued in the file system.
  • Error notification for RVA-specific events is minimal, especially to Linux systems running in an LPAR when the RVA is 90% full.
I. Data Compression on the RVA

The RVA achieves some economy by compressing data before it is recorded on the actual backend disk. Hardware data compression reduces the logical data to a smaller physical data package for recording.

Hardware timing gaps found on physical disk images are eliminated. ECKD has a lot of space "wasted" on the disk with these timing gaps. The smaller the blocks written, the more "waste" there is.

The track layouts below use the following notation:
R0 defines the length of the trackfile
Rn is the length of the key/data section
Key is the data key (only written for keyed data).
Data is the data block, which can be variable in size
... are physical gaps on the track to allow for rotational delay

Keyed data would then be recorded on an ECKD device as
	R0 ... R1 ... Key ... Data ... R2 ... Key ... Data

	but be recorded on the RVA as

	R0R1KeyDataR2KeyData

Non-keyed data would be recorded on an ECKD device as
	R0 ... R1 ... Data ... R2 ... Data ... R3 ... Data

	but be recorded on the RVA as

	R0R1DataR2DataR3Data
RVA Data Compression for Linux

For Linux, RVA compression may be less effective than for other operating systems.

The capacity of a 3390-3 logical image is 2.29 GB when preformatted in 4k blocks.
           12      blocks/track
           15      tracks/cylinder
        3,339      cylinders/vol
        = 601,020  4k blocks per volume
        = 2.29     GB capacity
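
The capacity figures above can be checked with a little shell arithmetic. This is only a sketch: the function names are made up for illustration, and it assumes 1 GB means 1024^3 bytes, which is what reproduces the 2.29 GB figure.

```shell
#!/bin/bash
# Geometry of a 3390-3 image formatted in 4k blocks (figures from the text).
blocks_per_track=12
tracks_per_cyl=15
cyls_per_vol=3339
block_size=4096

# Total number of 4k blocks on one volume.
volume_blocks() {
    echo $(( blocks_per_track * tracks_per_cyl * cyls_per_vol ))
}

# Capacity in GB, assuming 1 GB = 1024^3 bytes, to two decimal places.
volume_gb() {
    awk -v b="$(volume_blocks)" -v s="$block_size" \
        'BEGIN { printf "%.2f\n", b * s / 1073741824 }'
}

echo "$(volume_blocks) blocks, $(volume_gb) GB"
```
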
All this Linux preformatting causes the tracks to be non-empty. On each volume they occupy 102.6 MB of space on the real disk, even though Linux reports the volume to be empty. (RVA usage was obtained with the IXFP SIBBATCH LISTCFG DEV report on an MVS system that can access the RVA.)

So, 256 "empty" Linux volumes require 25.7 GB of real disk space or a net capacity load (NCL) of 10.9% on an RVA with 235.76 GB real disk storage.
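
The net capacity load quoted above follows from simple arithmetic, sketched here. The function name and argument order are hypothetical, and 1 GB is assumed to be 1024 MB. The script prints the volume total to two decimals (25.65 GB), which the text rounds to 25.7 GB.

```shell
#!/bin/bash
# Backend space taken by freshly formatted volumes, and the resulting NCL.
#   $1 = number of volumes
#   $2 = backend MB occupied by one "empty" volume
#   $3 = real (backend) capacity of the RVA in GB
empty_volume_load() {
    awk -v n="$1" -v mb="$2" -v cap="$3" 'BEGIN {
        gb = n * mb / 1024            # assumption: 1 GB = 1024 MB
        printf "%.2f GB, NCL %.1f%%\n", gb, 100 * gb / cap
    }'
}

empty_volume_load 256 102.6 235.76
```
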

In addition, a Linux file system, such as the standard ext2, has its own overhead. You can use
	tune2fs -l /dev/dasd/a1
to see this overhead. Basically, it shows
	601,017  4k blocks (3 blocks are lost to overhead)
	30,050   blocks are reserved
	570,967  blocks for user data and file system overhead
	or 2.17  GB usable capacity
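
As a rough sketch, the same numbers can be pulled out of the tune2fs listing with awk. The function name ext2_usable is made up for this example; it relies on the "Block count", "Reserved block count" and "Block size" lines of the tune2fs -l listing and assumes 1 GB is 1024^3 bytes. Note that it rounds, so the figures above come out as 2.18 GB rather than the truncated 2.17 GB.

```shell
#!/bin/bash
# Read "tune2fs -l" output on stdin and report the blocks left after
# subtracting the reserved blocks.
ext2_usable() {
    awk -F: '
        $1 == "Block count"          { total    = $2 + 0 }
        $1 == "Reserved block count" { reserved = $2 + 0 }
        $1 == "Block size"           { size     = $2 + 0 }
        END {
            usable = total - reserved
            printf "%d usable blocks, %.2f GB\n", usable, usable * size / 1073741824
        }'
}

# Usage (device name as in the text, needs root):
#   tune2fs -l /dev/dasd/a1 | ext2_usable
```
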
Linux data provides fewer opportunities for compression than much of the MVS and VM data. MVS and VM records are often padded with blanks or other recurring data, which the RVA can squeeze out. Under Linux, a tab character frequently stands in for multiple blanks, and a record is terminated by a character rather than by fixed-length padding. Compression is therefore less effective, much as binary data is less compressible than text data for MVS or VM.

For example, the SuSE SLES7 IPL volume is not very compressible. With KDE and networking, it is typically on the order of 1.3 GB and occupies 751.2 MB on the backend, or a compression ratio of 1.7 to one.

To take this a bit further, if you split the IPL volume into 3 such that the /var directory is on one volume, the /usr directory is on a second, and the rest remain on the IPL volume, you would see something like
	directory   reported   recorded   compression
	            size       size       ratio
	/           314.1 MB   255.9 MB   1.23 to 1
	/usr        790.0 MB   438.8 MB   1.80 to 1
	/var        201.1 MB   259.6 MB   0.77 to 1
The last result is a bit surprising, as "df" shows the volume to be only about 15% full. However, much of the volume is empty, and an "empty" volume already occupies 102.6 MB. More of the backend space reported for that volume is taken up by zero tracks than by recorded data.
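
The ratios for the three directories are simply the size reported by Linux divided by the size recorded on the backend. A small sketch, with a hypothetical helper name, reproduces them:

```shell
#!/bin/bash
# Compression ratio = size reported by Linux / size recorded on the backend.
#   $1 = reported size in MB, $2 = recorded size in MB
compression_ratio() {
    awk -v r="$1" -v b="$2" 'BEGIN { printf "%.2f to 1\n", r / b }'
}

compression_ratio 314.1 255.9   # /
compression_ratio 790.0 438.8   # /usr
compression_ratio 201.1 259.6   # /var
```

A ratio below 1, as for /var, means the volume takes more backend space than the data Linux reports on it.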

II. Space Usage

The IXFP software communicates between MVS and the RVA so that empty tracks need not be written, a track that is "deleted" from the OS VTOC no longer occupies space on the RVA, and space collection is initiated periodically from MVS. Since an empty track is not written, there are additional savings from the reduced ECKD overhead. There is similar software for VM.

Since the RVA software does not "know" about the Linux disk format and the Linux file systems, no real space is given back when a file is removed. The file system changes the lists or queues the blocks are in, but the blocks are not rewritten, and the available block queue can account for considerable real space occupied on the RVA. Even with the Compatible Data Layout (CDL), where an OS VTOC is written, all tracks are stored because the VTOC shows the Linux area as one large data set with no free space on the volume.

If you were to fill up a volume with some random data, IXFP might show about 1818 MB of compressed data on the backend, and Linux df would report
Filesystem    1k-blocks  Used Available Use% Mounted on
/dev/dasdd1   2366248     20         0 100% /mnt/3f63
If you removed the files from the volume, Linux df would report
Filesystem    1k-blocks  Used Available Use% Mounted on
/dev/dasdd1   2366248     20   2246028   1% /mnt/3f63
but IXFP would still show 1818 MB. No space has been returned on the RVA. The Linux file system chains have just been rearranged. You must explicitly write compressible data to the RVA to get space back.

You might use the Linux "dd" command to write a compressible file, then remove the file.
	dd if=/dev/zero of=/mnt/3f63/f1 bs=1k count=2246028
where
	/dev/zero is a device supplied by Linux that returns zeroes
	/mnt/3f63/f1 is a file on the volume you want to compress
	bs=1k is used because the df command reports blocks in 1k units
	count=2246028 is the number of available blocks from the df command

Once the file /mnt/3f63/f1 is removed, IXFP would then report about 103.8 MB used on the volume.

One should note that the full volume must be written before the zeroed files are removed or Linux may just reuse the zeroed available blocks and not clear the whole volume.
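
One way to make sure the whole volume is written is to let dd run until the file system is full instead of passing a block count. This is a variation on the command above, not the method we used for the measurements. The function name and the file name "zeroes" are only illustrative. The sketch just prints the commands so they can be reviewed before anything is written.

```shell
#!/bin/bash
# Print the commands that zero the free space of one mounted volume.
#   $1 = mount point of the volume
zero_fill_cmds() {
    local mnt=$1
    # Without count=, dd writes until the file system is full and then
    # fails with "No space left on device"; that failure is expected.
    printf 'dd if=/dev/zero of=%s/zeroes bs=1024k\n' "$mnt"
    # Flush the zeroes to the device before the file is removed.
    printf 'sync\n'
    printf 'rm -f %s/zeroes\n' "$mnt"
}

zero_fill_cmds /mnt/3f63              # review the generated commands
# zero_fill_cmds /mnt/3f63 | /bin/bash   # uncomment to actually run them
```

When run as root, dd can also fill the blocks that ext2 reserves, so other processes on the volume are even more likely to run out of space while the file exists.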

There does not seem to be any advantage to writing blanks rather than zeroes. They are reported to occupy the same backend space and the write rates for a single stream are very much the same.

The following bash script might be a basis for compressing the free space on a Linux system
#!/bin/bash
# sample script to write zeroes on the end of all
# mounted ext2 volumes then remove file to compress RVA
# volumes.  jlsibley@us.ibm.com
#
# No warranty given or implied by the author or IBM. 
# Use at your own risk
#
# 1) display the local ext2 file systems	df -l -t ext2
# 2) remove the heading and root 	| tail -n +3
# 3) squeeze out unwanted blanks	| tr -s ' '
# 4) use gawk to generate a line for each mounted
# file system of the form
#
# dd bs=1k count=$4 if=/dev/zero of=$6/zeroes; rm $6/zeroes
#
#	where 	$4 is the fourth column (available space)
#		$6 is the sixth column (mount point)
#
# | gawk -F ' ' '{print "dd bs=1k count=" $4 \
#   " if=/dev/zero of=" $6 "/zeroes; rm " $6 "/zeroes"}'
#
# 5) execute the script		| /bin/bash
#    you could delete this command if you just want to see the script generated
#    or redirect the output to write the script to a file
#
# the actual command is a single line of code! (continued over three lines)
# "tail -n +3" replaces the obsolete "tail +3" form, and the gawk line break
# now falls between string constants rather than inside one.
df -l -t ext2 | tail -n +3 | tr -s ' ' | \
     gawk -F ' ' '{print "dd bs=1k count=" $4 \
     " if=/dev/zero of=" $6 "/zeroes; rm " $6 "/zeroes"}' | /bin/bash
Caution

This technique should be used with caution because other processes may also be writing to the volume and may terminate with a "No space left on device" message before the zeroed file is removed. We are very careful to ensure that no other processes are writing to the volume when we use this technique.

Use at your own risk as no guarantee or warranty of the results is given or implied.

III. Special Error Notifications

Though the RVA appears to Linux as a 3990 controller, there are some differences. The RVA can return a "long busy" for an I/O request. The usual action is to retry, and Linux handles this correctly, though it writes messages like the following to /var/log/messages:
May 31 23:59:00 svlxdbt1 kernel: dasd_erp(3990): /dev/dasda(94:0),3d21@0xabc:Perform logging requested
May 31 23:59:00 svlxdbt1 kernel: dasd:/dev/dasda(94:0),3d21@0xabc:ERP successful
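
To see how often this happens, the log can be summarized per device. The following is a minimal sketch that assumes the messages have exactly the layout of the sample above; the function name is hypothetical, and the format may differ between kernel levels.

```shell
#!/bin/bash
# Count dasd_erp(3990) messages per device in a syslog-style file.
#   $1 = log file (defaults to /var/log/messages)
dasd_erp_summary() {
    awk '$6 == "dasd_erp(3990):" {
             split($7, part, "(")     # "/dev/dasda(94:0),..." -> "/dev/dasda"
             count[part[1]]++
         }
         END { for (dev in count) print dev, count[dev] }' "${1:-/var/log/messages}"
}

# Usage:
#   dasd_erp_summary /var/log/messages
```
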

As of this writing, I am unaware of any mechanism under Linux on zSeries to notify Linux when the RVA is over 90% full. Under VM I would assume that VM intercepts the message and displays it to the operator. In an LPAR, there is no adequate warning and the RVA could fill up and stall. For this reason, we frequently monitor the Net Capacity Load (NCL) of the RVA.

Summary

The STK Iceberg/IBM RVA can be used as a storage device under Linux on the IBM S/390-zSeries. It behaves like a 3990/3390 or 3990/3380 device. However, assumptions about data compressibility, space recovery techniques, and space monitoring need to be rethought and revised.

Disclaimer:

All results were obtained on a SuSE SLES7 patchlevel1 Linux system, IBM 9672-G6 processor in LPAR mode, and an RVA x82 with 512 3390-3 volumes and a real capacity of 236 GB. Results may vary from installation to installation.

Jim Sibley
07/18/2002
jlsibley@us.ibm.com

 
