Tux on VM

Last updated on:
Sunday, July 06, 2008

Linux for Big Iron

Using BIND Mounts to Create A Simplified BaseVol/GuestVol Linux Server for SUSE Linux Enterprise Server 9

The information in this document is based on the concept provided in the IBM Redbook Linux on IBM eServer zSeries and S/390: Large Scale Linux Deployment, SG24-6824-00. This is a simpler procedure to follow to create a base-volume/guest-volume Linux for zSeries server which exploits the mount command with the bind option, introduced into the kernel at the 2.4 level. The procedure here was developed on a SUSE SLES 9 system (2.6 kernel), 64-bit, at the GA level.

This document last updated on 24 October 2005 by William Scully, Senior Systems Programmer, Computer Associates. Assistance for the SLES 9 design was provided by Robert Yeung, Systems Programmer for Visa.

Product, company, and service names referenced in this document may be copyright, trademark, or service marks of others. The information provided herein is provided as is and has not been subjected to formal testing by Computer Associates. Use this information at your own risk and only after you test in your environment.

Overview

The plan is to create a two-pack system for each Linux system.

  • The first pack is the so-called base volume, or basevol. It holds the materials which are rarely changed and can be shared read-only among several servers simultaneously. This device is owned by the so-called base server, which in the examples below is VM userid LINXBASE.
  • The second volume is the so-called guest volume or guestvol. These are the materials which are unique to each Linux server and in directories which must be read/write. The VM userid used in the examples below is LNXGUEST.

The goal is to place as many programs (both the kernel code and vendor-supplied packages) as possible in the base volume so that it is leveraged over and over again. The remaining guest volume read/write disk space given to a typical Linux server can be far smaller as a result. Also, because fixes applied to packages installed onto the read-only disk are used by all servers, the maintenance burden is substantially decreased.

To implement this scheme requires only a modest change in the way a Linux server is created. Overall, the approach is to prevent the operating system from mounting the base volume materials in read/write mode during the boot process. This is easily accomplished because:

  • When the Linux boot process starts all directories are in read-only mode.
  • As the boot process continues, vendor-supplied scripts remount Linux directories in read/write mode, as needed.

It is easy to change the vendor-supplied boot-time scripts to avoid the read/write mounting of the Linux materials. It is also easy to create new boot scripts which mount in read/write mode only those directories which must be read/write. Thus we can pick and choose which directories (and subdirectories) are read-only and which are read/write. It is the use of bind-type mount commands which allows specific (sub-)directories to be R/O or R/W. The Redbook referenced above includes a useful description of how bind mounts work. In particular you may want to read Sections 8.1 through 8.8.

Note that an objective is to hide, as much as possible, the fact that the server is using bind mounts. Using this implementation users who issue mount -l or cat /etc/mtab will see the same results as if the R/O server was booted with R/W directories. Only if the user issues a cat /proc/mounts will they see the details of what directories are mounted using the bind option.
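
The difference between the two views can be checked with a short shell function. This is a minimal sketch, not part of the original tooling: the name hidden_mounts is hypothetical, and it assumes both files use the usual layout in which the second field of each record is the mount point.

```shell
#!/bin/sh
# Sketch: list mount points that appear in a /proc/mounts-format file
# but not in an mtab-format file. Field 2 of each record is taken to be
# the mount point. The function name is illustrative only.
hidden_mounts() {
    mtab_file=$1
    proc_file=$2
    awk 'FILENAME == ARGV[1] { seen[$2] = 1; next }   # remember mtab entries
         !($2 in seen)       { print $2 }             # report the rest
        ' "$mtab_file" "$proc_file"
}

# Typical use on a running guest server:
#   hidden_mounts /etc/mtab /proc/mounts
```

Any mount point printed is known to the kernel but not recorded in /etc/mtab, which is the case for mounts issued with the -n option.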

Planning

Read-only and Read/Write Directories

The directories to be designated as read-only or read/write are selected based on commonly followed standards used by Linux. Please refer to Filesystem Hierarchy Standard for additional information.

The directories which will be included in the read-only base volume are:

Directory       Purpose
/bin            Linux's essential general-user commands are found here.
/boot           Linux's kernel configuration files are found here.
/lib & /lib64   Libraries for Linux and installed packages are found here.
/lost+found     Files recovered by a file system check are placed here. This directory is R/O.
/mnt            This directory is normally empty but is commonly used as a temporary mount-point. This directory is R/O.
/opt            Common packages, such as KDE and GNOME, are found here. Note that /opt/local is available R/W to the guest.
/sbin           Linux's essential privileged commands are found here.
/usr            Applications are typically found here, in particular in /usr/bin. Note that /usr/local is available R/W to the guest.
/var/lib/rpm    Kept R/O to ensure MIS can properly log application of all maintenance.

Of special note is the directory that holds the Red Hat Package Manager (RPM) database, typically /var/lib/rpm. This directory is kept in R/O mode so that the RPM command cannot update the production database from a guest server. All package or kernel maintenance which requires the RPM command is to be performed by MIS. Since any user can create a private RPM database and use the RPM command against it, we cannot stop someone from trying to circumvent this policy. But since the production RPM database is on a R/O device, there is no chance MIS's data will be adversely affected.

The directories which will be included in the read/write guest volume are:

Directory       Purpose
/dev            The location of special or device files.
/etc            Many configuration files are found here.
/home           Users' home directories are found in this directory.
/var            Logs and mail are typically found here.
/root           This is root's home directory.
/srv            New for United Linux; web server documents are found here.
/tmp            Temporary files are commonly written here. No materials in this directory need be kept when a system is booted.

There are a few other special-case or locally-defined directories as well:

Directory       Purpose
/proc & /sys    Kernel-created process and system information virtual file systems.
/opt/local      Available for users to install packages. This directory is R/W.
/usr/local      Available for users to install packages. This directory is R/W.
/basevol        This is the mount-point for the bind mounts used for this implementation. This directory is R/O.
/guestvol       This is the mount-point for the bind mounts used for this implementation. This directory is R/O.
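
The three tables can be condensed into a small lookup. The following is only a sketch of the intended layout as a user of a guest server would see it: the function name dir_mode is hypothetical, paths not listed in the tables are assumed to live on the R/O base volume, and paths beneath /basevol and /guestvol are deliberately not modelled.

```shell
#!/bin/sh
# Sketch: report the intended access mode of a path under the layout
# described in the tables. The most specific entry wins, so exceptions
# such as /var/lib/rpm are tested before their parent directories.
dir_mode() {
    case "$1" in
        /basevol|/basevol/*|/guestvol|/guestvol/*) echo unmodelled ;;
        /var/lib/rpm|/var/lib/rpm/*)               echo ro ;;
        /opt/local|/opt/local/*)                   echo rw ;;
        /usr/local|/usr/local/*)                   echo rw ;;
        /dev|/dev/*|/etc|/etc/*|/home|/home/*)     echo rw ;;
        /var|/var/*|/root|/root/*)                 echo rw ;;
        /srv|/srv/*|/tmp|/tmp/*)                   echo rw ;;
        /proc|/proc/*|/sys|/sys/*)                 echo virtual ;;
        *)                                         echo ro ;;   # base volume
    esac
}

# Typical use:
#   dir_mode /usr/local/bin
#   dir_mode /var/lib/rpm/Packages
```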

Base Server Attributes

The Linux servers which are going to share materials were tested using the following attributes:

  • A temporary disk is used for swap space and is located at virtual address 200. The size of this disk depends entirely on your workload but you may want to start at 40 cylinders and adjust as needed.
  • The read-only base-volume root file system ("/" not "/root") is located at virtual address 201 and is known as /dev/dasdb1. The format of the base-volume is EXT2. Do not use a journalling file system; it is not needed since this disk will remain in R/O mode, except when the kernel or applications are being serviced by the systems programmer. The size of the disk should be 3338 cylinders of 3390 DASD.
  • The read/write guest-volume file systems must be located at virtual address 202 and be known as /dev/dasdc1. The format of the guest-volume is EXT3. The size of the disk should be at least 150 cylinders of 3390 DASD. Since end users' home directories are located on this disk you may wish to substantially increase the size of this disk from the suggested minimum.
  • Additional space added to the server should be found in subdirectories located off /spacenn. These disks should be in virtual address range 203 and above, and be known as /dev/dasdd1 and upward.
  • Networking is supported through a Virtual Switch at C00-C02 with portname INTRANET.
  • On the 191 A-disk, create file PROFILE EXEC to IPL the Linux 201 boot-disk.
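
To relate the cylinder counts above to more familiar units, the arithmetic can be sketched as below. This is an approximation under an assumption not stated in this document: the disk is formatted with 4096-byte blocks, which on a 3390 gives 12 blocks per track and 15 tracks per cylinder, or 720 KiB per cylinder. The function name cyl_to_mib is hypothetical, and the result ignores file system overhead.

```shell
#!/bin/sh
# Sketch: approximate capacity of a 3390 minidisk in MiB.
# Assumption: 4096-byte blocks, 12 blocks per track, 15 tracks per
# cylinder (720 KiB per cylinder). Integer division rounds down.
cyl_to_mib() {
    cylinders=$1
    echo $(( cylinders * 15 * 12 * 4 / 1024 ))
}

# Typical use:
#   cyl_to_mib 3338    # base volume
#   cyl_to_mib 150     # minimum guest volume
```

Under that assumption the 3338-cylinder base volume works out to roughly 2347 MiB and the 150-cylinder guest volume to roughly 105 MiB.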

Given the above, a typical directory entry for the Linux base-volume server should appear similar to:

USER LINXBASE password 64M 512M G
* You may need up to 512M to install SLES 9 although SLES 9 will run in 64M
ACCOUNT acntcode distcode
MACHINE ESA
IPL CMS
CONSOLE 009 3215 T OPERATOR
SPOOL 00C 2540 READER *
SPOOL 00D 2540 PUNCH A
SPOOL 00E 1403 A
LINK MAINT 190 190 RR
LINK MAINT 19E 19E RR
LINK MAINT 19D 19D RR
NICDEF C00 TYPE QDIO DEVICES 3 LAN SYSTEM INTRANET
MDISK 191 3390 start-cyl 1 MR
MDISK 200 3390 T-DISK cyls
MDISK 201 3390 start-cyl 3338 MR
MDISK 202 3390 start-cyl  150 MR

Installation of a Base Server

At this point you're ready to create the "model" server, LINXBASE, which will hold the materials to be shared by several Linux servers. The disks owned by this model server are called the golden base volumes.

Keeping the above requirements in mind, you must now conduct a SLES 9 installation. Conduct a completely normal installation, following SUSE's instructions. The SLES 9 "Default System" installation can be used as the "Software Selection". If there are packages which you do or do not want installed ensure they are selected now. You may find it more convenient to install all the packages that any of your Linux servers may use. Remember, installing a package doesn't necessarily mean that any particular Linux server must be configured to utilize it.

Note: As always, remember the logon password for root. This password will be needed each time a guest server is cloned using the base server's materials.

Keep in mind that it is important to be careful not to make any unnecessary changes to the default installation. This is because the so-called "golden base volumes" will be replicated and used by several Linux servers. You should make only those changes which are to be used by all Linux servers. Any customization changes needed by a single Linux server can (and will) be made later, on that server's private (that is, non-shared) disk(s).

Post-Installation Modifications

After the standard SUSE installation is complete, a few minor local modifications are needed and are discussed below.

  • The following additional packages must be installed:
    • The cpint package, to allow Linux to communicate with the VM Control Program. This package is used to determine the read-only or read/write status of disks and is available as part of the standard SLES 9 distribution.
    • The Regina package adds the Rexx scripting language to Linux. The basevol/guestvol tools are written in Rexx. (Although SUSE does not include Regina with the SLES 9 distribution you can continue to use the installation RPM from the SLES 8 CDs. You may also find Regina at Freshmeat.)
  • If you want to use Virtual Disks (that is, FBA) for swap space you may need to update /etc/sysconfig/kernel to include the record:
    INITRD_MODULES="dasd_fba_mod"
    
    After any update to this file you will need to issue the mkinitrd and zipl commands.
  • Create several new mount-points, to be used later:
    mkdir /guestvol
    mkdir /basevol
    mkdir /basevol/var
    mkdir /opt/local
    mkdir /usr/local
    
    Directory /usr/local may already exist, depending on the packages you install.
  • Ensure file /etc/fstab includes records similar to:
    /dev/dasda1          swap                 swap       pri=42                0 0
    /dev/dasdb1          /                    ext2       acl,user_xattr        1 1
    devpts               /dev/pts             devpts     mode=0620,gid=5       0 0
    proc                 /proc                proc       defaults              0 0
    sysfs                /sys                 sysfs      noauto                0 0
    # /dev/dasdc1        reserved for MIS use; do not use!
    # /dev/dasdd1        is the first available for your use... .
    
    Ensure that device /dev/dasdc1 is not mounted in /etc/fstab.
  • Because swap space is on z/VM temporary disk space the device is unformatted at boot time. For this reason the swap space will be formatted and activated via /etc/rc.d/boot.local. A typical /etc/rc.d/boot.local should include the following statements:
    #! /bin/sh
    
    # Enable the so-called "timer patch", for performance under z/VM
    sysctl -w kernel/hz_timer=0
    
    # Create and activate TDisk-based swap space for the server
    mkswap /dev/dasda1
    swapon -a
    
  • Make any final changes which you want reflected on all your Linux servers. For example, you may want to display a standard greeting to be shown to all users when they log onto Linux. In this case put that message in file /etc/motd now. You may also want to create a personal userid for yourself (or your peers or the staff in Operations) to allow you to gain access to the server at a future date.
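
Before moving on, it may be worth confirming that /etc/fstab really does leave /dev/dasdc1 alone. The check below is a minimal sketch; the function name fstab_has_device is hypothetical, and it treats any record whose first non-blank character is "#" as a comment.

```shell
#!/bin/sh
# Sketch: exit with status 0 if the fstab-format file contains an
# active (uncommented) record whose first field is the given device,
# and with status 1 otherwise.
fstab_has_device() {
    fstab_file=$1
    device=$2
    awk -v dev="$device" '
        /^[ \t]*#/ { next }          # skip comment records
        $1 == dev  { found = 1 }     # active record for the device
        END        { exit found ? 0 : 1 }
    ' "$fstab_file"
}

# Typical use on the base server:
#   if fstab_has_device /etc/fstab /dev/dasdc1; then
#       echo "/dev/dasdc1 must not be mounted from /etc/fstab" >&2
#   fi
```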

Servicing the Base Server

Service the base server as normal.

Conversion to Basevol/Guestvol

At this point in these instructions you have created a standard Linux server, albeit with a few minor changes to support running Linux under z/VM. In the next steps you will alter this standard configuration to support running in a basevol/guestvol mode.

Create Supporting Boot Script

SUSE provides several scripts used at boot-time to prepare the server for operation. For the purposes of a basevol/guestvol implementation one additional boot script, shown below, is required, to be called boot.guestvol:

#!/usr/bin/regina
/*---------------------------------------------------------------------+
| Check if we're running on the so-called "model" Linux server or if   |
| we're running on a Linux server using the model's disks R/O.  If we  |
| are running on the model then we completely skip this tool so all    |
| the "standard" scripts run unmodified.  If we're running on a userid |
| which has the 201 disk in R/O mode then this tool's logic is used to |
| set up the server correctly.                                         |
+---------------------------------------------------------------------*/
Trace Normal

Say ''
Say 'Basevol/Guestvol script begins... .'

/*---------------------------------------------------------------------+
| Is the 201 disk R/W?  If so then this must be the systems programmer |
| logging onto the Linux server which manages the "golden" materials.  |
| Otherwise this must be one of the many Linux servers who are using   |
| these shared materials in R/O mode.  Bring up the server accordingly |
+---------------------------------------------------------------------*/
save_rc = Popen( 'hcp QUERY VIRTUAL 0201', rec. )
Select
When Word( rec.1, 5 ) = 'R/W'
Then Do
     Say 'Welcome to the Linux disk model userid!'
     Say 'All directories will be in R/W mode.'
     End
When Word( rec.1, 5 ) = 'R/O'
Then Do
     Say 'This server is running with shared disk support.'
     Say 'Many directories will be in R/O mode.'
     Call Do_It
     End
Otherwise Do
          Say 'Unexpected results checking disk status.'
          Say 'By default all directories will be in R/W mode.'
          End
End

Say 'Basevol/Guestvol script ends.'
Say ''

Exit 0


Do_It: /*--------------------------------------------------------------+
| Do what's needed to prepare and mount the various directories in R/O |
| or R/W mode                                                          |
+---------------------------------------------------------------------*/
Procedure

Say 'Forcing R/O mount of root file systems... .'
'mount -n -o remount,ro /' /* no update /etc/mtab, option remount R/O */

Say 'Forcing a filesystem check on R/W guest volume... .'
'e2fsck /dev/dasdc1'

Say 'Mounting R/W guest volume... .'
'mount -w -t ext3 -n /dev/dasdc1 /guestvol' /* R/W, type EXT3, no update /etc/mtab */

Say 'Temporarily enabling directory /etc from guest volume... .'
'mount -w -t ext3 --bind /guestvol/etc  /etc'

Say 'Discarding obsolete mtab file... .'
'rm -f /etc/mtab*'

Say 'Creating proper /etc/mtab file... .'
'mount -f /' /* -f is "fake", to add entries for devices mounted earlier with -n */

Say 'Overlapping R/O mount points with R/W materials... .'
'mount -w -t ext3 --bind /guestvol/dev       /dev'

Say 'Remounting /dev/pts... .'
'mount -t devpts devpts /dev/pts'

Say 'Continuing the overlapping mounts... .'
'mount -w -t ext3 --bind /guestvol/home      /home'
'mount -w -t ext3 --bind /guestvol/root      /root'
'mount -w -t ext3 --bind /guestvol/srv       /srv'
'mount -w -t ext3 --bind /guestvol/tmp       /tmp'
'mount -w -t ext3 --bind /guestvol/opt/local /opt/local'
'mount -w -t ext3 --bind /guestvol/usr/local /usr/local'

/* required, in this order, to force /var/lib/rpm to be R/O */
'mount -r -t ext3 --bind /var                 /basevol/var'
'mount -w -t ext3 --bind /guestvol/var        /var'
'mount -r -t ext3 --bind /basevol/var/lib/rpm /var/lib/rpm'

Return

This locally-written boot script must be made executable and visible to its owner, root, alone:

chmod u+rwx  /etc/rc.d/boot.guestvol
chmod go-rwx /etc/rc.d/boot.guestvol
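
The decision at the top of boot.guestvol rests on the fifth blank-delimited word of the QUERY VIRTUAL response. For readers less familiar with Rexx, the same test can be sketched in shell. This is an illustration only, not a replacement for the script: the function name disk_mode is hypothetical, and like the Rexx code it assumes the access mode is word 5 of the first response line.

```shell
#!/bin/sh
# Sketch: classify the response of "hcp QUERY VIRTUAL 0201".
# Assumption (taken from the Rexx script): the fifth word of the first
# response line is the access mode, R/W or R/O.
disk_mode() {
    word5=$(printf '%s\n' "$1" | awk 'NR == 1 { print $5 }')
    case "$word5" in
        R/W) echo rw ;;        # model server: leave everything R/W
        R/O) echo ro ;;        # guest server: set up the bind mounts
        *)   echo unknown ;;   # unexpected response
    esac
}

# Typical use on a Linux guest with the cpint package installed:
#   disk_mode "$(hcp QUERY VIRTUAL 0201)"
```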

Enabling Basevol/Guestvol

Above you created a boot-time script. The following steps will enable its execution.

  • Issue the following command to create a symbolic link in /etc/init.d/boot.d pointing to your script:
    ln -s /etc/rc.d/boot.guestvol /etc/init.d/boot.d/S03boot.guestvol
    
  • For clarity, rename one SUSE-provided boot script to make the run-order unambiguous:
    mv /etc/rc.d/boot.d/S03boot.rootfsck /etc/rc.d/boot.d/S04boot.rootfsck
    
  • Create a useful tool to copy the read/write directories from the boot device. Called /etc/rc.d/make-guestvol, the script should appear similar to:
    #!/usr/bin/regina
    Trace Normal
    
    Say 'Make-GuestVol begins... .'
    
    'umount /guestvol'
    
    'mke2fs -j -b 4096 /dev/dasdc1'
    If RC <> 0 Then Exit RC
    
    'mount /dev/dasdc1 /guestvol'
    If RC <> 0 Then Exit RC
    
    dirs = '/dev /etc /home /var /root /srv /tmp /opt/local /usr/local'
    
    Do i = 1 To Words( dirs )
       dir = Subword( dirs, i, 1 )
       Say 'Copying 'dir' to guest volume... .'
       'tar -clpSf - 'dir' | (cd /guestvol ; tar -xpSf - )'
       If RC <> 0 Then Exit RC
    End i
    
    /*---------------------------------------------------------------------+
    | Discard from the copy of /etc/rc.d (that is, /guestvol/etc/rc.d) the |
    | SUSE-provided boot script boot.rootfsck. This is because when using  |
    | a R/O root file system there is no need for a file system check      |
    +---------------------------------------------------------------------*/
    'rm /guestvol/etc/rc.d/boot.d/S04boot.rootfsck'
    If RC <> 0 Then Exit RC
    
    Say 'Make-GuestVol ends.'
    
    Exit 0
    

    Ensure that only root can see and execute this tool:

    chmod u+rwx  /etc/rc.d/make-guestvol
    chmod go-rwx /etc/rc.d/make-guestvol
    
  • Recall that the objective is to keep the boot device in read-only mode with a small subset of all the directories read/write, located on a disk separate from the boot device. To create this small subset of directories issue the command:
    /etc/rc.d/make-guestvol
    
    The response is similar to:
    Make-GuestVol begins... .
    umount: /guestvol: not mounted
         6 *-* 'umount /guestvol'
           +++ RC=1 +++
    mke2fs 1.34 (25-Jul-2003)
    Filesystem label=
    OS type: Linux
    Block size=4096 (log=2)
    Fragment size=4096 (log=2)
    54016 inodes, 53997 blocks
    2699 blocks (5.00%) reserved for the super user
    First data block=0
    2 block groups
    32768 blocks per group, 32768 fragments per group
    27008 inodes per group
    Superblock backups stored on blocks:
            32768
    
    Writing inode tables: done
    Creating journal (4096 blocks): done
    Writing superblocks and filesystem accounting information: done
    
    This filesystem will be automatically checked every 38 mounts or
    180 days, whichever comes first.  Use tune2fs -c or -i to override.
    Copying /dev to guest volume... .
    tar: Removing leading `/' from member names
    tar: /dev/log: socket ignored
    Copying /etc to guest volume... .
    tar: Removing leading `/' from member names
    Copying /home to guest volume... .
    tar: Removing leading `/' from member names
    Copying /var to guest volume... .
    tar: Removing leading `/' from member names
    tar: /var/lib/ntp/dev/log: socket ignored
    tar: /var/run/.resmgr_socket: socket ignored
    tar: /var/run/.nscd_socket: socket ignored
    tar: /var/spool/postfix/private/rewrite: socket ignored
    tar: /var/spool/postfix/private/bounce: socket ignored
    tar: /var/spool/postfix/private/defer: socket ignored
    tar: /var/spool/postfix/private/trace: socket ignored
    tar: /var/spool/postfix/private/verify: socket ignored
    tar: /var/spool/postfix/private/proxymap: socket ignored
    tar: /var/spool/postfix/private/smtp: socket ignored
    tar: /var/spool/postfix/private/relay: socket ignored
    tar: /var/spool/postfix/private/error: socket ignored
    tar: /var/spool/postfix/private/local: socket ignored
    tar: /var/spool/postfix/private/virtual: socket ignored
    tar: /var/spool/postfix/private/lmtp: socket ignored
    tar: /var/spool/postfix/private/anvil: socket ignored
    tar: /var/spool/postfix/private/maildrop: socket ignored
    tar: /var/spool/postfix/private/cyrus: socket ignored
    tar: /var/spool/postfix/private/uucp: socket ignored
    tar: /var/spool/postfix/private/ifmail: socket ignored
    tar: /var/spool/postfix/private/bsmtp: socket ignored
    tar: /var/spool/postfix/private/vscan: socket ignored
    tar: /var/spool/postfix/private/procmail: socket ignored
    tar: /var/spool/postfix/public/cleanup: socket ignored
    tar: /var/spool/postfix/public/flush: socket ignored
    tar: /var/spool/postfix/public/showq: socket ignored
    Copying /root to guest volume... .
    tar: Removing leading `/' from member names
    Copying /srv to guest volume... .
    tar: Removing leading `/' from member names
    Copying /tmp to guest volume... .
    tar: Removing leading `/' from member names
    Copying /opt/local to guest volume... .
    tar: Removing leading `/' from member names
    Copying /usr/local to guest volume... .
    tar: Removing leading `/' from member names
    Make-GuestVol ends.
    

    If you review the logic of the Rexx tool, shown above, you'll see that the tool:

    • Formats and then mounts the guest volume (minidisk 202, also known as /dev/dasdc1)
    • Copies the R/W directories from the base volume (minidisk 201, also known as /dev/dasdb1) to the guest volume
    • Discards the SUSE-provided script which on a conventional system does a file system check. This check is never needed when the Linux guest ID never has the root file system in R/W mode.
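
After make-guestvol finishes, a quick check that every directory arrived on the guest volume can be sketched as follows. The function name missing_guest_dirs is hypothetical; the directory list is the one used by make-guestvol, and the guest volume is assumed to be mounted at the path given (default /guestvol).

```shell
#!/bin/sh
# Sketch: print each directory from make-guestvol's list that is
# missing beneath the given mount point. No output means every
# directory was copied.
missing_guest_dirs() {
    root=${1:-/guestvol}
    for dir in /dev /etc /home /var /root /srv /tmp /opt/local /usr/local; do
        [ -d "$root$dir" ] || echo "$dir"
    done
}

# Typical use while /dev/dasdc1 is still mounted at /guestvol:
#   missing_guest_dirs
```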

At this point you have completed the installation of Linux on the base server, LINXBASE, which owns the golden base volume. The guest server(s) can now be created and can utilize the DASD in shared R/O mode.

Implementation of a Guest Server

Now that the base volume has been created it is possible to exploit it. To do so, create a Linux server which will utilize the read-only boot device, as well as its own copy of the necessary read/write materials. The steps to follow are detailed below. You must perform each of these steps every time you create a new Linux server:

  • Create or update the server so that the VM directory entry matches the supported configuration:
    USER LNXGUEST password 64M 256M G
    * 64M is acceptable to run SLES 9.
    ACCOUNT acntcode distcode
    MACHINE ESA
    IPL CMS
    CONSOLE 009 3215 T OPERATOR
    SPOOL 00C 2540 READER *
    SPOOL 00D 2540 PUNCH A
    SPOOL 00E 1403 A
    LINK MAINT 190 190 RR
    LINK MAINT 19E 19E RR
    LINK MAINT 19D 19D RR
    NICDEF C00 TYPE QDIO DEVICES 3 LAN SYSTEM INTRANET
    MDISK 191 3390 start-cyl 1 MR
    * dsk 200 Swap          409600 512-byte blocks = 200M
    MDISK 200 FB-512 V-DISK 409600  MR
    * Link to "golden" Linux base-volume MUST be in R/O mode ALWAYS
    LINK LINXBASE 201 201 RR
    MDISK 202 3390 start-cyl 150 MR
    

    The server will find the kernel and the majority of its applications on the read-only 201 disk. Since these materials are never updated by the guest server a large number of Linux servers can share the materials on this disk, saving DASD for more valuable uses.

    The size of the 202 disk must match the size of the 202 disk which was used to create the model server. In the example shown here the 202 disk is the minimum size, 150 cylinders. This is enough space for the basic SLES materials. Any additional space for user home directories is above and beyond this value. You should also understand that the size of this device can be increased, at a later date, if needed, by following procedures documented on the LinuxVM.org web site.

  • Log onto the Linux guest server. On the 191 A-disk, create file PROFILE EXEC to IPL the Linux 201 boot-disk.
  • Recall that during the post-installation step the tool make-guestvol copied several directories to the base server's 202 disk. A copy of this disk will be created on the guest server's 202 disk now. Use the CP LINK command to gain access to the model's 202 disk in R/O mode:
    CP LINK LINXBASE 202 1202 RR
    
    Use DDR to copy the model's 202 disk to the new server's 202 disk:
    DDR
    SYSPRINT CONS
    INPUT 1202 DASD
    OUTPUT 202 DASD
    COPY ALL
    
    Detach the model's 202 disk when done:
    CP DETACH 1202
    
  • Run the PROFILE EXEC which will start the new Linux server for the first time.
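
The virtual-disk swap statement in the directory entry above is sized in 512-byte blocks, so one megabyte corresponds to 2048 blocks. If a different swap size is wanted, the statement can be generated with a sketch like this one; the function name vdisk_statement is hypothetical and the output is simply text to paste into the directory entry.

```shell
#!/bin/sh
# Sketch: print the directory statement for a virtual-disk swap device
# at address 200. A V-DISK is sized in 512-byte blocks, so each
# megabyte is 2048 blocks (200M = 409600 blocks).
vdisk_statement() {
    megabytes=$1
    echo "MDISK 200 FB-512 V-DISK $(( megabytes * 2048 )) MR"
}

# Typical use:
#   vdisk_statement 200
```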

At this point you're booting a functionally identical copy of the model base-volume server. All the attributes of the model server are still in effect. For example, the IP address and host-name used by the base server are still as configured when Linux was first installed. For this reason the base-volume server (LINXBASE) cannot run simultaneously with the guest-volume server (LNXGUEST).

Final Customizations

The files which Linux uses to uniquely identify this server are located on R/W disk space, and in particular in directory /etc. In the next few steps you'll perform the final customization for the guest server, by changing these files.

  • Allocate an IP address for the new server. This address must be on the same Virtual Switch as the original base-server. (After the server is completely operational you can change the IP address, or even the Virtual Switch used, if you see fit.)
  • Telnet to the IP address or DNS name of the base server. For example: linxbase.ca.com
  • Logon as root. Specify the logon password for root which you used when you first installed Linux onto LINXBASE.
  • Start YaST. Configure the IP address and host-name for the new Linux server. Exit from YaST.
  • Shut down Linux using the reboot option:
    reboot
    
  • Telnet to the guest Linux server using its proper name. For example: lnxguest.ca.com
  • Logon as root once more. Change root's password:
    passwd
    
    Give this password to the owner of the server.

Above are the minimal final customizations. However since directories such as /etc are read/write and since virtually all the customization of Linux is controlled by files in /etc, you can, if need be, make each server completely unique.

Note: Should you find a configuration file (or virtually any other file) which resides in a directory on the R/O shared disk space, you can use a Linux symbolic link to redirect that file from its current location to another R/W directory, such as /etc.
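
The redirection described in the note can be sketched as a small shell function. The name relocate_with_symlink and the example path are hypothetical. The sketch assumes the file's original directory is writable when it runs, which for a directory on the shared disk means running it on the base server while the 201 disk is R/W; each guest server then needs its own copy of the file at the link target.

```shell
#!/bin/sh
# Sketch: move a file into a read/write directory and leave a symbolic
# link behind at its original location.
# Assumption: the original directory is writable when this runs.
relocate_with_symlink() {
    src=$1          # current location of the file
    rw_dir=$2       # read/write directory that will hold it, e.g. /etc
    target=$rw_dir/$(basename "$src")
    mv "$src" "$target" && ln -s "$target" "$src"
}

# Typical use (the path is illustrative only):
#   relocate_with_symlink /opt/example/app.conf /etc
```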

 
