Tux on VM

Last updated on: Sunday, July 06, 2008

Linux for Big Iron

LVM Volume Group Descriptor Area (VGDA) Recovery

This information was originally posted to the Linux-390 mailing list on January 16, 2005, by Peter Abresch.

We ran into an interesting situation. We suddenly lost access to our DASD due to human error. This corrupted the ReiserFS file systems that we had defined under the Logical Volume Manager (LVM). After running reiserfsck, we still had corruption of the Volume Group Descriptor Area (VGDA). Our symptom was that /dev/linuxd01/opt and /dev/linuxd01/tmp were the same file system.

An excerpt from the SuSE LVM white paper available at http://www.novell.com/products/linuxenterpriseserver8/whitepapers/LVM.pdf states:
The volume group descriptor area (or VGDA) holds the metadata of the LVM configuration. It is stored at the beginning of each physical volume. It contains four parts: one PV descriptor, one VG descriptor, the LV descriptors and several PE descriptors. LE descriptors are derived from the PE ones at activation time. Automatic backups of the VGDA are stored in files in /etc/lvmconf/ (please see the commands vgcfgbackup/vgcfgrestore too). Take care to include these files in your regular (tape) backups as well.
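
The vgcfgbackup command mentioned in the quote can also be run by hand before risky maintenance. A minimal sketch, assuming the LVM1 tools shipped with SLES and the volume group name linuxd01 used throughout this document:

    # Write a fresh copy of the VGDA metadata to /etc/lvmconf
    # (the previous backup is rotated to linuxd01.conf.1.old, and so on)
    vgcfgbackup linuxd01
    ls -l /etc/lvmconf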

LVM backs up the configuration automatically whenever the LVM configuration is modified. The corruption was somewhere on the physical volumes. The goal was to recover the VGDA without losing any data residing on the logical volumes (LVs). Here is what I did:
  1. Identify that corruption exists. As stated before, our symptom was that /dev/linuxd01/opt and /dev/linuxd01/tmp were the same file system. For the rest of this document, assume linuxd01 is the volume group name. You can substitute your own volume group name as appropriate.

    ls -l /dev/linuxd01
    crw-r-----    1 root     disk     109,   2 2005-01-16 15:49 group
    brw-rw----    1 root     disk      58,   1 2005-01-16 15:49 home
    brw-rw----    1 root     disk      58,   2 2005-01-16 15:49 local
    brw-rw----    1 root     disk      58,   0 2005-01-16 15:49 opt
    brw-rw----    1 root     disk      58,   0 2005-01-16 15:49 tmp
    brw-rw----    1 root     disk      58,   3 2005-01-16 15:49 var
    
    As you can see, opt and tmp shared the same minor number of 0. Two distinct logical volumes cannot have the same device number, so one of them is wrong.
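
    Duplicate device numbers can also be spotted mechanically; a minimal sketch (the awk field positions assume the ls -l layout shown above):

    # Print any major,minor pair that appears on more than one device node
    ls -l /dev/linuxd01 | awk '/^b/ {print $5 $6}' | sort | uniq -d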

  2. Verify which one is incorrect. We issued the following command to list the LVM configuration backups:

    ls -l /etc/lvmconf
    -rw-r----- 1 root root 183168 2005-01-16 16:20 linuxd01.conf
    -rw-r----- 1 root root 183168 2004-11-30 16:37 linuxd01.conf.1.old
    -rw-r----- 1 root root 182368 2004-11-28 09:52 linuxd01.conf.2.old
    -rw-r----- 1 root root 183168 2004-11-27 09:07 linuxd01.conf.3.old
    -rw-r----- 1 root root 182368 2004-07-29 12:33 linuxd01.conf.4.old
    -rw-r----- 1 root root 180828 2004-07-29 12:19 linuxd01.conf.5.old
    -rw-r----- 1 root root 176312 2004-07-29 12:16 linuxd01.conf.6.old
    -rw-r----- 1 root root 172604 2004-07-28 10:34 linuxd01.conf.7.old
    -rw-r----- 1 root root 158664 2004-07-22 10:42 linuxd01.conf.8.old
    -rw-r----- 1 root root 150580 2004-07-22 10:13 linuxd01.conf.9.old
    
    linuxd01.conf is the most recent and the one actually in use.
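
    If /etc/lvmconf holds backups for more than one volume group, the current file for a given group can be picked out by modification time; a minimal sketch:

    # The most recently written backup is the one LVM keeps current
    ls -t /etc/lvmconf/linuxd01.conf* | head -1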

  3. List the most recent backup and verify the information, paying particular attention to the "Block device" line in each logical volume section.

    vgcfgrestore -f /etc/lvmconf/linuxd01.conf -n linuxd01 -ll
    vgcfgrestore -- this is a backup of volume group "linuxd01"
    --- Volume group ---
    VG Name               linuxd01
    VG Access             read/write
    VG Status             NOT available/resizable
    VG #                  2
    MAX LV                256
    Cur LV                5
    Open LV               0
    MAX LV Size           255.99 GB
    Max PV                256
    Cur PV                3
    Act PV                3
    VG Size               6.86 GB
    PE Size               4 MB
    Total PE              1755
    Alloc PE / Size       1572 / 6.14 GB
    Free  PE / Size       183 / 732 MB
    VG UUID               ehREE0-MrMY-QyvP-PezE-RBbz-FDk8-noY8D4
    
    --- Logical volume ---
    LV Name                /dev/linuxd01/opt
    VG Name                linuxd01
    LV Write Access        read/write
    LV Status              available
    LV #                   1
    # open                 0
    LV Size                1.97 GB
    Current LE             505
    Allocated LE           505
    Allocation             next free
    Read ahead sectors     1024
    Block device           58:0
    
    --- Logical volume ---
    LV Name                /dev/linuxd01/home
    VG Name                linuxd01
    LV Write Access        read/write
    LV Status              available
    LV #                   2
    # open                 0
    LV Size                2.20 GB
    Current LE             564
    Allocated LE           564
    Allocation             next free
    Read ahead sectors     1024
    Block device           58:1
    
    --- Logical volume ---
    LV Name                /dev/linuxd01/local
    VG Name                linuxd01
    LV Write Access        read/write
    LV Status              available
    LV #                   3
    # open                 0
    LV Size                300 MB
    Current LE             75
    Allocated LE           75
    Allocation             next free
    Read ahead sectors     1024
    Block device           58:2
    
    --- Logical volume ---
    LV Name                /dev/linuxd01/var
    VG Name                linuxd01
    LV Write Access        read/write
    LV Status              available
    LV #                   4
    # open                 0
    LV Size                1.10 GB
    Current LE             282
    Allocated LE           282
    Allocation             next free
    Read ahead sectors     1024
    Block device           58:3
    
    --- Logical volume ---
    LV Name                /dev/linuxd01/tmp
    VG Name                linuxd01
    LV Write Access        read/write
    LV Status              available
    LV #                   5
    # open                 0
    LV Size                584 MB
    Current LE             146
    Allocated LE           146
    Allocation             next free
    Read ahead sectors     1024
    Block device           58:4
    
    
    --- Physical volume ---
    PV Name               /dev/dasdd1
    VG Name               linuxd01
    PV Size               2.29 GB [4807968 secs] / NOT usable 4.19 MB [LVM: 130 KB]
    PV#                   1
    PV Status             available
    Allocatable           yes (but full)
    Cur LV                2
    PE Size (KByte)       4096
    Total PE              585
    Free PE               0
    Allocated PE          585
    PV UUID               L3fZI4-8owE-rz5O-CKUq-FcEC-6BvK-9SWH5K
    
    --- Physical volume ---
    PV Name               /dev/dasde1
    VG Name               linuxd01
    PV Size               2.29 GB [4807968 secs] / NOT usable 4.19 MB [LVM: 130 KB]
    PV#                   2
    PV Status             available
    Allocatable           yes (but full)
    Cur LV                3
    PE Size (KByte)       4096
    Total PE              585
    Free PE               0
    Allocated PE          585
    PV UUID               v7cc4b-5vxP-yUFZ-xTyY-UjBw-7se4-62UDkn
    
    --- Physical volume ---
    PV Name               /dev/dasdc1
    VG Name               linuxd01
    PV Size               2.29 GB [4807968 secs] / NOT usable 4.19 MB [LVM: 130 KB]
    PV#                   3
    PV Status             available
    Allocatable           yes
    Cur LV                2
    PE Size (KByte)       4096
    Total PE              585
    Free PE               183
    Allocated PE          402
    PV UUID               1yy4v8-0XDU-3uJ8-ssmS-7kaJ-MBDx-gR3FBr
    
    As you can see, the backup records /dev/linuxd01/opt as block device 58:0 and /dev/linuxd01/tmp as block device 58:4. This differs from what was listed in step 1:
    brw-rw----    1 root     disk      58,   0 2005-01-16 15:49 opt
    brw-rw----    1 root     disk      58,   0 2005-01-16 15:49 tmp
    
    /dev/linuxd01/tmp should be 58:4. We decided to move forward with the VGDA restore of the physical volume that contained /dev/linuxd01/tmp.
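
    When the full listing is long, the lines that matter for this comparison can be pulled out directly; a minimal sketch:

    # Show only each LV's name and its recorded device number
    vgcfgrestore -f /etc/lvmconf/linuxd01.conf -n linuxd01 -ll | grep -E 'LV Name|Block device'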

  4. Issue the following command to identify the physical volumes (PVs) that make up the logical volume.

    lvdisplay -v /dev/linuxd01/tmp

    Output similar to the following should be displayed:
    .
    .
    --- Distribution of logical volume on 1 physical volume ---
    PV Name         PE on PV   reads   writes
    /dev/dasdc1     146        9755    2486
    .
    .
    .
    
    This identifies that logical volume /dev/linuxd01/tmp resides only on /dev/dasdc1. The VGDA on physical volume /dev/dasdc1 will need to be restored.
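
    To map every logical volume in the group to its physical volumes in one pass, the same command can be looped; a minimal sketch (the -A 4 context count assumes the LVM1 lvdisplay layout shown above):

    for lv in /dev/linuxd01/*; do
        [ -b "$lv" ] || continue    # skip the 'group' character device
        echo "== $lv =="
        # Print the distribution table that follows this header
        lvdisplay -v "$lv" | grep -A 4 'Distribution of logical volume'
    done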

  5. Identify the UCB addresses (device numbers).

    cat /proc/dasd/devices | grep dasdc

    Output similar to the following should be displayed:
    232d(ECKD) at ( 94: 8) is dasdc : active at blocksize: 4096, 601020 blocks, 2347 MB
    
    This reveals that the UCB address we need is 232d. We also noted the root device from the parameters line in our /etc/zipl.conf, parameters="dasd=232b-232e,232a root=/dev/dasda1", which puts the root file system (/dev/dasda1) on UCB 232b.

    Be aware that what is in /etc/zipl.conf may not reflect the parameters that were passed to the Linux system for the current IPL. You can verify whether they are the same by checking cat /proc/cmdline.
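
    A quick side-by-side check; a minimal sketch:

    # What the running kernel was actually booted with
    cat /proc/cmdline
    # What zipl.conf would pass on the next IPL
    grep parameters /etc/zipl.conf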

  6. We chose to correct this problem under our recovery Linux system. Depending on the logical volume needing recovery, this might not be necessary. We could have (see the sketch after this list)
    • gone into single user mode,
    • unmounted all the file systems under volume group linuxd01, and then
    • deactivated the linuxd01 volume group using vgchange -an linuxd01.
    However, we chose to err on the side of caution: we shut down our Linux at this point and booted our emergency recovery Linux.
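
    A minimal sketch of that in-place alternative; the mount points are assumptions (in particular, mapping the local LV to /usr/local is a guess):

    # The quiesce-in-place route we did not take
    telinit 1                                   # drop to single-user mode
    umount /opt /tmp /home /var /usr/local      # unmount every file system in linuxd01 (mount points assumed)
    vgchange -an linuxd01                       # deactivate the volume group so its VGDA can be restored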

  7. Log on to your emergency Linux and issue the following commands to gain access to your DASD:
    modprobe dasd_mod dasd=2320-232f dasd_disciplines=dasd_eckd_mod
    echo "add device range=232b" >> /proc/dasd/devices
    echo "add device range=232d" >> /proc/dasd/devices
    
    If you look carefully, you will see that the modprobe command doesn't exactly match the parameters found in /etc/zipl.conf. This has consequences that show up a couple of steps later.
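
    Before going further, check how the recovery kernel actually named the devices; a minimal sketch:

    # See which /dev/dasd* name each UCB received under the recovery system;
    # as step 9 shows, 232b came up as dasda and 232d as dasdb here
    cat /proc/dasd/devices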

  8. Mount your original root file system. This is necessary because the recovery Linux does not contain any LVM commands, and the VGDA backups are on your root file system anyway. We issued the following commands to gain access to the LVM commands:
    mount /dev/dasda1 /mnt
    export PATH=/mnt/sbin:$PATH
    cp /mnt/lib/liblvm-10.so.1 /lib
    vgscan
    
    The vgscan command is a simple test that should discover the volume group, even though only one of its physical volumes is attached, as follows:
    vgscan -- reading all physical volumes (this may take a while...)
    vgscan -- found inactive volume group "linuxd01"
    vgscan -- "/etc/lvmtab" and "/etc/lvmtab.d" successfully created
    vgscan -- WARNING: This program does not do a VGDA backup of your volume group
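
    If vgscan cannot be found or fails to start, the borrowed toolchain can be sanity-checked first; a minimal sketch (assuming the recovery system provides ldd):

    which vgscan            # should resolve to /mnt/sbin/vgscan via the extended PATH
    ldd /mnt/sbin/vgscan    # confirm liblvm-10.so.1 is now resolved from /lib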
    
  9. From the Linux recovery system, identify the old physical path and the new physical path. If the parameters passed to the DASD driver had been identical to those in /etc/zipl.conf, these would be the same. Since ours were not, remember that the old physical path is /dev/dasdc1 on UCB 232d, as identified in the previous steps. However, a cat /proc/dasd/devices under the Linux recovery system reveals that UCB 232d is now /dev/dasdb1. This is the new physical path. The restore can be performed using the following command:
    
    vgcfgrestore -f /mnt/etc/lvmconf/linuxd01.conf -o /dev/dasdc1 /dev/dasdb1

    Once vgcfgrestore has completed, the Linux recovery system can be shut down and your production Linux rebooted. You'll know rather quickly whether /opt and /tmp are correct. However, for peace of mind, you can confirm it:

    ls -l /dev/linuxd01
    crw-r-----    1 root     disk     109,   2 2005-01-16 15:49 group
    brw-rw----    1 root     disk      58,   1 2005-01-16 15:49 home
    brw-rw----    1 root     disk      58,   2 2005-01-16 15:49 local
    brw-rw----    1 root     disk      58,   0 2005-01-16 15:49 opt
    brw-rw----    1 root     disk      58,   4 2005-01-16 15:49 tmp
    brw-rw----    1 root     disk      58,   3 2005-01-16 15:49 var
    
  10. Drink beer; we deserve it. :)
 
