A Method For Deploying Software In a Multi-Architecture Distributed UNIX Environment: The Depot System

by John Flynn, flynnj@cs.fiu.edu

Preface:

I apologize for the bad formatting of this document, but it was converted hastily from a double-spaced StarOffice sxw file into HTML. I must thank Mozilla Composer for helping me quickly fix the wacked out horrid spacing the StarOffice HTML converter created without having to manually do it in vi. Someday, when I actually have time (GASP!!), I will format it neatly. For now, though, I'm afraid you must suffer. I would have written it in HTML in the first place, but this started as an independent study paper, written for credit, and it needed to be on paper first.


Introduction

As UNIX system administrators, we frequently find that we have to install numerous software packages, both open-source and proprietary, on the machines we run. When the number of machines is small (for instance, under five), it is relatively simple to install the package individually on each computer and keep it maintained that way. When the number of machines grows from there, we might find it easier to set up one system as a server, which hosts the software packages, exported via NFS[i]. This works rather well, until the number of machines grows again, and we introduce machines of a different architecture, and the server starts becoming bogged down, or the users complain that they have to type /home/software/pine-4.12/i386-linux/bin/pine just to run pine, or the server’s disk fails, and no one can use any of the software, and….

Wait. Maybe there is a better solution. Why not keep the software distributed among several of the hosts, and use a program that can somehow choose, depending on what architecture the user is currently using, which server to find the appropriate files and binaries on? This paper describes a system for doing this, known as the Depot System.

The Depot System allows a user to simply type the name of the binary he wishes to run, and will automatically mount the appropriate directory for that architecture on the host the software is installed on. The system is transparent; the user doesn’t have to know it’s there, though some knowledge of the system helps if one uses it daily. The system as described here was originally implemented at Carnegie Mellon University[ii], with small tweaks added by the System Administration Group at the Florida International University School of Computer Science.

This paper is divided into five sections. This is what will be covered, in brief:

  1. Overview of the Depot System’s implementation: A basic how-to on implementing the system from scratch.

  2. Software deployment using Depot: How to deploy other software packages into the Depot System.

  3. User guide to the Depot System: Where to find source, documentation, executables, and so on.

  4. Future improvements: Possible alternatives to the directory structure implemented here.

  5. Other Depot-like Systems in use elsewhere: Short descriptions and links to other software repositories.


Audience

This paper is aimed at people who have basic UNIX system administration experience. This includes knowledge of filesystem semantics, NFS, paths, and so on. Experience with the BSD Automounter is also recommended.


Overview of the Depot System’s Implementation

In this section of the paper, a basic how-to on implementing the Depot System will be presented. Some real-life examples from the Computer Science department at Florida International University (FIU-SCS) will be brought up as needed.


Goals

When designing a software repository such as Depot, several goals need to be kept in mind:

  1. Transparency: Users should be able to run software without knowing where, or on which host, it is installed.

  2. Multiple architecture support: A single installation of a package should serve every architecture in use.

  3. Distribution: Software should be spread among several hosts, so that no single server becomes a bottleneck or a single point of failure.

  4. Convenient maintenance: Installing, upgrading, and relocating packages should be simple for the administrator.

Depot System’s Design Paradigm

The Depot System can be thought of as consisting of two parts: a software component and a design component. The software component is the BSD Automounter[iii], a tool that allows filesystems to be mounted automatically based on instructions given to it in mount maps. The design component is the way the automount maps are set up and the directory structures used on the server and client machines.

The BSD Automounter

The BSD Automounter, or amd, is a tool that can automatically mount a directory when needed, and automatically unmount it later when it is no longer being used. Its operation is simple: Several directories, like /home, are given to the amd program as automount points, along with an automount map. The automounter attaches itself as an NFS server to the given directories, and intercepts any access through those directories. If the path being accessed matches an entry in the automount map, the automounter will mount the corresponding directory via NFS in a predetermined place, and create a symbolic link in the automount point to that mounted directory. Here is a concrete example to explain this better:

Example 1: Implementation of a simple automount map

The automounter is run with the map “amd.home” on the automount point “/home”. The amd.home map contains the entry below. Note that this entry is all one line and the backslashes are used because it can’t all fit on one line here.

vixen \ host!=vixen;opts:=rw,retry=10,nosuid;type:=nfs;rhost:=vixen.cs.fiu.edu.;rfs:=/disk/75 \

host==vixen;type:=link;fs:=/disk/75

The user then types “cd /home/vixen/images”. Since the directory this user wants is in the automount point /home, the automounter will intercept this request, and search for an entry named “vixen” in the automount map. It will find the entry listed above, which tells it to mount via NFS the directory “/disk/75” on vixen and create a symbolic link pointing to the newly mounted directory. “/home/vixen/images” will now contain the files in the real directory “/disk/75/images” on the host vixen.

When dissected, the line above means:

vixen - The name of the directory that is accessed through /home.
host!=vixen - A selector: use this location when the local host is not vixen itself. (The second location, with host==vixen, makes a plain symbolic link to /disk/75 when amd is running on vixen, avoiding an NFS mount of a local disk.)
opts:=rw,… - The NFS mount options to use when mounting.
type:=nfs - This specifies that we want an NFS mount.
rhost:=vixen… - The FQDN[1] of the NFS server.
rfs:=/disk/75 - The remote directory on the NFS server to be mounted.

Link Maps

In addition to having map entries that mount filesystems, you can also have map entries that create virtual symbolic links inside an automount point. For instance, the entry:

stuff type:=link;fs:=/home/vixen/stuff

in the home map will result in “/home/stuff” becoming a symbolic link to /home/vixen/stuff. The symbolic link is created automatically on demand and is removed after a period of inactivity, just like actual filesystem mounts.

All NFS mounts performed by the automounter are in a specified directory, such as “/.automount/vixen.cs.fiu.edu./disk/75”. These directories are automatically created when the filesystem is mounted, and torn down again when the mount expires after being unused for a specified timeout period.

The distribution of automount maps to the client can either be done through files that are distributed periodically to the machines from a central server, or through NIS. NIS is the preferred method, since it allows for quick updates to all the maps without the risk of any clients getting out of synchronization with the rest.

Of course, this is a very basic description of how the BSD Automounter works; more details can be found in its documentation.


Implementing the Depot System with amd

The Depot system consists of two primary automount maps: the home map and the depot map. Other maps can exist in an automount setup, of course, but these are the two maps that implement the actual software repository.


The Home Map

The first step in creating a software repository is setting aside disk space for the software that is to be installed. The most convenient way to do this is to create a single automount map, the home map, which maps simple paths such as “/home/vixen” or “/home/hpdrc-users-3” to actual disks. The full name of this map is usually amd.home, and it doesn’t have to be specific to the Depot System; it is a handy way to make storage space available for any purpose, like users’ home directories. The automount point for the home map is, naturally, /home.

The entry displayed in Example 1 above shows what an entry in the home map might look like. This map can either be generated automatically via a script, or maintained manually. At FIU-SCS, it is built hourly from the disks file, which is a database of every NFS exported disk in the system. The same database is used for generating backup configurations and for keeping track of available disk space.
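The details of the FIU-SCS disks file are not given in this paper, so the input format below is invented for illustration; a minimal generator for home-map entries might look like this:

```shell
# gen_amd_home: emit amd.home map entries from a simple "disks"
# database.  The input format here is hypothetical: one disk per
# line, as "<mapname> <fully-qualified-host> <remote-partition>",
# e.g.:  vixen vixen.cs.fiu.edu /disk/75
gen_amd_home() {
    awk 'NF == 0 || $1 ~ /^#/ { next }
    {
        host = $2
        short = host
        sub(/\..*/, "", short)     # strip the domain for the selectors
        printf "%s \\\n", $1
        printf "  host!=%s;opts:=rw,retry=10,nosuid;type:=nfs;rhost:=%s.;rfs:=%s \\\n", short, host, $3
        printf "  host==%s;type:=link;fs:=%s\n", short, $3
    }' "$1"
}
```

Each input line yields an entry in the same shape as Example 1: an NFS location for remote hosts and a plain link on the server itself.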

Once the home map is set up, and disks can be accessed through "/home/mapname", it is time to set up the Depot Map, which contains the actual entries that make the system work.

The Depot Map

The Depot Map is the heart of the Depot System. Its automount point is /netdepot. It specifies where and how the software is installed. This map is divided into two primary sections, and each section has a corresponding set of directories on the fileserver where the actual software is kept.

Depot Dirs Section

The first section in the Depot Map describes depot directories on the file servers. These Depot directories, usually located at the root of a physical filesystem, are where each installed package's directory tree resides. The goal here is to assign a number to each of these directories, and have the path "/depot/dirs/n", where n is that number, point to it. For example:

/depot/dirs/1 -> /home/vixen/depot
/depot/dirs/2 -> /home/fossil/depot

and so on. As you can see, this map builds on the home map, which has already described where a directory like "/home/vixen" lives. This is advantageous because we don't have to use NFS again at this point; we can simply use amd's ability to create dynamic links to locations that are already mounted via NFS. All Depot Dirs are owned by a designated user, unless there is a very good reason why a particular package shouldn’t be. This user should not be root, nor should it be the regular user account of the system administrator. Having everything owned by this user makes maintenance much more convenient.

Here is an example of what the actual Depot Dirs map looks like:

Example 2:

Actual first few lines of the FIU-SCS amd.depot's DepotDirs Section:

dirs type:=auto;cache:=none;fs:=${map};pref:=${key}/
dirs/1 type:=link;fs:=/home/mongoose2/depot
dirs/2 type:=link;fs:=/home/dizzy/compsci/depot
dirs/3 type:=link;fs:=/home/dizzy2/depot
dirs/4 type:=link;fs:=/home/n2/depot

The first entry, "dirs", creates a new sub-map under the existing "netdepot" automount point (more on that later). The entries underneath assign the depot directories on different machines (in this case, mongoose, dizzy, and n2) to DepotDirs numbers. To access the Depot directory on /home/mongoose2, one would simply access /depot/dirs/1.


The aim here is to make it convenient to maintain the software repository; when a software package is to be installed, all that has to be done is to choose a /depot/dirs with enough free space, set up the map for that software package and install. This brings us to the second section in the Depot Map: Software maps.

Software Maps

Every software package has a different preferred directory structure. When installing a software package, there are generally certain directories that contain architecture-dependent files, such as binaries, and other directories that contain architecture-independent files, like source and documentation. Software maps allow one to tell Depot which parts are architecture dependent and which are independent, and enable the system to automatically choose the correct dependent files when requested by the user.


The best way to describe software maps is by example. Here is an example of two different types of software maps; one for a software package installed for only one architecture, and one for a multiple architecture installation of Pine.

Example 3: Software maps

The simplest kind of software map is for a package installed on only one architecture. It only involves a few lines, as follows:

#
# ocrshop
#
OCRShop type:=link;fs:=/depot/dirs/11/${key}

To keep packages separate, comments are inserted in the map with the name of the package. Here is the map entry, broken down into its components:

OCRShop – The name of the map. The user will type “cd /depot/OCRShop” to get there.

type:=link – This map generates a symbolic link and nothing more.

fs:=/depot/dirs/11/${key} – The location the symbolic link points to. OCRShop will be stored in “/depot/dirs/11/OCRShop”. The “${key}” variable simply tells amd to substitute the key of the map entry (here, OCRShop) in that location.

The more common software map is one that is set up for multiple architectures. The actual configuration of the map will vary between each package, but the same idea is used for each one. Here is what a multiple architecture map for Pine looks like:

#
# pine - ver 4.x
#
pine-4 type:=auto;cache:=none;fs:=${map};pref:=${key}/
pine-4/share type:=link;fs:=/depot/dirs/28/${key}
pine-4/bin type:=link;fs:=/depot/dirs/28/${key/}/${arch}-${os}/${/key}
pine-4/lib type:=link;fs:=/depot/dirs/28/${key/}/share/${/key}
pine-4/man type:=link;fs:=/depot/dirs/28/${key/}/share/${/key}
pine-4/src type:=link;fs:=/depot/dirs/28/${key/}/share/${/key}
pine-4/libexec type:=link;fs:=/depot/dirs/28/${key/}/${arch}-${os}/${/key}

As with the DepotDirs map, the first line creates a sub-automount map named pine-4, under which links can be created. Pine 4.xx, when installed, requires architecture-dependent directories (bin, libexec) and architecture-independent directories (lib, src, man).

To implement the architecture-independent directories, a “share” directory is created in /depot/dirs/28/pine-4. This share directory then contains lib, man, and src.

To implement an architecture-dependent directory, like “bin”, a special modifier is used in the map entry. This modifier is “${arch}-${os}”. It is translated based on the architecture amd is running on, or more precisely, options given to amd on boot. For Linux, this might be “i386-linux”. For Solaris, it might be “sun4-sos56”. Either way, there exists a directory in /depot/dirs/28/pine-4 for each architecture, and inside those directories live the architecture-dependent “bin” and “libexec” directories.
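As a sketch, the physical tree backing a multi-architecture map like the one above can be created with a few mkdir calls (the function name and scratch paths here are hypothetical, standing in for the real /depot/dirs/28):

```shell
# make_pkg_tree: create the physical directory tree that a
# multi-architecture software map expects on the file server.
#   $1 = depot dir (e.g. /depot/dirs/28)
#   $2 = package name (e.g. pine-4)
#   remaining args = architecture strings (e.g. i386-linux sun4-sos56)
make_pkg_tree() {
    depotdir=$1; pkg=$2; shift 2
    # Architecture-independent pieces live under share/.
    mkdir -p "$depotdir/$pkg/share/lib" \
             "$depotdir/$pkg/share/man" \
             "$depotdir/$pkg/share/src"
    # One directory per architecture holds bin/ and libexec/.
    for arch in "$@"; do
        mkdir -p "$depotdir/$pkg/$arch/bin" \
                 "$depotdir/$pkg/$arch/libexec"
    done
}
```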

To show how this all fits together, let’s say a user tries to run “/depot/pine-4/bin/pine”. The /depot directory contains a link named “pine-4” that points to /netdepot/pine-4. Since /netdepot is the automount point that amd.depot controls, the request is translated as follows:

/netdepot/pine-4/bin/pine -> (amd.depot’s software map for pine-4)
/depot/dirs/28/pine-4/i386-linux/bin/pine -> (amd.depot’s depotdirs section)
/home/ferret/depot/pine-4/i386-linux/bin/pine -> (amd.home map)
(NFS) ferret.cs.fiu.edu:/disk/14/depot/pine-4/i386-linux/bin/pine

At which point we get to the real pine executable and can now execute it.

This example may seem rather complicated at first, but it can be summarized as follows: each software map in amd.depot simply contains pointers to architecture-dependent and architecture-independent directories. Architecture-dependent directories are reached by using the ${arch}-${os} tag, which is substituted with a string unique to each architecture.

The /depot directory and making the system transparent to users

If one has implemented the maps in the manner described here, a user can get to a package by going to /netdepot/pine-4/bin and running it right there. However, this method is inconvenient for both users and system administrators. For one thing, a user wants to be able to run a package by simply typing its name. Secondly, /netdepot is an automount point: if one wants to install a package locally on a machine for some reason, one can’t manually put links in /netdepot.

For this reason, a /depot directory is created on each client machine, and for every software map entry in amd.depot, a symbolic link from /depot/packagename to /netdepot/packagename is created. These links are created every night by a script called “builddepotdir”[iv].

The builddepotdir script does two things. First, it goes through the /depot directory and removes any links that don’t have corresponding packages in the amd.depot map. Next, it looks for any new packages in amd.depot and creates the correct links in /depot. It will not overwrite any links that are already there, for a reason that will become apparent in the next part of the paper on installing software.
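The real builddepotdir isn't reproduced in this paper, but its two passes can be sketched roughly as follows (reading package names from a plain file instead of the amd.depot NIS map, so the sketch is self-contained):

```shell
# builddepotdir (sketch): synchronize a /depot directory with the
# depot map.
#   $1 = the /depot directory
#   $2 = file listing one package name per line (the real script
#        would get the names from the amd.depot map instead)
builddepotdir() {
    depot=$1; pkglist=$2
    # Pass 1: remove links whose package no longer exists in the map.
    for link in "$depot"/*; do
        [ -L "$link" ] || continue
        name=$(basename "$link")
        grep -qx "$name" "$pkglist" || rm "$link"
    done
    # Pass 2: create links for new packages, but never overwrite an
    # existing link (it may point at a local install in /lcldepot).
    while read -r name; do
        [ -e "$depot/$name" ] || [ -L "$depot/$name" ] || \
            ln -s "/netdepot/$name" "$depot/$name"
    done < "$pkglist"
}
```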

To solve the problem of a user having to type the entire path to a package to execute it, one simply has to add the binaries to the user’s path. The most convenient way to do this is to maintain a /usr/local/bin directory on each client machine. This directory contains links to all the binaries of all the software packages installed in the Depot System. To maintain it, a master machine can be set aside for each architecture, where the links are created every time a package is installed. These links are created by a script; at FIU-SCS we use FIUCS-installdepot[v]. This machine is then set up to rdist[vi] the links out to all other machines of that architecture. When installing libraries and manual pages, the same method can be used to maintain the directories /usr/local/lib, /usr/local/include, and /usr/local/man, which will contain links to every package’s lib, include, and man directories, respectively. Then /usr/local/bin is added to the users’ PATH, /usr/local/man is added to the users’ MANPATH, and /usr/local/lib is added to the users’ LD_LIBRARY_PATH. If these directories are properly maintained, users will be able to use all the packages installed in Depot without even knowing the Depot System is there.
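The real FIUCS-installdepot script isn't reproduced in this paper, but the link-building idea can be sketched like this (the depot root and /usr/local location are parameters here so the sketch doesn't touch a live system):

```shell
# installdepot (sketch): link a newly installed package's binaries,
# libraries, and manual pages into the /usr/local tree on the
# architecture's master machine.
#   $1 = depot root (normally /depot)
#   $2 = package name
#   $3 = local tree root (normally /usr/local)
installdepot() {
    depot=$1; pkg=$2; local_root=$3
    for sub in bin lib man include; do
        [ -d "$depot/$pkg/$sub" ] || continue
        mkdir -p "$local_root/$sub"
        for f in "$depot/$pkg/$sub"/*; do
            [ -e "$f" ] || continue   # skip the literal glob on empty dirs
            ln -sf "$f" "$local_root/$sub/$(basename "$f")"
        done
    done
}
```

From here, rdist pushes the refreshed /usr/local tree out to the other machines of that architecture.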

Locally installed software

Sometimes a package should be installed locally on a machine, for performance or reliability reasons. If the package is already installed into the Depot System, duplicating effort would be a Bad Thing. Therefore, we employ something called the Local Depot, or lcldepot.

Here is a simple step-by-step method for installing a Depot’ed package locally:

  1. Copy the software’s physical Depot installation directory to a location on the local machine. For instance, cp -r /depot/dirs/16/pine-4 /disk/28/depot.

  2. Go to the local copy of the software and remove the architecture-dependent directories not needed for that particular machine.

  3. Create a directory called /lcldepot in the machine’s root directory. Inside that directory, create a directory with the same name as the package. For instance, /lcldepot/pine-4.

  4. Inside this directory, create a symbolic link for each of the directories that would normally be accessed through “/depot/packagename/*”. These links should point to the appropriate directories in the local copy of the Depot installation directory.

  5. Remove the symbolic link “packagename” in /depot and create a new one pointing to “/lcldepot/packagename”. Since the builddepotdir script won’t wipe out existing links, this will be preserved.


And that is all. The package will now be accessible locally, and links from /usr/local/bin and such will work fine.
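The five steps above can be condensed into a single shell function; the function and its parameters are hypothetical, standing in for the real /, /depot/dirs/NN, and local disk paths (only share and bin are linked here for brevity; step 4 would link every directory the package uses):

```shell
# localize_pkg (sketch): install a Depot'ed package locally.
#   $1 = fake root standing in for "/"
#   $2 = package name
#   $3 = source depot dir (e.g. /depot/dirs/16)
#   $4 = local disk directory (e.g. /disk/28/depot)
#   $5 = this machine's architecture string (e.g. i386-linux)
localize_pkg() {
    root=$1; pkg=$2; srcdir=$3; localdisk=$4; arch=$5
    # 1. Copy the package's physical Depot tree to the local disk.
    mkdir -p "$localdisk"
    cp -r "$srcdir/$pkg" "$localdisk/"
    # 2. Remove architecture directories this machine doesn't need.
    for d in "$localdisk/$pkg"/*-*; do
        [ -d "$d" ] || continue
        [ "$(basename "$d")" = "$arch" ] || rm -rf "$d"
    done
    # 3./4. Build /lcldepot/<pkg> with links into the local copy.
    mkdir -p "$root/lcldepot/$pkg"
    ln -s "$localdisk/$pkg/share"     "$root/lcldepot/$pkg/share"
    ln -s "$localdisk/$pkg/$arch/bin" "$root/lcldepot/$pkg/bin"
    # 5. Point /depot/<pkg> at the local copy; builddepotdir
    #    preserves existing links, so this survives the nightly run.
    rm -f "$root/depot/$pkg"
    ln -s "$root/lcldepot/$pkg" "$root/depot/$pkg"
}
```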

Installing Software Packages into the Depot System

Once the Depot System is operational, a simple guide on how to install software into the system might be useful. Installing the software can be divided into several steps:

  1. Audit the package. Find out what installation entails and what directories it needs.

  2. Create the Software Map and appropriate directories for the package.

  3. Unpack the package into the src directory, configure, and build it with appropriate options.

  4. Install the package and make it available to users.

Step 1: Audit the Package

The first step in installing a package into Depot is to simply audit the package to find out what the installation entails. If the package uses GNU Configure, the installation will probably be extremely simple, as Depot was based on the GNU Configure paradigm. At this point, we read the README and INSTALL files and learn about the package. It might also be a good idea to configure and build the package, and test-install it into a temporary directory, in order to determine which parts of the build are architecture dependent and which aren’t.

Step 2: Create the Software Map and appropriate directories

After determining what directories the package requires, it is time to create a software map for it. This process involves deciding on a DepotDir where the software will reside, creating the actual map entry, and then going to the decided location and actually creating the directory tree. A consistent naming scheme, such as programname-version, is a good thing to abide by. Good examples are pine-3.96, fvwm-1.24r, oracle-8, and so on. A different directory should be used for each version of the package installed. After setting up the maps and directories, one should update the automount maps, rebuild the /depot directory, and flush amd’s cache (amq -f) on the development machine so that configuring and building the package can begin immediately. Other hosts will simply have to wait until cron jobs do the work everywhere else.
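Choosing a DepotDir with enough free space can itself be scripted; here is a rough, hypothetical helper that picks, from a list of candidate depot directories, the one whose filesystem has the most room:

```shell
# pick_depotdir (sketch): choose, from a list of candidate depot
# directories, the one on the filesystem with the most free space.
pick_depotdir() {
    best=; bestfree=0
    for d in "$@"; do
        [ -d "$d" ] || continue
        # df -P keeps each filesystem on one line; column 4 of line 2
        # is the free space in blocks.
        free=$(df -P "$d" | awk 'NR==2 {print $4}')
        if [ "$free" -gt "$bestfree" ]; then
            best=$d; bestfree=$free
        fi
    done
    echo "$best"
}
```

For example, `pick_depotdir /depot/dirs/*` would print the roomiest currently configured depot directory.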

Step 3: Unpack the software package, configure, and build

Once the directories have been created, the software package can be unpacked into the src directory and configuration can begin. If the package uses GNU Configure, like most, the configuration process is incredibly simple. For example, MySQL:

./configure --prefix=/depot/mysql-3.23


The Depot System was designed for GNU Configure based packages, so installation will usually be a breeze. For packages that don’t use GNU Configure, or for some commercial software, some additional hackery may be needed.


After configuration, the package is compiled with a close eye to make sure everything builds smoothly.

Step 4: Install the package and make it available to users


For this step, a “make install” will suffice for most packages. The primary tip to keep in mind before running “make install” is to ensure that the directories required for installation exist! A good way to check is to try to cd to them first. A “df .” should reveal their location as being the server you want to install the package on, not the local machine.
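That sanity check can be wrapped in a small, hypothetical helper that refuses to proceed when an install directory is missing and reports which filesystem each one lives on:

```shell
# preinstall_check (sketch): before running "make install", verify
# that every directory the install will write into actually exists,
# and report which filesystem it lives on.  For a properly mounted
# depot directory, df should show an NFS source (host:/disk/NN)
# rather than a local device.
preinstall_check() {
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "MISSING: $dir" >&2
            return 1
        fi
        # df -P keeps each filesystem on one line; column 1 of
        # line 2 is the filesystem source.
        fs=$(df -P "$dir" | awk 'NR==2 {print $1}')
        echo "$dir is on $fs"
    done
}
```

Typical use: `preinstall_check /depot/mysql-3.23/bin /depot/mysql-3.23/lib && make install`.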


After running “make install”, the appropriate links must be created on the “master” machine for that architecture, in /usr/local/bin, /usr/local/lib, and /usr/local/man, pointing to the binaries, libraries, and manual pages of the newly installed package. Usually, a script such as FIUCS-installdepot is modified to suit the package and executed to do this. Thorough testing is a must to ensure the package works correctly.


User Guide to the Depot System

The Depot System makes life easier for administrators, but it also provides a convenient place for users to locate software. This section of the paper will serve as a guide for users operating in a Depot environment. It may appear to re-hash some things said earlier, but that is because this section of the paper is meant to stand on its own, in a way.

The Depot System: How to locate and utilize installed software

If you are a user in an environment that uses Depot, you will find that all the software that is available for your use is normally accessible by typing the name of the binary you wish to run. While this works for most people, having a basic understanding of where software and documentation live can help you do your work more efficiently.

Where software is located


In the Depot System, software is located all over the network. Any machine, from the server locked away in the data center to the workstation on your desk, can serve the software you run. This is not really all that important, though, since you can access the majority of software through the directory /depot on your machine.


The /depot directory contains links to directories on the network where the software is installed. If you do an “ls” inside /depot, you will see the names of many packages. Inside each of the listed directories, there are several virtual directories. These directories aren’t normally visible unless they are accessed, so you have to actually “cd” into them in order to find them. Because of this, there is a standard directory naming convention that Depot System software abides by:

bin - Executable binaries for your architecture.
lib - Libraries used by the package.
libexec - Support executables for your architecture.
man - Manual pages for the package.
src - The package’s source code.
share - Architecture-independent files, such as documentation.

Not all packages contain these directories. You can find out which directories are available for a package by typing “ypcat -k amd.depot | grep packagename” on most systems.

Multiple Architectures

When you execute a Depotized program by typing its name, the system will automatically choose the correct binary to run for your architecture. This frees you from having to choose yourself. Of course, it is possible that a particular program hasn’t been compiled for, or isn’t available for, a certain architecture. In cases like these, you should talk to your system administrator to find out if software for the architecture you use can be installed.

Summary

The Depot System is there to make life easier. You can learn as little or as much as you want about it, for it has been designed to allow you to run your software quickly and transparently.

Improvements to Depot

In this section, some improvements to the Depot System will be suggested. Since the system is simply a creative use of the BSD Automounter, the possibilities for improvement are endless.

Alternate Directory Structures

One primary annoyance when installing software for the Depot System is having to set up an automount map entry for each directory a package wants to install files into. This can become tedious, especially if you are installing an unfamiliar package and it wants directories you haven’t set up in the map yet.

Described here is a simple modification that banks on the fact that, inside a directory, “..” points to the real parent directory, not to the symbolic link that was traversed to get there.

Example 4: Alternate Depot Directory Structure

Rather than creating an entry in the automount map for each directory we think the package is going to use, we create a single architecture dependent entry as follows:

samba-2.0.7 type:=link;fs:=/depot/dirs/27/${key}/${arch}-${os}

Then, in the directory where we are to install the software, we create a share directory, and also a directory for each architecture. Inside the directory for each architecture lives a group of architecture-dependent directories, plus symbolic links pointing back into the share directory for each architecture-independent directory, as follows:

Inside /depot/dirs/27/samba-2.0.7/sun4-sos56:

bin/
etc -> ../share/etc
lib -> ../share/lib
man -> ../share/man
share -> ../share
src -> ../share/src
swat/
var/

Inside /depot/dirs/27/samba-2.0.7/share:

etc/
lib/
man/
src/


Since “..” refers to the REAL parent directory, and not the directory that contained the symbolic link to the present location, if you go to /depot/samba-2.0.7/src, you will end up in the correct directory:

/depot/samba-2.0.7/src

= /depot/dirs/27/samba-2.0.7/sun4-sos56/../share/src

which exists!



At first, having to create all those symbolic links and such may seem irritating, but it is a lot less irritating than having to go back to the automount map, make changes, push the maps out, and flush the automounter’s cache every time you want to modify your software installation.
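Creating the link farm for each architecture is easy to script; here is a rough sketch (the function name and arguments are invented for illustration), which also demonstrates the “..” resolution the layout relies on:

```shell
# make_alt_tree (sketch): build the "alternate" per-architecture
# layout, where each architecture directory links back into share/
# for the architecture-independent pieces.
#   $1 = depot dir, $2 = package name, remaining args = architectures
make_alt_tree() {
    depotdir=$1; pkg=$2; shift 2
    mkdir -p "$depotdir/$pkg/share/etc" "$depotdir/$pkg/share/lib" \
             "$depotdir/$pkg/share/man" "$depotdir/$pkg/share/src"
    for arch in "$@"; do
        mkdir -p "$depotdir/$pkg/$arch/bin"
        for sub in etc lib man src; do
            # ".." in the link target resolves against the real
            # parent, so <arch>/src lands in <pkg>/share/src.
            ln -s "../share/$sub" "$depotdir/$pkg/$arch/$sub"
        done
        ln -s ../share "$depotdir/$pkg/$arch/share"
    done
}
```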

Major and Minor Software Revisions

Earlier in this paper, it was suggested that we use different depot directory trees for each version of a software package we install. This allows one to prototype and test a new version without disabling the old version. However, for very minor software revisions (for instance, pine-4.03 to pine-4.04), creating a new Depot may seem tedious. It may be better to create the depot directory tree as “pine-4.x” and store all the different source trees under the “share/src” directory of this tree.

Of course, this setup limits us to one particular install of the software package. This system is best used for software that can be tested right after running a “make”; Pine can be tested this way. If it works, the package can then be installed with a “make install”. This also saves you the work of having to delete and re-create the links in /usr/local/{bin|lib|man} every time you do a minor version update.

Other Software Repositories

This section of the paper will take a brief look at other software repositories that provide similar functionality to Depot. Links to Web Sites where one can read more about these systems will be provided.

Depot-Lite

The Depot-Lite[vii] system, developed by John P. Rouillard and Richard B. Martin at the University of Massachusetts at Boston, simplifies the software installation process and allows several different versions of a particular package to be installed and available at the same time.

This system splits the directory tree by architecture first, instead of by package. It also uses hard links between the architecture-independent files in separate architecture trees for the same package, rather than having a lone share directory for said files.

More information on Depot Lite can be found at:

http://www.usenix.org/publications/library/proceedings/lisa94/full_papers/martin.a

NIST Depot

NIST Depot[viii] is the Depot system that started it all. It is similar to CMU Depot, but relies on the Sun Microsystems Automount utility, rather than the cross-platform BSD Automounter.


[1] FQDN = Fully Qualified Domain Name. This is the full name of a host that is valid anywhere on the Internet.

[i] NFS: Network File System Protocol Specification, RFC 1094, Sun Microsystems, March 1989.

[ii] Depot: A Tool For Managing Software Environments, by Wallace Colyer and Walter Wong, Carnegie Mellon University; part of the USENIX Systems Administration (LISA VI) Conference, October 19-23, 1992.

[iii] The current version of the BSD Automounter is also known as am-utils. It can be found at http://www.am-utils.org.

[iv] FIU-SCS’ version of builddepotdir can be found at http://www.cs.fiu.edu/~jflynn02/depotsystem/builddepotdir

[v] FIUCS-installdepot simply looks in /depot/packagename/bin, /depot/packagename/lib, /depot/packagename/man, and so on, and makes the appropriate links in /usr/local/(bin|lib|man|include). A copy of the script can be found at http://www.cs.fiu.edu/~jflynn02/depotsystem/FIUCS-installdepot

[vi] There are several versions of rdist that can be used to do this. The one used at FIU-SCS is USC rdist, available at ftp.usc.edu/pub/rdist

[vii] Depot-Lite: A Mechanism For Managing Software, by John P. Rouillard and Richard B. Martin, University of Massachusetts at Boston; part of the USENIX System Administration (LISA VIII) Conference, September 19-23, 1994.

[viii] The Depot: A Framework for Sharing Software Installation Across Organizational and UNIX Platform Boundaries, by Kenneth Manheimer, Barry A. Warsaw, Stephen N. Clark, and Walter Rowe, NIST and Century Computing; part of the USENIX System Administration (LISA IV) Conference, October 18-19, 1990.