Deploying the web service

To deploy the web service, package the Python classes that implement the backend and frontend, then use the build system to install them in the correct location, together with any other resources (such as images, style sheets, or text files) needed by the web interface.

Prerequisites

Every service needs some basic setup:

  • The service needs its own MySQL database, and two MySQL users set up, one for the backend and the other for the frontend. A sysadmin can set this up on the modbase machine.

  • The service needs its own user on the modbase machine; for example, there is a modloop user for the ModLoop service. It is this user that runs scons (below). All of the backend also runs as this user, and jobs on the SGE clusters also run under this user’s account. (It is not a good idea to use a regular user for this purpose, as it will use up the regular user’s disk and runtime quota on the cluster, and bugs in the service could lead to deletion of that user’s files or their exposure to outside attack.) A sysadmin can also set up this user account.

  • The web service user needs a directory on the /wynton disk in order to store running jobs, and at least one directory on a local modbase disk so the frontend can create incoming jobs.

  • A sysadmin needs to configure the web server on modbase so that the web service files are visible to the outside world. They can also password protect the page if it is not yet ready for a full release.

  • It is usually a good idea to put the implementation files for a web service on GitHub, or in an SVN repository.

Quick start

The easiest way to set up a new web service is to have a sysadmin run the make_web_service script on the modbase machine. Given the name of the web service it will set up all the necessary files used for a basic web service. Run make_web_service with no arguments for further help.

Note

make_web_service should be run on a local disk (not /wynton). Most users on modbase have their home directories on a local disk, so this is generally OK by default. Note that the home directory should be accessible by the backend user in order for the build system to work; running chmod a+rx ~ should usually be sufficient.

Example usage

For example, the user ‘bob’ wants to set up a web service for peptide docking.

  1. He first chooses a “human readable” name for his service, “Peptide Docking”. This name will appear on web pages and in emails, but can be changed later by editing the configuration file, if desired.

  2. He also chooses a “short name” for his service, “pepdock”. The short name should be a single lowercase word; it is used to name system and MySQL users, the Perl and Python modules, etc. It is difficult to change later, but is never seen by end users so is essentially arbitrary.

  3. He asks a sysadmin to set up the web service, giving him or her the “short name” and the human readable name. (The sysadmin will run the make_web_service script.)

  4. Bob can then get the web service from git or Subversion by running:

    $ git clone git@github.com:salilab/pepdock.git [git]
    $ svn co https://svn.salilab.org/pepdock/trunk pepdock [Subversion]
    $ cd pepdock/conf
    $ sudo -u pepdock cat ~pepdock/service/conf/backend.conf > backend.conf
    $ sudo -u pepdock cat ~pepdock/service/conf/frontend.conf > frontend.conf
    
  5. Bob edits the configuration file conf/live.conf to adjust install locations, etc. if necessary, and fills in the template Python modules for the backend and frontend, in backend/pepdock/__init__.py and frontend/pepdock/__init__.py, respectively.

  6. He writes test cases for both the frontend and backend (see Testing) and runs them to make sure they work by typing scons test in the pepdock directory.

  7. He deploys the web service by simply typing scons in the pepdock directory. This will give him further instructions to complete the setup (for example, providing a set of MySQL commands to give to a sysadmin to set up the database).

  8. Once deployment is successful, he asks a sysadmin to set up the web server on modbase so that the URL given in urltop in conf/live.conf works.

  9. Whenever Bob makes changes to the service in his pepdock directory, he simply runs scons test to make sure the changes didn’t break anything, then scons to update the live copy of the service, then git commit and git push to publish the changes at GitHub. (The backend will also need to be restarted when he does this; scons will show a suitable command line to achieve this.)

  10. If Bob wants to share development of the service with another user, Joe, they should ask a sysadmin to give Joe sudo access to the pepdock account. Joe can then set up his own pepdock directory by cloning the repository from GitHub and then developing in the same way as Bob, above.

Note

Development of the service should generally be done by the regular (‘bob’) user; only the backend itself runs as the backend (‘pepdock’) user. Bob can, however, run any command as the ‘pepdock’ user using ‘sudo’ (e.g. sudo -u pepdock scons to run scons as the pepdock user). Note that sudo will ask for the regular user’s (Bob’s) password, not the pepdock account’s (the pepdock account has no password and cannot be logged into directly). For advanced access, a shell can be opened as the backend user by running something like sudo -u pepdock bash.

Design tips

When designing a web service, the following design tips may be useful:

  • The web service should implement little or none of the actual algorithm; instead, the algorithm should be implemented in another package that can be used independently. This allows others to use your algorithm on their own machines, rather than having to use Sali lab resources via the web service. The web service itself should only handle generating input files and nicely presenting any results (e.g. with interactive plots or protein structures). For example, ModLoop relies on MODELLER for the actual algorithm, while the algorithm used by the AllosMod web service is implemented in a separate AllosMod library, which allows the AllosMod protocol to be run from a command line.

  • A web service must be self-contained. If you absolutely must use external scripts in your web service, don’t put them in your home directory or some other random place on the disk. Include and install them with the rest of the web service. See the MultiFoXS service for an example (in that case the external scripts are put in a scripts directory and installed in a cluster-accessible location).

  • Web service dependencies must be well defined. If you need to use external software, like IMP, scikit, or gnuplot, don’t compile your own version of that software and install it in a random place. Use “module load” to load the module for that software instead (if a module isn’t available, ask a sysadmin to build one for you).

The following sections describe the various components of a web service in more detail, for developers that wish to set things up themselves without using the convenience scripts.

Backend Python package

The backend for the service should be implemented as a Python package in the backend subdirectory. Its name should be the same as the service, except that it should be all lowercase, and any spaces in the service name should be replaced with underscores. For example, the ‘ModFoo’ web service should be implemented by the file backend/modfoo/__init__.py. This package should implement a Job subclass and may also optionally implement Database or Config subclasses. It should also provide a function get_web_service which, given the name of a configuration file, will instantiate a WebService object using these custom subclasses and return it. This function is used by utility scripts set up by the build system to run and maintain the web service. An example, building on previous ones, is shown below.

import saliweb.backend
import glob

class Database(saliweb.backend.Database):
    def __init__(self, jobcls):
        saliweb.backend.Database.__init__(self, jobcls)
        self.add_field(saliweb.backend.MySQLField('number_of_pdbs', 'INTEGER'))


class Job(saliweb.backend.Job):
    runnercls = saliweb.backend.WyntonSGERunner

    def preprocess(self):
        pdbs = glob.glob("*.pdb")
        self._metadata['number_of_pdbs'] = len(pdbs)

    def run(self):
        script = """
for f in *.pdb; do
  grep '^HETATM' $f > $f.het
done
"""
        r = self.runnercls(script)
        r.set_options('-l diva1=1G')
        return r


def get_web_service(config_file):
    db = Database(Job)
    config = saliweb.backend.Config(config_file)
    return saliweb.backend.WebService(config, db)
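The package-naming rule at the start of this section (service name lowercased, spaces replaced with underscores) can be sketched as a small helper. `package_name` is a hypothetical illustration, not part of saliweb:

```python
def package_name(service_name):
    """Illustrative helper (not part of saliweb): derive the backend
    package name from the human-readable service name by lowercasing
    it and replacing spaces with underscores."""
    return service_name.lower().replace(' ', '_')

print(package_name('ModFoo'))           # modfoo
print(package_name('Peptide Docking'))  # peptide_docking
```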

Frontend Python package

The frontend for the service should be implemented as a Python package in the frontend subdirectory, named as for the backend (e.g. the ‘ModFoo’ web service’s frontend should be implemented by the file frontend/modfoo/__init__.py). An example is shown below. For clarity, only the methods are shown, not their contents; for full implementations of the methods see the Frontend page.

from flask import render_template, request
import saliweb.frontend


app = saliweb.frontend.make_application(__name__)


@app.route('/')
def index():
    return render_template('index.html')


@app.route('/job', methods=['GET', 'POST'])
def job():
    # submit new job or show all jobs (queue)
    ...


@app.route('/job/<name>')
def results(name):
    # show results page
    ...


@app.route('/job/<name>/<path:fp>')
def results_file(name, fp):
    # download results file
    ...

Configuration file

The service’s configuration should be placed in a configuration file in the conf subdirectory. Multiple files can be created if desired, for example to maintain both a testing and a live version of the service. Each configuration file can specify a different install location, MySQL database, etc. This directory will also contain the supplementary configuration files that contain the usernames and passwords that the backend and frontend need to access the MySQL database. Since these files contain sensitive information (passwords), they should not be group- or world-readable (chmod 0600 backend.conf), and if using SVN or git, do not put these database configuration files into the repository.
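The permission requirement above (chmod 0600) can be checked programmatically. The following is a minimal standard-library sketch; `is_private` is a hypothetical helper, not part of saliweb:

```python
import os
import stat
import tempfile

def is_private(path):
    """Return True if `path` is accessible only by its owner, as required
    for backend.conf/frontend.conf (i.e. chmod 0600)."""
    mode = os.stat(path).st_mode
    group_other = (stat.S_IRGRP | stat.S_IWGRP | stat.S_IXGRP |
                   stat.S_IROTH | stat.S_IWOTH | stat.S_IXOTH)
    return (mode & group_other) == 0

# Demonstrate on a throwaway temporary file
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
os.chmod(path, 0o600)
print(is_private(path))  # True
os.chmod(path, 0o644)
print(is_private(path))  # False
os.unlink(path)
```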

Using the build system

The build system is a set of extensions to SCons that simplifies the setup and installation of a web service. To use, create a directory in which to develop the web service, and create a file SConstruct in that directory similar to the following:

import saliweb.build

v = Variables('config.py')
env = saliweb.build.Environment(v, ['conf/live.conf', 'conf/test.conf'])
Help(v.GenerateHelpText(env))

env.InstallAdminTools()

Export('env')
SConscript('backend/modfoo/SConscript')
SConscript('frontend/modfoo/SConscript')

This script creates an Environment object which will set up the web service using either the configuration file live.conf or the file test.conf in the conf subdirectory.

The Environment class derives from the standard SCons Environment class, but adds additional methods which simplify the setup of the web service. For example, the InstallAdminTools() method installs a set of command-line admin tools in the web service’s directory (see below). SConscript files in subdirectories can use similar methods (such as InstallPython()) to set up the rest of the necessary files for the web service.

To test the web service, run scons test from the command line on the modbase machine (see Testing).

To actually install the web service, run scons build=live or scons build=test from the command line on the modbase machine, as the web service backend user, to install using either of the two configuration files listed in the example above. (If scons is run with no arguments, it will use the first one, live.conf.) Before actually installing any files, this will check that things are set up for the web service to work properly; for example, that the necessary MySQL users and databases are present.

Command-line admin tools

The build system creates several command-line admin tools in the bin subdirectory under the web service’s install directory. These can be run by the web service user to control the service itself and manipulate jobs in the system.

service.py

This tool is used to start, stop or restart the backend daemon for the web service. This daemon performs all backend functions of the web service: it waits for jobs submitted by the web frontend and submits them to the cluster, harvests completed cluster jobs, and expires old job results. The tool also has a condstart option which will only start the service if it is not already running (the regular start option will complain if the service is running).
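The difference between the start and condstart options can be sketched as follows; this is illustrative pseudologic, not the actual service.py implementation:

```python
def start(running):
    """Plain 'start': complain if the backend is already running."""
    if running:
        raise RuntimeError('backend is already running')
    return 'started'

def condstart(running):
    """'condstart': start only if not already running; otherwise do nothing."""
    return 'no-op' if running else 'started'

print(start(False))     # started
print(condstart(True))  # no-op
```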

resubmit.py

This tool will move one or more jobs from the FAILED state back to the INCOMING state. It is designed to be used to resubmit failed jobs once whatever problem with the web service that caused these jobs to fail the first time around has been resolved.

deljob.py

This tool will delete one or more jobs in a given state. It can be used to remove failed jobs from the system, or to purge information from the database on expired jobs. Jobs in other states (such as RUNNING or COMPLETED) can also be deleted, but only if the backend service is stopped first, since that service actively manages jobs in these states.

failjob.py

This tool will force one or more jobs into the FAILED state. This is useful if, for example, due to a bug in the backend, a job didn’t work properly but went into the COMPLETED state. The backend service must first be stopped in order to use this tool.

delete_all_jobs.py

This tool will delete all of the jobs from the web service, so can be used to ‘restore to factory settings’. It deletes the database table, and all the files in all the job directories (even extraneous files that do not correspond to jobs in the database). It should be used with caution, as this cannot be undone.

list_jobs.py

This tool will show all the jobs in the given state(s). It is helpful for internal web services that don’t have an easily accessible queue web page.
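The admin tools above all manipulate job states. The following is a hedged sketch of two of the transitions described in this section; the state names are those mentioned in this document, and the real saliweb state machine may include others:

```python
# Job states mentioned in this document; illustrative only, not the
# actual saliweb backend implementation.
STATES = {'INCOMING', 'RUNNING', 'COMPLETED', 'FAILED', 'ARCHIVED', 'EXPIRED'}

def resubmit(state):
    """resubmit.py: move a FAILED job back to INCOMING."""
    if state != 'FAILED':
        raise ValueError('can only resubmit FAILED jobs')
    return 'INCOMING'

def fail_job(state, backend_running):
    """failjob.py: force a job into FAILED; the backend must be stopped."""
    if backend_running:
        raise RuntimeError('stop the backend before using failjob.py')
    return 'FAILED'

print(resubmit('FAILED'))            # INCOMING
print(fail_job('COMPLETED', False))  # FAILED
```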

Testing

Before the framework is put into production it should be tested to make sure it works correctly. There are two main types of tests that should be done:

  • Unit tests test individual parts of the service to make sure they work in isolation.

  • System tests test the service as a whole.

Unit tests

To test the frontend, make a test/frontend subdirectory and put one or more Python scripts there. Each script can use the functions and classes in the saliweb.test module, together with test functionality provided by the Flask framework, to create simple instances of the web frontend and test various methods given different inputs. For example, a script to test the index page might look like:

import unittest
import saliweb.test

# Import the modfoo frontend with mocks
modfoo = saliweb.test.import_mocked_frontend("modfoo", __file__,
                                             '../../frontend')


class Tests(saliweb.test.TestCase):

    def test_index(self):
        """Test index page"""
        c = modfoo.app.test_client()
        rv = c.get('/')
        self.assertIn(b'ModFoo: Modeling using Foo', rv.data)


if __name__ == '__main__':
    unittest.main()

Then write an SConscript file in the same directory to actually run the scripts, using the RunPythonFrontendTests() method. This might look like:

Import('env')

env.RunPythonFrontendTests(Glob("*.py"))

To test the backend, make a test/backend subdirectory and put one or more Python scripts there. Each script should define a subclass of saliweb.test.TestCase and define one or more methods starting with test_, using standard Python unittest methods such as assertEqual. A number of other utility classes are also provided in the saliweb.test module.

For example, to test that the archive() method of the ModFoo service (Simple job example) really does gzip all of the PDB files, a test case like that below could be used:

import unittest
import modfoo
import saliweb.test
import os


class JobTests(saliweb.test.TestCase):
    """Check custom ModFoo Job class"""

    def test_archive(self):
        """Test the archive method"""
        # Make a ModFoo Job test job in ARCHIVED state
        j = self.make_test_job(modfoo.Job, 'ARCHIVED')
        # Run the rest of this testcase in the job's directory
        with saliweb.test.working_directory(j.directory):
            # Make a test PDB file and another incidental file
            with open('test.pdb', 'w') as f:
                print("test pdb", file=f)
            with open('test.txt', 'w') as f:
                print("text file", file=f)

            # Run the job's "archive" method
            j.archive()

            # Job's archive method should have gzipped every PDB file but not
            # anything else
            self.assertTrue(os.path.exists('test.pdb.gz'))
            self.assertFalse(os.path.exists('test.pdb'))
            self.assertTrue(os.path.exists('test.txt'))


if __name__ == '__main__':
    unittest.main()

Then write an SConscript file in the same directory to actually run the scripts, using the RunPythonTests() method. This might look like:

Import('env')

env.RunPythonTests(Glob("*.py"))

Run scons test to actually run the tests.

System tests

There is currently no rigorous way to carry out system tests other than deploying the service, then using the web interface to submit a job.

Examples

A simple example of a complete web service is ModLoop. The source code for this service can be found at https://github.com/salilab/modloop/ and the service can be seen in action at https://salilab.org/modloop/.