.. currentmodule:: saliweb.backend

.. _backend:

Backend
*******

The backend is a set of Python classes that manages jobs after the initial
submission. This typically runs as a daemon on our 'modbase' machine,
picking up submitted jobs from the frontend, submitting jobs to the cluster
and gathering results, and doing any necessary pre- or post-processing.
The backend is implemented by the :mod:`saliweb.backend` Python module.

Classes
=======

Job
---

The :class:`Job` class represents a single job known to the backend. This
could have just been submitted by the frontend, it could be running on the
cluster, it could have finished and its results placed on long-term storage,
or the results from a very old job could have been deleted (only the job
metadata remains). Each job corresponds to a single row in the jobs database
table and a directory on disk.

For any web service, the :class:`Job` class must first be subclassed and
then one or more of its methods implemented to actually do the work of
running jobs. For example, the :meth:`Job.run` method will be called by the
backend when the job starts; it is expected to start the job running on the
cluster, typically by using a :class:`WyntonSGERunner` object. There are
similar methods that can be used to do extra processing before the job
starts (:meth:`Job.preprocess`), after it finishes running on the cluster
(:meth:`Job.postprocess` and :meth:`Job.finalize`) or when the job is moved
to long-term storage (:meth:`Job.archive`), for example. Each method is run
in the directory containing all of the job's data (i.e. any files uploaded
by the end user when the job was submitted, plus any output files after the
job has run). (The exception is :meth:`Job.expire`, which is called after
the job directory has been deleted.)

If any of these methods raises an exception, it is caught by the backend;
the job is put into a failed state and the server admin is notified. Thus,
exceptions should be used only to indicate a technical error in the web
service, not something wrong with the user's input (in the latter case, the
job output should simply indicate what the problem is).

.. note::
   Each of these methods is automatically run by the backend at the correct
   time; they should not be run manually by any method in the subclass. For
   example, to run a new job, call :meth:`Job.reschedule_run`, not the
   :meth:`Job.run` method directly.

As mentioned above, a :class:`WyntonSGERunner` class is provided that takes
care of the details of running a script on the cluster and checking if it
has completed. Typically the script run here should use the local /scratch
disk on the cluster nodes if possible - this is not implemented
automatically by the framework, since the best usage of local and network
disks is specific to a given web service.

Database
--------

Each job's metadata is stored in a database; the :class:`Database` class
manages this database and creates :class:`Job` objects automatically when
requested to by other classes. The base :class:`Database` class interfaces
with a MySQL database on the 'modbase' machine and manages database tables
containing fields used by all web services. It can be subclassed to add
additional fields (for example, to store some extra job metadata in the
database rather than within the job's directory) or (potentially) to use a
different database engine.

Config
------

The :class:`Config` class parses the configuration file for the web service
and stores all of the configuration information. It can be subclassed if
desired to read extra service-specific information from the configuration
file, usually by extending the :meth:`Config.populate` method.
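For instance, a subclass might read extra options from a service-specific
section of the configuration file. The following is a minimal sketch, not
part of the framework itself; it assumes that :meth:`Config.populate` is
passed a ``ConfigParser``-style object, and the ``[modfoo]`` section and
``cluster_queue`` option are hypothetical::

    from saliweb.backend import Config

    class MyConfig(Config):
        def populate(self, config):
            # Let the base class read the standard configuration first
            Config.populate(self, config)
            # Read a hypothetical service-specific option; the section
            # and option names here are illustrative only
            self.cluster_queue = config.get('modfoo', 'cluster_queue')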
WebService
----------

The :class:`WebService` class provides high-level backend functionality.
The most commonly-used method is :meth:`WebService.do_all_processing`,
which simply runs in an endless loop, submitting new jobs to the cluster,
collecting the results of finished jobs, and archiving old completed jobs.
It is rarely necessary to subclass it.

.. _jobstates:

Job states
==========

A single job in the system is represented by a row in the database table
and a single directory that contains the job inputs and/or outputs. Each
job can be in one of a set of distinct states, described below. In normal
operation a job will move from the first state in the list below to the
last.

* The **INCOMING** state is for jobs that have just been submitted to the
  system by the frontend, but not yet picked up by the backend.

* Jobs move into the **PREPROCESSING** state when they are picked up by
  the backend. At this point the :meth:`Job.preprocess` method is called,
  which can be overridden to do any necessary preprocessing. Note that
  this method runs on the server machine ('modbase') and serially (for
  only a single job at a time), so it should not run any calculations that
  take more than a few seconds.

* Next, jobs usually move to the **RUNNING** state. At this point the
  :meth:`Job.run` method is called, which typically will submit an SGE job
  to do the bulk of the processing.

* When the SGE job finishes, the job moves to the **POSTPROCESSING** state
  and the :meth:`Job.postprocess` method is called. Like preprocessing,
  this runs serially on the server machine and so should not be
  computationally expensive.

* It may be decided in postprocessing that further runs are required, in
  which case the job moves back to the **RUNNING** state for another
  cycle. Otherwise, it moves to the **FINALIZING** state and
  :meth:`Job.finalize` is called. This is the same as **POSTPROCESSING**
  except that it is only called on the last cycle (no further runs can be
  started here).

* Next the job moves to the **COMPLETED** state and the
  :meth:`Job.complete` method is called. If the user provided an email
  address to the frontend, they are emailed at this point to let them know
  job results are now available.

* After a defined period of time, the job moves to the **ARCHIVED** state
  and the :meth:`Job.archive` method is called. At this point the job
  results are still present on disk, but are no longer accessible to the
  end user and may be moved to long-term storage.

* After another defined period of time, the job moves to the **EXPIRED**
  state, the job directory is deleted, and the :meth:`Job.expire` method
  is called. At this point, only the job metadata in the database remains.

If a problem is encountered at any point (usually a Python exception) the
job is moved to the **FAILED** state. At this point the server admin is
emailed and is expected to fix the problem (usually a bug in the web
service, or a system problem such as a broken or full hard disk).

Note also that the :meth:`Job.preprocess` method can, if desired, signal to
the framework that running a full SGE job is unnecessary (by calling the
:meth:`Job.skip_run` method). In this case, the **RUNNING** and
**POSTPROCESSING** steps are skipped and the job moves directly from
**PREPROCESSING** to **COMPLETED**. Similarly, the :meth:`Job.postprocess`
method can request that the framework run a new job (by calling the
:meth:`Job.reschedule_run` method). In this case, the job moves from
**POSTPROCESSING** back to **RUNNING**.
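For example, a :class:`Job` subclass might skip the SGE run entirely for
trivial inputs, or request extra cycles from postprocessing. The sketch
below is a minimal illustration, not a real service; the file names and
the convergence test are hypothetical::

    import os
    from saliweb.backend import Job

    class MyJob(Job):
        def preprocess(self):
            # If the (hypothetical) input file is empty, there is nothing
            # for the cluster to do, so go straight to COMPLETED
            if os.stat('input.pdb').st_size == 0:
                self.skip_run()

        def postprocess(self):
            # Request another RUNNING cycle if a (hypothetical) marker
            # file shows the calculation has not yet converged
            if not os.path.exists('converged'):
                self.reschedule_run()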
Each job state (with the exception of **EXPIRED**) can be given a directory
in the service's configuration file. Job data are automatically moved
between directories when the state changes. For example, the **INCOMING**
directory generally needs to reside on a local disk, and have special
permissions so that the frontend can create files within it. The
**RUNNING** directory usually needs to be accessible by the cluster, so it
needs to be on the :file:`/wynton` disk. The **ARCHIVED** directory may
live on long-term storage, such as a park disk.

Examples
========

.. _simplejob:

Simple job
----------

The example below demonstrates a simple :class:`Job` subclass that, given a
set of PDB files from the frontend, runs an SGE job on the cluster that
extracts all of the HETATM records from each PDB. This is done by
overriding the :meth:`Job.run` method to pass a set of shell script
commands to a :class:`WyntonSGERunner` instance; this instance is then
returned to the backend. The backend will then keep track of the SGE job,
and notice when it finishes.

The subclass also overrides the :meth:`Job.archive` method, so that when
the job results are moved from short-term to long-term storage, all of the
PDB files are compressed with gzip to save space.

.. literalinclude:: ../examples/simplejob.py
   :language: python

Custom database class
---------------------

The :class:`Database` class can be customized by adding additional fields
to the database table. This is useful if you need to pass small amounts of
job metadata between the frontend and backend, or between different stages
of the job, and the metadata are useful to keep after the job has finished.

.. note::
   In many cases, it makes more sense to store job data as files in the
   job directory itself. For example, it is probably easier to store a PDB
   file as a real file rather than trying to insert the contents into the
   database table!

This example adds a new integer field *number_of_pdbs* to the database. The
field can then be accessed (read or write) from within the :class:`Job`
object by referencing *self._metadata['number_of_pdbs']*. The *_metadata*
attribute stores all of the job metadata in a Python dictionary-like
object; it is essentially a dump of the database row corresponding to the
job.

.. literalinclude:: ../examples/customdb.py
   :language: python

Logging
=======

It is often useful for debugging purposes to log the progress of a job.
While the job is running on the cluster, the only way to do this is to
write output into a log file. For other steps in the processing, however,
the standard Python `logging module
<https://docs.python.org/3/library/logging.html>`_ is utilized. Each job
method (such as :meth:`Job.run`, :meth:`Job.preprocess`) with the exception
of :meth:`Job.expire` can use the :attr:`Job.logger` object to write out
log messages. It is a standard Python Logger object, so it supports the
regular methods of a Logger, such as :meth:`~Logger.warning` and
:meth:`~Logger.critical` to write log messages, and
:meth:`~Logger.setLevel` to set the threshold for log output.

By default, anything logged that exceeds the threshold will be written to a
file called 'framework.log' in the job's directory. The file will only be
created when the first log message is printed. This behavior can be
modified if desired by overriding the :meth:`Job.get_log_handler` method.

.. literalinclude:: ../examples/logging.py
   :language: python
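As an illustration of overriding :meth:`Job.get_log_handler`, the sketch
below sends log output to the local syslog daemon instead of the per-job
'framework.log' file. This is hypothetical and assumes that
:meth:`Job.get_log_handler` simply returns a standard
:class:`logging.Handler` object::

    import logging.handlers
    from saliweb.backend import Job

    class MyJob(Job):
        def get_log_handler(self):
            # Send log records to the local syslog daemon rather than
            # the default 'framework.log' file in the job directory
            # (assumes get_log_handler returns a logging.Handler)
            return logging.handlers.SysLogHandler(address='/dev/log')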
Testing
=======

The best way to test the backend is as part of the entire web service (see
:ref:`testing`). However, the backend can be tested directly without
invoking the frontend, by manually modifying the MySQL database. Note,
however, that the interface between the backend and frontend, as well as
the details of the MySQL tables, are not guaranteed to be stable (future
iterations of the framework may change some of the details for performance
or additional features), so this method could fail in the future.

To manually submit a job:

#. Decide on a job name. This must be unique. Create a directory with the
   same name, as the backend user, under the web service's incoming
   directory (as specified in the configuration file).

#. Put all necessary input files into this directory.

#. Connect to the MySQL server using the ``mysql`` client on ``modbase``,
   and the username and password from the web service's configuration
   file. Either the backend or frontend user can be used; the frontend
   user can only submit jobs and so is recommended, while the backend user
   can also delete or modify jobs, which is dangerous as it may break the
   service. For example, ``mysql -u modfoo_frontend -p -D modfoo``.

#. To actually submit a job, use something like::

       INSERT INTO jobs (name,passwd,user,contact_email,
                         directory,url,submit_time)
              VALUES (a,b,c,d..., UTC_TIMESTAMP());

   a,b,c,d are the values for the columns, described below:

   * 'name' is the name of the job, from above.
   * 'passwd' is used by the frontend to protect job results. Any
     alphanumeric string can be used here.
   * 'user' is the user that submitted the job. NULL can be used here.
   * 'contact_email' is the email address that the backend will notify
     when the job completes, or NULL for no email notification.
   * 'directory' is the filesystem directory containing the job inputs,
     which must match that created above.
   * 'url' is a web link that the backend will include in the email it
     sends out, telling the user where the results can be downloaded. A
     dummy value can be used here, since the frontend usually handles
     this.
   * 'submit_time' is the time (UTC) when the job was submitted. Usually,
     the MySQL function UTC_TIMESTAMP() is used here to put in the
     current time.

#. The job will only be run if the backend is running (use the
   ``bin/service.py`` script as the backend user in the installation
   directory). The backend polls periodically for new jobs. Alternatively,
   ``service.py`` can be used to restart the backend, to force it to check
   immediately.
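The same manual submission can also be scripted. The sketch below is a
hypothetical example using the MySQLdb module rather than the ``mysql``
command-line client; the service name 'modfoo', the password, the job
name, and the directory shown are all placeholders::

    import MySQLdb

    # Connect as the (hypothetical) frontend user, which can only
    # submit jobs and so cannot break existing ones
    conn = MySQLdb.connect(user='modfoo_frontend', passwd='PASSWORD',
                           db='modfoo')
    c = conn.cursor()
    # None is mapped to NULL: no submitting user, no email notification
    c.execute("INSERT INTO jobs (name,passwd,user,contact_email,"
              "directory,url,submit_time) "
              "VALUES (%s,%s,%s,%s,%s,%s,UTC_TIMESTAMP())",
              ('testjob', 'abc123', None, None,
               '/path/to/incoming/testjob', 'http://dummy/url'))
    conn.commit()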