Version: 1.1.1
Date: 2023-02-24 Minor Version: 1.1

Scientific Filesystem (SCIF)

Authors:

Terminology

Term             Definition
SCIF             The Scientific Filesystem
Host             The machine or container where the scientific filesystem is installed alongside but separate from the core operating system
App              A scientific filesystem module that is installed onto the Host
Data             A root location in the scientific filesystem exclusively for data inputs and outputs that belong to Apps
Recipe           A text file with extension .scif with sections that define a SCIF
Discoverability  An ability to programmatically uncover software and metadata
Predictability   A means of interaction with software without having prior knowledge of its creation
Client           A controller for a SCIF, either for a developer or a user
Integration      A third party software or tool that understands the SCIF structure and interacts with all or some portion of it
Container        A Linux container technology that can be installed with a SCIF
SC               The scientific community, including academics, researchers, and associated engineering and development staff
ENV              Environment variables in the namespace of SCIF (prefixed with SCIF_) that drive interaction of a user agent (UA)
UA               A user agent (controller) that interacts with environment variables (ENV) to act as an interface between the user and the scientific filesystem

Objective

To define an organizational standard, the Scientific Filesystem (SCIF), that makes it easy for the scientific community (SC) to generate predictable and programmatically accessible scientific apps optimized for encapsulated, reproducible environments.

Goals

The Scientific Filesystem (SCIF) will:

Non-Goals

The Scientific Filesystem (SCIF) explicitly does not implement, define, or provide in v0.2:

Solution Overview

This specification defines a filesystem structure and environment variable namespace that interact with functions to produce discoverable software applications. We focus on recommendations for a container technology or other encapsulated environment to implement a SCIF-compatible (containerized) filesystem (or integrations thereof), along with controllers for it (clients). A compatible client or integration must provide a means to generate and interact with the SCIF filesystem and environment namespace, along with documentation for developers and users.

Architecture

The Scientific Filesystem has a default root on the host that is intentionally separate from standard Linux folders, so that it does not interfere with folders that are likely to exist already (e.g., just /data or /apps). This is done by using a root $SCIF_BASE, which defaults to /scif:

/scif
   /apps
   /data

Apps

The base of /scif/apps is where software modules live. These modules can be generated automatically from a SCIF recipe, and an existing installation can in turn be parsed to generate a recipe. Clients and integrations that meet this specification can parse these directories programmatically without knowing specifics of the installed software apps. The structure of a complete app folder (foo) looks like the following:

/scif
   /apps
     /foo
        /bin
        /lib
        /scif
          runscript
          runscript.help
          labels.json
          environment.sh
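
Because the layout is fixed, installed apps can be found by walking the apps root. The following Python sketch is illustrative only and is not part of the specification: the function name list_apps is hypothetical, and the demonstration builds a temporary root instead of touching a real /scif.

```python
import tempfile
from pathlib import Path


def list_apps(scif_base="/scif"):
    """Map each installed app name to the metadata files in its scif folder."""
    apps_root = Path(scif_base) / "apps"
    apps = {}
    if not apps_root.is_dir():
        return apps
    for app_dir in sorted(apps_root.iterdir()):
        if not app_dir.is_dir():
            continue
        meta = app_dir / "scif"
        files = sorted(p.name for p in meta.iterdir()) if meta.is_dir() else []
        apps[app_dir.name] = files
    return apps


# Demonstration against a throwaway root rather than a real /scif.
demo_base = Path(tempfile.mkdtemp())
(demo_base / "apps" / "foo" / "scif").mkdir(parents=True)
(demo_base / "apps" / "foo" / "scif" / "runscript").write_text("echo foo\n")
print(list_apps(demo_base))
```

The caller needs no knowledge of foo in advance; the folder names alone reveal which apps exist and which metadata each one provides.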

Recipes

The SCIF recipe is the primary method by which a user from the SC can define and then generate a SCIF. The recipe consists of sections. Each section begins with a line that starts with %, followed immediately by the section name and then by the name of the software app that the section applies to.

%section foo

The recipe itself is a text file that must have extension .scif, and can serve as input to clients and integrations for SCIF.

Sections

Specifically, any software that implements SCIF follows this procedure to produce a SCIF from a recipe:

  1. Find any app section alongside a name (e.g., %appinstall foo) as indication of an application command.
  2. Parse the string following the section name (e.g., foo) as the name of the application, and create a lowercase folder for the app under each of the roots /scif/apps and /scif/data if the folder doesn’t already exist.
  3. If an installation procedure is present (%appinstall foo), perform it to install dependencies relative to the app’s install folder.
  4. Create a metadata folder, scif, inside the application folder.
  5. For those defined, save the environment (%appenv foo), help (%apphelp foo), runscript (%apprun foo), and labels (%applabels foo) to the metadata folder.
  6. Generate a “bin” folder for foo, and automatically add it to the $PATH when foo is being used. Generate a “lib” folder for libraries, to also be added to the $LD_LIBRARY_PATH when foo is used.
  7. If the %apptest foo section is defined, run its tests on the newly installed app to ensure it works as expected.

This general list is hashed out in more detail in the following sections.
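
As a rough illustration of steps 2, 4, 5 and 6, the Python sketch below creates the folders and metadata files for one app from already parsed sections. It is a minimal sketch and not a reference implementation: the function create_app, the METADATA_FILES mapping, and the "KEY VALUE" format assumed for label lines are all assumptions made for this example. Running the install and test commands (steps 3 and 7) is deliberately left out.

```python
import json
import tempfile
from pathlib import Path

# Section name -> metadata file name, taken from the app layout in this document.
METADATA_FILES = {
    "appenv": "environment.sh",
    "apphelp": "runscript.help",
    "apprun": "runscript",
}


def create_app(base, name, sections):
    """Create folders and metadata for one app (steps 2, 4, 5 and 6).

    Running the %appinstall and %apptest commands (steps 3 and 7) and
    updating $PATH and $LD_LIBRARY_PATH at run time are left out.
    """
    name = name.lower()
    app_root = Path(base) / "apps" / name
    meta = app_root / "scif"
    folders = (app_root / "bin", app_root / "lib", meta, Path(base) / "data" / name)
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)  # keep existing folders
    for section, filename in METADATA_FILES.items():
        if section in sections:
            (meta / filename).write_text("\n".join(sections[section]) + "\n")
    if "applabels" in sections:
        # Assumption: each label line has the form "KEY VALUE".
        labels = dict(line.split(None, 1) for line in sections["applabels"])
        (meta / "labels.json").write_text(json.dumps(labels, indent=2))
    return app_root


# Demonstration in a temporary base instead of a real /scif.
demo_base = tempfile.mkdtemp()
demo_root = create_app(
    demo_base,
    "Foo",
    {"apprun": ['echo "hello"'], "applabels": ["MAINTAINER someone"]},
)
```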

Data

The base of /scif/data is structured akin to apps: each installed application has its own folder, and subfolders for inputs and outputs may optionally be created:

/scif/data
   /foo
      /input
      /output

SCIF does not enforce or state how the container creator should use the data folders, but rather encourages the creator to use the organization so that a user can intuitively know that any input for app foo might go into /scif/data/foo/input, general data for foo might be in /scif/data/foo, and global data for the entire container might be in /scif/data. For example, one scientific workflow mounts a host folder at /scif/data, and then each application creates its needed data folders at runtime simply by referencing an environment variable specific to the app.
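
A minimal Python sketch of that runtime pattern follows. It assumes the app reads its data root from SCIF_APPDATA (defined in the Active App Environment Namespace section of this document); the helper name app_data_dirs is hypothetical, and the demonstration uses a temporary folder in place of a mounted /scif/data.

```python
import os
import tempfile
from pathlib import Path


def app_data_dirs(environ=None):
    """Create and return the input and output folders for the active app."""
    environ = os.environ if environ is None else environ
    data_root = Path(environ["SCIF_APPDATA"])
    dirs = {name: data_root / name for name in ("input", "output")}
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs


# Demonstration with a stand-in environment for an app named foo.
demo_env = {"SCIF_APPDATA": str(Path(tempfile.mkdtemp()) / "foo")}
demo_dirs = app_data_dirs(demo_env)
```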

Environment Namespace

Variable           Default     Definition
SCIF_BASE          /scif       the root location for SCIF
SCIF_DATA          /scif/data  the root location for apps data
SCIF_APPS          /scif/apps  the root location for installed apps
SCIF_SHELL         /bin/bash   shell to use for “shell” command
SCIF_PYSHELL       ipython     interactive python shell for pyshell command
SCIF_ENTRYPOINT    /bin/bash   the command to run given no runscript or app defined
SCIF_ENTRYFOLDER   SCIF_BASE   the entry folder to run the entrypoint command
SCIF_MESSAGELEVEL  INFO        a client level of verbosity. Must be one of CRITICAL, ABORT, ERROR, WARNING, LOG, INFO, QUIET, VERBOSE, DEBUG

Table 1: During interaction and runtime of SCIF, the environment variables above must be defined.
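
One way a client might resolve these variables is sketched below in Python. This is only an illustration: the function name global_environment is hypothetical, and the choice to derive the data and apps roots from a custom SCIF_BASE is an assumption, since the table lists only the literal defaults.

```python
import os

MESSAGE_LEVELS = {
    "CRITICAL", "ABORT", "ERROR", "WARNING", "LOG",
    "INFO", "QUIET", "VERBOSE", "DEBUG",
}


def global_environment(environ=None):
    """Return the Table 1 variables, using defaults for any that are unset."""
    environ = os.environ if environ is None else environ
    base = environ.get("SCIF_BASE", "/scif")
    defaults = {
        "SCIF_BASE": base,
        # Assumption: the data and apps roots follow a custom SCIF_BASE.
        "SCIF_DATA": f"{base}/data",
        "SCIF_APPS": f"{base}/apps",
        "SCIF_SHELL": "/bin/bash",
        "SCIF_PYSHELL": "ipython",
        "SCIF_ENTRYPOINT": "/bin/bash",
        "SCIF_ENTRYFOLDER": base,
        "SCIF_MESSAGELEVEL": "INFO",
    }
    env = {key: environ.get(key, default) for key, default in defaults.items()}
    if env["SCIF_MESSAGELEVEL"] not in MESSAGE_LEVELS:
        raise ValueError(f"invalid SCIF_MESSAGELEVEL: {env['SCIF_MESSAGELEVEL']}")
    return env
```

Called with an empty mapping, the function returns exactly the defaults in the table; any variable already set in the environment takes precedence.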

Active App Environment Namespace

Variable        Default                                 Definition
SCIF_APPNAME    example                                 the active software app
SCIF_APPDATA    /scif/data/example                      the data root for the active software app
SCIF_APPROOT    /scif/apps/example                      the install root for the active software app
SCIF_APPBIN     /scif/apps/example/bin                  the app bin, which is automatically added to the path when active
SCIF_APPLIB     /scif/apps/example/lib                  the app lib, which is automatically added to LD_LIBRARY_PATH when active
SCIF_APPMETA    /scif/apps/example/scif                 the metadata folder
SCIF_APPHELP    /scif/apps/example/scif/runscript.help  a text file with help to print for the user to the terminal
SCIF_APPRUN     /scif/apps/example/scif/runscript       the commands to run as the app entrypoint
SCIF_APPSTART   /scif/apps/example/scif/startscript     the start script (if provided) for an app
SCIF_APPTEST    /scif/apps/example/scif/test            the commands to run to test the app
SCIF_APPLABELS  /scif/apps/example/scif/labels.json     a key:value json lookup dictionary of labels
SCIF_APPENV     /scif/apps/example/scif/environment.sh  a shell script to source for the software app environment

Table 2: During interaction with a specific app (e.g., example in the table above), these environment variables must be defined in addition to the global SCIF variables above.

With the convention above, any tool that interacts with SCIF could, for example, know that data is expected to be written to SCIF_APPDATA, or that a label (e.g., a parameter) can be read from the SCIF_APPLABELS, or that an application has its binaries at SCIF_APPBIN. The integration doesn’t need to know that example is running.
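
Since every value in Table 2 follows from the app name and the two roots, the whole set can be computed. The Python sketch below shows one possible way to do this; the function name active_app_environment is hypothetical and not part of the specification.

```python
def active_app_environment(name, apps_root="/scif/apps", data_root="/scif/data"):
    """Return the Table 2 variables for the active app as a dictionary."""
    root = f"{apps_root}/{name}"
    meta = f"{root}/scif"
    return {
        "SCIF_APPNAME": name,
        "SCIF_APPDATA": f"{data_root}/{name}",
        "SCIF_APPROOT": root,
        "SCIF_APPBIN": f"{root}/bin",
        "SCIF_APPLIB": f"{root}/lib",
        "SCIF_APPMETA": meta,
        "SCIF_APPHELP": f"{meta}/runscript.help",
        "SCIF_APPRUN": f"{meta}/runscript",
        "SCIF_APPSTART": f"{meta}/startscript",
        "SCIF_APPTEST": f"{meta}/test",
        "SCIF_APPLABELS": f"{meta}/labels.json",
        "SCIF_APPENV": f"{meta}/environment.sh",
    }
```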

Sleeping App Environment Namespace

Variable                Default                                 Definition
SCIF_APPNAME_sleeper    sleeper                                 the inactive software app
SCIF_APPDATA_sleeper    /scif/data/sleeper                      the data root for the inactive software app
SCIF_APPROOT_sleeper    /scif/apps/sleeper                      the install root for the inactive software app
SCIF_APPBIN_sleeper     /scif/apps/sleeper/bin                  the app bin, which is added to the path when the app is active
SCIF_APPLIB_sleeper     /scif/apps/sleeper/lib                  the app lib, which is added to LD_LIBRARY_PATH when the app is active
SCIF_APPMETA_sleeper    /scif/apps/sleeper/scif                 the metadata folder
SCIF_APPHELP_sleeper    /scif/apps/sleeper/scif/runscript.help  a text file with help to print for the user to the terminal
SCIF_APPRUN_sleeper     /scif/apps/sleeper/scif/runscript       the commands to run as the app entrypoint
SCIF_APPSTART_sleeper   /scif/apps/sleeper/scif/startscript     the start script (if provided) for the app
SCIF_APPTEST_sleeper    /scif/apps/sleeper/scif/test            the commands to run to test the app
SCIF_APPLABELS_sleeper  /scif/apps/sleeper/scif/labels.json     a key:value json lookup dictionary of labels
SCIF_APPENV_sleeper     /scif/apps/sleeper/scif/environment.sh  a shell script to source for the software app environment

Table 3: During interaction with a specific app, the other (inactive) apps must also reveal themselves through an equivalent set of environment variables specific to them. The table above describes the environment variables defined for such an app (e.g., sleeper). The variable names are the same as for the active app, with an underscore and the app name appended.

This strategy ensures that any application built to have applications interact (e.g., a container installed with several steps that internally work together) can easily reference another application within the same container without knowing the exact path. Further, it ensures that entire spaces of environment variables (e.g., all application help text files) can be revealed by way of discovering environment variables that match a particular pattern.
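
The pattern-based discovery described above might look like the following Python sketch. The function name discover and the sample environment are made up for illustration; only the variable naming scheme comes from this specification.

```python
import re


def discover(environ, variable):
    """Map app name -> value for every suffixed variable, e.g. SCIF_APPHELP_<app>."""
    pattern = re.compile(rf"^{re.escape(variable)}_(.+)$")
    found = {}
    for key, value in environ.items():
        match = pattern.match(key)
        if match:
            found[match.group(1)] = value
    return found


# Stand-in environment: example is active, sleeper is inactive.
demo_env = {
    "SCIF_APPHELP": "/scif/apps/example/scif/runscript.help",
    "SCIF_APPHELP_sleeper": "/scif/apps/sleeper/scif/runscript.help",
    "SCIF_APPRUN_sleeper": "/scif/apps/sleeper/scif/runscript",
    "PATH": "/usr/bin",
}
help_files = discover(demo_env, "SCIF_APPHELP")
```

Here help_files holds one entry per inactive app. The unsuffixed SCIF_APPHELP of the active app is not matched, because the pattern requires an underscore and an app name after the variable name.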

Additional Variables

Integrations may define additional variables, but these must begin with SCIF_ to remain in the scientific filesystem namespace. This also ensures that SCIF, when installed alongside other software and environments, can have its environment settings isolated by filtering on a common prefix.
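
A short Python sketch of both ideas, enforcing the prefix for a new variable and isolating the namespace by filtering, is given here under stated assumptions: both function names and the variable SCIF_MYTOOL_CACHE are hypothetical and exist only for this example.

```python
def add_integration_variable(env, name, value, prefix="SCIF_"):
    """Return a copy of env with one extra variable, enforcing the namespace."""
    if not name.startswith(prefix):
        raise ValueError(f"{name} must begin with {prefix}")
    updated = dict(env)
    updated[name] = value
    return updated


def isolate_namespace(environ, prefix="SCIF_"):
    """Keep only the variables that belong to the SCIF namespace."""
    return {key: value for key, value in environ.items() if key.startswith(prefix)}


# SCIF_MYTOOL_CACHE is a made-up integration variable.
demo_env = add_integration_variable(
    {"PATH": "/usr/bin", "SCIF_BASE": "/scif"}, "SCIF_MYTOOL_CACHE", "/tmp/cache"
)
scif_only = isolate_namespace(demo_env)
```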

Integration Requirements

The following sections detail requirements for integrations (third party software that uses SCIF) that are distinguished from clients (a controller or generator of a SCIF) because they don’t necessarily need to create a SCIF, but just interact with one.

Initialization

At the initialization step of an integration or client, the following must be true:

Client Requirements

A client, in that it is a more advanced integration, must also follow the integration requirements previously stated.

Recipe Parsing

A recipe parser should read chunks of text and return a data structure that includes a dictionary of app settings indexed by app name, where each app maps its section names to a list of the commands found in the file. The order in which apps are discovered in the recipe should be preserved. For example, if app “foo” has an install section that is discovered first, the install section of an app “bar” should be run after it. If an app “red” has an environment section that comes before an app “blue” install section, then because “red” is discovered first, the sections for “red” will be processed first, even if the install section for “red” comes after that of “blue”.

For example, consider the following chunk in a text file recipe sregistry.scif:

#######################################
# Singularity Registry
#######################################
%appenv registry
SREGISTRY_CLIENT=registry
export SREGISTRY_CLIENT
%apprun registry
exec /opt/conda/bin/sregistry "$@"

It would parse to:

{"apps":
    { "registry":
             { 'appenv': ['SREGISTRY_CLIENT=registry', 'export SREGISTRY_CLIENT'],
               'apprun': ['exec /opt/conda/bin/sregistry "$@"']}
    }
}

The integration can then add any other sections or parameters it needs at higher levels of this structure. For example, if an %install section is added, the result would be parsed to:

{"apps":
    { "registry":
             { 'appenv': ['SREGISTRY_CLIENT=registry', 'export SREGISTRY_CLIENT'],
               'apprun': ['exec /opt/conda/bin/sregistry "$@"']}
    },

 "install": { ... }
}
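
A parser that produces the "apps" part of this structure can be quite small. The Python sketch below is one possible approach and not the reference parser: the names SECTION and parse_recipe are hypothetical, it handles only app sections of the form %section appname, and it assumes that blank lines and lines starting with # can be skipped. Python dictionaries keep insertion order, so apps come out in the order they are discovered.

```python
import re

# A section header: %<section> <appname>
SECTION = re.compile(r"^%(\w+)\s+(\S+)\s*$")


def parse_recipe(text):
    """Parse recipe text into {"apps": {name: {section: [lines]}}}."""
    apps = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and # comment lines carry no commands.
        if not line or line.startswith("#"):
            continue
        match = SECTION.match(line)
        if match:
            section, name = match.groups()
            current = apps.setdefault(name, {}).setdefault(section, [])
        elif current is not None:
            current.append(line)
    return {"apps": apps}


recipe = """
#######################################
# Singularity Registry
#######################################
%appenv registry
SREGISTRY_CLIENT=registry
export SREGISTRY_CLIENT
%apprun registry
exec /opt/conda/bin/sregistry "$@"
"""
parsed = parse_recipe(recipe)
```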

During the parsing process, the following rules must be followed:

Creation

The creation step coincides with generation of a SCIF.

Introspection

Any integration or tool must reveal metadata, and SCIF content to its users. This means that:

Controller Requirements

Any integration or tool that is considered a client and implements SCIF must provide a controller that:

Commands

The controller must support, at a minimum, the following higher level commands. We use the controller name ctrl and suggested action names in the examples below: