Date: 2023-02-24 Minor Version: 1.1
Scientific Filesystem (SCIF)
Authors:
- Vanessa Sochat «vsochat@stanford.edu» (@vsoch)
- Satrajit Ghosh «satra@mit.edu» (@satra)
- Christopher Madan «christopher.madan@nottingham.ac.uk» (@cMadan)
- Pim Schravendijk «p.schravendijk@rijkzwaan.nl» (@pimpim)
- singularity@lbl.gov (Singularity Community)
Terminology
Term | Definition |
---|---|
SCIF | The Scientific Filesystem |
Host | The machine or container where the scientific filesystem is installed alongside but separate from the core operating system |
App | A scientific filesystem module that is installed onto the Host |
Data | A root location in the scientific filesystem exclusively for data inputs and outputs that belong to Apps |
Recipe | A text file with extension .scif with sections that define a SCIF |
Discoverability | An ability to programmatically uncover software and metadata |
Predictability | A means of interaction with software without having prior knowledge of its creation |
Client | A controller for a SCIF, either for a developer or a user |
Integration | A third party software or tool that understands the SCIF structure and interacts with all or some portion of it |
Container | A linux container technology that can be installed with a SCIF |
SC | The scientific community, including academics, researchers, and associated engineering and development staff |
ENV | Environment variables in the namespace of SCIF (prefixed with SCIF_) that drive interaction of a user agent (UA) |
UA | A user agent (controller) that interacts with environment variables (ENV) to act as an interface between the user and the scientific filesystem |
Objective
To define an organizational standard for the scientific community, a Scientific Filesystem (SCIF), that will make it easy for the scientific community (SC) to generate predictable and programmatically accessible scientific apps optimized for encapsulated, reproducible environments.
Goals
The Scientific Filesystem (SCIF) will:
- Enable a UA to write tools and software that work with a predictably structured SCIF.
- Enable the SC to generate discoverable and predictable SCIF that work easily with integrations.
- Define a set of environment variables with reasonable defaults that:
  - Can be referenced by other installed applications, ideal for integrations that foster inter-app functionality
  - Reliably determine locations for data, apps, and metadata.
- Define a recipe format that maps to a SCIF, and vice versa.
- Provide basic software to generate and manage SCIF.
  - A single entrypoint that directs to app entrypoints and environments
  - Preview a filesystem from a recipe file
  - Generate a filesystem from a recipe file
  - Generate a recipe file from a SCIF
  - Generate a build specification for a container using a recipe file
  - An API for developers to interact and control a SCIF
- In the context of modular, encapsulated container environment, provide reproducible scientific software that can be shared with confidence of discoverability.
Non-Goals
The Scientific Filesystem (SCIF) explicitly does not implement, define, or provide in v0.2:
- Packaging of an Integration of SCIF
- Instructions on how to install or compile software for apps, or preference for organization of data files (under /scif/data)
- Package management for Integrations, Apps, or the entirety of a SCIF
- A guarantee of quality for any specific section of a SCIF recipe generated by the SC. E.g.,
  - An install section for an App may not compile on the system, or otherwise have a bug
- Any specific mechanism for workflow management or control of inputs, outputs, or other parameters specific to the software apps.
- A mechanism for lifecycle management of any particular SCIF.
- A mechanism for authentication and authorization or management of file permissions.
- Support or integration with non-linux operating systems.
Solution Overview
This specification defines a filesystem structure and environment variable namespace that interact with functions to produce discoverable software applications. We focus on recommendations for a container technology or other encapsulated environment to implement a SCIF compatible (containerized) filesystem (or integrations thereof) along with controllers for it (client). A compatible client or integration must provide a means to generate and interact with the SCIF filesystem and environment namespace, along with documentation to developers and users.
Architecture
The Scientific Filesystem has a default root on the host that is intentionally separate from standard Linux folders, so that it does not interfere with folders that likely already exist (e.g., just /data or /apps). This is done by using a root $SCIF_BASE, defaulting to /scif:
/scif
    /apps
    /data
Apps
The base of /scif/apps is where software modules will live. It can be generated automatically from a scif recipe, and can be parsed to generate one. Clients and integrations that meet this specification can parse these directories programmatically without knowing specifics of the software apps installed. The structure of a complete app folder (foo) looks like the following:
/scif
    /apps
        /foo
            /bin
            /lib
            /scif
                runscript
                runscript.help
                labels.json
                environment.sh
- /scif is the root folder of the entire SCIF
- /scif/apps is the apps folder, where each subfolder is a single (modular) software app
- /scif/apps/foo is an example software app folder, which contains any files relevant or needed
- /scif/apps/foo/bin contains binaries for foo that should be on the $PATH
- /scif/apps/foo/lib contains libraries for foo that should be on the $LD_LIBRARY_PATH
- /scif/apps/foo/scif is foo’s metadata folder for labels, environment, runscript, and help
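Because the layout is fixed, a tool can enumerate apps and locate their metadata without knowing anything about the software itself. The following is a minimal sketch of that idea, not part of the specification; the function names `list_apps` and `app_layout` are hypothetical.

```python
from pathlib import Path


def list_apps(apps_root="/scif/apps"):
    """Return installed app names: one per subfolder of the apps root."""
    root = Path(apps_root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def app_layout(name, apps_root="/scif/apps"):
    """Return the expected folders and metadata files for one app."""
    root = Path(apps_root) / name
    meta = root / "scif"
    return {
        "root": root,
        "bin": root / "bin",
        "lib": root / "lib",
        "meta": meta,
        "runscript": meta / "runscript",
        "help": meta / "runscript.help",
        "labels": meta / "labels.json",
        "environment": meta / "environment.sh",
    }
```

Files under the metadata folder are optional in practice, so a caller would check that each path exists before reading it.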
Recipes
The scif recipe is the primary method by which a user from the SC can define and then generate a SCIF. The recipe consists of sections, where each section is delineated by a line that starts with a % and a section name, followed by the name of the software app that the section is relevant for.
%section foo
The recipe itself is a text file that must have extension .scif, and can serve as input to clients and integrations for SCIF.
Sections
- %appinstall corresponds to executing commands within the folder to install the application. The writer of the recipe should expect the commands to be executed in $SCIF_APPROOT, and thus write final outputs to $SCIF_APPBIN, which is located at $SCIF_APPROOT/bin.
- %apphelp is written as a file called runscript.help in the application’s metadata folder, where a client knows where to find it.
- %apprun is also written as a file called runscript in the application’s metadata folder, and again looked for when the user asks to run the software.
- %appstart is similar to run, but targets the startscript. In layman’s terms, a start would be the entrypoint for a service, and a run the default entrypoint.
- %applabels will write a labels.json in the application’s metadata folder, allowing for application specific labels.
- %appenv will write an environment file in the application’s metadata folder (environment.sh), allowing for definition of application specific environment variables. These variables are sourced when the application is active.
- %appfiles is a list of files to add from the host (or other location known to the integration or client) to the application root.
- %apptest will run tests specific to the application, with present working directory assumed to be the software module’s folder.
Specifically, this means that any software that implements the SCIF would do the following procedure to produce a SCIF from a recipe:
- Find any app section alongside a name (e.g., %appinstall foo) as indication of an application command.
- Parse the string following the section name (e.g., foo) as the name of the application, and create a folder for the app, in lowercase, under each of the roots /scif/apps and /scif/data if that folder doesn’t already exist.
- If an installation procedure is present (%appinstall foo), perform it to install dependencies relative to the app’s install folder.
- Create a metadata folder, scif, inside the application folder.
- For those defined, save the environment (%appenv foo), help (%apphelp foo), runscript (%apprun foo), and labels (%applabels foo) to the metadata folder.
- Generate a “bin” folder for foo, and automatically add it to the $PATH when foo is being used. Generate a “lib” folder for libraries, to also be added to the $LD_LIBRARY_PATH when foo is used.
- If the %apptest foo section is defined, tests are run on the newly installed app to ensure it works as expected.
This general list is hashed out in more detail in the following sections.
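The folder and metadata steps of the procedure above can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: `create_app` and `METADATA_FILES` are hypothetical names, the `sections` argument is assumed to be a dictionary of section name to list of lines, and the install, labels, and test steps are left out because they depend on how a client executes commands and serializes labels.

```python
from pathlib import Path

# Section name -> file name inside the app's metadata folder.
METADATA_FILES = {
    "appenv": "environment.sh",
    "apphelp": "runscript.help",
    "apprun": "runscript",
}


def create_app(name, sections, base="/scif"):
    """Create the folders for one app and save its metadata sections."""
    name = name.lower()  # app folders are created in lowercase
    app_root = Path(base) / "apps" / name
    data_root = Path(base) / "data" / name
    meta = app_root / "scif"
    # mkdir with exist_ok=True leaves folders that already exist untouched
    for folder in (app_root / "bin", app_root / "lib", meta, data_root):
        folder.mkdir(parents=True, exist_ok=True)
    # Only sections that were defined in the recipe are written
    for section, filename in METADATA_FILES.items():
        if section in sections:
            (meta / filename).write_text("\n".join(sections[section]) + "\n")
    return app_root
```

Passing a temporary directory as `base` makes it possible to preview the resulting filesystem without touching /scif.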
Data
The base of /scif/data is structured akin to apps: each installed application has its own folder, and additionally (but not required) subfolders are created for inputs and outputs:
/scif/data
    /foo
        /input
        /output
SCIF does not enforce or state how the container creator should use the data folders, but rather encourages the creator to use the organization so that a user can intuitively know that any input for app foo might go into /scif/data/foo/input, general data for foo might be in /scif/data/foo, and global data for the entire container might be in /scif/data. For example, a scientific workflow might mount a host folder at /scif/data, and then each application creates its needed data folders at runtime simply by referencing an environment variable specific to the app.
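As a rough sketch of that runtime pattern, an app could create its input and output folders from the SCIF_APPDATA variable alone. The helper name `ensure_data_dirs` is hypothetical, and the input/output subfolders follow the optional convention shown above.

```python
import os
from pathlib import Path


def ensure_data_dirs(environ=os.environ):
    """Create input/output folders under the active app's data root."""
    data_root = Path(environ["SCIF_APPDATA"])
    created = []
    for sub in ("input", "output"):
        folder = data_root / sub
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

The app never hard-codes /scif/data/foo, so the same code keeps working if the data root is mounted elsewhere.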
Environment Namespace
Variable | Default | Definition |
---|---|---|
SCIF_BASE | /scif | the root location for SCIF |
SCIF_DATA | /scif/data | the root location for apps data |
SCIF_APPS | /scif/apps | the root location for installed apps |
SCIF_SHELL | /bin/bash | shell to use for “shell” command |
SCIF_PYSHELL | ipython | interactive python shell for pyshell command |
SCIF_ENTRYPOINT | /bin/bash | the command to run given no runscript or app defined |
SCIF_ENTRYFOLDER | SCIF_BASE | the entry folder to run the entrypoint command |
SCIF_MESSAGELEVEL | INFO | a client level of verbosity. Must be one of CRITICAL , ABORT , ERROR , WARNING , LOG , INFO , QUIET , VERBOSE , DEBUG |
Table 1: During interaction and runtime of SCIF, the environment variables above must be defined.
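One way a client might resolve these variables is to read each from the environment and fall back to the default in Table 1. The sketch below is illustrative only: `global_namespace` is a hypothetical name, and deriving SCIF_DATA and SCIF_APPS from a customized SCIF_BASE is an assumption, since the table only states the defaults for the default base.

```python
import os

LEVELS = ("CRITICAL", "ABORT", "ERROR", "WARNING", "LOG",
          "INFO", "QUIET", "VERBOSE", "DEBUG")


def global_namespace(environ=None):
    """Resolve the global SCIF variables, applying the Table 1 defaults."""
    environ = os.environ if environ is None else environ
    base = environ.get("SCIF_BASE", "/scif")
    ns = {
        "SCIF_BASE": base,
        # Assumption: data and apps roots follow a customized base
        "SCIF_DATA": environ.get("SCIF_DATA", f"{base}/data"),
        "SCIF_APPS": environ.get("SCIF_APPS", f"{base}/apps"),
        "SCIF_SHELL": environ.get("SCIF_SHELL", "/bin/bash"),
        "SCIF_PYSHELL": environ.get("SCIF_PYSHELL", "ipython"),
        "SCIF_ENTRYPOINT": environ.get("SCIF_ENTRYPOINT", "/bin/bash"),
        "SCIF_ENTRYFOLDER": environ.get("SCIF_ENTRYFOLDER", base),
        "SCIF_MESSAGELEVEL": environ.get("SCIF_MESSAGELEVEL", "INFO"),
    }
    if ns["SCIF_MESSAGELEVEL"] not in LEVELS:
        raise ValueError(f"invalid SCIF_MESSAGELEVEL: {ns['SCIF_MESSAGELEVEL']}")
    return ns
```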
Active App Environment Namespace
Variable | Default | Definition |
---|---|---|
SCIF_APPNAME | example | the active software app |
SCIF_APPDATA | /scif/data/example | the data root for the active software app |
SCIF_APPROOT | /scif/apps/example | the install root for the active software app |
SCIF_APPBIN | /scif/apps/example/bin | the app bin, which is automatically added to the path when active |
SCIF_APPLIB | /scif/apps/example/lib | the app lib, which is automatically added to the $LD_LIBRARY_PATH when active |
SCIF_APPMETA | /scif/apps/example/scif | the metadata folder |
SCIF_APPHELP | /scif/apps/example/scif/runscript.help | a text file with help to print for the user to the terminal |
SCIF_APPRUN | /scif/apps/example/scif/runscript | the commands to run as the app entrypoint |
SCIF_APPSTART | /scif/apps/example/scif/startscript | the start script (if provided) for an app |
SCIF_APPTEST | /scif/apps/example/scif/test | the commands to run to test the app |
SCIF_APPLABELS | /scif/apps/example/scif/labels.json | a key:value json lookup dictionary of labels |
SCIF_APPENV | /scif/apps/example/scif/environment.sh | a shell script to source for the software app environment |
Table 2: During interaction with a specific app (e.g., example) in the table above, these environment variables must be defined in addition to the global SCIF variables above.
With the convention above, any tool that interacts with SCIF could, for example, know that data is expected to be written to SCIF_APPDATA, or that a label (e.g., a parameter) can be read from the SCIF_APPLABELS, or that an application has its binaries at SCIF_APPBIN. The integration doesn’t need to know that example is running.
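Since every value in Table 2 follows from the app name and the base, the active namespace can be generated rather than stored. A minimal sketch, with the hypothetical function name `app_namespace`:

```python
def app_namespace(name, base="/scif"):
    """Build the Table 2 variables for the active app `name`."""
    root = f"{base}/apps/{name}"
    meta = f"{root}/scif"
    return {
        "SCIF_APPNAME": name,
        "SCIF_APPDATA": f"{base}/data/{name}",
        "SCIF_APPROOT": root,
        "SCIF_APPBIN": f"{root}/bin",
        "SCIF_APPLIB": f"{root}/lib",
        "SCIF_APPMETA": meta,
        "SCIF_APPHELP": f"{meta}/runscript.help",
        "SCIF_APPRUN": f"{meta}/runscript",
        "SCIF_APPSTART": f"{meta}/startscript",
        "SCIF_APPTEST": f"{meta}/test",
        "SCIF_APPLABELS": f"{meta}/labels.json",
        "SCIF_APPENV": f"{meta}/environment.sh",
    }
```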
Sleeping App Environment Namespace
Variable | Default | Definition |
---|---|---|
SCIF_APPNAME_sleeper | sleeper | the inactive software app |
SCIF_APPDATA_sleeper | /scif/data/sleeper | the data root for the inactive software app |
SCIF_APPROOT_sleeper | /scif/apps/sleeper | the install root for the inactive software app |
SCIF_APPBIN_sleeper | /scif/apps/sleeper/bin | the app bin, which is automatically added to the $PATH when active |
SCIF_APPLIB_sleeper | /scif/apps/sleeper/lib | the app lib, which is automatically added to the $LD_LIBRARY_PATH when active |
SCIF_APPMETA_sleeper | /scif/apps/sleeper/scif | the metadata folder |
SCIF_APPHELP_sleeper | /scif/apps/sleeper/scif/runscript.help | a text file with help to print for the user to the terminal |
SCIF_APPRUN_sleeper | /scif/apps/sleeper/scif/runscript | the commands to run as the app entrypoint |
SCIF_APPSTART_sleeper | /scif/apps/sleeper/scif/startscript | startscript for the app |
SCIF_APPTEST_sleeper | /scif/apps/sleeper/scif/test | the commands to run to test the app |
SCIF_APPLABELS_sleeper | /scif/apps/sleeper/scif/labels.json | a key:value json lookup dictionary of labels |
SCIF_APPENV_sleeper | /scif/apps/sleeper/scif/environment.sh | a shell script to source for the software app environment |
Table 3: During interaction with a specific app, the other apps (not active) must also reveal themselves through an equivalent set of environment variables specific to them. The table above describes the environment variables defined for such an app (e.g., sleeper). The variables are equivalent to those of the active app, except that the app name is appended at the end.
This strategy ensures that any application built to have applications interact (e.g., a container installed with several steps that internally work together) can easily reference another application within the same container without knowing the exact path. Further, it ensures that entire spaces of environment variables (e.g., all application help text files) can be revealed by way of discovering environment variables that match a particular pattern.
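Both ideas, the name suffix and discovery by pattern, can be sketched in a few lines. The function names are hypothetical, and the example assumes app names contain no characters that are invalid in environment variable names.

```python
import re


def with_suffix(namespace, name):
    """Append the app name to each variable, as done for inactive apps."""
    return {f"{key}_{name}": value for key, value in namespace.items()}


def sleeping_help_files(environ):
    """Map app name -> help file by matching SCIF_APPHELP_<name> variables."""
    pattern = re.compile(r"^SCIF_APPHELP_(.+)$")
    found = {}
    for key, value in environ.items():
        match = pattern.match(key)
        if match:
            found[match.group(1)] = value
    return found
```

The same pattern match works for any other variable family, for example SCIF_APPRUN_ to collect every runscript.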
Additional Variables
Additional variables are appropriate for integrations to define, but must begin with SCIF_ to remain in the scientific filesystem namespace. This also ensures that SCIF, when installed alongside other software and environments, can have its environment settings isolated by filtering on a common prefix.
Integration Requirements
The following sections detail requirements for integrations (third party software that uses SCIF) that are distinguished from clients (a controller or generator of a SCIF) because they don’t necessarily need to create a SCIF, but just interact with one.
Initialization
At the initialization step of an integration or client, the following must be true:
- the user’s level of verbosity is respected
- the global variables SCIF_DATA and SCIF_APPS are found in the environment, and if not found, set to the defaults. These bases should be used to discover installed apps, and their corresponding data and metadata files according to the environment variables specified previously.
- if the user is calling a function to make changes to the filesystem, a check that the user has permission to write is performed first.
Client Requirements
A client, in that it is a more advanced integration, must also follow the integration requirements previously stated.
Recipe Parsing
A parser of a recipe should be able to read chunks of text and return a data structure that includes a dictionary of app settings, each indexed by the app name, and the next level having sections, each a list of the commands from the file. The ordering that the sections are presented, in terms of apps that are discovered first, should be preserved. For example, if app “foo” has a section for an install that is discovered first, an app “bar” with an install section should be run after. If an app “red” has an environment section that comes before an app “blue” install section, by way of being discovered first in the recipe, the sections for “red” will be processed first, even if the install section for red came after blue’s.
- comments should be removed
- illegal characters or uppercase letters in app names are not allowed
For example, the following chunk in a text file recipe sregistry.scif
#######################################
# Singularity Registry
#######################################
%appenv registry
    SREGISTRY_CLIENT=registry
    export SREGISTRY_CLIENT
%apprun registry
    exec /opt/conda/bin/sregistry "$@"
would parse to
{"apps":
    { "registry":
        { 'appenv': ['SREGISTRY_CLIENT=registry', 'export SREGISTRY_CLIENT'],
          'apprun': ['exec /opt/conda/bin/sregistry "$@"']}
    }
}
And then the integration can add other sections or parameters that are needed at higher level indices here. For example, if an %install section is added, it would be parsed to:
{"apps":
    { "registry":
        { 'appenv': ['SREGISTRY_CLIENT=registry', 'export SREGISTRY_CLIENT'],
          'apprun': ['exec /opt/conda/bin/sregistry "$@"']}
    },
 "install": { ... }
}
During the parsing process, the following rules must be followed:
- Finding the section %appinstall, %apphelp, or %apprun is indication of an application command. Not shown but also relevant are %applabels, %appfiles, and %apptest.
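The parsing rules above can be illustrated with a small line-based parser. This is a sketch under stated assumptions, not the reference parser: `parse_recipe` is a hypothetical name, the set of legal app name characters in `APP_NAME` is assumed, only app sections are handled, and any line starting with # is treated as a comment (which would also drop a shebang line inside a runscript).

```python
import re

SECTION = re.compile(r"^%(app[a-z]+)\s+(\S+)\s*$")
APP_NAME = re.compile(r"^[a-z0-9._-]+$")  # assumed set of legal characters


def parse_recipe(text):
    """Parse recipe text into {"apps": {name: {section: [lines]}}}."""
    apps = {}  # dicts keep insertion order, so discovery order is preserved
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # drop blank lines and comments
        match = SECTION.match(line)
        if match:
            section, name = match.groups()
            if not APP_NAME.match(name):
                raise ValueError(f"illegal app name: {name!r}")
            current = apps.setdefault(name, {}).setdefault(section, [])
        elif current is not None:
            current.append(line)
    return {"apps": apps}
```

An integration that needs higher level sections would add further keys next to "apps" in the returned dictionary.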
Creation
The creation step coincides with generation of a SCIF.
- initialization must come first, and the requirements maintained from there.
- the SCIF filesystem must be created from the recipe, with exit on any error and message to the user.
- Given the parsing of the name of a software app (e.g., foo) from a recipe, a folder must be created, in lowercase, under /scif/apps if it doesn’t exist. A metadata folder, scif, is generated inside the application folder. The metadata folder itself could be thought of as another level of scif applications within the application, if interesting or relevant.
- A “bin” folder is automatically generated for foo, and will be automatically added to $PATH when foo is being used. A “lib” folder is also generated, and is added to $LD_LIBRARY_PATH when foo is used.
Introspection
Any integration or tool must reveal metadata, and SCIF content to its users. This means that:
- a preview of a SCIF recipe shows how different sections of a recipe file map to a filesystem
- an installed SCIF must reveal its applications and the application metadata without any prior knowledge other than running the tool.
- The user must be able to specifically ask to see the following (for a particular or all apps):
  - labels
  - runscripts
  - help files
  - environments
  - original relevant sections in the recipe
Controller Requirements
Any integration or tool that is considered a client and implements SCIF must provide a controller that:
- interacts with a SCIF or a recipe describing a SCIF
- allows the user to run the client or integration in context of the app (or not)
- when an app is active, adds the app’s bin to the $PATH, and its lib to the $LD_LIBRARY_PATH.
- has a reasonable fall back to a default (e.g., executing a container for an app without a runscript might launch an interactive shell)
- optionally, has an ability to validate the correctness of a recipe or structure of a SCIF
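Activating an app amounts to a small change to the environment. The sketch below shows one possible approach; `activate` is a hypothetical name, and prepending (rather than appending) the app folders is an assumption, since the requirement only says they are added.

```python
import os


def activate(name, environ, apps_root="/scif/apps"):
    """Return a copy of environ with the app's bin and lib prepended."""
    env = dict(environ)  # leave the caller's environment untouched
    app_root = f"{apps_root}/{name}"
    for variable, folder in (("PATH", "bin"), ("LD_LIBRARY_PATH", "lib")):
        entry = f"{app_root}/{folder}"
        existing = env.get(variable, "")
        env[variable] = entry + os.pathsep + existing if existing else entry
    return env
```

Returning a copy means the controller can run an app in its own context and then continue with the original environment, which covers the "or not" case.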
Commands
The minimal set of functions for the controller must support the following higher level commands. We use the controller name ctrl and suggested action names in the examples below:
- listing: the UA must, with one command (e.g., “apps” or “list” or “apps.list” are recommended but not enforced), show all the software apps installed in a SCIF. E.g., ctrl apps or ctrl apps <container>.
- run: the UA must provide a means for the user to execute the entrypoint for a chosen app (the apprun section from the recipe), e.g., ctrl run example. For a container, running the container in context of the app should direct to some central entrypoint that then can forward to the targeted app. If no specific app is provided, the target should end in a reasonable default.
- start: akin to “run,” the UA can optionally provide a means for the user to execute the start script for a chosen app (the appstart section from the recipe), e.g., ctrl start example. Since start is optional, the command can be provided from a subcommand or not provided at all.
- test: if an app has an apptest section defined, the UA must run the tests for the targeted app when the user specifies a test command.
- inspect: if an app has any metadata (labels, environment, help) coinciding with the same sections in the recipe prefixed with app, then the integration must provide some kind of inspect command.
- shell: if applicable (e.g., containers) provide an entrypoint to an interactive shell where software app of choice is active (or none are active).
- help: if an apphelp section is defined, a command must be exposed to show help for a targeted app. If no help section is provided, the software should alert the user and show the files provided for inspection or some other reasonable default.
- labels: if an applabels section is defined, a command must be exposed to show labels for a targeted app.
- environment: if an appenv section is defined, a command must be exposed to show the environment for a targeted app.
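The fallback behavior of the run command can be sketched as a function that picks what to execute. The name `resolve_run` is hypothetical, and invoking the runscript through /bin/sh is an assumption made only to keep the example concrete; the default entrypoint matches SCIF_ENTRYPOINT.

```python
import os


def resolve_run(app=None, apps_root="/scif/apps", entrypoint="/bin/bash"):
    """Return the command list for `ctrl run [app]`."""
    if app is not None:
        runscript = os.path.join(apps_root, app, "scif", "runscript")
        if os.path.isfile(runscript):
            # Assumption: the runscript is executed with /bin/sh
            return ["/bin/sh", runscript]
    # No app given, or the app has no runscript: use the default entrypoint
    return [entrypoint]
```

The returned list could be handed to a process launcher after the app's environment has been set up.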