Date: 2018-01-08
Scientific Filesystem
Here we present the Scientific Filesystem (SCIF), an organizational format that supports exposure of executables and metadata for discoverability. The format includes a known filesystem structure, a definition for a set of environment variables describing it, and functions for generation of the variables and interaction with the libraries, metadata, and executables located within. Some quick resources:
- quick start or really quick start tutorials to understand the format.
- Scientific Filesystem Container Builder: a build -> test -> deploy template to quickly turn your analysis scripts into a reproducible container.
How does SCIF relate to containers?
SCIF is not exclusively for containers, but because a container provides an encapsulated, reproducible environment, the scientific filesystem works optimally when contained. Containers traditionally have one entrypoint, one environment context, and one set of labels to describe them. A container created with a Scientific Filesystem can expose multiple entrypoints, each of which includes its own environment, metadata, installation steps, tests, files, and a primary executable script. SCIF thus brings internal modularity and programmatic accessibility to encapsulated, reproducible environments.
What will I learn reading this?
We will start by reviewing the background and rationale for a scientific organizational format, and how SCIF achieves the goals of modularity, transparency, and consistency. We then review the organizational structure of the standard, and the different levels of internal modules that it affords. For this work, we provide several tutorials to demonstrate using the scientific filesystem with Docker and Singularity, and additionally have implemented and released the organizational format as a native integration with the Singularity software. Finally, we discuss use cases for SCIF in context of containers, including how SCIF can be used to evaluate software, provide metrics, serve scientific workflows, and execute a primary function under different contexts. To encourage collaboration and sharing of apps, we have developed an open source, version controlled, tested, and programmatically accessible web infrastructure at https://sci-f.github.io/apps. For developers, we provide a getting started guide for integration of SCIF into other container technologies or contexts. The ease of using SCIF to develop scientific containers offers promise for scientists to easily generate self-documenting containers that are programmatically parseable, exposing software and associated metadata, environments, and files to be quickly found and used.
Getting Started
- Tutorials: are a good place to start if you are a user or developer. These pages will show you how you generate and interact with a scientific filesystem. Or jump right in to a quick start or really quick start.
- Goals: here we review how SCIF allows for internal modularity and consistency, transparency, and reproducible practices.
- Examples: Whether you are a system admin, a developer, or a research scientist, why would you want to use SCIF anyway?
- Specification: reviews the current specification for SCIF.
Resources
- Community: community resources including APIs, version control and testing, and open source forums for tracking issues and discussion related to SCIF and scientific filesystem apps.
- Python Client: The SCIF client is the quickest start to creating and using scientific filesystems. Choose this client if you want interactivity, or are more comfortable with Python.
- GoLang Client: is the equivalent GoLang library, intended for integration with other GoLang libraries.
- Builder: the Scientific Filesystem Container Builder.
We have provided several examples and tutorials for getting started with SCIF. If you have a workflow or container that you’d like to see added, please reach out. If you would like to see other ways to contribute, here are some suggestions. This work will remain open for contributions, and early contributions will be represented in an official submission.
Citation
If SCIF has been useful to you, please cite our work on GigaScience!
Vanessa Sochat; The Scientific Filesystem (SCIF), GigaScience, giy023,
https://doi.org/10.1093/gigascience/giy023
Philosophy
The philosophy behind the scientific filesystem is that we are empowered to better use software when clear entrypoints are defined, predictable, and discoverable.
- discoverability means that we have knowledge that software exists
- predictability is a means to interact with software without having prior knowledge of its creation
At its core, the scientific filesystem is a simple description of how to organize software and metadata on a filesystem for discoverability. This description encompasses a filesystem structure to ensure that scientific software is distinct from standard software on the host, and is interacted with by way of a set of environment variables and functions to expose the structure to the user. In that containers provide encapsulated, reproducible environments, SCIF works optimally when installed and used within a container. With SCIF you can:
- write recipes that define environments, metadata, executables, and dependencies for a grouping of scientific software
- install the recipe into a container
- programmatically and easily interact with the scientific software
- share your containers or recipes with confidence of discoverability
A container that is found to have a SCIF filesystem, without any additional definition, has a known method of interaction without any special knowledge. We will discuss this in more detail in the following sections.
What SCIF is not:
SCIF handles the step that comes before any kind of workflow manager is used, and is best used alongside some kind of container technology. It works with, and does not replace, either of those things.
- a workflow manager: The scientific filesystem does not handle the details of connecting inputs and outputs, or typical functions we associate with workflow managers. SCIF, in that it gives structure to executables and content for these managers, adds to this ecosystem, but is agnostic to the particulars of user interaction with any particular software.
- a dependency manager: SCIF is agnostic to where you are using it. For reproducible software modules, especially those that require dependencies on top of the system, it is strongly recommended to use a container technology, and we provide numerous tutorials and examples for doing this. For those that prefer working without a container, SCIF works just as easily on a host machine as it does in a Docker or Singularity container, and in fact, many of the same functions can be run in a preview mode to assess functionality.
Goals
For the official description of goals, see the specification. In the following sections, we hope to show that SCIF is useful because it allows for:
- flexible, internal modularity where the definition of modularity is entirely based on the needs of the creator and user, and the resulting container reflects that.
- reproducible practices by way of providing portable environments with modular internal contents that are easily discovered.
While SCIF is not a workflow manager, it follows naturally that the creator of a SCIF app might use these internal variables to have modules internally talk to one another. The user and creator do not need to know the structural specifics of the standard, but only how to reference them.
Modules
Modularity can be understood as the level of dimensionality that a user is instructed to operate, and for the purposes of this discussion we will suggest three general levels.
- Node: For those familiar with container technology, it is commonly the case that an entire container is considered a module. An example is a container that performs the task of variant calling. If the container itself is considered the module, the user would expect to provide raw data inputs, and receive final results as an output. The container acts as a node that plugs into higher level orchestration tools. The node representation is ideal if the container is expected to plug into a workflow manager and perform one task.
- Internal: A second common scenario might be a single container that holds executables to perform different steps of a pipeline, perhaps so that the researcher can use the same container to run multiple steps, or perform any number of steps in parallel. This container would come with multiple internal modules, each performing a series of commands for one step in the pipeline (e.g., the step “mapping” uses internal commands from software bwa and samtools). The user doesn’t need to know the specifics of the steps, but how to call them. We call this level “internal modules” because without any formal structure for the contents of containers, they are hidden, internal executables that must be found or described manually.
- Development: Containers can also serve modules that are represented at the ideal level for development. For this example, instead of providing the container as a node, or actions inside like “mapping”, the smallest units of software are exposed, such as the executables bwa and samtools. It would be likely that a researcher developing a scientific pipeline would find this useful.
Given the different needs briefly explained above, it is clear that there is no correct level of dimensionality to define a module.
The definition of modularity is entirely based on the needs of the creator and user.
If we discover a container after creation, it cannot be clear without suitable documentation what level is represented, or how to interact with the container. What is needed is an ability for the creator of a container to implicitly define this level of usage simply by way of creating the container. SCIF allows us to do this. We can define modules on the levels of single files, or groups of software to perform a task. The metadata and organization of our preferences is automatically generated to create a complete, and programmatically understandable software package or scientific analysis.
Reproducible Practices
We have just discussed why internal modularity is important for container interaction; it is also useful for reproducibility. It is important to distinguish the entire container as a reproducible product from the different software modules inside of it, which depend on being served through the container to ensure reproducibility. While the container itself is portable, and designed to contain all dependencies to support reproducibility, a SCIF module in and of itself is not guaranteed to be. For example, a user might define a module with only an %apprun section, implying that the folder only contains a runscript to execute. The user may have chosen to install dependencies for this script globally in the container, in the %post section, perhaps because they are shared across multiple modules. Under these conditions, if another user wanted to add the module to a different build recipe, the dependencies from %post would be needed too. The host operating system also needs to be taken into consideration. A module with dependencies installed from the package manager “yum” would not move seamlessly into a Debian base. However, appropriate checks and balances can be implemented into the process of moving applications:
- For applications that must be portable outside of their initial container, users would be encouraged to include all dependency installs within the %appinstall section. If they were already installed during %post, the package would be found and skipped.
- Installing an application into a container would check for OS compatibility. This can be done automatically by storing information about the base OS with each application as a label. To encourage this practice, we have added a test and a requirement to specify one or more operating systems for any module contributed at https://sci-f.github.io/apps.
With these checks, we can have some confidence that the recipes for generating the apps are maximally portable.
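The OS compatibility check described above could look something like the following minimal sketch. The label key "os" and the function name is_compatible are assumptions made for illustration; they are not taken from the SCIF specification or client.

```python
def is_compatible(app_labels, base_os):
    """Return True if the app declares support for the container's base OS.

    app_labels is a plain dict of labels stored with the app; the "os" key
    is a hypothetical name chosen for this sketch.
    """
    supported = {name.lower() for name in app_labels.get("os", [])}
    return base_os.lower() in supported


# Hypothetical labels for an app whose install step relies on yum.
labels = {"os": ["centos", "fedora"]}

for base in ("centos", "debian"):
    verdict = "ok to install" if is_compatible(labels, base) else "incompatible base"
    print(f"{base}: {verdict}")
```

An app with no operating system label is treated as incompatible here, which mirrors the requirement that contributed modules declare at least one operating system.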
Modular internal contents combined with reproducible portable environments via Singularity containers is a starting point for practicing good science.
Getting Started
We provide several tutorials for getting started with the scientific filesystem. If you haven’t read about the goals or specification for the standard (and are interested) you might do that first.
Quick Starts
If you want to quickly familiarize yourself with a pre-built example scientific filesystem, check out our container quickstart or really quick start to pull a pre-built Docker and/or Singularity image and start using it! Or if you are more visual, just watch an asciinema:
Preview and Install
Here we will start with a scientific filesystem recipe, preview what it produces, and then install a scientific filesystem. This is likely the most common use case. See preview and install.
Running Commands
The scientific filesystem comes with the expected commands to run, exec, inspect, and start a shell or an interactive python shell, along with a new syntax to easily distinguish an environment variable (introducing the beautiful [e] to say that [e]MYVAR = $MYVAR). Want to give it a spin? Read more about how to run commands.
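To make the [e] syntax concrete, here is a small sketch of how such markers could be rewritten into ordinary variable references before a command is handed to a shell. This is an illustration only, not the scif client's implementation; the function name expand_markers and the pattern for valid variable names are assumptions.

```python
import re

# Matches [e] followed by a shell-style variable name (assumed naming rule).
_MARKER = re.compile(r"\[e\]([A-Za-z_][A-Za-z0-9_]*)")


def expand_markers(args):
    """Rewrite each [e]NAME occurrence in a list of arguments as $NAME."""
    return [_MARKER.sub(r"$\1", arg) for arg in args]


print(expand_markers(["echo", "[e]MYVAR"]))
```

The benefit of a marker like this is that the calling shell leaves [e]MYVAR untouched, so the variable can be resolved later, in the environment where the command actually runs.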
This document reviews the current specification for the Scientific Filesystem. For the actual specification documents, including older versions, see the spec folder or the full spec. As stated in the introduction, the scientific filesystem is optimized to provide tools to generate predictable and discoverable scientific containers. The description here covers how to organize software and metadata toward this goal, and encompasses three components:
Read the Specification Document
1. Filesystem Structure
Filesystem structure refers to the structure and organization of the filesystem on a host operating system. Importantly, a scientific filesystem must live alongside a standard operating system, but maintain the core of content outside of it. The filesystem can exist in absence of any environment variables or functions to control it, and in that it is organized predictably, the static folder and file organization has embedded with it knowledge about its usage. We can derive a filesystem structure from a SCIF recipe and in turn, we can derive a recipe from a filesystem.
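The round trip between a recipe and a structure can be sketched in a few lines. The sketch below assumes a recipe is plain text in which a line such as %apprun hello-world opens a section for the app hello-world; the helper names parse_recipe, dump_recipe, and app_roots are hypothetical, and the real client handles many details that are ignored here.

```python
def parse_recipe(text):
    """Group recipe lines as {app name: {section name: [lines]}}."""
    apps = {}
    body = None  # list collecting the lines of the section being read
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("%app"):
            parts = stripped.split(maxsplit=1)
            if len(parts) != 2:
                raise ValueError(f"section header needs an app name: {stripped!r}")
            section = parts[0][len("%app"):]  # "%apprun" becomes "run"
            body = apps.setdefault(parts[1].strip(), {}).setdefault(section, [])
        elif stripped and body is not None:
            body.append(stripped)
    return apps


def dump_recipe(apps):
    """Write the parsed structure back out as recipe text."""
    lines = []
    for name, sections in apps.items():
        for section, body in sections.items():
            lines.append(f"%app{section} {name}")
            lines.extend(f"    {entry}" for entry in body)
    return "\n".join(lines)


def app_roots(apps, base="/scif"):
    """List the folder each app would occupy under the SCIF base."""
    return [f"{base}/apps/{name}" for name in sorted(apps)]


# Illustrative recipe with two apps.
RECIPE = """
%apprun hello-world-echo
    echo "Hello World!"
%appinstall hello-world-script
    echo "echo Hello World!" > hello-world.sh
%apprun hello-world-script
    /bin/bash hello-world.sh
"""

apps = parse_recipe(RECIPE)
print(app_roots(apps))
print(dump_recipe(apps))
```

Because the parsed structure keeps every section and its lines, writing it back out and parsing it again yields the same structure, which is the "full circle" property in miniature.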
2. Environment Variables
The environment is the means by which a scientific filesystem is interacted with. Upon installation or interaction, these variables are defined to drive further operations. For example, when you create a scif container, the variables used to generate the filesystem are maintained with the container for subsequent uses. If the user does not have a preference, a set of meaningful defaults is used, making general use of scif quick and easy.
3. Software and Tools
The Scientific Filesystem provides software (a command line utility called scif) that can be installed into containers, and can then generate and serve as a controller for the entire scientific filesystem when the container is built. See our examples for how this works.
While there are many conventions and tools for organization and control of environments and software, the scientific filesystem is unique in its ease of use and optimized integration with reproducible container technology, and focus on scientific containers.
Structure
We will start with a review of traditional file organization on a Linux machine, and explain the rationale for the organization of SCIF.
Traditional File Organization
File organization is likely to vary a bit based on the host OS, but arguably most Linux flavor operating systems can be said to be similar to the Filesystem Hierarchy Standard (FHS). For this discussion, we will disregard the inclusion of package managers, symbolic links, and custom structures, and focus on the core of FHS. We will discuss these locations in the context of how they do (or should) relate to a scientific container. It was an assessment of this current internal standard that led to the original development of SCIF.
Do Not Touch
Arguably, the following folders should not be touched by scientific software:
- /boot: boot loader, kernel files
- /bin: system-wide command binaries (essential for OS)
- /etc: host-wide configuration files
- /lib: again, system level libraries
- /root: root’s home. Unless you are using Docker, putting things here leads to trouble.
- /sbin: system specific binaries
- /sys: system, devices, kernel features
While these locations likely have libraries and functions needed by the host to support software, it should not be the case that a scientist installs his or her software under any of these locations. It would not be easy or intuitive to find or untangle it from what is already provided by the host.
Variable and Working Locations
The following locations are considered working directories in that they hold variables defined at runtime, or intermediate files that are expected to be purged at some point:
- /run: run time variables, should only be used for that, during running of programs.
- /tmp: temporary location for users and programs to dump things.
- /home: can be considered the user’s working space. Singularity mounts by default, so nothing would be valued there. The same is true for..
For example, in the context of a container, it is common practice (at least in the case of Singularity) to mount the user’s /home. Thus, if a scientist installed his or her software there, the user would not be able to see it unless this default was changed. For these reasons, it is not advisable to assume stability in putting software in these locations.
Connections
Connections for containers are devices and mount points. A container will arguably always need to be able to support mount points that might be necessary from its host, so it would be important for a scientific container to not put valuables in these locations.
- /dev: essential devices
- /mnt: temporary mounts.
- /srv: is for “site specific data” served by the system. This might be a logical mount for cluster resources.
- /proc: connections between processes and resources and hardware information
SCIF File Organization
The Scientific Filesystem defines a root base (/scif) that can be known and consistently mounted across research clusters. The location was chosen to be independent of any locations on traditional Linux filesystems for the sole purpose of avoiding conflicts. Using this namespace, SCIF has a set of defaults that are paired with an environment namespace to make it easy to find content. For example, if you were to discover an app named hello-world you would have confidence about its location under /scif/apps/hello-world ($SCIF_APPROOT).
Along with these locations and environment variables, SCIF is entirely a set of rules about how container software installs, organizes, and exposes software modules. For the specifics of SCIF we again direct the reader to the specification. Here we will continue with discussion of its development and rationale, starting with a review of some basic background about Linux filesystems.
Apps
Software modules in the context of SCIF are called “apps,” and the base of /scif/apps is where these apps will live. To read more about apps, read the Specification Document.
Data
The base of /scif/data is structured akin to apps: each installed application has its own folder, and additionally a subfolder is created for inputs and outputs:
/scif/data
    /foo
        /input
        /output
To read more about data, read the Specification Document.
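Because the layout is predictable, the locations for any app follow from its name alone. The sketch below computes them with pathlib; the function name app_layout and the dictionary keys are illustrative choices, not part of the specification.

```python
from pathlib import PurePosixPath

SCIF_BASE = PurePosixPath("/scif")


def app_layout(name, base=SCIF_BASE):
    """Return the predictable locations for an app, derived from its name."""
    data = base / "data" / name
    return {
        "approot": base / "apps" / name,
        "data": data,
        "input": data / "input",
        "output": data / "output",
    }


for label, path in app_layout("foo").items():
    print(f"{label:8} {path}")
```

PurePosixPath is used so the paths are computed without touching the disk, which makes the same function usable on a host that has no /scif at all.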
Example Use Cases
SCIF is powerful in that it supports multiple general use cases for scientific and systems evaluation and high level introspection. These use cases broadly fall in the areas of providing modular software, systems and metric evaluation, and guided collaboration to answer a scientific question. For all use cases, the recommendation is to use a contained environment (e.g., Singularity) for maximum reproducibility of the work.
Quick Examples
You might find SCIF useful if you are:
A Research Scientist
- want to package multiple environments or software modules to publish alongside a paper. A SCIF app might coincide with a particular step in your pipeline, and the entire analysis is run with a few calls to scientific filesystem entrypoints. You can share the scientific filesystem recipe for others to create, or (better) share a reproducible container built with it (that you used to conduct your analyses).
- want to provide modular tools for your lab or other scientists. A SCIF recipe or container with a filesystem could serve different environments for tools for your domain of interest.
- want to easily expose different interactive environments. For example, SCIF could be used to expose the same python virtual environment, but exporting different variables to the environment to determine the machine learning backend to use.
An Administrator
- want to provide users with small apps that perform a function (for a host or container). We can provide this example in the context of tests. A SCIF app could be a test that perhaps has its own environment and executables, and is easily accessed by your users by interacting with the filesystem.
- want to assess metrics. For example, if any SCIF app can discover all the other installed apps (that perhaps are domain specific functions or processing steps) we can assess each for a metric of interest. Since the environments, raw files, and runscripts are available, they can be discovered and used as desired. A machine learning “metric” app might parse the others for features, run them to produce an output, and then parse the output.
- want to easily add consistent applications to user (or provided) containers. In that a SCIF app can be added in a modular fashion to a container (or other) recipe, during a build time an administrator can easily add one or more helper modules to user recipes. An administrator can also offer apps for the user to select from to install.
A Developer
- want to provide different entrypoints for software, akin to using a module system (coinciding with version, or even the same software with a different environment variable exported to determine runtime behavior). A good example of this is using common machine learning libraries with different backends (e.g., keras, tensorflow, torch). An entrypoint keras-python2 would run python with an environment variable triggering using keras as a backend, keras-python3 would do the same with python3, and then keras-python would target the most recent.
- want to implement (functionally) the “same thing” in different ways, and assess differences. The simplest example is to imagine recording runtime metrics for a console print of “Hello World” in multiple languages. Each variation or implementation is a SCIF app, and the common base is the host or container.
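The idea of several entrypoints that differ only in interpreter and exported environment can be sketched as a lookup table. Everything named here is hypothetical: the table ENTRYPOINTS, the function resolve, and the variable ML_BACKEND are stand-ins for illustration, and a real SCIF app would define this through its own environment and runscript.

```python
import os

# Hypothetical entrypoints: each names a command and the extra environment
# it should run with. ML_BACKEND is an illustrative variable name.
ENTRYPOINTS = {
    "keras-python2": {"command": ["python2"], "env": {"ML_BACKEND": "keras"}},
    "keras-python3": {"command": ["python3"], "env": {"ML_BACKEND": "keras"}},
}
# The unversioned name targets the most recent interpreter.
ENTRYPOINTS["keras-python"] = ENTRYPOINTS["keras-python3"]


def resolve(entrypoint, base_env=None):
    """Return the command and the full environment for an entrypoint."""
    spec = ENTRYPOINTS[entrypoint]
    env = dict(os.environ if base_env is None else base_env)
    env.update(spec["env"])
    return spec["command"], env


command, env = resolve("keras-python", base_env={})
print(command, env)
```

The returned pair could be passed to subprocess.run(command, env=env), so the same interpreter is launched with behavior selected entirely by the environment.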
If you have more examples, please add them!
Community
To encourage sharing and distribution of useful apps, we have developed an online interface for easily exploring and sharing SCIF apps, and generating recipes using the apps, available at https://sci-f.github.io/apps.
Infrastructure Overview
The interface is served from a Github repository that renders static template files into a complete website that includes search across all content, exploration by tag (e.g., language or operating system), and instruction by way of reading examples and tutorials. Programmatic access to all apps is provided with a RESTful API for all apps or for a single one, as is a feed for interested users to be notified when new content is added. The interface also includes a recipe generator that allows a user to browse the site, save apps of interest in the browser’s local storage, and then combine them in a Singularity build file that can be downloaded in a compressed archive that includes instructions and any associated, required files for the app.
Contributing
Importantly, as the infrastructure is served from a Github repository, contributing does not require any expertise with web development or related technologies. The user can simply use Github to fork the repo, add a text file to the _apps folder, and submit a pull request (PR) to evaluate the contribution. The text file itself has a header section that contains bulleted lists of metadata like name, tags, and files, and the remainder of the file is the Singularity sections for the app (e.g., %apprun hello-world). When the PR is approved, the contribution will automatically render into all areas of the site. If an app includes associated files like scripts or configuration, these are also easily added into a folder named equivalently to the file, alongside it (e.g., _apps/hello-world/hello-world-bash.md would have associated files in _apps/hello-world/hello-world-bash).
Importantly, by way of using version control, all changes and contributions are tracked and credit allocated.
Testing
Github also allows for complete testing of all contributions, and the repository is set up with a continuous integration (CI, which means testing) service called CircleCI that checks the following:
- The file name for the app corresponds with the app’s name declared in the file
- The folder path under _apps also corresponds to the app’s file name. For example, an app located at _apps/hello-world/bash/ must start with hello-world-bash. Matching app names to the folder structure ensures uniqueness of the names within the repository.
- The user has not provided any empty keys or values in the header section.
- Each declared file in the header has been provided in the repo
- The app minimally has a tag for one operating system, to help determine compatibility.
- The header date is in valid format to be rendered correctly
- Fields allowed in the header do not go beyond “author,” “title,” “date,” “files,” and “tags.”
- Required fields (“author,” “title,” “date,” and “tags”) are present
Any contribution that does not meet these requirements will get feedback during the PR, and the contributor can adjust the file to address any issues. As soon as the content is merged into the master branch, it is immediately live on the site.
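A few of the checks listed above are simple enough to sketch directly. The following is not the repository's actual test suite; it assumes the header has already been parsed into a dictionary, uses a hypothetical function name check_app, and covers only the allowed fields, required fields, empty values, and the match between folder path and file name.

```python
from pathlib import PurePosixPath

ALLOWED_FIELDS = {"author", "title", "date", "files", "tags"}
REQUIRED_FIELDS = {"author", "title", "date", "tags"}


def check_app(path, header):
    """Return a list of problems for an app file and its parsed header."""
    problems = []
    keys = set(header)
    for key in sorted(keys - ALLOWED_FIELDS):
        problems.append(f"field not allowed: {key}")
    for key in sorted(REQUIRED_FIELDS - keys):
        problems.append(f"required field missing: {key}")
    for key in sorted(keys):
        if not header[key]:
            problems.append(f"empty value: {key}")
    # Folder names under _apps, joined with "-", must prefix the file name.
    app_file = PurePosixPath(path)
    prefix = "-".join(app_file.parent.parts[1:])
    if not app_file.stem.startswith(prefix):
        problems.append(f"file name must start with: {prefix}")
    return problems


header = {"author": "vsoch", "title": "Hello World", "date": "2018-01-08", "tags": ["debian"]}
print(check_app("_apps/hello-world/bash/hello-world-bash.md", header))
print(check_app("_apps/hello-world/bash/goodbye.md", {"name": "goodbye"}))
```

Returning a list of problems, rather than stopping at the first one, matches the way feedback is given during a PR: the contributor sees everything that needs adjusting at once.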
Contribute to SCIF
There are so many ways to contribute! Here are just a few:
- Contribute to the Specification, meaning these pages!
- Contribute to the Client that drives the examples discussed here.
- Find useful examples and apps for your Singularity containers.
- Contribute to the implementation in the Singularity software.
- Contribute an App for others to use.
- Ask a question, anything on your mind.
Future Work
SCIF is exciting because it makes basic scientific application development (optimized in containers) and usage easier. The user can immediately inspect and see software, and how to use it. The user can install additional software, copy from one container to another, or view metadata and help documentation. The developer is provided guidance for how and where to install and configure software, but complete freedom with regard to the software itself. The minimum requirements for any package are a name for its folder, and then optionally a runscript and help document for the user. In addition to these basic examples, we provide other future use cases that would be possible with the Scientific Filesystem.
Mapping of container landscape
Given separation of the software from the host, we can more easily derive features that compare software modules. These features can be used with standard unsupervised clustering to better understand how groups of software are used together. We can further apply different labels like domains and understand what modules are shared (or not shared) between scientific domains. We can find opportunity by discovering gaps, that perhaps a software module isn’t used for a particular domain (and it might be).
Artificial Intelligence (AI) Generated Containers
Given some functional goal, and given a set of containers with measurable features for achieving it, we can (either by brute force or more elegantly) programmatically generate and test containers toward some metric. The landscape of containers can then be pruned, in that the best containers for specific use cases can be determined automatically.
Appendix
Discussion
This discussion would not be complete without a mention for external modules or dependencies that are required by the software. For example, pip is a package manager that installs to some python base. Two equivalent python installations with different submodules are, by definition, different. There are two possible choices to take, and we leave this choice up to the generator of the container.
- In that a python module is likely a shared dependency, or different software modules under apps all use python, the user could choose to install shared dependencies to a system python. In the case of conflicting versions, the user would either provide the software in entirely different containers, or install (as would be required regardless of SCIF) different python environments for each software module.
- The user might also choose to install python from a package resource such as miniconda or similar. In this case, the miniconda (or similar) installation would be considered equivalent to any other software installed to apps. As the developer would do now, the folder /scif/apps/miniconda3 would need to be installed first, and then following commands to use it directed to /scif/apps/miniconda3/bin/python. If the user wanted this python to be consistently on the path, across modules, it should be added to the %environment section.
In practice, we have found that global installs tend to be larger, well maintained libraries (e.g., libraries installed with apt-get or package managers like pip), and having them represented in the %post section, to be shared among apps, helps with any kind of analysis that wants to separate what might be considered the general container “base” from the different custom software installed.
We do not enforce using SCIF for Singularity images or any other container. Its creation and discussion is implemented and provided only to help scientists more easily create reproducible, transparent containers.
Conclusion
The Scientific Filesystem is advantageous in that the container creator can embed his or her work with implied metadata about software and container contents. SCIF also makes it easier to package different run scripts with the container, and expose them easily to the user. However, this does not mean that the standard approach of using a container as a general toolbox and distributing it with a series of external callers is bad or wrong. The choice to use (or not use) SCIF apps is largely dependent on the goals of the creator, and the intended users.
List of (possibly) related standards, formats and initiatives
- FHS: https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard
- OCI: https://github.com/opencontainers/image-spec/blob/master/spec.md
- CWL: http://commonwl.org
- FAIR: https://www.force11.org/group/fairgroup/fairprinciples
- (Opt.) OpenStand: https://open-stand.org
- https://reproducible-builds.org/
- DASPOS: https://daspos.crc.nd.edu
- TANGO: http://tango-project.eu/
Additional reading material
- https://indico.esrf.fr/indico/event/6/session/2/contribution/3/material/slides/
- https://indico.cern.ch/event/567550/contributions/2656689/
- https://github.com/TANGO-Project/alde#application-lifecycle-deployment-engine-alde
- https://figshare.com/articles/Orchestration_and_Workflows_in_eScience_Problems_Standards_and_Solutions/4746931
Scientific Filesystem (SCIF)
- Version 1.0: Version 1.0.0 of the specification, as released with the initial manuscript about the Scientific Filesystem.
Here we will start with a scientific filesystem recipe, preview what it produces, and then install a scientific filesystem. This is likely the most common use case, and we will do the following:
- Create a SCIF recipe
- Preview recipe install on the host
- Install SCIF in container (Docker or Singularity) using Recipe
- Export or reverse engineer recipe from SCIF
In the above, you will see that we will go full circle! Using SCIF, we are able to start with a text description of a set of applications (the recipe), produce a SCIF, and then reverse engineer the recipe from the SCIF.
+--------------------+        +--------------------+       +--------------------+
|                    |        |                    |       |                    |
|                    |        |                    |       |                    |
|                    |        |                    |       |                    |
|                    |        |                    |       |                    |
|       Recipe       +-------->      Preview       +------->     Scientific     |
|                    |        |        then        |       |     Filesystem     |
|                    |        |      Install       |       |                    |
|                    |        |                    |       |                    |
|                    |        |                    |       |                    |
+----------^---------+        +--------------------+       +----------+---------+
           |                                                          |
           |                                                          |
           |                                                          |
           +----------------------------------------------------------+
and once we have a SCIF, even if we don’t have the knowledge of the creator that wrote its recipe, because it exposes commands and metadata for discoverability, we can easily, programmatically interact with it without knowing the details:
+--------------------+  +-------------+
|                    +-->     Run     |
|                    |  +-------------+
|                    |  +-------------+
|                    +-->   Inspect   |
|     Scientific     |  +-------------+
|     Filesystem     |  +-------------+
|                    +-->    Shell    |
|                    |  +-------------+
|                    |  +-------------+
|                    +-->    Exec     |
+--------------------+  +-------------+
You will recognize that many of these functions map to those you are familiar with for interacting with containers, and in some respects they will feel very similar. The difference is that we are interacting directly with the SCIF in the container. If you installed the SCIF in another container (perhaps with a different external interface for interaction), you could have confidence that, given that the container has SCIF installed at the entrypoint, your interaction will not be very different. We will discuss this more in the next tutorial on running SCIF. For now, let’s take a look at writing, previewing, and installing a recipe to generate a SCIF.
Writing a SCIF recipe
Let’s first write a recipe, a text file called hello-world.scif, to produce different variations of hello world (with an echo, and with executing of a script).
%appenv hello-world-echo
THEBESTAPP=$SCIF_APPNAME
export THEBESTAPP
%apprun hello-world-echo
echo "The best app is $THEBESTAPP"
%appinstall hello-world-script
echo "echo 'Hello World!'" >> bin/hello-world.sh
chmod u+x bin/hello-world.sh
%appenv hello-world-script
THEBESTAPP=$SCIF_APPNAME
export THEBESTAPP
%apprun hello-world-script
/bin/bash hello-world.sh
Notice that I’m using three sections:
- appinstall for any install routine particular to the app
- apprun for the entrypoint to the app. This could be executing a script, or a general command.
- appenv for any environment variables specific to the app.
and you might not even need that many! An app can just be an environment, for example, or a single command to run. If you are interested in the different sections allowed for the specification, or the many flexible ways to generate an app, read our recipes guide to learn more.
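To make the section structure concrete, here is a minimal sketch of how a recipe could be grouped by app and then by section. The function name parse_recipe is hypothetical and this is not the scif client's own parser; it only assumes the `%section appname` header convention shown in the recipe above.

```python
from collections import OrderedDict


def parse_recipe(text):
    """Group recipe lines by app name, then by section name (hypothetical helper)."""
    apps = OrderedDict()
    current = None  # list collecting the lines of the section being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("%"):
            # A header looks like "%apprun hello-world-echo"
            parts = line[1:].split()
            if len(parts) != 2:
                raise ValueError("expected '%section appname', got: " + line)
            section, app = parts
            current = apps.setdefault(app, OrderedDict()).setdefault(section, [])
        elif current is not None:
            current.append(line)
    return apps


recipe = """
%appenv hello-world-echo
THEBESTAPP=$SCIF_APPNAME
export THEBESTAPP
%apprun hello-world-echo
echo "The best app is $THEBESTAPP"
%appinstall hello-world-script
echo "echo 'Hello World!'" >> bin/hello-world.sh
chmod u+x bin/hello-world.sh
"""

apps = parse_recipe(recipe)
print(list(apps))
print(list(apps["hello-world-echo"]))
```

Ordered dictionaries are used so that apps and sections come back in the order they were written in the recipe.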
Preview the recipe
I next want to preview the recipe. What changes would be made on a host, and where? I feel comfortable doing this on my host because it isn’t actually going to make any changes. After I have installed scif (pip install scif) I can do this from the command line:
$ scif preview hello-world.scif
[base] /scif
[apps] /scif/apps
[data] /scif/data
[root] /scif/apps/hello-world-echo
[lib] /scif/apps/hello-world-echo/lib
[bin] /scif/apps/hello-world-echo/bin
[data] /scif/data/hello-world-echo
+ apprun hello-world-echo
/scif/apps/hello-world-echo/scif/runscript
/scif/apps/hello-world-echo/scif/runscript.help
echo "The best app is $THEBESTAPP"
+ appenv hello-world-echo
/scif/apps/hello-world-echo/scif/environment.sh
THEBESTAPP $SCIF_APPNAME
+ apprecipe hello-world-echo
/scif/apps/hello-world-echo/scif/hello-world-echo.scif
[root] /scif/apps/hello-world-script
[lib] /scif/apps/hello-world-script/lib
[bin] /scif/apps/hello-world-script/bin
[data] /scif/data/hello-world-script
+ apprun hello-world-script
/scif/apps/hello-world-script/scif/runscript
/scif/apps/hello-world-script/scif/runscript.help
/bin/bash hello-world.sh
+ appenv hello-world-script
/scif/apps/hello-world-script/scif/environment.sh
THEBESTAPP $SCIF_APPNAME
+ appinstall hello-world-script
echo "echo 'Hello World!'" >> $SCIF_APPBIN/hello-world.sh
chmod u+x $SCIF_APPBIN/hello-world.sh
+ apprecipe hello-world-script
/scif/apps/hello-world-script/scif/hello-world-script.scif
It’s pretty straightforward: each section pertains to an app, and first shows the root, lib, bin, and data folders for the app, and then the files that will be produced on install. The very top section shows the global changes (e.g., the global scif folders for data and apps). I can also run this command to just preview one app from the recipe:
$ scif preview hello-world.scif hello-world-echo
[base] /scif
[apps] /scif/apps
[data] /scif/data
[root] /scif/apps/hello-world-echo
[lib] /scif/apps/hello-world-echo/lib
[bin] /scif/apps/hello-world-echo/bin
[data] /scif/data/hello-world-echo
+ apprun hello-world-echo
/scif/apps/hello-world-echo/scif/runscript
/scif/apps/hello-world-echo/scif/runscript.help
echo "The best app is $THEBESTAPP"
+ appenv hello-world-echo
/scif/apps/hello-world-echo/scif/environment.sh
THEBESTAPP $SCIF_APPNAME
+ apprecipe hello-world-echo
/scif/apps/hello-world-echo/scif/hello-world-echo.scif
If you choose an app that doesn’t exist, it will tell you that.
[base] /scif
[apps] /scif/apps
[data] /scif/data
ERROR Cannot find app hello-world in config.
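The folder locations that the preview reports follow directly from the app name and the base. As a rough sketch (the helper preview_paths is hypothetical, and is not how the scif client is implemented), the four per-app paths and the missing-app error could be derived like this:

```python
import posixpath


def preview_paths(app, known_apps, base="/scif"):
    """Return the folders an app would occupy, without touching the filesystem."""
    if app not in known_apps:
        # Mirrors the message shown above for an app that is not in the recipe
        raise KeyError(f"Cannot find app {app} in config.")
    return {
        "root": posixpath.join(base, "apps", app),
        "lib": posixpath.join(base, "apps", app, "lib"),
        "bin": posixpath.join(base, "apps", app, "bin"),
        "data": posixpath.join(base, "data", app),
    }


known = ["hello-world-echo", "hello-world-script"]
paths = preview_paths("hello-world-echo", known)
for label, path in paths.items():
    print(f"[{label}] {path}")
```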
We can also do this interactively! scif comes with a development shell command, pyshell, that will let you interact with either a recipe or a filesystem. The general command works as follows:
SCIF Python (pyshell) Options
example | use case |
---|---|
scif pyshell [recipe] | interact with a recipe, no changes to the filesystem (akin to preview) |
scif pyshell | interact with a SCIF (no SCIF app active) |
scif pyshell [app] | interact with a SCIF with an app active. |
scif pyshell [recipe] [app] | interact with a recipe, still no changes, but load context of an app |
There is also a shell command for the equivalent interaction; however, we would need to install a SCIF first:
scif shell
WARNING /scif is not detected as a recipe or base.
Well let’s install it then! But first, let’s interactively explore the recipe, hello-world.scif, with the python shell (pyshell):
$ scif pyshell hello-world.scif
[scif] /scif hello-world-echo | hello-world-script
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]:
We can look at the ScifRecipe object, and then the client instantiated from it:
In [1]: ScifRecipe
Out[1]: scif.main.base.ScifRecipe
In [2]: client
Out[2]: [scif]
and then produce the same output from the preview above (the recipe is already loaded, so we don’t need to supply it)
client.preview()
We can list all apps
In [4]: client.apps()
Out[4]: ['hello-world-echo', 'hello-world-script']
or look at the specifics for a particular app:
In [5]: client.app('hello-world-script')
Out[5]:
OrderedDict([('appinstall',
['echo "echo \'Hello World!\'" >> $SCIF_APPBIN/hello-world.sh',
'chmod u+x $SCIF_APPBIN/hello-world.sh']),
('appenv', ['THEBESTAPP $SCIF_APPNAME']),
('apprun', ['/bin/bash hello-world.sh'])])
Notice that we have an ordered dict, and the reason is so that the app lookup respects the order that you write the sections in the recipe file. Finally, we can look at all environment variables:
client.environment
{'SCIF_APPBIN_hello_world_echo': '/scif/apps/hello-world-echo/bin',
'SCIF_APPBIN_hello_world_script': '/scif/apps/hello-world-script/bin',
'SCIF_APPDATA_hello_world_echo': '/scif/data/hello-world-echo',
'SCIF_APPDATA_hello_world_script': '/scif/data/hello-world-script',
'SCIF_APPENV_hello_world_echo': '/scif/apps/hello-world-echo/scif/environment.sh',
'SCIF_APPENV_hello_world_script': '/scif/apps/hello-world-script/scif/environment.sh',
'SCIF_APPHELP_hello_world_echo': '/scif/apps/hello-world-echo/scif/runscript.help',
'SCIF_APPHELP_hello_world_script': '/scif/apps/hello-world-script/scif/runscript.help',
'SCIF_APPLABELS_hello_world_echo': '/scif/apps/hello-world-echo/scif/labels.json',
'SCIF_APPLABELS_hello_world_script': '/scif/apps/hello-world-script/scif/labels.json',
'SCIF_APPLIB_hello_world_echo': '/scif/apps/hello-world-echo/lib',
'SCIF_APPLIB_hello_world_script': '/scif/apps/hello-world-script/lib',
'SCIF_APPMETA_hello_world_echo': '/scif/apps/hello-world-echo/scif',
'SCIF_APPMETA_hello_world_script': '/scif/apps/hello-world-script/scif',
'SCIF_APPNAME_hello_world_echo': 'hello-world-echo',
'SCIF_APPNAME_hello_world_script': 'hello-world-script',
'SCIF_APPRECIPE_hello_world_echo': '/scif/apps/hello-world-echo/scif/hello-world-echo.scif',
'SCIF_APPRECIPE_hello_world_script': '/scif/apps/hello-world-script/scif/hello-world-script.scif',
'SCIF_APPROOT_hello_world_echo': '/scif/apps/hello-world-echo',
'SCIF_APPROOT_hello_world_script': '/scif/apps/hello-world-script',
'SCIF_APPRUN_hello_world_echo': '/scif/apps/hello-world-echo/scif/runscript',
'SCIF_APPRUN_hello_world_script': '/scif/apps/hello-world-script/scif/runscript',
'SCIF_APPS': '/scif/apps',
'SCIF_DATA': '/scif/data'}
You will notice an absence of variables that aren’t relative to one of our apps (e.g., we don’t see SCIF_APPNAME without hello_world_echo or hello_world_script), and this is because we are running a shell that doesn’t have the context of a particular SCIF app. If you wanted to activate an app, which comes down to exporting its environment, just ask for it:
client.activate('hello-world-script')
You can also do this directly from the terminal by giving pyshell the app name:
$ scif pyshell hello-world.scif hello-world-echo
[scif] /scif hello-world-echo | hello-world-script
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: client._active
Out[1]: 'hello-world-echo'
In both cases, when we look at the environment we see what we saw before, but we also see another set of environment variables with general names (e.g., SCIF_APPBIN) that are pointing to the active app, which in this case is hello-world-echo:
$ client.environment
{'SCIF_APPBIN': '/scif/apps/hello-world-echo/bin',
'SCIF_APPDATA': '/scif/data/hello-world-echo',
'SCIF_APPENV': '/scif/apps/hello-world-echo/scif/environment.sh',
'SCIF_APPHELP': '/scif/apps/hello-world-echo/scif/runscript.help',
'SCIF_APPLABELS': '/scif/apps/hello-world-echo/scif/labels.json',
'SCIF_APPLIB': '/scif/apps/hello-world-echo/lib',
'SCIF_APPMETA': '/scif/apps/hello-world-echo/scif',
'SCIF_APPNAME': 'hello-world-echo',
'SCIF_APPRECIPE': '/scif/apps/hello-world-echo/scif/hello-world-echo.scif',
'SCIF_APPROOT': '/scif/apps/hello-world-echo',
'SCIF_APPRUN': '/scif/apps/hello-world-echo/scif/runscript',
'SCIF_APPS': '/scif/apps',
'SCIF_DATA': '/scif/data'
...
}
The reason that we have both is, you can imagine a case where you want to run one SCIF app, and while it’s running, have it interact in some way with another known app. Given the information about other apps is exposed in the environment, this is possible to do! You can reference an app environment variable in another’s runscript. Here are some fun examples:
# have running foo call bar's runscript
%apprun foo
/bin/bash $SCIF_APPRUN_bar
# source bar's environment first, then run it
%apprun foo
source $SCIF_APPENV_bar
/bin/bash $SCIF_APPRUN_bar
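The two sets of variables shown earlier (per-app names with a suffix, and general names for the active app) can be derived from the app names alone. The following is a sketch under that assumption; app_environment is a hypothetical function, and the only rule it relies on is the one visible in the output above, where hyphens in the app name become underscores in the variable suffix.

```python
def app_environment(apps, base="/scif", active=None):
    """Build a SCIF-style environment dictionary for a list of app names."""
    env = {"SCIF_APPS": f"{base}/apps", "SCIF_DATA": f"{base}/data"}

    def entries(app):
        root = f"{base}/apps/{app}"
        meta = f"{root}/scif"
        return {
            "SCIF_APPNAME": app,
            "SCIF_APPROOT": root,
            "SCIF_APPBIN": f"{root}/bin",
            "SCIF_APPLIB": f"{root}/lib",
            "SCIF_APPDATA": f"{base}/data/{app}",
            "SCIF_APPMETA": meta,
            "SCIF_APPENV": f"{meta}/environment.sh",
            "SCIF_APPHELP": f"{meta}/runscript.help",
            "SCIF_APPLABELS": f"{meta}/labels.json",
            "SCIF_APPRECIPE": f"{meta}/{app}.scif",
            "SCIF_APPRUN": f"{meta}/runscript",
        }

    # Every app is always exposed under a suffixed name
    for app in apps:
        suffix = app.replace("-", "_")
        for key, value in entries(app).items():
            env[f"{key}_{suffix}"] = value

    # The active app (if any) is also exposed under the general names
    if active is not None:
        env.update(entries(active))
    return env


apps = ["hello-world-echo", "hello-world-script"]
env_none = app_environment(apps)
env_active = app_environment(apps, active="hello-world-echo")
print(env_active["SCIF_APPRUN"])
```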
Having these general environment variables for the active app also ensures that a tool created to work with SCIF applications knows how to find the active application. For example, let’s say I create an app that anyone can install into a container, and it only serves to loop through all the other apps and parse the text content of the runscript for some kind of machine learning algorithm. I’d be able to discover them all without much work, and without knowing anything about the particular apps installed:
# This gives me the runscript locations
env | grep SCIF_APPRUN
SCIF_APPRUN_hello_world_script=/scif/apps/hello-world-script/scif/runscript
SCIF_APPRUN_hello_world_echo=/scif/apps/hello-world-echo/scif/runscript
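The same discovery can be done from Python by filtering the environment on the SCIF_APPRUN_ prefix. This is only a sketch: find_runscripts is a hypothetical name, and the keys it returns are the variable suffixes (with underscores), not necessarily the original app names.

```python
import os


def find_runscripts(environ=None):
    """Map each app suffix to its runscript path, based only on the environment."""
    environ = os.environ if environ is None else environ
    prefix = "SCIF_APPRUN_"
    return {
        key[len(prefix):]: value
        for key, value in environ.items()
        if key.startswith(prefix)
    }


# A stand-in environment with the two variables shown above
example_env = {
    "SCIF_APPRUN_hello_world_script": "/scif/apps/hello-world-script/scif/runscript",
    "SCIF_APPRUN_hello_world_echo": "/scif/apps/hello-world-echo/scif/runscript",
    "SCIF_APPS": "/scif/apps",
}
runscripts = find_runscripts(example_env)
for suffix, path in sorted(runscripts.items()):
    print(suffix, path)
```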
I could also blindly parse through help files, environments, labels, or any other content that is programmatically accessible! You don’t need to know anything in advance beyond these SCIF variables to interact with scientific filesystems. We will go into more command examples as we progress in these tutorials. Let’s continue now and talk about how to build a SCIF into a container, which is the recommended approach for reproducibility.
Install SCIF in Docker using Recipe
Once we have previewed, we likely want to install. I’m not a fan of making changes on my host, so I’m going to test doing this inside of a Docker image. The image we will be building in the example below is provided on docker hub:
docker pull vanessa/scif:hw
First we will do it interactively, and then add a few lines to a recipe to do it properly. Here is the interactive version. We will use a miniconda image so we know that it comes with pip to install scif. Note that I’m mapping the present working directory to /tmp so I have my hello-world.scif recipe.
$ docker run -v $PWD:/tmp -it continuumio/miniconda3 /bin/bash
$ pip install scif
$ scif install /tmp/hello-world.scif
Installing base at /scif
+ apprun hello-world-echo
+ appenv hello-world-echo
+ apprun hello-world-script
+ appenv hello-world-script
+ appinstall hello-world-script
the same can be done from within the shell:
client.install()
Installing base at /scif
+ apprun hello-world-echo
+ appenv hello-world-echo
+ apprun hello-world-script
+ appenv hello-world-script
+ appinstall hello-world-script
and then we can see the tree hierarchies for the top level folders and apps
# apt-get install -y tree
$ tree /scif/
/scif/
├── apps
│   ├── hello-world-echo
│   │   ├── bin
│   │   ├── lib
│   │   └── scif
│   │       ├── environment.sh
│   │       ├── hello-world-echo.scif
│   │       └── runscript
│   └── hello-world-script
│       ├── bin
│       ├── lib
│       └── scif
│           ├── environment.sh
│           ├── hello-world-script.scif
│           └── runscript
└── data
    ├── hello-world-echo
    └── hello-world-script
Want to see the apps installed more quickly than using tree?
scif apps
SCIF [app] [root]
1 hello-world-script /scif/apps/hello-world-script
2 hello-world-echo /scif/apps/hello-world-echo
If we had done this before install, we would get a ruhroh message.
$ scif apps
WARNING /scif is not detected as a recipe or base.
We have a scientific filesystem ready to go! At this point, if you want this generation to be reproducible, you need to generate a build specification (a Dockerfile) to build the container above.
FROM continuumio/miniconda3
RUN pip install scif
ADD hello-world.scif /
RUN scif install /hello-world.scif
CMD ["scif"]
and then we could build that:
docker build -t vanessa/scif:hw .
and run it to see the help:
docker run -it vanessa/scif:hw
We will review more interesting commands and interaction with your SCIF in the next tutorial on how to run SCIF. This tutorial will finish by showing installation in a Singularity container, followed by reverse engineering a recipe from a SCIF.
Install SCIF in Singularity Natively
Singularity containers, in that they are optimized for scientific reproducibility, work extremely well with SCIF. When you build a Singularity container you create a squashfs filesystem that isn’t writable after build time, and so you can have confidence that the contents of your container will not be modified. The second important note is that Singularity has native integration for SCIF, meaning you can write the recipe directly into the build file! Let’s take a look at our hello world example above, but writing into a Singularity build file:
Bootstrap: docker
From: continuumio/miniconda3
# sudo singularity build hello-world.simg Singularity
########################################################
##
## SCIF:
##
## These sections are equivalent to a SCIF Recipe
##
########################################################
%appenv hello-world-echo
THEBESTAPP=$SCIF_APPNAME
export THEBESTAPP
%apprun hello-world-echo
echo "The best app is $THEBESTAPP"
%appinstall hello-world-script
echo "echo 'Hello World!'" >> bin/hello-world.sh
chmod u+x bin/hello-world.sh
%appenv hello-world-script
THEBESTAPP=$SCIF_APPNAME
export THEBESTAPP
%apprun hello-world-script
/bin/bash hello-world.sh
Notice the following:
- We have literally copied the recipe into the Singularity build file (middle section) and we are using the same image base (continuumio/miniconda3) as we did for Docker. I also like to use the continuumio/miniconda3 container for a tinier conda snake :).
- We are not installing the scif software anywhere. Singularity has the integration natively.
sudo singularity build hello-world.simg Singularity
And then we would run a particular app as follows:
# hello-world-script
singularity run --app hello-world-script hello-world.simg
Hello World!
# hello-world-echo
$ singularity run --app hello-world-echo hello-world.simg
The best app is hello-world-echo
# no app specified (runscript)
$ singularity run hello-world.simg
$
For the last example, you don’t see any obvious change in output because we’ve merely shelled into the container! If you had defined a Singularity %runscript section with a different command, it would be issued instead. If you are using the scif software as a driver, you could export SCIF_ENTRYPOINT to define a different command from the default of /bin/bash.
Install SCIF in Singularity using Recipe
You can also install and use SCIF within Singularity containers using the scif software (akin to what we did with Docker). For this example, the Singularity build specification would be adjusted to look like this:
Bootstrap: docker
From: continuumio/miniconda3
# sudo singularity build hello-world-scif.simg Singularity.scif
%files
hello-world.scif
%environment
PATH=/opt/conda/bin:$PATH
export PATH
%post
/opt/conda/bin/pip install scif
/opt/conda/bin/scif install /hello-world.scif
%runscript
exec scif "$@"
If you wanted a development version of scif, you could instead install from GitHub:
Bootstrap: docker
From: continuumio/miniconda3
# sudo singularity build hello-world-scif.simg Singularity.scif
%files
hello-world.scif
%environment
PATH=/opt/conda/bin:$PATH
export PATH
%post
cd /opt && git clone https://www.github.com/vsoch/scif.git
cd scif && /opt/conda/bin/python setup.py install
/opt/conda/bin/scif install /hello-world.scif
%runscript
exec scif "$@"
Notice that although the sections are similar, they aren’t SCIF app sections, but rather traditional Singularity. Both are in spirit based on the idea of an rpm format: a description (recipe) to build an encapsulated environment. Also notice how we are again just copying the recipe file into the container, installing scif, and then handing the container’s entrypoint to scif to manage. I’m being very careful to add executables to the path and reference them directly, in the case that some future user might have a different version of the software installed locally. Any similar container or virtualization technology that follows these steps could support SCIF. To build the container, again we do:
sudo singularity build hello-world-scif.simg Singularity.scif
Importantly, since we have defined the entrypoint to be the scif executable, all of our commands and functions to interact with the SCIF are exposed when the user runs the container, and the filesystem is built based on the recipe provided. Just try running the container, and you will find the scif entrypoint:
./hello-world-scif.simg
Scientific Filesystem [v0.0.3]
usage: scif [-h] [--debug] [--quiet]
{version,pyshell,shell,preview,install,inspect,run,apps,dump,exec,write}
...
scientific filesystem tools
optional arguments:
-h, --help show this help message and exit
--debug use verbose logging to debug.
--quiet suppress print output
actions:
actions for Scientific Filesystem
{version,pyshell,shell,preview,install,inspect,run,apps,dump,exec,write}
scif actions
version show software version
pyshell Interactive python shell to scientific filesystem
shell shell to interact with scientific filesystem
preview preview changes to a filesytem
install install a recipe on the filesystem
inspect inspect an attribute for a scif installation
run entrypoint to run a scientific filesystem
apps list apps installed
dump dump recipe
exec execute a command to a scientific filesystem
Importantly, the container still can serve as a portable, reproducible product, and it isn’t the case that it has complete dependency on SCIF for all of its needs. Here is how you would list the apps in the container using the SCIF entrypoint:
./hello-world-scif.simg apps
SCIF [app] [root]
1 hello-world-echo /scif/apps/hello-world-echo
2 hello-world-script /scif/apps/hello-world-script
We would run a scif app like this:
./hello-world-scif.simg run hello-world-script
[hello-world-script] executing /bin/bash /scif/apps/hello-world-script/scif/runscript
Hello World!
And the user (you!) can of course use all of the other features that a container solution like Singularity provides.
Reverse engineer Recipe from SCIF
Finally, given an existing scientific filesystem, you can easily produce its recipe file from the metadata folders that are discovered. Provided that the creator has not changed this content manually, and that all previous dependencies are still available, there is a good chance (though no guarantee) that the recipe reproduces the filesystem. Let’s use the container that we generated above, with the command inspect to take a look, and then dump to export the recipe. First, you can inspect an entire filesystem and get the results in json printed to the screen:
scif inspect
{
"hello-world-script": {
"appinstall": [
"echo \"echo 'Hello World!'\" >> $SCIF_APPBIN/hello-world.sh",
"chmod u+x $SCIF_APPBIN/hello-world.sh"
],
"appenv": [
"THEBESTAPP $SCIF_APPNAME"
],
"apprun": [
"/bin/bash hello-world.sh"
]
},
"hello-world-echo": {
"appenv": [
"THEBESTAPP $SCIF_APPNAME"
],
"apprun": [
"echo \"The best app is $THEBESTAPP\""
]
}
}
And then dump the same content as a recipe:
scif dump
%appinstall
echo "echo 'Hello World!'" >> $SCIF_APPBIN/hello-world.sh
chmod u+x $SCIF_APPBIN/hello-world.sh
%appenv
THEBESTAPP $SCIF_APPNAME
%apprun
/bin/bash hello-world.sh
%appenv
THEBESTAPP $SCIF_APPNAME
%apprun
echo "The best app is $THEBESTAPP"
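The relationship between the inspect output and a recipe can be sketched in a few lines of Python. This is an illustration and not the scif dump implementation: dump_recipe is a hypothetical helper, and unlike the output printed above it writes the app name after each section header, following the `%section appname` convention used when the recipe was written.

```python
def dump_recipe(inspected):
    """Turn an inspect-style dictionary {app: {section: [lines]}} into recipe text."""
    lines = []
    for app, sections in inspected.items():
        for section, commands in sections.items():
            lines.append(f"%{section} {app}")
            lines.extend(commands)
    return "\n".join(lines)


# The same structure that inspect printed for one of the apps
inspected = {
    "hello-world-echo": {
        "appenv": ["THEBESTAPP $SCIF_APPNAME"],
        "apprun": ['echo "The best app is $THEBESTAPP"'],
    }
}
recipe_text = dump_recipe(inspected)
print(recipe_text)
```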
Now that you’ve gotten the hang of writing recipes, and previewing and installing SCIF, let’s move on to look at some commands.
Date: 2023-02-24 Minor Version: 1.1
Scientific Filesystem (SCIF)
Authors:
- Vanessa Sochat «vsochat@stanford.edu» (@vsoch)
- Satrajit Ghosh «satra@mit.edu» (@satra)
- Christopher Madan «christopher.madan@nottingham.ac.uk» (@cMadan)
- Pim Schravendijk «p.schravendijk@rijkzwaan.nl» (@pimpim)
- singularity@lbl.gov (Singularity Community)
Terminology
Term | Definition |
---|---|
SCIF | The Scientific Filesystem |
Host | The machine or container where the scientific filesystem is installed alongside but separate from the core operating system |
App | A scientific filesystem module that is installed onto the Host |
Data | A root location in the scientific filesystem exclusively for data inputs and outputs that belong to Apps |
Recipe | A text file with extension .scif with sections that define a SCIF |
Discoverability | An ability to programmatically uncover software and metadata |
Predictability | A means of interaction with software without having prior knowledge of its creation |
Client | A controller for a SCIF, either for a developer or a user |
Integration | A third party software or tool that understands the SCIF structure and interacts with all or some portion of it |
Container | A linux container technology that can be installed with a SCIF |
SC | The scientific community, including academics, researchers, and associated engineering and development staff |
ENV | Environment variables in the namespace of SCIF (prefixed with SCIF_) that drive interaction of a user agent (UA) |
UA | A user agent (controller) that interacts with environment variables (ENV) to act as an interface between the user and the scientific filesystem |
Objective
To define an organizational standard for the scientific community, a Scientific Filesystem (SCIF) that will make it easy for the scientific community (SC) to generate predictable and programmatically accessible scientific apps optimized for encapsulated, reproducible environments.
Goals
The Scientific Filesystem (SCIF) will:
- Enable a UA to write tools and software that work with a predictably structured SCIF.
- Enable the SC to generate discoverable and predictable SCIF that work easily with integrations.
- Define a set of environment variables with reasonable defaults that:
  - Can be referenced by other installed applications, ideal for integrations that foster inter-app functionality
  - Reliably determine locations for data, apps, and metadata.
- Define a recipe format that maps to a SCIF, and vice versa.
- Provide basic software to generate and manage SCIF.
  - A single entrypoint that directs to app entrypoints and environments
  - Preview a filesystem from a recipe file
  - Generate a filesystem from a recipe file
  - Generate a recipe file from a SCIF
  - Generate a build specification for a container using a recipe file
  - An API for developers to interact and control a SCIF
- In the context of modular, encapsulated container environment, provide reproducible scientific software that can be shared with confidence of discoverability.
Non-Goals
The Scientific Filesystem (SCIF) explicitly does not implement, define, or provide in v0.2:
- Packaging of an Integration of SCIF
- Instructions on how to install or compile software for apps, or preference for organization of data files (under /scif/data)
- Package management for Integrations, Apps, or the entirety of a SCIF
- Guarantee the quality for any specific section of a SCIF recipe generated by the SC. E.g.,
  - An install section for an App may not compile on the system, or otherwise have a bug
- Any specific mechanism for workflow management or control of inputs, outputs, or other parameters specific to the software apps.
- A mechanism for lifecycle management of any particular SCIF.
- A mechanism for authentication and authorization or management of file permissions.
- Support or integration with non-linux operating systems.
Solution Overview
This specification defines a filesystem structure and environment variable namespace that interact with functions to produce discoverable software applications. We focus on recommendations for a container technology or other encapsulated environment to implement a SCIF compatible (containerized) filesystem (or integrations thereof) along with controllers for it (client). A compatible client or integration must provide a means to generate and interact with the SCIF filesystem and environment namespace, along with documentation to developers and users.
Architecture
The Scientific Filesystem has a default root on the host that is intentionally separate from standard linux folders, so that it does not interfere with folders that likely already exist (e.g., just /data or /apps). This is done by using a root $SCIF_BASE, defaulting to /scif:
/scif
   /apps
   /data
Apps
The base of /scif/apps is where software modules will live, and can be generated automatically from a scif recipe and parsed to generate one. Clients and integrations that meet this specification to interact with a SCIF can parse these directories programmatically without knowing specifics of the software apps installed. The structure of a complete app folder (foo) looks like the following:
/scif
   /apps
      /foo
         /bin
         /lib
         /scif
            runscript
            runscript.help
            labels.json
            environment.sh
- /scif is the root folder of the entire SCIF
- /scif/apps the apps folder, each subfolder is a single (modular) software app
- /scif/apps/foo an example software app folder, contains any files relevant or needed
- /scif/apps/foo/bin contains binaries for foo that should be on the $PATH
- /scif/apps/foo/lib contains libraries for foo that should be on the $LD_LIBRARY_PATH
- /scif/apps/foo/scif foo’s metadata folder for labels, environment, runscript, and help
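As an illustration of the layout described above, the following sketch creates the folder skeleton for one app under a temporary base, so nothing is written to the real /scif. The function create_app_skeleton is hypothetical and is not part of the specification or of any client.

```python
import tempfile
from pathlib import Path


def create_app_skeleton(base, app):
    """Create bin, lib and scif (metadata) folders for an app, plus its data folder."""
    base = Path(base)
    root = base / "apps" / app
    for folder in (root / "bin", root / "lib", root / "scif", base / "data" / app):
        folder.mkdir(parents=True, exist_ok=True)
    return root


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "scif"
    root = create_app_skeleton(base, "foo")
    created = sorted(p.name for p in root.iterdir())
    data_exists = (base / "data" / "foo").is_dir()
    print(created)
```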
Recipes
The scif recipe is the primary method by which a user from the SC can define and then generate a SCIF. The recipe consists of sections, where each section header starts with a % and a section name, followed by the name of the software app that the section is relevant for.
%section foo
The recipe itself is a text file that must have extension .scif, and can serve as input to clients and integrations for SCIF.
Sections
- %appinstall corresponds to executing commands within the folder to install the application. The writer of the recipe should expect the commands to be executed in $SCIF_APPROOT, and thus write final outputs to $SCIF_APPBIN, which is located at $SCIF_APPROOT/bin.
- %apphelp is written as a file called runscript.help in the application’s metadata folder, where a client knows where to find it.
- %apprun is also written as a file called runscript in the application’s metadata folder, and again looked for when the user asks to run the software.
- %appstart is similar to run, but targets the startscript. In layman’s terms, a start would be the entrypoint for a service, and a run the default entrypoint.
- %applabels will write a labels.json in the application’s metadata folder, allowing for application-specific labels.
- %appenv will write an environment file in the application’s metadata folder (environment.sh), allowing for definition of application-specific environment variables. These variables are sourced when the application is active.
- %appfiles is a list of files to add from the host (or other location known to the integration or client) to the application root.
- %apptest will run tests specific to the application, with present working directory assumed to be the software module’s folder.
Specifically, this means that any software that implements the SCIF would do the following procedure to produce a SCIF from a recipe:
- Find any app section alongside a name (e.g., %appinstall foo) as indication of an application command.
- Parse the string following the section name (e.g., foo) as the name of the application, and create a lowercase folder for the app under each of the roots /scif/apps and /scif/data, if the folder doesn’t already exist.
- If an installation procedure is present (%appinstall foo), perform it to install dependencies relative to the app’s install folder.
- Create a metadata folder, scif, inside the application folder.
- For those defined, save the environment (%appenv foo), help (%apphelp foo), runscript (%apprun foo), and labels (%applabels foo) to the metadata folder.
- Generate a “bin” folder for foo, and automatically add it to the $PATH when foo is being used. Generate a “lib” folder for libraries, to also be added to the $LD_LIBRARY_PATH when foo is used.
- If the %apptest foo section is defined, tests are run on the newly installed app to ensure it works as expected.
This general list is hashed out in more detail in the following sections.
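The metadata step of this procedure can be sketched as follows. It is a minimal illustration that assumes the sections have already been parsed into lists of lines; write_metadata and METADATA_FILES are hypothetical names, labels are left out because their on-disk conversion is not described here, and a temporary directory stands in for /scif.

```python
import tempfile
from pathlib import Path

# Section name -> file name inside the app's metadata folder (from the Sections list)
METADATA_FILES = {
    "apprun": "runscript",
    "apphelp": "runscript.help",
    "appenv": "environment.sh",
}


def write_metadata(base, app, sections):
    """Save the defined sections of one app into <base>/apps/<app>/scif."""
    meta = Path(base) / "apps" / app.lower() / "scif"
    meta.mkdir(parents=True, exist_ok=True)
    written = []
    for section, filename in METADATA_FILES.items():
        if section in sections:
            (meta / filename).write_text("\n".join(sections[section]) + "\n")
            written.append(filename)
    return written


with tempfile.TemporaryDirectory() as tmp:
    sections = {
        "apprun": ['echo "The best app is $THEBESTAPP"'],
        "appenv": ["THEBESTAPP=$SCIF_APPNAME", "export THEBESTAPP"],
    }
    written = write_metadata(tmp, "hello-world-echo", sections)
    meta_dir = Path(tmp) / "apps" / "hello-world-echo" / "scif"
    runscript_text = (meta_dir / "runscript").read_text()
    help_exists = (meta_dir / "runscript.help").exists()
    print(written)
```

Only sections that are defined produce a file, which matches the "for those defined" wording of the step.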
Data
The base of /scif/data is structured akin to apps: each installed application has its own folder, and additionally (but not required) subfolders are created for inputs and outputs:
/scif/data
   /foo
      /input
      /output
SCIF does not enforce or state how the container creator should use the data folders, but rather encourages the creator to use the organization so that a user can intuitively know that any input for app foo might go into /scif/data/foo/input, general data for foo might be in /scif/data/foo, and global data for the entire container might be in /scif/data. For example, this scientific workflow mounts a host folder at /scif/data, and then each application creates its needed data folders at runtime simply by referencing an environment variable specific to the app.
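As a rough sketch of the runtime pattern just described, an app could create its input and output folders by reading only SCIF_APPDATA. The helper name `ensure_data_folders` is an assumption made for this example, and the demo points SCIF_APPDATA at a temporary directory rather than a real /scif/data mount.

```python
import os
import tempfile
from pathlib import Path


def ensure_data_folders(environ=None):
    """Create input/ and output/ under the active app's data root.

    Hypothetical helper: it relies only on the SCIF_APPDATA variable.
    """
    environ = os.environ if environ is None else environ
    appdata = Path(environ["SCIF_APPDATA"])  # e.g. /scif/data/foo
    folders = {"input": appdata / "input", "output": appdata / "output"}
    for folder in folders.values():
        folder.mkdir(parents=True, exist_ok=True)
    return folders


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = {"SCIF_APPDATA": str(Path(tmp) / "data" / "foo")}
        for role, path in ensure_data_folders(env).items():
            print(role, path.relative_to(tmp))
```

Because the app never hard-codes its own name or path, the same code works for any app whose environment is active.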
Environment Namespace
Variable | Default | Definition |
---|---|---|
SCIF_BASE | /scif | the root location for SCIF |
SCIF_DATA | /scif/data | the root location for apps data |
SCIF_APPS | /scif/apps | the root location for installed apps |
SCIF_SHELL | /bin/bash | shell to use for “shell” command |
SCIF_PYSHELL | ipython | interactive python shell for pyshell command |
SCIF_ENTRYPOINT | /bin/bash | the command to run given no runscript or app defined |
SCIF_ENTRYFOLDER | SCIF_BASE | the entry folder to run the entrypoint command |
SCIF_MESSAGELEVEL | INFO | a client level of verbosity. Must be one of CRITICAL , ABORT , ERROR , WARNING , LOG , INFO , QUIET , VERBOSE , DEBUG |
Table 1: During interaction and runtime of SCIF, the environment variables above must be defined.
Active App Environment Namespace
Variable | Default | Definition |
---|---|---|
SCIF_APPNAME | example | the active software app |
SCIF_APPDATA | /scif/data/example | the data root for the active software app |
SCIF_APPROOT | /scif/apps/example | the install root for the active software app |
SCIF_APPBIN | /scif/apps/example/bin | the app bin, which is automatically added to the path when active |
SCIF_APPLIB | /scif/apps/example/lib | the app lib, which is automatically added to $LD_LIBRARY_PATH when active |
SCIF_APPMETA | /scif/apps/example/scif | the metadata folder |
SCIF_APPHELP | /scif/apps/example/scif/runscript.help | a text file with help to print for the user to the terminal |
SCIF_APPRUN | /scif/apps/example/scif/runscript | the commands to run as the app entrypoint |
SCIF_APPSTART | /scif/apps/example/scif/startscript | the start script (if provided) for an app |
SCIF_APPTEST | /scif/apps/example/scif/test | the commands to run to test the app |
SCIF_APPLABELS | /scif/apps/example/scif/labels.json | a key:value json lookup dictionary of labels |
SCIF_APPENV | /scif/apps/example/scif/environment.sh | a shell script to source for the software app environment |
Table 2: During interaction with a specific app (e.g., example in the table above), these environment variables must be defined in addition to the global SCIF variables above.
With the convention above, any tool that interacts with SCIF could, for example, know that data is expected to be written to SCIF_APPDATA, or that a label (e.g., a parameter) can be read from SCIF_APPLABELS, or that an application has its binaries at SCIF_APPBIN. The integration doesn’t need to know that example is running.
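To make the convention concrete, here is a small sketch that derives the Table 2 variables from an app name, using the default paths listed in the table. The function `active_app_env` is a hypothetical name for illustration and is not claimed to match the scif client's implementation.

```python
def active_app_env(name, base="/scif"):
    """Return the active-app variables from Table 2 for the given app.

    Illustrative only: paths follow the defaults shown in the table.
    """
    approot = f"{base}/apps/{name}"
    appmeta = f"{approot}/scif"
    return {
        "SCIF_APPNAME": name,
        "SCIF_APPDATA": f"{base}/data/{name}",
        "SCIF_APPROOT": approot,
        "SCIF_APPBIN": f"{approot}/bin",
        "SCIF_APPLIB": f"{approot}/lib",
        "SCIF_APPMETA": appmeta,
        "SCIF_APPHELP": f"{appmeta}/runscript.help",
        "SCIF_APPRUN": f"{appmeta}/runscript",
        "SCIF_APPSTART": f"{appmeta}/startscript",
        "SCIF_APPTEST": f"{appmeta}/test",
        "SCIF_APPLABELS": f"{appmeta}/labels.json",
        "SCIF_APPENV": f"{appmeta}/environment.sh",
    }


if __name__ == "__main__":
    for key, value in active_app_env("example").items():
        print(f"{key}={value}")
```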
Sleeping App Environment Namespace
Variable | Default | Definition |
---|---|---|
SCIF_APPNAME_sleeper | sleeper | the inactive software app |
SCIF_APPDATA_sleeper | /scif/data/sleeper | the data root for the inactive software app |
SCIF_APPROOT_sleeper | /scif/apps/sleeper | the install root for the inactive software app |
SCIF_APPBIN_sleeper | /scif/apps/sleeper/bin | the app bin, which is automatically added to the path when the app is active |
SCIF_APPLIB_sleeper | /scif/apps/sleeper/lib | the app lib, which is automatically added to $LD_LIBRARY_PATH when the app is active |
SCIF_APPMETA_sleeper | /scif/apps/sleeper/scif | the metadata folder |
SCIF_APPHELP_sleeper | /scif/apps/sleeper/scif/runscript.help | a text file with help to print for the user to the terminal |
SCIF_APPRUN_sleeper | /scif/apps/sleeper/scif/runscript | the commands to run as the app entrypoint |
SCIF_APPSTART_sleeper | /scif/apps/sleeper/scif/startscript | startscript for the app |
SCIF_APPTEST_sleeper | /scif/apps/sleeper/scif/test | the commands to run to test the app |
SCIF_APPLABELS_sleeper | /scif/apps/sleeper/scif/labels.json | a key:value json lookup dictionary of labels |
SCIF_APPENV_sleeper | /scif/apps/sleeper/scif/environment.sh | a shell script to source for the software app environment |
Table 3: During interaction with a specific app, the other apps (not active) must also reveal themselves through an equivalent set of environment variables specific to them. The table above describes the environment variables defined for such an app (e.g., sleeper). The variables are equivalent except that the app name is appended at the end.
This strategy ensures that any application built to have applications interact (e.g., a container installed with several steps that internally work together) can easily reference another application within the same container without knowing the exact path. Further, it ensures that entire spaces of environment variables (e.g., all application help text files) can be revealed by way of discovering environment variables that match a particular pattern.
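The pattern-based discovery mentioned above might look like the following sketch, which collects every variable of one kind (for example, all help files) for the inactive apps. The function name `find_app_variables` is an assumption for this example.

```python
import os
import re


def find_app_variables(prefix, environ=None):
    """Map app-name suffix -> value for variables named <prefix>_<suffix>.

    Hypothetical helper: SCIF_APPHELP_sleeper is reported under "sleeper",
    while the unsuffixed SCIF_APPHELP (the active app) is skipped.
    """
    environ = os.environ if environ is None else environ
    pattern = re.compile(rf"^{re.escape(prefix)}_(?P<app>.+)$")
    found = {}
    for key, value in environ.items():
        match = pattern.match(key)
        if match:
            found[match.group("app")] = value
    return found


if __name__ == "__main__":
    example_env = {
        "SCIF_APPHELP": "/scif/apps/example/scif/runscript.help",
        "SCIF_APPHELP_sleeper": "/scif/apps/sleeper/scif/runscript.help",
        "SCIF_APPRUN_sleeper": "/scif/apps/sleeper/scif/runscript",
    }
    print(find_app_variables("SCIF_APPHELP", example_env))
```

Note that the suffix is whatever the integration appended to the variable name; in the quick start output later in this document, the app hello-world-echo appears with underscores, as in SCIF_APPBIN_hello_world_echo.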
Additional Variables
Additional variables are appropriate for integrations to define, but must begin with SCIF_ to remain in the scientific filesystem namespace. This also ensures that SCIF, when installed alongside other software and environments, can have its environment settings isolated by way of filtering down based on a common prefix.
Integration Requirements
The following sections detail requirements for integrations (third-party software that uses SCIF). Integrations are distinguished from clients (a controller or generator of a SCIF) because they don’t necessarily need to create a SCIF, but just interact with one.
Initialization
At the initialization step of an integration or client, the following must be true:
- the user’s level of verbosity is respected
- the global variables SCIF_DATA and SCIF_APPS are found in the environment, and if not found, set to the defaults. These bases should be used to discover installed apps, and their corresponding data and metadata files according to the environment variables specified previously.
- if the user is calling a function to make changes to the filesystem, a check that the user has permission to write is performed first.
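A minimal sketch of these initialization checks is shown below. The function name `initialize` and the choice to test write access on the nearest existing parent folder are assumptions of this example; the defaults and the allowed verbosity levels come from Table 1.

```python
import os

DEFAULTS = {
    "SCIF_DATA": "/scif/data",
    "SCIF_APPS": "/scif/apps",
    "SCIF_MESSAGELEVEL": "INFO",
}
LEVELS = {"CRITICAL", "ABORT", "ERROR", "WARNING", "LOG",
          "INFO", "QUIET", "VERBOSE", "DEBUG"}


def initialize(environ=None, writable=False):
    """Resolve the global bases and verbosity; optionally check write access."""
    environ = dict(os.environ if environ is None else environ)
    # Use the environment when a variable is set, the default otherwise.
    settings = {key: environ.get(key, default) for key, default in DEFAULTS.items()}
    if settings["SCIF_MESSAGELEVEL"] not in LEVELS:
        raise ValueError(f"unknown message level: {settings['SCIF_MESSAGELEVEL']}")
    if writable:
        for key in ("SCIF_DATA", "SCIF_APPS"):
            target = settings[key]
            # Walk up to the nearest folder that exists (assumed policy).
            while not os.path.exists(target):
                parent = os.path.dirname(target)
                if parent == target:
                    break
                target = parent
            if not os.access(target, os.W_OK):
                raise PermissionError(f"no write permission for {settings[key]}")
    return settings


if __name__ == "__main__":
    print(initialize({"SCIF_MESSAGELEVEL": "DEBUG"}))
```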
Client Requirements
A client, in that it is a more advanced integration, must also follow the integration requirements previously stated.
Recipe Parsing
A parser of a recipe should be able to read chunks of text and return a data structure that includes a dictionary of app settings, indexed by app name, with the next level holding sections, each a list of the commands from the file. The order in which apps are first discovered in the recipe should be preserved. For example, if app “foo” has an install section that is discovered first, the install section of an app “bar” should be run after it. If an app “red” has an environment section that comes before an app “blue” install section, then by way of being discovered first in the recipe, the sections for “red” will be processed first, even if the install section for red came after blue’s.
- comments should be removed
- illegal characters or uppercase letters in app names are not allowed
For example, consider the following chunk in a text file recipe sregistry.scif:
#######################################
# Singularity Registry
#######################################
%appenv registry
    SREGISTRY_CLIENT=registry
    export SREGISTRY_CLIENT
%apprun registry
    exec /opt/conda/bin/sregistry "$@"
A recipe for this app would parse to a structure like:
{"apps":
    { "registry":
        { 'appenv': ['SREGISTRY_CLIENT=hub', 'export SREGISTRY_CLIENT'],
          'appinstall': ['pip install sregistry[registry]'],
          'apprun': ['exec sregistry "$@"']}
    }
}
The integration can then add other sections or parameters that it needs at higher-level indices. For example, if an %install section is added, it would be parsed to:
{"apps":
    { "registry":
        { 'appenv': ['SREGISTRY_CLIENT=hub', 'export SREGISTRY_CLIENT'],
          'appinstall': ['pip install sregistry[registry]'],
          'apprun': ['exec sregistry "$@"']}
    },
 "install": { ... }
}
During the parsing process, the following rules must be followed:
- Finding the section %appinstall, %apphelp, or %apprun is indication of an application command. Not shown but also relevant are %applabels, %appfiles, and %apptest.
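The parsing rules above can be sketched as a short Python function. This is a simplified illustration, not the scif client's parser: `parse_recipe` and the `VALID_NAME` pattern are assumptions, only %app sections are handled, and command lines are stored with surrounding whitespace stripped.

```python
import re

# "%appenv registry" -> section "appenv", app name "registry"
SECTION = re.compile(r"^%(app[a-z]+)\s+(\S+)\s*$")
# Assumed naming rule: lowercase letters, digits, and a few separators.
VALID_NAME = re.compile(r"^[a-z0-9][a-z0-9_.-]*$")


def parse_recipe(text):
    """Parse recipe text into {"apps": {name: {section: [commands]}}}."""
    apps = {}  # dicts keep insertion order, so discovery order is preserved
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are removed
        match = SECTION.match(line)
        if match:
            section, name = match.groups()
            if not VALID_NAME.match(name):
                raise ValueError(f"invalid app name: {name!r}")
            current = apps.setdefault(name, {}).setdefault(section, [])
        elif current is not None:
            current.append(line)
    return {"apps": apps}


if __name__ == "__main__":
    recipe = """
    # Singularity Registry
    %appenv registry
        SREGISTRY_CLIENT=registry
        export SREGISTRY_CLIENT
    %apprun registry
        exec /opt/conda/bin/sregistry "$@"
    """
    print(parse_recipe(recipe))
```

An integration that needs extra top-level sections (such as %install) could add them as sibling keys next to "apps" in the returned dictionary.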
Creation
The creation step coincides with generation of a SCIF.
- initialization must come first, and the requirements maintained from there.
- the SCIF filesystem must be created from the recipe, with exit on any error and message to the user.
- Given the parsing of the name of a software app (e.g., foo) from a recipe, a folder must be created, in lowercase, under /scif/apps if it doesn’t exist. A metadata folder, scif, is generated inside the application. The metadata folder itself could be thought of as another level of scif applications within the application, if interesting or relevant.
- A “bin” folder is automatically generated for foo, and will be automatically added to $PATH when foo is being used. A “lib” folder is also generated, and is added to $LD_LIBRARY_PATH when foo is used.
Introspection
Any integration or tool must reveal metadata and SCIF content to its users. This means that:
- a preview of a SCIF recipe shows how different sections of a recipe file map to a filesystem
- an installed SCIF must reveal its applications and the application metadata without any prior knowledge other than running the tool.
- The user must be able to specifically ask to see the following (for a particular or all apps):
- labels
- runscripts
- help files
- environments
- original relevant sections in the recipe
Controller Requirements
Any integration or tool that is considered a client and implements SCIF must provide a controller that:
- interacts with a SCIF or a recipe describing a SCIF
- allows the user to run the client or integration in context of the app (or not)
- adds the app’s bin to the $PATH, and its lib to the $LD_LIBRARY_PATH, when the app is active
- has a reasonable fallback to a default (e.g., executing a container for an app without a runscript might launch an interactive shell)
- optionally, provides an ability to validate the correctness of a recipe or structure of a SCIF
Commands
The minimal set of functions for the controller must support the following higher level commands. We use the controller name ctrl in the examples below, along with suggested action names:
- listing: the UA must, with one command (e.g., “apps” or “list” or “apps.list” are recommended but not enforced), show all the software apps installed in a SCIF. E.g., ctrl apps or ctrl apps <container>.
- run: the UA must provide a means for the user to execute the entrypoint for a chosen app (the apprun section from the recipe), e.g., ctrl run example. For a container, running the container in context of the app should direct to some central entrypoint that can then forward to the targeted app. If no specific app is provided, the target should end in a reasonable default.
- start: akin to “run,” the UA can optionally provide a means for the user to execute the start script for a chosen app (the appstart section from the recipe), e.g., ctrl start example. Since start is optional, the command can be provided from a subcommand or not provided at all.
- test: if an app has an apptest section defined, the UA must run the tests for the targeted app when the user specifies a test command.
- inspect: if an app has any metadata (labels, environment, help) coinciding with the same sections in the recipe prefixed with app, then the integration must provide some kind of inspect command.
- shell: if applicable (e.g., containers), provide an entrypoint to an interactive shell where the software app of choice is active (or none are active).
- help: if an apphelp section is defined, a command must be exposed to show help for a targeted app. If no help section is provided, the software should alert the user and show the files provided for inspection or some other reasonable default.
- labels: if an applabels section is defined, a command must be exposed to show labels for a targeted app.
- environment: if an appenv section is defined, a command must be exposed to show the environment for a targeted app.
Scientific Filesystem Quick Start
This is the quick start for using a Scientific Filesystem. We won’t be designing or creating one, but using a pre-generated Docker or Singularity container. The awesome part is that despite two different container technologies, the guts inside, and interacting with them, will be the same! Thus, we will first show how to pull the different containers, and then show usage side by side. If you do want to learn these first steps, see the preview and install tutorial. If you want a REALLY quick start, skip the prose and go to it! If you want code for a quick start, see our hello-world.scif Github repository.
What you should understand for this quick start is what we’ve done thus far. We (as the creator of a container with a scientific filesystem) wrote a recipe, a text file with instructions for interacting with different scientific filesystem software modules (“apps”), and then exposed a single entrypoint that is the controller for the different apps. Then we built the container.
A user with no knowledge of how the container was created is then able to interact with the software modules defined by the filesystem in a consistent and user-friendly way. Right now I am playing the role of the creator, and you the user for doing this quick start. Let’s jump in!
Singularity
For our first container we are using the Singularity scientific filesystem container that is built in the preview and install tutorial. First, you should pull the image, either with the sregistry client or singularity natively:
singularity pull --name scif-cli shub://vsoch/scif:scif
Progress |===================================| 100.0%
Done. Container is at: /home/vanessa/Desktop/scif-cli
# or using sregistry client
# sregistry pull --name scif-cli vsoch/scif:scif
Importantly, note that we are pulling the tag scif. If you pull latest (e.g., vsoch/scif), then you will get a container with the scif client installed, but no filesystem.
Docker
For our second container, we are starting with the Dockerized scientific filesystem container that is built in the preview and install tutorial. We will use the hw tag for this example, and you can also see the tags available for other versions. First, pull the image:
docker pull vanessa/scif:hw
Note that the container vanessa/scif is a base without any SCIF installed, but you would use it to install your own SCIF, for example with a Dockerfile like this:
FROM vanessa/scif
ADD recipe.scif /
RUN scif install /recipe.scif
What you should understand from the install tutorial is that we (as the creator of the container) wrote a recipe, a text file with instructions for interacting with different scientific filesystem software modules (“apps”), and then exposed a single entrypoint that is the controller for the different apps. Then we built the container.
A user with no knowledge of how the container was created is then able to interact with the software modules defined by the filesystem in a consistent and user-friendly way. Right now I am playing the role of the creator, and you the user. Let’s jump in! For each example below, you can test with Docker, Singularity, or both. For some, the $PS1 prompt might not match, but the output is equivalent.
Test the entrypoint
We can first test the entrypoint. We have just pulled the container, and we know nothing. So we run it.
docker run vanessa/scif:hw
./scif-cli
Scientific Filesystem [v0.0.3]
usage: scif [-h] [--debug] [--quiet] [--writable]
{version,pyshell,shell,preview,help,install,inspect,run,apps,dump,exec}
...
scientific filesystem tools
optional arguments:
-h, --help show this help message and exit
--debug use verbose logging to debug.
--quiet suppress print output
--writable, -w for relevant commands, if writable SCIF is needed
actions:
actions for Scientific Filesystem
{version,pyshell,shell,preview,help,install,inspect,run,apps,dump,exec}
scif actions
version show software version
pyshell Interactive python shell to scientific filesystem
shell shell to interact with scientific filesystem
preview preview changes to a filesytem
help look at help for an app, if it exists.
install install a recipe on the filesystem
inspect inspect an attribute for a scif installation
run entrypoint to run a scientific filesystem
apps list apps installed
dump dump recipe
exec execute a command to a scientific filesystem
Apps
We are familiar with the scientific filesystem, so we can use the apps command to see what is installed. Since this container had the scientific filesystem installed on build, it already has three apps for us to play with:
docker run vanessa/scif:hw apps
./scif-cli apps
SCIF [app] [root]
1 hello-world-echo /scif/apps/hello-world-echo
2 hello-world-env /scif/apps/hello-world-env
3 hello-world-script /scif/apps/hello-world-script
Help
We can then ask for help for a particular app. This section is important for the creator to put some time into describing the basic important things that should be known.
docker run vanessa/scif:hw help hello-world-env
./scif-cli help hello-world-env
This is the help section for hello-world-env! This app
does not have anything other than an environment installed.
It just defines the environment variable `OMG=TACOS`. Try issuing
a command to the scif entrypoint to echo this variable:
scif exec hello-world-env echo [e]OMG
docker run vanessa/scif:hw exec hello-world-env echo [e]OMG
[hello-world-env] executing /bin/echo $OMG
TACOS
And you see a hasty help message that I wrote for the module hello-world-env
telling you how to run it. We will do that soon.
Inspect
We can also inspect an app of interest, which will spit out a metadata structure for it. You can think of the help command as returning a human friendly thing, and inspect something that can be programmatically parsed.
docker run vanessa/scif:hw inspect hello-world-env
./scif-cli inspect hello-world-env
{
"hello-world-env": {
"appenv": [
"OMG=TACOS"
],
"apphelp": [
"This is the help section for hello-world-env! This app",
"does not have anything other than an environment installed.",
"It just defines the environment variable `OMG=TACOS`. Try issuing",
"a command to the scif entrypoint to echo this variable:",
"scif exec hello-world-env echo [e]OMG",
"docker run vanessa/scif:hw exec hello-world-env echo [e]OMG",
"[hello-world-env] executing /bin/echo $OMG",
"TACOS"
]
}
}
Yes, it really just is an environment, and a help message for it! Now that we’ve seen this instruction twice, let’s give run a try with specification of an environment variable, $OMG
in the container.
Run
We can run the hello-world-echo
app like this:
docker run vanessa/scif:hw run hello-world-echo
./scif-cli run hello-world-echo
[hello-world-echo] executing /bin/bash /scif/apps/hello-world-echo/scif/runscript
The best app is hello-world-echo
What about our example above with hello-world-env? It can be awkward to pass an environment variable into a container from the host, because the host shell evaluates it first (and it then winds up as something unexpected or empty!). To help with this, scif uses a modified syntax to pass the variable into the container. We replace $ with [e] so that $VARIABLE is written as [e]VARIABLE. Here is an example:
docker run vanessa/scif:hw exec hello-world-env echo [e]OMG
./scif-cli exec hello-world-env echo [e]OMG
[hello-world-env] executing /bin/echo $OMG
TACOS
If we had done that with $, it would have evaluated the variable on our host shell, and passed nothing into the container (unless in fact $OMG was defined on the host).
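The substitution described above can be illustrated with a small sketch. It is not the scif client's code: the names `unstick` and `expand` are hypothetical, and `string.Template` is used here only as a stand-in for evaluating the variable inside the app's environment.

```python
import re
from string import Template

# [e] followed by the start of a variable name, e.g. [e]OMG
STICKER = re.compile(r"\[e\](?=[A-Za-z_])")


def unstick(args):
    """Turn [e]NAME back into $NAME for each argument."""
    return [STICKER.sub("$", arg) for arg in args]


def expand(args, env):
    """Evaluate $NAME references against the given app environment."""
    return [Template(arg).safe_substitute(env) for arg in unstick(args)]


if __name__ == "__main__":
    command = ["echo", "[e]OMG"]
    print(unstick(command))
    print(expand(command, {"OMG": "TACOS"}))
```

Because the host shell never sees a `$`, the argument arrives intact and is only resolved once the app's environment has been loaded.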
Test
When the recipe has an %apptest section for an app, the content is written to a file test.sh in the metadata folder. In this example application, we have a bash script that will print a message and exit with 0 (success) given no argument, or exit with a return code set to a provided argument:
%apptest hello-world-script
echo "Running tests!"
if [ $# -eq 0 ]
then
echo "No arguments supplied, tests pass!"
exit 0
fi
echo "Argument supplied, exiting with ${1}"
exit ${1}
Thus, we can test the hello-world-script
app like this to get a return code of 0:
docker run vanessa/scif:hw test hello-world-script
[hello-world-script] executing /bin/bash /scif/apps/hello-world-script/scif/test.sh
Running tests!
No arguments supplied, tests pass!
$ echo $?
0
And like this to get a non-zero return code (e.g., 255)
docker run vanessa/scif:hw test hello-world-script 255
[hello-world-script] executing /bin/bash /scif/apps/hello-world-script/scif/test.sh 255
Running tests!
Argument supplied, exiting with 255
ERROR Return code 255
$ echo $?
255
If an app doesn’t have tests, it prints this:
$ docker run vanessa/scif:hw test hello-world-echo
No tests defined for this app.
And for now, I’ve decided to provide a return code of 1, because it should be encouraged to write tests.
$ echo $?
1
Exec
You can also execute a command:
docker run vanessa/scif:hw exec hello-world-echo echo "Another hello!"
./scif-cli exec hello-world-echo echo "Another hello!"
[hello-world-echo] executing /bin/echo Another hello!
Another hello!
Bash Shell
If you want to interact with your container in the context of an app, there is a command for that! We can either shell into the container with the global scif environment (and no activated apps):
./scif-cli shell
WARNING No app selected, will run default ['/bin/bash']
executing /bin/bash
vanessa@thinkpad:/scif$
Notice how the path ($PS1
) in the terminal window changed to /scif
? The same happens for Docker of course:
docker run -it vanessa/scif:hw shell
WARNING No app selected, will run default ['/bin/bash']
executing /bin/bash
root@1ab15ba4cc3b:/scif
And the main difference here is that inside of the Docker image, we are (usually always) root :) Let’s see where we are:
$ echo $PWD
/scif
$ ls
apps
data
Since the scientific filesystem is not an entire environment like a container (it is rooted at /scif), this is where we shell to. If we shelled in context of an app (next) we would be in the app’s root. Before exiting, try looking at the environment to see that we have activated the global SCIF environment variables.
env | grep SCIF
...
SCIF_APPBIN_hello_world_echo=/scif/apps/hello-world-echo/bin
SCIF_MESSAGELEVEL=INFO
exit
...
After exiting, we can do the same in the context of a specific app:
./scif-cli shell hello-world-env
[hello-world-env] executing /bin/bash
vanessa@thinkpad:/scif/apps/hello-world-env$
Notice how we are in the root of hello-world-env
. It works the same for Docker.
docker run -it vanessa/scif:hw shell hello-world-env
[hello-world-env] executing /bin/bash
root@1ab15ba4cc3b:/scif/apps/hello-world-env# echo $OMG
TACOS
root@1ab15ba4cc3b:/scif/apps/hello-world-env#
This is a great example of how a single container can be used to serve different interactive environments.
Python Shell
We can enter an interactive shell for exploring the container filesystem, if we want to do more than execute commands. For docker, we have to append an -it
to mean we want an “interactive terminal” to the run command, using “pyshell” as the entrypoint:
./scif-cli pyshell
Found configurations for 2 scif apps
hello-world-echo
hello-world-script
[scif] /scif hello-world-echo | hello-world-script
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.
docker run -it vanessa/scif:hw pyshell
Found configurations for 3 scif apps
hello-world-env
hello-world-script
hello-world-echo
[scif] /scif hello-world-env | hello-world-script | hello-world-echo
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.
We can now look at details for the client:
In [1]: client
Out[1]: [scif]
In [2]: client.apps()
Out[2]: ['hello-world-env', 'hello-world-script', 'hello-world-echo']
The “preview” command is most useful when you have a recipe (and haven’t installed yet) and want to look at what will be created on the filesystem. But you can also run it for an already installed scif:
In [3]: client.preview()
[base] /scif
[apps] /scif/apps
[data] /scif/data
[root] /scif/apps/hello-world-env
[lib] /scif/apps/hello-world-env/lib
[bin] /scif/apps/hello-world-env/bin
[data] /scif/data/hello-world-env
+ appenv hello-world-env
/scif/apps/hello-world-env/scif/environment.sh
OMG=TACOS
+ apprecipe hello-world-env
/scif/apps/hello-world-env/scif/hello-world-env.scif
...
You can also get the full set of environment variables:
client.get_env()
Out[5]:
{'SCIF_APPBIN_hello_world_echo': '/scif/apps/hello-world-echo/bin',
'SCIF_APPBIN_hello_world_env': '/scif/apps/hello-world-env/bin',
'SCIF_APPBIN_hello_world_script': '/scif/apps/hello-world-script/bin',
'SCIF_APPDATA_hello_world_echo': '/scif/data/hello-world-echo',
'SCIF_APPDATA_hello_world_env': '/scif/data/hello-world-env',
'SCIF_APPDATA_hello_world_script': '/scif/data/hello-world-script',
'SCIF_APPENV_hello_world_echo': '/scif/apps/hello-world-echo/scif/environment.sh',
'SCIF_APPENV_hello_world_env': '/scif/apps/hello-world-env/scif/environment.sh',
'SCIF_APPENV_hello_world_script': '/scif/apps/hello-world-script/scif/environment.sh',
'SCIF_APPHELP_hello_world_echo': '/scif/apps/hello-world-echo/scif/runscript.help',
'SCIF_APPHELP_hello_world_env': '/scif/apps/hello-world-env/scif/runscript.help',
'SCIF_APPHELP_hello_world_script': '/scif/apps/hello-world-script/scif/runscript.help',
'SCIF_APPLABELS_hello_world_echo': '/scif/apps/hello-world-echo/scif/labels.json',
'SCIF_APPLABELS_hello_world_env': '/scif/apps/hello-world-env/scif/labels.json',
'SCIF_APPLABELS_hello_world_script': '/scif/apps/hello-world-script/scif/labels.json',
'SCIF_APPLIB_hello_world_echo': '/scif/apps/hello-world-echo/lib',
'SCIF_APPLIB_hello_world_env': '/scif/apps/hello-world-env/lib',
'SCIF_APPLIB_hello_world_script': '/scif/apps/hello-world-script/lib',
'SCIF_APPMETA_hello_world_echo': '/scif/apps/hello-world-echo/scif',
'SCIF_APPMETA_hello_world_env': '/scif/apps/hello-world-env/scif',
'SCIF_APPMETA_hello_world_script': '/scif/apps/hello-world-script/scif',
'SCIF_APPNAME_hello_world_echo': 'hello-world-echo',
'SCIF_APPNAME_hello_world_env': 'hello-world-env',
'SCIF_APPNAME_hello_world_script': 'hello-world-script',
'SCIF_APPRECIPE_hello_world_echo': '/scif/apps/hello-world-echo/scif/hello-world-echo.scif',
'SCIF_APPRECIPE_hello_world_env': '/scif/apps/hello-world-env/scif/hello-world-env.scif',
'SCIF_APPRECIPE_hello_world_script': '/scif/apps/hello-world-script/scif/hello-world-script.scif',
'SCIF_APPROOT_hello_world_echo': '/scif/apps/hello-world-echo',
'SCIF_APPROOT_hello_world_env': '/scif/apps/hello-world-env',
'SCIF_APPROOT_hello_world_script': '/scif/apps/hello-world-script',
'SCIF_APPRUN_hello_world_echo': '/scif/apps/hello-world-echo/scif/runscript',
'SCIF_APPRUN_hello_world_env': '/scif/apps/hello-world-env/scif/runscript',
'SCIF_APPRUN_hello_world_script': '/scif/apps/hello-world-script/scif/runscript',
'SCIF_APPS': '/scif/apps',
'SCIF_DATA': '/scif/data'}
or activate an app (and then see how the environment variables change; a few will have been added!):
client.activate('hello-world-env')
In [7]: client.get_env()
Out[7]:
{'OMG': 'TACOS',
'SCIF_APPBIN': '/scif/apps/hello-world-env/bin',
'SCIF_APPDATA': '/scif/data/hello-world-env',
'SCIF_APPENV': '/scif/apps/hello-world-env/scif/environment.sh',
'SCIF_APPHELP': '/scif/apps/hello-world-env/scif/runscript.help',
'SCIF_APPLABELS': '/scif/apps/hello-world-env/scif/labels.json',
'SCIF_APPLIB': '/scif/apps/hello-world-env/lib',
'SCIF_APPMETA': '/scif/apps/hello-world-env/scif',
'SCIF_APPNAME': 'hello-world-env',
...
'SCIF_APPRECIPE': '/scif/apps/hello-world-env/scif/hello-world-env.scif',
'SCIF_APPROOT': '/scif/apps/hello-world-env',
'SCIF_APPRUN': '/scif/apps/hello-world-env/scif/runscript',
'SCIF_APPS': '/scif/apps',
'SCIF_DATA': '/scif/data'}
And then deactivate to undo that.
client.deactivate()
I’m skiing over sunshine with happiness to be working on this! The response to reviewers (with this update) will be submitted in early March. Please contribute feedback (no matter how small!) to the docs and spec or the client (all served from that repo) and add your name to the specification. If you have already contributed, then please submit a pull request there and add your name! Onwards to scientific filesystem galaxies, friends!
Same commands, but more detail here, or get quick start code on Github here.
1. Get containers with the same scientific filesystem
singularity pull --name scif-cli shub://vsoch/scif:scif
docker pull vanessa/scif:hw
2. View the scientific filesystem entrypoint
docker run vanessa/scif:hw
./scif-cli
3. Discover Installed Apps
docker run vanessa/scif:hw apps
./scif-cli apps
4. Commands
Help
docker run vanessa/scif:hw help hello-world-env
./scif-cli help hello-world-env
Inspect
docker run vanessa/scif:hw inspect hello-world-env
./scif-cli inspect hello-world-env
Run
docker run vanessa/scif:hw run hello-world-echo
./scif-cli run hello-world-echo
Test
# Passing Test (test script returns 0 with no arguments)
docker run vanessa/scif:hw test hello-world-script
./scif-cli test hello-world-script
echo $?
# Failing Test (test script returns argument as return code)
docker run vanessa/scif:hw test hello-world-script 255
./scif-cli test hello-world-script 255
echo $?
Execute
docker run vanessa/scif:hw exec hello-world-echo echo "Another hello!"
./scif-cli exec hello-world-echo echo "Another hello!"
Execute command with environment variable $OMG
docker run vanessa/scif:hw exec hello-world-env echo [e]OMG
./scif-cli exec hello-world-env echo [e]OMG
Interactive shell
./scif-cli shell
docker run -it vanessa/scif:hw shell
Shell with application active
./scif-cli shell hello-world-env
docker run -it vanessa/scif:hw shell hello-world-env
Python interactive client
./scif-cli pyshell
docker run -it vanessa/scif:hw pyshell
At this point, you are comfortable with the preview or install of a SCIF, and are ready for more advanced interaction. This means any of the following commands. Some of these have already been demonstrated in the previous tutorial, and will have detail added here:
- apps: list apps installed in a SCIF or recipe (or dump)
- inspect: look at details for one or more apps
- pyshell: interactive python shell to explore a SCIF or recipe
- shell: interactive shell to explore a SCIF
- run: run an app runscript, or default to shell in app context
- exec: execute a command to an app with context active
Apps
Apps will give you a simple listing of apps installed.
$ scif apps
SCIF [app] [root]
1 hello-world-script /scif/apps/hello-world-script
2 hello-world-echo /scif/apps/hello-world-echo
and if you don’t have any installed, you get a warning instead.
WARNING /scif is not detected as a recipe or base.
Inspect
Inspect is what you want to use to look at different metadata about a SCIF or app within it. Without any arguments, you will get json output for all apps installed in your SCIF:
scif inspect
{
"hello-world-script": {
"appinstall": [
"echo \"echo 'Hello World!'\" >> $SCIF_APPBIN/hello-world.sh",
"chmod u+x $SCIF_APPBIN/hello-world.sh"
],
"appenv": [
"THEBESTAPP $SCIF_APPNAME"
],
"apprun": [
"/bin/bash hello-world.sh"
]
},
"hello-world-echo": {
"appenv": [
"THEBESTAPP $SCIF_APPNAME"
],
"apprun": [
"echo \"The best app is $THEBESTAPP\""
]
}
}
If you add the name of an app, you will filter the output to it:
scif inspect hello-world-echo
{
"hello-world-echo": {
"appenv": [
"THEBESTAPP $SCIF_APPNAME"
],
"apprun": [
"echo \"The best app is $THEBESTAPP\""
]
}
}
If you add a particular attribute, you can further filter the result. For attributes your choices are a (all), l (labels), e (environment), r (runscript), f (files), or i (install). Here we ask to see the runscript (r):
scif inspect hello-world-echo r
{
"hello-world-echo": {
"apprun": [
"echo \"The best app is $THEBESTAPP\""
]
}
}
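The filtering behavior shown above can be mimicked over a parsed structure with a few lines of Python. This sketch is only an approximation for illustration: the function `inspect` and the letter-to-section mapping in `ATTRIBUTES` are assumptions based on the attribute letters and section names used in this document.

```python
# Assumed mapping from attribute letters to recipe section names.
ATTRIBUTES = {
    "l": "applabels",
    "e": "appenv",
    "r": "apprun",
    "f": "appfiles",
    "i": "appinstall",
}


def inspect(config, app=None, attribute="a"):
    """Filter a {app: {section: [lines]}} structure by app and attribute."""
    apps = config if app is None else {app: config[app]}
    if attribute == "a":  # "a" means all sections
        return {name: dict(sections) for name, sections in apps.items()}
    key = ATTRIBUTES[attribute]
    return {
        name: {key: sections[key]}
        for name, sections in apps.items()
        if key in sections
    }


if __name__ == "__main__":
    config = {
        "hello-world-echo": {
            "appenv": ["THEBESTAPP $SCIF_APPNAME"],
            "apprun": ['echo "The best app is $THEBESTAPP"'],
        }
    }
    print(inspect(config, "hello-world-echo", "r"))
```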
Finally, to dump the original recipe, just use “dump”:
scif dump
%appinstall
echo "echo 'Hello World!'" >> $SCIF_APPBIN/hello-world.sh
chmod u+x $SCIF_APPBIN/hello-world.sh
%appenv
THEBESTAPP $SCIF_APPNAME
%apprun
/bin/bash hello-world.sh
%appenv
THEBESTAPP $SCIF_APPNAME
%apprun
echo "The best app is $THEBESTAPP"
Pyshell
Pyshell opens an interactive Python terminal to work with your SCIF. As we saw previously, depending on whether you provide a recipe, an app, or both, you get different environments to work in. This is the entry point to an interactive Python shell.
SCIF python shell Options
example | use case |
---|---|
scif pyshell [recipe] | interact with a recipe, no changes to the filesystem (akin to preview) |
scif pyshell | interact with a SCIF (no SCIF app active) |
scif pyshell [app] | interact with a SCIF with an app active. |
scif pyshell [recipe] [app] | interact with a recipe, still no changes, but load context of an app |
If you want to change the default shell used, set the variable SCIF_SHELL. By default it will use /bin/bash.
Shell
Running a standard shell means bash in the context of an app, meaning a subprocess for /bin/bash (the default, determined by the environment variable SCIF_SHELL at runtime). We aren’t in any sort of container, so there is no notion of a shell in the way you would think of shelling into a container or using SSH to connect to a server. This is the entry point to a bash shell.
SCIF shell Options
example | use case |
---|---|
scif shell | interact with a SCIF (no SCIF app active) |
scif shell [app] | interact with a SCIF with an app active. |
From the above, we can see that it’s possible to shell into only a context, meaning the context of an installed app, or the entire filesystem. Physically shelling into a recipe (a file system that doesn’t exist yet) doesn’t make sense.
Special Command Stickers
The Scientific Filesystem has some special command stickers that (true to their name) stick with your commands and do not get lost to parsing by the host. For any command (e.g., run, exec) that passes arguments from the host to the SCIF, it can be troublesome when something is evaluated on the host that needs to be evaluated inside the SCIF (in a container, for example). To handle this, we have a small set of stickers that are passed in untouched and evaluated internally, relative to the SCIF.
sticker | description | example |
---|---|---|
[e] | an environment variable prefix | [e]OMG converts to $OMG |
[pipe] | pipe (usually \|) | env [pipe] grep _SCIF |
[out] | output direction (usually >) | cat input.txt [out] output.txt |
[in] | input direction (usually <) | |
[append] | append to a file (usually >>) | echo "pancakes" [append] recipe.txt |
If you are having trouble with a command and you’ve tried quotes, these stickers might be exactly what you need! If there is another command being parsed that you want help with, let us know.
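To make the idea concrete, here is a small sketch of how such a substitution could work. It is not the scif client's actual implementation: the function `expand_stickers` and the `STICKERS` mapping are hypothetical names, and the only facts taken from the text are the sticker-to-symbol pairs in the table.

```python
import re

# Sticker -> shell token, taken from the table above.
STICKERS = {
    "[pipe]": "|",
    "[out]": ">",
    "[in]": "<",
    "[append]": ">>",
}


def expand_stickers(args):
    """Replace stickers in a list of command arguments (hypothetical helper)."""
    expanded = []
    for arg in args:
        # [e]NAME becomes $NAME wherever it appears in the argument
        arg = re.sub(r"\[e\](\w+)", r"$\1", arg)
        # A whole-argument sticker such as [pipe] becomes its shell token
        expanded.append(STICKERS.get(arg, arg))
    return expanded


print(expand_stickers(["env", "[pipe]", "grep", "_SCIF"]))
print(expand_stickers(["echo", "[e]OMG"]))
```

The point of the design is that the host shell sees only plain words such as `[pipe]`, so nothing is interpreted until the command reaches the SCIF.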
Run
The run command will explicitly run the script provided in the apprun section. In our example, running hello-world-echo will echo “The best app is hello-world-echo” to the console.
$ scif run hello-world-echo
[hello-world-echo] executing /bin/bash /scif/apps/hello-world-echo/scif/runscript
The best app is hello-world-echo
Running the second app, hello-world-script, is actually a very cool example because in it we are calling a script that was written via an echo during the install step. If you remember:
%appinstall hello-world-script
echo "echo 'Hello World!'" >> $SCIF_APPBIN/hello-world.sh
chmod u+x $SCIF_APPBIN/hello-world.sh
and then running it:
%apprun hello-world-script
/bin/bash hello-world.sh
This means the following:
- we have the variable SCIF_APPBIN available for our usage during install.
- we also have SCIF_APPBIN added to the PATH at runtime, so the script is found.
Awesome!
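The lookup itself can be sketched outside of SCIF. In the example below a temporary directory stands in for SCIF_APPBIN (an assumption for illustration; no real SCIF is involved): a script is written and made executable, as in the install step, and then found by its bare name once that directory is on the search path.

```python
import os
import shutil
import stat
import tempfile

# Stand-in for $SCIF_APPBIN (a temporary directory, not a real SCIF)
appbin = tempfile.mkdtemp()
script = os.path.join(appbin, "hello-world.sh")

# Install step: write the script and make it executable (chmod u+x)
with open(script, "w") as handle:
    handle.write("echo 'Hello World!'\n")
os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)

# Runtime: with the bin folder first on the search path, the bare name resolves
search_path = appbin + os.pathsep + os.environ.get("PATH", "")
found = shutil.which("hello-world.sh", path=search_path)
print(found)
```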
If an app doesn’t have an %apprun section, we default to a shell. For example, here is a third app that I added to the recipe and that is just an environment:
%appenv hello-world-env
OMG=TACOS
# then scif install hello-world.scif again
scif install hello-world.scif hello-world-env
Installing base at /scif
+ appenv hello-world-env
If we run the app, we get a shell.
$ scif run hello-world-env
[hello-world-env] executing /bin/bash
$/scif/apps/hello-world-env# echo $OMG
TACOS
Again, cool! This example also nicely shows how an entire app can just be an environment context. Now it’s time to look at executing commands with “exec.”
Exec
Let’s take the hello-world-echo app as an example. When we shell with context of this app, we would expect its environment variables to be active. Can we show that with exec?
$ scif exec hello-world-echo echo The best app is $THEBESTAPP
[hello-world-echo] executing /bin/echo
The best app is
Ruhroh! The variable is empty! The reason is that the shell on the host evaluates $THEBESTAPP, which is not set there, before the command is ever passed in to SCIF. To get around this, we have a special syntax to distinguish an environment variable. Try this!
$ scif exec hello-world-echo echo The best app is [e]THEBESTAPP
[hello-world-echo] executing /bin/echo The best app is $THEBESTAPP
The best app is hello-world-echo
This little bug has always been a pet peeve of mine, so I’ve introduced the little [e] so we don’t have to struggle.
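The order of evaluation is the whole story here, and it can be simulated in a few lines. This is only an illustration: `shell_expand` is a hypothetical stand-in for shell variable expansion, and the two dictionaries stand in for the host environment and the app environment from the example.

```python
import re

host_env = {}                                 # THEBESTAPP is not set on the host
app_env = {"THEBESTAPP": "hello-world-echo"}  # set by %appenv inside the app


def shell_expand(text, env):
    """Mimic shell expansion: $NAME becomes its value, or empty if unset."""
    return re.sub(r"\$(\w+)", lambda m: env.get(m.group(1), ""), text)


# Without a sticker: the host expands first, so the app never sees the variable
seen_by_app = shell_expand("The best app is $THEBESTAPP", host_env)
without_sticker = shell_expand(seen_by_app, app_env)

# With a sticker: the host leaves [e]THEBESTAPP alone, it is converted to
# $THEBESTAPP afterwards, and only then expanded in the app's environment
passed_through = shell_expand("The best app is [e]THEBESTAPP", host_env)
with_sticker = shell_expand(passed_through.replace("[e]", "$"), app_env)

print(repr(without_sticker))
print(repr(with_sticker))
```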
More coming soon!