Version: 1.1.1
Date: 2018-01-08

Scientific Filesystem



Here we present the Scientific Filesystem (SCIF), an organizational format that supports exposure of executables and metadata for discoverability. The format includes a known filesystem structure, a definition for a set of environment variables describing it, and functions for generation of the variables and interaction with the libraries, metadata, and executables located within.

Although SCIF is not exclusively for containers, it works optimally when contained, because a container provides an encapsulated, reproducible environment. Containers traditionally have one entrypoint, one environment context, and one set of labels to describe them. A container created with a Scientific Filesystem can expose multiple entry points, each of which includes its own environment, metadata, installation steps, tests, files, and a primary executable script. SCIF thus brings internal modularity and programmatic accessibility to encapsulated, reproducible environments.

What will I learn reading this?

We will start by reviewing the background and rationale for a scientific organizational format, and how SCIF achieves the goals of modularity, transparency, and consistency. We then review the organizational structure of the standard, and the different levels of internal modules that it affords. For this work, we provide several tutorials to demonstrate using the scientific filesystem with Docker and Singularity, and we have additionally implemented and released the organizational format as a native integration with the Singularity software. Finally, we discuss use cases for SCIF in the context of containers, including how SCIF can be used to evaluate software, provide metrics, serve scientific workflows, and execute a primary function under different contexts. To encourage collaboration and sharing of apps, we have developed an open source, version controlled, tested, and programmatically accessible web infrastructure at https://sci-f.github.io/apps. For developers, we provide a getting started guide for integration of SCIF into other container technologies or contexts. The ease of using SCIF to develop scientific containers offers scientists a simple way to generate self-documenting, programmatically parseable containers that expose software and its associated metadata, environments, and files to be quickly found and used.

Getting Started

Resources

We have provided several examples and tutorials for getting started with SCIF. If you have a workflow or container that you’d like to see added, please reach out. If you would like to see other ways to contribute, here are some suggestions. This work will remain open for contributions, and early contributions will be represented in an official submission.

Citation

If SCIF has been useful to you, please cite our work on GigaScience!

Vanessa Sochat; The Scientific Filesystem (SCIF), GigaScience, giy023,
https://doi.org/10.1093/gigascience/giy023


Philosophy

The philosophy behind the scientific filesystem is that we are empowered to better use software when its entrypoints are clearly defined, predictable, and discoverable.

  1. discoverability means that we have knowledge that software exists
  2. predictability is a means to interact with software without having prior knowledge of its creation

At its core, the scientific filesystem is a simple description of how to organize software and metadata on a filesystem for discoverability. This description encompasses a filesystem structure, which ensures that scientific software is distinct from standard software on the host, and a set of environment variables and functions that expose the structure to the user. Because containers provide encapsulated, reproducible environments, SCIF works optimally when installed and used within a container. With SCIF you can:

  1. write recipes that define environments, metadata, executables, and dependencies for a grouping of scientific software
  2. install the recipe into a container
  3. programmatically and easily interact with the scientific software
  4. share your containers or recipes with confidence of discoverability

A container that is found to have a SCIF filesystem, without any additional definition, has a known method of interaction without any special knowledge. We will discuss this in more detail in the following sections.

What SCIF is not:

SCIF handles the step that comes before any kind of workflow manager is used, and is best used alongside some kind of container technology. It works with, and does not replace, either of those things.

A workflow manager. The scientific filesystem does not handle the details of connecting inputs and outputs, or other functions we typically associate with workflow managers. Because SCIF gives structure to executables and content that these managers can use, it adds to this ecosystem, but it is agnostic to the particulars of user interaction with any particular software.

A dependency manager. SCIF is agnostic to where you are using it. For reproducible software modules, especially those that require dependencies on top of the system, it is strongly recommended to use a container technology, and we provide numerous tutorials and examples for doing this. For those who prefer to work without a container, SCIF works just as easily on a host machine as it does in a Docker or Singularity container, and in fact, many of the same functions can be run in a preview mode to assess functionality.

Goals

For the official description of goals, see the specification. In the following sections, we hope to show that SCIF is useful because it allows for modularity, transparency, and consistency.

While SCIF is not a workflow manager, it follows naturally that the creator of a SCIF app might use its environment variables to have modules talk to one another internally. Neither the user nor the creator needs to know the structural specifics of the standard, only how to reference them.

Modules

Modularity can be understood as the level of dimensionality at which a user is instructed to operate, and for the purposes of this discussion we will suggest three general levels.

Given the different needs briefly explained above, it is clear that there is no correct level of dimensionality to define a module.

The definition of modularity is entirely based on the needs of the creator and user.

If we discover a container after its creation, it is not clear without suitable documentation what level is represented, or how to interact with the container. What is needed is a way for the creator of a container to implicitly define this level of usage simply by creating the container. SCIF allows us to do this. We can define modules at the level of single files, or of groups of software that perform a task. The metadata and organization reflecting our preferences are automatically generated to create a complete, programmatically understandable software package or scientific analysis.

Reproducible Practices

We have just discussed why internal modularity is important for container interaction, and it is also useful for reproducibility. It is important to distinguish between the entire container as a reproducible product and the different software modules inside of it, which depend on being served through the container to ensure reproducibility. While the container itself is portable, and designed to contain all dependencies to support reproducibility, a SCIF module in and of itself is not guaranteed to be. For example, a user might define a module only with an %apprun section, implying that the folder only contains a runscript to execute. The user may have chosen to install dependencies for this script globally in the container, in the %post section, perhaps because they are shared across multiple modules. Under these conditions, if another user wanted to add the module to a different build recipe, the dependencies from %post would be needed too. The host operating system also needs to be taken into consideration: a module with dependencies installed from the package manager “yum” would not move seamlessly into a Debian base. However, appropriate checks and balances can be implemented in the process of moving applications.

Modular internal contents combined with reproducible, portable environments via Singularity containers are a starting point for practicing good science.



Getting Started

We provide several tutorials for getting started with the scientific filesystem. If you haven’t read about the goals or specification for the standard (and are interested) you might do that first.

Quick Starts

If you want to quickly familiarize yourself with a pre-built example scientific filesystem, check out our container quickstart or really quick start to pull a pre-built Docker and/or Singularity image and start using it! Or, if you are more visual, watch the asciinema recording.

Preview and Install

Here we will start with a scientific filesystem recipe, preview what it produces, and then install a scientific filesystem. This is likely the most common use case; see the preview and install tutorial.

Running Commands

The scientific filesystem comes with the expected commands to run, exec, inspect, start a shell, an interactive python shell, and a new syntax to easily distinguish an environment variable (introducing the beautiful [e] to say that [e]MYVAR = $MYVAR). Want to give it a spin? Read more about how to run commands.
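To make the [e] equivalence concrete, here is a minimal Python sketch of substituting [e]NAME tokens with values from an environment, so that [e]MYVAR yields what $MYVAR would. The helper name expand_e_syntax is hypothetical. So is the rule that an unset variable becomes an empty string, which mimics a shell. Neither comes from the scif implementation.

```python
import re
from typing import Mapping

# An [e] token followed by a shell-style variable name, e.g. "[e]SCIF_APPNAME".
E_TOKEN = re.compile(r"\[e\]([A-Za-z_][A-Za-z0-9_]*)")


def expand_e_syntax(arg: str, env: Mapping[str, str]) -> str:
    """Replace each [e]NAME in arg with env[NAME], as $NAME would expand.

    Hypothetical helper: unset variables expand to an empty string.
    """
    return E_TOKEN.sub(lambda match: env.get(match.group(1), ""), arg)


if __name__ == "__main__":
    example_env = {"SCIF_APPNAME": "hello-world-echo"}
    print(expand_e_syntax("The app is [e]SCIF_APPNAME", example_env))
    # prints: The app is hello-world-echo
```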


This document reviews the current specification for the Scientific Filesystem. For the actual specification documents, including older versions, see the spec folder or the full spec. As stated in the introduction, the scientific filesystem is optimized to provide tools for generating predictable and discoverable scientific containers. The description here explains how to organize software and metadata toward this goal, and encompasses the following components:

Read the Specification Document

1. Filesystem Structure

Filesystem structure refers to the structure and organization of the filesystem on a host operating system. Importantly, a scientific filesystem must live alongside a standard operating system, but keep its core content outside of it. The filesystem can exist in the absence of any environment variables or functions to control it, and because it is organized predictably, the static folder and file organization itself embeds knowledge about its usage. We can derive a filesystem structure from a SCIF recipe and, in turn, we can derive a recipe from a filesystem.

2. Environment Variables

The environment is the means by which a scientific filesystem is interacted with. Upon installation or interaction, these variables are defined to drive further operations. For example, when you create a scif container, the variables used to generate the filesystem are maintained with the container for subsequent uses. If the user does not have preference, a set of meaningful defaults is used, making general use of scif quick and easy.
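As a rough illustration of how such defaults might be resolved, the sketch below assumes that a SCIF_BASE variable sets the base, falling back to /scif. Unless SCIF_APPS and SCIF_DATA are set explicitly, it derives them from that base. The function name scif_globals is hypothetical, and the lookup order is an assumption, not documented behavior of the scif package.

```python
import os
import posixpath
from typing import Dict, Mapping, Optional


def scif_globals(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Resolve the global SCIF locations, falling back to defaults.

    Assumptions (for illustration): the base comes from SCIF_BASE and
    defaults to /scif; SCIF_APPS and SCIF_DATA default to folders under it.
    """
    if env is None:
        env = os.environ
    base = env.get("SCIF_BASE", "/scif")
    return {
        "SCIF_BASE": base,
        "SCIF_APPS": env.get("SCIF_APPS", posixpath.join(base, "apps")),
        "SCIF_DATA": env.get("SCIF_DATA", posixpath.join(base, "data")),
    }


if __name__ == "__main__":
    print(scif_globals({}))  # no overrides, so every default applies
```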

3. Software and Tools

The Scientific Filesystem provides software (a command line utility called scif) that can be installed into containers, and then generate and serve as a controller for the entire scientific filesystem when the container is built. See our examples for how this works.

While there are many conventions and tools for organization and control of environments and software, the scientific filesystem is unique in its ease of use, its optimized integration with reproducible container technology, and its focus on scientific containers.

Structure

We will start with a review of traditional file organization on a Linux machine, and explain the rationale for the organization of SCIF.

Traditional File Organization

File organization is likely to vary a bit based on the host OS, but arguably most Linux operating systems can be said to be similar to the Filesystem Hierarchy Standard (FHS). For this discussion, we will disregard the inclusion of package managers, symbolic links, and custom structures, and focus on the core of FHS. We will discuss these locations in the context of how they do (or should) relate to a scientific container. It was an assessment of this current internal standard that led to the original development of SCIF.

Do Not Touch

Arguably, the following folders should not be touched by scientific software:

While these locations likely have libraries and functions needed by the host to support software, it should not be the case that a scientist installs his or her software under any of these locations. It would not be easy or intuitive to find or untangle it from what is already provided by the host.

Variable and Working Locations

The following locations are considered working directories in that they hold variables defined at runtime, or intermediate files that are expected to be purged at some point:

For example, in the context of a container, it is common practice (at least in the case of Singularity) to mount the user’s /home. Thus, if a scientist installed his or her software there, the user would not be able to see it unless this default was changed. For these reasons, it is not advisable to assume stability in putting software in these locations.

Connections

Connections for containers are devices and mount points. A container will arguably always need to be able to support mount points that might be necessary from its host, so it would be important for a scientific container to not put valuables in these locations.

SCIF File Organization

The Standard Container Integration Format defines a root base (/scif) that can be known and consistently mounted across research clusters. The location was chosen to be independent of any locations on traditional Linux filesystems for the sole purpose of avoiding conflicts. Using this namespace, SCIF has a set of defaults that are paired with an environment namespace to make it easy to find content. For example, if you were to discover an app named hello-world you would have confidence about its location under /scif/apps/hello-world ($SCIF_APPROOT). Along with these locations and environment variables, SCIF is entirely a set of rules about how software in a container is installed, organized, and exposed as modules. For the specifics of SCIF we again direct the reader to the specification. Here we will continue with discussion of its development and rationale, starting with a review of some basic background about Linux filesystems.

Apps

Software modules in the context of SCIF are called “apps,” and the base of /scif/apps is where these apps will live. To read more about apps, read the Specification Document.

Data

The base of /scif/data is structured akin to apps. Each installed application has its own folder, with subfolders created for inputs and outputs:

/scif/data
   /foo
      /input
      /output

To read more about data, read the Specification Document.
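Combining the app and data conventions, here is a minimal sketch of the folders an install might create for a single app. The folder names (bin, lib, scif, input, output) follow the preview output shown in the tutorial. The helper create_app_skeleton is hypothetical. It only creates empty folders and does not write the runscript or any other metadata files.

```python
import tempfile
from pathlib import Path
from typing import List


def create_app_skeleton(base: Path, app: str) -> List[Path]:
    """Create the empty folders for one app under a SCIF base (hypothetical)."""
    folders = [
        base / "apps" / app / "bin",     # $SCIF_APPBIN
        base / "apps" / app / "lib",     # $SCIF_APPLIB
        base / "apps" / app / "scif",    # $SCIF_APPMETA (runscript, environment.sh, ...)
        base / "data" / app / "input",   # under $SCIF_APPDATA
        base / "data" / app / "output",  # under $SCIF_APPDATA
    ]
    for folder in folders:
        # parents=True builds intermediate folders; exist_ok makes reruns safe.
        folder.mkdir(parents=True, exist_ok=True)
    return folders


if __name__ == "__main__":
    # Use a throwaway base rather than /scif so the demo needs no privileges.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in create_app_skeleton(Path(tmp), "foo"):
            print(folder.relative_to(tmp))
```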


Example Use Cases

SCIF is powerful in that it supports multiple general use cases for scientific and systems evaluation and high level introspection. These use cases broadly fall in the areas of providing modular software, systems and metric evaluation, and guided collaboration to answer a scientific question. For all use cases, the recommendation is to use a contained environment (e.g., Singularity) for maximum reproducibility of the work.

Quick Examples

You might find SCIF useful if you:

A Research Scientist

An Administrator

Developers

If you have more examples, please add them!


Community

To encourage sharing and distribution of useful apps, we have developed an online interface for easily exploring and sharing SCIF apps, and generating recipes using the apps, available at https://sci-f.github.io/apps.

Infrastructure Overview

The interface is served from a Github repository that renders static template files into a complete website that includes search across all content, exploration by tag (e.g., language or operating system), and instruction by way of reading examples and tutorials. Programmatic access is provided with a RESTful API for all apps or for a single one, as is a feed for interested users to be notified when new content is added. The interface also includes a recipe generator that allows a user to browse the site, save apps of interest in the browser’s local storage, and then combine them in a Singularity build file that can be downloaded in a compressed archive that includes instructions and any associated, required files for the app.

Contributing

Importantly, as the infrastructure is served from a Github repository, contributing does not require any expertise with web development or related technologies. The user can simply use Github to fork the repo, add a text file to the _apps folder, and submit a pull request (PR) to evaluate the contribution. The text file itself has a header section that contains bullet-pointed lists of metadata like name, tags, and files, and the remainder of the file is the Singularity sections for the app (e.g., %apprun hello-world). When the PR is approved, the contribution will automatically render into all areas of the site. If an app includes associated files like scripts or configuration, these are easily added to a folder with the same name as the file, alongside it (e.g., _apps/hello-world/hello-world-bash.md would have associated files in _apps/hello-world/hello-world-bash).
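The naming convention for associated files amounts to dropping the .md suffix from the app file's path. Here is a short sketch with Python's pathlib. The helper name associated_files_dir is hypothetical and is not part of the site's tooling.

```python
from pathlib import PurePosixPath


def associated_files_dir(app_file: str) -> str:
    """Return the folder holding an app's extra files (hypothetical helper).

    The folder sits alongside the app's .md file and shares its name,
    so the mapping is just the same path with the suffix removed.
    """
    return str(PurePosixPath(app_file).with_suffix(""))


if __name__ == "__main__":
    print(associated_files_dir("_apps/hello-world/hello-world-bash.md"))
    # prints: _apps/hello-world/hello-world-bash
```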

Importantly, by way of using version control, all changes and contributions are tracked and credit allocated.

Testing

Github also allows for complete testing of all contributions, and the repository is set up with a continuous integration (CI, which means testing) service called CircleCI that checks the following:

Any contribution that does not meet these requirements will get feedback during the PR, and the contributor can adjust the file to address any issues. As soon as the content is merged into the master branch, it is immediately live on the site.

Contribute to SCIF

There are so many ways to contribute! Here are just a few:

Future Work

SCIF is exciting because it makes basic scientific application development (optimized in containers) and usage easier. The user can immediately inspect and see software, and how to use it. The user can install additional software, copy from one container to another, or view metadata and help documentation. The developer is provided guidance for how and where to install and configure software, but complete freedom with regard to the software itself. The minimum requirements for any package are a name for its folder, and then optionally a runscript and help document for the user. In addition to these basic examples, we provide other future use cases that would be possible with the Scientific Filesystem.

Mapping of container landscape

Given separation of the software from the host, we can more easily derive features that compare software modules. These features can be used with standard unsupervised clustering to better understand how groups of software are used together. We can further apply different labels like domains and understand what modules are shared (or not shared) between scientific domains. We can find opportunity by discovering gaps, that perhaps a software module isn’t used for a particular domain (and it might be).
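As one example of such a feature, the sketch below compares containers by the overlap of their installed app names, using Jaccard similarity. This is an illustrative choice and is not a method prescribed by SCIF. The container and app names are made up. The resulting similarities could then be fed to a standard clustering algorithm.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Shared apps divided by all distinct apps (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_similarity(
    containers: Dict[str, FrozenSet[str]],
) -> List[Tuple[str, str, float]]:
    """Score every pair of containers by the overlap of their app names."""
    return [
        (x, y, jaccard(containers[x], containers[y]))
        for x, y in combinations(sorted(containers), 2)
    ]


if __name__ == "__main__":
    # Hypothetical containers, each described only by its installed apps.
    landscape = {
        "container-a": frozenset({"app-x", "app-y"}),
        "container-b": frozenset({"app-y", "app-z"}),
        "container-c": frozenset({"app-x", "app-y"}),
    }
    for pair in pairwise_similarity(landscape):
        print(pair)
```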

Artificial Intelligence (AI) Generated Containers

Given some functional goal, and given a set of containers with measurable features to achieving it, we can (either by brute force or more elegantly) programmatically generate and test containers toward some metric. The landscape of containers can easily be pruned in that the best containers for specific use cases can be easily determined automatically.


Appendix

Discussion

This discussion would not be complete without a mention for external modules or dependencies that are required by the software. For example, pip is a package manager that installs to some python base. Two equivalent python installations with different submodules are, by definition, different. There are two possible choices to take, and we leave this choice up to the generator of the container.

In practice, we have found that global installs tend to be larger, well-maintained libraries (e.g., libraries installed with apt-get or package managers like pip), and having them represented in the %post section, to be shared among apps, helps with any kind of analysis that wants to separate what might be considered the general container “base” from the different custom software installed.

We do not enforce using SCIF for Singularity images or any other container. Its creation and discussion are provided only to help scientists more easily create reproducible, transparent containers.

Conclusion

The Scientific Filesystem is advantageous in that the container creator can embed his or her work with implied metadata about software and container contents. SCIF also makes it easier to package different run scripts with the container, and expose them easily to the user. However, this does not mean that the standard approach of using a container as a general toolbox and distributing it with a series of external callers is bad or wrong. The choice to use (or not use) SCIF apps is largely dependent on the goals of the creator, and the intended users.

Additional reading material


Scientific Filesystem (SCIF)

Here we will start with a scientific filesystem recipe, preview what it produces, and then install a scientific filesystem. This is likely the most common use case, and we will do the following:

  1. Create a SCIF recipe
  2. Preview recipe install on the host
  3. Install SCIF in container (Docker or Singularity) using Recipe
  4. Export or reverse engineer recipe from SCIF

In the above, you will see that we will go full circle! Using SCIF, we are able to start with a text description of a set of applications (the recipe), produce a SCIF, and then reverse engineer the recipe from the SCIF.

+--------------------+        +--------------------+       +--------------------+
|                    |        |                    |       |                    |
|                    |        |                    |       |                    |
|                    |        |                    |       |                    |
|                    |        |                    |       |                    |
|       Recipe       +-------->      Preview       +------->     Scientific     |
|                    |        |         then       |       |     Filesystem     |
|                    |        |      Install       |       |                    |
|                    |        |                    |       |                    |
|                    |        |                    |       |                    |
+----------^---------+        +--------------------+       +----------+---------+
           |                                                          |
           |                                                          |
           |                                                          |
           +----------------------------------------------------------+

and once we have a SCIF, even if we don’t have the knowledge of the creator that wrote its recipe, because it exposes commands and metadata for discoverability, we can easily, programmatically interact with it without knowing the details:

+--------------------+  +-------------+
|                    +-->  Run        |
|                    |  +-------------+
|                    |  +-------------+
|                    +-->  Inspect    |
|     Scientific     |  +-------------+
|     Filesystem     |  +-------------+
|                    +-->  Shell      |
|                    |  +-------------+
|                    |  +-------------+
|                    +--> Exec        |
+--------------------+  +-------------+

You will recognize that many of these functions map to those you are familiar with for interacting with containers, and in some respects they will feel very similar. The difference is that we are interacting directly with the SCIF in the container. If you installed the SCIF in another container (perhaps with a different external interface for interaction), you could be confident that, as long as the container has SCIF installed at the entrypoint, your interaction will not be very different. We will discuss this more in the next tutorial, on running SCIF. For now, let’s take a look at writing, previewing, and installing a recipe to generate a SCIF.

Writing a SCIF recipe

Let’s first write a recipe, a text file called hello-world.scif, to produce different variations of hello world (one with an echo, and one that executes a script).

%appenv hello-world-echo
    THEBESTAPP=$SCIF_APPNAME
    export THEBESTAPP
%apprun hello-world-echo
    echo "The best app is $THEBESTAPP"

%appinstall hello-world-script
    echo "echo 'Hello World!'" >> bin/hello-world.sh
    chmod u+x bin/hello-world.sh
%appenv hello-world-script
    THEBESTAPP=$SCIF_APPNAME
    export THEBESTAPP
%apprun hello-world-script
    /bin/bash hello-world.sh

Notice that I’m using three sections, %appinstall, %appenv, and %apprun, and you might not even need that many! An app can just be an environment, for example, or a single command to run. If you are interested in the different sections allowed by the specification, or the many flexible ways to generate an app, read our recipes guide to learn more.
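The hypothetical parser below shows how a recipe like hello-world.scif breaks down into apps and sections. It groups lines first by app and then by section, keeping the sections in the order they are written. It is a simplified sketch. It assumes every section header has the form %section appname, and it skips blank and comment lines, so a shebang line inside a section would also be dropped. It does not reproduce any normalization that scif itself applies to section content.

```python
from collections import OrderedDict
from typing import Dict, List, Optional


def parse_recipe(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Group recipe lines by app, then by section, in the order written.

    Hypothetical parser: a header looks like "%apprun hello-world-echo";
    each following non-blank, non-comment line belongs to that section.
    """
    apps: Dict[str, Dict[str, List[str]]] = OrderedDict()
    current: Optional[List[str]] = None  # lines of the section being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("%"):
            section, _, app = line[1:].partition(" ")
            sections = apps.setdefault(app.strip(), OrderedDict())
            current = sections.setdefault(section, [])
        elif current is not None:
            current.append(line)
    return apps


RECIPE = """\
%appenv hello-world-echo
    THEBESTAPP=$SCIF_APPNAME
    export THEBESTAPP
%apprun hello-world-echo
    echo "The best app is $THEBESTAPP"
"""

if __name__ == "__main__":
    for app, sections in parse_recipe(RECIPE).items():
        print(app, list(sections))  # prints: hello-world-echo ['appenv', 'apprun']
```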

Preview the recipe

Next, I want to preview the recipe. What changes would be made on a host, and where? I feel comfortable doing this on my host because it isn’t actually going to make any changes. After I have installed scif (pip install scif) I can do this from the command line:

$ scif preview hello-world.scif
[base] /scif 
[apps] /scif/apps 
[data] /scif/data
 

[root] /scif/apps/hello-world-echo 
[lib] /scif/apps/hello-world-echo/lib 
[bin] /scif/apps/hello-world-echo/bin 
[data] /scif/data/hello-world-echo 
+ apprun hello-world-echo
/scif/apps/hello-world-echo/scif/runscript
/scif/apps/hello-world-echo/scif/runscript.help
echo "The best app is $THEBESTAPP"
+ appenv hello-world-echo
/scif/apps/hello-world-echo/scif/environment.sh
THEBESTAPP $SCIF_APPNAME
+ apprecipe hello-world-echo
/scif/apps/hello-world-echo/scif/hello-world-echo.scif


[root] /scif/apps/hello-world-script 
[lib] /scif/apps/hello-world-script/lib 
[bin] /scif/apps/hello-world-script/bin 
[data] /scif/data/hello-world-script 
+ apprun hello-world-script
/scif/apps/hello-world-script/scif/runscript
/scif/apps/hello-world-script/scif/runscript.help
/bin/bash hello-world.sh
+ appenv hello-world-script
/scif/apps/hello-world-script/scif/environment.sh
THEBESTAPP $SCIF_APPNAME
+ appinstall hello-world-script
echo "echo 'Hello World!'" >> $SCIF_APPBIN/hello-world.sh
chmod u+x $SCIF_APPBIN/hello-world.sh
+ apprecipe hello-world-script
/scif/apps/hello-world-script/scif/hello-world-script.scif

It’s pretty straightforward: each section pertains to an app, and first shows the root, lib, bin, and data folders for the app, and then the files that will be produced on install. The very top section shows the global changes (e.g., the global scif folders for data and apps). I can also run this command to preview just one app from the recipe:

$ scif preview hello-world.scif hello-world-echo
[base] /scif 
[apps] /scif/apps 
[data] /scif/data
 

[root] /scif/apps/hello-world-echo 
[lib] /scif/apps/hello-world-echo/lib 
[bin] /scif/apps/hello-world-echo/bin 
[data] /scif/data/hello-world-echo 
+ apprun hello-world-echo
/scif/apps/hello-world-echo/scif/runscript
/scif/apps/hello-world-echo/scif/runscript.help
echo "The best app is $THEBESTAPP"
+ appenv hello-world-echo
/scif/apps/hello-world-echo/scif/environment.sh
THEBESTAPP $SCIF_APPNAME
+ apprecipe hello-world-echo
/scif/apps/hello-world-echo/scif/hello-world-echo.scif

If you choose an app that doesn’t exist, it will tell you that.

$ scif preview hello-world.scif hello-world
[base] /scif 
[apps] /scif/apps 
[data] /scif/data
 
ERROR Cannot find app hello-world in config.

We can also do this interactively! scif comes with a development shell command, pyshell, that will either let you interact with a recipe, or a filesystem. The general command works as follows:

SCIF Python (pyshell) Options

Example                        Use case
-----------------------------  ----------------------------------------------------------------------
scif pyshell [recipe]          interact with a recipe, no changes to the filesystem (akin to preview)
scif pyshell                   interact with a SCIF (no SCIF app active)
scif pyshell [app]             interact with a SCIF with an app active
scif pyshell [recipe] [app]    interact with a recipe, still no changes, but load context of an app

There is also a shell command for the equivalent interaction, but we would need to install a SCIF first:

$ scif shell
WARNING /scif is not detected as a recipe or base.

Well, let’s install it then! But first, let’s interactively explore the recipe, hello-world.scif, with the Python shell (pyshell):

$ scif pyshell hello-world.scif
[scif] /scif hello-world-echo | hello-world-script
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: 

We can look at the ScifRecipe object, and then at the client instantiated from it:

In [1]: ScifRecipe
Out[1]: scif.main.base.ScifRecipe

In [2]: client
Out[2]: [scif]

and then produce the same output as the preview above (the recipe is already loaded, so we don’t need to supply it):

client.preview()

We can list all apps:

In [4]: client.apps()
Out[4]: ['hello-world-echo', 'hello-world-script']

or look at the specifics for a particular app:

In [5]: client.app('hello-world-script')
Out[5]: 
OrderedDict([('appinstall',
              ['echo "echo \'Hello World!\'" >> $SCIF_APPBIN/hello-world.sh',
               'chmod u+x $SCIF_APPBIN/hello-world.sh']),
             ('appenv', ['THEBESTAPP $SCIF_APPNAME']),
             ('apprun', ['/bin/bash hello-world.sh'])])

Notice that the result is an ordered dict, so that the app lookup respects the order in which you wrote the sections in the recipe file. Finally, we can look at all environment variables:

client.environment
{'SCIF_APPBIN_hello_world_echo': '/scif/apps/hello-world-echo/bin',
 'SCIF_APPBIN_hello_world_script': '/scif/apps/hello-world-script/bin',
 'SCIF_APPDATA_hello_world_echo': '/scif/data/hello-world-echo',
 'SCIF_APPDATA_hello_world_script': '/scif/data/hello-world-script',
 'SCIF_APPENV_hello_world_echo': '/scif/apps/hello-world-echo/scif/environment.sh',
 'SCIF_APPENV_hello_world_script': '/scif/apps/hello-world-script/scif/environment.sh',
 'SCIF_APPHELP_hello_world_echo': '/scif/apps/hello-world-echo/scif/runscript.help',
 'SCIF_APPHELP_hello_world_script': '/scif/apps/hello-world-script/scif/runscript.help',
 'SCIF_APPLABELS_hello_world_echo': '/scif/apps/hello-world-echo/scif/labels.json',
 'SCIF_APPLABELS_hello_world_script': '/scif/apps/hello-world-script/scif/labels.json',
 'SCIF_APPLIB_hello_world_echo': '/scif/apps/hello-world-echo/lib',
 'SCIF_APPLIB_hello_world_script': '/scif/apps/hello-world-script/lib',
 'SCIF_APPMETA_hello_world_echo': '/scif/apps/hello-world-echo/scif',
 'SCIF_APPMETA_hello_world_script': '/scif/apps/hello-world-script/scif',
 'SCIF_APPNAME_hello_world_echo': 'hello-world-echo',
 'SCIF_APPNAME_hello_world_script': 'hello-world-script',
 'SCIF_APPRECIPE_hello_world_echo': '/scif/apps/hello-world-echo/scif/hello-world-echo.scif',
 'SCIF_APPRECIPE_hello_world_script': '/scif/apps/hello-world-script/scif/hello-world-script.scif',
 'SCIF_APPROOT_hello_world_echo': '/scif/apps/hello-world-echo',
 'SCIF_APPROOT_hello_world_script': '/scif/apps/hello-world-script',
 'SCIF_APPRUN_hello_world_echo': '/scif/apps/hello-world-echo/scif/runscript',
 'SCIF_APPRUN_hello_world_script': '/scif/apps/hello-world-script/scif/runscript',
 'SCIF_APPS': '/scif/apps',
 'SCIF_DATA': '/scif/data'}

You will notice an absence of variables that aren’t specific to one of our apps (e.g., we don’t see SCIF_APPNAME without a hello_world_echo or hello_world_script suffix). This is because the shell is running without the context of a particular SCIF app. If you want to activate an app, which comes down to exporting its environment, just ask for it:

client.activate('hello-world-script')

You can also do this directly from the terminal by giving pyshell the app name:

$ scif pyshell hello-world.scif hello-world-echo
[scif] /scif hello-world-echo | hello-world-script
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: client._active
Out[1]: 'hello-world-echo'

In both cases, the environment contains the same variables as before, plus a set of variables with general names (e.g., SCIF_APPBIN) that point to the active app. In this case, the active app is hello-world-echo:

client.environment

{'SCIF_APPBIN': '/scif/apps/hello-world-echo/bin',
 'SCIF_APPDATA': '/scif/data/hello-world-echo',
 'SCIF_APPENV': '/scif/apps/hello-world-echo/scif/environment.sh',
 'SCIF_APPHELP': '/scif/apps/hello-world-echo/scif/runscript.help',
 'SCIF_APPLABELS': '/scif/apps/hello-world-echo/scif/labels.json',
 'SCIF_APPLIB': '/scif/apps/hello-world-echo/lib',
 'SCIF_APPMETA': '/scif/apps/hello-world-echo/scif',
 'SCIF_APPNAME': 'hello-world-echo',
 'SCIF_APPRECIPE': '/scif/apps/hello-world-echo/scif/hello-world-echo.scif',
 'SCIF_APPROOT': '/scif/apps/hello-world-echo',
 'SCIF_APPRUN': '/scif/apps/hello-world-echo/scif/runscript',
 'SCIF_APPS': '/scif/apps',
 'SCIF_DATA': '/scif/data'
  ...
}

We have both sets because you might want to run one SCIF app and, while it’s running, have it interact in some way with another known app. Because information about the other apps is exposed in the environment, this is possible! You can reference one app’s environment variables in another app’s runscript. Here are some fun examples:

# when foo runs, have it call bar's runscript
%apprun foo
/bin/bash $SCIF_APPRUN_bar

# source bar's environment first, then run it
%apprun foo
source $SCIF_APPENV_bar
/bin/bash $SCIF_APPRUN_bar
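The same cross-app lookup can be done from Python. The sketch below is illustrative and not part of the scif client; `app_suffix` and `other_app_path` are hypothetical helper names. It assumes, based on the environment dump above, that hyphens in an app name become underscores in the variable suffix (hello-world-echo becomes SCIF_APPRUN_hello_world_echo).

```python
import os


def app_suffix(app_name):
    """Convert an app name to the suffix used by per-app SCIF variables.

    Assumption: hyphens become underscores, as in the environment dump
    (hello-world-echo -> hello_world_echo).
    """
    return app_name.replace("-", "_")


def other_app_path(kind, app_name, environ=None):
    """Look up a per-app variable such as SCIF_APPRUN_<app> or SCIF_APPENV_<app>.

    kind is the variable stem without the suffix, e.g. "SCIF_APPRUN".
    Returns None when the variable is not defined.
    """
    environ = os.environ if environ is None else environ
    return environ.get(f"{kind}_{app_suffix(app_name)}")


# A hand-written environment with values copied from the dump above
env = {
    "SCIF_APPRUN_hello_world_script": "/scif/apps/hello-world-script/scif/runscript",
    "SCIF_APPENV_hello_world_script": "/scif/apps/hello-world-script/scif/environment.sh",
}
print(other_app_path("SCIF_APPRUN", "hello-world-script", env))
# prints /scif/apps/hello-world-script/scif/runscript
```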

Having these general environment variables for the active app also ensures that a tool created to work with SCIF applications knows how to find the active application. For example, let’s say I create an app that anyone can install into a container, and its only job is to loop through all the other apps and parse the text content of their runscripts for some kind of machine learning algorithm. I’d be able to discover them all without much work, and without knowing anything about the particular apps installed:

# This gives me the runscript locations
env | grep SCIF_APPRUN
SCIF_APPRUN_hello_world_script=/scif/apps/hello-world-script/scif/runscript
SCIF_APPRUN_hello_world_echo=/scif/apps/hello-world-echo/scif/runscript
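A minimal Python sketch of that discovery step, assuming the runscript variables follow the SCIF_APPRUN_<app> pattern shown above. `discover_runscripts` and `read_runscripts` are hypothetical helpers, not scif client functions; the active app’s unsuffixed SCIF_APPRUN is deliberately left out because it does not match the prefix.

```python
import os


def discover_runscripts(environ=None):
    """Map each app's variable suffix to its runscript path.

    Only suffixed variables (SCIF_APPRUN_<app>) are collected; the
    active-app variable SCIF_APPRUN does not start with the prefix.
    """
    environ = os.environ if environ is None else environ
    prefix = "SCIF_APPRUN_"
    return {
        key[len(prefix):]: value
        for key, value in sorted(environ.items())
        if key.startswith(prefix)
    }


def read_runscripts(environ=None):
    """Read the text of every discovered runscript that exists on disk."""
    texts = {}
    for app, path in discover_runscripts(environ).items():
        if os.path.isfile(path):
            with open(path) as handle:
                texts[app] = handle.read()
    return texts


if __name__ == "__main__":
    for app, text in read_runscripts().items():
        print(f"== {app} ==\n{text}")
```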

I could also blindly parse through help files, environments, labels, or any other content that is programmatically accessible! You don’t need to know anything in advance beyond these SCIF variables to interact with scientific filesystems. We will go into more command examples as we progress in these tutorials. Let’s continue now and talk about how to build a SCIF into a container, which is the recommended approach for reproducibility.

Install SCIF in Docker using Recipe

Once we have previewed, we likely want to install. I’m not a fan of making changes on my host, so I’m going to test this inside a Docker image. The image we will be building in the example below is provided on Docker Hub:

docker pull vanessa/scif:hw

First we will do it interactively, and then add a few lines to a build recipe (a Dockerfile) to do it properly. Here is the interactive version. We will use a miniconda image so we know that it comes with pip to install scif. Note that I’m mapping the present working directory to /tmp so that my hello-world.scif recipe is available inside the container.

$ docker run -v $PWD:/tmp -it continuumio/miniconda3 /bin/bash
$ pip install scif
$ scif install /tmp/hello-world.scif
Installing base at /scif
+ apprun hello-world-echo
+ appenv hello-world-echo
+ apprun hello-world-script
+ appenv hello-world-script
+ appinstall hello-world-script

The same can be done from within the Python shell (pyshell):

client.install()
Installing base at /scif
+ apprun hello-world-echo
+ appenv hello-world-echo
+ apprun hello-world-script
+ appenv hello-world-script
+ appinstall hello-world-script

Then we can see the tree hierarchies for the top-level folders and apps:

# apt-get install -y tree

$ tree /scif/
/scif/
├── apps
│   ├── hello-world-echo
│   │   ├── bin
│   │   ├── lib
│   │   └── scif
│   │       ├── environment.sh
│   │       ├── hello-world-echo.scif
│   │       └── runscript
│   └── hello-world-script
│       ├── bin
│       ├── lib
│       └── scif
│           ├── environment.sh
│           ├── hello-world-script.scif
│           └── runscript
└── data
    ├── hello-world-echo
    └── hello-world-script

Want to see the apps installed more quickly than using tree?

scif apps
SCIF [app]              [root]
1  hello-world-script	/scif/apps/hello-world-script
2  hello-world-echo	/scif/apps/hello-world-echo

If we had done this before install, we would get a ruhroh message.

$ scif apps
WARNING /scif is not detected as a recipe or base.
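To approximate what `scif apps` reports without the client, you could list the folders under the apps base. This is a sketch assuming the default layout, where each directory directly under SCIF_APPS (default /scif/apps) is one installed app. `list_apps` is a hypothetical name, and its output is sorted alphabetically, so the order may differ from the client’s.

```python
import os


def list_apps(apps_root=None):
    """Return (name, root) pairs for each app folder under the SCIF apps base."""
    if apps_root is None:
        apps_root = os.environ.get("SCIF_APPS", "/scif/apps")
    if not os.path.isdir(apps_root):
        return []  # no SCIF installed here (compare the warning above)
    names = sorted(
        entry for entry in os.listdir(apps_root)
        if os.path.isdir(os.path.join(apps_root, entry))
    )
    return [(name, os.path.join(apps_root, name)) for name in names]


if __name__ == "__main__":
    for index, (name, root) in enumerate(list_apps(), start=1):
        print(f"{index}  {name}\t{root}")
```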

We have a scientific filesystem ready to go! At this point, if you want this generation to be reproducible, you need to generate a build specification (a Dockerfile) to build the container above.

FROM continuumio/miniconda3
RUN pip install scif
ADD hello-world.scif /hello-world.scif
RUN scif install /hello-world.scif
ENTRYPOINT ["scif"]

and then we could build that:

docker build -t vanessa/scif:hw .

and run it to see the help:

docker run -it vanessa/scif:hw

We will review more interesting commands and interaction with your SCIF in the next tutorial on how to run SCIF. This tutorial finishes by showing installation in a Singularity container, followed by reverse engineering a recipe from a SCIF.

Install SCIF in Singularity Natively

Singularity containers, in that they are optimized for scientific reproducibility, work extremely well with SCIF. When you build a Singularity container, you create a squashfs filesystem that isn’t writable after build time, so you can have confidence that the contents of your container will not be modified. Singularity also has native integration for SCIF, meaning you can write the recipe sections directly into the build file! Let’s take a look at our hello world example above, written into a Singularity build file:

Bootstrap: docker
From: continuumio/miniconda3

# sudo singularity build hello-world.simg Singularity

########################################################
##
## SCIF:
##
## These sections are equivalent to a SCIF Recipe
##
########################################################

%appenv hello-world-echo
    THEBESTAPP=$SCIF_APPNAME
    export THEBESTAPP
%apprun hello-world-echo
    echo "The best app is $THEBESTAPP"

%appinstall hello-world-script
    echo "echo 'Hello World!'" >> bin/hello-world.sh
    chmod u+x bin/hello-world.sh
%appenv hello-world-script
    THEBESTAPP=$SCIF_APPNAME
    export THEBESTAPP
%apprun hello-world-script
    /bin/bash hello-world.sh

Notice the following:

  1. We have literally copied the recipe into the Singularity build file (middle section), and we are using the same image base (continuumio/miniconda3) as we did for Docker. I also like to use the continuumio/miniconda3 container for a tinier conda snake :).
  2. We are not installing the scif software anywhere. Singularity has the integration natively.

To build the container:

sudo singularity build hello-world.simg Singularity

And then we would run a particular app as follows:

# hello-world-script
singularity run --app hello-world-script hello-world.simg 
Hello World!

# hello-world-echo
$ singularity run --app hello-world-echo hello-world.simg 
The best app is hello-world-echo

# no app specified (runscript)
$ singularity run hello-world.simg 
$

For the last example, you don’t see any obvious change in output because we’ve merely shelled into the container! If you had defined a Singularity %runscript section with a different command, it would be issued instead. If you are using the scif software as a driver, you could export SCIF_ENTRYPOINT to define a different command from the default of /bin/bash.

Install SCIF in Singularity using Recipe

You can also install and use SCIF within Singularity containers using the scif software (akin to what we did with Docker). For this example, the Singularity build specification would be adjusted to look like this:

Bootstrap: docker
From: continuumio/miniconda3

# sudo singularity build hello-world-scif.simg Singularity.scif

%files
    hello-world.scif

%environment
    PATH=/opt/conda/bin:$PATH
    export PATH

%post
    /opt/conda/bin/pip install scif
    /opt/conda/bin/scif install /hello-world.scif

%runscript
    exec scif "$@"

If you want a development version of scif, you could instead install from GitHub:

Bootstrap: docker
From: continuumio/miniconda3

# sudo singularity build hello-world-scif.simg Singularity.scif

%files
    hello-world.scif

%environment
    PATH=/opt/conda/bin:$PATH
    export PATH

%post
    cd /opt && git clone https://www.github.com/vsoch/scif.git
    cd scif && /opt/conda/bin/python setup.py install
    /opt/conda/bin/scif install /hello-world.scif

%runscript
    exec scif "$@"

Notice that although the sections are similar, they aren’t SCIF app sections, but rather traditional Singularity sections. Both are, in spirit, based on the idea of an rpm format: a description (recipe) to build an encapsulated environment. Also notice how we are again just copying the recipe file into the container, installing scif, and then handing the container’s entrypoint to scif to manage. I’m being very careful to add executables to the path and reference them directly, in case some future user has a different version of the software installed locally. Any similar container or virtualization technology that follows these steps could support SCIF. To build the container, again we do:

sudo singularity build hello-world-scif.simg Singularity.scif

Importantly, since we have defined the entrypoint to be the scif executable, all of the commands and functions for interacting with the SCIF are exposed when the user runs the container, and the filesystem is built based on the recipe provided. Just try running the container, and you will find the scif entrypoint:

./hello-world-scif.simg 

Scientific Filesystem [v0.0.3]
usage: scif [-h] [--debug] [--quiet]
            {version,pyshell,shell,preview,install,inspect,run,apps,dump,exec,write}
            ...

scientific filesystem tools

optional arguments:
  -h, --help            show this help message and exit
  --debug               use verbose logging to debug.
  --quiet               suppress print output

actions:
  actions for Scientific Filesystem

  {version,pyshell,shell,preview,install,inspect,run,apps,dump,exec,write}
                        scif actions
    version             show software version
    pyshell             Interactive python shell to scientific filesystem
    shell               shell to interact with scientific filesystem
    preview             preview changes to a filesytem
    install             install a recipe on the filesystem
    inspect             inspect an attribute for a scif installation
    run                 entrypoint to run a scientific filesystem
    apps                list apps installed
    dump                dump recipe
    exec                execute a command to a scientific filesystem

Importantly, the container can still serve as a portable, reproducible product; it does not depend on SCIF for all of its needs. Here is how you would list the apps in the container using the SCIF entrypoint:

./hello-world-scif.simg apps
SCIF [app]              [root]
1  hello-world-echo	/scif/apps/hello-world-echo
2  hello-world-script	/scif/apps/hello-world-script

We would run a scif app like this:

./hello-world-scif.simg run hello-world-script
[hello-world-script] executing /bin/bash /scif/apps/hello-world-script/scif/runscript
Hello World!

And the user (you!) can of course use all of the other features that a container solution like Singularity provides.

Reverse engineer Recipe from SCIF

Finally, given an existing scientific filesystem, you can easily produce its recipe file from the various metadata folders that are discovered. If the creator has not changed this content manually, there is a good chance that the recipe reproduces the filesystem, provided all previous dependencies are still available, although this isn’t guaranteed. Let’s use the container that we generated above, first with the inspect command to take a look, and then with dump to export the recipe. First, you can inspect an entire filesystem and get the results in JSON printed to the screen:

scif inspect 
{
    "hello-world-script": {
        "appinstall": [
            "echo \"echo 'Hello World!'\" >> $SCIF_APPBIN/hello-world.sh",
            "chmod u+x $SCIF_APPBIN/hello-world.sh"
        ],
        "appenv": [
            "THEBESTAPP $SCIF_APPNAME"
        ],
        "apprun": [
            "/bin/bash hello-world.sh"
        ]
    },
    "hello-world-echo": {
        "appenv": [
            "THEBESTAPP $SCIF_APPNAME"
        ],
        "apprun": [
            "echo \"The best app is $THEBESTAPP\""
        ]
    }
}

And then dump the same content as a recipe:

scif dump
%appinstall
echo "echo 'Hello World!'" >> $SCIF_APPBIN/hello-world.sh
chmod u+x $SCIF_APPBIN/hello-world.sh

%appenv
THEBESTAPP $SCIF_APPNAME

%apprun
/bin/bash hello-world.sh

%appenv
THEBESTAPP $SCIF_APPNAME

%apprun
echo "The best app is $THEBESTAPP"

Now that you’ve gotten the hang of writing recipes, previewing, and installing SCIF, let’s move on to look at some commands.

Date: 2023-02-24 Minor Version: 1.1

Scientific Filesystem (SCIF)

Authors:

Terminology

| Term | Definition |
|------|------------|
| SCIF | The Scientific Filesystem |
| Host | The machine or container where the scientific filesystem is installed, alongside but separate from the core operating system |
| App | A scientific filesystem module that is installed onto the Host |
| Data | A root location in the scientific filesystem exclusively for data inputs and outputs that belong to Apps |
| Recipe | A text file with extension .scif with sections that define a SCIF |
| Discoverability | An ability to programmatically uncover software and metadata |
| Predictability | A means of interacting with software without prior knowledge of its creation |
| Client | A controller for a SCIF, either for a developer or a user |
| Integration | Third-party software or a tool that understands the SCIF structure and interacts with all or some portion of it |
| Container | A Linux container technology that can be installed with a SCIF |
| SC | The scientific community, including academics, researchers, and associated engineering and development staff |
| ENV | Environment variables in the namespace of SCIF (prefixed with SCIF_) that drive interaction of a user agent (UA) |
| UA | A user agent (controller) that interacts with environment variables (ENV) to act as an interface between the user and the scientific filesystem |

Objective

To define an organizational standard, a Scientific Filesystem (SCIF), that will make it easy for the scientific community (SC) to generate predictable and programmatically accessible scientific apps optimized for encapsulated, reproducible environments.

Goals

The Scientific Filesystem (SCIF) will:

Non-Goals

The Scientific Filesystem (SCIF) explicitly does not implement, define, or provide in v0.2:

Solution Overview

This specification defines a filesystem structure and environment variable namespace that interact with functions to produce discoverable software applications. We focus on recommendations for a container technology or other encapsulated environment to implement a SCIF-compatible (containerized) filesystem (or integrations thereof) along with controllers for it (clients). A compatible client or integration must provide a means to generate and interact with the SCIF filesystem and environment namespace, along with documentation for developers and users.

Architecture

The Scientific Filesystem has a default root on the host that is intentionally separate from standard Linux folders, so that it does not interfere with likely existing folders (e.g., just /data or /apps). This is done by using a root, $SCIF_BASE, that defaults to /scif:

/scif
   /apps
   /data

Apps

The base of /scif/apps is where software modules live. This structure can be generated automatically from a scif recipe, and can also be parsed to regenerate one. Clients and integrations that meet this specification can parse these directories programmatically without knowing specifics of the software apps installed. The structure of a complete app folder (foo) looks like the following:

/scif
   /apps
     /foo
        /bin
        /lib
        /scif
          runscript
          runscript.help
          labels.json
          environment.sh

Recipes

The scif recipe is the primary method by which a user from the SC can define and then generate a SCIF. The recipe consists of sections. Each section begins with a line that starts with %, followed by the section name and then the name of the software app that the section applies to:

%section foo

The recipe itself is a text file that must have extension .scif, and can serve as input to clients and integrations for SCIF.

Sections

Specifically, this means that any software that implements the SCIF would do the following procedure to produce a SCIF from a recipe:

  1. Find any app section alongside a name (e.g., %appinstall foo) as an indication of an application command.
  2. Parse the string following the section name (e.g., foo) as the name of the application, and create a folder for the app under the roots /scif/apps and /scif/data, in lowercase, if each folder doesn’t already exist.
  3. If an installation procedure is present (%appinstall foo), perform it to install dependencies relative to the app’s install folder.
  4. Create a metadata folder, scif, inside the application folder.
  5. For those defined, save the environment (%appenv foo), help (%apphelp foo), runscript (%apprun foo), and labels (%applabels foo) to the metadata folder.
  6. Generate a “bin” folder for foo, and automatically add it to the $PATH when foo is being used. Generate a “lib” folder for libraries, to also be added to the $LD_LIBRARY_PATH when foo is used.
  7. If the %apptest foo section is defined, run tests on the newly installed app to ensure it works as expected.

This general list is hashed out in more detail in the following sections.
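The folder and metadata steps of that procedure (steps 2, 4, 5, and 6) could be sketched in Python as follows. This is a hypothetical helper, not the reference implementation: it does not run %appinstall or %apptest, it skips labels because their conversion to labels.json is not described here, and it assumes sections arrive as lists of lines (for example, from a recipe parser).

```python
import os

# Section name -> metadata file name, following the app folder layout above.
METADATA_FILES = {
    "appenv": "environment.sh",
    "apprun": "runscript",
    "apphelp": "runscript.help",
}


def create_app(base, app_name, sections):
    """Lay out the folders and metadata files for one app under a SCIF base.

    sections maps section names (e.g. "apprun") to lists of lines.
    Install and test sections are NOT executed, and labels are not written.
    """
    name = app_name.lower()                    # step 2: lowercase folder names
    app_root = os.path.join(base, "apps", name)
    meta = os.path.join(app_root, "scif")      # step 4: metadata folder
    folders = [
        os.path.join(base, "data", name),      # step 2: data folder
        meta,
        os.path.join(app_root, "bin"),         # step 6: added to PATH when active
        os.path.join(app_root, "lib"),         # step 6: added to LD_LIBRARY_PATH
    ]
    for folder in folders:
        os.makedirs(folder, exist_ok=True)     # only created if missing
    for section, filename in METADATA_FILES.items():  # step 5
        if section in sections:
            with open(os.path.join(meta, filename), "w") as handle:
                handle.write("\n".join(sections[section]) + "\n")
    return app_root
```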

Data

The base of /scif/data is structured akin to apps: each installed application has its own folder, and subfolders can optionally be created for inputs and outputs:

/scif/data
   /foo
      /input
      /output

SCIF does not enforce or state how the container creator should use the data folders, but rather encourages the creator to use the organization so that a user can intuitively know that any input for app foo might go into /scif/data/foo/input, general data for foo might be in /scif/data/foo, and global data for the entire container might be in /scif/data. For example, a scientific workflow can mount a host folder at /scif/data, and then each application creates its needed data folders at runtime simply by referencing an environment variable specific to the app.
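As an illustration of that runtime pattern, an app could create its optional input and output subfolders from its own SCIF_APPDATA. `ensure_data_folders` is a hypothetical helper, and SCIF itself does not require these subfolders.

```python
import os


def ensure_data_folders(environ=None):
    """Create input/ and output/ under the active app's SCIF_APPDATA.

    Returns the (input, output) paths. Raises KeyError if SCIF_APPDATA
    is unset, i.e. no app is active.
    """
    environ = os.environ if environ is None else environ
    app_data = environ["SCIF_APPDATA"]
    input_dir = os.path.join(app_data, "input")
    output_dir = os.path.join(app_data, "output")
    for folder in (input_dir, output_dir):
        os.makedirs(folder, exist_ok=True)
    return input_dir, output_dir
```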

Environment Namespace

| Variable | Default | Definition |
|----------|---------|------------|
| SCIF_BASE | /scif | the root location for SCIF |
| SCIF_DATA | /scif/data | the root location for app data |
| SCIF_APPS | /scif/apps | the root location for installed apps |
| SCIF_SHELL | /bin/bash | the shell to use for the “shell” command |
| SCIF_PYSHELL | ipython | the interactive Python shell for the “pyshell” command |
| SCIF_ENTRYPOINT | /bin/bash | the command to run when no runscript or app is defined |
| SCIF_ENTRYFOLDER | SCIF_BASE | the folder in which to run the entrypoint command |
| SCIF_MESSAGELEVEL | INFO | the client’s level of verbosity; must be one of CRITICAL, ABORT, ERROR, WARNING, LOG, INFO, QUIET, VERBOSE, DEBUG |

Table 1: During interaction and runtime of SCIF, the environment variables above must be defined.

Active App Environment Namespace

| Variable | Default | Definition |
|----------|---------|------------|
| SCIF_APPNAME | example | the active software app |
| SCIF_APPDATA | /scif/data/example | the data root for the active software app |
| SCIF_APPROOT | /scif/apps/example | the install root for the active software app |
| SCIF_APPBIN | /scif/apps/example/bin | the app bin, which is automatically added to the PATH when active |
| SCIF_APPLIB | /scif/apps/example/lib | the app lib, which is automatically added to the LD_LIBRARY_PATH when active |
| SCIF_APPMETA | /scif/apps/example/scif | the metadata folder |
| SCIF_APPHELP | /scif/apps/example/scif/runscript.help | a text file with help to print for the user to the terminal |
| SCIF_APPRUN | /scif/apps/example/scif/runscript | the commands to run as the app entrypoint |
| SCIF_APPSTART | /scif/apps/example/scif/startscript | the start script (if provided) for an app |
| SCIF_APPTEST | /scif/apps/example/scif/test | the commands to run to test the app |
| SCIF_APPLABELS | /scif/apps/example/scif/labels.json | a key:value JSON lookup dictionary of labels |
| SCIF_APPENV | /scif/apps/example/scif/environment.sh | a shell script to source for the software app environment |

Table 2: During interaction with a specific app (e.g., example in the table above), these environment variables must be defined in addition to the global SCIF variables in Table 1.

With the convention above, any tool that interacts with SCIF could, for example, know that data is expected to be written to SCIF_APPDATA, or that a label (e.g., a parameter) can be read from the SCIF_APPLABELS, or that an application has its binaries at SCIF_APPBIN. The integration doesn’t need to know that example is running.
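Because every value in Table 2 is derived from the base and the app name, a client could compute the active namespace directly. The sketch below follows Table 2 (including SCIF_APPSTART and SCIF_APPTEST, which the client dump earlier in this document does not show) and prepends the app’s bin and lib folders to PATH and LD_LIBRARY_PATH, per step 6 of the recipe procedure. `active_app_namespace` and `activate` are hypothetical names, not the scif client’s API.

```python
import os


def active_app_namespace(app_name, base="/scif"):
    """Build the Table 2 variables for the active app under a SCIF base."""
    root = os.path.join(base, "apps", app_name)
    meta = os.path.join(root, "scif")
    return {
        "SCIF_APPNAME": app_name,
        "SCIF_APPDATA": os.path.join(base, "data", app_name),
        "SCIF_APPROOT": root,
        "SCIF_APPBIN": os.path.join(root, "bin"),
        "SCIF_APPLIB": os.path.join(root, "lib"),
        "SCIF_APPMETA": meta,
        "SCIF_APPHELP": os.path.join(meta, "runscript.help"),
        "SCIF_APPRUN": os.path.join(meta, "runscript"),
        "SCIF_APPSTART": os.path.join(meta, "startscript"),
        "SCIF_APPTEST": os.path.join(meta, "test"),
        "SCIF_APPLABELS": os.path.join(meta, "labels.json"),
        "SCIF_APPENV": os.path.join(meta, "environment.sh"),
    }


def activate(app_name, environ, base="/scif"):
    """Return a copy of environ with the app's variables set and bin/lib prepended."""
    env = dict(environ)
    env.update(active_app_namespace(app_name, base))
    for var, folder in (("PATH", env["SCIF_APPBIN"]),
                        ("LD_LIBRARY_PATH", env["SCIF_APPLIB"])):
        current = env.get(var)
        env[var] = folder + os.pathsep + current if current else folder
    return env
```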

Sleeping App Environment Namespace

| Variable | Default | Definition |
|----------|---------|------------|
| SCIF_APPNAME_sleeper | sleeper | the inactive software app |
| SCIF_APPDATA_sleeper | /scif/data/sleeper | the data root for the inactive software app |
| SCIF_APPROOT_sleeper | /scif/apps/sleeper | the install root for the inactive software app |
| SCIF_APPBIN_sleeper | /scif/apps/sleeper/bin | the app bin, which is automatically added to the PATH when active |
| SCIF_APPLIB_sleeper | /scif/apps/sleeper/lib | the app lib, which is automatically added to the LD_LIBRARY_PATH when active |
| SCIF_APPMETA_sleeper | /scif/apps/sleeper/scif | the metadata folder |
| SCIF_APPHELP_sleeper | /scif/apps/sleeper/scif/runscript.help | a text file with help to print for the user to the terminal |
| SCIF_APPRUN_sleeper | /scif/apps/sleeper/scif/runscript | the commands to run as the app entrypoint |
| SCIF_APPSTART_sleeper | /scif/apps/sleeper/scif/startscript | the start script (if provided) for the app |
| SCIF_APPTEST_sleeper | /scif/apps/sleeper/scif/test | the commands to run to test the app |
| SCIF_APPLABELS_sleeper | /scif/apps/sleeper/scif/labels.json | a key:value JSON lookup dictionary of labels |
| SCIF_APPENV_sleeper | /scif/apps/sleeper/scif/environment.sh | a shell script to source for the software app environment |

Table 3: During interaction with a specific app, the other (inactive) apps must also reveal themselves through an equivalent set of environment variables specific to them. The table above describes the environment variables defined for such an app (e.g., sleeper). The variables are equivalent to those of the active app, except that the app name is appended at the end.

This strategy ensures that applications built to interact (e.g., a container installed with several steps that internally work together) can easily reference another application within the same container without knowing the exact path. Further, it ensures that entire spaces of environment variables (e.g., all application help text files) can be revealed by discovering environment variables that match a particular pattern.
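The inactive-app variables can be generated the same way, with the app name appended. This sketch assumes that hyphens in app names are converted to underscores in the suffix, as in SCIF_APPRUN_hello_world_echo in the client output shown earlier. `sleeping_app_namespace` and `matching` are hypothetical helpers; the second one shows the pattern-based discovery described above, such as collecting every app’s help file.

```python
import os


def sleeping_app_namespace(app_names, base="/scif"):
    """Build suffixed variables (Table 3) for every app, e.g. SCIF_APPHELP_sleeper."""
    env = {}
    for name in app_names:
        suffix = name.replace("-", "_")  # assumption: hyphens -> underscores
        root = os.path.join(base, "apps", name)
        meta = os.path.join(root, "scif")
        values = {
            "SCIF_APPNAME": name,
            "SCIF_APPDATA": os.path.join(base, "data", name),
            "SCIF_APPROOT": root,
            "SCIF_APPBIN": os.path.join(root, "bin"),
            "SCIF_APPLIB": os.path.join(root, "lib"),
            "SCIF_APPMETA": meta,
            "SCIF_APPHELP": os.path.join(meta, "runscript.help"),
            "SCIF_APPRUN": os.path.join(meta, "runscript"),
            "SCIF_APPSTART": os.path.join(meta, "startscript"),
            "SCIF_APPTEST": os.path.join(meta, "test"),
            "SCIF_APPLABELS": os.path.join(meta, "labels.json"),
            "SCIF_APPENV": os.path.join(meta, "environment.sh"),
        }
        for stem, value in values.items():
            env[f"{stem}_{suffix}"] = value
    return env


def matching(stem, environ):
    """Collect one variable for every app, e.g. all help files for SCIF_APPHELP."""
    prefix = stem + "_"
    return {key[len(prefix):]: value
            for key, value in environ.items() if key.startswith(prefix)}
```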

Additional Variables

Additional variables are appropriate for integrations to define, but must begin with SCIF_ to remain in the scientific filesystem namespace. This also ensures that SCIF, when installed alongside other software and environments, can have its environment settings isolated by filtering on the common prefix.
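Filtering on the common prefix takes only a few lines. A minimal Python sketch, with `scif_environment` as a hypothetical helper name:

```python
import os


def scif_environment(environ=None, prefix="SCIF_"):
    """Return only the variables in the SCIF namespace, sorted by name."""
    environ = os.environ if environ is None else environ
    return {key: environ[key] for key in sorted(environ) if key.startswith(prefix)}
```

Because integrations must also use the SCIF_ prefix, their additional variables are captured by the same filter.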

Integration Requirements

The following sections detail requirements for integrations (third party software that uses SCIF) that are distinguished from clients (a controller or generator of a SCIF) because they don’t necessarily need to create a SCIF, but just interact with one.

Initialization

At the initialization step of an integration or client, the following must be true:

Client Requirements

A client, in that it is a more advanced integration, must also follow the integration requirements previously stated.

Recipe Parsing

A parser of a recipe should be able to read chunks of text and return a data structure that includes a dictionary of app settings, each indexed by the app name, and the next level having sections, each a list of the commands from the file. The ordering that the sections are presented, in terms of apps that are discovered first, should be preserved. For example, if app “foo” has a section for an install that is discovered first, an app “bar” with an install section should be run after. If an app “red” has an environment section that comes before an app “blue” install section, by way of being discovered first in the recipe, the sections for “red” will be processed first, even if the install section for red came after blue’s.
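A minimal Python sketch of such a parser, under stated assumptions: a section header is a line starting with % followed by the section name and the app name; blank lines and lines starting with # are skipped (so comments inside a section are dropped); and indentation is stripped. `parse_recipe` is a hypothetical name. Because Python dicts preserve insertion order, apps and their sections stay in the order in which they are first discovered, matching the ordering rule above.

```python
def parse_recipe(text):
    """Parse SCIF recipe text into {"apps": {app: {section: [lines]}}}."""
    apps = {}
    current = None  # the list of lines for the section being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments (simplifying assumption)
        if line.startswith("%"):
            parts = line[1:].split(None, 1)
            if len(parts) != 2:
                raise ValueError(f"section header without an app name: {raw!r}")
            section, app = parts
            # setdefault keeps the first-discovered order of apps and sections
            current = apps.setdefault(app, {}).setdefault(section, [])
        elif current is not None:
            current.append(line)
    return {"apps": apps}
```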

For example, consider the following chunk of a text file recipe, sregistry.scif:

#######################################
# Singularity Registry
#######################################
%appenv registry
SREGISTRY_CLIENT=registry
export SREGISTRY_CLIENT
%apprun registry
exec /opt/conda/bin/sregistry "$@"

It would parse to:

{'apps':
    {'registry':
        {'appenv': ['SREGISTRY_CLIENT=registry', 'export SREGISTRY_CLIENT'],
         'apprun': ['exec /opt/conda/bin/sregistry "$@"']}
    }
}

And then the integration can add other sections or parameters that are needed to higher level indices here. For example, if an %install section is added, it would be parsed to:

{'apps':
    {'registry':
        {'appenv': ['SREGISTRY_CLIENT=registry', 'export SREGISTRY_CLIENT'],
         'apprun': ['exec /opt/conda/bin/sregistry "$@"']}
    },
 'install': { ... }
}

During the parsing process, the following rules must be followed:

Creation

The creation step coincides with generation of a SCIF.

Introspection

Any integration or tool must reveal metadata, and SCIF content to its users. This means that:

Controller Requirements

Any integration or tool that is considered a client and implements SCIF must provide a controller that:

Commands

The minimal set of functions for the controller must support the following higher-level commands. We use the controller name ctrl in the examples below, along with suggested action names:

Scientific Filesystem Quick Start

This is the quick start for using a Scientific Filesystem. We won’t be designing or creating one, but using a pre-generated Docker or Singularity container. The awesome part is that despite two different container technologies, the guts inside, and interacting with them, will be the same! Thus, we will first show how to pull the different containers, and then show usage side by side. If you do want to learn these first steps, see the preview and install tutorial. If you want a REALLY quick start, skip the prose and go to it! If you want code for a quick start, see our hello-world.scif Github repository.

What you should understand for this quick start is what we’ve done thus far. We (as the creator of a container with a scientific filesystem) wrote a recipe, a text file with instructions for interacting with different scientific filesystem software modules (“apps”), and then exposed a single entrypoint that is the controller for the different apps. Then we built the container.

Then a user who has no knowledge of how the container was created is able to interact with the software modules defined by the filesystem in a consistent and user-friendly way. For this quick start, I am playing the role of the creator, and you are the user. Let’s jump in!

Singularity

For our first container we are using the Singularity scientific filesystem container that is built in the preview and install tutorial. First, you should pull the image, either with the sregistry client or singularity natively:

singularity pull --name scif-cli shub://vsoch/scif:scif
Progress |===================================| 100.0% 
Done. Container is at: /home/vanessa/Desktop/scif-cli

# or using sregistry client
# sregistry pull --name scif-cli vsoch/scif:scif

Note that we are pulling the tag scif. If you pull latest (e.g., vsoch/scif), you will get a container with the scif client installed, but no filesystem.

Docker

For our second container, we are starting with the Dockerized scientific filesystem container that is built in the preview and install tutorial. We will use the hw tag for this example, and you can also see the tags available for other versions. First, pull the image:

docker pull vanessa/scif:hw

Note that the container vanessa/scif is a base without any SCIF installed; you would use it as a starting point to install your own SCIF:

FROM vanessa/scif
ADD recipe.scif /
RUN scif install /recipe.scif

For each example below, you can test with Docker, Singularity, or both. For some, the $PS1 prompt might not match, but the output is equivalent.

Test the entrypoint

We can first test the entrypoint. We have just pulled the container, and we know nothing. So we run it.

docker run vanessa/scif:hw
./scif-cli 

Scientific Filesystem [v0.0.3]
usage: scif [-h] [--debug] [--quiet] [--writable]
            {version,pyshell,shell,preview,help,install,inspect,run,apps,dump,exec}
            ...

scientific filesystem tools

optional arguments:
  -h, --help            show this help message and exit
  --debug               use verbose logging to debug.
  --quiet               suppress print output
  --writable, -w        for relevant commands, if writable SCIF is needed

actions:
  actions for Scientific Filesystem

  {version,pyshell,shell,preview,help,install,inspect,run,apps,dump,exec}
                        scif actions
    version             show software version
    pyshell             Interactive python shell to scientific filesystem
    shell               shell to interact with scientific filesystem
    preview             preview changes to a filesytem
    help                look at help for an app, if it exists.
    install             install a recipe on the filesystem
    inspect             inspect an attribute for a scif installation
    run                 entrypoint to run a scientific filesystem
    apps                list apps installed
    dump                dump recipe
    exec                execute a command to a scientific filesystem

Apps

We are familiar with the scientific filesystem, so we can use the apps command to see what is installed. Since this container had the scientific filesystem installed on build, it already has three apps for us to play with:

docker run vanessa/scif:hw apps
./scif-cli apps

SCIF [app]              [root]
1  hello-world-echo	/scif/apps/hello-world-echo
2  hello-world-env	/scif/apps/hello-world-env
3  hello-world-script	/scif/apps/hello-world-script
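Because the listing has one app per line (an index, the app name, and its root), it is easy to consume from a script. The following is a minimal Python sketch, not part of the scif client. `parse_apps` is a hypothetical helper, and it assumes the column layout shown above stays stable. For anything beyond a quick check, the JSON printed by inspect is a sturdier thing to parse.

```python
import re

# Output of `scif apps` copied from the example above (whitespace-separated columns).
APPS_OUTPUT = """\
SCIF [app]              [root]
1  hello-world-echo\t/scif/apps/hello-world-echo
2  hello-world-env\t/scif/apps/hello-world-env
3  hello-world-script\t/scif/apps/hello-world-script
"""

# A row is: index, app name, then whitespace and the app root.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s*$")


def parse_apps(text):
    """Return {app_name: app_root} from the listing printed by `scif apps`."""
    apps = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # the header line does not start with an index, so it is skipped
            _, name, root = match.groups()
            apps[name] = root
    return apps


if __name__ == "__main__":
    for name, root in parse_apps(APPS_OUTPUT).items():
        print(f"{name} -> {root}")
```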

Help

We can then ask for help for a particular app. The creator should put some time into this section, describing the basic, important things a user needs to know.

docker run vanessa/scif:hw help hello-world-env
./scif-cli help hello-world-env
This is the help section for hello-world-env! This app
does not have anything other than an environment installed.
It just defines the environment variable `OMG=TACOS`. Try issuing
a command to the scif entrypoint to echo this variable:
scif exec hello-world-env echo [e]OMG
docker run vanessa/scif:hw exec hello-world-env echo [e]OMG
[hello-world-env] executing /bin/echo $OMG
TACOS

And you see a hasty help message that I wrote for the module hello-world-env telling you how to run it. We will do that soon.

Inspect

We can also inspect an app of interest, which will spit out a metadata structure for it. You can think of the help command as returning something human-friendly, and inspect as returning something that can be parsed programmatically.

docker run vanessa/scif:hw inspect hello-world-env
./scif-cli inspect hello-world-env
{
    "hello-world-env": {
        "appenv": [
            "OMG=TACOS"
        ],
        "apphelp": [
            "This is the help section for hello-world-env! This app",
            "does not have anything other than an environment installed.",
            "It just defines the environment variable `OMG=TACOS`. Try issuing",
            "a command to the scif entrypoint to echo this variable:",
            "scif exec hello-world-env echo [e]OMG",
            "docker run vanessa/scif:hw exec hello-world-env echo [e]OMG",
            "[hello-world-env] executing /bin/echo $OMG",
            "TACOS"
        ]
    }
}
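Since inspect prints JSON, a program can read an app's metadata directly. Here is a minimal Python sketch that assumes the output above has been captured as a string (for example, from the stdout of the docker run command). It turns the simple KEY=VALUE lines of appenv into a dictionary. `app_environment` is a hypothetical helper. Appenv lines are really shell code, so anything that is not a plain assignment is skipped.

```python
import json

# The JSON printed by `scif inspect hello-world-env` above, with the help text shortened.
INSPECT_OUTPUT = """
{
    "hello-world-env": {
        "appenv": ["OMG=TACOS"],
        "apphelp": ["This is the help section for hello-world-env! This app"]
    }
}
"""


def app_environment(inspect_json, app):
    """Return the plain KEY=VALUE lines of an app's appenv section as a dict."""
    metadata = json.loads(inspect_json)
    env = {}
    for line in metadata.get(app, {}).get("appenv", []):
        key, sep, value = line.partition("=")
        if sep and key.strip().isidentifier():  # skip lines that are not simple assignments
            env[key.strip()] = value.strip()
    return env


print(app_environment(INSPECT_OUTPUT, "hello-world-env"))  # {'OMG': 'TACOS'}
```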

Yes, it really is just an environment and a help message for it! Now that we’ve seen this instruction twice, let’s put it to use. First we will run an app, and then we will echo the environment variable $OMG inside the container.

Run

We can run the hello-world-echo app like this:

docker run vanessa/scif:hw run hello-world-echo
./scif-cli run hello-world-echo
[hello-world-echo] executing /bin/bash /scif/apps/hello-world-echo/scif/runscript
The best app is hello-world-echo

What about our example above with hello-world-env? It can be weird trying to pass an environment variable into a container from the host, because the variable gets evaluated on the host first (and winds up as something unexpected, or empty!). To help with this, scif uses a modified syntax to pass the variable into the container. We replace $ with [e], so that $VARIABLE becomes [e]VARIABLE. Here is an example:

docker run vanessa/scif:hw exec hello-world-env echo [e]OMG
./scif-cli exec hello-world-env echo [e]OMG
[hello-world-env] executing /bin/echo $OMG
TACOS

If we had done that with $, the variable would have been evaluated by our host shell, and nothing would have been passed into the container (unless $OMG was in fact defined on the host).
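The process has two stages. The host hands over the literal text [e]OMG, and the name is looked up in the app's environment only inside the SCIF. Here is a hypothetical Python sketch of the container side. It is an illustration, not the scif implementation. Judging by the log line above, scif rewrites [e]OMG to $OMG and lets the shell evaluate it, whereas `expand_in_container` substitutes the value directly.

```python
import re

# "[e]NAME" marks NAME as a variable to look up inside the SCIF, not on the host.
STICKER_E = re.compile(r"\[e\]([A-Za-z_][A-Za-z0-9_]*)")


def expand_in_container(args, app_env):
    """Replace every [e]NAME in args with NAME's value from the app environment.

    A name that is not defined expands to an empty string, as an unset
    variable would in bash.
    """
    return [STICKER_E.sub(lambda m: app_env.get(m.group(1), ""), arg) for arg in args]


# The host passes the literal text "[e]OMG", so nothing is evaluated too early.
host_args = ["echo", "[e]OMG"]
print(expand_in_container(host_args, {"OMG": "TACOS"}))  # ['echo', 'TACOS']
```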

Test

When the recipe has an %apptest section for an app, the content is written to a file, test.sh, in the app's metadata folder. In this example app, we have a bash script that prints a message and exits with 0 (success) when given no argument, or exits with the return code given as an argument:

%apptest hello-world-script
    echo "Running tests!"
    if [ $# -eq 0 ]; then
        echo "No arguments supplied, tests pass!"
        exit 0
    fi
    echo "Argument supplied, exiting with ${1}"
    exit ${1}

Thus, we can test the hello-world-script app like this to get a return code of 0:

docker run vanessa/scif:hw test hello-world-script
[hello-world-script] executing /bin/bash /scif/apps/hello-world-script/scif/test.sh
Running tests!
No arguments supplied, tests pass!

$ echo $?
0

And like this to get a non-zero return code (e.g., 255):

docker run vanessa/scif:hw test hello-world-script 255
[hello-world-script] executing /bin/bash /scif/apps/hello-world-script/scif/test.sh 255
Running tests!
Argument supplied, exiting with 255

ERROR Return code 255
$ echo $?
255

If an app doesn’t have tests, it prints this:

$ docker run vanessa/scif:hw test hello-world-echo
No tests defined for this app.

For now, I’ve decided to return a code of 1 in this case, because writing tests should be encouraged.

$ echo $?
1
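The test behavior above can be sketched in Python. It runs test.sh, passes its exit code through, and returns 1 when there is no test. This is an illustration under assumptions, not the scif client's code. `run_app_test` is a hypothetical helper, and the script is written to a temporary folder standing in for the app's metadata folder. Bash must be on the PATH.

```python
import os
import subprocess
import tempfile

# The %apptest body for hello-world-script, as it would be written to test.sh.
TEST_SH = """\
echo "Running tests!"
if [ $# -eq 0 ]; then
    echo "No arguments supplied, tests pass!"
    exit 0
fi
echo "Argument supplied, exiting with ${1}"
exit ${1}
"""


def run_app_test(test_script, args=()):
    """Run an app's test script with bash and return its exit code (1 if there is none)."""
    if not os.path.exists(test_script):
        print("No tests defined for this app.")
        return 1
    result = subprocess.run(["bash", test_script, *[str(arg) for arg in args]])
    return result.returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as metadata_folder:
        test_sh = os.path.join(metadata_folder, "test.sh")
        with open(test_sh, "w") as handle:
            handle.write(TEST_SH)
        print(run_app_test(test_sh))         # exit code 0
        print(run_app_test(test_sh, [255]))  # exit code 255
        print(run_app_test(os.path.join(metadata_folder, "missing.sh")))  # 1
```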

Exec

You can also execute a command:

docker run vanessa/scif:hw exec hello-world-echo echo "Another hello!"
./scif-cli exec hello-world-echo echo "Another hello!"
[hello-world-echo] executing /bin/echo Another hello!
Another hello!

Bash Shell

If you want to interact with your container in the context of an app, there is a command for that! We can either shell into the container with the global scif environment (and no activated apps):

./scif-cli shell
WARNING No app selected, will run default ['/bin/bash']
executing /bin/bash 
vanessa@thinkpad:/scif$ 

Notice how the path in the prompt ($PS1) changed to /scif? The same happens for Docker, of course:

docker run -it vanessa/scif:hw shell
WARNING No app selected, will run default ['/bin/bash']
executing /bin/bash 
root@1ab15ba4cc3b:/scif

The main difference here is that inside the Docker container we are (almost always) root :) Let’s see where we are:

$ echo $PWD
/scif

$ ls
apps
data

Since the scientific filesystem is not an entire environment like a container (it is rooted at /scif), this is where we shell to. If we shelled in the context of an app (next), we would be in the app’s root. Before exiting, look at the environment to see that the global SCIF environment variables are active:

env | grep SCIF
...
SCIF_APPBIN_hello_world_echo=/scif/apps/hello-world-echo/bin
SCIF_MESSAGELEVEL=INFO
...
exit

After exiting, we can do the same in the context of a specific app:

./scif-cli shell hello-world-env
[hello-world-env] executing /bin/bash 
vanessa@thinkpad:/scif/apps/hello-world-env$ 

Notice how we are in the root of hello-world-env. It works the same for Docker.

docker run -it vanessa/scif:hw shell hello-world-env
[hello-world-env] executing /bin/bash 
root@1ab15ba4cc3b:/scif/apps/hello-world-env# echo $OMG
TACOS
root@1ab15ba4cc3b:/scif/apps/hello-world-env# 

This is a great example of how a single container can be used to serve different interactive environments.

Python Shell

If we want to do more than execute commands, we can enter an interactive Python shell for exploring the scientific filesystem. For Docker, we have to add -it to the run command (to request an “interactive terminal”) and use “pyshell” as the command:

./scif-cli pyshell
Found configurations for 2 scif apps
hello-world-echo
hello-world-script
[scif] /scif hello-world-echo | hello-world-script
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.

docker run -it vanessa/scif:hw pyshell
Found configurations for 3 scif apps
hello-world-env
hello-world-script
hello-world-echo
[scif] /scif hello-world-env | hello-world-script | hello-world-echo
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.

We can now look at details for the client:

In [1]: client
Out[1]: [scif]

In [2]: client.apps()
Out[2]: ['hello-world-env', 'hello-world-script', 'hello-world-echo']

The “preview” command is most useful when you have a recipe that you haven’t installed yet, and want to see what will be created on the filesystem. But you can also run it for an already installed SCIF:

In [3]: client.preview()
[base] /scif 
[apps] /scif/apps 
[data] /scif/data
 

[root] /scif/apps/hello-world-env 
[lib] /scif/apps/hello-world-env/lib 
[bin] /scif/apps/hello-world-env/bin 
[data] /scif/data/hello-world-env 
+ appenv hello-world-env
/scif/apps/hello-world-env/scif/environment.sh
OMG=TACOS
+ apprecipe hello-world-env
/scif/apps/hello-world-env/scif/hello-world-env.scif

...

You can also get the full set of environment variables:

In [5]: client.get_env()
Out[5]: 
{'SCIF_APPBIN_hello_world_echo': '/scif/apps/hello-world-echo/bin',
 'SCIF_APPBIN_hello_world_env': '/scif/apps/hello-world-env/bin',
 'SCIF_APPBIN_hello_world_script': '/scif/apps/hello-world-script/bin',
 'SCIF_APPDATA_hello_world_echo': '/scif/data/hello-world-echo',
 'SCIF_APPDATA_hello_world_env': '/scif/data/hello-world-env',
 'SCIF_APPDATA_hello_world_script': '/scif/data/hello-world-script',
 'SCIF_APPENV_hello_world_echo': '/scif/apps/hello-world-echo/scif/environment.sh',
 'SCIF_APPENV_hello_world_env': '/scif/apps/hello-world-env/scif/environment.sh',
 'SCIF_APPENV_hello_world_script': '/scif/apps/hello-world-script/scif/environment.sh',
 'SCIF_APPHELP_hello_world_echo': '/scif/apps/hello-world-echo/scif/runscript.help',
 'SCIF_APPHELP_hello_world_env': '/scif/apps/hello-world-env/scif/runscript.help',
 'SCIF_APPHELP_hello_world_script': '/scif/apps/hello-world-script/scif/runscript.help',
 'SCIF_APPLABELS_hello_world_echo': '/scif/apps/hello-world-echo/scif/labels.json',
 'SCIF_APPLABELS_hello_world_env': '/scif/apps/hello-world-env/scif/labels.json',
 'SCIF_APPLABELS_hello_world_script': '/scif/apps/hello-world-script/scif/labels.json',
 'SCIF_APPLIB_hello_world_echo': '/scif/apps/hello-world-echo/lib',
 'SCIF_APPLIB_hello_world_env': '/scif/apps/hello-world-env/lib',
 'SCIF_APPLIB_hello_world_script': '/scif/apps/hello-world-script/lib',
 'SCIF_APPMETA_hello_world_echo': '/scif/apps/hello-world-echo/scif',
 'SCIF_APPMETA_hello_world_env': '/scif/apps/hello-world-env/scif',
 'SCIF_APPMETA_hello_world_script': '/scif/apps/hello-world-script/scif',
 'SCIF_APPNAME_hello_world_echo': 'hello-world-echo',
 'SCIF_APPNAME_hello_world_env': 'hello-world-env',
 'SCIF_APPNAME_hello_world_script': 'hello-world-script',
 'SCIF_APPRECIPE_hello_world_echo': '/scif/apps/hello-world-echo/scif/hello-world-echo.scif',
 'SCIF_APPRECIPE_hello_world_env': '/scif/apps/hello-world-env/scif/hello-world-env.scif',
 'SCIF_APPRECIPE_hello_world_script': '/scif/apps/hello-world-script/scif/hello-world-script.scif',
 'SCIF_APPROOT_hello_world_echo': '/scif/apps/hello-world-echo',
 'SCIF_APPROOT_hello_world_env': '/scif/apps/hello-world-env',
 'SCIF_APPROOT_hello_world_script': '/scif/apps/hello-world-script',
 'SCIF_APPRUN_hello_world_echo': '/scif/apps/hello-world-echo/scif/runscript',
 'SCIF_APPRUN_hello_world_env': '/scif/apps/hello-world-env/scif/runscript',
 'SCIF_APPRUN_hello_world_script': '/scif/apps/hello-world-script/scif/runscript',
 'SCIF_APPS': '/scif/apps',
 'SCIF_DATA': '/scif/data'}
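The names above follow a regular pattern. There is one variable per app and per field, suffixed with the app name. In the suffix, dashes are turned into underscores so that it is a valid shell variable name. SCIF_APPS and SCIF_DATA are added on top. This hypothetical Python sketch rebuilds the same mapping from a list of app names. It is derived from the output above, not taken from the client, and `global_scif_env` is an illustrative name.

```python
def global_scif_env(apps, base="/scif"):
    """Rebuild the global SCIF_* variables for a list of app names."""
    apps_dir = f"{base}/apps"
    data_dir = f"{base}/data"
    env = {"SCIF_APPS": apps_dir, "SCIF_DATA": data_dir}
    for app in apps:
        suffix = app.replace("-", "_")  # dashes are not allowed in variable names
        root = f"{apps_dir}/{app}"
        meta = f"{root}/scif"
        fields = {
            "APPBIN": f"{root}/bin",
            "APPDATA": f"{data_dir}/{app}",
            "APPENV": f"{meta}/environment.sh",
            "APPHELP": f"{meta}/runscript.help",
            "APPLABELS": f"{meta}/labels.json",
            "APPLIB": f"{root}/lib",
            "APPMETA": meta,
            "APPNAME": app,
            "APPRECIPE": f"{meta}/{app}.scif",
            "APPROOT": root,
            "APPRUN": f"{meta}/runscript",
        }
        for field, value in fields.items():
            env[f"SCIF_{field}_{suffix}"] = value
    return env


env = global_scif_env(["hello-world-echo", "hello-world-env", "hello-world-script"])
print(len(env))  # 35: 11 variables for each of 3 apps, plus SCIF_APPS and SCIF_DATA
print(env["SCIF_APPRECIPE_hello_world_env"])
```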

You can also activate an app, and then see how the environment variables change (a few will have been added!):

In [6]: client.activate('hello-world-env')

In [7]: client.get_env()
Out[7]: 
{'OMG': 'TACOS',
 'SCIF_APPBIN': '/scif/apps/hello-world-env/bin',
 'SCIF_APPDATA': '/scif/data/hello-world-env',
 'SCIF_APPENV': '/scif/apps/hello-world-env/scif/environment.sh',
 'SCIF_APPHELP': '/scif/apps/hello-world-env/scif/runscript.help',
 'SCIF_APPLABELS': '/scif/apps/hello-world-env/scif/labels.json',
 'SCIF_APPLIB': '/scif/apps/hello-world-env/lib',
 'SCIF_APPMETA': '/scif/apps/hello-world-env/scif',
 'SCIF_APPNAME': 'hello-world-env',
  ...
 'SCIF_APPRECIPE': '/scif/apps/hello-world-env/scif/hello-world-env.scif',
 'SCIF_APPROOT': '/scif/apps/hello-world-env',
 'SCIF_APPRUN': '/scif/apps/hello-world-env/scif/runscript',
 'SCIF_APPS': '/scif/apps',
 'SCIF_DATA': '/scif/data'}

And then deactivate to undo that.

client.deactivate()
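Activation, as shown above, exposes the active app's variables without the suffix (SCIF_APPBIN rather than SCIF_APPBIN_hello_world_env). It also adds the app's own environment (OMG=TACOS). A minimal Python sketch of that idea follows. `activate` and `deactivate` here are hypothetical functions operating on plain dictionaries, not the client's methods. The output above elides whether the suffixed global variables stay defined after activation, so keeping them is an assumption. This `deactivate` simply drops the added names rather than restoring any value the app overrode.

```python
# Per-app fields, as they appear in the get_env() output above.
FIELDS = ("APPBIN", "APPDATA", "APPENV", "APPHELP", "APPLABELS", "APPLIB",
          "APPMETA", "APPNAME", "APPRECIPE", "APPROOT", "APPRUN")


def activate(global_env, app, app_environment):
    """Return a copy of global_env with `app` active.

    Each SCIF_<FIELD>_<app> variable is also exposed as SCIF_<FIELD>, and the
    app's own variables (e.g. OMG=TACOS) are added on top.
    """
    suffix = app.replace("-", "_")
    active = dict(global_env)
    for field in FIELDS:
        key = f"SCIF_{field}_{suffix}"
        if key in global_env:
            active[f"SCIF_{field}"] = global_env[key]
    active.update(app_environment)
    return active


def deactivate(active_env, app_environment):
    """Drop the unsuffixed SCIF_<FIELD> names and the app's own variables.

    This sketch does not restore any value that the app environment overrode.
    """
    drop = {f"SCIF_{field}" for field in FIELDS} | set(app_environment)
    return {key: value for key, value in active_env.items() if key not in drop}


global_env = {
    "SCIF_APPS": "/scif/apps",
    "SCIF_DATA": "/scif/data",
    "SCIF_APPBIN_hello_world_env": "/scif/apps/hello-world-env/bin",
    "SCIF_APPNAME_hello_world_env": "hello-world-env",
}
active = activate(global_env, "hello-world-env", {"OMG": "TACOS"})
print(active["SCIF_APPBIN"], active["OMG"])  # /scif/apps/hello-world-env/bin TACOS
print(deactivate(active, {"OMG": "TACOS"}) == global_env)  # True
```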

I’m skiing over sunshine with happiness to be working on this! The response to reviewers (with this update) will be submitted in early March. Please contribute feedback (no matter how small!) to the docs and spec or the client (all served from that repo), and add your name to the specification. If you have already contributed, please submit a pull request there and add your name! Onwards to scientific filesystem galaxies, friends!

The same commands are collected below as a quick reference. More detail and quick start code are available on GitHub.

1. Get containers with the same scientific filesystem

singularity pull --name scif-cli shub://vsoch/scif:scif
docker pull vanessa/scif:hw

2. View the scientific filesystem entrypoint

docker run vanessa/scif:hw
./scif-cli 

3. Discover Installed Apps

docker run vanessa/scif:hw apps
./scif-cli apps

4. Commands

Help

docker run vanessa/scif:hw help hello-world-env
./scif-cli help hello-world-env

Inspect

docker run vanessa/scif:hw inspect hello-world-env
./scif-cli inspect hello-world-env

Run

docker run vanessa/scif:hw run hello-world-echo
./scif-cli run hello-world-echo

Test

# Passing Test (test script returns 0 with no arguments)
docker run vanessa/scif:hw test hello-world-script
./scif-cli test hello-world-script
echo $?

# Failing Test (test script returns argument as return code)
docker run vanessa/scif:hw test hello-world-script 255
./scif-cli test hello-world-script 255
echo $?

Execute

docker run vanessa/scif:hw exec hello-world-echo echo "Another hello!"
./scif-cli exec hello-world-echo echo "Another hello!"

Execute command with environment variable $OMG

docker run vanessa/scif:hw exec hello-world-env echo [e]OMG
./scif-cli exec hello-world-env echo [e]OMG

Interactive shell

./scif-cli shell
docker run -it vanessa/scif:hw shell

Shell with application active

./scif-cli shell hello-world-env
docker run -it vanessa/scif:hw shell hello-world-env

Python interactive client

./scif-cli pyshell
docker run -it vanessa/scif:hw pyshell

At this point, you are comfortable with previewing or installing a SCIF, and are ready for more advanced interaction. This means the commands below. Some of them were already demonstrated in the previous tutorial, and more detail is added here:

Apps

The apps command gives you a simple listing of the apps installed:

$ scif apps
SCIF [app]              [root]
1  hello-world-script	/scif/apps/hello-world-script
2  hello-world-echo	/scif/apps/hello-world-echo

If you don’t have any apps installed, you get a warning instead:

WARNING /scif is not detected as a recipe or base.

Inspect

Use inspect to look at metadata about a SCIF or an app within it. Without any arguments, you get JSON output for all apps installed in your SCIF:

scif inspect 
{
    "hello-world-script": {
        "appinstall": [
            "echo \"echo 'Hello World!'\" >> $SCIF_APPBIN/hello-world.sh",
            "chmod u+x $SCIF_APPBIN/hello-world.sh"
        ],
        "appenv": [
            "THEBESTAPP $SCIF_APPNAME"
        ],
        "apprun": [
            "/bin/bash hello-world.sh"
        ]
    },
    "hello-world-echo": {
        "appenv": [
            "THEBESTAPP $SCIF_APPNAME"
        ],
        "apprun": [
            "echo \"The best app is $THEBESTAPP\""
        ]
    }
}

If you add the name of an app, you will filter the output to it:

scif inspect hello-world-echo
{
    "hello-world-echo": {
        "appenv": [
            "THEBESTAPP $SCIF_APPNAME"
        ],
        "apprun": [
            "echo \"The best app is $THEBESTAPP\""
        ]
    }
}

If you add a particular attribute, you can further filter the result. For attributes, your choices are a (all), l (labels), e (environment), r (runscript), f (files), or i (install). Here we ask to see the runscript (r):

scif inspect hello-world-echo r
{
    "hello-world-echo": {
        "apprun": [
            "echo \"The best app is $THEBESTAPP\""
        ]
    }
}
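The attribute filter can be sketched in Python against the JSON from inspect. The letters come from the text above. The section names for labels and files (applabels and appfiles) are assumptions, since the example output does not include them. `filter_inspect` is a hypothetical helper rather than part of scif.

```python
import json

# Attribute letters from the text; "applabels" and "appfiles" are assumed names.
ATTRIBUTES = {
    "l": "applabels",
    "e": "appenv",
    "r": "apprun",
    "f": "appfiles",
    "i": "appinstall",
}

# The JSON printed by `scif inspect` above, rebuilt as a Python string.
INSPECT_ALL = json.dumps({
    "hello-world-script": {
        "appinstall": [
            "echo \"echo 'Hello World!'\" >> $SCIF_APPBIN/hello-world.sh",
            "chmod u+x $SCIF_APPBIN/hello-world.sh",
        ],
        "appenv": ["THEBESTAPP $SCIF_APPNAME"],
        "apprun": ["/bin/bash hello-world.sh"],
    },
    "hello-world-echo": {
        "appenv": ["THEBESTAPP $SCIF_APPNAME"],
        "apprun": ["echo \"The best app is $THEBESTAPP\""],
    },
})


def filter_inspect(inspect_json, app=None, attribute="a"):
    """Narrow inspect output to one app and/or one attribute ("a" keeps everything)."""
    metadata = json.loads(inspect_json)
    if app is not None:
        metadata = {app: metadata[app]}
    if attribute == "a":
        return metadata
    section = ATTRIBUTES[attribute]
    return {
        name: {section: sections[section]}
        for name, sections in metadata.items()
        if section in sections
    }


print(json.dumps(filter_inspect(INSPECT_ALL, "hello-world-echo", "r"), indent=4))
```

With `indent=4`, the printed result has the same shape as the output of `scif inspect hello-world-echo r` above.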

Finally, to dump the original recipe, just use “dump”:

scif dump
%appinstall
echo "echo 'Hello World!'" >> $SCIF_APPBIN/hello-world.sh
chmod u+x $SCIF_APPBIN/hello-world.sh

%appenv
THEBESTAPP $SCIF_APPNAME

%apprun
/bin/bash hello-world.sh

%appenv
THEBESTAPP $SCIF_APPNAME

%apprun
echo "The best app is $THEBESTAPP"
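The sections in inspect and in the recipe correspond one to one. A JSON key such as apprun is a %apprun section, and its list holds the section's lines. The hypothetical Python sketch below renders inspect-style metadata back into recipe text to show that correspondence. It is not necessarily how dump works (dump may simply read the saved recipe files). Unlike the dump output above, it writes the app name after each section header, as in the recipe snippets used elsewhere in this tutorial.

```python
def to_recipe(metadata):
    """Render {app: {section: [lines]}} as SCIF recipe text, one %section block per entry."""
    blocks = []
    for app, sections in metadata.items():
        for section, lines in sections.items():
            body = "\n".join(f"    {line}" for line in lines)  # recipe bodies are indented
            blocks.append(f"%{section} {app}\n{body}")
    return "\n\n".join(blocks) + "\n"


# Metadata for hello-world-echo, as shown by `scif inspect hello-world-echo` above.
echo_app = {
    "hello-world-echo": {
        "appenv": ["THEBESTAPP $SCIF_APPNAME"],
        "apprun": ['echo "The best app is $THEBESTAPP"'],
    }
}
print(to_recipe(echo_app), end="")
```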

Pyshell

Pyshell opens an interactive Python terminal for working with your SCIF. As we saw previously, depending on whether you provide a recipe, an app, or both, you get a different environment to work in:

SCIF python shell options

example                        use case
scif pyshell [recipe]          interact with a recipe, no changes to the filesystem (akin to preview)
scif pyshell                   interact with a SCIF (no SCIF app active)
scif pyshell [app]             interact with a SCIF with an app active
scif pyshell [recipe] [app]    interact with a recipe, still no changes, but load the context of an app

If you want to change the default shell used, set the variable SCIF_SHELL. By default, it uses /bin/bash.

Shell

Running a standard shell means running bash in the context of an app. This is a subprocess for /bin/bash (the default is determined at runtime by the environment variable SCIF_SHELL). We aren’t in any sort of container, so this is not “shelling in” the way you would shell into a container or use SSH to connect to a server. It is simply entry to a bash shell with the SCIF environment active.

SCIF shell options

example              use case
scif shell           interact with a SCIF (no SCIF app active)
scif shell [app]     interact with a SCIF with an app active

From the above, we can see that it’s only possible to shell into a context: either the context of an installed app, or the entire filesystem. Physically shelling into a recipe (a filesystem that doesn’t exist yet) doesn’t make sense.
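A shell “in the context of an app” is, in essence, a shell subprocess whose environment has the app’s variables added. The following Python sketch illustrates that idea with a hypothetical helper, `run_in_app_context`. To keep it non-interactive, it passes a command with -c, whereas an interactive shell would omit -c. It reads SCIF_SHELL with /bin/bash as the fallback, as described for pyshell above.

```python
import os
import subprocess

# Default shell: SCIF_SHELL if set, otherwise /bin/bash.
DEFAULT_SHELL = os.environ.get("SCIF_SHELL", "/bin/bash")


def run_in_app_context(command, app_environment, shell=None):
    """Run `command` in a shell subprocess that sees the app's environment variables."""
    env = {**os.environ, **app_environment}  # the app's variables win over the caller's
    result = subprocess.run(
        [shell or DEFAULT_SHELL, "-c", command],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(run_in_app_context('echo "$OMG"', {"OMG": "TACOS"}), end="")
```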

Special Command Stickers

The Scientific Filesystem has some special command stickers that (true to their name) stick with your commands and don’t get lost when the host parses them. Consider any command (e.g., run, exec) that passes arguments from the host to the SCIF. It can be troublesome when a variable that should be evaluated inside the container (with SCIF) is evaluated on the host instead. To handle this, we have a small set of stickers that are passed in as-is and evaluated internally, relative to the SCIF.

sticker     description                       example
[e]         an environment variable prefix    [e]OMG converts to $OMG
[pipe]      pipe (usually |)                  env [pipe] grep _SCIF
[out]       output direction (usually >)      cat input.txt [out] output.txt
[in]        input direction (usually <)
[append]    append to a file (usually >>)     echo "pancakes" >> recipe.txt

If you are having trouble with a command and you’ve tried quotes, these stickers might be exactly what you need! If there is another command being parsed that you want help with, let us know.
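The sticker table amounts to a translation step from stickers to shell syntax, performed on the SCIF side. Here is a hypothetical Python sketch of that step. `unstick` is an illustrative name, not the scif parser. It maps the operator stickers to their symbols and turns [e]NAME into $NAME. It does not re-quote arguments, so arguments containing spaces would need extra care.

```python
import re

# Operator stickers from the table, mapped to the shell syntax they stand for.
OPERATORS = {"[pipe]": "|", "[out]": ">", "[in]": "<", "[append]": ">>"}

# [e] directly followed by a variable name becomes "$".
ENV_STICKER = re.compile(r"\[e\](?=[A-Za-z_])")


def unstick(args):
    """Join sticker-bearing arguments into a shell command string (no re-quoting)."""
    words = []
    for arg in args:
        words.append(OPERATORS.get(arg, ENV_STICKER.sub("$", arg)))
    return " ".join(words)


print(unstick(["env", "[pipe]", "grep", "_SCIF"]))          # env | grep _SCIF
print(unstick(["cat", "input.txt", "[out]", "output.txt"]))  # cat input.txt > output.txt
print(unstick(["echo", "[e]OMG"]))                           # echo $OMG
```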

Run

The run command explicitly runs the script provided in the %apprun section. In our example, running hello-world-echo echoes “The best app is hello-world-echo” to the console.

$ scif run hello-world-echo
[hello-world-echo] executing /bin/bash /scif/apps/hello-world-echo/scif/runscript
The best app is hello-world-echo

Running the second app, hello-world-script, is actually a very cool example because in it we are calling a script that was written via an echo during the install step. If you remember:

%appinstall hello-world-script
    echo "echo 'Hello World!'" >> $SCIF_APPBIN/hello-world.sh
    chmod u+x $SCIF_APPBIN/hello-world.sh

and then running it:

%apprun hello-world-script
    /bin/bash hello-world.sh

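This means that running hello-world-script calls the hello-world.sh script that was written to the app’s bin folder ($SCIF_APPBIN) at install time, which prints “Hello World!”.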

Awesome!
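The install-then-run pattern can be reproduced outside of SCIF with Python’s standard library. The sketch writes the one-line script, makes it executable for the user, and runs it with bash. A temporary folder stands in for $SCIF_APPBIN, and the script runs from that folder so the relative name hello-world.sh resolves. How scif itself locates the script is not covered here. `install_and_run` is a hypothetical helper.

```python
import os
import stat
import subprocess
import tempfile


def install_and_run():
    """Mimic the hello-world-script install and run steps; return what the script prints."""
    with tempfile.TemporaryDirectory() as appbin:  # stands in for $SCIF_APPBIN
        script = os.path.join(appbin, "hello-world.sh")

        # %appinstall: echo "echo 'Hello World!'" >> $SCIF_APPBIN/hello-world.sh
        with open(script, "a") as handle:
            handle.write("echo 'Hello World!'\n")

        # %appinstall: chmod u+x $SCIF_APPBIN/hello-world.sh
        os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)

        # %apprun: /bin/bash hello-world.sh (run from the bin folder in this sketch)
        result = subprocess.run(
            ["bash", "hello-world.sh"], cwd=appbin,
            capture_output=True, text=True, check=True,
        )
        return result.stdout


print(install_and_run(), end="")  # Hello World!
```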

If an app doesn’t have an %apprun section, we default to a shell. For example, here is a third app that I added to the recipe, which is just an environment:

%appenv hello-world-env
    OMG=TACOS

# then scif install hello-world.scif again

scif install hello-world.scif hello-world-env
Installing base at /scif
+ appenv hello-world-env

If we run the app, we get a shell:

$ scif run hello-world-env
[hello-world-env] executing /bin/bash 
$/scif/apps/hello-world-env# echo $OMG
TACOS
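The rule for run can be summarized this way: execute the app’s runscript if there is one, otherwise open a shell. Below is a hypothetical Python sketch of that decision. It assumes the decision is made by checking whether a runscript file exists in the app’s metadata folder. The log lines above show /bin/bash .../scif/runscript for hello-world-echo and plain /bin/bash for hello-world-env. The sketch falls back to SCIF_SHELL, or to /bin/bash if that is unset.

```python
import os
import tempfile


def command_for(app_meta_folder, shell=None):
    """Return the command `run` would execute: the runscript if present, else a shell."""
    runscript = os.path.join(app_meta_folder, "runscript")
    if os.path.exists(runscript):
        return ["/bin/bash", runscript]
    return [shell or os.environ.get("SCIF_SHELL", "/bin/bash")]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as meta:  # stands in for $SCIF_APPMETA
        print(command_for(meta))  # no runscript yet: just the shell
        open(os.path.join(meta, "runscript"), "w").close()
        print(command_for(meta))  # now: /bin/bash plus the runscript path
```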

Again, cool! This example also nicely shows how an entire app can just be an environment context. Now it’s time to look at executing commands with “exec.”

Exec

Let’s take the hello-world-echo app as an example. When we shell in the context of this app, we would expect its environment variables to be active. Can we show that with exec?

$ scif exec hello-world-echo echo The best app is $THEBESTAPP
[hello-world-echo] executing /bin/echo
The best app is 

Ruhroh! The variable came out empty! That is because $THEBESTAPP gets evaluated on the host before it is passed in. To get around this, we have a special syntax to distinguish an environment variable. Try this!

$ scif exec hello-world-echo echo The best app is [e]THEBESTAPP
[hello-world-echo] executing /bin/echo The best app is $THEBESTAPP
The best app is hello-world-echo

This little bug has always been a pet peeve of mine, so I’ve introduced the little [e] so we don’t have to struggle.

More coming soon!