Download and Run Analysis Locally

Date: 26-Sept-2022
Description:
  • This notebook provides a walkthrough for downloading analyses stored in Flywheel to a local file system. Some neuroimaging analyses and workflows are not yet supported as Flywheel gears. For these workflows, the current workaround is to download your Flywheel analyses (or workflow inputs), run the workflow on your local machine, then upload the results (workflow outputs) back to Flywheel as a new analysis.

  • In this example, we will download data into a CURC scratch filesystem and run a group CONN analysis on our high performance compute resources.

  • It should be possible to run this notebook on any Jupyter-compatible third-party platform such as Google Colab or mybinder.org.

Requirements

  • University of Colorado at Boulder Research Computing (CURC) account

  • Access to University of Colorado Flywheel Instance

The following notebook should be run on CURC Blanca compute. If you are unsure whether you have the correct permissions or access to these resources, please contact the INC Data and Analysis team: Amy Hegarty [amy.hegarty@colorado.edu] or Lena Sherbakov [lena.sherbakov@colorado.edu].

CURC Jupyterhub

Before launching this Jupyter notebook, users should start a session using Open OnDemand.

TIP: Follow the instructions on INC Documentation to get started with Jupyter Notebooks.

In this tutorial we will be working on a large scratch filesystem mounted only on Blanca compute nodes. If you do not have access to this filesystem, you should select a similar large-capacity scratch environment for analysis.

Setup

TIP: Please use the “flywheel” kernel for this tutorial. If you do not see a “flywheel” kernel, contact INC Data and Analysis team to install this environment.
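
TIP: If you are instead running this notebook on a third-party platform (e.g. Google Colab), the “flywheel” kernel will not be available and you will likely need to install the Flywheel SDK yourself. A minimal sketch:

[ ]:
# only needed outside CURC -- the "flywheel" kernel already provides the SDK
!pip install flywheel-sdk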

[2]:
print("Welcome to Intermountain Neuroimaging Consortium!")
Welcome to Intermountain Neuroimaging Consortium!
[ ]:
# Python standard packages come first
import logging
import os, platform, sys
from getpass import getpass
from zipfile import ZipFile

# Third party packages come second
import flywheel

# add software paths
sys.path.append('/projects/ics/software/flywheel-python/bids-client/')
sys.path.append('/projects/ics/software/flywheel-python/')

Let's initialize a logger to keep track of the progress of our job (e.g., useful to keep track of runtime).

[ ]:
# Instantiate a logger
logging.basicConfig(level=logging.INFO)
log = logging.getLogger('root')

Let's check that we are on the correct computing system.

[ ]:
host = os.getenv('HOSTNAME', os.getenv('COMPUTERNAME', platform.node())).split('.')[0]

if "bnode" not in host:
    log.error("Tutorial should be run on CURC high performance compute nodes: blanca")

Flywheel API Key and Client

You can get your API_KEY by following the steps described in the Flywheel SDK doc here.

DANGER: Do NOT share your API key with anyone for any reason - it is the same as sharing your password and may break human subject participant confidentiality. ALWAYS obscure credentials from your code, especially when sharing with others/committing to a shared repository.

[ ]:
API_KEY = getpass('Enter API_KEY here: ')

Instantiate the Flywheel API client either using the API_KEY provided by the user input above or by reading it from the environment variable FW_KEY.

[ ]:
fw = flywheel.Client(API_KEY if 'API_KEY' in locals() else os.environ.get('FW_KEY'))

You can check which Flywheel instance you have been authenticated against with the following:

[ ]:
log.info('You are now logged in as %s to %s', fw.get_current_user()['email'], fw.get_config()['site']['api_url'])

Constants

Often you will have to define a few constants in your notebook which serve as its inputs. One such constant, for instance, is the API_KEY that was used to instantiate the Flywheel client. Other examples could be a PROJECT_ID or PROJECT_LABEL that will be used to identify a specific project.

[ ]:
PROJECT_LABEL = 'MyProject'

Helper functions

Here are all the custom helper functions we have developed for use in this example.

[ ]:
def get_project_id(fw, project_label):
    """Return the first project ID matching project_label

    Args:
       fw (flywheel.Client): A flywheel client
       project_label (str):  A Project label

    Returns:
       (str): Project ID or None if no project found
    """
    project = fw.projects.find_first(f'label={project_label}')
    if project:
        return project.id
    else:
        return None

Main script

We will be using the Flywheel SDK to identify and retrieve specific analysis files stored in Flywheel for download. Importantly, since the original analysis files are still retained in Flywheel, we can use our local copy of the data as a temporary or scratch workspace and remove all files at the end of this workflow (a cleanup sketch appears at the end of this notebook).

First, let's point to a project in Flywheel.

[ ]:
project_id = get_project_id(fw, PROJECT_LABEL)
if project_id:
    print(f'Project ID is: {project_id}.')
else:
    print(f'No Project with label {PROJECT_LABEL} found.')

Let's start by getting some information about the analyses in our Flywheel project. We will loop through all the sessions in our desired project and log the number of complete, failed, and cancelled jobs. We will use the same structure when downloading the list of analyses next.

[ ]:
# get project object
project = fw.get_project(project_id)

gear_name = 'bids-fmriprep'

icomplete = 0
ifailed = 0
icancelled = 0
isessions = 0

# loop through all sessions in the project. More detailed filters could be
#   used to specify a subset of sessions
for session in project.sessions.find():

    full_session = fw.get_session(session.id)
    isessions += 1
    for analysis in full_session.analyses:

        analysis_job = analysis.job

        # skip analyses without an attached job (e.g. ad-hoc uploaded analyses)
        if analysis_job is None:
            continue

        # only log ones that match the analysis label
        if gear_name in analysis.label:
            if analysis_job.state == "complete":
                icomplete += 1

            elif analysis_job.state == "failed":
                ifailed += 1
                log.info("subject: %s session: %s %s job: %s %s", session.subject.label, session.label, session.id, analysis.id, analysis_job.state)

            elif analysis_job.state == "cancelled":
                icancelled += 1
                log.info("subject: %s session: %s %s job: %s %s", session.subject.label, session.label, session.id, analysis.id, analysis_job.state)

log.info('%s Sessions, gear %s: %s complete, %s failed, %s cancelled', isessions, gear_name, icomplete, ifailed, icancelled)
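
TIP: The loop comment above notes that more detailed filters could be used. The SDK's find() accepts filter strings, so you could restrict the traversal to a subset of sessions; a minimal sketch (the date below is a placeholder):

[ ]:
# hypothetical filter -- only visit sessions created after a given date
for session in project.sessions.find('created>2022-01-01'):
    log.info('session: %s', session.label)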

Let's point to a file location on our local machine (in this case Blanca compute) to store the analyses locally.

TIP: Point to a large scratch filesystem for fast read and write operations.

[ ]:
# path to scratch directory
username = os.getenv('USER')
scratch = '/scratch/blanca/' + username + '/'
os.chdir(scratch)

Download Analyses to Local Filesystem

Next, we are going to download the analyses of interest. In this example, we will download all fmriprep output directories. There is plenty of customization you can use here to be sure you are downloading only the sessions and analyses of interest. Check out some filtering examples in our tutorial here.

WARNING: This will take some time!

[ ]:
# loop through all sessions in the project. More detailed filters could be
#   used to specify a subset of sessions
for session in project.sessions.find():

    full_session = fw.get_session(session.id)
    for analysis in full_session.analyses:

        analysis_job = analysis.job

        # skip analyses without an attached job (e.g. ad-hoc uploaded analyses)
        if analysis_job is None:
            continue

        # only download ones that match the analysis label and finished successfully
        if gear_name in analysis.label and analysis_job.state == "complete":

            # Download the data to scratch
            for fl in analysis.files:
                fl.download(scratch + fl['name'])

                # unzip files
                if '.zip' in fl['name']:
                    with ZipFile(scratch + fl['name'], "r") as archive:
                        archive.extractall(scratch)

            log.info('Downloaded analysis: %s for Subject: %s Session: %s', analysis.label, session.subject.label, session.label)


Now that your data is stored in local scratch, it's time to run your analysis as you always would, for example a group CONN analysis. To get started with your CONN analysis, return to Open OnDemand and start a new core desktop session. For more information on how to do this, please visit our documentation.

After you are happy with your analysis, don't forget to upload your analysis results to Flywheel. Follow the instructions in this tutorial to get started: upload-my-analysis.ipynb
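
As a brief preview of that tutorial, an ad-hoc upload with the SDK might look roughly like the sketch below; the analysis label and archive name are placeholders, so follow upload-my-analysis.ipynb for the supported workflow.

[ ]:
# sketch only -- 'conn-group-analysis' and 'conn_results.zip' are placeholder names;
# see upload-my-analysis.ipynb for the full, supported workflow
analysis = project.add_analysis(label='conn-group-analysis')
analysis.upload_output('conn_results.zip')

Finally, since the original analysis files are retained in Flywheel, the local copy can be treated as disposable scratch space. A cautious cleanup sketch, assuming the scratch path defined earlier:

[ ]:
# remove the downloaded archives from scratch; the originals remain in Flywheel.
# double-check paths before deleting anything!
for fname in os.listdir(scratch):
    if fname.endswith('.zip'):
        os.remove(os.path.join(scratch, fname))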