Vera User Guide

View the slides from the April 30, 2021 lecture Using Vera

Vera is a dedicated cluster for the McWilliams Center for Cosmology and Astrophysics at Carnegie Mellon University.

Getting Started

To get access to Vera, you must:

  • create a PSC username and password, if you don't already have those
  • request an allocation on Vera (if you are faculty) or get added to a Vera allocation (if you are a student)

Create a PSC username and password

If you have an active account on any other PSC system:
PSC usernames and passwords are the same across all PSC systems. You will use the same username and password on Vera as for your other PSC account.

If you do not have an active account on any other PSC system:
You must create a PSC username and password.

  1. Create your PSC username by completing this form. You will receive an email message when your username has been created.

  2. When your username is ready, create a PSC password (sometimes called "Kerberos password"). Go to the web-based PSC password change utility to set your PSC password.

See PSC password policies.

Apply for an allocation

If you are faculty or staff, request an allocation on Vera

Complete the PSC account request form to request an allocation on Vera for your group. List the usernames for all the members of your group who should have access to this allocation in that form.

If you are a student, get added to a Vera allocation

Have your advisor send email to grants@psc.edu asking to add you to their Vera account. Your advisor will need your PSC username in order to add you.

Changing your PSC password

There are two ways to change or reset your PSC password:

  • use the web-based PSC password change utility
  • use the kpasswd command while logged in to a PSC system

When you change your PSC password, whether you do it via the online utility or via the kpasswd command on a PSC system, you change it on all PSC systems.

How to get help or report a problem

If you have questions or issues at any time, you can send email to help@psc.edu.

Please include only one question or issue per email; it will help us to track and solve any issues more quickly and efficiently.

Be sure to include

  • an informative subject line
  • your username

If the question concerns a particular job, include these in addition:

  • the JobID
  • any error messages you received
  • the date and time the job ran
  • link to job scripts, output and data files
  • the software being used, and versions when appropriate
  • a screenshot of the error or the output file showing the error, if possible

System configuration

The Vera system consists of 41 public (open to anyone) compute nodes.

Node name: r001-r006, r009-r020
RAM: 256GB DDR4-2133
CPUs: 2 Intel Haswell (E5-2695 v3) CPUs; 14 cores/CPU; 2.3 - 3.3 GHz
Cores: 28 cores
GPUs: N/A
Cache: 35MB LLC

Node name: r007-r008, r021-r032
RAM: 128GB DDR4-2133
CPUs: 2 Intel Haswell (E5-2695 v3) CPUs; 14 cores/CPU; 2.3 - 3.3 GHz
Cores: 28 cores
GPUs: N/A
Cache: 35MB LLC

Node name: henon01-henon02
RAM: 512GB
CPUs: 2 AMD EPYC 7742 CPUs; 64 cores/CPU
Cores: 128 cores, 256 threads
GPUs: N/A
Cache: 32MB

Node name: henon03-henon08
RAM: 512GB
CPUs: 2 AMD EPYC 7542 CPUs; 32 cores/CPU
Cores: 64 cores, 128 threads
GPUs: N/A
Cache: 16MB

Node name: henon-gpu01
RAM: 512GB
CPUs: 2 AMD EPYC 7542 CPUs; 32 cores/CPU
Cores: 64 cores
GPUs: 2 A100
Cache: 16MB

Node-local storage: 2 HDDs, 4TB each

 

In addition to the public nodes, there are two restricted nodes, twig and sirius. If you believe that you should have access to twig, please contact Tiziana Di Matteo or Rupert Croft to be added to the required allocations. Sirius is for the use of Dr. Katelyn Breivik's group.

Node name: twig
RAM: 512GB
Cores: 64 cores
GPUs: 4 A100
Cache: 16MB

Node name: sirius
RAM: 1.5TB
CPU: 2 AMD EPYC Genoa 9654 CPUs
Cores: 96 cores/CPU, 192 cores total
GPUs: None
Cache: 384MB/CPU

Connecting to Vera

We take security very seriously! Be sure to read and comply with PSC password policies and the PSC guidelines on security, resource use, and privacy.

When you connect to Vera, you are connecting to a Vera login node. The login nodes are used for managing files, submitting batch jobs and launching interactive sessions. They are not suited for production computing.

Connect via ssh

Use an ssh client from your local machine to connect to hostname vera.psc.edu using the default port (22). You do not have to specify the port.
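For example, from a terminal on your local machine, substituting your PSC username for username:

ssh username@vera.psc.edu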

Read more about using SSH to connect to PSC systems.

Public-private keys

You can also use public-private key pairs to connect to Vera. To do so, you must first fill out this form to register your keys with PSC.

File spaces

There are two Vera filesystems, hildafs and verafs. You have a home directory on hildafs and a shared group scratch directory on verafs.

Your Vera home directory is /hildafs/home/username, where username is your PSC username. Your home directory has a 5GB quota. Your home directory is backed up.

Your group shares a scratch space available as /verafs/scratch/groupname, where groupname is your grant name. This scratch file space is NOT backed up. You can find your grant name by typing

id -gn

You can check your file usage using the command /opt/packages/allocations/my_quotas. Both your home directory and the scratch space available for your group are shown.

In addition, your project may have shared space in /hildafs/projects/projectname, where projectname is your grant name. If your project does not already have space in /hildafs/projects, you can request it.
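For example, a quick way to find your group name, check your quotas, and move to your group scratch space (the cd line assumes your scratch directory follows the pattern described above):

id -gn
/opt/packages/allocations/my_quotas
cd /verafs/scratch/$(id -gn)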

Transferring files

You can use rsync, scp or sftp to transfer files into and out of Vera.

Data Transfer Nodes for Vera

There are nodes dedicated to handling file transfers into Vera, named data.vera.psc.edu. Using these nodes will make file transfers more efficient.
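For example, to copy a file from your local machine into your Vera home directory through the data transfer nodes (the file name here is illustrative):

scp mydata.tar username@data.vera.psc.edu:/hildafs/home/username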

Paths for Vera file spaces

The start of the full paths for your Vera directories are:

Home directory      /hildafs/home/_username_
Scratch directory   /verafs/scratch/_groupname_/_username_
Project directory   /hildafs/projects/_groupname_/_username_

Transfers into your Vera file spaces

rsync

You can use the rsync command to copy files to and from Vera. A sample rsync command to copy to a Vera directory is

rsync -rltDvp -e 'ssh -l username' source_directory vera.psc.edu:target_directory

Substitute your PSC username for username. By default, rsync will not replace a newer file in the target directory with an older file of the same name from the source; it will overwrite files in the target directory that are older than your source files.

We recommend the rsync options -rltDvp. See the rsync man page for information on these options and other options you might want to use. We also recommend the option

-oMACS=umac-64@openssh.com

If you use this option your transfer will use a faster data validation algorithm.

You may want to put the rsync command in a loop to ensure that it completes. A sample loop is

RC=1
n=0
while [[ $RC -ne 0 && $n -lt 20 ]]
do
    rsync ...      # your full rsync command goes here
    RC=$?          # capture rsync's exit status
    let n=n+1      # count this attempt
    sleep 10       # wait 10 seconds before retrying
done

This loop will try your rsync command up to 20 times, exiting as soon as a transfer succeeds. If an rsync invocation fails, the loop tries again; rsync picks up where it left off and copies only those files that have not already been transferred. You can put this loop, with your rsync command, into a batch script and run it with sbatch.

scp

To use scp for a file transfer you must specify a source and destination for your transfer. The format for either source or destination is

_username_@_machine-name_:path/filename

For transfers involving Vera, username is your PSC username and the machine-name is vera.psc.edu.

An example scp command is

scp file username@vera.psc.edu:/verafs/scratch/groupname/username

File transfers using scp must specify full paths for Vera file systems. See Paths for Vera file spaces for details.

sftp

To use sftp, first connect to the remote machine:

sftp _username_@_machine-name_

When Vera is the remote machine, use your PSC username and vera.psc.edu as the machine name. You will be prompted for your password on the remote machine; when Vera is the remote machine, enter your PSC password.

You can then enter sftp subcommands, like put to copy a file from the local system to the remote system, or get to copy a file from the remote system to the local system.

To copy files into Vera you must either cd to the proper directory or use full pathnames in your file transfer commands. See Paths for Vera file spaces for details.
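A short example session might look like this (the directory and file names are placeholders):

sftp username@vera.psc.edu
sftp> cd /verafs/scratch/groupname/username
sftp> put localfile.dat
sftp> get results.dat
sftp> quit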

Software

The Module package

The Module package provides for the dynamic modification of a user's environment via module files. Module files manage the necessary changes to the environment, such as adding to the default path or defining environment variables, so that you do not have to manage those definitions and paths manually.

Modules are also used to manage multiple versions of applications, tools, and libraries, and to avoid problems caused by name conflicts between packages.

Basic use

To see what modules are available for a software package, type

module avail package-name

To set up the environment for a software package, load that environment with the module load command.

module load package-name

Module commands

Some useful module commands are:

module avail Lists all the available modules
module avail foo Lists all the available modules for package foo
module help foo Displays help on module foo
module display foo Indicates what changes would be made to the environment by loading module foo without actually loading it
module load foo Loads module foo
module list Lists your currently loaded modules
module swap foo1 foo2 Switches loaded module foo1 with module foo2
module unload foo Reverses all changes to the environment made by previously loading module foo
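For example, to see which GCC modules are available, load one, and confirm that it is loaded:

module avail gcc
module load gcc
module list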

AI module

The AI module provides a rich, unified, Anaconda-based environment for AI, Machine Learning, and Big Data applications. It includes several popular AI/ML/BD packages, selected to work together well.

The AI environment is built for the GPU nodes on Vera. Be sure to use one of the GPU partitions.

Using the AI environment

To see the full list of software included in the AI environment, first load the module and then activate the environment with these commands:

module load AI
source activate $AI_ENV  

To see what is included in $AI_ENV, type

conda list

Customize the AI environment

If you need software that is not in the pre-built environment, you can create a new environment by cloning the PSC defined one and then customizing it. First load the module and activate the PSC defined environment, as above, then clone it with

conda create --name your-new-environment-name --clone $AI_ENV

Then you can activate the new environment and proceed with your customization.

Example

In this example, the user installs the h5py package in a new environment they are creating. Use the following commands. Note:

  • The conda list command shows what packages are currently installed. Check to see if what you need is already available. The conda list command also shows the version number of the installed packages.

  • The conda create command clones $AI_ENV to create a new environment. This can take a long time, so ask for an hour of time with the interact command.

  • Here, the new environment is named clone-env-1 with the --name flag. Alternatively, the --prefix flag names the full path to where the environment will be stored. You can name the environment anything you like and store it in any directory you like.

interact -gpu -t 01:00:00                        # start a one-hour interactive session on a GPU node
module load AI                                   # loads the default AI module
source activate $AI_ENV                          # activate the PSC-defined AI environment
conda list                                       # see what is already installed
conda create --name clone-env-1 --clone $AI_ENV  # clone the environment; this can take a while
conda activate clone-env-1                       # switch to the new environment
conda install h5py                               # install the additional package

conda install will install the newest version of the package. If you need a specific older version, give the version number in the conda install command, for example: conda install h5py=3.7

Installed software

Package Description More information
Anaconda3 Open data science platform Anaconda Home Page
CFITSIO Library of C and Fortran routines for reading and writing data in FITS format FITSIO Home Page
CMake Tools to control the compilation process, build, test and package software CMake Home Page
CUDA NVidia GPU development environment CUDA Home Page
Eigen C++ template library for linear algebra: matrices, vectors, numerical solvers and related algorithms Eigen Home Page
FFTW3 C subroutine library for computing the DFT in one or more dimensions, of arbitrary input size, of both real and complex data FFTW Home Page
GCC GNU compilers Gnu Compiler Home Page
Go Open source programming language The Go Home Page
GSL Gnu Scientific Library GSL - GNU Scientific Library
Intel Intel compilers and MPI library C++ Developer Guide & Reference; Fortran Developer Guide & Reference
Python Powerful, object-oriented programming language Python Home Page
Singularity Open-source software container platform Singularity Home Page

Compilers

Intel and GNU compilers are available on Vera. You must load the compiler module before you can use them.

Compiler type   Module load command   C     C++    Fortran
Intel           module load intel     icc   icpc   ifort
Gnu             module load gcc       gcc   g++    gfortran
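For example, to compile a serial C program with the Intel compiler (the source and executable names are illustrative):

module load intel
icc -o mycode mycode.c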

MPI programming

Three types of MPI are supported on Vera: MVAPICH2, OpenMPI and Intel MPI.

To compile an MPI program, you must do these things:

  • Load the module for the compiler you want to use
  • Load the module for the MPI type you want to use
  • Issue the appropriate command to compile your program, as shown in the tables below.

The three MPI types may perform differently on different problems or in different programming environments. If you are having trouble with one type of MPI, please try using another type. Contact help@psc.edu for more help.

Note: When using Intel MPI, you must set the environment variable I_MPI_JOB_RESPECT_PROCESS_PLACEMENT to 0. Otherwise the mpirun task placement settings you give will be ignored. The command to do this is:

For the BASH shell:

export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0

For the CSH shell:

setenv I_MPI_JOB_RESPECT_PROCESS_PLACEMENT 0

Compiler commands for MPI programs

For the Intel compilers

Use the Intel compilers with   Load this module    C                        C++                       Fortran
Intel MPI                      intel               mpiicc (note the "ii")   mpiicpc (note the "ii")   mpiifort (note the "ii")
OpenMPI                        intel               mpicc                    mpicxx                    mpifort
MVAPICH2                       mpi/intel_mvapich   mpicc code.c -lifcore    mpicxx code.cpp -lifcore  mpifort code.f90 -lifcore
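For example, to compile a C MPI program with the Intel compiler and Intel MPI (the file names are illustrative):

module load intel
mpiicc -o my_mpi_code my_mpi_code.c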

For the Gnu compilers

Use the GNU compilers with   Load this module   C       C++      Fortran
OpenMPI                      mpi/gcc_openmpi    mpicc   mpicxx   mpifort
MVAPICH                      mpi/gcc_mvapich
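For example, following the steps above, a Fortran MPI program could be compiled with the GNU compilers and OpenMPI like this (the file names are illustrative):

module load gcc
module load mpi/gcc_openmpi
mpifort -o my_mpi_code my_mpi_code.f90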

Running jobs

All production computing must be done on Vera's compute nodes, NOT on the login node. The SLURM scheduler (Simple Linux Utility for Resource Management) manages and allocates all of Vera's compute nodes.

To submit a job:

  • create a file called a batch or job script
  • submit that script to a partition (queue) using the sbatch command
  • if you like, check on the job's progress as it waits in the partition and as it is running
  • check the output file for results or any errors when it finishes

A simple example

This section outlines an example which submits a simple batch job. More detail on batch scripts, the sbatch command and its options follow.

Create a batch script

Use any editor you like to create your batch scripts. A simple batch script named hello.job which runs a "hello world" command is given here. Comments, which begin with '#', explain what each line does.

The first line of any batch script must indicate the shell to use for your batch job. Lines beginning with '#SBATCH' are used to define the partition to use, the number of nodes or the walltime requested, and more. See section below, "Options to the sbatch command", for a more complete list.

#!/bin/bash
# use the bash shell
#SBATCH -p RM        
#SBATCH -t 2:10:00   
#SBATCH -N 2         

# The sbatch commands above request:
# run this job in the RM partition
# 2 hours and 10 minutes of walltime
# 2 nodes


set -x    # echo each command to standard out before running it
date      # run the Unix 'date' command
echo "Hello world, from Vera!"   # run the Unix 'echo' command

Submit the batch script to a partition

Use the sbatch command to submit the hello.job script.

[joeuser@m005 ~]$ sbatch hello.job
Submitted batch job 7408623

Note the jobid that is echoed back to you when the job is submitted. Here it is 7408623.

You can also give sbatch options on the command line. Options given on the command line take precedence over those in a job script. To use the RM partition, 2 hours and 10 minutes of walltime, and 2 nodes, the command would be:

[joeuser@m005 ~]$ sbatch -p RM -t 2:10:00 -N 2 hello.job

Check on the job progress

You can check on the job's progress in the partition by using the squeue command. By default you will get a list of all running and queued jobs. Use the -u option with your username to see only your jobs. See the squeue command for details.

[joeuser@m005 ~]$ squeue -u joeuser
JOBID   PARTITION NAME     USER    ST TIME NODES NODELIST(REASON)
7408623 RM        hello.jo joeuser PD 0:08 1     r005

The status "PD" (pending) in the output here shows that job 7408623 is waiting in the queue. See more about the squeue command below.

When the job is done, squeue will no longer show it.

Check the output file when the job is done

By default, the standard output and error from a job are saved in a file with the name slurm-jobid.out, in the directory that the job was submitted from.

[joeuser@m005 ~]$ more slurm-7408623.out
+ date
Sun Jan 19 10:27:06 EST 2020
+ echo 'Hello world, from Vera!'
Hello world, from Vera!
[joeuser@m005 ~]$

The sbatch command

To submit a batch job, use the sbatch command. The format is

sbatch -options batch-script

The options to sbatch can either be in your batch script or on the sbatch command line. Options in the command line override those in the batch script.

Options to the sbatch command

For more information about these options and other useful sbatch options see the sbatch man page.

Option Description Default
-t HH:MM:SS Walltime requested in HH:MM:SS 30 minutes
-N n Number of nodes requested 1
-p partition-name Choose the partition to run the job in. Note that partition names are case-sensitive. RM
-o filename Save standard out and error in filename. This file will be written to the directory that the job was submitted from slurm-jobid.out
--ntasks-per-node=n

Note the "--" for this option

Request n cores be allocated per node 1
--gpus=type:n

Note the "--" for this option

Specifies the number of GPUs requested. 'type' specifies the type of GPU you are requesting. The only currently valid type is "A100-40"
'n' is the total number of GPUs requested for this job
--mail-type=type

Note the "--" for this option

Send email when job events occur, where type can be BEGIN, END, FAIL or ALL None
--mail-user=user
Note the "--" for this option
User to send email to as specified by --mail-type. Default is the user who submits the job None
-d=dependency-list Set up dependencies between jobs, where dependency-list can be:

after:job_id[:jobid...]
This job can begin execution after the specified jobs have begun execution.

afterany:job_id[:jobid...]
This job can begin execution after the specified jobs have terminated.

aftercorr:job_id[:jobid...]
A task of this job array can begin execution after the corresponding task ID in the specified job has completed successfully (ran to completion with an exit code of zero).

afternotok:job_id[:jobid...]
This job can begin execution after the specified jobs have terminated in some failed state (non-zero exit code, node failure, timed out, etc).

afterok:job_id[:jobid...]
This job can begin execution after the specified jobs have successfully executed (ran to completion with an exit code of zero).

singleton
This job can begin execution after any previously launched jobs sharing the same job name and user have terminated.

None
--no-requeue
Note the "--" for this option
Specifies that your job will not be requeued under any circumstances, even if the node fails. NA
--time-min=HH:MM:SS
Note the "--" for this option.
Specifies a minimum walltime for your job in HH:MM:SS format

SLURM considers the walltime requested when deciding which job to start next. Free slots on the machine are defined by the number of nodes and how long those nodes are free until they will be needed by another job. By specifying a minimum walltime you allow the scheduler to reduce your walltime request to your specified minimum time when deciding whether to schedule your job. This could allow your job to start sooner.

If you use this option your actual walltime assignment can vary between your minimum time and the time you specified with the -t option. If your job hits its actual walltime limit, it will be killed. When you use this option you should checkpoint your job frequently to save the results obtained to that point.

None
-h Help, lists all the available command options
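As an illustration, a batch script requesting both A100 GPUs on the GPU node for one hour, with email sent when the job ends, might begin with directives like these (the walltime and email address are placeholders; the HENON-GPU partition is described in the next section):

#!/bin/bash
#SBATCH -p HENON-GPU
#SBATCH -N 1
#SBATCH -t 1:00:00
#SBATCH --gpus=A100-40:2
#SBATCH --mail-type=END
#SBATCH --mail-user=joeuser@example.edu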

Vera partitions

There are three public partitions (queues) on Vera which allow access to the public nodes.

Partition Name Nodes
RM r001-r032
HENON henon01-henon08
HENON-GPU henon-gpu01

The scheduler follows policies to ensure that one user does not dominate the machine. Jobs receive a priority based on the Fair Tree algorithm. When the top priority job will not fit, SLURM will try to schedule the next highest priority job. There are also limits to the number of nodes and cores a user can simultaneously use. Scheduling policies are always under review to ensure best turnaround for users.

In addition, there is one partition which allows access to the non-public twig node:

Partition Name Nodes
TWIG twig

Node, partition, and job status information

sinfo

The sinfo command displays information about the state of Vera's nodes. The nodes can have several states:

alloc Allocated to a job
down Down - not available for scheduling
drain Not available for scheduling
idle Free - available for scheduling
resv Reserved - part of a reservation; not available outside of that reservation
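For example, to check the state of the nodes in a single partition (partition names are listed in the Vera partitions section above):

sinfo -p RM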

For more information see the sinfo man page.

squeue

The squeue command displays information about the jobs in the partitions. Some useful options are:

-j jobid Displays the information for the specified jobid
-u username Restricts information to jobs belonging to the specified username
-l (long) Displays information including: time requested, time used, number of requested nodes, the nodes on which a job is running, job state and the reason why a job is waiting to run.

See the squeue man page for a discussion of the codes for job state, for why a job is waiting to run, and more options.

scancel

The scancel command is used to kill a job in a partition, whether it is running or still waiting to run. Specify the jobid for the job you want to kill. For example,

scancel 12345

kills job # 12345.

For more information, see the scancel man page.

sacct

The sacct command can be used to display detailed information about jobs. It is especially useful in investigating why one of your jobs failed. The general format of the command is

sacct -X -j _nnnnnn_ -S MMDDYY --format parameter1,parameter2, ...

Where

  • nnnnnn is the jobid of the job you are investigating.
  • the date given for the -S option is the date at which sacct begins searching for information about your job.
  • the commas between the parameters in the --format option cannot be followed by spaces.

The --format option determines what information to display about a job. Useful parameters are

  • JobID
  • Account - the account id
  • ExitCode - useful in determining why a job failed
  • State - useful in determining why a job failed
  • Start, End, Elapsed - start, end and elapsed time of the job
  • NodeList - list of nodes used in the job
  • NNodes - how many nodes the job was allocated
  • MaxRSS - how much memory the job used
  • AllocCPUs - how many cores the job was allocated
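Putting these together, an example command might look like the following; the jobid and start date are placeholders, and you can list whichever format parameters you need:

sacct -X -j 7408623 -S 011525 --format=JobID,State,ExitCode,Start,End,Elapsed,NNodes,MaxRSS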

For more information, see the sacct man page.

Monitoring memory usage

It can be useful to find the memory usage of your jobs. For example, you may want to find out if memory usage was a reason a job failed.

You can determine a job's memory usage whether it is still running or has finished. To determine if your job is still running, use the squeue command.

squeue -j nnnnnn -O state

where nnnnnn is the jobid.

For running jobs: srun and top or sstat

You can use the srun and top commands to determine the amount of memory being used.

srun --jobid=nnnnnn top -b -n 1 | grep username

Where

  • nnnnnn is the jobid
  • username is your username

The RES field in the output from top shows the actual amount of memory used by a process. The top man page can be used to identify the fields in the output of the top command.

See the man pages for srun and top for more information.

You can also use the sstat command to determine the amount of memory being used in a running job

sstat -j nnnnnn.batch --format=JobID,MaxRss

where _nnnnnn_ is your jobid.

See the man page for sstat for more information.

For jobs that are finished: sacct or job_info

If you are checking within a day or two after your job has finished you can issue the command

sacct -j nnnnnn --format=JobID,MaxRss

If this command no longer shows a value for MaxRss, use the job_info command

job_info nnnnnn | grep max_rss

Substitute your jobid for nnnnnn in both of these commands.

See the man page for sacct for more information.

For more information, see the online documentation for SLURM, including man pages for all the SLURM commands.

OnDemand

The OnDemand interface allows you to conduct your research on Vera through a web browser. You can manage files - create, edit and move them - submit and track jobs, see job output, check the status of the queues, run a Jupyter notebook through JupyterHub and more, without logging in to Vera via traditional interfaces.

OnDemand was created by the Ohio Supercomputer Center (OSC). In addition to this document, you can check the extensive documentation for OnDemand created by OSC, including many video tutorials, or email help@psc.edu.

Start OnDemand

To connect to Vera via OnDemand, point your browser to https://ondemand.vera.psc.edu.

  • You will be prompted for a username and password. Enter your PSC username and password.
  • The OnDemand Dashboard will open. From this page, you can use the menus across the top of the page to manage files and submit jobs to Vera.

To end your OnDemand session, choose Log Out at the top right of the Dashboard window and close your browser.

 

Manage files

To create, edit or move files, click on the Files menu from the Dashboard window. A dropdown menu will appear, listing the file spaces available to you on Vera, including your home directory, your project directories, and /opt/packages.

Choosing one of the file spaces opens the File Explorer in a new browser tab. The files in the selected directory are listed. You can always jump to your home directory by clicking the folder icon labeled Home Directory in the left panel.

There are two sets of buttons in the File Explorer.

Buttons on the top left just below the name of the current directory allow you to View, Edit, Rename, Download, Copy or Paste (after you have moved to a different directory) the selected file(s), or you can select or unselect all files with (Un)Select All.


Buttons in the top right perform these functions:

Go To Navigate to another directory or file system
Open in Terminal Open a terminal window on Vera in a new browser tab
New File Create a new empty file
New Dir Create a new subdirectory
Upload Copy a file from your local machine to Vera
Show Dotfiles Toggle the display of dotfiles
Show Owner/Mode Toggle the display of owner and permission settings

 


 

Create and edit jobs

You can create new job scripts, edit existing scripts, and submit those scripts to Vera through OnDemand.

From the top menus in the Dashboard window, choose Jobs > Job Composer. A Job Composer window will open.

There are two tabs at the top: Jobs and Templates.

Under the Jobs tab, you will see two buttons: +New Job and Create Template.

If you have never run a job through OnDemand, a help prompt opens under the +New Job button. It will walk you through the steps for creating, submitting, stopping or deleting a job.

If you have run jobs through OnDemand previously, the jobs are listed.

 

Create a new job script

To create a new job script:

1. Select a template to begin with
2. Edit the job script
3. Edit the job options

1. Select a template

  1. Go to the Jobs tab in the Job Composer window. You have been given a default template, named Simple Sequential Job.
  2. To create a new job script, click the blue + New Job > From Default Template button in the upper left. You will see a green message at the top of the window, "Job was successfully created".

At the right of the Jobs window, you will see the Job Details, including the location of the script and the script name (by default, main_job.sh). Under that, you will see the contents of the job script in a section titled Submit Script.

 


2. Edit the job script

Edit the job script so that it has the commands and workflow that you need.

If you do not want the default settings for a job, you must include options to change them in the job script. For example, you may need more time or more than one node. Use an SBATCH directive in the job script to set these options.

There are two ways to edit the job script: using the Edit Files button or the Open Editor button. First, go to the Jobs tab in the Jobs Composer window.

Either: Find the blue Edit Files tab at the top of the window


 

Or: Find the Submit Script section at the bottom right. Click the blue Open Editor button.

 


 

 

In either case, an Editor window opens. Make the changes you want and click the blue Save button.

 


After you save the file, the editor window remains open, but if you return to the Jobs Composer window, you will see that the content of  your script has changed.

3. Edit the job options

In the Jobs tab in the Jobs Composer window, click the blue Job Options button.


 

 

The options for the selected job, such as its name, the job script to run, and the account to run it under, are displayed and can be edited. Click Reset to revert any changes you have made. Click Save or Back to return to the job listing, saving or discarding your edits, respectively.

 


Submit jobs to Vera

Select a job in the Jobs tab in the Jobs Composer window. Click the green Submit button to submit the selected job. A message at the top of the window shows whether the job submission was successful or not. If it is not, you can edit the job script or options and resubmit. When the job submits successfully, the status of the job in the Jobs Composer window will change to Queued or Running. When  the job completes, the status will change to Completed.

 


JupyterHub

You can run Jupyter notebooks through OnDemand via JupyterHub.

  1. Select Interactive Apps >> Jupyter Notebooks from the top menu in the Dashboard window.
  2. In the screen that opens, specify the timelimit, number of nodes, and partition to use. If you have multiple grants on Vera, you can also designate the account to deduct this usage from.
  3. Use the Extra Args field to specify the number of cores or number of GPUs you want.
    • If you will use one of the GPU partitions, use the --gpus flag to specify the number of GPUs that you want. See the Running jobs section of this User Guide for more information on partitions and the options available.
  4. Use the Extra Jupyter Args field to pass arguments to your Jupyter notebook.
  5. Click the blue Launch button to start your JupyterHub session. You may have to wait in the queue for resources to be available.
  6. When your session starts, click the blue Connect to Jupyter button. The Dashboard window now displays information about your JupyterHub session, including which node it is running on, when it began, and how much time remains. A new window running JupyterHub also opens. Note the three tabs: Files, Running and Clusters.

    Files

    By default you are in the Files tab, which displays the contents of your home directory. You can navigate through your home directory tree.

    Running

    Under the Running tab, you will see any notebooks or terminal sessions that you are currently running.

  7. Now you can start a Jupyter notebook:
    1. To start a Jupyter notebook which is stored in your home directory space, in the Files tab, click on its name. A new window running the notebook opens.
    2. To start a Jupyter notebook which is stored in another Vera file space, such as your project directory, you must first create a symbolic link to it from your home directory. While in your home directory, use a command like
      ln -s /hildafs/projects/yourgroup/youruserid PROJECTDIR
      When you enter JupyterHub, you will see the entry PROJECTDIR in your list of files under the Files tab. Click on it to move to that directory.

Errors

If you get an "Internal Server Error" when starting a JupyterHub session, you may be over your home directory quota. Check the Details section of the error for a line like:

#<ActionView::Template::Error: Disk quota exceeded @ dir_s_mkdir - /home/joeuser/ondemand/data/sys/dashboard/batch_connect/sys/jupyter_app...............

You can confirm that you are over quota by opening a shell window on Vera (for example, with the Open in Terminal button in the File Explorer) and typing

du -sh

This command shows the amount of storage used in your home directory. Home directories on Vera have a 5GB quota. If du -sh shows you are near 5GB, you should delete or move some files out of your home directory. You can do this in OnDemand in the File Explorer window or in a shell window.

When you are under quota, you can try starting a JupyterHub session again.

 

Stopping your JupyterHub session

In the Dashboard window, click the red Delete button.

Security guidelines and policies

See the PSC Policies page for PSC policies regarding privacy, security and the acceptable use of PSC resources. Questions about any of these policies should be directed to help@psc.edu.