Skylight (2017- )

Skylight @ MERCURY User Guide

Skylight is MERCURY’s newest HPC acquisition, purchased with funds from an NSF MRI grant. Its operation and hosting are generously supported by the Office of Furman’s Vice President for Academic Affairs and Provost, in collaboration with Clemson University. Skylight is named after one of the peaks in the Adirondack Mountains, following a naming tradition dating back to Marcy (2013- ), Haystack (2009-2013), Saddleback, Iroquois, Avalanche and many others.

 

This system is managed by the MERCURY Consortium. Please send all questions and comments to support@mercuryconsortium.org. If you already have an account on Marcy, you will also have access to Skylight. If you need a new account, we can create one at the request of your advisor. Advisors can e-mail the student’s full name and email address to support@mercuryconsortium.org to request an account.

 


 

Hardware Configuration

  • 1  Master/login node: 20 Intel E5-2640v4 2.4GHz cores, 128GB RAM, 1 TB mirrored disk
  • 1  Storage (I/O) node: 20 Intel E5-2640v4 2.4GHz cores, 64GB RAM, 23 TB disk array
  • 22 STDMEM compute nodes (node01-22): 20 Intel E5-2640v4 2.4GHz cores, 128GB RAM, 1 TB disk
  • 3  HIMEM compute nodes (himem01-03): 28 Intel E5-2680v4 2.4GHz cores, 512GB RAM, 1 TB disk
  • 6  GTX1080 GPU compute nodes (gpu01-06): 20 Intel E5-2640v4 2.4GHz cores, 64GB RAM, 1 TB disk + 4x Nvidia GTX 1080 GPUs
  • 2  P100 GPU compute nodes (gpup01-02): 20 Intel E5-2640v4 2.4GHz cores, 128GB RAM, 1 TB disk + 1x Nvidia P100 GPU

 

The machine has 35 nodes in total, including the login and storage nodes.

Skylight.cluster (login node)

  • (2) Intel Broadwell E5-2640v4, 2.40GHz, 10-Core, 95Watt Processor(s)
  • 128GB Memory (8 x 16GB), 6.4GB of memory per core 
  • (2) 1TB Hard Drive(s) (mirrored RAID 1)

This login node is accessible from anywhere. Most system services run on this node and it should only be used to compile code, push changes to the compute nodes, set up calculations and perform light analysis. Please do not run any interactive jobs on this node. All jobs should be submitted from this host.

IO (storage node)

  • (2) Intel Broadwell E5-2640v4, 2.40GHz, 10-Core, 95Watt Processor(s)
  • 64GB Memory (8 x 8GB)
  • (27) 1.2TB SATA SSDs (RAID 6, 23TB Usable Storage)
     

This is the main storage node, where the /home and /usr/local directories reside. The login node and all compute nodes mount /home and /usr/local from this machine. You should never need to log into this machine and do any work on it; overloading this server with user processes will degrade performance for the rest of the cluster. The only exception is your first login to the system, which creates your user home directory. If your home directory is ever removed, logging into this server again will re-create it.

Compute nodes

The system has 22 stdmem nodes (128GB RAM each), 3 himem nodes (512GB RAM each), 6 GPU nodes containing four GTX 1080 GPUs each, and 2 nodes with one P100 GPU each.

STDMEM nodes

22 nodes (called node01-node22)

  • (2) Intel Broadwell E5-2640v4, 2.40GHz, 10-Core, 90Watt Processor(s)
  • 128GB Memory (8 x 16GB), 6.4GB of memory per core
  • (1) 1TB SATA HDD

HIMEM nodes

3 nodes (called himem01-03)

  • (2) Intel Broadwell E5-2680v4, 2.40GHz, 14-Core, 120Watt Processor(s)
  • 512GB Memory (24 x 32GB), 18.3GB of memory per core
  • (1) 1TB SATA HDD

GPU nodes with NVIDIA GTX 1080 GPUs

6 nodes (called gpu01-06)

  • (2) Intel Broadwell E5-2640v4, 2.40GHz, 10-Core, 90Watt Processor(s)
  • 64GB Memory (8 x 8GB), 3.2GB of memory per core
  • (1) 1TB SATA HDD
  • (4) NVIDIA GeForce GTX1080, 8GB GDDR5X, 2560 CUDA cores

 

GPU nodes with NVIDIA Tesla P100 GPUs

2 nodes (called gpup01-02)

  • (2) Intel Broadwell E5-2640v4, 2.40GHz, 10-Core, 90Watt Processor(s)
  • 128GB Memory (8 x 16GB), 6.4GB of memory per core
  • (1) 1TB SATA HDD
  • (1) NVIDIA Tesla P100 PCI-E 12GB HBM2 Passive Single GPU

A brief comparison of the two GPU types in Skylight is shown in the next table.

Feature                                             GeForce GTX 1080 (Pascal)    Tesla P100 (Pascal)
Number and type of GPU                              1 Pascal GP104               1 Pascal GP100
Peak double-precision floating point performance    ~0.3 TFLOPS*                 4.7 TFLOPS*
Peak single-precision floating point performance    8.2 TFLOPS                   9.3 TFLOPS
Memory bandwidth (ECC off)                          400 GB/s                     549 GB/s
Memory size                                         8 GB GDDR5X                  12 GB HBM2
CUDA cores                                          2560                         3584

 

Access

Your login credentials and home directories have been migrated from Marcy to Skylight, so users with accounts on Marcy automatically have access to Skylight as well.

Skylight is accessible from anywhere via SSH, using SSH keys only; you will not be able to log in to Skylight with a password. Here are three ways you can log in:

  1. Using SSH keys: if you log in to Marcy with SSH keys from a given machine, you should be able to log in directly to Skylight from that machine via
    • ssh username@skylight.furman.edu
  2. Through Marcy in two steps: you can log in to Marcy first and then hop to Skylight
    • ssh username@marcy.furman.edu
    • ssh skylight
  3. Through Marcy in one step: you can tunnel your SSH traffic to Skylight through Marcy (see the SSH configuration sketch below)
    • ssh -t username@marcy.furman.edu ssh skylight

Direct SSH to Skylight using passwords is currently unavailable. If you have any questions or encounter any problems, please email support@mercuryconsortium.org.
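If you plan to use the third method regularly, you can make the hop through Marcy automatic with an SSH client configuration entry. The following is a minimal sketch, assuming an OpenSSH client recent enough to support ProxyJump (version 7.3 or later); the host aliases are just examples and username is a placeholder for your own account name.

# ~/.ssh/config on your local machine (hypothetical example)
Host marcy
    HostName marcy.furman.edu
    User username

Host skylight
    HostName skylight.furman.edu
    User username
    ProxyJump marcy        # route the connection through Marcy

With an entry like this, ssh skylight from your local machine logs you in through Marcy using your existing keys.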


 

Available Software

The cluster has a lot of software stored in /usr/local/Dist, and more will be added as necessary. Some essential software and environment settings are applied at login, but the vast majority of the scientific software is provisioned using the Modules environment management tool. When you load a specific application module, the environment variables, dependencies and paths needed to execute that application are set for you. The most recent version(s) of commonly used computational chemistry applications are compiled to run optimally on Skylight and provided as modules. If you need other applications or older versions of these applications, please send a request to support@mercuryconsortium.org and we will do our best to make them available to you quickly.

Chemistry

The following applications are freshly compiled and tested to run on Skylight. Their ability to run on GPUs has not yet been tested extensively, except in the case of AMBER16; we will add GPU support for all capable applications over time.

  1. Gaussian16 A.03
  2. Gaussian09 A.02 + D.03
  3. Orca 3.0.3 + 4.0.0
  4. GAMESS 2016
  5. AMBER 12, 14
  6. AMBER 16 w/ GPU support for both NVIDIA GTX 1080 and P100 cards
  7. LAMMPS 03-31-2017 
  8. NAMD 2.12 w/ GPU support
  9. CP2K 2.6 + 3.0 + 4.1 
  10. Desmond 2016
  11. libEFP

The following will be available shortly:

  1. NWchem
  2. PSI4
  3. CFOUR
  4. GROMACS
  5. OpenMM
  6. Quantum Espresso
  7. Siesta
  8. CPMD
  9. DFTB+
  10. cluster 

General Tools

If you want to compile your own code, these general compilers and libraries are available (a short build example follows the list).

  1. Intel Compilers (16, 17)
  2. Intel MKL libraries (16, 17)
  3. Intel MPI (17)
  4. GCC (4.8.5)
  5. OpenMPI (1.6, 1.8, 2.0)
  6. MVAPICH2 (2.2)
  7. MPICH (3.3)
  8. Python (2.7.5, 3.3)
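As an illustration, here is a minimal sketch of compiling a small MPI program with the Intel toolchain. The module names intel and impi are taken from the gamess/2016 modulefile shown later in this guide; the source file name is purely hypothetical, and you should check module avail for the exact versions installed.

# Load the Intel compilers and Intel MPI (module names as used by the
# gamess/2016 modulefile; adjust to the versions listed by 'module avail').
module load intel impi

# Compile a hypothetical MPI source file with the Intel MPI C wrapper.
mpiicc -O2 -o hello_mpi hello_mpi.c

# Run it through Slurm rather than on the login node, e.g. from a batch script:
#   srun -n 4 ./hello_mpi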

Software Provisioning using Modules

Modules are used to dynamically modify the user’s shell environment as needed. You can see what modules are loaded by default by executing ‘module list’ right after logging in. Here are common module commands and their description:

module avail          List the available modules. If there are multiple versions of a package, one will be denoted as (default); loading the module without a version number gives you that default version.
module whatis         List all the available modules along with a short description.
module load MODULE    Load the named MODULE.
module unload MODULE  Unload the named MODULE, reverting back to the OS defaults.
module list           List all the currently loaded modules.
module help           Get general help information about modules.
module help MODULE    Get help information about the named module.
module show MODULE    Show details about the module, including the changes that loading it will make to your environment.

The next example demonstrates the use of modules to set up one’s environment for a GPU calculation with AMBER16.

Let us start by seeing which modules are loaded by default:

user@master[~] module list
Currently Loaded Modulefiles:
 1) null 2) modules 3) use.own

AMBER16 is not loaded by default, so if you try to find its binaries, they will not be in your path:

user@master[~] which pmemd.cuda
pmemd.cuda not found

Loading the appropriate module puts the AMBER16 binaries in your path:

user@master[~] module load amber/16-cuda
user@master[~] which pmemd.cuda
/usr/local/Dist/amber/amber16/bin/pmemd.cuda

Now that the right binary is in your path, you can run AMBER16. 
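For completeness, here is a minimal sketch of a Slurm batch script that runs pmemd.cuda on one of the GTX 1080 nodes. The --gres=gpu:1 line assumes the GPUs are exposed to Slurm as a generic resource named gpu (check with support if the scheduler rejects it), and the input, topology and coordinate file names are placeholders.

#!/bin/tcsh
#SBATCH -p gpu                      # GPU queue (GTX 1080 nodes)
#SBATCH -N 1                        # Single node
#SBATCH -n 1                        # pmemd.cuda typically uses one CPU core per GPU
#SBATCH --gres=gpu:1                # Request one GPU (assumes a 'gpu' generic resource)
#SBATCH -t 24:00:00                 # Runtime in HH:MM:SS
#SBATCH --job-name="amber_gpu"      # Job name shown in the queue

source /usr/share/Modules/init/tcsh          # Make modules work in a tcsh batch script
module load amber/16-cuda                    # AMBER16 built with CUDA support

# Placeholder file names; replace with your own input, topology and coordinates.
pmemd.cuda -O -i md.in -o md.out -p system.prmtop -c system.inpcrd -r system.rst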

You can execute module avail to see the available modules, and get documentation about running a particular module with module show module_name.

user@master[~] module avail

--------------------- /usr/local/Modules/modulefiles ----------------------------------------------------------------------------------
amber/12            cpmd/3.17.1         gamess/default      module-info         psi4/07-13
amber/14            cpmd/default        gaussian/default    modules             psi4/default
amber/16            cuda/6.0            gaussian/g09-A02    namd/2.12-ib        psi4/miniconda
amber/16-cuda       cuda/8.0            gaussian/g09-D01    namd/2.12-ib-cuda   python/2.7.13
amber/default       cuda/default        gaussian/g16-A03    namd/2.12-omp       python/2.7-intel
cfour/1.0           desmond/2016-4      gromacs/5.0-beta    namd/default        python/3.3.0
cluster/1.2         dftb+/1.2.2-mpi     InVEST              null                qe/6.1
cluster/1.3         dftb+/1.2.2-serial  lammps/3-2017       nwchem/6.3          scalapack/2.0.2-gnu
cluster/default     dock6/6.7           libefp/1.4.2        nwchem/6.5          siesta/3.2
cp2k/2.6.2          dot                 miniconda/psi4      nwchem/6.6          use.own
cp2k/3.0            espresso/5.1        mlpr/2015.1.14      nwchem/default
cp2k/4.1            fftw/3.3.3          mlpr/default        openmm/5.4
cp2k/default        gamess/2016         module-git          openmm/6.1

-------------------------- /home/user/privatemodules --------------------------------------------------------------------------------------- 
null 

user@master[~] module show gamess/2016

-------------------------------------------------------------------
/usr/local/Modules/modulefiles/gamess/2016:

module-whatis Adds `/usr/local/Dist/gamess/16' to your 'PATH/LD_LIBRARY/MANPATH' environment
module-whatis To run Gamess calculations, use the rungamess.csh script.
module-whatis Usage: rungms inputfile ScratchDirName NumberOfProcesses/CoresToUse
module-whatis Eg: rungms test.inp myScrDir 16
module-whatis Eg: rungms exam01.inp $SLURM_JOB_ID  $SLURM_NTASKS
module-whatis Or adapt the script (/usr/local/Dist/bin/rungamess-16.csh) for your purposes.

module load intel impi
prepend-path PATH /usr/local/Dist/gamess/16
-------------------------------------------------------------------
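Putting the usage string above into practice, a Slurm batch script for a GAMESS run might look like the following sketch. exam01.inp is one of the standard GAMESS example inputs referenced in the modulefile; treat the rest as a template rather than a tested script.

#!/bin/tcsh
#SBATCH -p stdmem                   # Standard-memory queue
#SBATCH -N 1                        # Single node
#SBATCH -n 20                       # One task per core on a stdmem node
#SBATCH -t 12:00:00                 # Runtime in HH:MM:SS
#SBATCH --job-name="gamess_test"    # Job name shown in the queue

source /usr/share/Modules/init/tcsh          # Make modules work in a tcsh batch script
module load gamess/2016                      # Adds /usr/local/Dist/gamess/16 to PATH and loads intel/impi

# Usage from the modulefile: rungms inputfile ScratchDirName NumberOfProcesses
rungms exam01.inp $SLURM_JOB_ID $SLURM_NTASKS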

 


Running Calculations

Slurm Job Scheduler

Unlike our past HPC systems, where we used Sun Grid Engine (SGE) or Torque (OpenPBS) + Maui for job scheduling, we have switched to Slurm for Skylight. From a user’s standpoint, Slurm looks very much like the familiar PBS, except with #SBATCH directives instead of #PBS. For a quick overview of Slurm, please see the Slurm documentation (https://slurm.schedmd.com).
 
The table below summarizes the most common Slurm commands; each is described in more detail on the Slurm documentation site.

Action                        Command    Example
Submit a batch job            sbatch     sbatch run.slurm
Run a script interactively    srun       srun --pty -p interact -t 10 --mem 1000 /bin/bash /bin/run.csh
Kill a job                    scancel    scancel 1542
View the status of queues     squeue     squeue -u username
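A typical round trip with these commands looks like the sketch below; the job ID 1542 and the file name run.slurm are simply the examples from the table.

user@master[~] sbatch run.slurm            # submit the batch script
Submitted batch job 1542
user@master[~] squeue -u $USER             # check the status of your jobs
user@master[~] scancel 1542                # cancel the job if it is no longer needed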

A typical batch submission file to run a Gaussian calculation would look like this:

#!/bin/tcsh
#SBATCH -p stdmem                   # Queue/Partition to submit to
#SBATCH -n 20                       # Number of tasks/cores (--ntasks=20; with -N 1 this is also the per-node count)
#SBATCH -N 1                        # Number of nodes (--nodes=1; should be 1 in most cases)
#SBATCH -t 48:00:00                 # Runtime in D-HH:MM:SS
#SBATCH --mem=100G                  # Total Memory needed for all cores (see also --mem-per-cpu)
#SBATCH -o jTest_%j.err             # File to which STDOUT & STDERR will be written
#SBATCH --job-name="jTest"          # Job name that will appear in the queue
#SBATCH --mail-type=END             # Type of email notification- BEGIN,END,FAIL,ALL
#SBATCH --mail-user=user@host.edu   # Email to which notifications will be sent

source /usr/share/Modules/init/tcsh              #necessary to make sure modules work in TCSH shell
set echo

module load gaussian/g16-A03                     #Load the right Gaussian module
setenv GAUSS_SCRDIR /scratch/$GROUP/$USER        #Optionally set scratch directory path
g16 test.com                                     #Run a test calculation

 

 

Slurm Queues

The queue policies are kept simple at the moment, but more features will be added over time to ensure optimal and fair use.

Submit Queue   Execution Queue   Nodes Available   Cores   Max Wallclock   Max Memory   Run Limit per User / Restrictions
stdmem         stdmem            node01-22         20      unlimited       128GB        None
himem          himem             himem01-03        28      unlimited       512GB        None
gpu            gpu               gpu01-06          20      unlimited       64GB         need to request a GTX 1080 GPU
gpup           gpup              gpup01-02         20      unlimited       128GB        need to request a P100 GPU

The nodes are assigned to these queues according to their hardware specifications, as described in the Hardware Configuration section above.
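As a rough guide, only the partition-related directives in your batch script need to change between queues. The sketch below shows the relevant lines; the --gres=gpu:1 request is an assumption about how the GPUs are exposed to Slurm, so please contact support if the scheduler rejects it.

# stdmem: up to 20 cores and 128GB per node
#SBATCH -p stdmem
#SBATCH --mem=100G

# himem: up to 28 cores and 512GB per node
#SBATCH -p himem
#SBATCH --mem=500G

# gpu or gpup: must also request a GPU (assumes a 'gpu' generic resource)
#SBATCH -p gpu
#SBATCH --gres=gpu:1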

 

Example Runs and Template Submission Files

The easiest way to learn how to run calculations is to look at the examples provided below. The Slurm batch submission files are named run.slurm and all the input and output files from the particular calculation are also provided in the same directory. Good luck and happy computing! 

Example directories (one per application):

  • AMBER/
  • GAMESS/
  • Gaussian/
  • MOPAC/
  • NAMD/
  • NWChem/
  • ORCA/
  • Psi4/
  • cp2k/
  • dftb+/
  • lammps/
  • libEFP/