Work on HPC

Last update: 2022/05/02

Here we take NCAR's Cheyenne and Casper as examples:

Interactive Jobs on CPU

The command is:

qsub -X -I -l select=1:ncpus=36:mpiprocs=36 -l walltime=06:00:00 -q regular -A project_code
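As a sketch, the request can be wrapped in a small helper so that only the project code and resources change between sessions (the function name and the example project code below are my own, not an NCAR convention):

```shell
#!/bin/bash
# Hypothetical helper: compose the interactive qsub command for Cheyenne.
# It only builds and prints the command; copy-paste the output to run it.
build_qsub_cmd() {
  local project="$1"                # project code, e.g. UABC0001 (made up)
  local walltime="${2:-06:00:00}"   # default 6-hour walltime
  local ncpus="${3:-36}"            # a full Cheyenne node has 36 cores
  echo "qsub -X -I -l select=1:ncpus=${ncpus}:mpiprocs=${ncpus} -l walltime=${walltime} -q regular -A ${project}"
}

build_qsub_cmd UABC0001
```

Calling `build_qsub_cmd UABC0001 02:00:00 18`, for example, would request half a node for two hours.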

Interactive Jobs on GPU

execcasper -l gpu_type=v100 -l walltime=06:00:00 -l select=1:ncpus=18:mpiprocs=36:ngpus=1:mem=300GB -A project_code

Activate a conda environment

An example bash script (note the difference between the "echo" lines on Cheyenne and Casper):

Cheyenne:

#!/bin/bash
source /glade/work/zhonghua/miniconda3/bin/activate aws_urban
echo "ssh -N -L 8880:`hostname`:8880 $USER@cheyenne.ucar.edu"
jupyter notebook --no-browser --ip=`hostname` --port=8880

Casper:

#!/bin/bash
source /glade/work/zhonghua/miniconda3/bin/activate aws_urban
echo "ssh -N -L 8880:`hostname`:8880 $USER@casper.ucar.edu"
jupyter notebook --no-browser --ip=`hostname` --port=8880
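In both scripts, the "echo" line prints the exact port-forwarding command to run on your local machine; after running it, open http://localhost:8880 in a browser. As a sketch, the pattern it prints can be composed like this (the helper name and the example node name are placeholders, not real Casper values):

```shell
#!/bin/bash
# Hypothetical helper: compose the local port-forward command that the
# batch script's `echo` line prints. Run the result on your LOCAL machine.
make_tunnel_cmd() {
  local node="$1"    # compute node running Jupyter (from `hostname`)
  local port="$2"    # port Jupyter listens on, 8880 in the scripts above
  local user="$3"    # your HPC username
  local login="$4"   # login host: cheyenne.ucar.edu or casper.ucar.edu
  echo "ssh -N -L ${port}:${node}:${port} ${user}@${login}"
}

make_tunnel_cmd some-compute-node 8880 "$USER" casper.ucar.edu
```

The `-N` flag keeps the ssh session open for forwarding only, without starting a remote shell.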

Create a conda environment

conda create -n myenv python=3.8
conda activate myenv
conda install -c conda-forge numpy pandas xarray netcdf4 matplotlib jupyterlab scikit-learn intake-esm s3fs
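After the install finishes, a quick sanity check is to confirm each package imports in the environment's Python. A sketch (`check_imports` is my own helper, not a conda command):

```shell
#!/bin/bash
# Hypothetical helper: report packages that fail to import in a given
# Python interpreter; prints nothing when everything resolves.
check_imports() {
  local py="$1"; shift
  local pkg
  for pkg in "$@"; do
    "$py" -c "import ${pkg}" 2>/dev/null || echo "missing: ${pkg}"
  done
}

# e.g., inside the activated environment:
# check_imports python numpy pandas xarray netCDF4 matplotlib s3fs
```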

Create a conda environment from a file

conda env create -f environment.yml

An example environment.yml:
channels:
- conda-forge
- defaults

dependencies:
- python=3.8.0
- intake-esm
- s3fs
- jupyterlab
- matplotlib
- xarray
- pandas
- scikit-learn
- tqdm
- flaml=1.0.0
- fastparquet
- pip
- pip:
  - netCDF4==1.5.8
  - haversine
name: myenv

Remove a conda environment

conda remove --name myenv --all

Jobs on GPU

Please edit:

  • job name, project code, mail recipient, source XXX

#!/bin/bash -l
### Job Name
#PBS -N your_job_name
### Project code
#PBS -A your_project_code
#PBS -l walltime=12:00:00
#PBS -q casper
#PBS -l gpu_type=v100
### Merge output and error files
#PBS -j oe
### Select 1 node with 36 CPUs, 1 GPU, and 300 GB of memory
#PBS -l select=1:ncpus=36:ngpus=1:mem=300GB
### Send email on abort, begin and end
#PBS -m abe
### Specify mail recipient
#PBS -M your_email

### reference: https://arc.ucar.edu/knowledge_base/72581396
### reference: https://github.com/zzheng93/code_DSI_India_AutoML/blob/main/2_automl/automl/flaml_clusters/train_clusters.sub

source /glade/work/zhonghua/miniconda3/bin/activate tabnet
python train_random.py
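Before submitting, it can help to double-check the directives you just edited. A small sketch (assuming the script above is saved as job.pbs; `list_pbs_directives` is my own helper):

```shell
#!/bin/bash
# Sketch: print the #PBS directives of a job script so the job name,
# project code, and resource requests can be verified before submission.
list_pbs_directives() {
  grep '^#PBS' "$1"
}

# e.g.: list_pbs_directives job.pbs && qsub job.pbs
```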

Checking memory use

See here: https://arc.ucar.edu/knowledge_base/72581501

Using Dask on HPC Systems

See here: https://www2.cisl.ucar.edu/events/tutorial-using-dask-hpc-systems
