Work on HPC
Last update: 2022/05/02
Here we take NCAR's Cheyenne and Casper as examples:
Interactive Jobs on CPU
The command is:
qsub -X -I -l select=1:ncpus=36:mpiprocs=36 -l walltime=06:00:00 -q regular -A project_code
Interactive Jobs on GPU
execcasper -l gpu_type=v100 -l walltime=06:00:00 -l select=1:ncpus=18:mpiprocs=36:ngpus=1:mem=300GB -A project_code
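Once either interactive job starts, a couple of hedged sanity checks can confirm the allocation (PBS_NODEFILE is set by PBS inside a job; outside one, the fallback message prints instead):

```shell
# Run inside the interactive shell on the compute node:
nproc                                          # CPU cores visible on this node
echo "${PBS_NODEFILE:-not inside a PBS job}"   # node list file assigned by PBS
```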
Activate a conda environment
An example bash script for each machine (note that only the login host in the "echo" line differs):
Cheyenne:
#!/bin/bash
source /glade/work/zhonghua/miniconda3/bin/activate aws_urban
echo "ssh -N -L 8880:`hostname`:8880 $USER@cheyenne.ucar.edu"
jupyter notebook --no-browser --ip=`hostname` --port=8880
Casper:
#!/bin/bash
source /glade/work/zhonghua/miniconda3/bin/activate aws_urban
echo "ssh -N -L 8880:`hostname`:8880 $USER@casper.ucar.edu"
jupyter notebook --no-browser --ip=`hostname` --port=8880
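The tunnel command these scripts echo can be sketched explicitly. Run the printed line on your local machine, then open http://localhost:8880 and paste the token that jupyter notebook prints (port and login host here follow the Cheyenne script above):

```shell
# Build the SSH tunnel command that the job script echoes; run the
# printed line on your LOCAL machine, not on the compute node.
PORT=8880
NODE=$(hostname)
echo "ssh -N -L ${PORT}:${NODE}:${PORT} ${USER}@cheyenne.ucar.edu"
```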
Create a conda environment
conda create -n myenv python=3.8
conda activate myenv
conda install -c conda-forge numpy pandas xarray netcdf4 matplotlib jupyterlab scikit-learn intake-esm s3fs
Create a conda environment from a file
conda env create -f environment.yml
An example environment.yml:
channels:
- conda-forge
- defaults
dependencies:
- python=3.8.0
- intake-esm
- s3fs
- jupyterlab
- matplotlib
- xarray
- pandas
- scikit-learn
- tqdm
- flaml=1.0.0
- fastparquet
- pip
- pip:
- netCDF4==1.5.8
- haversine
name: myenv
Removing a conda environment
conda remove --name myenv --all
Batch Jobs on GPU
Please edit before submitting:
- the job name, project code, and mail recipient
- the environment activation line (source XXX)
#!/bin/bash -l
### Job Name
#PBS -N your_job_name
### Project code
#PBS -A your_project_code
#PBS -l walltime=12:00:00
#PBS -q casper
#PBS -l gpu_type=v100
### Merge output and error files
#PBS -j oe
### Select 1 node with 36 CPUs, 1 GPU, and 300 GB of memory
#PBS -l select=1:ncpus=36:ngpus=1:mem=300GB
### Send email on abort, begin and end
#PBS -m abe
### Specify mail recipient
#PBS -M your_email
### reference: https://arc.ucar.edu/knowledge_base/72581396
### reference: https://github.com/zzheng93/code_DSI_India_AutoML/blob/main/2_automl/automl/flaml_clusters/train_clusters.sub
source /glade/work/zhonghua/miniconda3/bin/activate tabnet
python train_random.py
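A hedged sketch of submitting and monitoring the script above (the filename train_gpu.pbs is an assumption, not from the original; the commands are echoed here as a sketch rather than executed):

```shell
# Typical PBS workflow for the batch script above; save it first
# (filename is an assumption), then:
SCRIPT=train_gpu.pbs
echo "qsub ${SCRIPT}"       # submit; PBS prints the new job ID
echo "qstat -u ${USER}"     # check the status of your jobs
echo "qdel <job_id>"        # cancel a job if needed
```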
Checking memory use
See here: https://arc.ucar.edu/knowledge_base/72581501
Using Dask on HPC Systems
See here: https://www2.cisl.ucar.edu/events/tutorial-using-dask-hpc-systems
Reference
- Starting Cheyenne jobs: https://arc.ucar.edu/knowledge_base/72581258
- Starting Casper jobs with PBS: https://arc.ucar.edu/knowledge_base/72581396
- Managing environments: https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
- CISL Learning Center: https://www2.cisl.ucar.edu/what-we-do/cisl-learning-center