Frequently Asked Questions - Research Computing Center Wiki
Frequently Asked Questions
From Research Computing Center Wiki
Contents
1 Connecting
1.1 How do I connect to GACRC clusters?
1.2 I received an SSH host key error when trying to connect to a GACRC cluster. What does this mean?
1.3 How do I ssh into a specific login node, if I have a tmux session running there?
2 Files
2.1 How do I copy files to/from GACRC storage?
2.2 Can I use text files (programs, scripts, etc) created on a Windows machine on the GACRC Unix/Linux machines?
2.3 Can I use text files (programs, scripts, etc) created on a Mac machine on the GACRC Unix/Linux machines?
2.4 Can I leave my files in my /scratch directory?
3 Storage
3.1 Why can't I see my lab's /project directory?
3.2 My data in /scratch disappeared. What happened?
3.3 Is GACRC storage backed up?
4 Software
4.1 What software is available on GACRC clusters?
4.2 Can I install software myself on GACRC clusters?
4.3 How do I access R libraries and Python modules on GACRC clusters?
4.3.1 R Libraries
4.3.2 Python Modules
4.4 What is Singularity?
4.5 How do I request an application be installed on a GACRC cluster?
4.6 My software requires a database, can you help?
4.7 Can I install web services on GACRC clusters?
4.8 How can I use the Gaussian software on Sapelo2?
5 Using GACRC Clusters
5.1 I'm brand new to high performance computing. Where do I start?
5.2 Can I use a shell other than Bash?
5.3 Why doesn't the ls command give me colored output?
5.4 How do I use GUI applications on GACRC clusters from my Windows desktop?
5.5 How do I use GUI applications on GACRC clusters from my Mac?
5.6 Why did I receive an email from Arbiter?
5.7 Can I connect to GACRC clusters via Visual Studio Code?
6 Slurm Jobs
6.1 How can I check on the status of my job(s)?
6.2 I submitted my job, but I don't see anything in the output of squeue --me
6.3 Why is my job pending?
6.4 How do I know how much resources to request for my job?
6.5 How much time, memory, and how many cores can I request for my jobs?
6.6 What is an array job?
6.7 Can I add more time to my running job(s)?
6.8 Can I receive an email when my job starts or finishes?
6.9 Why is my job running in a scavenge_p partition?
7 Training
7.1 What training does GACRC offer?
7.2 How do I sign up for GACRC training?
7.3 Is GACRC training done in person?
7.4 Does GACRC have any training videos?
8 Support
8.1 How do I get GACRC support?
8.2 What is the scope of GACRC support?
9 Accounts
9.1 How do I apply for accounts on GACRC clusters?
9.2 What do I do if I've changed lab groups or am collaborating with another lab?
9.3 Will I still have access to GACRC Clusters after leaving UGA?
9.4 Can my external collaborators get access to Sapelo2?
9.5 Will I still have access to the Teaching Cluster once the semester is over?
10 GACRC
10.1 What compute platforms are available at GACRC?
10.2 How do I acknowledge the GACRC in my publication?
Connecting
How do I connect to GACRC clusters?
Video instructions:
Connecting to Sapelo2 from Windows
Connecting to Sapelo2 from Mac
Connecting to Sapelo2 from Linux
Users can access GACRC clusters using secure shell (ssh) from their local machines, either on-campus or off-campus. To connect via ssh, you must have an SSH client on your local machine and a connection to the UGA campus network. An SSH client is included in recent releases of Unix-based operating systems (including Linux and macOS). If you are using a Windows computer, you can download and install PuTTY. You can find detailed instructions on how to download and install PuTTY on your Windows computer at
Please note that connecting to GACRC clusters from off-campus requires connecting to the
UGA VPN
. For more detailed information on how to connect to a specific GACRC cluster, please see the
Connecting
page.
I received an SSH host key error when trying to connect to a GACRC cluster. What does this mean?
If you received a warning about host key verification failing when attempting to connect to Sapelo2, this likely means you need to update the SSH known_hosts file on your local machine by deleting the line that begins with “sapelo2.gacrc.uga.edu” (or the hostname of the GACRC machine you're trying to connect to). This can happen as individual servers are moved into and out of our login node pool over time. On Mac and Linux, the fix can be done quickly with the commands shown below.
Connecting from MacOS or Linux
Users connecting from a MacOS or a Linux system might see an error like this:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: POSSIBLE DNS SPOOFING DETECTED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
The ECDSA host key for sapelo2 has changed,
and the key for the corresponding IP address 128.192.75.18
is unchanged. This could either mean that
DNS SPOOFING is happening or the IP address for the host
and its host key have changed at the same time.
Offending key for IP in /Users/jsmith/.ssh/known_hosts:76
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:E1ovq19vLNYNF1eFiOQ91tc1EPtbHcMhML2I45UrJrE.
Please contact your system administrator.
Add correct host key in /Users/jsmith/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /Users/jsmith/.ssh/known_hosts:25
ECDSA host key for sapelo2 has changed and you have requested strict checking.
Host key verification failed.
To fix this problem, you will need to remove the keys belonging to the host,
sapelo2.gacrc.uga.edu
. This can be done by manually deleting all lines corresponding to the host,
sapelo2.gacrc.uga.edu
, in the
~/.ssh/known_hosts
file, or by executing the command:
ssh-keygen -R sapelo2.gacrc.uga.edu
Once you have done this, you should be able to ssh into sapelo2.gacrc.uga.edu. You might still get a message like this:
[jsmith@laptop]$ ssh jsmith@sapelo2.gacrc.uga.edu
The authenticity of host 'sapelo2.gacrc.uga.edu' can't be established.
ECDSA key fingerprint is SHA256:ikdjggjeorjgnkresitnsgjsms
ECDSA key fingerprint is MD5:be:1xxxxxxxxxxxx
Are you sure you want to continue connecting (yes/no)?
You can type
yes
and your connection should work.
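As a rough illustration of the manual alternative mentioned above (deleting the host's lines from known_hosts by hand), the sketch below removes every entry for one host name. It works on a throwaway file with placeholder keys so that nothing real is touched; the key strings and the second host are made up for the demonstration.

```shell
# Work on a throwaway file here; the real file is ~/.ssh/known_hosts.
known_hosts=$(mktemp)
printf '%s\n' \
    'sapelo2.gacrc.uga.edu ecdsa-sha2-nistp256 AAAA_placeholder_old_key' \
    'other.example.org ssh-ed25519 AAAA_placeholder_other_key' > "$known_hosts"

host='sapelo2.gacrc.uga.edu'

# Keep a backup, then drop every line whose first field (a comma-separated
# list of host names and addresses) contains exactly this host name.
cp "$known_hosts" "$known_hosts.bak"
awk -v h="$host" '{
    n = split($1, names, ",")
    drop = 0
    for (i = 1; i <= n; i++) if (names[i] == h) drop = 1
    if (!drop) print
}' "$known_hosts.bak" > "$known_hosts"

remaining=$(awk 'END { print NR }' "$known_hosts")
echo "entries left: $remaining"
```

This approach only matches plain-text host names. If your known_hosts file stores hashed host names, the ssh-keygen -R command is the more reliable choice.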
Connecting from Windows
When connecting from Windows for the first time after a maintenance period, users might encounter an error like
POTENTIAL SECURITY BREACH
or
HOST IDENTIFICATION HAS CHANGED
. Users can click
Yes
to continue the connection and have a new host key saved on their local machines.
How do I ssh into a specific login node, if I have a tmux session running there?
The login nodes allow tmux sessions to persist across ssh sessions. However, when you ssh into sapelo2.gacrc.uga.edu, your session can connect to one of several login nodes (for example, ss-sub1, ss-sub2, ss-sub3, etc). If you start a tmux session on one of the login nodes, it will not be available on the others. So you would need to check which login node you landed on and then log back into it directly. To check the name of the login node, you can run the command
hostname
. For example:
hostname
It returns:
ss-sub1
Once you know the login node (e.g., ss-sub1) where your tmux session is running, you can connect to that specific node directly:
ssh MyID@ss-sub1.gacrc.uga.edu
tmux ls
tmux attach -t 0
Files
How do I copy files to/from GACRC storage?
Users can transfer files between their local machines and GACRC storage using FTP with explicit SSL encryption, secure copy (scp), WinSCP, FileZilla, etc. To transfer files using scp (or SSH file transfer) you must have scp (or SSH) on your local machine and a connection to the UGA campus network. An scp client is included in recent releases of Unix-based operating systems (including Linux and macOS). Two file transfer programs that support FTP with explicit SSL encryption are the open source FileZilla (available for Windows, macOS, and Linux) and WinSCP (available for Windows machines).
For more detailed information on how to copy files to/from a specific GACRC resource, please see the
Transferring Files
page.
Can I use text files (programs, scripts, etc) created on a Windows machine on the GACRC Unix/Linux machines?
Text (ASCII) files created on Windows machines might have Windows newlines that are not interpreted correctly by a Unix/Linux system. However, you can convert a Windows text file to the Unix/Linux format with the dos2unix command available on the GACRC's Sapelo2 and the teaching cluster. The syntax is
dos2unix filename
where filename is the name of the ASCII file (such as program.c, program.f, run.sh, input.txt, etc) created on a Windows machine.
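To see what the conversion does, the following sketch builds a small file with Windows line endings, counts its carriage-return characters, converts it, and counts again. The temporary file and its contents are made up for the demonstration, and the tr fallback is only a stand-in for machines where dos2unix is not installed.

```shell
# Create a small demo script with Windows (CRLF) line endings.
demo=$(mktemp)
printf '#!/bin/bash\r\necho hello\r\n' > "$demo"

# Count carriage returns before converting.
cr_before=$(tr -cd '\r' < "$demo" | wc -c)

if command -v dos2unix >/dev/null 2>&1; then
    dos2unix -q "$demo"          # convert in place, quietly
else
    # Fallback for machines without dos2unix: strip carriage returns.
    tr -d '\r' < "$demo" > "$demo.unix" && mv "$demo.unix" "$demo"
fi

# Count again; a converted file should have none left.
cr_after=$(tr -cd '\r' < "$demo" | wc -c)
echo "carriage returns before: $cr_before, after: $cr_after"
```

A script that still contains carriage returns often fails with errors that mention ^M or "bad interpreter", so this kind of check can be a quick first diagnostic.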
Can I use text files (programs, scripts, etc) created on a Mac machine on the GACRC Unix/Linux machines?
Text (ASCII) files created on Mac machines might have Mac newlines that are not interpreted correctly by a Unix/Linux system. However, you can convert a Mac text file to the Unix/Linux format with the mac2unix command available on the GACRC's Sapelo2 and the teaching cluster. The syntax is
mac2unix filename
where filename is the name of the ASCII file (such as program.c, program.f, run.sh, input.txt, etc) created on a Mac machine.
Can I leave my files in my /scratch directory?
No, do not do this. Files not being used in /scratch will be cleaned up. Please see
the FAQ on files disappearing from /scratch
Storage
Why can't I see my lab's /project directory?
/project directories are only accessible from the transfer nodes. Please make sure you've connected to xfer.gacrc.uga.edu (rather than the login/submit nodes) to access your lab's /project directory. Please note that /project directories are auto-mounted when they are first accessed, so if you were to initially execute the command
ls /project
, you wouldn't see your lab's project directory as a subdirectory of /project, although it is there.
My data in /scratch disappeared. What happened?
Data not being used or accessed in the /scratch file system are periodically cleaned up, as per the
30-day Scratch Purge Policy
. Please move your files off of /scratch when you're no longer using them. The /scratch file system is not backed up.
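One way to spot files that may be candidates for cleanup is to list those that have not been accessed recently. The sketch below uses a temporary directory as a stand-in for a scratch area and assumes that last-access time is a reasonable approximation of what the purge looks at; the purge policy page remains the authority on the exact criteria. The file names are hypothetical.

```shell
# Demo directory standing in for a /scratch area (hypothetical layout).
scratch_dir=$(mktemp -d)
touch "$scratch_dir/recent_output.txt"
# Give one file an old timestamp (1 January 2020) to mimic stale data.
touch -t 202001010000 "$scratch_dir/stale_input.dat"

# List regular files whose last access was more than 30 days ago.
stale=$(find "$scratch_dir" -type f -atime +30)
echo "$stale"
```

On the real file system, point find at a specific subdirectory rather than your whole scratch area, since walking many files generates a lot of metadata requests.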
Is GACRC storage backed up?
/home and /project directories are backed up, while /scratch, /work, and /lscratch are not. Please see the
snapshots
section of
Disk Storage
for more information.
Software
What software is available on GACRC clusters?
The best way to search for software on the clusters is with the
ml spider
nameOfSoftware
command, where
nameOfSoftware
is what you're searching for. You can also scroll through a full list of software modules with the
ml av
command. After entering this command, press spacebar to scroll, and q to quit. If centrally installed software has unique usage information, we document it on our
Software
page. In addition to software modules, we have some Singularity containers centrally installed at /apps/singularity-images on Sapelo2.
Can I install software myself on GACRC clusters?
Yes, users can install their own software in their /home directory or their lab's /work directory. Note that this does not include installing applications from package managers such as yum or apt. Please see
Installing Applications on Sapelo2
for more information.
How do I access R libraries and Python modules on GACRC clusters?
R Libraries
Most R libraries are added to the centrally installed R modules. Thus, in most cases, you can load the software module for the version of R that you're using and then load the desired library in your R script with
library(packageName)
. Note that we tend to not update these R libraries once they're installed, as other users could be using them.
In some cases R libraries will have their own software module that loads a particular version of R with it. For example, R packages that depend on the JAGS library can be found in the software module rjags/4-12-foss-2022a-R-4.3.1 (for R 4.3.1).
Python Modules
Python modules that are not a part of the standard Python library will typically have their own software modules which also load a particular version of Python. For example, the software module TensorFlow/2.11.0-foss-2022a-CUDA-11.7.0 would load TensorFlow version 2.11.0 and Python 3.9.4. Another example is SciPy-bundle/2022.05-foss-2022a, which loads several scientific Python packages, such as numpy, scipy, and pandas, as well as Python 3.10.4.
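As a minimal sketch, loading the SciPy-bundle module named above and confirming that its packages import might look like the following. It assumes the ml command is available in your session (for example, inside an interactive job on the cluster); the guard simply skips the check anywhere else.

```shell
# "ml" is the module shortcut used on the cluster; it will not exist on a laptop.
if command -v ml >/dev/null 2>&1; then
    ml SciPy-bundle/2022.05-foss-2022a
    # Confirm the bundled packages import and report the numpy version.
    python -c 'import numpy, scipy, pandas; print(numpy.__version__)'
    module_check="ran"
else
    echo "ml not found: run this on a cluster node"
    module_check="skipped"
fi
```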
What is Singularity?
Please see the section on
Singularity
in
Software on Sapelo2
How do I request an application be installed on a GACRC cluster?
Please fill out the
software installation/update request form
My software requires a database, can you help?
At this time we have very limited resources to support applications that require a database. Effectively managing a relational database is no trivial task and can require significant setup and maintenance, especially when trying to integrate one into an application on an HPC cluster. If an application allows it, it would be more efficient to use a SQLite database, which is a server-less database that creates a single database file for your application to work with, that could exist in your /scratch or /work directory while you're using it.
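To illustrate the single-file nature of SQLite, here is a small sketch using the sqlite3 command-line tool. It assumes that tool is installed where you run it; the temporary directory stands in for a /scratch or /work location, and the file, table, and column names are hypothetical.

```shell
# Temporary directory standing in for a /scratch or /work location.
workdir=$(mktemp -d)
db="$workdir/results.sqlite"   # hypothetical file name

if command -v sqlite3 >/dev/null 2>&1; then
    # Each call opens the same file; no database server is involved.
    sqlite3 "$db" "CREATE TABLE runs (id INTEGER PRIMARY KEY, sample TEXT);"
    sqlite3 "$db" "INSERT INTO runs (sample) VALUES ('A'), ('B');"
    rows=$(sqlite3 "$db" "SELECT COUNT(*) FROM runs;")
    echo "rows stored: $rows"
    ls -l "$db"    # the whole database is this one file
else
    echo "sqlite3 command-line tool not found"
    rows="skipped"
fi
```

Because everything lives in one ordinary file, the database can be copied or archived like any other data file once the work is finished.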
Can I install web services on GACRC clusters?
Applications that are or include web services generally do not lend themselves well to HPC clusters for a variety of reasons. First of all, ports that web applications would use are not opened through the firewall on our clusters. Secondly, many web services expect to be running 24/7, which is not feasible on an HPC cluster, given that running web applications would not be acceptable on the login/submit nodes, and compute nodes are for temporary jobs, not permanent services. If there is an application you would like to use on the cluster that has a web-based component that you think may be acceptable on a GACRC cluster, please reach out to us via the
software installation/update request form
and we'll take a look at it.
How can I use the Gaussian software on Sapelo2?
Users are required to sign a license agreement form before being allowed to run this software. Please see our
wiki page
on Gaussian for more information.
Using GACRC Clusters
I'm brand new to high performance computing. Where do I start?
Please see the following links to get started:
Intro to Linux videos
Intro to HPC video
Best Practices on Sapelo2
Can I use a shell other than Bash?
When you log into a Linux machine, the environment on your terminal and the commands that you type at the prompt are defined/interpreted by a program called a shell. Examples of shells are bash, csh, ksh, tcsh, zsh. The syntax for setting environment variables and some of the functionality of your keyboard depend on the shell that you are running. For example, with bash and tcsh it is straightforward to use up arrows to recover previous commands. All users have a default shell (bash) defined at account creation time. Users who wish to have their default shell changed can request that via the
GACRC General Support
form.
Why doesn't the ls command give me colored output?
By default
ls
does not color code its output on Sapelo2. This is because doing so requires getting file metadata, which can be especially taxing on a Lustre file system (/scratch and /work) if overdone.
How do I use GUI applications on GACRC clusters from my Windows desktop?
A number of applications installed on GACRC clusters have X Window (GUI) front ends. Examples of such applications are Matlab, Mathematica, some text editors and debuggers, etc. The best way to run such applications is using the Open OnDemand (OOD) interface to Sapelo2, either by running an interactive application in OOD or by starting an X Desktop session on the cluster and running the application therein. More information is available at
OnDemand
If using OnDemand is not an option, you can run GUI applications using X forwarding. In order to export such X Window applications to your Windows desktop, your desktop needs to have an X Window client (or server) running on it. A free X Window server for Microsoft Windows (10/8/7) is
Xming
. You can download it from
Sourceforge
and make a default installation. You will need to install the Xming server and the Xming-fonts package. Some applications also require having Xming-mesa installed. During the installation of Xming, you might want to select the option to create a desktop icon for Xming. When the installation of these two packages is complete, double click on the Xming icon to start the X Window server (a capital X will appear on your task bar).
Now you need to configure your SSH client to allow tunneling of X11 connections. For example, if you use PuTTY you need to open it, expand the SSH option in the left pane, click X11 in the left pane, and check the "Enable X11 forwarding" box.
Once that is done, you can SSH into your GACRC account (e.g. Sapelo2 account) and run X Window applications. The application should appear on your local Windows desktop. Each time you log out and log back into your Windows desktop, you will need to start the Xming server manually before using PuTTY to connect to your GACRC account.
Please note that GUI applications require a graphical interactive job session, for which more information can be found
here
How do I use GUI applications on GACRC clusters from my Mac?
A number of applications installed on GACRC clusters have X Window (GUI) front ends. Examples of such applications are Matlab, Mathematica, some text editors and debuggers, etc. The best way to run such applications is using the Open OnDemand (OOD) interface to Sapelo2, either by running an interactive application in OOD or by starting an X Desktop session on the cluster and running the application therein. More information is available at
OnDemand
If using OnDemand is not an option, you can run GUI applications using X forwarding.
For Apple's OSX v10.6.3 and beyond, users have to manually install XQuartz to enable the X11 features according to
Apple
. It is free and available at
XQuartz
Then connect to Sapelo2 as:
ssh -X myid@sapelo2.gacrc.uga.edu
Please check where your local machine has xauth installed, e.g. is it in /opt/X11/bin/xauth or somewhere else? Then edit the ~/.ssh/config file on your local machine (not on Sapelo2) to add the location of xauth, e.g. add
Host *
XAuthLocation /opt/X11/bin/xauth
if that is the path of xauth on your machine. If ~/.ssh/config does not exist, create this file and put the lines above in this file.
After making this change on your local machine, start an XQuartz terminal and connect to sapelo2 with the
ssh -X
command above.
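If you are unsure where xauth lives on your local machine, a small sketch like the one below can look for it and print the matching stanza for ~/.ssh/config. It only prints text and does not modify any file; /opt/X11/bin/xauth is the example location given above and may differ on your system.

```shell
# Look for xauth on PATH first, then in the example XQuartz location.
xauth_path=$(command -v xauth || true)
if [ -z "$xauth_path" ] && [ -x /opt/X11/bin/xauth ]; then
    xauth_path=/opt/X11/bin/xauth
fi

if [ -n "$xauth_path" ]; then
    # Build the stanza to add to ~/.ssh/config; nothing is written here.
    stanza=$(printf 'Host *\n    XAuthLocation %s' "$xauth_path")
else
    stanza="xauth not found; install XQuartz first"
fi
echo "$stanza"
```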
Please note that GUI applications require a graphical interactive job session, for which more information can be found
here
Why did I receive an email from Arbiter?
If you've received an email from Arbiter, that means you are running a process on the login/submit nodes (ss-sub1, ss-sub2, ss-sub3, etc.) that is using a lot of resources and should be run on a compute node. The login/submit nodes are only for submitting jobs to the cluster and are not for running any scientific software or scripts. If you accidentally run a process on the login/submit nodes that shouldn't be run there, Arbiter will throttle your process to preserve the integrity of the login/submit nodes for everyone else and send you an email letting you know that this happened.
Can I connect to GACRC clusters via Visual Studio Code?
Yes, please see our documentation about that
here
Slurm Jobs
How can I check on the status of my job(s)?
squeue --me - Shows the status of pending or running jobs, until a job finishes.
scontrol show job jobid - Shows information about pending or running jobs, until very shortly after a job finishes.
sacct -X -j jobid - Shows status/information about a job.
sacct-gacrc -X -j jobid - sacct with some useful pre-formatted fields.
sacct-gacrc-v jobid - sacct-gacrc displayed vertically, line by line.
I submitted my job, but I don't see anything in the output of squeue --me
It is very likely there was a problem with your job that caused it to fail and disappear from the output of
squeue --me
before you finished typing the command. Check your Slurm job output file(s) for any errors.
Why is my job pending?
One way that you can investigate why your job is pending is to check the rightmost column ("NODELIST/REASON") of the output of
squeue --me
. If the job is pending, rather than a list of node names on which the job is running, it will give a reason as to why the job hasn't started. These are some of the most common reasons a job may be pending:
The partition to which you have sent your job is very busy at the moment. The busier a partition is, the longer it may take for the job scheduling system to fit in your job among all the others running and waiting to run. This is also somewhat of a function of how many resources you've requested. As a general rule of thumb, the more resources requested, the longer you may have to wait for your job to start. To investigate how busy a partition is, you can use the
sinfo -p
partitionName
or
sinfo-gacrc
commands.
You have hit the limit for the number of jobs you can have running at a time in the partition to which you've sent your job. Please see
Job Submission partitions on Sapelo2
for more information on how many jobs you can have running and pending at a time in a particular partition.
You have requested an amount of time for your job that would cause it to run into a scheduled maintenance period if it were to use all of the requested walltime. If this is the case, the reason listed for the job being in a pending state in the
squeue --me
output will be "(ReqNodeNotAvail, Reserved for maintenance)". To run the job before the scheduled maintenance, cancel it with
scancel
jobID
and resubmit it with a shorter walltime. Scheduled maintenance information can be found on the
home page
of our wiki, and will be emailed to GACRC users.
There are other pending jobs in the same partition as yours that have a higher priority. You can see the priority of your job(s) in the output of
sq --me
or
sacct-gacrc -X --prio
. When determining a job's priority Slurm takes into account recent cluster usage. More information about Slurm job priority can be found
here
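For a compact view of only your pending jobs and the reason each one is waiting, a sketch along these lines may help. It uses standard squeue options and format codes, and the guard simply skips the query on machines where Slurm is not installed.

```shell
if command -v squeue >/dev/null 2>&1; then
    # %i = job ID, %P = partition, %r = reason the job is waiting
    pending=$(squeue --me --states=PENDING --noheader --format="%i %P %r")
    echo "${pending:-no pending jobs}"
    pending_check="ran"
else
    echo "squeue not found: run this on a cluster login node"
    pending_check="skipped"
fi
```

The reason shown in the last column is the same text that appears in the "NODELIST/REASON" column of the default squeue --me output.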
How do I know how much resources to request for my job?
Please see these wiki pages to learn more about optimizing requested resources for your jobs:
Best Practices on Sapelo2
Job Resource Tuning
How much time, memory, and how many cores can I request for my jobs?
For information on resources available in GACRC cluster partitions, please see
Job Submission partitions on Sapelo2
What is an array job?
Please see our
wiki page
on array jobs.
Can I add more time to my running job(s)?
If your job is still running and needs more time, please reach out to us via our
general support request form
, and we can add more time to it. If the job has already reached its walltime limit (and was terminated by the queueing system), it would have to be restarted.
Can I receive an email when my job starts or finishes?
Yes. You can instruct Slurm to send you an email when your job starts or finishes with the Slurm headers --mail-user and --mail-type (defining the email address to which emails should be sent and under what conditions an email should be sent, respectively). For example:
#!/bin/bash

#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --mem=10gb
#SBATCH --time=01:00:00
#SBATCH --mail-user=MYID@uga.edu
#SBATCH --mail-type=ALL
The above Slurm headers would cause an email to be sent to MYID@uga.edu when the job begins and when it finishes (regardless of job success or failure). Other valid values for --mail-type include BEGIN, END, and FAIL, where END would send an email when the job completes successfully, and FAIL would send an email when it finishes but fails. If you prefer to be notified when the job starts and finishes, you can just use ALL for the value of --mail-type. Note that the email address for --mail-user doesn't necessarily have to be a UGA email address, just a valid email address.
By default, email notifications set for an array job will generate one email message for the array job. If you would like to receive an email message for individual array job elements (up to a certain limit), please add ARRAY_TASKS to the --mail-type option.
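A minimal sketch of an array job script that requests per-element emails is shown below. The resource values, the array range, and the echo line are placeholders chosen for illustration and are not recommendations; adjust them to your own job.

```shell
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --ntasks=1
#SBATCH --mem=1gb
#SBATCH --time=00:10:00
#SBATCH --array=1-5
#SBATCH --mail-user=MYID@uga.edu
#SBATCH --mail-type=END,FAIL,ARRAY_TASKS

# Slurm sets SLURM_ARRAY_TASK_ID per element; default to 0 outside a job.
task_id=${SLURM_ARRAY_TASK_ID:-0}
echo "Running array element ${task_id}"
```

Without ARRAY_TASKS in the --mail-type list, the same script would produce one email for the array job as a whole, as described above.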
Why is my job running in a scavenge_p partition?
Short jobs (for example, jobs that request less than two hours of walltime) submitted to the 'batch' partition might be automatically moved into a scavenge_p partition if the 'batch' partition is busy. This reduces the wait time of short jobs while making use of buy-in nodes that are not in use. For more information, please see
What_is_the_scavenge_p_partition
Training
What training does GACRC offer?
Every month GACRC offers Linux and Sapelo2 training for current and pending new users of Sapelo2. We also offer Python, R, and Conda training. For the current training schedule and more information, please see our
Training
page.
How do I sign up for GACRC training?
To sign up for GACRC training, please fill out the
training request form
Is GACRC training done in person?
No. For the foreseeable future, we will be doing our training sessions via Zoom.
Does GACRC have any training videos?
Yes. Please see our
Kaltura channel
Support
How do I get GACRC support?
The best way to get support from GACRC is to fill out the relevant form at
What is the scope of GACRC support?
We strive to provide exceptional HPC support. This is primarily focused on assistance with use of GACRC clusters. Some of the things we are able to assist our users with include but are not limited to:
Job management/troubleshooting
Data management
Software installation/troubleshooting
Script debugging/optimization
General HPC consulting
Support using Linux
Cluster account support
HPC cluster training
Programming training
We cannot assist users with their actual science. This can be a gray area sometimes, but some things that are the responsibility of the researcher include but are not limited to:
Usage of scientific programs
Determining the best tool for one's research tasks
Ensuring one's input data are formatted correctly
Accounts
How do I apply for accounts on GACRC clusters?
User accounts are created as part of a "lab group" which has been registered by a Principal Investigator (PI), i.e. a UGA faculty member. Once the group is registered, the PI will receive an email stating that they can request individual accounts for members of their group. For more information, please see
What do I do if I've changed lab groups or am collaborating with another lab?
If you have switched lab groups or are collaborating with another lab group and need access to their /work and /project directories, please have the PI of your new lab group fill out the
Modify/Delete Account request form
Will I still have access to GACRC Clusters after leaving UGA?
As long as your MyID stays active in the UGA system and your professor/group PI wants to continue to keep you in his/her computing lab, your cluster access will be maintained by GACRC. As a student, about a year after you graduate or leave UGA, you will receive an email notifying you that your MyID account will be disabled. Faculty and staff's MyID might be disabled as soon as they leave UGA. You can find detailed info about this at
. To have access to Sapelo2 beyond this point, your UGA research group PI can request a Visiting Researcher/Scholar (VRS) or a Remote Visiting Researcher/Scholar (Remote VRS or RVRS) status for you. Information about this program and its application process is available at
Please note that Sapelo2 should only be used for work done in collaboration with your UGA PI.
Can my external collaborators get access to Sapelo2?
Yes. A UGA PI can request a Visiting Researcher/Scholar (VRS) or a Remote Visiting Researcher/Scholar (Remote VRS or RVRS) status for an external collaborator. Information about this program and its application process is available at
. Once the collaborator's VRS or RVRS status is active, the collaborator will have a UGA MyID. At that point, the UGA PI can request a Sapelo2 account for the collaborator, using our regular
Account Creation Form
Will I still have access to the Teaching Cluster once the semester is over?
Teaching cluster accounts are not long-term accounts. According to our policy, accounts created on the teaching cluster will be deleted at the end of each semester.
GACRC
What compute platforms are available at GACRC?
A list of GACRC systems, including a brief description of the compute platforms, is available at the
Systems
page.
How do I acknowledge the GACRC in my publication?
A sample acknowledgment statement is provided at