Arc User-Guide
Arc is the primary High Performance Computing (HPC) system at The University of Texas at San Antonio (UTSA). It can be used for running data-intensive, memory-intensive, and compute-intensive jobs from a wide range of disciplines.
1. Arc is equipped with:
172 total compute/GPU nodes and 2 login nodes; the majority have Intel Cascade Lake CPUs and some have AMD EPYC CPUs
22 GPU nodes - each containing two Intel CPUs with 20 cores each for a total of 40 cores, one Nvidia V100 GPU accelerator, and 384 GB of RAM
9 GPU nodes - each containing two Intel CPUs with 20 cores each for a total of 40 cores, two Nvidia V100 GPU accelerators, and 384 GB of RAM
3 NVIDIA DGX nodes - each having two AMD EPYC CPUs with 64 cores each for a total of 128 cores, eight Nvidia A100 80 GB GPUs, and 2 TB of RAM
2 GPU nodes - each containing two Intel CPUs with 20 cores each for a total of 40 cores, four Nvidia V100 GPU accelerators, and 384 GB of RAM
2 GPU nodes - each having two AMD EPYC CPUs with 8 cores each for a total of 16 cores, one Nvidia A100 40 GB GPU, and 1 TB of RAM
A number of GPU nodes - each having two Intel CPUs with 24 cores each for a total of 48 cores, four Nvidia H100 80 GB GPUs, and 1 TB of RAM
2 GPU nodes - each having two Intel CPUs with 24 cores each for a total of 48 cores, four Nvidia L40S 48 GB GPUs, and 1 TB of RAM
2 large-memory nodes - each containing four CPUs with 20 cores each for a total of 80 cores, and 1.5 TB of RAM
1 large-memory node - equipped with two AMD EPYC CPUs with 8 cores each for a total of 16 cores and 2 TB of RAM
6 nodes - equipped with two AMD EPYC CPUs with 8 cores each for a total of 16 cores and 1 TB of RAM
100 Gb/s InfiniBand connectivity
Two Lustre filesystems: /home and /work, where /home has 110 TB of capacity and /work has 1.1 PB of capacity
A cumulative total of 250 TB of local scratch space (approximately 1.5 TB of /scratch on most compute/GPU nodes)
Multiple partitions (or queues) having different characteristics and constraints:
bigmem2: 1 node
gpu1a100: 2 nodes
bigmem: 2 nodes
compute1: 65 nodes
compute2: 27 nodes
compute3: 6 nodes
gpu1v100: 22 nodes
gpu2v100: 9 nodes
gpu4v100: 2 nodes
two privately owned partitions consisting of 24 nodes
one privately owned partition equipped with three Nvidia DGX servers with 8x A100 80 GB GPUs
one privately owned partition equipped with two Dell R760XA servers with 4x L40S GPUs
two privately owned partitions each equipped with Dell XE8640 servers with 4x H100 GPUs
2. Arc Fair-Use Policies
Running Jobs
Compute nodes are not shared among multiple users. Instead, when a user is allocated a compute node, they are the only user allowed to access it. This policy is in place for both security and performance reasons: when multiple users share the same node, performance can be negatively impacted by resource contention. While jobs from different users are no longer scheduled on the same node, users are encouraged to take advantage of tools such as GNU parallel to co-schedule their multiple independent tasks on the compute nodes allocated to them (see the sketch below). Please see Section 10 of the user-guide for further details on running multiple tasks concurrently on one or more nodes from a single Slurm job.
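As a minimal sketch of that approach, the batch script below uses GNU parallel to run many independent tasks on a single allocated node. The job name, the worker script (process_file.sh), and the input list (inputs.txt) are hypothetical placeholders, and the availability of a 'parallel' module on Arc is an assumption:

    #!/bin/bash
    #SBATCH --job-name=parallel-demo     # hypothetical job name
    #SBATCH --partition=compute1         # one of Arc's general compute partitions
    #SBATCH --nodes=1                    # nodes are allocated exclusively on Arc
    #SBATCH --time=02:00:00              # well under the 72-hour limit

    module load parallel                 # assumption: a GNU parallel module exists

    # Run up to 40 tasks at once (one per core on a 40-core node); each line
    # of inputs.txt becomes the argument of one invocation of the
    # hypothetical worker script.
    parallel -j 40 ./process_file.sh {} < inputs.txt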
Each user will be limited to 10 active jobs at a given point in time and to running these jobs on a maximum of 20 compute nodes. As each compute node is dual-socket with a 20-core processor on each socket, a user's jobs could potentially use a total of 800 cores (20 nodes x 40 cores) at a given point in time.
Each job will be limited to a run-time of no more than 72 hours. Users are encouraged to consider implementing checkpoint-restart capabilities in their home-grown applications. The Research Computing support group will be happy to provide guidance on implementing checkpoint-restart mechanisms in users' code. Some third-party software, like the FLASH astrophysics code, already has built-in checkpoint-restart capabilities, which can be enabled by setting the required environment variables. Users are encouraged to review the documentation of their software to confirm whether or not checkpoint-restart functionality is available in the software of their choice. Section 16 of this user-guide has further information on using checkpointing and restart; a minimal sketch follows.
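For home-grown codes, the simplest checkpoint-restart pattern is to record progress after each completed unit of work and resume from that record on restart. A minimal sketch in bash, where the checkpoint file, total step count, and per-step worker script are all hypothetical:

    #!/bin/bash
    # Resume from the last completed step if a checkpoint file exists.
    CKPT=checkpoint.dat                  # hypothetical checkpoint file
    step=0
    [ -f "$CKPT" ] && step=$(cat "$CKPT")

    while [ "$step" -lt 1000 ]; do       # 1000 total steps (hypothetical)
        ./do_one_step.sh "$step"         # hypothetical per-step worker
        step=$((step + 1))
        echo "$step" > "$CKPT"           # record progress after each step
    done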
Exceptions: If you require access to nodes for a longer period of time, or need access to more nodes than are allowed by default, please submit a service request ticket with an exemption request. We will need a brief description of the activity, along with the number of cores and nodes required and the time duration for which you are requesting the exemption. We also ask that you explore options for checkpointing your code before submitting the service request ticket.
Data Storage (Disk Usage)
Work Directory (/work/abc123) – as detailed in our Wiki, this directory is where you should place any input/output files as well as logs for your running jobs. This directory is NOT backed up and is not intended for long-term storage.
Work Directory Data Retention – all files in the Work directory that have not been accessed in the last 30 days will be likely candidates for deletion.
Home Directory (/home/abc123) – this directory is backed up but should only be used for installing and compiling code. Storage of datasets is permitted here, but there will be a hard quota limit of 100GB in place.
Vault Directory (/vault/research/abc123) - each user on Arc is provided 1TB of archival storage located in /vault. This storage space is accessible from Arc, as well as from Windows or Mac computers. This data is backed up, and the backups are replicated to UT Arlington for an extra layer of protection. If additional storage space is needed on the "vault" system, please submit a service request.
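To keep an eye on these quotas and retention windows, you can check your usage from a login node. A minimal sketch, assuming the standard Lustre client tools are installed (abc123 is the placeholder username used above):

    # Disk usage of your home and work directories (may be slow on large trees)
    du -sh /home/abc123 /work/abc123

    # Per-user quota report on the Lustre /home filesystem
    # (assumes the 'lfs' Lustre utility is available, as is typical)
    lfs quota -u abc123 /home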
GPU Resource Utilization
Monitoring: We employ enhanced monitoring systems to check for GPU usage on all jobs allocated to general-use GPU nodes.
Termination: Jobs detected to be idle or not making use of their allocated GPUs will be terminated after a grace period of 1 hour.
Notification: After termination, users will receive a warning email to either adjust their job accordingly or utilize one of the many CPU queues.
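If you receive such a warning, you can verify from inside a running job whether your code is actually exercising its GPUs. nvidia-smi is NVIDIA's standard monitoring tool and should be present on the GPU nodes, though the exact driver version will vary:

    # One-off snapshot of GPU utilization and memory use on the current node
    nvidia-smi

    # Machine-readable utilization, re-sampled every 5 seconds
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv -l 5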
3. Requesting an Account on Arc
If you are interested in requesting an account on Arc, please visit the support portal and search for "HPC account".
* Arc is accessible over SSH using two-factor authentication with DUO. The hostname for Arc is arc.utsa.edu, and the SSH port number is 22. In order to utilize DUO, you must register online at passphrase.utsa.edu.
Please note that sharing of User Credentials is strictly prohibited. Any violation of this policy could lead to suspension of your account on Arc.
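As a concrete example, a typical login from a terminal looks like the following; replace abc123 with your own username, and expect a DUO push or passcode prompt after your password is accepted:

    # Connect to the Arc login node over SSH (port 22 is the default)
    ssh abc123@arc.utsa.edu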
4. Prerequisite: Arc runs a Linux operating system, and hence basic knowledge of Linux is required for working efficiently on Arc in command-line mode. If you need help with learning Linux, the following link provides a quick overview of Linux and basic Linux commands: Express Linux Tutorial.
5. Onboarding Arc
Logging Into Arc and Accessing a Compute Node
Running Applications Interactively and Submitting Batch Jobs
Managing Batch Jobs
Modules for Managing User Environment on Arc
Migrating Data from Arc to the Isilon archival storage
General Instructions for File Transfer
6. Best Practices on Arc
Designing and Running Parallel Programs
Running Multiple Copies of Executables Concurrently from the Same Job
Using Containers (Singularity and Docker) on Arc
Application Checkpointing and Restart on Arc
7. Managing Working Environments
Python VMs in Anaconda
Saving and Importing Python Environments
Saving the Python Virtual Environment in Work Directory
Saving and Importing R Environments:
Saving the R libraries in Work Directory
R troubleshooting - Fixing a corrupted library directory using Bioconductor
Setting Java Environment for Applications with Java Dependencies
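As a flavor of the Python items above, the sketch below creates a conda environment under the work directory so that large package trees do not consume the 100GB home quota. The anaconda3 module name and all paths are assumptions; check 'module avail' for the actual names on Arc:

    # Load Anaconda (module name is an assumption; verify with 'module avail')
    module load anaconda3

    # Create an environment under /work instead of the default ~/.conda
    conda create --prefix /work/abc123/envs/myenv python=3.10 numpy

    # Activate it by its full path in later sessions
    conda activate /work/abc123/envs/myenv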
8. Using Some of the Popular Software Packages that are Installed System-Wide
AI/Machine Learning Software
Using TensorFlow on Arc
Using TensorFlow with Multiple GPUs
Installing and Using PyTorch on Arc
Using MATLAB on Arc
Using Abaqus on Arc
Using NAMD on Arc
Visualization Using ParaView on Arc
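These system-wide packages are generally accessed through the environment modules described in Section 5. A hedged illustration of the usual workflow; the module name 'matlab' is an assumption, and the actual names on Arc may differ:

    module avail            # list software installed system-wide
    module load matlab      # load a package (name is an assumption)
    module list             # confirm what is currently loaded
    module unload matlab    # drop it again when finished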
9. Technical Support
For technical support, you can submit a support request for Arc through the support portal. Instructions for submitting support requests can be found here.
The Research Computing Support Group is available between 8:00 AM and 5:00 PM on all business days to assist with service requests.
Our time-to-response on new tickets is 4 business hours, and the time-to-resolution varies depending upon the complexity of the issue.
Please open a new ticket for every new topic.
Once a ticket is closed, you are welcome to reopen it if the exact topic that was addressed in the ticket appears to be still unresolved.
For after-hours emergency support, please contact Tech Cafe at 210-458-5555.
10. Training and Workshops
Attachments

| Attachment | Size | Date | Who | Comment |
| Deep Learning Model on CIFAR10 dataset using PyTorch on GPU nodes.pdf | 417 K | 15 Aug 2021 - 19:36 | AdminUser | PyTorch on GPUs |
| Express_Linux_Tutorial-SizeOptimized.pdf | 653 K | 19 Aug 2021 - 19:22 | AdminUser | Quick Linux Tutorial - saved as a "Reduced Size" pdf to get below the 10MB size limit |
| Installation and Working of Deep Learning Libraries (TensorFlow) on Remote Linux Systems (Stampede2 and Arc).pdf | 135 K | 15 Aug 2021 - 18:45 | AdminUser | TensorFlow |
| RUNNING MATLAB "Hello, World" Example on Remote Linux Systems (1).pdf | 104 K | 15 Aug 2021 - 18:07 | AdminUser | Sample MATLAB job |
| Running_Jobs_On_Arc.pdf | 392 K | 25 Oct 2022 - 22:29 | AdminUser | Running jobs on Arc |
| migrate-shamu2arc | 3 K | 26 Aug 2021 - 14:23 | AdminUser | Bash wrapper script for rsync to migrate user home and/or work data from Shamu to Arc |
| running_c_cpp_fortran_python_r.pdf | 412 K | 19 Aug 2021 - 21:32 | AdminUser | Running C, C++, Fortran, Python, and R applications in serial mode |
| running_executables_and_gnu_parallel.pdf | 468 K | 19 Aug 2021 - 21:34 | AdminUser | Executables and GNU Parallel |
| running_parallel_programs_on_Arc.pdf | 476 K | 19 Aug 2021 - 21:32 | AdminUser | |