Apache Tez
Apache Tez is a generic data-processing pipeline engine envisioned as a
low-level engine for higher abstractions such as Apache Hive, Apache Pig, etc.
At its heart, Tez is very simple and has just two components:
- The data-processing pipeline engine, into which one can plug input,
  processing, and output implementations to perform arbitrary data processing.
  Every 'task' in Tez has the following:
  - Input to consume key/value pairs from.
  - Processor to process them.
  - Output to collect the processed key/value pairs.
- A master for the data-processing application, with which one can assemble
  the arbitrary data-processing 'tasks' described above into a task DAG to
  process data as desired.
  The generic master is implemented as an Apache Hadoop YARN ApplicationMaster.
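As a rough analogy only (this is not the Tez API), the Input, Processor, and Output roles of a task can be pictured as the stages of a shell pipeline. The function names task_input, task_processor, task_output, and run_task are hypothetical, and the sample key/value data is made up for illustration.

```shell
# Analogy only: hypothetical shell functions, not Tez APIs.

# Input: emits tab-separated key/value pairs (sample data, made up).
task_input() {
  printf 'apple\t1\nbanana\t1\napple\t1\n'
}

# Processor: sums the values seen for each key.
task_processor() {
  awk -F '\t' '{ sum[$1] += $2 } END { for (k in sum) printf "%s\t%d\n", k, sum[k] }'
}

# Output: collects the processed pairs in a deterministic order.
task_output() {
  sort
}

# A single "task": Input feeds the Processor, which feeds the Output.
run_task() {
  task_input | task_processor | task_output
}
```

Calling run_task prints one line per key with its summed value. In Tez itself the three roles are pluggable implementations, and the master wires many such tasks into a DAG; the pipeline here only mirrors the flow of data through one task.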
Building Tez
For instructions on how to contribute to Tez, refer to:
Tez Wiki - How to Contribute
Requirements
JDK 21+
Maven 3.9.14 or later
SpotBugs 4.9.3 or later (if running SpotBugs)
Protocol Buffers 3.25.5
Hadoop 3.x
Maven Build Goals
Clean:
mvn clean
Compile:
mvn compile
Run tests:
mvn test
Create JAR:
mvn package
Run spotbugs:
mvn compile spotbugs:spotbugs
Run checkstyle:
mvn compile checkstyle:checkstyle
Install JAR in M2 cache:
mvn install
Deploy JAR to Maven repo:
mvn deploy
Run jacoco:
mvn test -Pjacoco
Run Rat:
mvn apache-rat:check
Build javadocs:
mvn javadoc:javadoc
Build distribution:
mvn package -Dhadoop.version=3.4.2
Visualize state machines:
mvn compile -Pvisualize -DskipTests=true
Build Options
- Use -Dpackage.format to create distributions with a format other than .tar.gz (mvn-assembly-plugin formats).
- Use -Dhadoop.version to specify the version of Hadoop to build Tez against.
- Use -Dprotoc.path to specify the path to protoc.
- Use -Dallow.root.build to build the tez-ui components as the root user.
Building against a Specific Version of Hadoop
Tez runs on top of Apache Hadoop YARN and requires Hadoop 3.x.
To compile it against another compatible Hadoop version, specify
hadoop.version:
mvn package -Dhadoop.version=3.4.2
For recent versions of Hadoop (which do not bundle AWS and Azure by default),
you can bundle AWS-S3 or Azure support:
mvn package -Dhadoop.version=3.4.2 -Paws -Pazure
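When the Hadoop version and the optional profiles vary between builds, a small wrapper can assemble the command line. This is a minimal sketch: tez_build_cmd is a hypothetical helper, not part of Tez, and it only prints the command rather than running Maven.

```shell
# Hypothetical helper: prints (does not run) a Tez build command.
# Usage: tez_build_cmd HADOOP_VERSION [PROFILE...]
tez_build_cmd() {
  hadoop_version=$1
  shift
  cmd="mvn package -Dhadoop.version=${hadoop_version}"
  # Append one -P flag per requested Maven profile, e.g. aws or azure.
  for profile in "$@"; do
    cmd="${cmd} -P${profile}"
  done
  printf '%s\n' "$cmd"
}
```

For example, tez_build_cmd 3.4.2 aws azure prints the same command line as the one shown above, and tez_build_cmd 3.4.2 prints the plain build without cloud profiles.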
Tez also has shims to provide version-specific implementations for various APIs.
For more details, refer to Hadoop Shims.
Tez UI
UI Build Issues
In case of issues with the UI build, please clean the UI cache:
mvn clean -PcleanUICache
Skip UI Build
To skip the UI build, use the noui profile:
mvn clean install -DskipTests -Pnoui
Maven will still include the tez-ui project, but all related plugins will be
skipped.
Issue with PhantomJS when building on PowerPC
Official PhantomJS binaries were not available for the Power platform. If the
build fails on PPC, install PhantomJS manually and globally (refer to the
PhantomJS README), then rerun the build.
Protocol Buffer Compiler
The version of the Protocol Buffer compiler (protoc) can be defined on the
fly:
mvn clean install -DskipTests -pl ./tez-api -Dprotobuf.version=3.25.5
The default version is defined in the root pom.xml.
If you have multiple versions of protoc, set the PROTOC_PATH environment
variable to point to the desired binary. If not defined, the embedded protoc
compiler corresponding to ${protobuf.version} will be used.
Alternatively, specify the path during the build:
mvn package -DskipTests -Dprotoc.path=/usr/local/bin/protoc
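A mistyped protoc path is easy to miss until the build fails, so it can help to check the path before passing it on. The following is a minimal sketch: protoc_path_flag is a hypothetical helper, not part of Tez, and it only verifies that the path is an executable file, not that the binary is the expected protoc version.

```shell
# Hypothetical helper: prints a -Dprotoc.path flag only when the given path
# is an executable regular file. It prints nothing otherwise, so the build
# falls back to its default protoc resolution.
protoc_path_flag() {
  candidate=${1:-}
  if [ -n "$candidate" ] && [ -f "$candidate" ] && [ -x "$candidate" ]; then
    printf '%s\n' "-Dprotoc.path=${candidate}"
  fi
}
```

It can be used inline, for example: mvn package -DskipTests $(protoc_path_flag /usr/local/bin/protoc). Paths containing spaces would need extra quoting that this sketch does not handle.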
Building the Docs
Build a local copy of the Apache Tez website:
mvn site -pl docs
Building Components Separately
If you are building a submodule directory, dependencies will be resolved from
the Maven cache or remote repositories. Alternatively, run
mvn install -DskipTests from the Tez top level once and then work from the
submodule.
Visualize State Machines
Use -Pvisualize to generate a Graphviz file (Tez.gv) representing state
transitions:
mvn compile -Pvisualize -DskipTests=true
Optional parameters:
- -Dtez.dag.state.classes= (Default: DAG, Vertex, Task, TaskAttempt)
- -Dtez.graphviz.title (Default: Tez)
- -Dtez.graphviz.output.file (Default: Tez.gv)
Example for DAGImpl:
mvn compile -Pvisualize \
-Dtez.dag.state.classes=org.apache.tez.dag.app.dag.impl.DAGImpl \
-DskipTests=true
Convert the .gv file to an image:
dot -Tpng -o Tez.png Tez.gv
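The optional parameters above can be combined in one invocation. As a minimal sketch, the hypothetical helper tez_visualize_cmd (not part of Tez) assembles the command from whichever parameters are supplied and prints it without running Maven.

```shell
# Hypothetical helper: prints (does not run) the visualize command.
# Usage: tez_visualize_cmd [STATE_CLASSES [TITLE [OUTPUT_FILE]]]
# An empty or omitted argument leaves that parameter at its default.
tez_visualize_cmd() {
  classes=${1:-}
  title=${2:-}
  output=${3:-}
  cmd="mvn compile -Pvisualize"
  if [ -n "$classes" ]; then
    cmd="${cmd} -Dtez.dag.state.classes=${classes}"
  fi
  if [ -n "$title" ]; then
    cmd="${cmd} -Dtez.graphviz.title=${title}"
  fi
  if [ -n "$output" ]; then
    cmd="${cmd} -Dtez.graphviz.output.file=${output}"
  fi
  printf '%s\n' "${cmd} -DskipTests=true"
}
```

For example, tez_visualize_cmd org.apache.tez.dag.app.dag.impl.DAGImpl DAGImpl dag.gv prints a command that would write dag.gv, which can then be converted with dot -Tpng -o dag.png dag.gv. Titles containing spaces would need extra quoting that this sketch does not handle.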
Building Contrib Tools
Use -Ptools to build tools under tez-tools:
mvn package -Ptools