Bare-Bones Apache Hop Configuration
Ok - there are in-depth technical overviews of Apache Hop. This is not one of them.
The intention here is a stupidly fast list of the minimum MUST-dos to get someone from zero to functioning with Apache Hop.
My primary goal was to get Apache Hop working well enough to follow some of my other tutorials. It's JUST good enough to get it up and running.
It's also probably good enough if you already have decent ETL experience - especially if you have Pentaho/Kettle/PDI experience!
If you want some more in-depth background, with best practices and feature reviews, please check out some of the following videos:
Pre-requisites
- SET your HOP_CONFIG_FOLDER
Because you want no-muss, no-fuss Hop UPGRADES when new versions come out, without worrying about binaries being mixed in with your config/projects, you MUST set some Hop environment variables:
DO NOT SKIP THIS STEP! - PLEASE AT LEAST set the HOP_CONFIG_FOLDER!
No! This really isn't optional - set this somewhere fairly permanent, hopefully somewhere you are backing up.
If you have a home directory (or network share) defined, you could use something like this:
CMD SHELL
MD %homedrive%%homepath%\hop-config
SETX HOP_CONFIG_FOLDER %homedrive%%homepath%\hop-config
SETX HOP_JAVA_HOME "%java_home%"
I wanted my Hop config & projects to move with me between my laptop and my work/home PCs, so I used my Microsoft Office365 OneDrive (which also backs everything up to a cloud drive). This path (unfortunately) has spaces in the name, so you HAVE to quote it like this:
CMD SHELL
cd /d "%onedrive%"
MD hop-config
SETX HOP_CONFIG_FOLDER "%ONEDRIVE%\hop-config"
SETX HOP_JAVA_HOME "%java_home%"
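One gotcha worth knowing: SETX saves the value for FUTURE shells, so the window you ran it in will not see the new variable. As a quick sanity check, here is a minimal PowerShell sketch that reads the saved user-level values back and confirms the folders exist. It assumes you used plain SETX (no /M), which stores the values at the user level.

```powershell
# Read back the user-level values that SETX saved.
# These are visible to NEW shells, not the one SETX was run in.
foreach ($name in 'HOP_CONFIG_FOLDER', 'HOP_JAVA_HOME') {
    $value = [Environment]::GetEnvironmentVariable($name, 'User')
    if ([string]::IsNullOrEmpty($value)) {
        Write-Warning "$name is not set at the user level"
    } elseif (-not (Test-Path -LiteralPath $value)) {
        Write-Warning "$name points to a missing path: $value"
    } else {
        Write-Output "$name = $value"
    }
}
```

If either line comes back as a warning, fix it now, before you start creating projects.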
- Edit Hop launch scripts (if required)
If you had to quote your hop-config path (or any other Hop environment variables that you configured), you will ALSO need to add the quoting within the script/batch files you use to launch Hop (hop-gui, hop-run, hop-server, etc.).
Simply EDIT those scripts and add quotes around any of the environment variables that have spaces in their paths. This works, but it will ALSO REQUIRE you to edit these scripts EVERY time you upgrade the Apache Hop binaries, which is why it is BEST to use paths without spaces.
- HOP_OPTIONS Additional Configuration
HOP_OPTIONS lets you configure things like memory. It can be set in the launch scripts above (hop-gui, hop-run, hop-server, etc.).
Most commonly you may need to increase the JVM Heap size to accommodate larger data sets.
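If you would rather not edit the launch script at all, one possible alternative is to set HOP_OPTIONS in the shell just before launching. This is only a sketch: it ASSUMES your version of hop-gui.bat keeps a HOP_OPTIONS value that is already set instead of overwriting it (check the script to be sure), and `C:\apps\hop` is a made-up install folder.

```powershell
# Hypothetical install location: point this at your own Hop folder.
$HopHome = 'C:\apps\hop'

# Ask the JVM for a 4 GB maximum heap, for this PowerShell session only.
# ASSUMPTION: the launch script keeps a HOP_OPTIONS value that is already set
# rather than overwriting it; if yours overwrites it, edit the script instead.
$env:HOP_OPTIONS = '-Xmx4g'

$Launcher = Join-Path $HopHome 'hop-gui.bat'
if (Test-Path -LiteralPath $Launcher) {
    & $Launcher
} else {
    Write-Warning "hop-gui.bat not found under $HopHome"
}
```

The upside, if the assumption holds, is that the setting survives a Hop upgrade because nothing inside the binaries folder was touched.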
For example:
HOP_OPTIONS=-Xmx512m: start Hop with maximum 512MB of memory
HOP_OPTIONS=-Xmx2048m: start Hop with maximum 2048MB (or 2GB) of memory
HOP_OPTIONS=-Xmx4g: start Hop with maximum 4GB of memory
- Set your AES Encryption Key
Security 101 - NO CLEAR-TEXT PASSWORDS in your workflows/scripts/projects, right?
Apache Hop has built-in AES encryption capabilities for securing your data source passwords.
Set your AES key environment variable before launching any of your Hop tools.
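You need a key value before you can set it. As a minimal sketch, this is one way to generate a hard-to-guess key in PowerShell instead of mashing the keyboard. It assumes (as the examples in this post suggest) that Hop accepts an arbitrary string as the key; the 32-byte length and hex format are just my choices for illustration, not a Hop requirement.

```powershell
# Generate 32 random bytes from the OS cryptographic random number generator.
$bytes = New-Object byte[] 32
$rng = [System.Security.Cryptography.RandomNumberGenerator]::Create()
try {
    $rng.GetBytes($bytes)
} finally {
    $rng.Dispose()
}

# Render the bytes as a 64-character hex string to use as the key value.
$key = -join ($bytes | ForEach-Object { $_.ToString('x2') })
Write-Output $key
```

Store the generated value somewhere safe. If you lose the key, you will most likely have to re-enter and re-encrypt your stored passwords.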
Here are a couple of examples to set your AES variables on a Windows computer:
POWERSHELL
$env:HOP_PASSWORD_ENCODER_PLUGIN="AES"
$env:HOP_AES_ENCODER_KEY="ddsfsdfsfsdf"
or
CMD SHELL
SETX HOP_PASSWORD_ENCODER_PLUGIN AES
SETX HOP_AES_ENCODER_KEY ddsfsdfsfsdf
In the examples above, I'm just manually setting the AES key. A better (and more secure) way would be to retrieve your key programmatically (via API call) from a secrets VAULT, password manager, access-controlled file/folder, or a secure-string encrypted registry key (you get the idea).
- Add any custom JDBC drivers
Hop already comes with drivers for many of the most common data sources, so you may not need to do this. I am working on a project for a CRM/ERP tool that uses the somewhat less common "Advantage SQL Database" (a Sybase/SAP product), so I had to download the Advantage JDBC jar file, extract the contents, and place the adsjdbc.jar file into the /lib folder of my Apache Hop installation.
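The copy step can be sketched in PowerShell like this. Both paths are hypothetical placeholders (a `C:\apps\hop` install folder and a jar sitting in Downloads), so swap in your own.

```powershell
# Hypothetical paths: adjust both to your own machine.
$HopHome   = 'C:\apps\hop'                              # assumed Hop install folder
$DriverJar = "$env:USERPROFILE\Downloads\adsjdbc.jar"   # assumed download location

$LibFolder = Join-Path $HopHome 'lib'
if ((Test-Path -LiteralPath $DriverJar) -and (Test-Path -LiteralPath $LibFolder)) {
    # Copy the extracted driver jar into Hop's lib folder.
    Copy-Item -LiteralPath $DriverJar -Destination $LibFolder
    Write-Output "Copied $(Split-Path $DriverJar -Leaf) to $LibFolder"
} else {
    Write-Warning "Check the paths: driver jar or Hop lib folder not found"
}
```

Restart any running Hop tools afterwards so they start fresh with the new jar in place.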
For additional help - check out:
- The YouTube videos I listed above
- Apache Hop Chat
- Apache Hop Blog
- Community Support
- Events and User Groups