Sunday, June 26, 2022

Bare bones Configure Apache Hop



Ok - there are in-depth technical overviews of Apache Hop.  This is not one of them.

The intention here is a stupidly fast list of the minimum MUST-dos to get someone from zero to functioning with Apache Hop.

My primary goal was to get Apache Hop working well enough to follow some of my other tutorials.  It's JUST good enough to get it up and running.

It's also probably good enough if you already have decent ETL experience - especially if you have Pentaho/Kettle/PDI experience!

If you want more in-depth background, with best practices and feature reviews, please check out some of the following videos:

Pre-requisites
  1. SET your HOP_CONFIG_FOLDER

    Because you want no-muss, no-fuss Hop UPGRADES when new versions come out, without worrying about the binaries being mixed in with your config/projects, you MUST set some Hop environment variables:

    DO NOT SKIP THIS STEP! - PLEASE AT LEAST set the HOP_CONFIG_FOLDER!

    No! This really isn't optional - set this somewhere fairly permanent, hopefully somewhere you are backing up.

    If you have a home directory (or network share) defined, you could use something like this:

     CMD SHELL 
    MD %homedrive%%homepath%\hop-config
    SETX HOP_CONFIG_FOLDER %homedrive%%homepath%\hop-config
    SETX HOP_JAVA_HOME "%java_home%"
    
    

    I wanted my Hop config & projects to move with me between my laptop and my work/home PCs, so I used my Microsoft Office365 OneDrive (which also backs everything up to a cloud drive).  Unfortunately, this path has spaces in the name, so you HAVE to quote it like this:

    CMD SHELL 
    cd /d "%ONEDRIVE%"
    MD hop-config
    SETX HOP_CONFIG_FOLDER "%ONEDRIVE%\hop-config"
    SETX HOP_JAVA_HOME "%java_home%"

  2. Edit Hop launch scripts (if required)

    If you had to quote your hop-config path (or any other Hop environment variable you configured), you will ALSO need to add the quoting within the script/batch file you use to launch Hop.  Simply EDIT those scripts and add quotes around any of the environment variables that have spaces in their paths.  This works, but it REQUIRES you to re-edit these scripts EVERY time you upgrade the Apache Hop binaries, which is why it is BEST to use paths without spaces.
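
A quick way to find out whether you need to touch the launch scripts at all is to check which of your Hop variables contain spaces.  Here is a minimal Python sketch of that check.  The helper name `vars_with_spaces` is made up for this example, and it simply assumes every variable you care about starts with `HOP_`:

```python
import os


def vars_with_spaces(env=None):
    """Return the HOP_* variables whose values contain a space."""
    env = os.environ if env is None else env
    return {
        name: value
        for name, value in env.items()
        if name.upper().startswith("HOP_") and " " in value
    }


if __name__ == "__main__":
    # Print every Hop variable that would need quoting in the launch scripts.
    for name, value in sorted(vars_with_spaces().items()):
        print(f"{name} contains spaces: {value!r}")
```

If it prints nothing, none of your Hop variables contain spaces.  Remember that SETX only affects command windows opened AFTER you ran it, so run the check from a fresh shell.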

  1. HOP_OPTIONS Additional Configuration

    HOP_OPTIONS lets you configure things like memory.  It can be set in the launch scripts above (hop-gui, hop-run, hop-server, etc.).

    Most commonly you may need to increase the JVM Heap size to accommodate larger data sets.

    For example: 

    HOP_OPTIONS=-Xmx512m: start Hop with maximum 512MB of memory
    HOP_OPTIONS=-Xmx2048m: start Hop with maximum 2048MB (or 2GB) of memory
    HOP_OPTIONS=-Xmx4g: start Hop with maximum 4GB of memory
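
If you would rather not edit the launch scripts, one alternative is to set HOP_OPTIONS in the environment of whatever starts Hop.  The Python sketch below builds such an environment.  It ASSUMES your launch script keeps an existing HOP_OPTIONS value instead of overwriting it, so check your copy of the script first.  The function name `env_with_heap` is made up for this example:

```python
import os
import re

# This sketch only accepts sizes such as 512m, 2048m or 4g (k/m/g suffix).
_HEAP_SIZE = re.compile(r"^[1-9][0-9]*[kKmMgG]$")


def env_with_heap(size, base=None):
    """Return a copy of the environment with HOP_OPTIONS set to -Xmx<size>."""
    if not _HEAP_SIZE.match(size):
        raise ValueError(f"not a valid JVM heap size: {size!r}")
    env = dict(os.environ if base is None else base)
    env["HOP_OPTIONS"] = f"-Xmx{size}"
    return env
```

You could then pass the result as the `env` argument of `subprocess.run` when starting your launch script, for example `env_with_heap("4g")` for a 4GB heap.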

  2. Set your AES Encryption Key

    Security 101 - NO CLEAR_TEXT PASSWORDS in your workflows/script/projects right?
    Apache Hop has built-in AES encryption capabilities for securing your data source passwords.

    Set your AES key environment variable before launching any of your Hop tools.
    Here are a couple of examples of setting your AES variables on a Windows computer:

     POWERSHELL 
    $env:HOP_PASSWORD_ENCODER_PLUGIN="AES"
    $env:HOP_AES_ENCODER_KEY="ddsfsdfsfsdf"

    or

    CMD SHELL 
    SETX HOP_PASSWORD_ENCODER_PLUGIN AES
    SETX HOP_AES_ENCODER_KEY ddsfsdfsfsdf

    In the examples above, I'm just manually setting the AES key.  A better (and more secure) way would be to retrieve your key programmatically (via API call) from a secrets VAULT, a password manager, an access-controlled file/folder, or a secure-string encrypted registry key (you get the idea).
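
As a rough illustration of the "access-controlled file" option, here is a minimal Python sketch that reads the key from a file and puts it only into the environment handed to Hop, so the key never ends up typed into a script.  The function name `hop_env_from_key_file` and the key file location are made up for this example; the two variable names are the ones used above:

```python
import os
from pathlib import Path


def hop_env_from_key_file(key_file, base=None):
    """Build an environment for Hop with the AES key read from key_file."""
    key = Path(key_file).read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"key file is empty: {key_file}")
    env = dict(os.environ if base is None else base)
    env["HOP_PASSWORD_ENCODER_PLUGIN"] = "AES"
    env["HOP_AES_ENCODER_KEY"] = key
    return env
```

Pass the returned dictionary as the `env` argument of `subprocess.run` when you start hop-gui, hop-run or hop-server.  The file itself still has to be locked down with file permissions; this sketch does not do that for you.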

  3.  Add any custom JDBC drivers.

    Hop already comes with drivers for many of the most common data sources, so you may not need to do this.  I am working on a project for a CRM/ERP tool that uses the somewhat less common "Advantage SQL Database" (a Sybase/SAP product), so I had to download the Advantage JDBC jar file, extract the contents, and place the adsjdbc.jar file into the /lib folder of my Apache Hop installation.
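
If you end up doing this more than once, the copy step can be scripted.  Here is a minimal Python sketch, assuming the driver belongs in the lib folder directly under your Hop installation as described above (check where your Hop version expects drivers).  The function name `install_jdbc_driver` is made up for this example:

```python
import shutil
from pathlib import Path


def install_jdbc_driver(jar_path, hop_home):
    """Copy a JDBC driver jar into <hop_home>/lib and return the new path."""
    jar = Path(jar_path)
    lib_dir = Path(hop_home) / "lib"
    if jar.suffix.lower() != ".jar":
        raise ValueError(f"not a jar file: {jar}")
    if not lib_dir.is_dir():
        raise FileNotFoundError(f"no lib folder under {hop_home}")
    # copy2 keeps the file timestamps and returns the destination path.
    return Path(shutil.copy2(jar, lib_dir))
```

Because the jar lives inside the installation folder, expect to repeat this step after unpacking a new Hop version.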

There you have it.  Now you should be ready to get Apache Hop up and running for your first project!

For additional help - check out: