Sunday, October 24, 2021

Configure RangerMSP database connection in PDI


Configure RangerMSP (CommitCRM) database connection in PDI

Once your Pentaho server is installed and working, you will want to establish a connection to your RangerMSP database.

This tutorial will show you how to enable connectivity between Pentaho and RangerMSP:

What you need before you begin

  1. Create a Windows file share for the CommitCRM folder.  Grant permissions (file and share) to the computer account of your Pentaho (PDI) server.

  2. Download the Advantage JDBC driver (version 11) and install it.  Then browse to the installation folder, find the driver jar file, and copy it into the /lib folder of your Pentaho server installation.



  3. Launch Spoon
    Create a new transformation, and select the "Execute SQL Script" Step and drag it onto the transformation.


  4. View the properties dialog
    by double clicking on it, or right-click → Edit
    Then on the Connection property, click New



  5. Use the following Connection Settings
    Note: there is no username or password. Access to the DB is controlled via file/share permissions on your RangerMSP server.  Be sure the computer account of your Pentaho server (<computername>$) has been given both file and share permissions on the share.

    Connection Name:          <A unique descriptive name>
    Connection Type:          Generic database
    Access:                   Native (JDBC)
    Custom connection URL:    jdbc:extendedsystems:advantage://<servername>:6262;catalog=//<servername>/CommitCRM/Db
    Custom driver class name: com.extendedsystems.jdbc.advantage.ADSDriver (note: this is case sensitive!)
    Username:                 (leave blank)
    Password:                 (leave blank)





  6. Click "Test" and verify connectivity to your RangerMSP database

    If successful you should see the following message:



  7. "SHARE" this database connection with other transformations/jobs; otherwise you would have to repeat these steps for EACH transformation/job.

    Switch from the Design to the View Tab



  8. Expand Database connections
    Right-Click on your new database connection and select Share


You will know the connection is shared if it is displayed in "BOLD".

Now this connection is available to all jobs/transformations for this installation of Spoon/PDI.
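
The custom connection URL from step 5 follows a fixed pattern, so it can be assembled from the server name alone. Here is a minimal sketch in Python, assuming the default port 6262 and the CommitCRM/Db share path shown above. The function name advantage_jdbc_url and the server name rangersrv are made up for illustration; they are not part of PDI or the Advantage driver.

```python
def advantage_jdbc_url(server: str, port: int = 6262,
                       share_path: str = "CommitCRM/Db") -> str:
    """Build the Advantage JDBC URL used in the PDI connection settings.

    Illustrative helper only; not part of PDI or the Advantage driver.
    """
    server = server.strip()
    if not server:
        raise ValueError("server name must not be empty")
    # The catalog is the UNC-style path to the CommitCRM Db folder,
    # written with forward slashes as in the connection settings.
    catalog = f"//{server}/{share_path.strip('/')}"
    return f"jdbc:extendedsystems:advantage://{server}:{port};catalog={catalog}"


if __name__ == "__main__":
    # "rangersrv" is a placeholder server name.
    print(advantage_jdbc_url("rangersrv"))
```

Paste the printed value into the "Custom connection URL" field. Generating it this way mainly helps avoid typos when the server name appears twice in the URL.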




Saturday, October 23, 2021

Configure Neo4j 4.x database connection in PDI



Once your Pentaho server is installed and working, you will want to establish a connection to your Neo4j database.

This tutorial will show you how to enable connectivity between Pentaho and Neo4j:

What you need before you begin

I frequently use two kinds of steps. The generic "Execute SQL script" step uses the generic PDI database connections and supports multi-transaction Cypher statements. The open source Neo4j Output plugin by @mattcasters handles data maps and parameters passed from other steps WAY more elegantly than the "Execute SQL script" step does.  Because each uses its own kind of connection, we will need to configure two Neo4j database connections.

For this tutorial the Neo4j server runs on the same machine as the Pentaho Data Integration server, so you will see the 127.0.0.1 address. If you have these on separate computers, use either the FQDN or the IP address of your Neo4j server in the various name/address fields.

  1. Download the latest Neo4j database JDBC driver and place the jar file in the /lib folder of your Pentaho server installation.



  2. Download the latest PDI neo4j output plugin, unzip and place the neo4j output folder in the /plugins folder of your Pentaho server installation.



  3. Launch Spoon
    Create a new transformation, and select the "Execute SQL Script" Step and drag it onto the transformation.


  4. View the properties dialog
    by double clicking on it, or right-click → Edit
    Then on the Connection property, click New



  5. Use the following connection settings. These would be typical settings for a default Neo4j installation; you may need to customize them according to your server settings:

    Connection Name:          <A unique descriptive name>
    Connection Type:          Generic database
    Access:                   Native (JDBC)
    Custom connection URL:    jdbc:neo4j:bolt://<servername>:<bolt port>
    Custom driver class name: org.neo4j.jdbc.Driver (note: this is case sensitive!)
    Username:                 <neo4j username>
    Password:                 <neo4j password>



  6. Click "Test" and verify connectivity to your neo4j database

    If successful you should see the following message:


  7. "SHARE" this database connection with other transformations/jobs; otherwise you would have to repeat these steps for EACH transformation/job.

    Switch from the Design to the View Tab



  8. Expand Database connections
    Right-Click on your new database connection and select Share

    You will know the connection is shared if it is displayed in "BOLD"

    Now this connection is available to all jobs/transformations for this installation of Spoon/PDI

    Now you have a Neo4j data connection for all the generic steps that support SQL-style statements.  Next we want to configure the custom Neo4j Output plugin:

  9. At the top under the Neo4j header select "Create connection"


  10. Use the following settings. These would be typical settings for a default Neo4j installation; you may need to customize them according to your server settings:

    Connection name:                <A unique descriptive name> (I use the same connection name as above to keep things simple!)
    Server or IP address:           <address of your neo4j server>
    Database name (4.0):            neo4j
    Version 4 database?:            checked
    Bolt Port:                      7687
    Browser Port:                   7474
    Use routing, neo4j:// protocol: unchecked
    Routing Policy:
    Username:                       <neo4j username>
    Password:                       <neo4j password>
    Use encryption:



  11. Click Test to verify your connection



  12. Now you can use the Neo4j steps such as Neo4j Output, Neo4j Cypher, Graph Output and others.  Here's an article that dives much deeper into their functionality.
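
Both connections above point at the same server, so it can help to keep the shared values in one place and derive the addresses from them. Here is a minimal sketch in Python, assuming the default ports from step 10. The class Neo4jSettings and its method names are hypothetical and not part of PDI or the Neo4j plugin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Neo4jSettings:
    """Connection values shared by both PDI connections (hypothetical helper)."""
    host: str = "127.0.0.1"   # same machine as PDI in this tutorial
    bolt_port: int = 7687     # default Bolt port
    browser_port: int = 7474  # default Browser (HTTP) port
    database: str = "neo4j"   # default database name in Neo4j 4.x

    def jdbc_url(self) -> str:
        # Custom connection URL for the generic JDBC connection.
        return f"jdbc:neo4j:bolt://{self.host}:{self.bolt_port}"

    def browser_url(self) -> str:
        # Address of the Neo4j Browser, handy for a manual check.
        return f"http://{self.host}:{self.browser_port}"


if __name__ == "__main__":
    local = Neo4jSettings()
    print(local.jdbc_url())
    print(local.browser_url())
```

If your Neo4j server is on a separate computer, pass its FQDN or IP address as host and both addresses change together.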

Configuring Pentaho Data Integration (PDI) on Windows


Configuring Pentaho Data Integration (PDI) / Kettle community edition on Windows:

This short tutorial will focus on setting up a PDI server to ingest data from RangerMSP into Neo4j.
You should first set up and configure a working Neo4j 4.x server (it can be on the same server).


What you need before you begin
System Requirements:

  • Prepare Windows for Install
  • Virtually all modern desktop/server OS are supported
  • JAVA x64 (OpenJDK/OracleJDK) 8.x
  • Neo4j Server, Developer or Desktop Edition
  • Pentaho 9.x Community Edition Download here
  • Be aware that the latest Neo4j DB (4.x) requires Java 11, while PDI requires Java 8, so you will need to install BOTH!
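
Since two Java versions end up installed side by side, it is worth checking which one a given java executable actually reports. Here is a small Python sketch that parses the banner printed by java -version. The function names are my own, and the sample strings in the demo are only examples of the two version formats.

```python
import re
import subprocess


def java_major_version(version_text: str) -> int:
    """Extract the major Java version from `java -version` output.

    Handles both the legacy scheme ("1.8.0_265" -> 8) and the
    modern one ("11.0.9" -> 11).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if match is None:
        raise ValueError("no Java version string found")
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major(java_exe: str = "java") -> int:
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run([java_exe, "-version"],
                            capture_output=True, text=True, check=True)
    return java_major_version(result.stderr)


if __name__ == "__main__":
    print(java_major_version('openjdk version "1.8.0_265"'))
    print(java_major_version('openjdk version "11.0.9" 2020-10-20'))
```

To check a specific installation, call installed_java_major with the full path to that java.exe; PDI should see 8 and Neo4j 4.x should see 11.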

  1. Verify the JAVA_HOME system variable.  For my installation I used the OpenJDK so the path was C:\Program Files\AdoptOpenJDK\jre-8.0.265.01-hotspot

  2. Extract the archive into the folder you want to be your installation folder

  3. Navigate to the installation folder and run spoon.bat


  4. If you see the _PENTAHO_JAVA variable point to a Java version other than 8, add a line at the beginning of spoon.bat that sets the proper path to Java 8. There must be no space on either side of the equals sign, otherwise the variable name itself ends with a space:

    SET JAVA_HOME=C:\Program Files\AdoptOpenJDK\jre-8.0.265.01-hotspot

  5. BE PATIENT.  It takes a little while to load.  After a moment you should see the Pentaho Data Integration splash screen:



    This will stay on the screen for a minute or so, and then the program will finish loading.
    If all went well you should now see the welcome screen:



  6. Ok - now we need some connectivity.  You want to query a Microsoft SQL Server database right?   Of course you do!  Why not?

    First Download the Microsoft JDBC driver for MS SQL Server.

    Extract the contents.

  7. Copy the mssql-jdbc-x.y.z.jre8.jar into the \lib folder of your PDI installation folder (assuming you are using Java JRE/JDK v8).  That's it!  The next time you launch Spoon, it will have the SQL Server JDBC driver available.
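
If a connection type does not show up in Spoon, the first thing to rule out is a driver jar that never made it into the lib folder. Here is a rough Python sketch that lists matching jar files. The function name and the installation path in the demo are assumptions; adjust them to your own PDI folder.

```python
from pathlib import Path
from typing import List


def find_driver_jars(lib_dir: str, prefix: str = "mssql-jdbc") -> List[str]:
    """Return the names of jar files in lib_dir whose name starts with prefix."""
    folder = Path(lib_dir)
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob(f"{prefix}*.jar"))


if __name__ == "__main__":
    # Hypothetical installation path; adjust to your own PDI folder.
    jars = find_driver_jars(r"C:\pentaho\data-integration\lib")
    print(jars if jars else "No matching driver jar found")
```

Remember that Spoon only loads drivers at startup, so restart it after copying a new jar.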

    For some more data connectivity examples see the following:

Configuring Neo4j 4.x server on Windows



This is an update from a previous article based on Neo4j v3.x

Yes, there are plenty of tutorials for setting up Neo4j already, but I wanted to focus on a few settings that make it easier to use with data integration.

This tutorial will focus on Neo4j server for Windows.  It is NOT very complicated, and you'll be up and running in no time flat:

What you need before you begin
System Requirements:

  • Virtually all modern desktop/server OS are supported
  • JAVA (OpenJDK/OracleJDK/ZuluJDK) 11
  • Neo4j Server or Desktop Edition. For these instructions I used the community edition, which you can download from https://neo4j.com/download-center/#community

  1. Extract the archive into the folder you want to be your installation folder
  2. Use an editor to modify /conf/neo4j.conf

    Setting:     #dbms.directories.import=import
    Action:      Comment out
    Description: Allows custom imports on demand

    Setting:     apoc.import.file.enabled=true
                 dbms.security.procedures.allowlist=apoc.coll.*,apoc.load.*,gds.*,apoc.*
                 dbms.security.procedures.unrestricted=apoc.schema.*
    Action:      Add
    Description: Allows APOC file imports and APOC procedures

    Setting:     #dbms.memory.heap.initial_size=5g
                 #dbms.memory.heap.max_size=5g
                 #dbms.memory.pagecache.size=7g
    Action:      Customize memory
    Description: Memory configuration will depend on how large your graph DB will become.  Here's a good primer:
                 https://neo4j.com/docs/operations-manual/current/tools/neo4j-admin-memrec/

    Setting:     dbms.connectors.default_listen_address=0.0.0.0
                 (named dbms.default_listen_address in the 4.x neo4j.conf)
    Action:      Uncomment
    Description: Allows non-local connections

  3. Example configuration for plugins*:

    Plugin:       APOC
    Description:  Don't ask, just install it.  No really!  You want this.
    Notes:        Install the binary .jar into the /plugins folder.
                  You need to MATCH the APOC version with the Neo4j version!

    Plugin:       MSSQL JDBC
    Description:  Add this if you need to connect to MS SQL
    Notes:        Extract mssql-jdbc-9.x.x.jre11.jar into the /plugins folder

    Plugin:       Excel (multiple file formats)
    Description:  If you need support to import from these formats, download the dependencies
    Download URL: https://repo1.maven.org/maven2/org/apache/poi/poi/5.0.0/poi-5.0.0.jar
                  https://repo1.maven.org/maven2/org/apache/poi/poi-ooxml/5.0.0/poi-ooxml-5.0.0.jar
                  https://repo1.maven.org/maven2/org/apache/poi/poi-ooxml-schemas/4.1.2/poi-ooxml-schemas-4.1.2.jar
                  https://repo1.maven.org/maven2/org/apache/xmlbeans/xmlbeans/5.0.2/xmlbeans-5.0.2.jar
                  https://repo1.maven.org/maven2/com/github/virtuald/curvesapi/1.06/curvesapi-1.06.jar
    Notes:        Place these .jar files into the /plugins folder

    Plugin:       Advantage Database JDBC
    Description:  To connect to Advantage (Sybase) SQL via JDBC
    Notes:        This is to support the CRM I use (RangerMSP)

  4. *For most of my installations I ONLY install the APOC plugin.  This is because I typically use an ETL tool like Pentaho Data Integration for ingesting data from other sources (Excel, JDBC, etc.), rather than loading it natively from a Neo4j Cypher query.  This makes it easier to manage security, tasks and jobs than hard-coding them right into a Cypher statement.

  5. Configure Windows Service
    Neo4j should be configured to run as a Windows service. Launch a command shell, and install the service from within the /bin folder

    neo4j install-service

    If you are upgrading from an older version, you will need to first unregister the service for the old version:
    neo4j uninstall-service
      
  6. Set Initial Password
    neo4j-admin set-initial-password mysupersecretpassword

  7. Start the service
    sc start neo4j
    (or use the service control panel)

  8. Verify your installation.  Browse to http://127.0.0.1:7474
    username: neo4j
    password: <same as you supplied in step 6>


    That's it!  You should be ready to go with a neo4j server that's ready to connect to SQL Server, import from CSV/XLS files, and you will have the APOC library plugins at your disposal!
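
The neo4j.conf changes in step 2 come down to three operations: comment a line out, uncomment a line, or add a new setting. If you rebuild servers often, these can be scripted. Here is a minimal sketch in Python that works on a list of lines. The helper names are hypothetical, and you should back up neo4j.conf before writing anything back to it.

```python
from typing import Iterable, List


def comment_out(lines: Iterable[str], key: str) -> List[str]:
    """Prefix active `key=...` lines with '#'."""
    return [f"#{ln}" if ln.strip().startswith(f"{key}=") else ln
            for ln in lines]


def uncomment(lines: Iterable[str], key: str) -> List[str]:
    """Remove the leading '#' from commented `#key=...` lines."""
    out = []
    for ln in lines:
        stripped = ln.lstrip()
        if stripped.startswith("#") and stripped[1:].lstrip().startswith(f"{key}="):
            out.append(stripped[1:].lstrip())
        else:
            out.append(ln)
    return out


def add_setting(lines: Iterable[str], key: str, value: str) -> List[str]:
    """Append `key=value` unless an active line for key already exists."""
    lines = list(lines)
    if any(ln.strip().startswith(f"{key}=") for ln in lines):
        return lines
    return lines + [f"{key}={value}"]


if __name__ == "__main__":
    # Two sample lines standing in for the real file contents.
    conf = [
        "dbms.directories.import=import",
        "#dbms.connectors.default_listen_address=0.0.0.0",
    ]
    conf = comment_out(conf, "dbms.directories.import")
    conf = uncomment(conf, "dbms.connectors.default_listen_address")
    conf = add_setting(conf, "apoc.import.file.enabled", "true")
    print("\n".join(conf))
```

To apply it for real, read the file with splitlines(), run the helpers with the setting names from your own neo4j.conf, and write the joined result back. The helpers leave lines they do not recognize untouched, so running them twice gives the same result.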

Thursday, February 11, 2021

Neo4J Installation on CentOS

Note: This is internal documentation I wrote a couple of years ago, based on Neo4j 3.5.14 for CentOS. With some small adjustments you can apply it to newer versions and other Linux distros.



# Patch your new CentOS installation:
yum -y update
yum -y install epel-release
# I like htop
yum -y install htop

# First, you'll need the yum repository and key.
rpm --import https://debian.neo4j.org/neotechnology.gpg.key
cat <<EOF > /etc/yum.repos.d/neo4j.repo
[neo4j]
name=Neo4j RPM Repository
baseurl=https://yum.neo4j.org/stable
enabled=1
gpgcheck=1
EOF

# View available versions/editions:
yum list neo4j

# I'm going to install the latest community edition, so here we go:
yum -y install neo4j-3.5.14-1

#verify installation
rpm -qa | grep neo

# Increase the open file limits for the neo4j service
#https://community.neo4j.com/t/warning-max-1024-open-files-allowed-minimum-of-40000-recommended-see-the-neo4j-manual/3679/7

systemctl edit neo4j.service

# Paste the following values into the override file and save:

[Service]
LimitNOFILE=40000


# Create a firewall service template for the neo4j bolt, http, and https protocols (modify if you do not want to allow all 3)
cat <<EOF > /usr/lib/firewalld/services/neo4j.xml
<?xml version="1.0" encoding="utf-8"?>
<service>
  <short>neo4j</short>
  <description>Neo4j is an ACID-compliant graph database management system developed by Neo4j, Inc.  Neo4j supports clients using either the Bolt binary protocol or HTTP/HTTPS.</description>
  <port protocol="tcp" port="7687"/>
  <port protocol="tcp" port="7474"/>
  <port protocol="tcp" port="7473"/>
</service>
EOF

# Reload so firewalld picks up the new service definition, then add the service, reload again and verify
firewall-cmd --reload
firewall-cmd --permanent --add-service=neo4j
firewall-cmd --reload
firewall-cmd --list-services

# If you have SELinux you'll need to do this as well:
semanage port -a -t http_port_t -p tcp 7474
semanage port -a -t http_port_t -p tcp 7473
semanage port -a -t http_port_t -p tcp 7687


# Set the initial password (do this before the database is started for the first time)
cd /usr/bin

neo4j-admin set-initial-password Password1

# Configure neo4j connector to allow network connections
# use an editor to modify /etc/neo4j/neo4j.conf
# uncomment the line dbms.connectors.default_listen_address=0.0.0.0 if you want to accept non-local network connections

#You should adjust your memory settings.  For recommendations see
# https://neo4j.com/docs/operations-manual/current/tools/neo4j-admin-memrec/

# Add any plugins.  I can't live without APOC, so here we go:
cd /var/lib/neo4j/plugins

wget https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/3.5.0.6/apoc-3.5.0.6-all.jar

# and because I am connecting with SQL Server, I will need the SQL JDBC 7.x jar
cd /tmp
wget https://download.microsoft.com/download/2/F/C/2FC75210-EDDE-464C-8E54-45C0291032FF/sqljdbc_7.0.0.0_enu.tar.gz

tar -xzf sqljdbc_7.0.0.0_enu.tar.gz
cp /tmp/sqljdbc_7.0/enu/mssql-jdbc-7.0.0.jre8.jar /var/lib/neo4j/plugins/

# I also want Advantage (SAP Sybase) JDBC:
cd /tmp
wget "http://devzone.advantagedatabase.com/dz/download.aspx?Key=iE/9IfXlCOePTGOfVrmGwsL+4iRlo8+R" -O adsinstall.jar
unzip adsinstall.jar -d /tmp/ads
cp /tmp/ads/jdbc/adsjdbc.jar  /var/lib/neo4j/plugins/

# Configure neo4j service to start automatically
systemctl enable neo4j
systemctl start neo4j
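
Once the service is up, it is useful to confirm that the ports opened in the firewalld service definition are really reachable, ideally from the machine that will connect to Neo4j. Here is a minimal Python sketch using plain TCP connections. The function names are my own, and the host in the demo is a placeholder; replace it with your server's address.

```python
import socket
from typing import Dict, Iterable


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host: str, ports: Iterable[int]) -> Dict[int, bool]:
    """Map each port to whether a TCP connection could be opened."""
    return {port: port_is_open(host, port) for port in ports}


if __name__ == "__main__":
    # Bolt, HTTP and HTTPS ports from the firewalld service definition.
    for port, ok in check_ports("127.0.0.1", [7687, 7474, 7473]).items():
        print(port, "open" if ok else "closed")
```

This only proves that something accepts connections on the port; it does not check credentials. Neo4j can take a little while to start listening after systemctl start returns, so retry before concluding that the firewall is at fault.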