Tuesday, November 12, 2019

Import vCenter infrastructure into a knowledge graph using Neo4j

Yes, I could have queried the VMware Web API directly, but dealing with self-signed certificates and discovering all the API queries would have been a LOT of work.  RVTools already gathers ALL the data I'm looking for and exports it into a single Excel file, which makes this process quite a bit easier.

When complete, this process will create the following database schema in your Neo4j database:


Known Issues
    • Only tested against vCenter clusters (not standalone vSphere host output)
    • The script only builds Standard vSwitches and ports/portgroups.
      Distributed virtual switches and ports ARE present in the .xls data export, but the .cypher will need modifications to properly map DV objects.

Installation Steps (PowerShell)
  1. Log in using the account you intend to use (particularly if scheduling for automation)
  2. Verify you are running PowerShell v5.0 or newer:
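One way to run this check is with PowerShell's built-in $PSVersionTable variable. This is only a minimal sketch; the messages are illustrative and not part of the graph-commit scripts:

```powershell
# $PSVersionTable is a built-in automatic variable; PSVersion holds the engine version.
$current = $PSVersionTable.PSVersion

if ($current.Major -ge 5) {
    "PowerShell $current is new enough."
} else {
    "PowerShell $current is older than 5.0; please upgrade before continuing."
}
```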

  1. If you don't have the base graph-commit script modules, run the following commands in PowerShell to download them from the git repository:
    This will download the scripts into %programfiles%\blue net inc\Graph-Commit
set-executionpolicy unrestricted -force
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
$client = new-object System.Net.WebClient
$client.DownloadFile("https://raw.github.com/pdrangeid/graph-commit/master/update-modules.ps1","$Env:Temp\update-modules.ps1")
& "$Env:Temp\update-modules.ps1"

  1. Now download the script files that run the VMware data collector from the GitHub repository via PowerShell:
    This will download the scripts into %programfiles%\blue net inc\Graph-Commit
cd "$env:programfiles\blue net inc\graph-commit"
.\update-modules.ps1 -gitrepo pdrangeid/vmware-graph -gitfile refresh-vmware.cypher
  1. If this is the first time you are using your Neo4j database with my scripts, you will need to identify your Neo4j server location and provide credentials.
    This cmdlet will also verify that you have the .NET Neo4j driver installed (the set-regcredentials cmdlet can install it for you automatically using the NuGet package manager):

.\set-regcredentials.ps1 -credname myneo4jserver -n4j
The prerequisites (NuGet, Neo4j .NET driver) will be validated, and you will be prompted to install them if they are missing.  Once complete, it will validate connectivity to your Neo4j database instance.  A successful result should look like this:
  1. First let's generate your output file from RVTools.
    The example below assumes we will use passthru authentication for the vCenter server.  Review the RVTools documentation for specifying credentials.
    The resulting Excel document will be placed in the import subfolder within the Neo4j installation path (adjust this for your environment)
[string]$RVToolsPathExe = ${env:ProgramFiles(x86)}+"\Robware\RVTools\RVTools.exe"
$Arguments = " -passthroughAuth -s fqdn.yourvcenterserver.com -c ExportAll2xlsx -d c:\neo4j-community-3.5.12\import  -f fqdn.yourvcenterserver.com.xlsx"
$Process = Start-Process -FilePath $RVToolsPathExe -ArgumentList $Arguments -NoNewWindow -Wait

  1. If all went well, you should have your vCenter environment exported into the Excel document in c:\neo4j-community-3.5.12\import
    Now we want to run the import process to ingest the data into the graph.
    The $findstring variable is used to find the placeholder (in the .cypher script you downloaded earlier) and replace it with the path/file of your Excel document.
    Replace 'myneo4jserver' with the name of the Neo4j datasource credential you used with set-regcredentials.ps1 earlier.
    cd "$env:programfiles\blue net inc\graph-commit"
    $scriptpath = -join ($env:ProgramFiles,"\blue net inc\graph-commit\get-cypher-results.ps1")
    $findstring='{"path-to-vmware-import-file":"file:///c:/neo4j-community-3.5.12/import/fqdn.yourvcenterserver.com.xlsx"}'
    $csp=$(-join ($env:programfiles,"\blue net inc\graph-commit\refresh-vmware.cypher"))
    $result = . $scriptPath -Datasource 'myneo4jserver' -cypherscript $csp -logging 'myneo4jserver' -findrep $findstring
  2. A successful import will cycle through the transactions and give you log queries to validate:
  3. Use the Neo4j browser: http://your-neo4jserver:7474
    Log in with your credentials
  4. Review the cypher logs (run the log queries that were output from the script execution above)
  5. Review the VMware data that was imported.  Here are some sample cypher queries that will present an explorable graph:

    // SHOW vcenter, datacenter, cluster, folders and resource groups:
    MATCH (vc:Vcenterserver)
    MATCH (vc)--(vdc:Vspheredatacenter)
    MATCH (vc)--(vcc:Vcentercluster)
    WITH *,'/'+vdc.name as startpath
    OPTIONAL MATCH (vf:Vfolder) where vf.path starts with startpath
    OPTIONAL MATCH (vrp:Vresourcepool) where vrp.path starts with startpath
    WITH *
    MATCH (vm:Virtualmachine) where (vm)--(vf) or (vm)--(vrp) or (vm)--(vcc) or (vm)--(vdc)
    return vc,vdc,vcc,vf,vrp,vm

  6. DNS and NTP query:
    // SHOW vSphereHosts DNS,NTP, and vCenter relationships
    MATCH (vh:Vspherehost)
    OPTIONAL MATCH (vh)--(ds:Dnsserver)
    OPTIONAL MATCH (vh)--(ns:Ntpserver)
    OPTIONAL MATCH (vh)--(vc:Vcenterserver)
    return vh,ds,ns,vc
  7. vSphere hosts and datastores:
    // SHOW vSpherehost datastores, types, and vcenter
    MATCH (vh:Vspherehost)
    OPTIONAL MATCH (vh)--(ds:Vdatastore)
    OPTIONAL MATCH (ds)--(dst:Vdatastoretype)
    OPTIONAL MATCH (vh)--(vc:Vcenterserver)
    return vh,ds,dst,vc
  8. vSwitch, portgroups, and load-balancing policies:
    // SHOW vSwitch portgroups, and lbpolicies
    MATCH (vh:Vspherehost)
    OPTIONAL MATCH (vh)--(vs:Vswitch)
    OPTIONAL MATCH (vs)--(vlbp:Vlbpolicy)
    OPTIONAL MATCH (vpg:Vportgroup)
    OPTIONAL MATCH (vhpg:Vhostportgroup)--(vpg)
    RETURN vh,vs,vpg,vhpg,vlbp

Configuring Neo4j server:

Yes, there are plenty of tutorials for setting up Neo4j already, but I wanted to focus on a few settings that make it easier to use for data integration.

This tutorial will focus on Neo4j server for windows.  It is NOT very complicated, and you'll be up and running in no time flat:

What you need before you begin
System Requirements:

  • Windows PC or server
  • JAVA (OpenJDK/Oracle/IBM Java) 8 or greater
  • Neo4j Server or Desktop Edition. For these instructions I used the community edition, which you can download from https://neo4j.com/download-center/#community

  1. Extract the archive into the folder you want to be your installation folder
  2. Use an editor to modify /conf/neo4j.conf
    Comment out
    allows custom imports on-demand
    Allow APOC file imports
    customize memory
    Memory configuration will depend on how large your graphDB will become.  Here's a good primer:
    Allows non-local connections
  3. Configure Plugins:
    Download URL
    Don't ask, just install it.  No really!  You want this.
    Install binary .jar into /plugins folder
    Add this if you need to connect to MS SQL
    Extract mssql-jdbc-7.x.x.jre8.jar into /plugins folder
    Excel (multiple file formats)
    To support import from these formats download the dependencies
    Place these .jar files into the /plugins folder
    Advantage Database
    To connect to Advantage (sybase) SQL  via JDBC
    This is to support the CRM I use (CommitCRM)
  4. Configure Windows Service
    Neo4j should be configured to run as a Windows service. Launch a command shell, and install the service from within the /bin folder

    neo4j install-service

    If you are upgrading from an older version, you will need to first unregister the service for the old version:
    neo4j uninstall-service
  5. Set Initial Password
    neo4j-admin set-initial-password mysupersecretpassword

  6. Start the service
    sc start neo4j
    (or use the service control panel)
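If you prefer to stay in PowerShell, the same start-and-check step can be sketched with built-in cmdlets. Note that in Windows PowerShell "sc" is an alias for Set-Content, so the command above needs to be typed as sc.exe there or run from a regular command shell. The service name is taken from the sc command above; 'localhost' and port 7474 (the Neo4j Browser port) are assumptions about your setup:

```powershell
# Start the service registered by "neo4j install-service" and show its status.
Start-Service -Name 'neo4j'
Get-Service -Name 'neo4j' | Select-Object Name, Status

# Assumption: Neo4j runs on this machine and listens on the default HTTP port 7474.
Test-NetConnection -ComputerName 'localhost' -Port 7474 |
    Select-Object ComputerName, RemotePort, TcpTestSucceeded
```

The service can take a few seconds to open its ports, so a failed connection test right after starting is worth retrying before troubleshooting.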

    That's it!  You should be ready to go with a neo4j server that's ready to connect to SQL Server, import from CSV/XLS files, and you will have the APOC library plugins at your disposal!

Monday, January 28, 2019

Using Powershell to execute cypher scripts with secure credentials and logging results/errors.

This is a continuation of my 1st draft: Using a Powershell wrapper to securely authenticate to Neo4J to execute CYPHER using Bolt.

PROBLEM #1: I was running several .cypher scripts as a scheduled task on Windows, using cypher-shell to execute them.  This worked, but my .cypher files had to contain plain-text credentials to authenticate to the various REST API sites I was using to feed my Neo4j database.  So I wrote the credential PowerShell wrapper (previous post).

PROBLEM #2: As I made changes to my scripts, I would inevitably write some syntax errors into my cypher scripts and unknowingly break my import process.  Often it would only break a little.  Unless I manually ran each bit of code in the Neo4j Browser, I didn't have an easy way to verify the results (or lack thereof) of my cypher script modifications.

MY WORK-AROUND: A full cypher execution method that would also log the results (and some statistics meta-data), and show me syntax errors (exceptions) from the cypher.

First, supply your Neo4j database destination and credentials using set-n4jcredentials.ps1.  Then supply any additional credentials (API, web) using set-customcredentials.ps1.  These scripts store the credentials in the registry, using a secure-string for the sensitive data, and attach them to a logical datasource name.  When you request it on the command line, the wrapper searches your .cypher text for the placeholder and replaces it with the actual credential information retrieved from the secure-string in the registry, before the script is submitted to the Neo4j engine.
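To make the search/replace idea concrete, here is a minimal sketch of just the substitution step. It is not the wrapper's actual code: Expand-CypherPlaceholders is a hypothetical helper, the JSON map of placeholder to replacement is an assumed input format, and the registry lookup and secure-string decryption are not shown:

```powershell
# Hypothetical helper (not part of the published scripts): replaces every
# placeholder named in a JSON map with its value before the cypher is submitted.
function Expand-CypherPlaceholders {
    param(
        [string]$CypherText,
        [string]$FindReplaceJson
    )
    $map = ConvertFrom-Json -InputObject $FindReplaceJson
    foreach ($pair in $map.PSObject.Properties) {
        # String.Replace does a literal (non-regex) replacement.
        $CypherText = $CypherText.Replace($pair.Name, [string]$pair.Value)
    }
    return $CypherText
}

# Illustrative placeholder and value only; real secrets would come from the registry.
$cypher = "RETURN 'my-api-token-placeholder' AS token;"
$json   = '{"my-api-token-placeholder":"s3cr3t-value"}'
Expand-CypherPlaceholders -CypherText $cypher -FindReplaceJson $json
```

Keeping the replacement literal rather than regex-based means placeholders containing characters such as dots or brackets are matched exactly as written.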

Then execute your cypher by running get-cypher-results.ps1:

.\get-cypher-results.ps1 -Datasource 'N4jDataSource' -cypherscript 'C:\path-to-my-script\myscript.cypher' -logging 'N4jDataSource'


The get-cypher-results.ps1 script will segment your script into transactions, splitting wherever a semicolon is followed by a linefeed.
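As a rough illustration of that splitting rule (not the script's actual implementation), the sketch below uses a hypothetical Split-CypherTransactions helper. Tolerating Windows CRLF line endings is my assumption:

```powershell
# Hypothetical helper: split script text wherever ';' is followed by a line break.
function Split-CypherTransactions {
    param([string]$ScriptText)
    # -split takes a regular expression; \r? also accepts CRLF line endings.
    $ScriptText -split ';\r?\n' |
        ForEach-Object { $_.Trim() } |
        Where-Object { $_ -ne '' }
}

$sample = "MATCH (n:Asset) RETURN count(n);`r`nMATCH (m:Cypherlogentry) RETURN count(m);`r`n"
$transactions = @(Split-CypherTransactions -ScriptText $sample)
$transactions.Count
$transactions[0]
```

A rule this simple would also split on a semicolon that ends a line inside a string literal, so it is worth keeping that pattern out of your cypher text.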

You can also give "sections" of your code a label by using the keyword section at the beginning of comments in the cypher script:

// section Main import routine to create (:Asset) nodes
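For illustration, a section label like the one above could be picked out of a comment line with a regular expression. Get-CypherSectionName is a hypothetical helper, and matching the keyword case-insensitively is my assumption rather than documented behaviour:

```powershell
# Hypothetical helper: returns the label from a "// section ..." comment, or $null.
function Get-CypherSectionName {
    param([string]$Line)
    # -match is case-insensitive by default and fills the automatic $Matches table.
    if ($Line -match '^\s*//\s*section\s+(.+?)\s*$') {
        return $Matches[1]
    }
    return $null
}

Get-CypherSectionName -Line '// section Main import routine to create (:Asset) nodes'
Get-CypherSectionName -Line '// just an ordinary comment'
```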

Each transaction will be run and the metadata results will be (optionally) recorded in a log entry (per transaction).  The logging is done (of course) as a neo4j graph using the label (:Cypherlogentry).  The following counter items will be recorded as properties:

(how long did the transaction take to run)
Version (of the target Neo4j server)
date (epoch when the transaction ran)
linenumber (of where this transaction begins in the script)
script (full path and filename of the .cypher script)
section (named section of code)
server: fqdn or IP and port of the neo4j server
source: name of the computer the powershell script was executed from
error: (any exception error thrown by the neo4j engine will be recorded here)

All transactions from a single .cypher script will be bookended by a "BEGIN SCRIPT" and "END SCRIPT" section marker, with the END SCRIPT logging a "ResultAvailableAfter" that is a sum of all the transactions within the script.
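The END SCRIPT total can be pictured as a simple sum over the per-transaction entries. In this sketch the entries and their timing values are made up for illustration; only the property names come from the description above:

```powershell
# Made-up log entries for three transactions of one script run (illustrative values).
$entries = @(
    [pscustomobject]@{ section = 'Create nodes';         linenumber = 3;  ResultAvailableAfter = 120 }
    [pscustomobject]@{ section = 'Create relationships'; linenumber = 18; ResultAvailableAfter = 45 }
    [pscustomobject]@{ section = 'Cleanup';              linenumber = 40; ResultAvailableAfter = 300 }
)

# Sum the per-transaction timings, as the END SCRIPT marker is described as doing.
$total = ($entries | Measure-Object -Property ResultAvailableAfter -Sum).Sum
"END SCRIPT ResultAvailableAfter total: $total"
```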

All entries for a particular script execution will be tied together with a relationship: -[:PART_OF_SCRIPT_EXECUTION]-

The wrapper will complete the execution and supply some example cypher queries to return the error logging for that execution.

This gave me a method to quickly run batches of .cypher code against a neo4j database, and determine if I generated any exceptions, and log metadata to track trends for code sections.

All the scripts referenced in this post are available at github.com/pdrangeid/n4j-pswrapper