GraphCommit: 2019

Saturday, December 7, 2019

graph-commit project

graph-commit project: Powershell "interpreter" for Cypher

This is the evolution from my 2nd draft: Using Powershell to execute cypher scripts with secure credentials and logging results/errors.

What was the issue?

I was running several .cypher scripts as a scheduled task on Windows using cypher-shell to execute them. This was fine, however my .cypher files had to contain plain-text credentials to authenticate to the various REST-API sites (and other datasources) I was using to feed my Neo4j database. So I created some powershell credential storage/retrieval functions.

What was the other issue?
As I made changes to my scripts, I would inevitably write some syntax errors into my cypher scripts and unknowingly break my import process. But often, just break it a little. Unless I manually ran each bit of code in the Neo4j Browser, I didn't have a simple method to verify or validate the results (or lack thereof).

MY SOLUTION: A set of powershell scripts/cmdlets that allow cypher execution and log the results (and some statistics metadata), while showing syntax errors (exceptions) from the cypher. I use the term "interpreter" loosely.

HOW IT WORKS:

Provide a Cypher query language (cql) source script that you wish to execute. Be sure to replace any sensitive credentials or session keys with a unique string or "placeholder text". For example: mysecretvaluegoeshere
Supply your Neo4j database destination & credentials by first using set-regcredentials.ps1
Then supply any additional (API, web credentials) also using set-regcredentials.ps1

Any credentials stored with set-regcredentials.ps1 have their password stored (in the registry) as a secure-string. This is relatively secure, as it can only be retrieved by the SAME username logged onto the SAME computer. In a future version I'd prefer the script retrieve credentials from a secure store like Vault (HashiCorp).
When executing the commandline you can supply -creds1 (thru -creds4) to have the wrapper perform a find-replace based on the key/value pair you created with set-regcredentials.ps1.
Alternatively you can supply data in realtime on the commandline using the -findrep switch, followed by a json string of find/replace pairs. This is useful for manual testing, or if you are supplying a dynamic session key, which is common when authenticating to a WebAPI that gives you a one-time-use key for authorization. We use this method when graphing data using the Veeam Backup Enterprise Manager webAPI.
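
To make the find/replace idea concrete, here is a minimal sketch of how a json string of find/replace pairs could be applied to the text of a .cypher script. It is written in Python purely for illustration; the real wrapper is PowerShell and its internals may differ. The function name apply_find_replace and the sample cypher text are hypothetical.

```python
import json


def apply_find_replace(script_text: str, findrep_json: str) -> str:
    """Replace every placeholder (json key) with its value (json value)."""
    pairs = json.loads(findrep_json)
    for placeholder, value in pairs.items():
        # Plain string replacement: the placeholder must be unique in the script.
        script_text = script_text.replace(placeholder, str(value))
    return script_text


# Hypothetical cypher text containing the placeholder used in this post.
cypher = "WITH 'mysecretvaluegoeshere' AS apikey RETURN apikey;"
findrep = '{"mysecretvaluegoeshere":"1234567890abcd"}'
print(apply_find_replace(cypher, findrep))
```

Because the replacement is a plain text substitution, a placeholder that is also a substring of real cypher code would be replaced too, which is why a unique string is recommended.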

Then execute your cypher by running get-cypher-results.ps1:
POWERSHELL

.\get-cypher-results.ps1 -Datasource 'N4jDataSource' -cypherscript 'C:\path-to-my-script\myscript.cypher' -logging 'N4jDataSource' -creds1 'mycredname'

Or let's say you wanted to embed some cypher execution within another powershell script. You may do something like this:
POWERSHELL

cd "$env:programfiles\blue net inc\graph-commit"
. "$PSScriptRoot\bg-sharedfunctions.ps1" | Out-Null
$neo4jdatasource = "myn4jserver"
$scriptpath = -join ($PSScriptRoot,"\get-cypher-results.ps1")
$csp= "c:\the-path-to\my-source-script.cypher"
$frstring='{"mysecretvaluegoeshere":"1234567890abcd"}'
. $scriptPath -Datasource $neo4jdatasource -cypherscript $csp -logging $neo4jdatasource -findrep $frstring -verbosity 1

Results:

The get-cypher-results.ps1 script will segment your script into individual transactions. A transaction ends at a semicolon followed by a linefeed.

You can also give "sections" of your code a label by using the keyword section at the beginning of comments in the cypher script:

// section Main import routine to create (:Asset) nodes
...
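
The segmentation and section labelling described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the wrapper's actual PowerShell code: it assumes a statement ends on any line whose last non-blank character is a semicolon, and that a section label is a comment line starting with "// section". The names Statement and split_script are hypothetical.

```python
import re
from typing import List, NamedTuple


class Statement(NamedTuple):
    linenumber: int  # 1-based line where the statement begins
    section: str     # most recent "// section ..." label, or "" if none
    text: str        # the statement text, including the closing semicolon


SECTION_RE = re.compile(r"^\s*//\s*section\s+(.*)$", re.IGNORECASE)


def split_script(script: str) -> List[Statement]:
    statements: List[Statement] = []
    section = ""
    buffer: List[str] = []
    start = 0
    for number, line in enumerate(script.splitlines(), start=1):
        match = SECTION_RE.match(line)
        if match:
            section = match.group(1).strip()  # remember the label, drop the line
            continue
        if not buffer and not line.strip():
            continue  # skip blank lines between statements
        if not buffer:
            start = number
        buffer.append(line)
        if line.rstrip().endswith(";"):  # semicolon followed by a linefeed
            statements.append(Statement(start, section, "\n".join(buffer)))
            buffer = []
    if buffer:  # trailing statement without a closing semicolon
        statements.append(Statement(start, section, "\n".join(buffer)))
    return statements


demo = (
    "// section Create constraints\n"
    "CREATE CONSTRAINT ON (a:Asset) ASSERT a.id IS UNIQUE;\n"
    "\n"
    "// section Main import routine to create (:Asset) nodes\n"
    "MERGE (a:Asset {id: 1})\n"
    "SET a.name = 'demo';\n"
)
for stmt in split_script(demo):
    print(stmt.linenumber, stmt.section)
```

Tracking the starting line number per statement is what makes it possible to record a linenumber property for each transaction in the log.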
Each transaction will be run and the metadata results will be (optionally) recorded in a log entry (per transaction). The logging is done (of course) as a neo4j graph using the label (:Cypherlogentry). The following counter items will be recorded as properties:

ConstraintsAdded
ConstraintsRemoved
IndexesAdded
IndexesRemoved
LabelsAdded
LabelsRemoved
NodesDeleted
Notifications
Plan
Profile
PropertiesSet
RelationshipsCreated
RelationshipsDeleted
ResultAvailableAfter (how long did the transaction take to run)
StatementType
Version (of the target Neo4j server)
date (epoch when the transaction ran)
linenumber (of where this transaction begins in the script)
script (full path and filename of the .cypher script)
section (named section of code)
server (fqdn or IP and port of the neo4j server)
source (name of the computer the powershell script was executed from)
error (any exception error thrown by the neo4j engine will be recorded here)
All transactions from a single .cypher script will be bookended by a "BEGIN SCRIPT" and "END SCRIPT" section marker, with the END SCRIPT logging a "ResultAvailableAfter" that is a sum of all the transactions within the script.
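
As a rough illustration of the bookending, the sketch below wraps a list of per-transaction log entries in BEGIN SCRIPT and END SCRIPT markers and sums ResultAvailableAfter into the closing marker. It is Python for illustration only; the dictionary layout and the function name bookend_log are assumptions, and the sample timings are made-up values.

```python
from typing import Dict, List


def bookend_log(entries: List[Dict]) -> List[Dict]:
    """Wrap per-transaction entries in BEGIN SCRIPT / END SCRIPT markers."""
    # The END SCRIPT marker carries the sum of all transaction timings.
    total = sum(entry.get("ResultAvailableAfter", 0) for entry in entries)
    begin = {"section": "BEGIN SCRIPT"}
    end = {"section": "END SCRIPT", "ResultAvailableAfter": total}
    return [begin, *entries, end]


# Hypothetical entries with made-up timings.
log = bookend_log([
    {"section": "Create constraints", "ResultAvailableAfter": 12},
    {"section": "Main import", "ResultAvailableAfter": 30},
])
print(log[-1])
```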

All entries for a particular script execution will be tied together with a relationship: -[:PART_OF_SCRIPT_EXECUTION]-. When the wrapper completes the execution, it supplies some example cypher queries to return the error logging for that execution.

This gave me a method to quickly run batches of .cypher code against a neo4j database, and determine if I generated any exceptions, and log metadata to track trends for code sections.

GRAPH-COMMIT Installation procedure:

PRE-REQUISITES:

Windows PC or server
An existing neo4j database installation. If you need help visit the Configuring Neo4j server post.
Powershell v5.0 or newer
DotNET neo4j driver (script will install if it is missing)

Script Repositories:

Github repo: pdrangeid/graph-commit
Please review github code before running it in your environment! Be safe folks!

powershell

set-executionpolicy remotesigned -scope Process

Run the following commands to download the scripts from the git repositories:
POWERSHELL

[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
$client = new-object System.Net.WebClient
$client.DownloadFile("https://raw.github.com/pdrangeid/graph-commit/master/update-modules.ps1","$Env:Temp\update-modules.ps1")
. "$Env:Temp\update-modules.ps1"

The 'update-modules.ps1' script can be used to update graph-commit scripts automatically (or scripts from other git repositories). I use it to automatically update my own operational scripts via windows scheduled tasks.

Configure your Neo4j datasource by providing the servername and credentials. This will also verify that you have the DotNet neo4j driver installed (will prompt to install for you if it is missing)

POWERSHELL

cd "$env:Programfiles\Blue Net Inc\Graph-Commit"
.\set-regcredentials.ps1 -credname myn4jserver -n4j

If you haven't already installed the Neo4j DotNet driver, click no, and then answer YES when prompted to install it for you.

You will be prompted within the powershell to confirm installation of the Neo4j driver.
If it is successfully installed you will next be prompted to verify the logical name for the Neo4j datasource to be stored:

Next provide the address for your Neo4j server. Neo4j's binary protocol is bolt, and defaults to TCP port 7687. You can customize the port, and use either a name or an ip address.
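
To show what the address can look like in its different forms, here is a small Python sketch that normalizes a server name or ip address into a full bolt address, filling in the default port 7687 when none is given. It is an illustration only and is not taken from set-regcredentials.ps1; the function name normalize_bolt_address is hypothetical.

```python
from urllib.parse import urlsplit

DEFAULT_BOLT_PORT = 7687  # Neo4j's default bolt port


def normalize_bolt_address(address: str) -> str:
    """Return scheme://host:port, adding bolt:// and the default port if missing."""
    if "://" not in address:
        address = "bolt://" + address
    parts = urlsplit(address)
    port = parts.port or DEFAULT_BOLT_PORT
    return f"{parts.scheme}://{parts.hostname}:{port}"


print(normalize_bolt_address("myn4jserver"))
print(normalize_bolt_address("10.0.0.5:7688"))
```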

If the Neo4j server was able to be verified you will be asked to provide logon credentials.

If your credentials are correct you should see something like this:

You are now ready to run your .cypher scripts via powershell.

Review the get-cypher-results.ps1 script for more detail, but the command line works as such:

.\get-cypher-results.ps1 -Datasource 'yourdatasource' -cypherscript 'path-to-cypher-source-code' -creds1 'mycredname' -findrep '{json string for find/replace pairs}'
There are examples of using the get-cypher-results powershell wrapper for Cypher in my other posts.

Wednesday, November 20, 2019

Create Veeam Graph

via the Veeam Backup Enterprise Manager webAPI

Problem:

How can I identify VMs that were never properly configured for backups, or that somehow aren't being backed up at the frequency intended?

Solution:

Create a knowledge graph with data from my Veeam backup servers in order to verify that backups were configured and running for intended VMs.

For example: The data could be compared via query against data in your IT Asset Management that defines which machines are supposed to be protected.

You may find it helpful to also have your VMware data within your graph. Here's how to do that.

The schema for this graph is fairly simple:

But enough of all that, on to the instructions!

Prerequisites:

Already installed the graph-commit project

https://github.com/pdrangeid/graph-commit
run on workstation that has connectivity to the Veeam Backup Enterprise Manager webAPI
A Neo4j database server
User credentials to access the neo4j database instance
User credentials to access Veeam Backup Enterprise Manager webAPI
Neo4j /plugins:
Awesome Procedures on Cypher (APOC)
Optional: the vCenter graph project
https://github.com/pdrangeid/vmware-graph

Known Issues:

The scripts currently don't clean up after themselves. I'm still working out exactly how often to purge old job data. You could also just INIT the whole graph periodically.
This graph isn't intended to import ALL information about backups. It's focused on capturing the latest successful backup for each configured VM.

Installation: Steps (powershell)

Now download the scripts to run the veeam data ingester from the github repositories via powershell:
This will result in the scripts being downloaded into %programfiles%\blue net inc\Graph-Commit\

POWERSHELL

cd "$env:programfiles\blue net inc\graph-commit"
.\update-modules.ps1 -gitrepo pdrangeid/veeam-maint -gitfile purge-veeam.cypher
.\update-modules.ps1 -gitrepo pdrangeid/veeam-maint -gitfile init-veeam-wrapper.ps1
.\update-modules.ps1 -gitrepo pdrangeid/veeam-maint -gitfile init-veeam.cypher
.\update-modules.ps1 -gitrepo pdrangeid/veeam-maint -gitfile refresh-veeam.cypher
.\update-modules.ps1 -gitrepo pdrangeid/veeam-maint -gitfile refresh-veeam-last-backup.cypher

If this is the first time using your neo4j database with my scripts, you will need to identify your Neo4j server location and provide credentials.

This cmdlet will also verify you have the DotNET neo4j driver installed (The set-regcredentials cmdlet can install it automatically for you using the nuget package manager):
POWERSHELL

.\set-regcredentials.ps1 -credname myn4jserver -n4j

The prerequisites (Nuget, Neo4J dotNet driver) will be validated and prompted to be installed if missing. Once complete it will validate connectivity to your neo4j database instance. A successful result should look like this:

Now let's set our veeam credentials and store them in the registry. This will display a prompt for you to supply your veeam username and password. This data will be stored in:

HKEY_CURRENT_USER\Software\neo4j-wrapper\Credentials\yourveeamservername
The password will be stored as a securestring value, which can only be decrypted on this computer while logged in as the same user you are authenticated as now.

If successful you will see a message. You can also verify it in the registry:
POWERSHELL

.\set-regcredentials.ps1 -credname yourveeamservername -credpath "neo4j-wrapper\Credentials"

Let's test the script. By using the -sessionkey switch we indicate we don't want to run the script, but just authenticate to the VeeamAPI and return a session key to use.
POWERSHELL

.\init-veeam-wrapper.ps1 -baseapiurl http://yourveeamserver:9399/api -veeamcred myveeam -neo4jdatasource myn4jserver -sessionkey

If a proper session id was returned, the wrapper script was able to authenticate and retrieve a session key. Run the command again, omitting the -sessionkey switch and adding the -init switch, to run the script for real this time.

POWERSHELL

.\init-veeam-wrapper.ps1 -baseapiurl http://yourveeamserver:9399/api -veeamcred myveeam -neo4jdatasource myn4jserver -init

What does -init do?
The -init switch runs the initial ingestion of Veeam backups. It also takes the longest.

a) Creates (:Veeamserver) nodes
b) Creates (:Veeamjob) nodes, and relates them to their (:Veeamserver)

c) Creates (:Veeamprotectedvm) nodes (these are all the VMs that Veeam is aware of)

d) Finally it locates restore points to discover the MOST RECENT restore point for each (:Veeamprotectedvm)
Discovery is performed from most recent through 32 days old. Once a valid restore point is discovered it stops searching for that VM (remember, we're just trying to validate the most recent valid restore point for each protected asset).
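
The discovery logic in step d) can be sketched like this. It is a Python illustration of the idea (newest first, stop at the first valid restore point, give up after 32 days), not the actual cypher from init-veeam.cypher. The field names created and valid, and the function name latest_valid_restore_point, are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional

MAX_AGE_DAYS = 32  # discovery window described above


def latest_valid_restore_point(points: Iterable[dict], now: datetime) -> Optional[dict]:
    """Return the most recent valid restore point within the window, or None."""
    cutoff = now - timedelta(days=MAX_AGE_DAYS)
    # Walk the restore points from newest to oldest.
    for point in sorted(points, key=lambda p: p["created"], reverse=True):
        if point["created"] < cutoff:
            break  # everything from here on is older than the window
        if point["valid"]:
            return point  # stop at the first (most recent) valid restore point
    return None


now = datetime(2019, 11, 20)
points = [
    {"created": datetime(2019, 11, 19), "valid": False},
    {"created": datetime(2019, 11, 18), "valid": True},
    {"created": datetime(2019, 11, 1), "valid": True},
]
print(latest_valid_restore_point(points, now))
```

Stopping at the first hit is what keeps the import focused on the latest good backup rather than the full backup history.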

If you have multiple Veeam backup servers, be sure to run the -init process for any additional Veeam API endpoints.
Now we want to put the Veeam backups into "buckets" identifying how recently each VM was backed up:

POWERSHELL

$scriptpath = "$env:programfiles\blue net inc\graph-commit\get-cypher-results.ps1"
$csp="$env:programfiles\blue net inc\graph-commit\refresh-veeam-last-backup.cypher"
. $scriptPath -Datasource 'myn4jserver' -cypherscript $csp -logging 'myn4jserver'
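
For readers wondering what "buckets" means here, the following Python sketch assigns a backup to an age bucket. The bucket boundaries and labels below are assumptions made up for this illustration; the real ones are defined in refresh-veeam-last-backup.cypher and may be different. The function name backup_bucket is hypothetical.

```python
from datetime import datetime
from typing import Optional

# Hypothetical bucket boundaries (in days) and labels, for illustration only.
BUCKETS = [(1, "within 1 day"), (7, "within 7 days"), (32, "within 32 days")]


def backup_bucket(last_backup: Optional[datetime], now: datetime) -> str:
    """Return the label of the first bucket the backup age fits into."""
    if last_backup is None:
        return "no recent backup"
    age_days = (now - last_backup).total_seconds() / 86400
    for limit, label in BUCKETS:
        if age_days <= limit:
            return label
    return "no recent backup"


now = datetime(2019, 11, 20, 12, 0)
print(backup_bucket(datetime(2019, 11, 20, 1, 0), now))
print(backup_bucket(datetime(2019, 11, 10), now))
```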

Finally, you can now run the lighter-weight "refresh" script periodically (I run it hourly).
You only need to re-run the "init" script if you want to purge the data and start over.

POWERSHELL

$scriptpath = "$env:programfiles\blue net inc\graph-commit\get-cypher-results.ps1"
$csp="$env:programfiles\blue net inc\graph-commit\refresh-veeam-last-backup.cypher"
. $scriptPath -Datasource 'myn4jserver' -cypherscript $csp -logging 'myn4jserver'

Review the Veeam data that was imported. Here are some sample cypher queries that will present an explorable graph:
CYPHER

// SHOW veeam backups
MATCH (lgb:Lastgoodbackup)
MATCH (lgb)--(vvm:Veeamprotectedvm)
return lgb,vvm

Show specific Job information:

CYPHER

// Show jobs, backups, VMs, and lastgoodbackup for any jobs with 'exchange' in the job name
MATCH (vs:Veeamserver)--(vj:Veeamjob)--(vb:Veeambackup) where toLower(vj.name) contains 'exchange'
OPTIONAL MATCH (vb)--(vvm:Veeamprotectedvm)--(lgb:Lastgoodbackup)
return vs,vj,vb,vvm,lgb

Tuesday, November 12, 2019

Create vCenter Graph

Import vCenter infrastructure into a knowledge graph using Neo4j

Yes, I could have directly queried the VMware WebAPI, but dealing with self-signed certificates and discovering all the API queries would have been a LOT of work. RVTools conveniently already gathers ALL the data I'm looking for and exports it into a single Excel file, which makes this process quite a bit easier.

When complete this process will create the following database schema in your neo4j database:

Prerequisites:

Already installed the graph-commit project
https://github.com/pdrangeid/graph-commit
run on workstation that has connectivity to the vCenter appliance
A Neo4j database server
User credentials to access the neo4j database instance
User credentials to access vCenter
Neo4j /plugins:
Excel documents (see the neo4j server installation post)
Awesome Procedures on Cypher (APOC)
RVTools which can be downloaded from Robware: https://www.robware.net/rvtools/download/
Several scripts outlined below from the vmware-graph project
https://github.com/pdrangeid/vmware-graph

Known Issues

Only tested against vCenter clusters (not standalone vsphere host output)
The script only builds Standard vSwitch and ports/portgroups.
distributed virtual switches and ports ARE present in the .xls data export, but the .cypher will need modifications to properly map DV objects.

Installation: Steps (powershell)

Now download the script files to run the VMware data collector from the github repositories
POWERSHELL

cd "$env:programfiles\blue net inc\graph-commit"
.\update-modules.ps1 -gitrepo pdrangeid/vmware-graph -gitfile refresh-vmware.cypher

If this is the first time using your neo4j database with my scripts, you will need to identify your Neo4j server location and provide credentials. This cmdlet will also verify you have the DotNET neo4j driver installed (The set-regcredentials cmdlet can install it automatically for you using the nuget package manager)
POWERSHELL

.\set-regcredentials.ps1 -credname myneo4jserver -n4j

First let's generate your output file from rvtools.
The example below assumes we will use passthru authentication for the vCenter server. Review the RVTools documentation for specifying credentials.
The resulting excel document will be placed in the import subfolder within the neo4j installation path (adjust this for your environment)

POWERSHELL

[string] $RVToolsPathexe = ${env:ProgramFiles(x86)}+"\Robware\RVTools\RVTools.exe"
$Arguments = " -passthroughAuth -s fqdn.yourvcenterserver.com -c ExportAll2xlsx -d c:\neo4j-community-3.5.12\import -f fqdn.yourvcenterserver.com.xlsx"
$Process = Start-Process -FilePath $RVToolsPathExe -ArgumentList $Arguments -NoNewWindow -Wait

If all went well you should have your vcenter environment exported into the excel document in c:\neo4j-community-3.5.12\import

Now we want to run the import process to ingest the data into the graph.

The $findstring variable is used to perform a find/replace on the placeholder (in the .cypher script you downloaded earlier) for the path/file to your excel document.

Replace 'myneo4jserver' with the name of the neo4j datasource credential you used with set-regcredentials.ps1 earlier.

POWERSHELL

cd "$env:programfiles\blue net inc\graph-commit"
$scriptpath = -join ($env:ProgramFiles,"\blue net inc\graph-commit\get-cypher-results.ps1")
$findstring='{"path-to-vmware-import-file":"file:///c:/neo4j-community-3.5.12/import/fqdn.yourvcenterserver.com.xlsx"}'
$csp=$(-join ($env:programfiles,"\blue net inc\graph-commit\refresh-vmware.cypher"))
$result = . $scriptPath -Datasource 'myneo4jserver' -cypherscript $csp -logging 'myneo4jserver' -findrep $findstring
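
The $findstring value above is just a json object mapping the placeholder to a file:/// URL. As an illustration of how that string relates to the Windows path of the export, here is a small Python sketch that builds it. The function name build_findrep is hypothetical, and the conversion rule (backslashes to forward slashes, percent-encoding of special characters) is an assumption about what the import expects.

```python
import json
from urllib.parse import quote


def build_findrep(placeholder: str, windows_path: str) -> str:
    """Build a json find/replace string mapping a placeholder to a file:/// URL."""
    # Convert backslashes to forward slashes and percent-encode anything unsafe.
    url_path = quote(windows_path.replace("\\", "/"), safe="/:")
    return json.dumps({placeholder: "file:///" + url_path})


print(build_findrep(
    "path-to-vmware-import-file",
    r"c:\neo4j-community-3.5.12\import\fqdn.yourvcenterserver.com.xlsx",
))
```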

A successful import will cycle through the transactions and give you log queries to validate:

Use the Neo4j browser: http://your-neo4jserver:7474
Login with your credentials
Review the cypher logs (run the log queries that were output from the script execution above)
Review the VMware data that was imported.
Here are some sample cypher queries that will present an explorable graph:

CYPHER

// SHOW vcenter, datacenter, cluster, folders and resource groups:
MATCH (vc:Vcenterserver)
MATCH (vc)--(vdc:Vspheredatacenter)
MATCH (vc)--(vcc:Vcentercluster)
WITH *,'/'+vdc.name as startpath
OPTIONAL MATCH (vf:Vfolder) where vf.path starts with startpath
OPTIONAL MATCH (vrp:Vresourcepool) where vrp.path starts with startpath
WITH *
MATCH (vm:Virtualmachine) where (vm)--(vf) or (vm)--(vrp) or (vm)--(vcc) or (vm)--(vdc)
return vc,vdc,vcc,vf,vrp,vm

DNS and NTP query:
CYPHER

// SHOW vSphereHosts DNS,NTP, and vCenter relationships
MATCH (vh:Vspherehost)
OPTIONAL MATCH (vh)--(ds:Dnsserver)
OPTIONAL MATCH (vh)--(ns:Ntpserver)
OPTIONAL MATCH (vh)--(vc:Vcenterserver)
return vh,ds,ns,vc

vSphere Hosts and datastores:
CYPHER

// SHOW vSpherehost datastores, types, and vcenter
MATCH (vh:Vspherehost)
OPTIONAL MATCH (vh)--(ds:Vdatastore)
OPTIONAL MATCH (ds)--(dst:Vdatastoretype)
OPTIONAL MATCH (vh)--(vc:Vcenterserver)
return vh,ds,dst,vc

vSwitch, Portgroups, and Loadbalancing policies:
CYPHER

// SHOW vSwitch portgroups, and lbpolicies
MATCH (vh:Vspherehost)
OPTIONAL MATCH (vh)--(vs:Vswitch)
OPTIONAL MATCH (vs)--(vlbp:Vlbpolicy)
OPTIONAL MATCH (vpg:Vportgroup)
OPTIONAL MATCH (vhpg:Vhostportgroup)--(vpg)
RETURN vh,vs,vpg,vhpg,vlbp

Configuring Neo4j server

Configuring Neo4j server:

Yes, there are plenty of tutorials for setting up Neo4j already, but I wanted to focus on a few settings that make it easier to use with data integration.

This tutorial will focus on Neo4j server for windows. It is NOT very complicated, and you'll be up and running in no time flat:

What you need before you begin

System Requirements:

Windows PC or server
JAVA (OpenJDK/Oracle/IBM Java) 8 or greater
Neo4j Server or Desktop Edition. For these instructions I used the community edition, which you can download from https://neo4j.com/download-center/#community

Extract the archive into the folder you want to be your installation folder

Use an editor to modify /conf/neo4j.conf

Setting	Action	Description
#dbms.directories.import=import	Comment out	Allows custom imports on-demand
apoc.import.file.enabled=true	Add	Allows APOC file imports
#dbms.memory.heap.initial_size=5g #dbms.memory.heap.max_size=5g #dbms.memory.pagecache.size=7g	Customize memory	Memory configuration will depend on how large your graphDB will become. Here's a good primer: https://neo4j.com/docs/operations-manual/current/tools/neo4j-admin-memrec/
dbms.connectors.default_listen_address=0.0.0.0	Uncomment	Allows non-local connections

Configure Plugins:

Plugin	Description	Download URL	Notes
APOC library	Don't ask, just install it. No really! You want this.	https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/	Install binary .jar into /plugins folder
MSSQL JDBC	Add this if you need to connect to MS SQL	https://go.microsoft.com/fwlink/?linkid=2137600	Extract mssql-jdbc-7.x.x.jre8.jar into /plugins folder
Excel (multiple file formats)	To support import from these formats download the dependencies	https://repo1.maven.org/maven2/org/apache/poi/poi/4.1.2/poi-4.1.2.jar https://repo1.maven.org/maven2/org/apache/poi/poi-ooxml/4.1.2/poi-ooxml-4.1.2.jar https://repo1.maven.org/maven2/org/apache/poi/poi-ooxml-schemas/4.1.2/poi-ooxml-schemas-4.1.2.jar https://repo1.maven.org/maven2/org/apache/xmlbeans/xmlbeans/3.1.0/xmlbeans-3.1.0.jar https://repo1.maven.org/maven2/com/github/virtuald/curvesapi/1.06/curvesapi-1.06.jar	Place these .jar files into the /plugins folder
Advantage Database	To connect to Advantage (sybase) SQL via JDBC	http://devzone.advantagedatabase.com/dz/content.aspx?Key=20&Release=19&Product=12	This is to support the CRM I use (CommitCRM)

Configure Windows Service

Neo4j should be configured to run as a Windows service. Launch a command shell, and install the service from within the /bin folder

neo4j install-service

If you are upgrading from an older version, you will need to first unregister the service for the old version:

neo4j uninstall-service

Set Initial Password

neo4j-admin set-initial-password mysupersecretpassword

Start the service

sc start neo4j

(or use the service control panel)

That's it! You should be ready to go with a neo4j server that's ready to connect to SQL Server, import from CSV/XLS files, and you will have the APOC library plugins at your disposal!
