Transforming API output for Relational Databases

Intended Audience: Python Programmers who wish to use the Axonius Python API Client to pull data from Axonius into a relational database. 

Pre-Requisites: Users need to have installed and setup the Axonius Python API Client.

Purpose: To provide a walkthrough of the useful functions offered by the Axonius Python API Client that assist in transforming the JSON based output into a normalized structure usable by RDBMS.

 

Introduction

Axonius contains a huge array of information that are neatly grouped into json objects (referred to as "complex objects") for easy consumption by the UI and JSON databases. However, for Relational Databases, the JSON based orientation does not fit neatly, each JSON record may contain fields with multiple entries where SQL expects only 1 value, passing SQL multiple values for 1 field will at best result in an error, at worst, end in field overflows.

The Client API provides an array of export functions from CSV, to tabular format to JSON, however to meet the requirements specific to relational databases a series of transformations must take place rather than a singular command. In order to overcome the gap between JSON and RDBMS accepted formats, the following must occur:

  1. complex objects will need to be flattened and simplified
  2. records which contain multiple values will need to be duplicated for each value in order to prevent data loss.

Approach

To achieve the required format, the Axonius API client offers two very powerful commands that when combined, massage the data in a way that may be leveraged by a relational database:

  1. field_flatten : This command takes a complex object and creates multiple fields according to its contents. For example the standard format for installed software appears as such within a single field:
  2. field_explode : for specific fields in axonius, each record returned by the API may contain tens to hundreds to thousands of entries within that single field. SQL however expects each field to contain only 1 entry per record. In order to format fields that contain lists into something more palatable for SQL, we use the explode function to duplicate the record for each entry in the list of 1 specific field.

 

field_flatten

To illustrate how field_flatten works, we will focus on the installed_software field. This contains a list of objects and is used in a large amount of use cases and is a complex field with around 7 attributes. If we wish to access each of those fields we will need to extract each attribute into its own column.

Normal output

Command Example: devices.get(fields = ["specific_data.data.installed_software"])

"specific_data.data.installed_software": [
{
     "description": "Info-ZIP Zip for Win32 console",
     "name": "Zip",
     "name_version": "Zip-3.0.0.0",
     "path": "C:\\Users\\Administrator\\AppData\\Local\\BigFix\\Enterprise Console\\localhost\\EvaluationUser\\Sites\\Software Distribution\\zip.exe",
     "vendor": "Info-ZIP",
     "version": "3.0.0.0",
     "version_raw": "000000003000000000000000000000000"
   }
]

Using field_flatten

Command Example: devices.get(fields = ["specific_data.data.installed_software"], field_flatten = True)  

"specific_data.data.installed_software.description": "Info-ZIP Zip for Win32 console",
"specific_data.data.installed_software.name": "Zip",
"specific_data.data.installed_software.name_version": "Zip-3.0.0.0",
"specific_data.data.installed_software.path": "C:\\Users\\Administrator\\AppData\\Local\\BigFix\\Enterprise Console\\localhost\\EvaluationUser\\Sites\\Software Distribution\\zip.exe",
"specific_data.data.installed_software.vendor": "Info-ZIP",
"specific_data.data.installed_software.version": "3.0.0.0",
"specific_data.data.installed_software.version_raw": "000000003000000000000000000000000

In this example we can see that, for instances where we have an object field, we can extract the contents of an object field into their own unique columns which are much more palatable for SQL based DBs. 

However, this flattening does not work where the field contains a list of objects. Flattening a list will end up with the expected columns however it will fill each column with a list of values for that column, so not quite there yet. So to get this to where we need it to be, we need some way to duplicate the information outside the complex object in order to achieve the goal.

 

field_explode

Using the same example with a little more data we can demonstrate how field_explode assists with data manipulation and helps make up for the missing puzzle piece left by field flatten. Examining the installed software field we can see that each device may have more than one piece of software installed, so the field will contain multiple values. In order to extract out this information we will need to be able to create a record for each device and software combination.

Standard output

Command Example: devices.get(fields = ["specific_data.data.installed_software"]) 

"internal_axon_id": "1286b9913520845bb9f644e93755760e",
"specific_data.data.installed_software": [
{
"description": "Info-ZIP Zip for Win32 console",
"name": "Zip",
"name_version": "Zip-3.0.0.0",
"path": "C:\\Users\\Administrator\\AppData\\Local\\BigFix\\Enterprise Console\\localhost\\EvaluationUser\\Sites\\Software Distribution\\zip.exe",
"vendor": "Info-ZIP",
"version": "3.0.0.0",
"version_raw": "000000003000000000000000000000000"
},
{
"description": "Google Update Setup",
"name": "Google Update",
"name_version": "Google Update-1.3.36.82",
"path": "C:\\Windows\\Temp\\{D46FC7F8-0387-46CF-A273-12C8D0454987}-GoogleUpdateSetup.exe",
"vendor": "Google LLC",
"version": "1.3.36.82",
"version_raw": "000000001000000030000003600000082"
}]

We can see two objects in the same installed_software field and a unique ID. For SQL, we would expect to see 2 rows with 2 different entries under the software fields but the same hostname. by applying the following command we end up with that result.

Using Field_Explode

Command Example : devices.get(fields = ["specific_data.data.installed_software"], field_explode = 'installed_software')

Row 1

"internal_axon_id": "1286b9913520845bb9f644e93755760e",
"specific_data.data.installed_software": 
{
"description": "Info-ZIP Zip for Win32 console",
"name": "Zip",
"name_version": "Zip-3.0.0.0",
"path": "C:\\Users\\Administrator\\AppData\\Local\\BigFix\\Enterprise Console\\localhost\\EvaluationUser\\Sites\\Software Distribution\\zip.exe",
"vendor": "Info-ZIP",
"version": "3.0.0.0",
"version_raw": "000000003000000000000000000000000"
}

Row 2

"internal_axon_id": "1286b9913520845bb9f644e93755760e",
"specific_data.data.installed_software":
{
"description": "Google Update Setup",
"name": "Google Update",
"name_version": "Google Update-1.3.36.82",
"path": "C:\\Windows\\Temp\\{D46FC7F8-0387-46CF-A273-12C8D0454987}-GoogleUpdateSetup.exe",
"vendor": "Google LLC",
"version": "1.3.36.82",
"version_raw": "000000001000000030000003600000082"
}

 Using the explode feature we can see now that there are 2 rows, both with the unique ID (internal_axon_id) and a different installed_software entry. This eliminates the issue of data being lost by being lumped together in a single record.

Combining it all together

Using both commands together we can now flatten out the object into separate fields, allowing us to get at the data contained within each object, as well as exploding out the lists, allowing us to see all list items on a separate row. This can be performed all in the same call without additional steps on the client side.

Standard Output

Command Example: devices.get(fields = ["specific_data.data.installed_software"]) 

"internal_axon_id": "1286b9913520845bb9f644e93755760e",
"specific_data.data.installed_software": [
{
"description": "Info-ZIP Zip for Win32 console",
"name": "Zip",
"name_version": "Zip-3.0.0.0",
"path": "C:\\Users\\Administrator\\AppData\\Local\\BigFix\\Enterprise Console\\localhost\\EvaluationUser\\Sites\\Software Distribution\\zip.exe",
"vendor": "Info-ZIP",
"version": "3.0.0.0",
"version_raw": "000000003000000000000000000000000"
},

{
"description": "Google Update Setup",
"name": "Google Update",
"name_version": "Google Update-1.3.36.82",
"path": "C:\\Windows\\Temp\\{D46FC7F8-0387-46CF-A273-12C8D0454987}-GoogleUpdateSetup.exe",
"vendor": "Google LLC",
"version": "1.3.36.82",
"version_raw": "000000001000000030000003600000082"
}]



Output using both Explode and Flatten

Example Command: devices.get(fields=['specific_data.data.installed_software'], field_flatten = True, field_explode = 'installed_software')

Row 1

"internal_axon_id": "1286b9913520845bb9f644e93755760e",
"specific_data.data.installed_software.description": "Info-ZIP Zip for Win32 console",
"specific_data.data.installed_software.name": "Zip",
"specific_data.data.installed_software.name_version": "Zip-3.0.0.0",
"specific_data.data.installed_software.path": "C:\\Users\\Administrator\\AppData\\Local\\BigFix\\Enterprise Console\\localhost\\EvaluationUser\\Sites\\Software Distribution\\zip.exe",
"specific_data.data.installed_software.vendor": "Info-ZIP",
"specific_data.data.installed_software.version": "3.0.0.0",
"specific_data.data.installed_software.version_raw": "000000003000000000000000000000000"

 

Row 2

"specific_data.data.internal_axon_id": "1286b9913520845bb9f644e93755760e",
"specific_data.data.installed_software.name": "Google Update",
"specific_data.data.installed_software.version": "1.3.36.82",
"specific_data.data.installed_software.name_version": "Google Update-1.3.36.82",
"specific_data.data.installed_software.major_version": null,
"specific_data.data.installed_software.major_minor_version": null,
"specific_data.data.installed_software.architecture": null,
"specific_data.data.installed_software.description": "Google Update Setup",
"specific_data.data.installed_software.category": null,
"specific_data.data.installed_software.vendor": "Google LLC",
"specific_data.data.installed_software.publisher": null,
"specific_data.data.installed_software.cve_count": null,
"specific_data.data.installed_software.sw_license": null,
"specific_data.data.installed_software.path": "C:\\Windows\\Temp\\{D46FC7F8-0387-46CF-A273-12C8D0454987}-GoogleUpdateSetup.exe",
"specific_data.data.installed_software.source": null,
"specific_data.data.installed_software.installed_on": null



Handling Multiple Complex Objects

When you are hoping to read through 2 or more complex objects (say you wish to bring in CVE's alongside your software associations), this method will require some additional effort. The field_explode function only works on individual fields at one time, so you cannot explode two separate fields at the same time.

To accomplish this, determine the complex fields you wish to extract and use a loop to call the API for each complex object individually. Each object alongside the internal_axon_id can be used to create several tables with a common primary key (note: internal_axon_id is always unique for every device). 

Example:

complex_obj_list = ['specific_data.data.installed_software', 'specific_data.data.software_cves']

for item in complex_obj_list:
  api_output = client.devices.get(fields=[item], field_flatten = True, field_explode=item)
  
#Process each table here:

  print(api_output[0])

 

Wrapping Up & Further Documentation

Using this approach, you are left with a flat python dict file that can be directly ingested via any python SQL library. This allows you to bring through any complex field from Axonius and conform it to standard relational database forms (1 field of 1 row contains 1 value). This allows for the construction of custom queries and visualizations using the relational format which is widely adopted.

Useful Axonius API Documentation

Axonius Python API Client Github: Link

Axonius Python API Client Documentation:  Link

List of Export Callbacks (Including field_explode, and field_flatten) - Link

CSV exports - Link 

JSON exports -  Link 

Python Libraries for SQL:

SQL Alchemy - Link

SQLite3 Connector - Link

MySQL Connector - Link



0

Comments

0 comments

Please sign in to leave a comment.

Didn't find what you were looking for?

New post