Best Practices: MID Server Administration

Below are some of my thoughts and advice on MID Server configuration, setup and maintenance. Some of these practices are well known and widely adopted; others are based on my own experience.

  1. Basics

    a. Sizing / Server Specs

    Specification-wise, RAM and CPU core count (and the associated threads) matter most. The usual recommendation is 8 GB of RAM, a 4-core CPU and 40+ GB of disk, on a 64-bit OS, which is a requirement being enforced going forward. The MID Server parameters include options to use more RAM, more threads, and so on. Allocating more RAM is generally recommended; with more threads, results can vary.
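    As a quick pre-install sanity check, the recommended minimums above can be encoded in a small function. This is a minimal sketch in JavaScript (Node.js). The function names and the input shape are my own assumptions, not anything ServiceNow ships. Disk size has to be passed in because Node's os module doesn't report it.

```javascript
const os = require("os");

// Recommended MID Server host minimums, taken from the guidance above.
const MID_MINIMUMS = { ramGb: 8, cpuCores: 4, diskGb: 40 };

// Hypothetical helper: compare a host's specs to the minimums and return
// a list of human-readable shortfalls (an empty list means the host passes).
function checkMidHostSpecs(host) {
  const problems = [];
  if (host.ramGb < MID_MINIMUMS.ramGb) {
    problems.push(`RAM ${host.ramGb} GB is below ${MID_MINIMUMS.ramGb} GB`);
  }
  if (host.cpuCores < MID_MINIMUMS.cpuCores) {
    problems.push(`${host.cpuCores} CPU cores is below ${MID_MINIMUMS.cpuCores}`);
  }
  if (host.diskGb < MID_MINIMUMS.diskGb) {
    problems.push(`Disk ${host.diskGb} GB is below ${MID_MINIMUMS.diskGb} GB`);
  }
  if (!host.is64Bit) {
    problems.push("OS is not 64-bit");
  }
  return problems;
}

// Describe the machine this script runs on. Node's os module does not
// report disk size, so the caller supplies it.
function describeLocalHost(diskGb) {
  return {
    // Rounded, because the OS reports slightly less than the installed RAM.
    ramGb: Math.round(os.totalmem() / 1024 ** 3),
    // os.cpus() lists logical processors (hardware threads), so on machines
    // with hyper-threading this overstates physical cores.
    cpuCores: os.cpus().length,
    diskGb,
    // os.arch() is the architecture Node was built for; a 64-bit value is a
    // reasonable proxy for a 64-bit OS.
    is64Bit: ["x64", "arm64", "ppc64", "s390x"].includes(os.arch()),
  };
}
```

    On a candidate server, `checkMidHostSpecs(describeLocalHost(60))` would list anything that falls short of the recommended minimums. Because of the logical-versus-physical core caveat, treat the CPU result as a rough signal.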

    b. Network (Bandwidth and Location)

    Network access is the key reason MID Servers are needed in the first place: they are the appliance that lets you securely interact with other resources and devices on a network. It's important to have a fast connection (100 Mbps upload is the usual recommendation) and to place the MID Server physically close to the devices it interacts with, so that interactions are quick and efficient. If you have multiple data centers, spread the MID Servers across them, and place each one on the same or similar subnets as the majority of the devices it serves.
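    To make the placement advice concrete, here is a minimal sketch of picking the MID Server that sits on the same subnet as a target device. It is JavaScript (Node.js) and handles IPv4 only. The MID Server names, subnets and helper names are made-up examples. This is not ServiceNow's own MID Server selection logic.

```javascript
// Convert a dotted-quad IPv4 address to an unsigned integer (0 to 2^32 - 1).
function ipv4ToInt(ip) {
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(ip)) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  const octets = ip.split(".").map(Number);
  if (octets.some((o) => o > 255)) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  return octets.reduce((acc, o) => acc * 256 + o, 0);
}

// True if `ip` falls inside the CIDR block, e.g. "10.1.0.0/16".
function inSubnet(ip, cidr) {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error(`Invalid prefix length in ${cidr}`);
  }
  // Plain arithmetic instead of bitwise ops, which are signed 32-bit in JS.
  const blockSize = 2 ** (32 - bits);
  const start = Math.floor(ipv4ToInt(base) / blockSize) * blockSize;
  const addr = ipv4ToInt(ip);
  return addr >= start && addr < start + blockSize;
}

// Hypothetical inventory: each MID Server and the subnets it sits on.
const midServers = [
  { name: "mid-dc1-01", subnets: ["10.1.0.0/16"] },
  { name: "mid-dc2-01", subnets: ["10.2.0.0/16", "192.168.50.0/24"] },
];

// Return the name of the first MID Server on the device's subnet, or null.
function pickMidServer(deviceIp, mids) {
  const match = mids.find((mid) => mid.subnets.some((cidr) => inSubnet(deviceIp, cidr)));
  return match ? match.name : null;
}
```

    For example, `pickMidServer("10.2.14.9", midServers)` returns `"mid-dc2-01"`. A device on `172.16.0.1` returns `null`, which signals a network segment that no nearby MID Server covers.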

  2. Segregate by Purpose

    a. History

    The biggest flaw I've consistently seen in enterprise MID Server setups is that there is no separation between Discovery and Orchestration usage. Many Discovery credentials are server admin accounts, or are otherwise granted powerful rights to log into machines and run discovery commands. There are a couple of ways to force Orchestration tasks to use specific credentials: credential tags, and hardcoding the relationship between a MID Server and the IP of the target Orchestration device. However, you can't restrict the Discovery credentials so they are used only for Discovery, which can be a major security issue.

    b. Resolution

    The only guaranteed way to lock down those credentials is to restrict them so they can only be used on specific MID Servers. On those MID Servers, you can then choose not to allow the Orchestration application. That is why it's critical to have separate MID Servers for Discovery, Orchestration, JDBC and other activities: it is how proper credential usage gets enforced.
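    The reasoning above can be modeled as two independent gates. The credential must be allowed on the MID Server, and the MID Server must allow the requesting application. This is a conceptual sketch in JavaScript. The object shapes, names and functions are hypothetical and do not represent ServiceNow's data model or API. It simply shows why a Discovery credential pinned to a Discovery-only MID Server cannot be used for Orchestration work, while an unrestricted one can.

```javascript
// Hypothetical MID Servers, each limited to certain applications.
const mids = {
  "mid-discovery-01": { allowedApps: ["Discovery"] },
  "mid-orch-01": { allowedApps: ["Orchestration"] },
  "mid-jdbc-01": { allowedApps: ["JDBC"] },
};

// Hypothetical credentials. An empty allowedMids list means "any MID Server",
// i.e. the unrestricted case described above.
const credentials = {
  "discovery-admin": { allowedMids: ["mid-discovery-01"] },
  "unrestricted-admin": { allowedMids: [] },
};

// A credential is usable only if BOTH gates pass:
// 1. the credential is allowed on this MID Server, and
// 2. this MID Server allows the application making the request.
function canUseCredential(credName, midName, app) {
  const cred = credentials[credName];
  const mid = mids[midName];
  if (!cred || !mid) return false;
  const credAllowedHere = cred.allowedMids.length === 0 || cred.allowedMids.includes(midName);
  const appAllowedHere = mid.allowedApps.includes(app);
  return credAllowedHere && appAllowedHere;
}

// List every MID Server on which `app` could use the credential.
function midsUsableFor(credName, app) {
  return Object.keys(mids).filter((midName) => canUseCredential(credName, midName, app));
}
```

    With this setup, `midsUsableFor("discovery-admin", "Orchestration")` returns an empty list. By contrast, `midsUsableFor("unrestricted-admin", "Orchestration")` returns `["mid-orch-01"]`, which is the exposure that separating MID Servers by purpose is meant to close.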

  3. Security

    Review the MID Server Hardening Guide.

    It contains a number of often-overlooked recommendations for setting specific security parameters, such as disabling SSL, TLS 1.0 and TLS 1.1, setting the DH group value, and encrypting proxy passwords (if applicable).

  4. Closing Thoughts

    Beyond the initial setup, every administrator knows you have to keep current with upgrades, restarts, and rekeying credentials. To go a step further than those standard activities, you can set up proactive monitoring, using either the built-in MID Server resource threshold alerts or more advanced tools like PerfMon, Microsoft SCOM or Datadog.
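    The idea behind a resource threshold alert is to alert when a metric stays above a limit for a sustained period, rather than on a single spike. Here is a minimal JavaScript sketch of that logic. The 90% threshold, the five-sample window and the function name are illustrative assumptions. They are not the built-in alert's actual settings or implementation.

```javascript
// Return true if the last `consecutive` samples are all above `threshold`.
// Requiring a sustained breach avoids alerting on momentary spikes.
function sustainedBreach(samples, threshold, consecutive) {
  if (samples.length < consecutive) return false;
  return samples.slice(-consecutive).every((value) => value > threshold);
}

// Example: CPU percentages sampled once per minute from a MID Server host.
const cpuSamples = [35, 40, 95, 38, 92, 93, 96, 94, 97];
const cpuAlert = sustainedBreach(cpuSamples, 90, 5); // last five: 92, 93, 96, 94, 97
```

    Here, the isolated spike to 95 would not trigger anything by itself. The five consecutive readings above 90 at the end do trigger it.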

I hope everyone learned something! I've been absent for a while, but hopefully we can start 2021 off right with lots of content. Please comment with your ideas or anything you'd like to see!

Monitoring Series: Research into ServiceNow Performance Dashboard

Below are some of my ramblings, thoughts and research into the ‘servlet performance metrics’ dashboard from ServiceNow, and how this custom UI page homepage really works.

[Screenshot: servlet performance metrics dashboard]

Observation #1 - The Front End is in the Perf Dashboard Plugin

This dashboard is part of the core system plugin 'Performance Dashboards' (com.glide.performance_dashboards), an extremely lightweight plugin that just houses the shell UI and a couple of supporting scripts.

Observation #2 - The Library used is JRobin (Derived from RRDtool)

Within the scripts, you can see data references to tables whose names start with jrobin. Those point to the JRobin plugin (RRDS), which is a Java implementation of the RRDtool system. They even kept the Round Robin heritage: the table labels start with 'Round Robin'. It's worth noting that none of these jrobin tables are visible out of the box (OOB). They are locked down to the maint role, so I had to go into each individual read ACL and add roles to be able to view them.

[Image: RRDtool]

Observation #3 - ServiceNow uses an RRDTool Database to Store Monitoring data

This leads to another discovery: all of this data is parsed from a Round Robin Database (RRD). Supporting tables in ServiceNow define the data refresh intervals (spoiler: they all refresh every 2 minutes) and hold information about the Round Robin Archive (RRA). I found a good introduction to RRD here.
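For readers new to RRD, the core idea is a fixed-size ring buffer. Each archive holds a set number of slots, one per step (for example, the 2-minute interval mentioned above). New data overwrites the oldest slot, so storage never grows. The JavaScript class below is a simplified illustration of that idea only; its name and methods are my own. Real RRDtool/JRobin archives also consolidate several primary data points into each row using a function such as AVERAGE, MIN, MAX or LAST. This sketch skips that step and just keeps the last value written in each step.

```javascript
// Simplified Round Robin Archive: a fixed number of slots, one per step.
class RoundRobinArchive {
  constructor(stepSeconds, rows) {
    this.stepSeconds = stepSeconds; // e.g. 120 for a 2-minute interval
    this.rows = rows;               // fixed capacity; storage never grows
    this.values = new Array(rows).fill(null);
    this.lastSlotTime = null;       // start time (seconds) of the newest slot
  }

  // Map a slot start time to a ring index (handles negative times too).
  indexFor(slotTime) {
    const n = slotTime / this.stepSeconds;
    return ((n % this.rows) + this.rows) % this.rows;
  }

  // Store a value for the step containing `timestampSeconds`.
  update(timestampSeconds, value) {
    const slotTime = Math.floor(timestampSeconds / this.stepSeconds) * this.stepSeconds;
    if (this.lastSlotTime !== null && slotTime < this.lastSlotTime) {
      throw new Error("RRD updates must move forward in time");
    }
    // Steps skipped since the last update have no data; mark them unknown.
    if (this.lastSlotTime !== null) {
      const gap = (slotTime - this.lastSlotTime) / this.stepSeconds - 1;
      const skipped = Math.min(gap, this.rows);
      for (let i = 1; i <= skipped; i++) {
        this.values[this.indexFor(this.lastSlotTime + i * this.stepSeconds)] = null;
      }
    }
    // Overwrites whatever was stored one full cycle ago (the oldest data).
    this.values[this.indexFor(slotTime)] = value;
    this.lastSlotTime = slotTime;
  }

  // Return [{ time, value }] from oldest to newest for the retained window.
  fetch() {
    if (this.lastSlotTime === null) return [];
    const out = [];
    for (let i = this.rows - 1; i >= 0; i--) {
      const time = this.lastSlotTime - i * this.stepSeconds;
      out.push({ time, value: this.values[this.indexFor(time)] });
    }
    return out;
  }
}
```

With a 120-second step and 720 rows, for example, an archive like this always holds exactly the last 24 hours of 2-minute samples.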

Observation #4 - ServiceNow Undocumented Monitoring APIs

While snooping around in the client-side JavaScript, I found references to the APIs that are called to provide the data. The sys_ids needed to call these APIs are in the jrobin tables, and there are also other client-side parameters.

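// REST endpoints referenced by the dashboard's client-side script
// (named p, f, m, g, y, v and _ in the minified original).
const dataCategoriesApi = "/api/now/v1/performance_dashboards/data_categories";
const overlaysApi = "/api/now/v1/performance_dashboards/graphs/overlays";
// <dataCategoryId> and <graphSysId> are sys_ids from the jrobin tables
const categoryGraphsApi = "/api/now/v1/performance_dashboards/graphs/category/<dataCategoryId>";
const graphApi = "/api/now/v1/performance_dashboards/graphs/<graphSysId>";
const eventsApi = "/api/now/v1/performance_dashboards/events";
const nodesApi = "/api/now/v1/performance_dashboards/nodes";
const suggestionsApi = "api/now/v1/performance_dashboards/suggestions";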
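Since these endpoints are undocumented, I haven't confirmed their request parameters or response format. As a starting point for that testing, here is a hedged JavaScript sketch for Node.js 18+, which provides a global fetch. It calls one of the endpoints with basic authentication, which is how other /api/now REST endpoints are commonly called. The helper names and the instance URL are assumptions. So are the requirement for a user with sufficient roles and the expectation that the response is JSON.

```javascript
// Build the full URL for one of the dashboard endpoints. `path` may contain
// a placeholder such as <graphSysId>; supply its value in `params`.
function buildDashboardUrl(instanceUrl, path, params = {}) {
  const filled = path.replace(/<(\w+)>/g, (whole, key) => {
    if (!(key in params)) throw new Error(`Missing value for ${whole}`);
    return encodeURIComponent(params[key]);
  });
  // When instanceUrl has no path, new URL() resolves both "/api/..." and
  // the relative "api/..." form to the same absolute URL.
  return new URL(filled, instanceUrl).toString();
}

// Basic-auth and Accept headers for a ServiceNow user (hypothetical account).
function buildHeaders(user, password) {
  const token = Buffer.from(`${user}:${password}`).toString("base64");
  return { Accept: "application/json", Authorization: `Basic ${token}` };
}

// Fetch one endpoint and return the parsed body (JSON format is assumed).
async function getDashboardData(instanceUrl, path, params, user, password) {
  const res = await fetch(buildDashboardUrl(instanceUrl, path, params), {
    headers: buildHeaders(user, password),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${res.url}`);
  return res.json();
}
```

For example, `getDashboardData("https://your-instance.service-now.com", "/api/now/v1/performance_dashboards/graphs/<graphSysId>", { graphSysId: someSysId }, user, password)` would request a single graph. In this call, `someSysId` is a sys_id taken from the jrobin tables. Start with the parameterless endpoints, such as nodes or data_categories, to see what the responses look like.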

Future Observations…

I would like to look more into the list of jrobin_graph_line records and understand how the aggregator relates to the data source (jrobin_datasource).

I want to do some testing and see what format and parameters are needed to use those Performance Dashboards APIs.

I find it interesting that rrd4j appears to be the more widely adopted Java port of RRDtool compared to JRobin. I could see ServiceNow eventually moving to it, if they don't discontinue or entirely restructure their database monitoring backend. But ServiceNow has stuck with JRobin since 2006, so I doubt it will change any time soon.