Monitoring Series: Database Size Report

ServiceNow instance performance is a constant concern for any ServiceNow admin. To get ahead of the curve, it’s a good idea to keep an eye on your instance’s database size.

Database size is the sum of all the table and table index data stored on the instance database server. It’s easy to ask ServiceNow directly or fill out the self-service form on the Support page, but it’s even easier to build a report and put it on a dashboard yourself!

Step by Step

  1. First go to Reports > Create New

  2. Set a name like “Current Database Size (MB)”, and select the table UA Instance Info [ua_instance_state].

  3. Select the type Speedometer (you can also use Single Score, but it’s nice to set ranges!)

  4. On the Configure tab, set Aggregation to Sum and the field to “Primary database size (MB)”.

  5. On the Style tab, set the direction to Minimize, and set reasonable ranges that you think your database size should fall within.

  6. Finally, you can go and add it to a dashboard.
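If you’d rather sanity-check the number from a background script than a report, the same sum can be pulled with GlideAggregate. This is only a sketch: the column name primary_db_size is my guess from the field label “Primary database size (MB)”, so verify the actual element name in the dictionary before relying on it.

```javascript
// Hedged sketch: sum primary database size from UA Instance Info.
// NOTE: 'primary_db_size' is a guessed column name; check sys_dictionary
// for the real element behind the "Primary database size (MB)" label.
var ga = new GlideAggregate('ua_instance_state');
ga.addAggregate('SUM', 'primary_db_size');
ga.query();
if (ga.next()) {
    gs.print('Primary DB size (MB): ' + ga.getAggregate('SUM', 'primary_db_size'));
}
```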

The finished result should look something like this:

[Screenshot: db_size_detail.PNG]

This report is an excellent addition to any system administrator’s dashboard.

Faster API than GlideRecord?

What if I told you there was an even more optimized way to get data from a ServiceNow table on the server side than GlideRecord?


This idea stemmed from my research into how the Table REST API lets you select only certain fields from a table, which can speed up your query significantly.
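For example (the instance name is a placeholder), the Table API’s sysparm_fields parameter trims the response to just the columns you request:

```
GET https://<instance>.service-now.com/api/now/table/sys_user?sysparm_fields=user_name,sys_id&sysparm_limit=100
```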

There is a similar trick using GlideAggregate: just add each field individually with groupBy(), like you would for counting. Props to Tony DiRienzo for posting this on the Community.

    var ga = new GlideAggregate('sys_user');
    ga.groupBy('user_name');
    ga.groupBy('sys_id'); // This makes sure we get all records, even if there are duplicates
    ga.query();
    while (ga.next()) {
        // Do what you need to with the user_name field
        var userName = ga.getValue('user_name');
    }

In many cases this type of query runs up to 5 times faster, and it works best when you only need a few fields. To expand it, all you need to do is add another groupBy() for each field you want included.

The only caveat is that you lose the performance gain if you dot-walk in the results.

This is just another great tool to add to a developer’s toolkit for building ServiceNow applications.

Why Deleting User Records is Slow

If you’ve ever spent any time as a ServiceNow system administrator, you’ve probably had to delete a User record, whether it was demo data, created by mistake, or just general data cleanup.

The Problem

Deleting data on the sys_user table takes a really long time, to the tune of minutes per user.

The Why

This is because of how many table columns reference the user table: with cascade delete rules, every one of those columns and related records must be queried for each user deleted.

The Fix (or closest to it…)

The easiest first step is to use the setWorkflow(false) function when deleting records. This stops any notifications or business rules from firing.
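A minimal background-script sketch of that first step (the sys_id is a placeholder):

```javascript
// Delete one user without firing notifications, business rules, or workflows
var gr = new GlideRecord('sys_user');
if (gr.get('<sys_id_of_user>')) { // placeholder sys_id
    gr.setWorkflow(false);
    gr.deleteRecord();
}
```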

Next, delete a single user using a background script and look at the detailed output. If there are messages like:

Could not find field u_userref in table u_example during cascade delete

Locate the sys_dictionary record for that column and delete it. Sometimes, when a column is deleted, a leftover dictionary record referencing the user table still remains.

Next look at all sys_dictionary user references, and review the Cascade Delete rule settings for those columns. Consider changing some of the columns to None if maintaining the reference is not important. More information on Cascade Delete rules can be found here: https://docs.servicenow.com/bundle/paris-platform-administration/page/administer/field-administration/task/t_CascadeDeleteRules.html
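A quick way to find the columns worth reviewing is to query the dictionary for everything that references sys_user. A sketch, run from a background script:

```javascript
// List every dictionary column that references sys_user so you can
// review its cascade delete rule
var dict = new GlideRecord('sys_dictionary');
dict.addQuery('reference', 'sys_user');
dict.addQuery('internal_type', 'reference');
dict.query();
while (dict.next()) {
    gs.print(dict.getValue('name') + '.' + dict.getValue('element'));
}
```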

Once all those fixes have been applied, you should see a drastic improvement in delete speed, even cutting the time in half!

Extra credit: ServiceNow provides an SQL Debugger tool which can help you identify where the slowest parts of the DB operation are, including Cascade Delete operations.

Best Practices: MID Server Administration

Below are some of my personal thoughts and advice on MID Server configuration, set-up and maintenance. Some of these are pretty well known and adopted, and some are based on my personal experiences.

  1. Basics

    a. Sizing / Server Specs

    Specification-wise, RAM and CPU core count (and the associated threads) are the most important. Usually 8 GB RAM and a 4-core CPU are recommended, with 40+ GB of disk and a 64-bit OS, which is a requirement being enforced going forward. Within the MID Server parameters there are options to use more RAM, threads, etc. Specifying more RAM is generally recommended; with more threads, results can vary.

    b. Network (Bandwidth and Location)

    This point is the key reason that MID Servers are needed in the first place: they are a necessary appliance so that you can securely interact with other resources and devices on a network. It’s important to have a fast connection (usually 100 Mb/s upload is recommended) and to put the MID Server physically close to the devices it interacts with. If you have multiple data centers, spread the servers across those data centers, and place them on the same subnets as the majority of devices.

  2. Segregate by Purpose

    a. History

    The biggest flaw I’ve always seen with enterprise MID Server setups is that there is no separation between Discovery and Orchestration usage. Many Discovery credentials are server admin or are granted powerful rights to log into machines and run discovery commands. There are a couple of ways you can force Orchestration tasks to use specific credentials: credential tags, and hardcoding MID Server IP relationships with the target Orchestration device. However, you can’t lock down the Discovery credentials to be used only for Discovery, which can be a major security issue.

    b. Resolution

    The only guaranteed mechanism for locking down those credentials is to restrict them to specific MID Servers. Additionally, on those MID Servers you have the option to disallow the Orchestration application. Thus it’s critical to have separate MID Servers for Discovery, Orchestration, JDBC, and other activities to enforce proper credential usage.

  3. Security

    Review the MID Server Hardening Guide.

    There are a number of overlooked recommendations, for setting specific security parameters, like disabling SSL, TLS 1.0 and 1.1, setting the DH group value and encrypting proxy passwords (if applicable).

  4. Closing Thoughts

    Besides initial setup, every administrator knows that you have to keep current with upgrades, restarts, and rekeying the credentials. To take it a step further beyond those standard activities, you can set up proactive monitoring, such as the built-in MID Server resource threshold alerts, or advanced tools like PerfMon, Microsoft SCOM, or Datadog.

Hope everyone learned something! I’ve been absent, but hopefully we can start off 2021 right and have a lot of content. Please comment your ideas or anything you want to see!

Monitoring Series: Research into ServiceNow Performance Dashboard

Below are some of my ramblings, thoughts and research into the ‘servlet performance metrics’ dashboard from ServiceNow, and how this custom UI page really works.

[Screenshot: servlet performance.PNG]

Observation #1 - The Front End is in the Perf Dashboard Plugin

This dashboard is part of the core system plugin ‘Performance Dashboards’ (com.glide.performance_dashboards), an extremely lightweight plugin that just houses the shell UI and a couple of supporting scripts.

Observation #2 - The Library used is JRobin (Derived from RRDtool)

Within the scripts you can see data references to tables starting with jrobin, and those point to the JRobin plugin (RRDS), which is just a Java implementation of the RRDtool system. They even kept the Round Robin heritage by letting the table labels start with ‘Round Robin’. It’s worth noting that you can’t see any of these jrobin tables OOB; they are locked down to maint. I had to go into each individual read ACL and add roles to be able to view them.

[Image: rrdtool.png]

Observation #3 - ServiceNow uses an RRDTool Database to Store Monitoring data

This leads us to another discovery: all this data is being parsed from a Round Robin Database (RRD), and there are supporting tables in ServiceNow which define the data refresh intervals (spoiler: they all refresh every 2 minutes) and information about the Round Robin Archives. I found a good introduction to RRD here.

Observation #4 - ServiceNow Undocumented Monitoring APIs

While snooping around in the client-side JavaScript, I found references to the APIs which are called to provide the data. The sys_id’s needed to call these APIs are in the jrobin tables, and there are also other client-side parameters.

p = "/api/now/v1/performance_dashboards/data_categories",
f = "/api/now/v1/performance_dashboards/graphs/overlays",
m = "/api/now/v1/performance_dashboards/graphs/category/<dataCategoryId>",
g = "/api/now/v1/performance_dashboards/graphs/<graphSysId>",
y = "/api/now/v1/performance_dashboards/events",
v = "/api/now/v1/performance_dashboards/nodes",
_ = "api/now/v1/performance_dashboards/suggestions"

Future Observations…

I would like to look more into the list of jrobin_graph_line’s and understand how the aggregator relates to the data source (jrobin_datasource).

I want to do some testing and see what format and parameters are needed to use those Performance Dashboards APIs.

I find it interesting how rrd4j appears to be the more widely adopted Java port of RRDtool versus JRobin. I could see ServiceNow eventually moving to it, if they don’t discontinue or entirely restructure their DB monitoring backend. But ServiceNow has stuck with JRobin since 2006, so I doubt it will change any time soon.

Deleting Fast in ServiceNow

Another post dedicated to the die hard ServiceNow developers. Have you ever wondered how fast (or slow) the deletion methods that ServiceNow employs are? I’m talking about GlideRecord deleteRecord, GlideRecord deleteMultiple and GlideMultipleDelete execute (undocumented). Today we’re going to rank all the methods, and put some numbers behind the mystery.

It’s no surprise that deleteRecord does the worst due to the looping overhead. It’s surprising that deleteMultiple comes out on top of execute by a small margin of 74ms.

From personal experience, I’ve noticed that some sources are prioritized in terms of deletes, so here are the numbers again with the different sources overlaid. I ran tests with 1k, 10k, 50k, 100k, and 200k records to get a fair representation.

This is the summation of about 350k record deletes, and what method handled the varied delete batches.


Again from personal experience, I had a gut feeling that workflow activities are somehow optimized for slightly less deletion overhead, and that script actions are relatively slow.

Delete Multiple - The fastest method, by an order of magnitude

GlideMultipleDelete execute - The second fastest method (10x slower than deleteMultiple, but still very fast)

Delete Record - There is absolutely never a good reason to use this method. It’s 1,270x slower than Delete Multiple, unless you just want to waste everyone’s time. We’re talking a 1s versus 24min difference for 200k deletes.

Note: The numbers shown were all measured on a personal developer instance with no other processes running, and I ran the delete methods with essentially no filters. Filters, as well as the size of the table you’re operating on, will drastically slow down your deletion rate. Your numbers could also vary based on node count, jobs running, active users, and database server load and capacity, as well as whatever else is running in your system (i.e. if your event log is overflowing, script actions will be slower).

Want to validate my results? Here is some sample code you can adapt.

    // Note: these are methods of a larger tester object; this.tbl, this.size,
    // this.source, and this.type are assumed to be set elsewhere (e.g. in initialize()).
    deleteUsingGRDelete : function(){
        this.startTimer();
        var grTestData = new GlideRecord(this.tbl);
        grTestData.query();
        while(grTestData.next()){
            grTestData.deleteRecord();
        }
        this.endTimer('GlideRecord deleteRecord');
    },
    deleteUsingGRDeleteMultiple : function(){
        this.startTimer();
        var grTestData = new GlideRecord(this.tbl);
        grTestData.query();
        grTestData.deleteMultiple();
        this.endTimer('GlideRecord deleteMultiple');
    },
    deleteUsingGlideMultiple : function(){
        this.startTimer();
        var md = new GlideMultipleDelete(this.tbl);
        md.execute();
        this.endTimer('GlideMultipleDelete execute');
    },
    startTimer : function(){
        this.start = new Date().getTime();
    },
    endTimer : function(test_name){
        var end = new Date().getTime();
        var dur = end - this.start;
        gs.log('Test ' + this.size + ' ' + test_name + ': ' + dur, this.type);
        // Store the result in a custom results table for reporting
        var grResults = new GlideRecord('u_deletetesterresults');
        grResults.initialize();
        grResults.u_record_count = this.size;
        grResults.u_test_name = test_name;
        grResults.u_source = this.source;
        grResults.u_execution_ms = dur;
        grResults.u_execution_time.setDateNumericValue(dur);
        grResults.insert();
    },
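
As a side note, the startTimer/endTimer pattern above is plain JavaScript and easy to verify outside ServiceNow. A minimal standalone sketch of the same pattern (no Glide APIs involved):

```javascript
// Standalone sketch of the startTimer/endTimer pattern used above.
function Timer() {
    this.start = 0;
}
Timer.prototype.startTimer = function () {
    this.start = new Date().getTime();
};
Timer.prototype.endTimer = function (testName) {
    var dur = new Date().getTime() - this.start;
    console.log('Test ' + testName + ': ' + dur + ' ms');
    return dur;
};

var t = new Timer();
t.startTimer();
// Stand-in for a delete batch: a busy loop we can time
var sum = 0;
for (var i = 0; i < 1000000; i++) {
    sum += i;
}
var elapsed = t.endTimer('busy loop');
```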

Further areas of research to consider… If there is enough interest I may do a follow up post.

  • Historically ServiceNow has recommended the Table Cleaner (sys_auto_flush) for mass deletions. The deletion times on that could be slightly more optimized.

  • Deletions on audited and update-synched tables could be much slower and heavier. It would be interesting to measure the impact these options add to deletions.

  • Number of columns and column types on the table. I did mine with the base system fields as well as a string description field with size 80. If you added 2 string fields with size 4k, would it make a measurable impact to deletions? Also does null data delete faster vs filled columns or is it treated the same?
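
On the Table Cleaner point, a sys_auto_flush entry is just a record insert. A hedged sketch (the field names are as I recall them on sys_auto_flush, and the table name is hypothetical; verify both against your instance before using):

```javascript
// Hedged sketch: schedule the Table Cleaner to purge rows older than 7 days.
// 'tablename' and 'age_in_seconds' are recalled field names; verify them.
var flush = new GlideRecord('sys_auto_flush');
flush.initialize();
flush.setValue('tablename', 'u_example_staging'); // hypothetical table
flush.setValue('age_in_seconds', 7 * 24 * 60 * 60);
flush.insert();
```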

One final thought… The most optimal I measured for large data deletions was about 152 deletes/ms, or about 152,000 deletes/s on my ‘clean’ instance and simple table without filters. I would recommend running some deletion baselines in your instance to see what you can expect out of performance.

Scoped Apps: The Good, the Bad, and the Ugly

I’ve talked with many developers, and scoped app development adoption isn’t very high. That’s pretty understandable, since there usually isn’t a good reason to switch for smaller deployments.

I’ve been developing in scoped apps for a year straight now, and I help support over 250 deployed scoped apps. I feel like I’ve seen just about everything I’ve needed to see.

Note: At the time of writing this I am on a London Patch 7 instance.

The Good

  1. It scales really well. If you have a lot of developers, there is no way giving all of them admin is practical, so scoped apps are a good cop-out. You can just keep making new scoped apps as new ideas and developers come along.

  2. It’s pretty encapsulated. If I don’t want my script include to be called from other scopes, I can restrict it. It’s pretty hard to really mess up the system by developing in a scoped app, which makes instance administrators and users happy.

  3. Studio is the best thing since sliced bread. Code search, app file list, and freakin tabs! I don’t think I want to go back to global development after using this. It’s as close to an IDE as we are probably going to get in ServiceNow, and I like the direction.

The Bad

  1. Scoped apps don’t track deletes. This is by design from ServiceNow, so store apps can’t delete other files in your instance. Thankfully you can work around it by using scoped update sets for the deletes, or by uninstalling and reinstalling the app. But it isn’t ideal at all.

  2. Scoped server-side APIs suck. If I had a nickel for every time I copied some global code, tried to run it in a scoped app, and got ‘Method does not exist’ or ‘Scoped Access Error’, I wouldn’t still be working on ServiceNow. Thankfully there are workarounds here too: exposing a global script include, or calling REST APIs to cross scopes. I’m talking about APIs like GlideEncrypter, used to decrypt credentials.

  3. The Application Manager interface is crazy buggy. I’m talking about the page you use to install and update scoped apps (and store apps). The search works on the 3rd or 4th try, and the refresh and loading are broken. It was at least bearable pre-London, but it has gotten worse with the overhaul. Maybe it’s less buggy if you only have five or so apps.

  4. The Manage Developers option needs a scroll bar, and support for searching people with the same name. In general, the implementation of the Manage Developers interface is bad. If you want to add more than 8 people, or someone whose name is, say, John Smith, you can’t pick the right one (without modifying that user’s name to something like John Smith1 and reloading).

  5. The Create Application File dialog is not complete. Certain types just don’t show up under the application file create option, and other things don’t even show up in the app file list as part of your app. This is pretty annoying whenever it comes up. Thankfully they at least covered most of the big things.
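To illustrate the cross-scope workaround mentioned under point 2 of The Bad (all names here are hypothetical): create a global script include marked as accessible from all application scopes that wraps the global-only API, then call it from the scoped app.

```javascript
// Global-scope script include, 'Accessible from: All application scopes'.
// Wraps the global-only GlideEncrypter so scoped code can decrypt values.
var GlobalCryptoHelper = Class.create();
GlobalCryptoHelper.prototype = {
    initialize: function () {},
    decrypt: function (encrypted) {
        return new GlideEncrypter().decrypt(encrypted);
    },
    type: 'GlobalCryptoHelper'
};
```

From the scoped app this would be called as new global.GlobalCryptoHelper().decrypt(value). Be careful who can call it, since it effectively re-exposes a privileged API.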

The Ugly

  1. For scoped Catalog Item workflows, uninstalling and reinstalling the app (to fix tracked deletes) breaks all active contexts. And once the old contexts are deleted, there is basically no good way to get them back, due to the complex workflow relationships, short of restoring a full instance backup.

  2. Scope restrictions aren’t really built out. You can’t lock down a table to prevent, say, a business rule from another scope being created on it. You can’t select which scopes to white-list and which to block. This makes security really difficult, because it is all or nothing.

  3. ServiceNow doesn’t treat delegated developers as developers. They took away delegated developers’ ability to make update sets in the London release. Additionally, scoped app developers can’t actually update ACLs, even the ones in their own scope. When we raised a HI ticket and asked, the answer was that they needed security_admin. Yeah, that isn’t gonna fly. Scoped app developers should effectively be admins within their app.