Monitoring Series: Database Size Report

ServiceNow instance performance is a constant concern for any ServiceNow admin. To get ahead of the curve, it’s a good idea to keep an eye on your instance’s database size.

Database size is the sum of all the table and table index data stored on the instance database server. It’s easy to ask ServiceNow directly or fill out the self-service form on the Support page, but it’s even easier to build a report and put it on a dashboard yourself!

Step by Step

  1. First go to Reports > Create New

  2. Set a name like “Current Database Size (MB)”, and select the table UA Instance Info [ua_instance_state].

  3. Select the type Speedometer (you can also use Single Score, but it’s nice to set ranges!)

  4. On the Configure tab, set Aggregation to Sum and the field to “Primary database size (MB)”.

  5. On the Style tab, set the direction to Minimize, and set reasonable ranges that you think your database size should fall within.

  6. Finally, you can go and add it to a dashboard.
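If you’d rather sanity-check the number from a background script than a report, the same sum can be pulled with GlideAggregate. This is only a sketch: the column name primary_db_size is my guess from the field label “Primary database size (MB)”, so verify the actual element name in the dictionary before relying on it.

```javascript
// Hedged sketch: sum primary database size from UA Instance Info.
// NOTE: 'primary_db_size' is a guessed column name; check sys_dictionary
// for the real element behind the "Primary database size (MB)" label.
var ga = new GlideAggregate('ua_instance_state');
ga.addAggregate('SUM', 'primary_db_size');
ga.query();
if (ga.next()) {
    gs.print('Primary DB size (MB): ' + ga.getAggregate('SUM', 'primary_db_size'));
}
```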

The finished result should look something like this:

[Screenshot: db_size_detail.PNG]

This report is an excellent addition to any system administrator’s dashboard.

Faster API than GlideRecord?

What if I told you there was an even more optimized way to get data from a ServiceNow table on the server side than GlideRecord?


This idea stemmed from my research into how the Table REST API lets you select only certain fields from a table, which can speed up your query significantly.
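For example (the instance name is a placeholder), the Table API’s sysparm_fields parameter trims the response to just the columns you request:

```
GET https://<instance>.service-now.com/api/now/table/sys_user?sysparm_fields=user_name,sys_id&sysparm_limit=100
```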

There is a similar trick using GlideAggregate: just add each field individually with groupBy(), like you would for counting. Props to Tony DiRienzo for posting this on the Community.

    var ga = new GlideAggregate('sys_user');
    ga.groupBy('user_name');
    ga.groupBy('sys_id'); // This makes sure we get all records, even if there are duplicates
    ga.query();
    while (ga.next()) {
        // Do what you need to with the user_name field
        var userName = ga.getValue('user_name');
    }

In many cases this type of query runs up to 5 times faster, and it works best when you only need a few fields. To expand it, all you need to do is add another groupBy() for each field you want included.

The only caveat is that you lose the performance gain if you dot-walk in the results.

This is just another great tool to add to a developer’s toolkit for building ServiceNow applications.

Why Deleting User Records is Slow

If you’ve ever spent any time as a ServiceNow system administrator, you’ve probably had to delete a User record, whether it was demo data, created by mistake, or just general data cleanup.

The Problem

Deleting data on the sys_user table takes a really long time, to the tune of minutes per user.

The Why

This is because of how many table columns reference the user table: with cascade delete rules, every one of those columns and related records must be queried for each user deleted.

The Fix (or closest to it…)

The easiest first step is to use the setWorkflow(false) function when deleting records. This stops any notifications or business rules from firing.
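A minimal background-script sketch of that first step (the sys_id is a placeholder):

```javascript
// Delete one user without firing notifications, business rules, or workflows
var gr = new GlideRecord('sys_user');
if (gr.get('<sys_id_of_user>')) { // placeholder sys_id
    gr.setWorkflow(false);
    gr.deleteRecord();
}
```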

Next, delete a single user using a background script and look at the detailed output. If there are messages like:

Could not find field u_userref in table u_example during cascade delete

Locate the sys_dictionary record for that column and delete it. Sometimes, when a column is deleted, a leftover dictionary record referencing the user table still remains.

Next look at all sys_dictionary user references, and review the Cascade Delete rule settings for those columns. Consider changing some of the columns to None if maintaining the reference is not important. More information on Cascade Delete rules can be found here: https://docs.servicenow.com/bundle/paris-platform-administration/page/administer/field-administration/task/t_CascadeDeleteRules.html
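A quick way to find the columns worth reviewing is to query the dictionary for everything that references sys_user. A sketch, run from a background script:

```javascript
// List every dictionary column that references sys_user so you can
// review its cascade delete rule
var dict = new GlideRecord('sys_dictionary');
dict.addQuery('reference', 'sys_user');
dict.addQuery('internal_type', 'reference');
dict.query();
while (dict.next()) {
    gs.print(dict.getValue('name') + '.' + dict.getValue('element'));
}
```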

Once all those fixes have been applied, you should see a drastic improvement in delete speed, even cutting the time in half!

Extra credit: ServiceNow provides an SQL Debugger tool which can help you identify where the slowest parts of the DB operation are, including Cascade Delete operations.

Best Practices: MID Server Administration

Below are some of my personal thoughts and advice on MID Server configuration, set-up and maintenance. Some of these are pretty well known and adopted, and some are based on my personal experiences.

  1. Basics

    a. Sizing / Server Specs

    Specification-wise, RAM and CPU core count (and the associated threads) are the most important. Usually 8 GB RAM and a 4-core CPU are recommended, with 40+ GB of disk and a 64-bit OS, which is a requirement being enforced going forward. Within the MID Server parameters there are options to use more RAM, threads, etc. Specifying more RAM is generally recommended; with more threads, results can vary.

    b. Network (Bandwidth and Location)

    This point is the key reason that MID Servers are needed in the first place: they are a necessary appliance so that you can securely interact with other resources and devices on a network. It’s important to have a fast connection (usually 100 Mb/s upload is recommended) and to put the MID Server physically close to the devices it interacts with. If you have multiple data centers, spread the servers across those data centers, and place them on the same subnets as the majority of devices.

  2. Segregate by Purpose

    a. History

    The biggest flaw I’ve always seen with enterprise MID Server setups is that there is no separation between Discovery and Orchestration usage. Many Discovery credentials are server admin or are granted powerful rights to log into machines and run discovery commands. There are a couple of ways you can force Orchestration tasks to use specific credentials: credential tags, and hardcoding MID Server IP relationships with the target Orchestration device. However, you can’t lock down the Discovery credentials to be used only for Discovery, which can be a major security issue.

    b. Resolution

    The only guaranteed mechanism for locking down those credentials is to restrict them to specific MID Servers. Additionally, on those MID Servers you have the option to disallow the Orchestration application. Thus it’s critical to have separate MID Servers for Discovery, Orchestration, JDBC, and other activities to enforce proper credential usage.

  3. Security

    Review the MID Server Hardening Guide.

    There are a number of overlooked recommendations, for setting specific security parameters, like disabling SSL, TLS 1.0 and 1.1, setting the DH group value and encrypting proxy passwords (if applicable).

  4. Closing Thoughts

    Besides initial setup, every administrator knows that you have to keep current with upgrades, restarts, and rekeying the credentials. To take it a step further beyond those standard activities, you can set up proactive monitoring, such as the built-in MID Server resource threshold alerts, or advanced tools like PerfMon, Microsoft SCOM, or Datadog.

Hope everyone learned something! I’ve been absent, but hopefully we can start off 2021 right and have a lot of content. Please comment your ideas or anything you want to see!

Monitoring Series: Research into ServiceNow Performance Dashboard

Below are some of my ramblings, thoughts and research into the ‘servlet performance metrics’ dashboard from ServiceNow, and how this custom UI page really works.

[Screenshot: servlet performance.PNG]

Observation #1 - The Front End is in the Perf Dashboard Plugin

This dashboard is part of the core system plugin ‘Performance Dashboards’ (com.glide.performance_dashboards), an extremely lightweight plugin that just houses the shell UI and a couple of supporting scripts.

Observation #2 - The Library used is JRobin (Derived from RRDtool)

Within the scripts you can see data references to tables starting with jrobin, and those point to the JRobin plugin (RRDS), which is just a Java implementation of the RRDtool system. They even kept the Round Robin heritage by letting the table labels start with ‘Round Robin’. It’s worth noting that you can’t see any of these jrobin tables OOB; they are locked down to maint. I had to go into each individual read ACL and add roles to be able to view them.

[Image: rrdtool.png]

Observation #3 - ServiceNow uses an RRDTool Database to Store Monitoring data

This leads us to another discovery: all this data is being parsed from a Round Robin Database (RRD), and there are supporting tables in ServiceNow which define the data refresh intervals (spoiler: they all refresh every 2 minutes) and information about the Round Robin Archives. I found a good introduction to RRD here.

Observation #4 - ServiceNow Undocumented Monitoring APIs

While snooping around in the client-side JavaScript, I found references to the APIs which are called to provide the data. The sys_id’s needed to call these APIs are in the jrobin tables, and there are also other client-side parameters.

p = "/api/now/v1/performance_dashboards/data_categories",
f = "/api/now/v1/performance_dashboards/graphs/overlays",
m = "/api/now/v1/performance_dashboards/graphs/category/<dataCategoryId>",
g = "/api/now/v1/performance_dashboards/graphs/<graphSysId>",
y = "/api/now/v1/performance_dashboards/events",
v = "/api/now/v1/performance_dashboards/nodes",
_ = "api/now/v1/performance_dashboards/suggestions"

Future Observations…

I would like to look more into the list of jrobin_graph_line’s and understand how the aggregator relates to the data source (jrobin_datasource).

I want to do some testing and see what format and parameters are needed to use those Performance Dashboards APIs.

I find it interesting how rrd4j appears to be the more widely adopted Java port of RRDtool versus JRobin. I could see ServiceNow eventually moving to it, if they don’t discontinue or entirely restructure their DB monitoring backend. But ServiceNow has stuck with JRobin since 2006, so I doubt it will change any time soon.

Deleting Fast in ServiceNow

Another post dedicated to the die hard ServiceNow developers. Have you ever wondered how fast (or slow) the deletion methods that ServiceNow employs are? I’m talking about GlideRecord deleteRecord, GlideRecord deleteMultiple and GlideMultipleDelete execute (undocumented). Today we’re going to rank all the methods, and put some numbers behind the mystery.

It’s no surprise that deleteRecord does the worst due to the looping overhead. It’s surprising that deleteMultiple comes out on top of execute by a small margin of 74ms.

From personal experience, I’ve noticed that some sources are prioritized in terms of deletes, so here are the numbers again with the different sources overlaid. I ran tests with 1k, 10k, 50k, 100k, and 200k records to get a fair representation.

This is the summation of about 350k record deletes, and what method handled the varied delete batches.


Again from personal experience, I had a gut feeling that workflow activities are somehow optimized for slightly less deletion overhead, and that script actions are relatively slow.

Delete Multiple - The fastest method, by an order of magnitude

GlideMultipleDelete execute - The second fastest method (10x slower than deleteMultiple, but still very fast)

Delete Record - There is absolutely never a good reason to use this method. It’s 1,270x slower than Delete Multiple, unless you just want to waste everyone’s time. We’re talking a 1s versus 24min difference for 200k deletes.

Note: The numbers shown were all measured on a personal developer instance with no other processes running, and I ran the delete methods with essentially no filters. Filters, as well as the size of the table you’re operating on, will drastically slow down your deletion rate. Your numbers could also vary based on node count, jobs running, active users, and database server load and capacity, as well as whatever else is running in your system (i.e. if your event log is overflowing, script actions will be slower).

Want to validate my results? Here is some sample code you can adapt.

    // Note: these are methods of a larger tester object; this.tbl, this.size,
    // this.source, and this.type are assumed to be set elsewhere (e.g. in initialize()).
    deleteUsingGRDelete : function(){
        this.startTimer();
        var grTestData = new GlideRecord(this.tbl);
        grTestData.query();
        while(grTestData.next()){
            grTestData.deleteRecord();
        }
        this.endTimer('GlideRecord deleteRecord');
    },
    deleteUsingGRDeleteMultiple : function(){
        this.startTimer();
        var grTestData = new GlideRecord(this.tbl);
        grTestData.query();
        grTestData.deleteMultiple();
        this.endTimer('GlideRecord deleteMultiple');
    },
    deleteUsingGlideMultiple : function(){
        this.startTimer();
        var md = new GlideMultipleDelete(this.tbl);
        md.execute();
        this.endTimer('GlideMultipleDelete execute');
    },
    startTimer : function(){
        this.start = new Date().getTime();
    },
    endTimer : function(test_name){
        var end = new Date().getTime();
        var dur = end - this.start;
        gs.log('Test ' + this.size + ' ' + test_name + ': ' + dur, this.type);
        // Store the result in a custom results table for reporting
        var grResults = new GlideRecord('u_deletetesterresults');
        grResults.initialize();
        grResults.u_record_count = this.size;
        grResults.u_test_name = test_name;
        grResults.u_source = this.source;
        grResults.u_execution_ms = dur;
        grResults.u_execution_time.setDateNumericValue(dur);
        grResults.insert();
    },
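
As a side note, the startTimer/endTimer pattern above is plain JavaScript and easy to verify outside ServiceNow. A minimal standalone sketch of the same pattern (no Glide APIs involved):

```javascript
// Standalone sketch of the startTimer/endTimer pattern used above.
function Timer() {
    this.start = 0;
}
Timer.prototype.startTimer = function () {
    this.start = new Date().getTime();
};
Timer.prototype.endTimer = function (testName) {
    var dur = new Date().getTime() - this.start;
    console.log('Test ' + testName + ': ' + dur + ' ms');
    return dur;
};

var t = new Timer();
t.startTimer();
// Stand-in for a delete batch: a busy loop we can time
var sum = 0;
for (var i = 0; i < 1000000; i++) {
    sum += i;
}
var elapsed = t.endTimer('busy loop');
```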

Further areas of research to consider… If there is enough interest I may do a follow up post.

  • Historically ServiceNow has recommended the Table Cleaner (sys_auto_flush) for mass deletions. The deletion times on that could be slightly more optimized.

  • Deletions on audited and update-synched tables could be much slower and heavier. It would be interesting to measure the impact these options add to deletions.

  • Number of columns and column types on the table. I did mine with the base system fields as well as a string description field with size 80. If you added 2 string fields with size 4k, would it make a measurable impact to deletions? Also does null data delete faster vs filled columns or is it treated the same?
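
On the Table Cleaner point, a sys_auto_flush entry is just a record insert. A hedged sketch (the field names are as I recall them on sys_auto_flush, and the table name is hypothetical; verify both against your instance before using):

```javascript
// Hedged sketch: schedule the Table Cleaner to purge rows older than 7 days.
// 'tablename' and 'age_in_seconds' are recalled field names; verify them.
var flush = new GlideRecord('sys_auto_flush');
flush.initialize();
flush.setValue('tablename', 'u_example_staging'); // hypothetical table
flush.setValue('age_in_seconds', 7 * 24 * 60 * 60);
flush.insert();
```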

One final thought… The most optimal I measured for large data deletions was about 152 deletes/ms, or about 152,000 deletes/s on my ‘clean’ instance and simple table without filters. I would recommend running some deletion baselines in your instance to see what you can expect out of performance.

Scoped Apps: The Good, the Bad, and the Ugly

I’ve talked with many developers, and scoped app development adoption isn’t very high. That’s pretty understandable, since there usually isn’t a good reason to switch for smaller deployments.

I’ve been developing in scoped apps for a year straight now, and I help support over 250 deployed scoped apps. I feel like I’ve seen just about everything I’ve needed to see.

Note: At the time of writing this I am on a London Patch 7 instance.

The Good

  1. It scales really well. If you have a lot of developers, there is no way giving all of them admin is practical, so scoped apps are a good cop-out. You can just keep making new scoped apps as new ideas and developers come along.

  2. It’s pretty encapsulated. If I don’t want my script include to be called from other scopes, I can restrict it. It’s pretty hard to really mess up the system by developing in a scoped app, which makes instance administrators and users happy.

  3. Studio is the best thing since sliced bread. Code search, app file list, and freakin tabs! I don’t think I want to go back to global development after using this. It’s as close to an IDE as we are probably going to get in ServiceNow, and I like the direction.

The Bad

  1. Scoped apps don’t track deletes. This is by design from ServiceNow, so store apps can’t delete other files in your instance. Thankfully you can work around it by using scoped update sets for the deletes, or by uninstalling and reinstalling the app. But it isn’t ideal at all.

  2. Scoped server-side APIs suck. If I had a nickel for every time I copied some global code, tried to run it in a scoped app, and got ‘Method does not exist’ or ‘Scoped Access Error’, I wouldn’t still be working on ServiceNow. Thankfully there are workarounds here too: exposing a global script include, or calling REST APIs to cross scopes. I’m talking about APIs like GlideEncrypter, used to decrypt credentials.

  3. The Application Manager interface is crazy buggy. I’m talking about the page you use to install and update scoped apps (and store apps). The search works on the 3rd or 4th try, and the refresh and loading are broken. It was at least bearable pre-London, but it has gotten worse with the overhaul. Maybe it’s less buggy if you only have five or so apps.

  4. The Manage Developers option needs a scroll bar, and support for searching people with the same name. In general, the implementation of the Manage Developers interface is bad. If you want to add more than 8 people, or someone whose name is, say, John Smith, you can’t pick the right one (without modifying that user’s name to something like John Smith1 and reloading).

  5. The Create Application File dialog is not complete. Certain types just don’t show up under the application file create option, and other things don’t even show up in the app file list as part of your app. This is pretty annoying whenever it comes up. Thankfully they at least covered most of the big things.
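To illustrate the cross-scope workaround mentioned under point 2 of The Bad (all names here are hypothetical): create a global script include marked as accessible from all application scopes that wraps the global-only API, then call it from the scoped app.

```javascript
// Global-scope script include, 'Accessible from: All application scopes'.
// Wraps the global-only GlideEncrypter so scoped code can decrypt values.
var GlobalCryptoHelper = Class.create();
GlobalCryptoHelper.prototype = {
    initialize: function () {},
    decrypt: function (encrypted) {
        return new GlideEncrypter().decrypt(encrypted);
    },
    type: 'GlobalCryptoHelper'
};
```

From the scoped app this would be called as new global.GlobalCryptoHelper().decrypt(value). Be careful who can call it, since it effectively re-exposes a privileged API.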

The Ugly

  1. For scoped Catalog Item workflows, uninstalling and reinstalling the app (to fix tracked deletes) breaks all active contexts. And once the old contexts are deleted, there is basically no good way to get them back, due to the complex workflow relationships, short of restoring a full instance backup.

  2. Scope restrictions aren’t really built out. You can’t lock down a table to prevent, say, a business rule from another scope being created on it. You can’t select which scopes to white-list and which to block. This makes security really difficult, because it is all or nothing.

  3. ServiceNow doesn’t treat delegated developers as developers. They took away delegated developers’ ability to make update sets in the London release. Additionally, scoped app developers can’t actually update ACLs, even the ones in their own scope. When we raised a HI ticket and asked, the answer was that they needed security_admin. Yeah, that isn’t gonna fly. Scoped app developers should effectively be admins within their app.