Deleting Fast and Safe in ServiceNow

One of my first articles, Deleting Fast in ServiceNow, is my most popular and most controversial, and for good reason. This is the last entry, at least for now, in my series following up on my most popular articles!

To summarize the prior article: I compared the delete APIs available in ServiceNow to determine which was the fastest at deleting records. I found that GlideRecord deleteMultiple running from a workflow had the best execution time overall.

The Controversy

I slammed the GlideRecord deleteRecord method pretty hard, since it was over 1,000 times slower, but I didn’t really unpack the occasional need to avoid firing business rules, notifications, and workflows while deleting. By default, deleteMultiple does trigger business rules and all of the above; however, setWorkflow(false) does actually work with deleteMultiple as well!
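
To make that concrete, here is a minimal sketch of the pattern; the table name and query are placeholders of my own, not from the original tests.

var gr = new GlideRecord('u_staging_data'); // placeholder table name
gr.addEncodedQuery('sys_created_on<javascript:gs.beginningOfLastYear()'); // placeholder filter
gr.setWorkflow(false); // suppress business rules, workflows, and other engines for this delete
gr.deleteMultiple();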

That being said, it’s typically safer to disable any on-delete notifications and business rules first, and then run deleteMultiple. You may also want to turn off delete auditing beforehand; otherwise you’ll have to clean up the audit table records with deleteMultiple afterwards (unless you want the safety net).

The deleteRecord method still has its place when you want to delete a single record, or fewer than a handful, and in some ways it can be a bit safer precisely because it is slower.

Best Practice

Before doing any mass deleting, I strongly recommend reading ServiceNow’s KB0717791 on mass deletion recommendations. There is also a good resource called Safety Tips for Writing Background Scripts, which covers a lot of the common mistakes people make while doing things like deleting. If you are deleting small to medium datasets, it actually isn’t a bad idea to run the delete as a background script, since a rollback context is generated, which allows you to restore the data.

Exploring More

As promised, I looked into some additional factors that could affect delete performance: namely auditing, big fields full of data, and a baseline deletion for reference.

[Chart: delete speed by data type, with series for large fields, audit, and baseline]

A comparison showing delete speed for different types of records.

Auditing Impact

For testing auditing, I had 3 small string fields, just like the baseline, and simply enabled table-level auditing. Unsurprisingly, turning on auditing drastically reduces delete speed, since every delete has to check cascade reference rules, back up a copy of the record onto the audit table, and so on. What is almost surprising is how linear this impact is: deletion time increases by roughly 0.03 s per record processed. At that rate, a 200k-record delete would take on the order of 100 minutes. It goes to show how important it is to minimize auditing unless it is absolutely necessary.

Big Fields Impact

For testing big fields, I added 3 large string fields with a 4,000-character limit and populated them with random data. The impact is noticeable, taking about 150 s longer to delete 200k records than the baseline, but overall the linear rate increase is only about 0.0008 s per record processed. From my research, this seems to boil down to the buffer pool, where the data is cached in case of an undo while it is being deleted.

Baseline Comparison

My baseline table had only 3 small string fields and no auditing. It took under 1 s to delete 10k records and less than 10 s to delete 200k records, which puts the base speed for record deletes at around 1 record every 0.00005 s. Mind-blazingly fast! So if you want to reduce delete time, it has more to do with data size and options (like auditing) than with record count.

Advanced Deleting - Time Slice Method

I wanted to throw in a strong mention that some tables in ServiceNow, namely sys_audit, which are notorious for being large and sharded, have to be handled specially when it comes to deleting (and other DB operations). There is a known technique where you step day by day, and sometimes hour by hour, deleting all the records within that timeframe. This method takes advantage of the fact that the data is indexed by time (created on) and sharded/broken up by time, so you are strategically retrieving and accessing data in sequence and removing it surgically; a rough sketch follows below. I could probably write a full article on the algorithm - feel free to comment if interested!
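
As a rough illustration only (the table, date range, and one-day step size below are placeholder assumptions, and you would still combine this with the safeguards covered above), the loop could look something like this:

var start = new GlideDateTime('2019-01-01 00:00:00'); // placeholder start of the purge window
var stop = new GlideDateTime('2019-02-01 00:00:00');  // placeholder end of the purge window
while (start.before(stop)) {
    var next = new GlideDateTime(start);
    next.addDaysUTC(1); // step one day (or one hour) at a time
    var gr = new GlideRecord('sys_audit');
    gr.addQuery('sys_created_on', '>=', start.getValue());
    gr.addQuery('sys_created_on', '<', next.getValue());
    gr.setWorkflow(false); // optional: skip engines while purging
    gr.deleteMultiple();
    start = next;
}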

Parting thoughts

It’s good to be curious and see how far we can push things. I want to leave you with the fact that there are even more aspects to explore.

  • Indexing - Typically after data is deleted, the index data is not automatically reprocessed. This can sometimes lead to the index portion being bigger than the actual table.

  • Table growth monitoring is a good practice. There is a self-service catalog item on the ServiceNow Support site to pull the top 20 or more tables on your instance, which is a good thing to check regularly. There is also a technology ServiceNow might release more widely in the future, called Instance Observer, which has some capabilities for table growth monitoring.

  • MariaDB explains the complexities of big deletes on this KB page: https://mariadb.com/kb/en/big-deletes/. While I think this is really good info, a lot of it boils down to the options and decisions ServiceNow made in terms of their database configuration, which determine how far you can really optimize deletes. Some answers we may never know (unless you work for ServiceNow).

4 Ways to Publish a ServiceNow Application

Did you know there is more than one way to publish a scoped application? There are at least 4 methods that I know of, so let’s unpack them!

As a quick recap, scoped applications are a way to bundle ServiceNow configurations into a protected application scope. They can then be deployed to instances other than the one they were developed in by publishing a new version and then deploying it.

#1 Native Studio UI

The most common way, and the one everyone uses: publishing through the native UI in Studio.

#2 Using a Script

The undocumented mechanism to publish a scoped app is through a script.

publish: function(appID,version,devNotes){
	var trackerID;
	try { //run install command
		var progress_name = "Upload to the App Repository";
		var worker = new GlideScriptedHierarchicalWorker();
		worker.setProgressName(progress_name);
		worker.setBackground(true);
		worker.setCannotCancel(true);
		worker.setScriptIncludeName("sn_appauthor.ScopedAppUploaderAJAX");
		worker.setScriptIncludeMethod("start");
		var g_req = new GlobalRequest();
		g_req.setPreference('sysparm_sys_id',appID);
		g_req.setPreference('sysparm_version',version);
		g_req.setPreference('sysparm_dev_notes',devNotes);
		g_req.setPreference('sysparm_username', gs.getUserName());
		g_req.setPreference('sysparm_password', '');
		g_req.setPreference('sysparm_publish_to_store', 'false');
		g_req.setPreference('sysparm_target_upload_URL', '');
		worker.putConstructorArg('g_request',g_req);
			
		worker.start();
		trackerID = worker.getProgressID();
	} catch (err) {
		gs.error("Encountered error installing scoped app " + appID + ".\nError: " + err.message, this.type);
	}
	return trackerID;
},
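// GlobalRequest below is a lightweight stand-in for the request object passed in as
// 'g_request'; it just stores the sysparm values so they can be read back with getParameter().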
var GlobalRequest = Class.create();
GlobalRequest.prototype = {
    initialize: function() {
		this.parms = {};
    },
	setPreference: function(key, value){
		this.parms[key] = value;
	},
	getParameter: function(key){
		return this.parms[key];
	},
    type: 'GlobalRequest'
};

#3 Update Sets

While it is officially documented, I feel like many people don’t know about this feature. You can publish a scoped app to an update set, which captures all of its changes. A key benefit of this method is that in whichever instance you install the update set, you can then open the app in Studio there.

#4 Using the CI/CD Spoke / API

For several releases now, ServiceNow has offered a spoke and an API for publishing an application.
Spoke Actions

  • Publish Application With ID

  • Publish Application With Scope

API: POST /sn_cicd/app_repo/publish

Publishes the specified application and all of its artifacts to the application repository.
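
As a loose sketch of calling that endpoint from a script (the instance URL, credentials, and app sys_id are placeholders, and the query parameter names are my reading of the CI/CD API documentation, so verify them on your release):

var rm = new sn_ws.RESTMessageV2();
rm.setEndpoint('https://<instance>.service-now.com/api/sn_cicd/app_repo/publish' +
    '?sys_id=<app_sys_id>&version=1.0.1&dev_notes=Automated%20publish');
rm.setHttpMethod('POST');
rm.setBasicAuth('<ci_user>', '<password>'); // a user with CI/CD permissions
var response = rm.execute();
gs.info(response.getStatusCode() + ' ' + response.getBody());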

Faster API than GlideRecord?

What if I told you there was an even more optimized way to get data from a ServiceNow table on the server side besides GlideRecord?


This idea stemmed from my research into how the Table REST API lets you select only certain fields from a table, and how that significantly optimizes your query.

There is a similar trick using GlideAggregate: you just add each field individually, like you would for counting. Props to Tony DiRienzo for posting this on the Community.

var ga = new GlideAggregate('sys_user');
ga.groupBy('user_name');
ga.groupBy('sys_id'); // This makes sure we get all records, even if there are duplicates
ga.query();
while (ga.next()) {
    // Do what you need to with the user_name field
    var userName = ga.getValue('user_name');
}

In many cases this type of query runs up to 5 times faster, and it works best when you only need a few fields. To expand it, all you need to do is add another groupBy() for each field you need included.

The only caveat here is that you lose the performance gain if you dot-walk in the results.

This is just another great tool to add to a developer’s toolkit for building ServiceNow applications.

Monitoring Series: Research into ServiceNow Performance Dashboard

Below are some of my ramblings, thoughts and research into the ‘servlet performance metrics’ dashboard from ServiceNow, and how this custom UI page homepage really works.

[Screenshot: the servlet performance dashboard]

Observation #1 - The Front End is in the Perf Dashboard Plugin

This dashboard is part of the core system plugin ‘Performance Dashboards’ (com.glide.performance_dashboards), which is an extremely lightweight plugin that just houses the shell UI and a couple of supporting scripts.

Observation #2 - The Library used is JRobin (Derived from RRDtool)

Within the scripts you can see data references to tables starting with jrobin, and those point to the JRobin plugin (RRDS), which is just a Java implementation of the RRDtool system. They even kept the Round Robin heritage by letting the table labels start with ‘Round Robin’. It’s worth noting that you can’t see any of these jrobin tables OOB; they are locked down to maint, and I had to go into each individual read ACL and add roles to be able to view them.


Observation #3 - ServiceNow uses an RRDTool Database to Store Monitoring data

This leads us to another discovery: all of this data is being parsed from a Round Robin Database (RRD), and there are supporting tables in ServiceNow that define the data refresh intervals (spoiler: they all refresh every 2 minutes) and information about the Round Robin Archive. I found a good introduction to RRD here.

Observation #4 - ServiceNow Undocumented Monitoring APIs

While snooping around in the client-side JavaScript, I found references to the APIs that are called to provide the data. The sys_ids needed to call these APIs are in the jrobin tables, and there are also other client-side parameters.

p = "/api/now/v1/performance_dashboards/data_categories",
f = "/api/now/v1/performance_dashboards/graphs/overlays",
m = "/api/now/v1/performance_dashboards/graphs/category/<dataCategoryId>",
g = "/api/now/v1/performance_dashboards/graphs/<graphSysId>",
y = "/api/now/v1/performance_dashboards/events",
v = "/api/now/v1/performance_dashboards/nodes",
_ = "api/now/v1/performance_dashboards/suggestions"

Future Observations…

I would like to look more into the list of jrobin_graph_line records and understand how the aggregator relates to the data source (jrobin_datasource).

I want to do some testing and see what format and parameters are needed to use those Performance Dashboards APIs.

I find it interesting that rrd4j appears to be the more widely adopted Java port of RRDtool than JRobin. I could see ServiceNow eventually moving to it, if they don’t discontinue or entirely restructure their DB monitoring backend. But ServiceNow has stuck with JRobin since 2006, so I find it doubtful it would change any time soon.

Deleting Fast in ServiceNow

Another post dedicated to the die hard ServiceNow developers. Have you ever wondered how fast (or slow) the deletion methods that ServiceNow employs are? I’m talking about GlideRecord deleteRecord, GlideRecord deleteMultiple and GlideMultipleDelete execute (undocumented). Today we’re going to rank all the methods, and put some numbers behind the mystery.

It’s no surprise that deleteRecord does the worst due to the looping overhead. It’s surprising that deleteMultiple comes out on top of execute by a small margin of 74ms.


From personal experience, I’ve noticed that some sources are prioritized in terms of deletes, so again, here are the numbers with the different sources overlaid on them. I ran tests with 1k, 10k, 50k, 100k, and 200k records to get a fair representation.

This is the summation of about 350k record deletes, and what method handled the varied delete batches.

Again from personal experience, I had a gut feeling that workflow activities are somehow optimized for slightly less deletion overhead, and that script actions are relatively slow.

Delete Multiple - The fastest method, by an order of magnitude of 10.

GlideMultipleDelete execute - The second fastest method (10x slower than deleteMultiple, but still very fast).

Delete Record - There is absolutely never a good reason to use this method, unless you just want to waste everyone’s time. It’s 1,270x slower than Delete Multiple. We’re talking a 1 s versus 24 min difference for 200k deletes.

Note: The numbers shown were all run on a personal developer instance with no other processes running, and I was running these delete methods with essentially no filters. Filters will drastically slow down your deletion rate, as will the size of the table you’re operating on. Your numbers could vary based on node count, jobs running, active users, and database server load and capacity. They could also vary depending on what else is running in your system (e.g. if your event log is overflowing, script actions will be slower).

Want to validate my results? Here is some sample code you can adapt.

    deleteUsingGRDelete : function(){
        this.startTimer();
        var grTestData = new GlideRecord(this.tbl);
        grTestData.query();
        while(grTestData.next()){
            grTestData.deleteRecord();
        }
        this.endTimer('GlideRecord deleteRecord');
    },
    deleteUsingGRDeleteMultiple : function(){
        this.startTimer();
        var grTestData = new GlideRecord(this.tbl);
        grTestData.query();
        grTestData.deleteMultiple();
        this.endTimer('GlideRecord deleteMultiple');
    },
    deleteUsingGlideMultiple : function(){
        this.startTimer();
        var md = new GlideMultipleDelete(this.tbl);
        md.execute();
        this.endTimer('GlideMultipleDelete execute');
    },
    startTimer : function(){
        this.start = new Date().getTime();
    },
    endTimer : function(test_name){
        var end = new Date().getTime();
        var dur = end - this.start;
        gs.log('Test '+this.size+' '+ test_name + ': '+ dur,this.type);
        var grResults = new GlideRecord('u_deletetesterresults');
        grResults.initialize();
        grResults.u_record_count = this.size;
        grResults.u_test_name = test_name;
        grResults.u_source = this.source;
        grResults.u_execution_ms = dur;
        grResults.u_execution_time.setDateNumericValue(dur);
        grResults.insert();
    },

Further areas of research to consider… If there is enough interest I may do a follow up post.

  • Historically ServiceNow has recommended the Table Cleaner (sys_auto_flush) for mass deletions. Its deletion times could be slightly more optimized than the methods tested here.

  • Audited & Update sync table deletions could be much slower and heavier. It would be interesting to see the measured impact to deletions by adding these options.

  • Number of columns and column types on the table. I ran mine with the base system fields plus a string description field with size 80. If you added 2 string fields with size 4k, would it make a measurable impact on deletions? Also, does null data delete faster than filled columns, or is it treated the same?

One final thought… The best rate I measured for large data deletions was about 152 deletes/ms, or about 152,000 deletes/s, on my ‘clean’ instance with a simple table and no filters. I would recommend running some deletion baselines in your instance to see what performance you can expect.