Get Notified When Critical Processes Stop Running Using Kolide + Osquery

Fritz Ifert-Miller
Attention Reader:
This article was originally published on January 24, 2019 and refers to Kolide Cloud which was formally retired in February of 2019. For posterity, this post is still available, but we encourage you to read our Device Trust announcement.

Osquery delivers unparalleled speed for endpoint monitoring. With a Live Query, you can ask thousands of devices simultaneously to report on their current state and get results back in realtime. However, no one has time to sit at a console running queries all day long.

Users need a method to persistently monitor specific items and collect that data over time. This requirement led to the creation of Collections. Collections allow users to repeatedly run queries and cache the results for devices over time.

The scheduled nature of these Collections means we can stay up to date on the current or last known state of a device, without manually running the same Live Query hundreds of times a day.

An example of a collection might be one that catalogs all of the macOS applications installed across your fleet. The query would look like:

SELECT * from apps;

The data output by this Collection is stored in two formats side by side, Results and Events.

Collections results

Results represent the most current data of devices matching the conditions of the Collection.

In this example, the Results tab would be comprised of any application that is currently installed on a mac. For this device there are 450 currently installed applications.

a screenshot of the applications on a device shown in the kolide web application

Collections events

Events represent the entire historical dataset of devices which have ever matched the conditions of the Collection.

So in our example, it would include all installed applications: observed OR observed and then later removed. Hence, the reason we have 10,429 events.

a screenshot of the events on a device shown in the kolide web application

You might think 10,429 historical applications sounds like a lot. This is because anytime a row of data is changed, it results in a removed and added event. To better understand this concept, we can take the example of an application that undergoes an update.

| name            | version | event   | time                     |
| Adobe Photoshop | 11.0.1  | added   | 2019-01-23T17:40:28.000Z |
| Adobe Photoshop | 10.9.8  | removed | 2019-01-23T17:40:28.000Z |
| Adobe Photoshop | 10.9.8  | added   | 2018-03-10T09:21:15.000Z |

After being updated, the version number of the application will change. Since version number is part of our result set, the original row of data for that application will no longer match the updated values. The old row will have a removed event and the newly collected data (with the updated version) number will be an added event.

Considering the likelihood and impact of churn

If you want to avoid producing unwanted churn of mostly duplicative events, you need to consider the schema of your queried table and what columns are subject to frequent change. For example, if you were to query the mounts table you would not want to SELECT * or include inodes or blocks columns, as they will be different nearly every time the mounts table is queried.

The most powerful feature of Collections is the Advanced Search, which allows you to search offline datasets and to build notification rules using boolean logic.

This type of searching allows you to drill down through your results set without necessitating multiple iterative Live Query attempts. For instance, using this we can check for Adobe Photoshop apps that haven’t been opened in a long time to facilitate software license retrieval.

a screenshot of the advanced search function

Using Advanced Search from the Events tab we can build conditional rulesets for Event Tags which can power external Notifications.

If we wanted to check that a critical process was always up and running (Apache webserver, etc.) we could create a Collection and Event Tag to check for the process’ absence. For our example, let’s create a Notification which alerts us anytime the Slack desktop app process is not running.

Crafting the proper query

You might be tempted to use the following query to monitor a process’ presence on a device:

SELECT * FROM processes

Unfortunately, because of the exceptionally dynamic state of processes on a device, this would create huge amounts of churn as we discussed above, and your dataset would easily reach millions of rows of data recording anytime a process started and stopped.

Finding the absence of data

We want to take a more targeted approach with the processes table. Likewise, because we are trying to chase the absence of data, we will need to utilize a query with NOT EXISTS logic:

SELECT * FROM system_info
        FROM processes
        WHERE  name = "Slack");

This query returns the system_info for any device which is found not to have the Slack process currently running.

Creating the collection

We can go ahead and create the collection in Kolide Cloud by navigating to:

a screenshot of a new collection creation in the kolide web app

After naming our new Collection and choosing which platforms it will run on, we can copy paste our query from above and create the new Collection. Collections get added to the osquery schedule and as a result, typically take several minutes to initially populate with data.

Once the collection is created we will go to the Events tab, open the Advanced Search and create a new Event Tag for any events that match:

_kolide_action is added

This way, we will know anytime a device is added to the Collection, meaning in our case, that the Slack process stopped running.

a screenshot that shows the search for the parameter "added"

Configuring notifications on top of collections

Now that we have created a new Event Tag we can navigate to the Notifications page of the app and create a New Slack Channel notification:

a screenshot showing slack notification configuration in the kolide web app

Once the Slack notification has been configured we will get notifications in the specified Slack channel anytime the Event Tag criteria is met. So if we were to kill our Slack app for testing purposes we should see a new Event produced and a subsequent notification:

That’s a Wrap!

We will now receive a notification every time a device is online but the Slack desktop app process is not active. This same logic could be applied to any other process or the installation of an application using a similar NOT EXISTS query. We’ve had customers use similar queries to:

  • Check that their desired antivirus app was installed

  • Check that Apache is running

  • Check for properly customized etc/hosts

Finding the absence of expected data is just as critical as detecting the presence of unwanted data.

If you haven’t explored the incredible utility of osquery yet on your own, Kolide offers the fastest and easiest route to get up and running.

If you’d like to read more osquery content like this, sign up for our biweekly newsletter.

Share this story:

More articles you
might enjoy:

Deep Dives
Buying Kolide vs Building Your Own Osquery Solution
Jason Meller
Inside Kolide
How to Run Osqueryi With Kolide Launcher Tables
Fritz Ifert-Miller
Deep Dives
Is Windows Defender Enough to Pass SOC 2?
Jason Meller
Watch a Demo
Watch a Demo