Categories
SQL

KQL: A SQL Developer’s First Look

Someone drops a KQL query in your lap and says “you know SQL right?”. Honestly, it’s different enough to trip you up, but similar enough that you’ll pick it up quickly.

In this post I want to touch on what KQL is, why it matters, and show how familiar (or different) it is compared to our beloved SQL.

What is KQL

KQL is Kusto Query Language, and is used to query data – much like SQL. It looks like this:

AppEvents
| where Name == "DemoApiRequest"
| take 10

If you’re familiar with databases and SQL you’ll understand what’s going on here, albeit in a different way. You’ll also spot a similarity to PowerShell syntax with the pipe passing data from one operator to the next.

KQL is used across Microsoft products, typically related to logs, metrics, or telemetry. The language also supports complex searching and analytics which is key for this type of large-volume event-like data. You’ll see KQL in areas such as Log Analytics, or Real-Time Analytics in Fabric.

As a SQL Developer, KQL is a different way to query data, but it feels familiar. It can handle structured and unstructured data, but is more often used with semi-structured data such as the examples above.

SQL vs KQL

KQL is similar to SQL but differs in terms of order and keywords.

As mentioned above, statements are joined together via Pipes (|) to form a chain of options, rather than a single ‘statement’ in SQL. Also, whilst there are overlaps in the features, different terminology is used, for example:

SQLKQL
SELECTproject
WHEREwhere
GROUP BYsummarize
ORDER BYorder by
TOPtake

One language-specific point to be aware of is that case sensitivity matters more in KQL. That includes tables, columns, operators, and strings.

Let’s look at some examples to see how this plays out:

Simple example

In SQL we could review the latest entries like this:

SELECT TOP 100 *
FROM AppEvents
WHERE Name = 'DemoApiRequest'
ORDER BY TimeGenerated DESC

Which in KQL becomes:

AppEvents
| where Name == "DemoApiRequest"
| order by TimeGenerated desc 
| take 100

The order of operators let us read the logic: ‘with the data from AppEvents, apply a filter based on Name, sort the results, and take the first 100’. I find it more readable than SQL, and has a similar order of operations to LINQ if you’re familiar with that from .Net.

We also see another nuance that = becomes ==. This is the default case sensitive match, so records where Name is set to ‘demoapirequest’ would be returned from the (usually case-insensitive) SQL, but not KQL. For a case insensitive match in KQL we’d need to use the =~operator. This can (and will) catch you out.

Diving deeper

If we did simple aggregation with SQL it might be:

SELECT Name, Entries = COUNT(1)
FROM AppEvents
GROUP BY Name

But in KQL it could look like:

AppEvents
| summarize
    Entries = count()
    by Name

Again, similar to the SQL.

Where it gets a little more interesting is if we have semi-structured data, such as a JSON object in a column. For event data, storing details in JSON can be especially helpful as different applications, APIs, or tools may have different properties to expose.

When using a JSON object in KQL, elements inside the JSON object aren’t defined by the schema so they must be converted to their appropriate data type. Like so:

AppEvents
| summarize
    AvgDurationMs = avg(todouble(Measurements.DurationMs))
    by tostring(Properties.ServiceName)

Here I have Measurements and Properties which are columns containing JSON objects, and DurationMs and ServiceName are elements within those objects.

The pattern is similar to SQL, but the method of accessing the JSON elements is much easier with the use of dot notation.

Try it yourself

If you want to get rolling with this yourself, you can get moving with a couple of Azure resources and some sample data loaded.

A simple option is to create a Log Analytics workspace, and then an Application Insights instance connected to it. With that done, ask your favourite assistant to generate a sample payload and script to ingest.

Here’s a gist containing the PowerShell and sample object I used for ingestion if you’d like to feed those to your assistant as a reference.

Once you’ve got some sample data loaded, jump into your Log Analytics workspace and open Logs. From here, grab any of the queries above if you’re using the payload format, and have a play. The query designer has auto complete available too which helps when finding your feet.

If you’d like to take more of the analytics tilt with this, hit the Share button and you can export the results to a Power BI dataset or M query.

Wrap up

Here we’ve looked at the Kusto Query Language (KQL) and the similarities and differences to SQL. Whilst SQL is traditionally for structured, transactional, historical data; KQL excels with semi-structured, event-based, real-time data.

With KQL being primarily used for telemetry and event data, I’ve found performance to be fantastic even with modestly large datasets. It’s clearly much more pared down in its features compared to SQL, but its functionality and efficiency are well matched for its typical use cases.

The difference between SQL and KQL isn’t quite as dramatic as pop and K-pop. So maybe try out KQL for yourself and see what you think.

Leave a comment