Power BI - DAX - An Introduction | Graeme Elliott

The ability to use DAX is an important skill to have for building great Power BI models. It stands for Data Analysis Expressions, and it's the key to the analytical capabilities available in Power BI. You use it to add new columns to your tables inside the Model editor and not in the Query editor (where we used the language known as M). You could also use DAX to define measures, which are the numeric values representing sums, counts, averages, etc. These are the values that show up as columns or bars, or lines in your charts and are vital to insightful visualisations.

If you have experience using Excel functions, you might think you see some familiar functions when you start to learn DAX. Still, there are definitely some differences, even though they look similar. The primary difference that you need to understand is that Excel functions operate on individual cells. Cells contain values, and you can reference them by using a variable for the cell representing its coordinates. You can also operate on ranges of cells where you specify the beginning and endpoints of the range.

DAX functions, on the other hand, operate exclusively on tables or columns, and that's it. So, for example, the COUNTROWS function could look at the entire table, or DAX can operate on columns.

The big learning curve with DAX is learning to think about applying operations to just tables or just columns; there is no access to individual cells.

DAX has two areas where it really shines as a language. First is with aggregations. This is when you want to take thousands or millions of rows of data and mush them together in a single value, such as sales growth from last year or total quantity sold by product. Second, it's really good at filtering. Often you'll need to look at data by company, region, time and division. The DAX engine can do this lightning-fast because of how the data's stored.

There are a number of scenarios where DAX doesn't make sense. The first of which is operational reporting. These reports are detail heavy and used to support day-to-day operations. This is the opposite of analytical reporting. DAX isn't designed for listing-out line items on a sales order or providing a list of timesheet entries; it's designed for aggregating data, for combining it into summaries, not detailed lists that have to be pixel-perfect.

The second area where DAX struggles is wide tables. The more columns you add to a table, the worse the performance gets, so if you've got a table with 50 columns, DAX will have much worse compression and performance than normal. The last area is many-to-many relationships. DAX is optimised for when you have a few transaction tables and some dimension tables surrounding those. DAX doesn't directly support many-to-many relationships between tables. There are ways around this, either by changing your formulas or by using bidirectional filtering, but it can be a pain, or it can lead to results that you're not expecting.

Measures vs Calculated Columns

Measures are created by using DAX to calculate and summarise values for your visualisations. Measures that use the SUM function are very commonly used in Power BI, and you can get this value just by adding revenue to a table or chart, but the benefit of having a dedicated measure is that you can now reference these sums in other calculations, like for Gross Profit where you reference the measures by name and then use an arithmetic operator. And you can use other DAX functions, such as division, to get the Gross Profit %, which divides the Gross Profit by Revenue. Whenever you have division, it's usually a good idea to test the value of the divisor first to make sure that it's not blank because you could wind up with a divide-by-zero error. Always separate calculations into separate measures as this leads to expressions being easier to read and easier to troubleshoot.

Another thing to know is that measures differ from calculated columns in that they're only calculated when you include them in a visualisation, such as a chart or a map. And although a measure can belong to a table, you're not required to list the table it's found in when you use it in a calculation because measure names need to be unique across the model. A measure does not add to the memory required for your model because it's not calculating anything in advance for storage; it just calculates at query time when you're creating a visualisation or refreshing your report.

So how do we add meaning? Calculated columns essentially expands your data horizontally by adding a new column. That column is treated the same as all the other columns in your table. This method is intuitive and easy to get a handle on.

Measures summarises your data vertically, condensing columns into singular values. Measures are best thought of as a way of describing key performance metrics. While a calculated column allows you to combine a price and quantity to get an extended price, measures allow you to summarise and to get, say, average sales, or year-over-year growth for sales, for example.

When do we use each of these different methods? Calculated columns are, by nature, much more operational. They're much more like a Swiss Army Knife, where you're just trying to get something done. By default, they can only look within the row they're evaluating. So, their scope is often very small. It's difficult to tell someone much about the entire enterprise based on a single sales entry. Measures, in contrast, are often very analytical by nature. They're taking whole columns of data and condensing them into a single value. That value usually has some implied meaning. Think about the metric of year-over-year growth, for example. How much revenue did we make compared to last year? Well, if we were to show a company grew 10% over last year, that would involve looking at two entire years of revenue. It would also have an implied meaning along with it because, for a stable company, 10% annual growth is pretty good. But for a startup, 10% growth could be a death knell.

Using DAX to bypass ambiguity around relationships

If you try to make a relationship active but receive the below error message:

this means you cannot simultaneously link one query to two fact tables. To get around this, the following 'USERELATIONSHIP' formula can be used to make the relationship:

USERELATIONSHIP('Movie Values'[Release Date], 'Date'[Date]))