BACK TO BLOG

Exploring Techniques in Power BI

In this post, I am documenting a few data modelling techniques which I found useful. Some of them might be quite basic and some more intermediate, but they are a selected few that I have found to be useful in my experience with Power BI.

Common Transformations

Blog Post Image

Whenever you import data into Power BI, you often need to transform the data in some way. There are many options available in the Power Query Editor, but there's just a handful that you'll use regularly. One of the first things you should consider when working on your Power BI model is whether you need to rename your queries with more meaningful names, and that process will also update the table names in the model when you apply changes.

Next, consider whether you need to eliminate columns that you don't need. Although it's tempting to include every little bit of data in case someone needs it, too much data puts a drain on memory and adds clutter to your model. You can always add rows or columns back later if you need to. A similar consideration is filtering out rows because all the data in your model is being held in memory, and you want to ensure that you only include the data that you really need. The key idea here is that you should review your data and reduce your data whenever possible by using filters to reduce the amount of memory on your computer that's required to hold all the data, especially if your data source is quite large.

Now that you're down to the needed data consider renaming columns to make them more understandable to your users. And then next, check the data types of all your columns. Power BI does a good job of anticipating the correct data types for you, but it's not a perfect process. For Power BI to work correctly on numbers and dates, you want to be sure those columns are configured with the correct data types. And another common task is to clean up values in your data. For example, you might have characters in a text string that your users don't need, so you can replace those values with an empty string to improve the appearance of the text. Or you might have errors in the data that you want to remove.

Blog Post Image

After using the basic transformations to shape your data, there may or may not be more to do, depending on the data you're working with. However, you'll never use all the possible transformations in every model that you create.

You need to move data out of a query into its own separate query and shape it in a different way. There are two ways that you can do this, one is to duplicate a query, and in that case, you literally make a copy of the query. And any further changes that you make to the first query are not reflected in this copy. That means you can delete columns or add columns to the first query, and the second query stays the same as it was when you created it. Then, in the second query, you can remove steps or add steps, affecting only the second query while the original query is untouched.

As an alternative, you can reference a query. In that case, any changes that you make in the original query carry over to the reference copy. However, whatever steps you add to the reference query affect only that query.

Another transformation you may need is unpivot, and this is useful when you have similar types of data spread out over multiple columns in a single row, and you want those values to be set up in separate rows.

Merging Queries

Another way to filter rows of data in your model is by merging queries. This technique requires two queries with some data in common so you can match them up. There's likely to be data in one query that's not found in another query, and also, one query may have a large amount of data, so why hold all that extraneous data if it's not going to be used? The solution is to use the merge queries transformation to filter one query down so that it only contains data that is in the main/largest query.

Find the Merge Queries button, and then in the dialogue box, the selected query displays the columns it currently contains.

Blog Post Image
Blog Post Image

And there's a drop‑down list below where you select the query you want to merge it with. A preview of the query's columns is displayed.

Blog Post Image

Now you need to tell how to match up the rows of data in each query. Both of them will have a column as a common unique identifier. Click the column,s that match up.

Blog Post Image

The 'Join Kind' drop‑down list is where you choose how the merge is supposed to work. There are many types of relational database joins (left, right, inner, etc.). For this task, I chose 'Inner' as I only want to display data for which there is a matching row.

Blog Post Image

The merge transformation adds a column with cell values of 'Table' to the query selected at the start. If you only want the effect of the merge to filter the rows in the selected query then you can remove this column.

Custom Columns

Blog Post Image

One way to create additional columns in a query is to work with custom columns, and Power BI gives you several ways to do this.

The easiest way is to create a column from examples, which means you type in some values in a column to show Power BI what you want the result to look like, and Power BI tries to determine what the formula needs to be to produce that result. It works best if your end result is not too complicated; it should be something easy like concatenating values from different columns or extracting a year out of a date field. You can always try providing an example first, and if Power BI can't help, you can then build out a formula to create a custom column. Power BI provides IntelliSense to help guide your formula development.

Another way to add a column is to invoke a custom function, such as combining multiple files in a folder into a single query. You can write your own functions to do complex work, such as making calls to a web service and dynamically constructing the web page based on dates in a table.

You can also create a conditional column. Essentially, you can set up an if/then/else description of how you want Power BI to generate a result. You could achieve the same result by creating your own custom formula, but the conditional column transformation provides a simple no-code interface for you.

Then there is an index column transformation, and this gives you the ability to add a column and number your rows from 1 to whatever, or you could start your index from 0. And this might be helpful if you need to have a unique value for each row of data in a query and you don't already have that kind of column in your data.

Lastly, there is a transformation to duplicate a column. You might make a copy of data in a column and then apply some additional transformation to it in another step.

To see how powerful Power BI can be, let's look at an example of adding a custom column from examples. This example will be taking a short date column (24/12/2022) and extracting just the year, and putting the year into a custom column.

Select one column (this will be the 'date' column) and then click on the 'Add Column' tab in the ribbon. Click on 'Column From Examples' and click on 'From Selection'.

Blog Post Image

A new column will appear called 'Column 1'. Click on the first cell in the first row of column 1 and type in the value you want to obtain (I have put in the year from the relevant cell), and press 'enter' on your keyboard. Column 1's name has changed to 'Text After Delimiter' (as a delimiter is being used to extract the year from the date), and the grey values are what Power BI deems to be the required correct values. Click on 'OK', and the custom column has been created.

Blog Post Image

This was pretty simple, but Power BI may need more information, so you might find yourself providing more than one example. If Power BI can't figure out what you mean, then a different method might need to be investigated, such as writing your own formula.

Unpivot Columns

Suppose you have data split across different columns (for example, a movie has more than one possible genre, leading to columns for genres 1, 2 and 3). In that case, unpivoting the data will transform it into one single row.

Blog Post Image

Right-click on the columns needing to be unpivoted and click on 'Unpivot Columns'.

Blog Post Image

This will then combine the data into two rows; one called 'Attribute', which was the name of the column headers, and one called 'Value' which are the values for the cells in the rows.

Blog Post Image

Once the values have been cleaned up, you will end up with the following:

Blog Post Image

Notice that there are multiple rows for one 'IMDB ID'? This is because the data has been transformed from being in columns to one row.