Data preparation is one of the hardest and least-discussed Tableau skills, as it’s all done before you ever open the program. Getting your data into the right shape to create the views you want is almost always going to be the bulk of your visualization work. And when you go from a workshop or instruction session to using your own data, or when you try to follow a tutorial step-by-step, but your data is in a different shape, you can start running into frustrating barriers. What you thought as a “knowledge of Tableau” problem might just be a data problem.
So, how do you know if your data is headache-free?
Much of the data we see in libraries comes in this format. It reduces duplication in data entry and allows you to quickly compare high-level aggregated totals. However, Tableau prefers raw/normalized data as its strength is in aggregating, grouping, and performing calculations on the data itself.
If you load a cross-tab dataset in Tableau, the program will try its best to give you something to work with, but it won’t be intuitive, nor will it match any of the instructions you find online. As you see in the image above, Tableau reads each column in the data as its own field—treating July as a different category of information from August. Dragging and dropping these fields into Tableau will give you some basic information – you will be able to see how many visitors came in July – but as soon as you try to add more than one month, you’ll start to get frustrated.
Tableau is looking for data in a machine-readable format, known as:
This is the preferred format for use in Tableau. Reformatting the data in this way will make working with the information in Tableau much more intuitive, and you will be able to build more robust visualizations and dashboards.
This is a normalized version of the data above. As you can see, the information has been reshaped so that each different kind of data is in a new column – year, month, library, and visitor count. There is a lot of duplication in the data entry (year and branch, in particular), but once the data is in Tableau, the program correctly interprets the different types of information in the file.
How do you normalize data?
Quick data normalization in Excel
- Create a pivot table with multiple consolidation ranges. This requires use of the PivotTable and PivotChart Wizard (ALT + D + P is the keyboard shortcut), not the default icon in the ribbon.
- Select “Multiple consolidation ranges”
- Select “I will create the page fields”
- Select the entire table for the range and click “add”
- Finish the wizard (remaining choices will not matter) and uncheck the Column and Row fields from the resulting PivotTable. You should see a single cell with a total Value
- Double click this value. The normalized data will appear in a new workbook with the row IDs on the left, the column headers in the middle, and the values on the right.
- Rename the default column headers (“Row, ” “Column,” “Value” to something more meaningful.
- Filter for blank values and delete the empty cells
This normalization trick can be used in combination with other Excel tricks for fairly sophisticated wrangling without the need for specialized tools or programs. Stay tuned for an overview of more advanced data wrangling in Excel, focusing on Concatenate, Text to Columns, and Trim.