Thursday, 24 November 2016

Why INDEX MATCH is Better Than VLOOKUP



When choosing a vertical lookup formula, most Excel experts agree that INDEX MATCH is better than VLOOKUP.  However, many people still fall back on VLOOKUP because it’s a simpler formula.  One major reason is that most people don’t fully understand the benefits of switching from VLOOKUP, and without that understanding, they are unwilling to invest the time to learn the more complex formula.  The purpose of this post is to describe in detail all of the benefits of using INDEX MATCH and convince you that you should use INDEX MATCH exclusively for all of your vertical lookups.

If you don’t know how to use INDEX MATCH, work through a detailed tutorial first.  Once you’ve learned INDEX MATCH and started using the formula consistently, you’ll reduce the number of spreadsheet errors you make, become more efficient in navigating Excel, and significantly improve your ability to create complex Excel models that have a heavy database component.

INDEX MATCH vs VLOOKUP 01

With the VLOOKUP syntax, you specify your entire table array, AND THEN you specify a column reference to indicate which column you want to pull data from.

INDEX MATCH vs VLOOKUP 02



It’s a small difference, but this additional step undoubtedly leads to more errors.  This error is especially prevalent when you have a large table array and need to visually count the number of columns you want to move over.  When you use INDEX MATCH, no such counting is required.
The next two advantages of using INDEX MATCH also come from the fact that VLOOKUP requires a static column reference input for the values you want to return.  Please note that you could get around the static reference issue by using a formula within the VLOOKUP syntax, creating a combination formula such as VLOOKUP MATCH or VLOOKUP HLOOKUP.  However, it’s much simpler just to learn INDEX MATCH and gain all the additional benefits beyond just having dynamic column reference.

Insert Column Immunity

The greatest benefit of using INDEX MATCH over VLOOKUP is the fact that, with INDEX MATCH, you can insert columns in your table array without distorting your lookup results.  Any time you work with a large dataset, there’s a good chance you’ll need to go back to edit your columns and potentially insert a new column.  With VLOOKUP, any inserted or deleted column that displaces your return values will change the results of your formulas.
Take the VLOOKUP example below.  Here, we’ve set up the formula to pull the State value from our data table.  Because it is a VLOOKUP formula, we have referenced the 4th column.
INDEX MATCH vs VLOOKUP 03

If we insert a column in the middle of the table array, the new result is now “Seattle”; we are no longer pulling the correct value for State and must change the column reference.
INDEX MATCH vs VLOOKUP 04
INDEX MATCH has insert column immunity, so you can insert and delete columns without worrying about updating every associated lookup formula.
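The effect can be sketched outside Excel.  The Python model below is only an illustration: the table contents, the column name "Phone", and the helpers `vlookup`, `index_match`, and `insert_column` are hypothetical stand-ins, not Excel’s implementation.  It assumes that referencing a column by name behaves like an INDEX MATCH range reference, which Excel adjusts automatically when columns are inserted.

```python
# Hypothetical table: headers name the columns, each row is a list of cells.
headers = ["ID", "Name", "City", "State"]
rows = [
    [101, "Ann", "Seattle", "WA"],
    [102, "Bob", "Portland", "OR"],
]

def vlookup(key, rows, col_index):
    """Model of an exact-match VLOOKUP: search the first column and
    return the cell at a fixed 1-based column position."""
    for row in rows:
        if row[0] == key:
            return row[col_index - 1]
    return None  # Excel would show #N/A

def index_match(key, rows, headers, lookup_col, return_col):
    """Model of INDEX MATCH: lookup and return columns are referenced directly."""
    lookup_i = headers.index(lookup_col)
    return_i = headers.index(return_col)
    for row in rows:
        if row[lookup_i] == key:
            return row[return_i]
    return None

def insert_column(headers, rows, position, name, fill=""):
    """Return copies of the table with a new column at a 0-based position."""
    new_headers = headers[:position] + [name] + headers[position:]
    new_rows = [row[:position] + [fill] + row[position:] for row in rows]
    return new_headers, new_rows

before_v = vlookup(101, rows, 4)
before_im = index_match(101, rows, headers, "ID", "State")

# Insert a column in the middle of the table, as in the example above.
headers2, rows2 = insert_column(headers, rows, 2, "Phone")
after_v = vlookup(101, rows2, 4)
after_im = index_match(101, rows2, headers2, "ID", "State")

print(before_v, after_v)    # WA Seattle
print(before_im, after_im)  # WA WA
```

The positional lookup silently switches from State to City, while the lookup that names its return column keeps returning the State.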

Easier to Drag and Copy

When working with large datasets, it’s rare to write just a single lookup formula; you’re likely going to need to drag and copy your formula to multiple cells to perform multiple lookups.  For example, let’s say that for a specific ID, I want to return a series of values from my table in the exact same order as they appear in the table.  When I try to drag and copy the standard VLOOKUP formula across (with a reference lock on both the lookup value and the table array), the lookup doesn’t work because it just pulls the same value for each entry.  This deficiency is again caused by VLOOKUP requiring a specific column reference input for your return values.
INDEX MATCH vs VLOOKUP 05
With INDEX MATCH, because you can set the return column to float (essentially by not reference locking it) the return column will move as you copy your formula over, providing you the different fields as they appear.
INDEX MATCH vs VLOOKUP 06

No Array Restriction

Another key limitation of VLOOKUP is that it requires you to specify a rectangular table array, and your column reference cannot move beyond its edge.  This becomes a problem when you append a new field to your dataset beyond your original table array.  Consider the example below:
INDEX MATCH vs VLOOKUP 07
If I append a new column to this dataset, I can’t use my original VLOOKUP formula to pull values from that new column.  If I change my column reference to “6” the formula returns an error because my table is only 5 columns wide.  To make the formula work, I’d have to update the table array I specified every time I add a new column.
INDEX MATCH vs VLOOKUP 08
With INDEX MATCH, you don’t need to specify a table array and therefore don’t face this problem.

Right to Left Lookup

One of the key benefits of using INDEX MATCH is being able to create lookup keys on the right side of your data table.  New lookup keys are generally created by running calculations on fields within your original dataset.  These keys represent your lookup column within the INDEX MATCH syntax.  When using INDEX MATCH, you can append these new lookup keys to the right side of your table and perform a right-to-left lookup to pull the values you want to return.  The INDEX MATCH syntax doesn’t care whether your lookup column is on the left or right side of your return column.
INDEX MATCH vs VLOOKUP 09
With VLOOKUP, because you can only perform a left-to-right lookup, any new lookup key you add must be on the left side of your original table array.  Therefore, every time you add a new key, you have to shift your entire dataset to the right by one column.  Not only is this annoying, but it can also interfere with existing formulas and calculations you’ve created in your spreadsheet.
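A minimal Python sketch of why the direction doesn’t matter: MATCH finds a position in one array and INDEX reads that position from another, so the two arrays never need a left/right relationship.  The function names `match` and `index` and the sample columns are hypothetical; this models only the exact-match case.

```python
def match(lookup_value, lookup_array):
    """Model of MATCH(..., 0): 1-based position of the first exact match."""
    for position, value in enumerate(lookup_array, start=1):
        if value == lookup_value:
            return position
    raise LookupError("no match")  # Excel would show #N/A

def index(return_array, position):
    """Model of INDEX on a single column or row, using a 1-based position."""
    return return_array[position - 1]

# Hypothetical columns: the key column sits to the RIGHT of the return column.
names = ["Ann", "Bob", "Cy"]          # e.g. column A
keys = ["ann-WA", "bob-OR", "cy-CA"]  # e.g. column D, a key appended later

result = index(names, match("bob-OR", keys))
print(result)  # Bob
```

Because the two arrays are independent, the same sketch works whether they represent columns or rows.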

Doubles as an HLOOKUP

This probably isn’t the biggest deal to most users, but it’s still worth mentioning.  A VLOOKUP restricts you to only performing vertical lookups on a table array.  I can’t, for example, look up values across the top of my dataset and perform a horizontal lookup.  To achieve this objective, I’d need to use a completely different formula.
With INDEX MATCH, you can simply arrange both your lookup array and return array horizontally to perform an HLOOKUP.  I don’t recommend doing horizontal lookups in general, but if you ever need to, you should definitely use the INDEX MATCH formula.

Lower Processing Need

The processing benefit of using INDEX MATCH has been marginalized in recent versions of Excel because the software’s processing capabilities have expanded so significantly.  But just to give everyone a brief history lesson, I’ll elaborate on this benefit.  In some of Excel’s older versions, the software had tangible processing limitations that would be noticeable if you built a huge data set.  In some situations, I was required to look up values for thousands of rows so I could append a new column to a large table.  Once I added all of these formulas, the software would freeze up and take several minutes to calculate the return values.  I eventually had to replace my VLOOKUP formulas with INDEX MATCH to speed up the calculations.
The reason for this difference is actually fairly simple.  VLOOKUP requires more processing power from Excel because it needs to evaluate the entire table array you’ve selected.  With INDEX MATCH, Excel only has to consider the lookup column and the return column.  With fewer absolute cells to consider, Excel can process this formula much faster.
Again, please keep in mind that this benefit is probably no longer noticeable given the improvements made in Excel’s processing power.  However, if for any reason you need to run thousands of lookup formulas within a spreadsheet, make sure to use INDEX MATCH.

Conclusion

The one disadvantage of using INDEX MATCH is that it is clearly harder to learn and more difficult to remember.  Excel does not have the syntax built into the software and the syntax by itself is not intuitive.  However, this hurdle can easily be resolved by reading a simple tutorial on how to use the formula.
Making the switch from VLOOKUP to INDEX MATCH was one of the key steps I took to improve the quality of the Excel models I built.  I now use INDEX MATCH exclusively for all of my vertical lookups.  I do this not just because it’s a better formula, but also because it helps me remember the more complex syntax.  Once you’ve made the switch, and experienced the benefits of using INDEX MATCH, I promise you that you won’t go back to VLOOKUP.

Thursday, 22 September 2016

Excel : The Best Tool Ever

Introduction

I’ve always admired the immense power of Excel. This software is not only capable of doing basic data computations, but you can also perform data analysis using it. It is widely used for many purposes including the likes of financial modeling and business planning. It can become a good stepping stone for people who are new to the world of data analysis.
Even before learning SQL, SAS, R or Python, it is advisable to have knowledge of Excel. It does no harm to add Excel to your skill set. Excel, with its wide range of functions, visualizations and arrays, empowers you to quickly generate insights from data which would be hard to see otherwise.
It has a few drawbacks as well. It can’t handle large data sets very efficiently. I’ve personally faced this issue. Try doing computations on data with around 200,000 entries and you’ll notice that Excel starts struggling. There are ways to work around this and handle such data to some extent, but Excel is not a big data tool. In such cases, R or Python are the best.
I feel fortunate that my journey started with Excel. Over the years, I’ve learnt many tricks to deal with data faster than ever. Excel has numerous functions. It becomes confusing at times to choose the best one. In this article, I’ll provide you some tips and tricks to work in Excel and save you time. This article is best suited to people keen to upgrade their data analysis skills.
Note: If you think you are a master coder in data science, you won’t find this article useful. For others, I’d recommend that you practice these tricks to develop a concrete understanding of them.

Commonly used functions

1. VLOOKUP(): It searches for a value in a table and returns a corresponding value. Let’s look at the table below (Policy and Customer). In the Policy table, we want to map the city name from the Customer table based on the common key “Customer id”. Here, the function VLOOKUP() helps to perform this task.
Vlookup_1
Syntax: =VLOOKUP(Key to look up, Source_table, column of source table, are you ok with approximate match?)
For the problem above, we can write the formula in cell “F4” as =VLOOKUP(B4, $H$4:$L$15, 5, 0). The final 0 requests an exact match. This returns the city name for Customer id 1; after that, copy the formula down for all other Customer ids.
Tip: Do not forget to lock the range of the second table using the “$” sign; leaving it unlocked is a common error when copying this formula down. Locking a range this way is known as absolute referencing.
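For readers who later move to Python, the same exact-match mapping can be sketched with a dictionary. This is only an analogy, and the customer ids, city names, and policy numbers below are made up for illustration.

```python
# Hypothetical Customer table: Customer id -> City
customers = {1: "Delhi", 2: "Mumbai", 3: "Pune"}

# Hypothetical Policy table; customer 9 does not exist in the Customer table.
policies = [
    {"policy": "P-01", "customer_id": 2},
    {"policy": "P-02", "customer_id": 9},
]

for policy in policies:
    # dict.get returns a default for a missing key, similar in spirit to
    # VLOOKUP showing #N/A when an exact match is not found.
    policy["city"] = customers.get(policy["customer_id"], "#N/A")

print([p["city"] for p in policies])  # ['Mumbai', '#N/A']
```

The loop plays the role of copying the formula down the column: every policy row is matched against the same, fixed customer table.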

2. CONCATENATE(): It is very useful for combining text from two or more cells into one cell. For example, we want to create a URL based on the input of host name and request path.
Concatenate

Syntax: =CONCATENATE(Text1, Text2, ..., Textn)
The problem above can be solved using the formula =CONCATENATE(B3,C3), copied down.
Tip: I prefer using the “&” symbol, because it is shorter than typing a full “CONCATENATE” formula, and does exactly the same thing. The formula can be written as “=B3&C3”.

3. LEN(): This function returns the length of the text in a cell, i.e. the number of characters including spaces and special characters.
Syntax: =LEN(Text)
Example: =LEN(B3) returns 23

4. LOWER(), UPPER() and PROPER(): These three functions change text to lower, upper and proper case respectively (proper case capitalizes the first letter of each word).
Syntax: =UPPER(Text) / LOWER(Text) / PROPER(Text)
In a data analysis project, these are helpful for converting classes written in different cases to a single case; otherwise they are treated as different classes of the given feature. Look at the snapshot below: column A has five classes (labels), whereas column B has only two because we have converted the content to lower case.
Cases

5. TRIM(): This is a handy function used to clean text that has leading and trailing white space. Often when you get a dump of data from a database, the text you’re dealing with is padded with blanks. If you don’t deal with them, the padded values are treated as unique entries in a list, which is certainly not helpful.
Syntax: =TRIM(Text)
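The combined effect of TRIM() and LOWER() on class labels can be sketched in Python. The helper `excel_trim` is a hypothetical approximation of TRIM (it handles ordinary spaces only), and the labels are invented to mirror the five-classes-to-two example above.

```python
def excel_trim(text):
    """Approximate Excel's TRIM: drop leading/trailing spaces and collapse
    runs of spaces between words to a single space."""
    return " ".join(part for part in text.split(" ") if part)

# Hypothetical raw labels: five distinct strings, but only two real classes.
labels = ["  Male", "male ", "MALE", "Female", " female  "]

cleaned = [excel_trim(label).lower() for label in labels]
distinct = sorted(set(cleaned))

print(len(set(labels)), distinct)  # 5 ['female', 'male']
```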

6. IF(): I find it one of the most useful functions in Excel. It lets you use conditional formulas which calculate one way when a certain condition is true, and another way when it is false. For example, you want to mark each sale as “High” or “Low”: if sales are greater than or equal to $5000 then “High”, else “Low”.
Syntax: =IF(condition, value if true, value if false)
Conditional
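The High/Low rule above reads the same way in Python. The function name `label_sale` and the sample amounts are hypothetical; the threshold of 5000 comes from the example.

```python
def label_sale(amount, threshold=5000):
    """Mirror of the IF rule: "High" when amount >= threshold, else "Low"."""
    return "High" if amount >= threshold else "Low"

sales = [1200, 5000, 7300]  # made-up sales figures
labels = [label_sale(amount) for amount in sales]
print(labels)  # ['Low', 'High', 'High']
```

Note that a sale of exactly 5000 counts as “High” because the comparison is greater than or equal to.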

Generating inference from Data

1. Pivot Table: Whenever you are working with company data, you seek answers for questions like “How much revenue is contributed by branches of North region?” or “What was the average number of customers for product A?” and many others.
Excel’s PivotTable helps you to answer these questions effortlessly. A pivot table is a summary table that lets you count, average, sum, and perform other calculations according to the reference feature you have selected, i.e. it converts a data table into an inference table which helps us to take decisions. Look at the snapshot below:
Pivot
Above, you can see that the table on the left has sales detail for each customer with region and product mapping. In the table on the right, we have summarized the information at region level, which helps us to draw the inference that the South region has the highest sales.
Steps to create a pivot table:
Step-1: Click somewhere in the list of data. Choose the Insert tab, and click PivotTable. Excel will automatically select the area containing data, including the headings. If it does not select the area correctly, drag over the area to select it manually. Placing the PivotTable on a new sheet is best, so click New Worksheet for the location and then click OK.
Pivot2
Step-2: Now, you can see the PivotTable Field List panel, which contains the fields from your list; all you need to do is arrange them in the boxes at the foot of the panel. Once you have done that, the diagram on the left becomes your PivotTable.
Pivot3
Above, you can see that we have arranged “Region” in rows, “Product id” in columns, and the sum of “Premium” as the value. Now you have a pivot table which shows the sum of premium by Region and Product. You can also use count, average, min, max and other summary metrics. For more detail on pivot tables, a dedicated tutorial is worth reading.
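What the pivot table computes can be sketched in a few lines of Python. The records below are invented for illustration; only the layout (Region in rows, Product id in columns, sum of Premium as value) follows the steps above.

```python
from collections import defaultdict

# Hypothetical rows: (region, product_id, premium)
records = [
    ("North", "A", 100),
    ("South", "A", 250),
    ("South", "B", 300),
    ("North", "B", 50),
    ("South", "A", 150),
]

# pivot[region][product_id] holds the sum of premium for that combination.
pivot = defaultdict(lambda: defaultdict(int))
for region, product_id, premium in records:
    pivot[region][product_id] += premium

# Row totals, like the grand total column of a pivot table.
region_totals = {region: sum(cols.values()) for region, cols in pivot.items()}
top_region = max(region_totals, key=region_totals.get)

print(pivot["South"]["A"], region_totals, top_region)
# 400 {'North': 150, 'South': 700} South
```

Swapping `+=` for a count or a running min/max would give the other summary metrics a pivot table offers.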

2. Creating Charts: Building a chart/graph in Excel requires nothing more than selecting the range of data you wish to chart and pressing F11. This will create an Excel chart in the default chart style, but you can change it by selecting a different chart style. If you prefer the chart to be on the same worksheet as the data, instead of pressing F11, press ALT + F1.
Of course, in either case, once you have created the chart, you can customize it to your particular needs to communicate your desired message.
Charts
To learn about the different properties of charts, a dedicated charting guide is worth reading.

Data Cleaning

1. Remove duplicate values: Excel has a built-in feature to remove duplicate values from a table. It removes duplicates based on the selected columns, i.e. if you have selected two columns, it searches for rows that have the same combination of values in both columns.
Duplicate
Above, you can see that A001 and A002 have duplicate values, but if we select both columns “ID” and “Name” then we have only one duplicate value (A002, 2).
Follow these steps to remove duplicate values: Select data –> Go to Data ribbon –> Remove Duplicates
Duplicate2
2. Text to Columns: Let’s say you have data stored in a column as shown in the snapshot below.
Text_Column
Above, you can see that the values are separated by a semicolon “;”. To split these values into different columns, I recommend using the “Text to Columns” feature in Excel. Follow the steps below to convert it to different columns:
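The column-selection logic can be sketched in Python. The helper `remove_duplicates` and the sample rows are hypothetical, chosen to mirror the example above; like Excel’s feature, the sketch keeps the first occurrence of each combination.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each distinct combination of the key columns."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

# Hypothetical data: both IDs repeat, but only (A002, 2) repeats as a pair.
rows = [
    {"ID": "A001", "Name": 1},
    {"ID": "A001", "Name": 2},
    {"ID": "A002", "Name": 2},
    {"ID": "A002", "Name": 2},
]

by_id = remove_duplicates(rows, ["ID"])
by_both = remove_duplicates(rows, ["ID", "Name"])
print(len(by_id), len(by_both))  # 2 3
```

Selecting only “ID” drops two rows, while selecting both columns drops just the repeated (A002, 2) row.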
  1. Select the range A1:A6
  2. Go to “Data” ribbon –> “Text to Columns”
    Text_Column_2
    Above, we have two options: “Delimited” and “Fixed width”. I have selected Delimited because the values are separated by a delimiter (;). If we wanted to split the data based on width, such as the first four characters to the first column and the 5th to 10th characters to the second column, we would choose Fixed width.
  3. Click on Next –> tick the check box for “Semicolon”, then click Next and Finish.
    Text_Column3
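
Both Text to Columns modes have a simple counterpart in Python, sketched below with made-up values. The “Delimited” option corresponds to splitting on the semicolon, and “Fixed width” corresponds to slicing by character position.

```python
# Hypothetical cell values separated by semicolons.
cells = ["A001;John;Delhi", "A002;Mary;Pune"]

# "Delimited": split each cell on the ";" delimiter.
delimited = [cell.split(";") for cell in cells]

# "Fixed width": first four characters, then the 5th to 10th characters.
text = "ABCD123456xyz"
fixed_width = (text[:4], text[4:10])

print(delimited[0], fixed_width)
# ['A001', 'John', 'Delhi'] ('ABCD', '123456')
```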

Essential keyboard shortcuts

Keyboard shortcuts are the best way to navigate cells or enter formulas more quickly. We’ve listed our favorites below.
  1. Ctrl + [Down|Up Arrow]: Moves to the bottom or top cell of the current column; Ctrl with the Left|Right Arrow key moves to the cell furthest left or right in the current row
  2. Ctrl + Shift + Down/Up Arrow: Selects all the cells below or above the current cell
  3. Ctrl+ Home: Navigates to cell A1
  4. Ctrl+End: Navigates to the last cell that contains data
  5. Alt+F1: Creates a chart based on selected data set.
  6. Ctrl+Shift+L: Activate auto filter to data table
  7. Alt+Down Arrow: Opens the AutoFilter drop-down menu for the selected header cell
  8. Alt+D+S: To sort the data set
  9. Ctrl+O: Open an existing workbook
  10. Ctrl+N: Create a new workbook
  11. F4: While editing a formula, select a cell reference and press F4 to cycle it through absolute, mixed and relative forms.
Note: This isn’t an exhaustive list. Feel free to share your favorite keyboard shortcuts in Excel in the comments section below. I do about 80% of my Excel tasks using shortcuts.

End Notes

Excel is arguably one of the best programs ever made, and it has remained the gold standard for nearly all businesses worldwide. But whether you’re a newbie or a power user, there’s always something left to learn. Or do you think you’ve seen it all and done it all? Let us know what we’ve missed in the comments.