So I’m going to share some things I learned about Excel. Good things.
First off let’s state a VERY important Detail, EXCEL IS NOT A DATABASE! But it does some REALLY cool database type queries!
A project I was working on for a few months involved a lot of data Analysis in Excel and first off, I’m not an Excel expert.
I’m very comfortable with working with spreadsheets and I learn INCREDIBLY fast. But I had some funky problems I was dealing with (Now all of the Excel experts stand down and stop laughing when you hear about my problems…. K?)
So one of the biggest problems I encountered was working with the DcountA,Dcount,DSum functions and getting some matches to work properly. Very often the numbers were higher or lower than they should have been.
The reason for this was in me not understanding about some of the challenges in the cells containing the matches, the cannot contain certain characters. Two I ran across (That seem pretty obvious) are the ? and *. Wildcard characters.
Here’s an example
So when I had lists over 100,000 long and I ran a query against them I found the queries terminated early. It was due to these wildcards in the data. So what to do?
If you are new to Excel like I was you can use a “SUBSTITUTE” function. Which does exactly what it sounds like it will do, Substitute one value for another. So to clean my data up I had to make a new column, Let’s call it GoodUserName containing a SUBSTITUTE for each cell to remove the “?” and replacing It with a “-“ – This example is targeting Cell A2
So now the list will look like this and I would target the queries against this new table instead.
Now if you needed to remove the “?” and the “*” you can actually wrap one inside the other like this.
This will make the data look like this now which removes all of the foreign Wildcards from our data.
Now whatever you do one side, you’ll need to do to the other. So on the Database half of the data, you’ll need to have some similar functions to ensure both tables match up. In Excel data I added an additional column called “GoodUserName” which contained the same type of formatting.
Now here’s another tricky challenge I ran across. Even WITH clean data, I was getting too many matches! Well THAT was maddening…
Here’s our sample Query table
And the Database
As you can see visually there is only ONE actual Last name that says “Smith”
But when I ran a DCOUNTA against this Database and applying this Query like this
I kept getting 4.
So it turned out by default I was getting the same results as running a MATCH function without specifying a “0” for “Exact Match”.
To fix this you need to ensure the data contains an “=” at the very beginning. So rather than soiling the data waters by editing all of the cells you can do something similar to the SUBSTITUTE. What you really need to do is use the CONCATENATE function to add the “=” to the data.
Like the Substitute we’ll be building a new table to target the Query against. Here the CONCATENATE is targeting the A1 Cell.
Here is our new sample Query table
So my DCOUNT will now look like this, I’m querying the MODIFIED data with an “=” to force an Explicit match
The result is now 1 (as it should be) from the same database
I also had cells that SHOULD have Matched in many instances and didn’t which drove me mad to no end. Here is a perfect example.
From the same Query table as last time I SHOULD be seeing a count of 4 for the lastname “Jones” but my DCOUNTA only showed 2. Upon examining the data I noticed this…. Trailing Spaces!
So to deal with this a new column was added to the Database for the Lastname, In this case I called it “CleanLastName”. This column simply contained a “TRIM” function to drop all leading and trailing spaces. Here we target cell A2 as an example
Now we need to ensure we adjust the name of our Lookup table to have it match the same added column in the database
But my DCOUNTA didn’t initially work. In fact the count was 0, which turned out made sense because I had expanded the SIZE of the database (an additional column) but had not adjusted to Query. So it had to be changed from this
to look like this to obtain the additional Column “C”
Now the correct DCOUNTA value of 4 was retrieved.