So… if you have Excel, transforming semi-structured text into rows and columns in a spreadsheet is fairly trivial, but if you use Google Docs or a comparable web-based too, you probably don’t have that feature. Don’t give up hope. Where there’s a will there’s a way, right? This way is a little harder, but you’ll learn how to make data spin on its head and you’ll have bragging rights when you’re done.*
The hard way
Here’s some of the sample data I used in the previous conversion example. I wanted to split the Twitter ID, first name and last name into individual columns.
There are several common spreadsheet functions that can be used to coerce data from one form to another. Used together, they can be extremely powerful, albeit inelegant. The key is to break down complex problems into many simple problems and combine the solutions.
You can write complex formulas with functions nested within functions, or you can temporarily add extra columns to your spreadsheet and do things step by step. For the sake of illustration, I’ll go one step at a time.
Step 1: Cleanup
The first thing to get out of the way is data cleanup. Extra spaces where you don’t need or want them can lead to unpredictable results. Take a look at cell A2.
You can’t see them, but there are four spaces before the asterisk and an extra space at the end of the line. I used the trim(text) function to strip any spaces from the beginning and end of the cell so I have clean data to work with.
To the naked eye, the result looks the same, but the leading and trailing spaces are gone. If I knew that there was a consistent number of spaces in every row, I could skip this step. I decided to play it safe.
Next, I wanted to get rid of the asterisk and the space at the beginning of the cell. The functions in Excel and Google Docs spreadsheets are all about getting the results you want, not eliminating the results you don’t want, so you have to think about the problem in that light.
There is no function to remove two characters from the beginning or end of a cell, but there is a right(text,number) function to get characters from the end of a cell. (Similarly, there’s a left(text,number) function to get characters from the beginning of a cell.)
The challenge is that you have to tell the left() and right() functions how many characters you want. In this case, I wanted all but the first two, so I had to use the len(text) function to find out how many characters are in each cell and subtract two (the number of characters I didn’t want).
Alternately, I could have nested those two steps and achieved the same result.
Step 2: Extracting the Twitter ID
The next step is a little trickier. I wanted to extract the Twitter ID, which varies in length, so I had to grab the correct number of characters each time. I know that Twitter IDs are all one word, and there’s a space after each ID, so I used the find(find_text,text) function to locate the first space character in the cell and grabbed everything to the left of that using the left() function. I had to put the space in quotes for it to be recognized as a literal text string.**
Again, I could have nested those two steps and achieved the same result.
In fact, I could have nested the cleanup and the text extraction steps in one long formula: =left(right(trim(A2),len(trim(A2))-2),find(” “,right(trim(A2),len(trim(A2))-2)))
It looks ugly, but if you build it step by step from the inside out, it’s not much more difficult than the alternative.
Step 3: Variations on a theme
It’s clear sailing from here. Mix and match left(), right(), len() and find() to extract the first and last names — same methods in different combinations.
Once I found the Twitter ID, I was able to find the full name by extracting the characters to the right of the Twitter ID. I also used trim() to get rid of extra spaces.
To find the first name, I extracted everything to the left of the first space in the full name…
…and I extracted everything to the right of the first space to find the last name.
Take it easy
Nested formulas can look and feel intimidating, even to spreadsheet pros. Sometimes it’s easier to create temporary columns rather than nesting formulas. You can see the results of your work and identify problems as you go.
Once everything is working, you can either make a nested formula by copying things that work into new formulas, or you can hide the columns with your intermediary steps so your spreadsheet looks neat and friendly.
A well-designed spreadsheet can make your lists more useful by automating tedious tasks. Organization is just the beginning.
Notes:
THANK YOU for this really useful and well written post. It was such a huge help. We’ve been keeping our client list in an Excel spreadsheet – just copying and pasting what people wrote when they filled out the shopping cart form. We just migrated our e-store to a more robust platform that actually has CRM built in and I was trying to figure out a way to clean up all this messy data – extra spaces, middle initials, etc – to boil it down to first and last name. This post solved that problem. It doesn’t fix people who don’t know how to use capital letters or who only know how to use capital letters, but it did get a chunk of the work done really quickly.
Thank you VERY much.
Jennie at http://www.justpoppin.com
Jennie, I’m glad you found this useful. The proper() function should solve that last problem you’re having. It capitalizes the first letter in all words of a text string, so something like =proper(”whisper & SHOUT”) becomes “Whisper & Shout”. I hope that helps.