Tuesday, April 23, 2013

Excel Data Explorer and the Twitter Search API


I had seen Data Explorer for the first time at the PASS Summit last year, but was not fully aware of its capabilities until the recent PASS BA Conference. I saw demos of how it can load web data into Excel. I thought Data Explorer was more of a cloud BI solution, but now understand that its a very powerful query engine/data transformation tool. You can import data from many sources like you can PowerPivot, but what's fascinating is its ability to load web data from static html pages, tables on web pages, web services, etc. I had a need to search recent activity on Twitter and decided to test drive Data Explorer in Excel.

Twitter's Search API returns JSON formatted data, and I heard multiple times that Data Explorer can handle JSON. To start, here's a sample url string you can send to Twitter anonymously using Version 1 of the Search API:


Here's an explanation of the parameters used in this URL string:
  • q: The query text. See Using the Twitter Search API for examples and operators.
  • rpp: The number of results returned. We're limited to 100 here, but can get more with the REST API.
  • until: Tweet date is less than this date formatted as YYYY-MM-DD.
Good news is that Data Explorer is available for Excel 2013 and Excel 2010 SP1. Assuming you've installed Data Explorer cold, you'll want to be sure to turn on a couple of settings: 1) Enable Advanced Query Editing and 2) Fast Combine. This is for parameterization of our queries and copy/pasting formulas later on. I don't know why these options exist since you can use the tool with or without them. Personally, I think Microsoft should just leave them on. It took me a fair amount of time before realizing these settings were my problem. Nonetheless, to turn on these features, up in the Data Explorer ribbon, click on the Settings button and select the Advanced Query Editing option and then OK. Under Settings is a button for Fast Combine. Click it and accept the prompt to Enable.

Now we're ready.

In the Data Explorer ribbon, choose the button to retrieve data from Web.


Enter the Twitter Search URL string above in the form and select OK.

This will open a Data Explorer Query window with a table of information about the query results. Click on the word "List" and you'll get a list of records. If you expand the Steps dialog on the right, you'll notice that this added a step in your data shaping process. Pretty cool, right?


Right click on the header of this list and select the To Table option. This convert this list object to a table. If you don't see any records by this point, you'll have to change the parameters of the URL string until you see some records returned. Get familiar with the limitations of Twitter's v1 Search API here.

Accept the list of defaults in the To Table dialog. Click OK.

The resulting table is very similar to the list, but with a very important exception. In the header of the table next to the word "Column" is a small two-arrow icon. Click this little button and it will automatically read the columns defined in the JSON file and give you the option to choose the ones you want to bring into your query. Leave them all selected for now. You can always remove columns later or by modifying the code at this Step in your data shaping process.

Click OK on the columns you'd like to expand and you'll finally get columns of all the tweets.

Now you can rename columns, change data types, or hide columns. OR, you can just copy the code below in the formula window and click refresh.

let

    Source = Json.Document(Web.Contents("http://search.twitter.com/search.json?q=passbac&rpp=100&until=2013-04-21")),

    results = Source[results],

    TableFromList = Table.FromList(results, Splitter.SplitByNothing(), null, null, ExtraValues.Error),

    ChangedType = Table.TransformColumns(TableFromList,{},Value.FromText),

    #"Expand Column1" = Table.ExpandRecordColumn(ChangedType, "Column1", {"created_at", "from_user", "from_user_name", "geo", "iso_language_code", "text"}, {"CreatedDate", "FromUser", "FromUsername", "Geo", "ISOLanguageCode", "Text"}),

    ChangedType1 = Table.TransformColumns(#"Expand Column1",{{"CreatedDate", DateTime.From}})

in

    ChangedType1



Click Done on the query window and the data will be imported into Excel. Yes!

Now let's parameterize this query so that it's easy to update the search results from Excel instead of repeatedly opening Data Explorer to change the URL. Originally, I followed much of Chris Webb's post on how to do this a parameterized query string in the formula, but for this post, I found it easier building my Twitter Search URL string with an Excel formula. This can handle empty parameters a little better and can encode the "q" parameter text string using Excel's ENCODEURL function as Chris' post suggests you do.

Add another couple tables in another worksheet named Parameters: one to list each parameter's value and one for the Search URL string. It's nice to add a description of each parameter as shown below so you can remember each parameter's definition in the string. Then in the Search URL string table, build the URL with the following formula:

=CONCATENATE("https://search.twitter.com/search.json?q=",ENCODEURL(B2),IF(ISBLANK(B3),"",CONCATENATE("&rpp=",ENCODEURL(B3))),IF(ISBLANK(B4),"",CONCATENATE("&since=",ENCODEURL(B4))))


Before you continue, make sure to note the name you gave the table containing the parsed URL. You can do this is in the Design tab of the Table Tools ribbon. By default, Excel will name your tables Table1, Table2, and so on.

Now go back to the sheet with the Data Explorer query, and you can either click on the Filter & Shape button in the Query tab of the Table Tools ribbon, or click on the Filter & Shape button in the Query Settings dialog at right that appears when you bring up this worksheet.

Now with the query window open, click on the formula button to open up the formula editor. You'll want to replace the url string in double-quotes with Excel.CurrentWorkbook(){[Name="SearchString"]}[Content]{0}[SearchString] where SearchString is the name I assigned to the table containing the URL I constructed on the Parameters worksheet. Hit enter to exit out of the formula editor, and click Done in the query editor. Now you have a parameterized query. The resulting formula for the whole spreadsheet will look similar to the one below with differences in case you renamed columns or changed data types.
let

    Source = Json.Document(Web.Contents(Excel.CurrentWorkbook(){[Name="SearchString"]}[Content]{0}[SearchString])),

    results = Source[results],

    TableFromList = Table.FromList(results, Splitter.SplitByNothing(), null, null, ExtraValues.Error),

    ChangedType = Table.TransformColumns(TableFromList,{},Value.FromText),

    #"Expand Column1" = Table.ExpandRecordColumn(ChangedType, "Column1", {"created_at", "from_user", "from_user_name", "geo", "iso_language_code", "text"}, {"CreatedDate", "FromUser", "FromUsername", "Geo", "ISOLanguageCode", "Text"}),

    ChangedType1 = Table.TransformColumns(#"Expand Column1",{{"CreatedDate", DateTime.From}})

in

    ChangedType1



There's a lot you can do with this design pattern, so be sure to check out the Twitter documentation for using the Search API. I uploaded this sample spreadsheet here which I created in Excel 2013 if you want to try out some different query strings for yourself. Important: Using a parameter from an Excel workbook is possible because I enabled Fast Combine before doing anything with Data Explorer. If you forget, you'll get an error that says 'Query' is accessing data sources which can not be used together. If this is you, then enabling Fast Combine should do the trick.

It's also worth mentioning that version 1 of Twitter's Search API is nice because it doesn't require any special authentication, but this method is deprecated and may be eliminated at any time. The preferred method to search tweets is to use v1.1 of Twitter's Search API which uses OAuth to authenticate yourself before actually pulling any data from Twitter. If you have to do this, then you'll have the extra task of developing a utility with .NET or building a script task in SSIS to pull JSON file(s) one-by-one into a directory. Data Explorer is able to handle multiple files in a single directory, and JSON files should fall right in line with this. If any of you would like to tackle that challenge next, please leave a comment with a link to your blog or published article for all of us to read.

More Data Explorer fun coming soon. Thanks for reading!

2 comments: