Exporting CommCare data

Standard

Throughout my time at Operation Fistula we have used every method I know of to get our data from our CommCare apps into Tableau for analysis and presentation. There are three ways you can do this:

  1. Manual exports on the CommCare site
  2. The CommCare Export Tool
  3. CommCare APIs

How do you collect data?

There are different types of CommCare users. Which data-export approach will be best for your organisation will depend on how you collect and analyse data.

  1. The steady-apper: You have one or two well-made apps that you use for your work; they get set up and then have relatively few updates
  2. The multi-apper: You create new apps frequently and have many apps that need to be produced, often in a relatively short time frame

Most CommCare users I’ve spoken to fall into category one, while we at Operation Fistula are definitely in category two. We create new apps frequently, as we digitise existing paper forms for clinicians and community workers across the world so that they can collect their patient information electronically.

The second question you should ask yourself is whether you even need to export your data. There are different ways that data is used after it has been collected:

  1. Regular fixed reports: You might simply need to report on several agreed metrics once every few months to a predictable deadline.
  2. Dynamic analysis and frequent demands: If you are more like us, there will constantly be new questions to answer and various timelines to follow. For instance, a partner might want their Tableau dashboard to be updated with the newest cases they have just entered because they have an important meeting a few days later. Or a donor has a question that needs to be answered urgently with the newest data.

Manual exports

When I first started working with Operation Fistula, manual downloads were all there were, and it was my task to figure out how to make them automatic. Working with this setup for the first few months was extremely frustrating, but that is partly due to the demands of our work, which requires constant monitoring and frequent changes.

How: Manual exports from CommCare are easy, although a little tedious if you have many different apps. Just navigate to the Data menu option and select either form downloads or case downloads, then add a new export. You can customise the field names and which fields should be included and save this export so it can be repeated at a later date.

Pros: It’s free and simple. If you have a small number of apps and forms and a specific report you need to produce every quarter then this might be enough for you.

Cons: When you have a lot of forms and apps this list becomes pretty messy because any one-time request for data results in a new saved export. Especially when working with several colleagues on the platform, this quickly leads to duplicates unless there is a clear system for how exports are saved.

As the name suggests, these downloads are not automated, so if you are trying to track something over time and get frequent updates, this is a very inefficient way to do it. While you can make some changes to the download, there might be further cleaning steps required, and these will need to be repeated for each download (hopefully with an ETL tool such as Alteryx, which makes this easier but still not painless).

The CommCare Export Tool

This tool is provided by CommCare to allow automatic exports of your data. It’s not an out-of-the-box tool so much as a set of code and instructions for creating your exports. We had a volunteer who created this system for us, as it was beyond my knowledge in this area.

Our system lived on AWS and consisted of a file-storage system, a Postgres database and a scheduler. Once the system is set up, you need to create a template in Excel that explains to the tool exactly how your form should be exported. This allows you to make some changes to field names, account for field duplication that happened through renaming, and insert filters, amongst other things.
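To give a sense of what those templates involve, here is a rough illustration of one query sheet. The column layout follows the export tool’s documented Excel format as I understand it; the table name, field names and xmlns value are all placeholders:

```text
Sheet name: patient_registration       <- becomes the name of the exported table

Data Source    Filter Name    Filter Value
form           xmlns.exact    http://openrosa.org/formdesigner/XXXX

Field          Source Field
patient_name   form.patient_name
visit_date     form.visit_date
case_id        form.case.@case_id
```

Each sheet in the workbook defines one output table, with the Field column naming the database column and Source Field pointing at the path in the submitted form.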

We used this for a while but in the end it wasn’t the best solution for us.

Pros: If you have a limited number of apps with few changes and little cleansing needed then this is a viable option to achieve automation. You will need to have someone skilled enough to set this up for you but this could probably be achieved in a few days by someone with the right skills. In our case we were able to source this resource for free through a volunteer.

If you are adept with SQL you could perform your cleaning processes directly in the database or with another tool that allows you to automate this process.

Cons: The initial setup will need to be performed by someone with a good deal of developer or IT knowledge. It involves some Python. Because it runs on AWS there is some cost associated. Depending on the size of your app(s) you might get away with a micro instance and in the first year of use you will have your free tier which will make this pretty low cost. We had to upgrade to a small instance as we frequently experienced crashes. In the end we paid about £25/month.

The biggest reason we switched was the inflexibility of this system and the issues whenever creating a new automation. Any time changes were made to an app the template had to be amended, then reuploaded, then tested. Testing frequently revealed issues with our templates that sometimes took multiple iterations to figure out. This might be mitigated with a better storage place for the templates than we had in our case but the ways to make changes to templates still felt very limited and cumbersome. In part, the problem is that there is such limited insight into what the tool is actually doing, unless you pick apart the underlying code of the tool. This makes troubleshooting very difficult.

Another huge issue was that data still needed to be cleaned after being imported into the database. This meant running it through Alteryx and thus not actually having an automated setup at all in the end, as we do not have a way to automatically run our Alteryx workflows.

Furthermore, the export tool currently only supports exports to selected open-source databases and not our Exasol database.

CommCare APIs

The website is a little confusing on this point, because there is an API that is deprecated (and this is generally the first search result), but there are other APIs that are still active. This is the setup we use currently, and it is getting us the closest to automation yet.

I have built a set of macros and workflows in Alteryx that allow me to easily access the data from CommCare, clean it and then output it to our Exasol database. While we unfortunately still don’t have a way to automate our Alteryx workflows, this gets us to a point where we can refresh our data with the click of a button.
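If you wanted to replicate this outside Alteryx, the same data pull can be sketched in Python. This is a minimal standard-library sketch, not our actual workflow: the domain, username and API key are placeholders, and it assumes the v0.5 list-forms endpoint, so check the current CommCare API documentation for your project:

```python
# Sketch of fetching one page of form submissions from the CommCare
# List Forms API (v0.5), using only the standard library.
# Domain, username and API key below are placeholders.
import json
import urllib.parse
import urllib.request

BASE_URL = "https://www.commcarehq.org/a/{domain}/api/v0.5/form/"


def build_request(domain, username, api_key, limit=100, offset=0):
    """Assemble the URL and auth header for one page of the form feed."""
    query = urllib.parse.urlencode({"limit": limit, "offset": offset})
    url = BASE_URL.format(domain=domain) + "?" + query
    headers = {"Authorization": f"ApiKey {username}:{api_key}"}
    return url, headers


def fetch_page(domain, username, api_key, offset=0):
    """Download one page of form submissions as a parsed dict."""
    url, headers = build_request(domain, username, api_key, offset=offset)
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=30) as resp:
        # 'objects' holds the forms, 'meta' the paging information
        return json.load(resp)
```

From there, the cleaning and the output to a database are whatever your own pipeline needs them to be.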

Pros: This is the most flexible model yet. It allows us a lot of freedom in the way that our system is set up and the exact structure of our database. While we use Alteryx, APIs can be easily used with a multitude of software and coding languages. You could translate all of our Alteryx workflows into an open-source language or alternative tool of your choice.

Cons: With lots of flexibility also comes lots of work, so the initial process of setting up your processes might be time-consuming. This is worth it for our setup where we might create a new app, want to monitor the pilot phase, make changes to the app, and then analyse the actual data coming in. The data export for a new app is set up very quickly with our approach and changes can be implemented easily. But if your platform is less complex the amount of effort that goes into figuring out the basics is possibly not worth it.

This also means that potentially fewer team members will be able to support with these activities, as they require more specialised knowledge.

Conclusion

Each of the above approaches has its use cases. The approach you take will depend on the number of apps you have, the frequency that changes are made to these apps, and the skills and financial resources that are available to you.

Understanding your data in Tableau – A step by step guide

Standard

Tableau is a great tool for creating charts at impressive speeds. But with this trend of fast analytics, and a focus on impressive visuals, the data-checking process, and thus accuracy, is sometimes neglected. Fundamentally, you need to understand your data in order to make an effective dashboard.

Right at the start of training the team at Operation Fistula in Tableau, we used a #MakeoverMonday dataset on Nike factories. Of course, the team focused on the variable looking at the proportion of women working in the factories. When they presented their visualisations at the end of our one hour, I quickly realised two things:

  1. They were presenting the numbers at the country level, although the data was at the factory level
  2. They were presenting the sum of this measure, even though it was a percentage

With these two aspects taken together, the visualisations obviously were not showing any meaningful or accurate picture of the situation.

To help avoid similar mistakes in the future and to approach the data in a more systematic manner, I set up a checklist that the team can work through in order to arrive at a visualisation that is an accurate reflection of the data while also being an effective tool of communication. Apart from the first step (look at the data preview or Excel file), all following steps should be completed using charts made in Tableau. This is, of course, far more effective than trying to figure something out by looking at a table of data.

I hope this checklist can be of use to others as well. I would love to hear from you if you have any steps that you would add to this list!

Steps to building a Tableau visualisation

Step 1 – Familiarise yourself with your data

First look at the raw data to get a general overview of the fields. Sometimes the dataset is so small you can understand it with one glance. Then use visual analytics in Tableau to answer the following questions:

  • How many rows of data do you have?
  • What is the lowest level of detail? I.e. what does Number of records count?
  • Are there duplicate entries?
  • What are the members of the different dimensions?
  • If there are several levels of detail, how do they relate? Are there hierarchies? (E.g. category and subcategory)
  • What do numeric fields mean? Are they percentages or whole numbers?
  • Are all fields assigned their correct data type?
  • Are all fields correctly classified as dimensions or measures?

Step 2 – Start asking questions

Using Tableau, explore the data to find first insights. There are likely hundreds of ways you can combine the fields in the data, and it might take some time until you find a combination that is meaningful. Some of the things you can do are:  

  • Break down your continuous measures with the different dimensions
  • Look for correlations between your continuous measures by creating scatterplots
  • Identify outliers
  • Look for trends over time

Step 3 – Build your visualisation

Once you have identified your story, think about how to best communicate it. Your story can be the answer to a very specific question you can ask about the data or it can guide the user in answering their own questions:

  • What are you trying to say with your visualisation? What is your message?
  • Are you creating an exploratory or explanatory visualisation?
  • Do you need just one chart or will more charts add to your story?
  • What chart types are best suited for your data and story?
  • How can colours help you communicate your insights?
  • What is the best dashboard layout for your message?
  • Does your dashboard need additional explanation?
  • What is a concise, informative title that expresses the purpose of the dashboard?

Four things I learned building my first iterative macro in Alteryx

Standard

I’ve been using Alteryx for just over two and a half years now and have built several standard and batch macros, but never an iterative macro – until now.

Until fairly recently I wasn’t even sure what the difference was between an iterative and a batch macro, so let me share my take on this right at the start here.

How are iterative and batch macros different?

Batch macros run once for each row that they are fed. Each row of your input has a new value that affects the workflow in your macro and it will stop once it has gone through all rows. This is useful when you already know the list of variables that you will have to run through your macro, for instance if you want to call an API where you need to define the year for which you want the data to be called, but you would like all the years you can get. In this case you make a list of all the years you want for the input, package up your workflow into a batch macro and it will run once for each year in your list.

But often you might not know the list of values for which you have to run your workflow, or how often you need to run it. You might only get this information by starting your workflow. In my case I was calling an API that contained an offset value. The basic API call only returned 20 records at a time and then included an offset value that you could use to call the next 20 records. So you need to run a call, get the offset value, add it to the next call, and so on.

This is what an iterative macro is made for. Rather than starting out with a list of values, it can take data created in your workflow and then run the next iteration based on this in a loop. In an iterative macro, you typically have one input and two outputs, one for the data you are generating and one for the data that you are sending back to the start of the workflow for the next run.

It took me quite a while to figure out how to set up my iterative macro to loop through the different offset values and get all of our 2000+ contacts from Hubspot in 20-record increments. I learned a few things in the process that seem pretty obvious now, but might be useful for you if you are struggling through your own iterative macro build.
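The loop that an iterative macro performs can be sketched outside Alteryx in plain Python. This is only an illustration of the logic, with a made-up in-memory source standing in for the API call, so the offset feedback and the termination condition are visible:

```python
# Minimal sketch of the offset loop an iterative macro performs:
# fetch a page, collect its records, and keep looping while the
# response says there is more, feeding the new offset back in.
def fetch_all(fetch_page):
    """Collect every record from an offset-paginated source."""
    records = []
    offset = 0
    while True:
        page = fetch_page(offset)
        records.extend(page["records"])
        if not page["has_more"]:        # the termination condition
            break
        offset = page["offset"]         # fed back in, like the iteration output
    return records


# A fake paginated source: 45 numbered records, served 20 at a time.
def fake_fetch(offset):
    data = list(range(45))
    chunk = data[offset:offset + 20]
    next_offset = offset + 20
    return {"records": chunk,
            "has_more": next_offset < len(data),
            "offset": next_offset}
```

In Alteryx the same structure is spread across tools: the download tool is `fetch_page`, the iteration output carries `offset` back to the input, and the filter on the has-more flag ends the loop.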

1. Configuring your output-input loop

When you set up your macro, you will have to indicate in your Interface Designer pane which of your two macro outputs is your iteration output. This one will be fed back in with every loop and will act as the new input. Make sure the headers of your output exactly match the headers of your input, otherwise Alteryx won’t know what to do with your new values.

2. Finding errors

The most difficult part of building macros (this goes for the batch variety as well) is that everything happens in a black box. It can be incredibly tedious to try to find out at what point something is going wrong, because you cannot watch the records as they pass through the macro; you just see things coming out all wrong on the other side.

You can just move your iterative output around, or create a new one for testing, but this won’t always be a suitable method for discovering the particular error you are looking for. I discovered the x-ray tool, which helps you overcome this by providing small snapshots of a few rows of data wherever you place it.

I haven’t been able to get the iteration output to show me any data, which is quite inconvenient, as it can be really helpful to investigate what has gone through each iteration. You can use the x-ray tool as a helper here, or just add another macro output during the build process, which receives exactly the same data as your iterative macro output.

3. Ending your iterations

You can determine a set number of times that your iterative macro should run. This is especially useful during the testing phase, and to set a limit on how long a workflow should run before it eventually shuts itself down in case one of the configurations is incorrect. Generally, though, you will want something more dynamic than that. An iterative macro will run as long as the iteration output receives any data that it can feed back to the input. That means that even if the value you are referencing is NULL or empty, as long as there is a record, Alteryx will continue to run through the iterative loop until the maximum number of iterations is reached.

You should use filters and joins to make sure your workflow ends when a certain condition is met. In my use case there is another field that informs me if there is a follow-up page. As soon as that is 0 rather than 1, a series of tools prevent the offset value from being passed through the iterative loop.

If the field ‘has-more’ is not true, no value is passed to the left side of the join, so the offset value on the right side passes through the right output. No records reach the ‘I Output’ in that scenario.

4. Checking your output

This is one I have been caught out by several times. When you test your iterative macro and just view the data output you will only be shown the first line of data from each iteration output.

Why? No idea.

You can see the full result of your macro if you attach any other tool and then look at the output of your macro or just use a browse tool.

Airtable and Alteryx

Standard

Have you heard of Airtable? We’ve been using it at Operation Fistula for several months now and are in love with it. Airtable is a collaborative online spreadsheet tool that allows people who don’t normally work with data to create simple databases that are easy to make, maintain and navigate. We now use it for various purposes, such as:

  • Planning out apps before they are put into CommCare
  • Creating task lists
  • Tracking time on tasks
  • Submitting expenses
  • Keeping track of dashboard requests
  • Mapping our filing system

An Airtable we are using to suggest talks to watch from TC18

These are just a few examples, and you will have to try it yourself to really understand the value that Airtable adds and how flexible of a tool it is. This post explains a little more what we do with our Airtable data.

Getting data out of Airtable

Collecting data is super easy in Airtable, and much of that information is happy to continue to live there as well. But at Operation Fistula we like to visualise our data in Tableau to make sure that we are getting the most use from it.

Manual downloads from Airtable are easy, but there is currently no way of connecting it directly to Tableau for a live connection. There are a few attempts at a Web Data Connector floating about, but none of these work at the moment, and development of a WDC will be difficult until Airtable releases a metadata API.

Airtable does, however, have very extensive and clear API documentation, and I use Alteryx to extract data and send it straight to our Tableau Online environment, so that the rest of the team can use the data they entered in Airtable to build visualisations that help them with their work.

From Airtable to Alteryx

Using the macro

The logo of the Alteryx macro: the Airtable logo on the grey background shape that is typically used for download tools in Alteryx

I created a macro, which you can find in the Alteryx Gallery. The interface allows you to enter the API URL for the table you would like to access and your unique API key. You can find both of these in your Airtable API documentation.

Both the table reference and the key appear in that documentation. To view your key, make sure to tick the box in the top right corner of the page, which will reveal it.

How it works

The outer shell macro serves the sole purpose of reading in those two bits of information and then sending them through the iterative macro that is nested inside it.

This iterative macro is needed because Airtable only retrieves 20 records at a time, so an offset is required for pagination (thanks to Peter, who taught me how to do this part). The first part of the macro sends the right information through the download tool, assesses whether another iteration is needed, and determines what information should be passed back for the next loop.

The second part of the iterative macro parses the JSON into an orderly table. I have used this macro with a number of tables now and have so far not run into any issues, but it is possible that the workflow encounters problems with certain field types that we haven’t used in any of the tables I am accessing. If you come across any issues, please let me know. I will endeavour to keep the macro updated.
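The flattening that this second half performs can be illustrated in Python. Airtable’s API returns each record as an id plus a nested "fields" object; the payload below is a made-up example of that shape, not real data from our tables:

```python
# Flatten an Airtable API response into one row per record.
# Airtable returns {"records": [{"id", "createdTime", "fields": {...}}, ...]};
# the sample payload below is a made-up example of that shape.
def flatten(response):
    """Turn Airtable's nested records into flat row dictionaries."""
    rows = []
    for record in response["records"]:
        row = {"id": record["id"], "createdTime": record.get("createdTime")}
        row.update(record["fields"])    # one column per Airtable field
        rows.append(row)
    return rows


sample = {
    "records": [
        {"id": "rec001", "createdTime": "2018-10-01T09:00:00.000Z",
         "fields": {"Task": "Plan app", "Done": True}},
        {"id": "rec002", "createdTime": "2018-10-02T09:00:00.000Z",
         "fields": {"Task": "Build dashboard"}},  # empty fields are simply absent
    ],
    "offset": "itrXXX/rec002",
}
```

Note that Airtable omits empty fields from the "fields" object entirely, which is one reason certain field types could trip up a parsing workflow.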

From Alteryx to Tableau

Once through the macro, I typically still have a few processing steps that I do in Alteryx. Often this is limited to selecting only those fields that I want to import, but if you have multi-select fields in your table, these might need to be split out to be used in analysis.

I then use the ‘Publish to Tableau Server‘ tool in order to set up a new data source on our Tableau Online environment. As we don’t have a way of scheduling Alteryx workflows at the moment I have all of my Airtables within one app, so that I only need to run the app to refresh all of my Airtable data on Tableau Online.

Change your mark colour when using an action filter in Tableau

Standard

What we want to do

I recently had the chance to get quite colourful for a dashboard that we are developing at Operation Fistula. The landing page gives the user an overview of the issue areas different entrepreneurs work in, and where in the world they are active. An action filter connects the two sheets so that we can filter the map to just one issue area.


In order to make it clear that this is what is happening, I wanted the marks on the map to change to the colour of the issue area when it is filtered, so that it is clear that the green dots on the map correspond to the green bar below.


How to do it

We need to create a calculation to get our action filter to behave this way. What we are telling Tableau is basically: when just one of the issue areas is selected, return the one value associated with that filter; when more than one issue area is selected, show the default black.

In a calculation it looks like this:

  IF {EXCLUDE [Country] : COUNTD([Issue area and group])} > 1
  THEN 'All'
  ELSE MIN([Issue area and group])
  END

Let’s take that apart.

1. We are creating an IF statement that tests a condition and returns one of two values: the string ‘All’ or the string that corresponds to the issue area.

2. When there is more than one issue area, we want to return the value ‘All’ (this could be any other string, though). The view is broken down to the country level, and each country has a different number of issue areas associated with it. So we need to use a level-of-detail calculation to exclude the [Country] field from this part of the calculation, or ‘All’ might not apply to those countries that have only one issue area associated with them. By excluding [Country], Tableau now looks at all of the issue areas in the view and counts each unique issue area once. At the outset this is 23, as this is the total number of issue areas we have in the data.

3. When we apply the action filter, the whole view will only show one issue area. This is when the first condition of the IF statement is no longer TRUE and the ELSE condition gets activated. Now that we only have one issue area, the calculated field will return the [Issue area and group] field.

So much for the calculation. The next steps are needed to complete the dashboard action setup.

1. Place the newly created calculation on the colour area of the marks card on the map sheet.

2. Place both your sheets onto a dashboard.

3. Set up your action filter so that when you select a bar the map is filtered to just the category that you have selected.

Now, you should see that the map colour has changed, but it will probably not be the same as the colour of the bar that you selected. That is because the colour assignment is tied to a specific field, and while you are referencing the original field [Issue area and group] in your calculation, it is still a new field.

So you need to go through and assign the same colours to the new field as well. This is a bit tricky, as you need to have the action filter activated and need to have the reference colour available to know what you are setting it to. There might be better ways of doing this, and if you can think of one please leave a comment, but I went with the following steps: 

1. On the dashboard, reveal the colour legend for the map. 

2. Select your first bar to activate the filter, the colour legend will change to your selection. 

3. Double-click the legend, then double-click the dimension member and use the colour picker to get the right colour from your bar. 

4. Repeat for every filter selection. 

5. Delete the colour legend from your dashboard.

Hopefully you won’t normally be using as many colours as I am, so this should be a fairly efficient process. The caveat, of course, is that if a new dimension member is added to your data you will need to adapt the colour match manually.