How do you handle your data?

A common issue I have is how to design, collect, and analyze quantitative data for program evaluation. I’m curious what best practices people in the community abide by, and whether others feel they would benefit from a session on this. Are there methods we could outline that would be especially helpful for smaller projects that don’t have a dedicated data scientist on board? Possible key points of focus:

-How to estimate the counterfactual (what would have happened without your intervention)

-What assumptions different analytical models make and when it is appropriate to use them.

-Best practices for saving and sharing data (keeping with the open source nature of everything!)

Please comment with your thoughts and ideas!


More context?

Hi @Shajara, long time! Case in point: are you trying to process data you have from Minerva’s online educational platform? Beyond storage (maybe @Matthias can help with references to useful platforms), what would you like to do with it? What questions do you hope that specific data could answer?

Are you still in Berlin? How’s it going?


Hello @Noemi, I’m doing well and am still at Minerva. I actually just finished my semester in Buenos Aires! This line of thought came both from assisting on an impact evaluation of a program the Federal Youth Ministry of Argentina is starting and from a course I just took on program evaluation methodologies. These experiences showed me that a lot of the time people collect data that fails to measure the types of changes they want to show, or they don’t know how best to analyze it once they have it.

My post was less a direct question about how to use a specific data set and more a starting point for creating a space where people can share advice and build best practices around data collection and analysis. I think this could be especially relevant to small interventions that lack a dedicated data analyst.

Also, how are you and what have you been up to these last few months?



Very relevant!

This is super-relevant, I am very interested in it.

I have to admit I have no idea how you might do this in a rough-and-ready way. Hope to learn more from the session.


On the same page!

Hey @Alberto,

I agree; I’m not sure how to do it in such a clear-cut way either. Perhaps the best way to run such a session is, instead of trying to provide a top-down, comprehensive answer to all data problems, to make it a bottom-up session focused on getting people to engage with the thinking behind different analytical methods.

We would use some of the OpenCare projects that have already been introduced and have participants break into small groups, each with a different example and someone with a strong data background. Each group would then discuss what kind of data has been collected, what different types of analysis could show, and the assumptions behind each approach.

This would be followed by a short presentation session in which each group describes their conclusions about how they would have run their analysis, and other groups get a chance to ask questions about what it actually means to run such an analysis (e.g. using a linear model when a linear relationship perhaps does not exist). We would then conclude by summarizing takeaways and themes seen across groups, and by creating a dedicated forum topic on Edgeryders where people can reach out to the community with questions about data analytics.

What are your thoughts on this? This is just a quick outline of my initial thoughts and is completely open to amendment. Also, apologies if I take a while to respond; I have limited internet access for two weeks.


That’s a great idea!

Well done, @Shajara ! This is actually actionable. You seem to have a knack for this stuff. Let me try to reformulate it, to make sure I understand.

  1. We ask people working with projects represented at OpenVillage to sit down and participate in a "ballpark evaluation sprint".
  2. We build groups of say 4 project people + 1 data person (we should have a few. I can be one, and @melancon can be another). 
  3. We build eval sketches from two kinds of data: data that are easy/cheap to get, and data that are so obviously useful that people crave them. Data of the first kind are realistic to obtain; data of the second kind require additional effort, but maybe effort that could be baked into subsequent projects. "Obviously useful" data refer to what my friend Giulio Quaggiotto calls "data as input": he (and others) observes that in NGO work data is normally produced for reporting, and everybody hates reporting. What we are not doing is re-using, for reporting, the data that people on the ground already need to do their work.
  4. We try to use these sketches to figure out how small projects can do eval with small money. If we can crack this, everyone becomes more systematic and more fundable.


Wow, fantastic insight!

@Alberto Wow! You got the structure I was trying to outline, and you took it much further! I’m especially interested in your third bullet point, as it addresses some of the nuances I am trying to understand myself. What would an example of “obviously useful” data look like, so I can better contextualize it?

Also, your friend’s post brings in a great point about “thick data,” which you touched on earlier with your request for people with experience in qualitative analysis. We can use these sprints not just to look at the hard numbers but also to use the unique advantage that participants have deep knowledge of their own projects, showing what analysis can look like when you combine hard data with firsthand stakeholder perspectives.

Looking forward to hearing your thoughts!


@shajara, we are also very interested in this and would be excited to work together. However, we have a particular focus on how to preserve the anonymity of participants in a given system’s records, to provide a layer of defense from state violence. We are now designing a workshop that explores the scalability of some of the techniques we saw in the solidarity clinics in Greece, such as identifying certain characteristics about the care seeker that the participants are aware of, etc. We’d be happy to talk more about this if there is interest!



I’ll help too

I’m kind of a low-grade data scientist myself, and am struggling with the same issue. In Edgeryders, we do mostly small projects – not realistic to set aside huge resources for impact eval. But, if you are handling data, you are going to need resources. Reason: statistics. The simplest project eval tool is a test on the hypothesis that

variable you are trying to affect with the project = variable you are trying to affect without the project

If the test rejects the hypothesis, you have impact.
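To make the test above concrete for small samples, here is a minimal sketch in Python. It uses a permutation test, which is only one possible choice of test and not necessarily the one meant here; the function name `permutation_test` and the outcome scores are invented for illustration. It also assumes you already have a credible without-project comparison group, which is the hard part.

```python
import random
from statistics import mean


def permutation_test(with_project, without_project, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)
    observed = mean(with_project) - mean(without_project)
    pooled = list(with_project) + list(without_project)
    n_with = len(with_project)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are interchangeable,
        # so reshuffle them and recompute the difference in means.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_with]) - mean(pooled[n_with:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical outcome scores, invented purely for illustration.
with_project = [14, 17, 15, 19, 16, 18, 20, 15]
without_project = [12, 13, 15, 11, 14, 13, 12, 16]

diff, p = permutation_test(with_project, without_project)
print(f"difference in means: {diff:.2f}, p-value: {p:.4f}")
```

A small p-value means a difference this large would rarely show up if the group labels were arbitrary. It does not by itself show that the project caused the difference.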

Apart from the obvious problems of finding the counterfactual etc., this has a problem: you need enough datapoints to support your rejection. “Enough” gets complicated quickly: it depends on the size of the difference between the value of the variable with and without the project, on assumptions about the distribution of the error term in each realisation of the experiment, etc. But basically: the more datapoints you have, the smaller the difference your test can reliably detect. You want to be able to say, for example, that the difference in the variable between with-project and without-project is significant at the 95% confidence level. But datapoints cost. It comes down to this: project eval with statistics usually cannot be done for small projects.
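As a rough illustration of how “enough” depends on the size of the difference, the sketch below uses the standard normal-approximation formula for comparing two group means. The helper name `n_per_group` and the defaults (5% significance level, 80% power) are assumptions chosen for illustration, not something fixed by the discussion here.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate datapoints needed per group for a two-sided comparison of means.

    effect_size is the expected difference divided by the standard deviation
    of the outcome (a standardized difference).
    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / effect_size) ** 2
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"standardized difference {d}: {n_per_group(d)} datapoints per group")
```

Under these assumptions, a standardized difference of 0.5 needs about 63 datapoints per group and a difference of 0.2 needs about 393, which shows how quickly the cost grows when the expected effect is small.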

Does anybody have experience with qualitative eval methods, more tolerant of small sample sizes? Ethnography maybe?

What I’m trying to say is that I volunteer to help @Shajara set up a session on this at OpenVillage.


Excited to work together

Hey @Alberto, really good points I hadn’t considered around the practical limitations of sample size. I would love to work with you on fleshing this out! Please see my other comment that I tagged you in for a possible outline.


@powermakesussick, this is actually really interesting to me! I know you work at state-level scales, but I was wondering: what is the smallest population you’ve worked with? The reason I ask is that in my university’s mental health office we have been wanting to collect more comprehensive data to use for planning interventions throughout the term. However, we have been hesitant to do so because we don’t want to encroach in any way on the confidentiality of students. Knowing how to effectively anonymize this data while still being able to work with it could be deeply beneficial!
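One simple check that is often used as a starting point with small populations is k-anonymity: flag any combination of indirectly identifying attributes that fewer than k people share. The sketch below is only an illustration of that idea and is not the technique from the solidarity clinics; the function name `small_groups`, the field names, and the records are all hypothetical.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=5):
    """Return each combination of quasi-identifier values shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical records, invented purely for illustration.
records = [
    {"year": 1, "hall": "North"},
    {"year": 1, "hall": "North"},
    {"year": 2, "hall": "North"},
    {"year": 2, "hall": "South"},
    {"year": 2, "hall": "South"},
    {"year": 2, "hall": "South"},
]

# Combinations held by fewer than k people could single someone out.
risky = small_groups(records, ["year", "hall"], k=2)
print(risky)
```

Any combination this returns would need to be coarsened (for example, by merging categories) or suppressed before the data is shared. Passing this check alone does not guarantee that nobody can be re-identified.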

Impact tool

Hi @Shajara ! I really like this idea, as it addresses a pain point many projects have, especially smaller NGOs. I’m involved as a founding team member in a young NGO and we are now discovering the value of data & measurements, as well as the costs that come with them.

Supporting organisations know this, and a consortium of NGOs in Belgium has recently launched Impact Wizard. It is supposed to help you assess your impact and then communicate it better to the public, funders, team members etc. There’s a free trial week, after which you need a yearly subscription of €90, which seems like a fair price to me. I haven’t had the time to check it out in detail, so I’m not sure how powerful it is in terms of data analysis.

Looking at it from your expertise, how does the idea relate to the Impact Wizard tool? @Shajara | @Alberto