How Mercy Corps syncs databases in Afghanistan
- Posted: May 29, 2008 at 6:34 AM
- 1,009 Views
Right click “Save as…”
My guests for this week's Perspectives show are Barbara Willett and Nigel Snoad. Barbara works for Mercy Corps in Afghanistan, as the design, monitoring, and evaluation manager for a number of agricultural development programs. Nigel Snoad is a lead capabilities researcher for Microsoft Humanitarian Systems. Together they've pioneered the use of FeedSync as a way to synchronize data collection and reporting in an environment where Internet connectivity is spotty, and where lightweight, two-way synchronization is essential.
Nigel Snoad and Barbara Willett
JU: Barbara, we want to discuss the database synchronization system that you've partnered with Nigel to develop, as part of the Mercy Corps work in Afghanistan.
BW: I'm the design, monitoring and evaluation manager here in Afghanistan. I arrived last year in March, about the same time we had a consultant in doing a general review. He also brought another consultant who'd worked with Mercy Corp on technical issues, including the development of databases.
JU: And can you explain what, in this context, is being designed, monitored, and evaluated? What are the programs you're supported, and what do those programs do?
BW: In Afghanistan, almost all our programs revolve around agricultural development. We have a number of programs funded by USDA, the British Government, the European Commission, all with the same goal of improving the livelihoods of the Afghan people.
JU: Is your microfinance program among those?
BW: Yes, we have a microfinance program, but it's one of our older ones, and it's self-sustaining now, no longer administered directly by our monitoring and evaluation system.
JU: What are some examples of programs that are?
BW: The ABAD [Agro-Business and Agriculture Development] program is where all this started. It supports business development, and technical capacity training for farmers.
There's also a lot of work in animal health, redeveloping and reestablishing veterinary field units, and training veterinaries, para-veterinaries, and female livestock workers.
JU: So your management challenge, relative to these programs, is what?
BW: It's developing tools that are applicable to multiple programs doing the same kinds of things. Everybody's involved in agricultural development, and interested in measuring improvement in sales and production. It's a challenge to collect that data and share it -- both operational data and impact data.
JU: So the field workers are in various locations around the country, with intermittent Internet access?
BW: Right. And in these circumstances, we want to standardize how we collect, synchronize, share, and report on this information.
JU: You had a pre-existing system based on Microsoft Access, as I understand it, and there were problems synchronizing those databases to your central office.
BW: Initially we didn't have Access, actually. When I arrived there wasn't any centralized system at all. Everything was Excel-based, sharing spreadsheets month-to-month from this region to that region. So we started the Access system, then later we realized it wasn't really working out because of the Internet problems, and because the process was bulky and cumbersome.
JU: Nigel, that's where you come in, right?
NS: Yes. Our humanitarian team was in Afghanistan looking to talk with people, do a bit of show and tell, and mainly get feedback about what people would like, and what they really need. And then use that to iterate what we were doing, and look for partnerships to do pilots.
With Mercy Corps we said, here's what we've got, here's what we're thinking, does that make sense to you?
Mercy Corps was great for that, because they were fairly well advanced in their thinking about how they were using their Access solution, and the architecture of what they wanted to do was quite clear.
JU: So from your perspective, Barbara, what was the outcome? Did it look just like what you had before, except that the synchronization problems were magically solved?
BW: Yes. I just wanted things to share, I wanted to know that we all had the same database, and somehow, whether it's every week or every month or every minute, the information just has to connect.
On paper it looked like we could do that with Access replication, but when we realized the problems that was causing, we realized that this technology Microsoft had been talking about -- which seemed maybe a little beyond our needs -- might actually solve the problems that we had. They wanted to try it, we wanted to try it, so it seemed to dovetail well.
And yes, it made happen in reality what I'd wanted to happen on paper. I was willing to accept weekly or even monthly if that's what it took, but now it happens every 10 minutes.
JU: Is this a situation where the updates that flow in from various locations tend not to conflict with one another?
BW: Yes, conflict resolution hasn't yet been much of an issue. Our biggest issues have been in our own database development, because the database itself is still evolving. So each time that changes, it affects the job mapping and FeedSync.
NS: The conflict resolution stuff is in there, and I think it'll become increasingly important as the size of the data grows, and as the activity from all the endpoints grows.
But we were quite deliberate about trying this in one place, and seeing if from Mercy Corps' perspective it worked out the way the Microsoft team had envisioned. It really was a tight spiral between developing new ideas and technologies, and also proving them and using them.
It was great to be able to do that in a real environment, but also a fairly controlled one, which was our first pilot in Kunduz. But then, you took it all over the country, damn you. [Laughs]
JU: There was a preexisting synchronization system that was found wanting. What was that, why didn't it work, and why is the FeedSync solution working?
NS: The solution Mercy Corps was originally piloting was based on Access and Access replicas. Which is a great technology, but in Afghanistan they were struggling with an unreliable Internet connection, and those replicas weren't working well. There was a lot of data to send, and there was a peer-to-peer VPN that would work OK sometimes, but flake out sometimes, mainly due to the Internet connection.
And in some cases, there was no Internet connection at all. So you have to send something by courier, be it a file on disk or on a memory stick.
JU: With FeedSync it's the same in the no-Internet case, you still have to sneaker-net the file.
NS: Absolutely. But with Access, when you export to a file, that's a one-way transfer. And there's all kinds of data you want to get back. Corrections to data, if there's a problem. (That's where conflicts can arise.) Then there are the reference lists and the lookups -- names of provinces, names of villages, names of staff people who are attending training sessions. All these things have to flow back to the edge, and be kept consistent.
JU: The point being that FeedSync isn't just lightweight, and more resilient to poor connectivity, but also that it's two-way.
NS: It's a two-way technology, and you've got different versions of the same thing, not one version that you're trying to somehow import and export and merge.
JU: Barbara, is this two-way aspect evident to you as a user of the system?
BW: Absolutely. At first I didn't understand a lot of the terminology, and the discussions and explanations. I kept hearing the word lightweight, and I didn't really understand what that meant.
But when I compared the Access replication, which basically takes the entire database and replicates it to another place -- which takes a long time, and then the Internet cuts out and you've corrupted the whole structure -- now instead of that you're sending just pieces of information. If it doesn't work right now, it'll work in a half hour, it just keeps trying, and it's completely lightweight and easy in that sense
And definitely the two-way street. We were still very much developing things, and even if it were perfectly developed there are still changes that have to happen from our side. As Nigel said: staff lists. People enter training records and they have to apply them to names of staff, but we get new staff people all the time. We have to continually update the names from our side so they have an appropriate list to choose from.
NS: One of the things we thought about when we were building the job manager, which is the piece that runs the FeedSync on people's desktops, is that it's an application that just sits there. You build a bunch of jobs, and a job takes a data source and syncs it with another data source.
In the case of Mercy Corps, that means take a table from an Access database and sync it with a feed on a website that acts as a relay. That's a job, and you have one of those for each table in the database. Of course referential integrity is something you can try and manage, and there's some support for that.
The other piece is that we can run a sync to a file. It takes the table in the database and syncs to a feed, in this case an RSS feed, on a file source. If the memory stick is plugged in, and you've got things set up right, it just works. The user doesn't have to worry whether it's being exported to the right place, or about what the file is called. And similarly for the Internet case.
JU: At this point, is the master database what's up on the server at Live Labs?
NS: Well the master source is really the database in Kabul, but yeah, the replicas are being also managed on feedsync.mslivelabs.com, where the plug is that anyone can go and set up a feed and a synchronization endpoint. All the databases sync to that, and then Kabul syncs to that and gets the data back down. And vice versa.
I should point out that FeedSync used to be called SSE [Simple Sharing Extensions], and this started back then. The first users of SSE in anger, if you like, were Mercy Corps in Afghanistan, which was exciting. But we had a lot to learn. Now they're moving to FeedSync. What that means is a slightly different version of the specification, a different service on the website, and a new version of tools I just gave to Faheem a day and a half ago -- he's the technical manager for Barbara's group. This latest version of the tools is the one that I hope we're releasing publicly in a couple of weeks.
JU: Barbara, I'm sure that going forward you'd like to see a better way to do schema evolution, so that the changes to the database structure can be part of this seamless synchronization.
BW: Yes, that would be ideal.
NS: That's a long conversation...
JU: Yes. Here's something else. I know that Mercy Corp works in parts of the world where connectivity is basically SMS more than Internet, or maybe exclusively SMS. That seems like something that FeedSync could be adapted for. Have you thought about that?
BW: Sure, some of our other offices have been using SMS as a way to share bits of information. We haven't found the need to do that yet, because we don't have that many people far out in the field who would need to enter data. In other programs where there visiting sites and schools all over the country, then yes.
NS: We've already built an SMS adapter for this, it's in testing at the moment. And it does exactly what you suggest. Rather than sending the data over the Internet, it breaks it up into SMS packets. There can be a lot of packets, of course, and it can be expensive. But we were talking to a different NGO in Afghanistan, which operates in very insecure areas where they don't have Internet at all. There, they are very interested in doing something similar to what Barbara is doing, but using SMS. First, because in some of these areas the security is so bad they can't even be seen to be carrying data. Second, if it takes six hours to drive somewhere, the cost of a bunch of 1-cent SMS messages is a lot cheaper than the cost of the petrol involved.
We've tried this, and it works quite nicely. I'm excited about that for the future. But to be honest, there are plenty of issues just keeping the Mercy Corps solution running. I wouldn't want to make you believe that this has been a dream installation where everything worked perfectly off the ground. Just the other day we ran into a problem where the feeds wouldn't sync.
BW: Sorry about that!
NS: Hey, no problem. It's now documented and it's part of the FAQ that'll go out when we release the tools.
JU: Barbara, how does Mercy Corps envision making use of the open toolkit which will be one of the outcomes of this project?
BW: There's been a cautious, wait-and-see approach from the beginning. Our IT has been a little like, oh, I don't know, this maybe could be interesting, let's see how it works. But now that people are understanding more, and seeing that this is not really a pilot any more, there's starting to be interest in ways we can be sharing information regionally, across offices, and how can other countries make use of the same technology. It's working its way into our lexicon.
JU: If there weren't the Internet problems you've had in Afghanistan, would there still be reasons to do things this way?
BW: Yes, I think there are other benefits. It's much closer to realtime, for one thing.
JU: The lightweight, near-realtime aspect is appealing even when there's enough bandwidth to do more heavyweight replication?
BW: Yes. And also, for us, going from one Access database to another identical database is one thing. But some of our initial discussions were about sharing across platforms that may not be identical, but use common variables. Across regions or countries, we all need to report certain pieces of information, but we collect it in different ways, and store it in different ways. If there's a system where we can upload it similarly, that would be a huge benefit.
JU: Great point. You could define a neutral common ground for data exchange.
NS: For Mercy Corp, there's a whole pile of options they should consider when doing sync. FeedSync isn't the be-all and end-all, it's got some particular things it seems to be good for, but that last point about interoperability is really important.
From the start, we've been interested in how to use this to link up disparate systems. Sometimes that might be an Excel spreadsheet to a database. But also different organizations or, in Mercy Corps' case, different countries where they work run slightly different database designs.
That's where I think the real strength of the system will lie. We've got a lot of work to do thinking about how to make that better.
One of the things we're really concerned about is that, if Mercy Corps or another group wanted to roll this out, there would be the support to do that. I don't think we've got that perfect yet by any means, but we showed what we're doing to a bunch of other NGOs, and afterward a number of them were interested in taking this kind of tool -- be it FeedSync or some other -- and using it for their programs.
The best example might be where you've got a whole pile of agencies implementing a program. In Afghanistan there's a thing called the National Stability Program, and it's run all over the country. All the reporting happens in a standard format, but every organization has its own way of managing the process. The challenges are to integrate the data, and pass back success and lessons learned. The big NGOs have their own systems, almost all in Access, all with some of the same schemas because the ministry says this is how you will report, but no way to aggregate all that nicely.
JU: This makes good sense. Thanks!
NS: Yeah, thanks Barbara. And let me know how Faheem's getting along. There may be some issues, but I hope everything's OK.
BW: Yeah, it's good, thanks so much.