This was an interview I did with New Scientist after my presentation at the ISNTD Bites conference in London earlier this year.

I wanted to highlight that there is no need for the so-called “Prediction” of Big Data. Just a look at the synonyms of predict shows exactly how accurate it is likely to be.

predict
/prɪˈdɪkt/
verb
  1. say or estimate that (a specified thing) will happen in the future or will be a consequence of something.
    “it is too early to predict a result”
    synonyms: forecast, foretell, foresee, prophesy, divine, prognosticate, anticipate, see, say, tell in advance, project, speculate, envision, envisage, imagine, picture, estimate, conjecture, guess, hazard a guess;
    archaic: augur, previse, presage, foreshow;
    archaic: spae;
    rare: vaticinate, auspicate
    “it is difficult to predict what the outcome will be”

I don’t know of a weekly weather forecast that you would call 80% accurate, yet somehow, by using monthly weather predictions, certain parties claim they can be more than 90% accurate! Of course, it depends on what it is you are predicting. If you are predicting one or more cases, within 6 months, in a 5km area of a city which already has many cases, then your accuracy will be high. But then you wouldn’t need any form of big data or analysis either. It’s like predicting that the weather this summer will be warmer than in winter, then patting yourself on the back on hot days and keeping quiet on cold ones, like clairvoyants do.

I digress. This article was about the huge amount of data that Google keeps if you have its location services enabled. You can see your own at https://www.google.com/maps/timeline and it really is interesting to look at.

However, taking this “Small Data” from people infected with Dengue, Zika and many other diseases, and cross-referencing their location histories with each other, would very quickly build up a map of likely breeding grounds for the disease.

[Image: fightdengue-paths2]

So, rather than basing research on “Big Data Imagination”, surely it would be better to base it on the factual data of where the patients have actually been, and therefore the locations where they could have been infected.
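To make the cross-referencing idea a little more concrete, here is a rough sketch in Python. It assumes each patient's history is just a list of latitude/longitude points, snaps every point to a grid square, and keeps the squares that more than one patient passed through. The function names, the grid size and the input format are all made up for illustration; none of this is a Google API or a tested epidemiological method.

```python
import math
from collections import Counter

def grid_cell(lat, lon, cell_deg=0.01):
    """Snap a point to a grid square (0.01 degrees of latitude is roughly 1 km)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def shared_cells(patient_paths, cell_deg=0.01, min_patients=2):
    """Count how many different patients visited each grid square.

    patient_paths: hypothetical input, one list of (lat, lon) points per patient.
    Returns {cell: patient_count} for squares visited by at least min_patients.
    """
    counts = Counter()
    for path in patient_paths:
        # Use a set so a patient counts once per square, however long they stayed.
        counts.update({grid_cell(lat, lon, cell_deg) for lat, lon in path})
    return {cell: n for cell, n in counts.items() if n >= min_patients}
```

Squares with the highest counts would be the first places to go and look. A real version would also want to weight by time spent and time of day, which this sketch ignores.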

The beauty of this approach is that the data is being collected anyway, to help with journey time estimations etc. So you don’t have to install an app before being infected. You can install the app upon diagnosis and then anonymously share your location data for the past 14 days or so, maybe less if the doctor can see what point in the incubation period the virus is currently at.
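The “share only the last 14 days or so” step could look something like the sketch below. It assumes the history is available as (timestamp, latitude, longitude) records, which is my own simplification and not the format Google actually exports, and the function name is hypothetical. Dropping the timestamps is only a first step towards anonymity, not a guarantee of it.

```python
from datetime import datetime, timedelta

def recent_history(records, diagnosed_at, days=14):
    """Keep only the points from the `days` before diagnosis.

    records: hypothetical list of (timestamp, lat, lon) tuples.
    days: the sharing window; a doctor could shorten it if the stage
          of the incubation period is known.
    Returns (lat, lon) pairs only, with the timestamps stripped out.
    """
    start = diagnosed_at - timedelta(days=days)
    return [(lat, lon) for ts, lat, lon in records if start <= ts <= diagnosed_at]
```

Calling `recent_history(records, diagnosed_at, days=7)` would narrow the window to a week.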

Here’s the article:

https://www.newscientist.com/article/2082645-dear-google-please-help-us-use-our-data-to-beat-dengue-and-zika/