How does Resbita predict the chances of getting food poisoning for every restaurant?
This post explains how Resbita predicts the chances of getting food poisoning at any restaurant. The application itself is available at Resbita.
- How I Do It
Using AI (machine learning and other techniques) and the large amount of information available online from sources such as social media, I developed a system that automatically detects venues that pose a public health hazard.
In 2013 the CDC (Centers for Disease Control and Prevention) estimated that almost 17% of people in the US are affected by foodborne diseases every year. That's 47.8 million Americans, 128,000 of whom were hospitalized, including 3,000 cases that resulted in death.
If that wasn't bad enough, those numbers aren't even comprehensive. Many cases of food poisoning go unreported: people with symptoms such as nausea, vomiting, or diarrhea often never seek medical treatment, and other cases are misdiagnosed as acute gastroenteritis.
Social networks and other websites hold a lot of information about bad experiences at restaurants that made their guests sick, but much of it is scattered across public posts and other communications. Resbita compiles all of this data into one easy-to-use resource that reliably predicts where food poisoning is likely or unlikely to occur.
The way to build such an epidemic early-warning system is to apply machine learning to big data, producing a system that automatically detects venues likely to pose a public health hazard.
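To make the idea concrete, here is a minimal sketch of the general pattern: score each post for signs of illness, then aggregate those scores per venue. It is not Resbita's actual model. The symptom phrases, weights, threshold, venue names, and sample posts are all hypothetical, and a simple keyword score stands in for a trained classifier.

```python
from collections import defaultdict

# Hypothetical symptom vocabulary with hand-picked weights (illustrative only;
# a real system would learn these from labeled posts).
SYMPTOM_WEIGHTS = {
    "food poisoning": 3.0,
    "vomiting": 2.0,
    "diarrhea": 2.0,
    "nausea": 1.5,
    "sick": 1.0,
}
POST_THRESHOLD = 2.0  # assumed cut-off for counting a post as an illness report


def post_score(text: str) -> float:
    """Sum the weights of every symptom phrase that appears in the post."""
    lowered = text.lower()
    return sum(w for phrase, w in SYMPTOM_WEIGHTS.items() if phrase in lowered)


def venue_risk(posts):
    """Map each venue to the fraction of its posts that look like illness reports."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for venue, text in posts:
        total[venue] += 1
        if post_score(text) >= POST_THRESHOLD:
            flagged[venue] += 1
    return {venue: flagged[venue] / total[venue] for venue in total}


# Made-up sample data: (venue, post text) pairs.
posts = [
    ("Cafe A", "Great pasta, friendly staff."),
    ("Cafe A", "Lovely brunch spot."),
    ("Diner B", "Got food poisoning here, vomiting all night."),
    ("Diner B", "Felt sick and had nausea after the burger."),
    ("Diner B", "Decent fries."),
]
risk = venue_risk(posts)

# List venues from highest to lowest estimated risk.
for venue, score in sorted(risk.items(), key=lambda item: -item[1]):
    print(f"{venue}: {score:.2f}")
```

In this toy data, two of the three Diner B posts cross the threshold and none of the Cafe A posts do, so Diner B ranks first. A production system would replace the keyword score with a model trained on labeled posts and would need far more data per venue before a ratio like this means anything.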
- References
Achrekar, H., Gandhe, A., Lazarus, R., Yu, S., and Liu, B. 2012. Twitter improves seasonal influenza prediction. Fifth Annual International Conference on Health Informatics.
Anderson, R., and May, R. 1979. Population biology of infectious diseases: Part I. Nature 280(5721):361.
Attenberg, J., and Provost, F. 2010. Why label when you can search?: Alternatives to active learning for applying human resources to build classification models under extreme class imbalance. In SIGKDD, 423–432. ACM.
Brennan, S., Sadilek, A., and Kautz, H. 2013. Towards understanding global spread of disease from everyday interpersonal interactions. In Twenty-Third International Conference on Artificial Intelligence (IJCAI).
Broniatowski, D. A., and Dredze, M. 2013. National and local influenza surveillance through Twitter: An analysis of the 2012-2013 influenza epidemic. PLoS ONE 8(12).
Brownstein, J. S., Freifeld, B. S., and Madoff, L. C. 2009. Digital disease detection - harnessing the web for public health surveillance. N Engl J Med 360(21):2153–2157.
Brownstein, J., Wolfe, C., and Mandl, K. 2006. Empirical evidence for the effect of airline travel on interregional influenza
CDC. 2013. Surveillance for foodborne disease outbreaks, United States, 2013: Annual report. Technical report, Centers for Disease Control and Prevention National Center for Emerging and Zoonotic Infectious Diseases.
Chawla, N., Japkowicz, N., and Kotcz, A. 2004. Editorial: special issue on learning from imbalanced data sets. ACM SIGKDD Explorations Newsletter 6(1):1–6.
Chen, P., David, M., and Kempe, D. 2010. Better vaccination strategies for better people. In Proceedings of the 11th ACM conference on Electronic commerce, 179–188. ACM.
Cortes, C., and Vapnik, V. 1995. Support-vector networks. Machine learning 20(3):273–297.
Culotta, A. 2010. Towards detecting influenza epidemics by analyzing Twitter messages. In Proceedings of the First Workshop on Social Media Analytics, 115–122. ACM.
De Choudhury, M., Gamon, M., Counts, S., and Horvitz, E. 2013. Predicting depression via social media. AAAI Conference on Weblogs and Social Media.
Eubank, S., Guclu, H., Anil Kumar, V., Marathe, M., Srinivasan, A., Toroczkai, Z., and Wang, N. 2004. Modelling disease outbreaks in realistic urban social networks. Nature 429(6988):180–184.
FDA. 2012. Bad Bug Book. U.S. Food and Drug Administration, 2nd edition.
Ginsberg, J., Mohebbi, M., Patel, R., Brammer, L., Smolinski, M., and Brilliant, L. 2008. Detecting influenza epidemics using search engine query data. Nature 457(7232):1012–1014.
Golder, S., and Macy, M. 2011. Diurnal and seasonal mood vary with work, sleep, and daylength across diverse cultures. Science 333(6051):1878–1881.
Grassly, N., Fraser, C., and Garnett, G. 2005. Host immunity and synchronized epidemics of syphilis across the United States. Nature 433(7024):417–421.
Grenfell, B., Bjornstad, O., and Kappey, J. 2001. Travelling waves and spatial hierarchies in measles epidemics. Nature 414(6865):716–723.
Harrison, C., Jorder, M., Stern, H., Stavinsky, F., Reddy, V., Hanson, H., Waechter, H., Lowe, L., Gravano, L., and Balter, S. 2014. Using a restaurant review website to identify unreported complaints of foodborne illness. Morb Mortal Wkly Rep 63(20):441–445.
Heymann, D. L. 2004. Control of communicable diseases manual: an official report of the American Public Health Association. American Public Health Association, 18th edition.
Morris, J. G., Jr., and Potter, M. 2013. Foodborne Infections and Intoxications. Food Science and Technology. Elsevier Science.
Japkowicz, N., et al. 2000. Learning from imbalanced data sets: a comparison of various strategies. In AAAI workshop on learning from imbalanced data sets, volume 68.
Joachims, T. 2005. A support vector method for multivariate performance measures. In ICML 2005, 377–384. ACM.
Joachims, T. 2006. Training linear SVMs in linear time. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, 217–226. ACM.
Landis, J. R., and Koch, G. G. 1977. The measurement of observer agreement for categorical data. Biometrics 159–174.
Lane, N. D., Miluzzo, E., Lu, H., Peebles, D., Choudhury, T., and Campbell, A. T. 2010. A survey of mobile phone sensing. Communications Magazine, IEEE 48(9):140–150.
Mann, H., and Whitney, D. 1947. On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 18:50–60.
Mason, W., and Suri, S. 2012. Conducting behavioral research on Amazon's Mechanical Turk. Behavior research methods 44(1):1–23.
Newman, M. 2002. Spread of epidemic disease on networks. Physical Review E 66(1):016128.
Paul, M., and Dredze, M. 2011a. A model for mining public health topics from Twitter. Technical Report. Johns Hopkins University. 2011.
Paul, M., and Dredze, M. 2011b. You are what you tweet: Analyzing Twitter for public health. In Fifth International AAAI Conference on Weblogs and Social Media.
Sadilek, A., and Kautz, H. 2013. Modeling the impact of lifestyle on health at scale. In Sixth ACM International Conference on Web Search and Data Mining.
Sadilek, A., Brennan, S., Kautz, H., and Silenzio, V. 2013. nEmesis: Which restaurants should you avoid today? In AAAI Conference on Human Computation and Crowdsourcing.
Sadilek, A., Kautz, H., and Silenzio, V. 2012. Predicting disease transmission from geo-tagged micro-blog data. In Twenty-Sixth AAAI Conference on Artificial Intelligence.
Sadilek, A., Kautz, H., DiPrete, L., Labus, B., Portman, E., Teitel, J., and Silenzio, V. 2016. nEmesis: Preventing Foodborne Illness by Data Mining Social Media. In Twenty-Eighth Annual Conference on Innovative Applications of Artificial Intelligence.
Scharff, R. L. 2012. Economic burden from health losses due to foodborne illness in the United States. Journal of food protection 75(1):123–131.
Sculley, D., Otey, M., Pohl, M., Spitznagel, B., Hainsworth, J., and Yunkai, Z. 2011. Detecting adversarial advertisements in the wild. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM.
Shalev-Shwartz, S., Singer, Y., and Srebro, N. 2007. Pegasos: Primal estimated sub-gradient solver for SVM. In Proceedings of the 24th international conference on Machine learning, 807–814. ACM.
Snow, J. 1855. On the mode of communication of cholera. John Churchill.
Ugander, J., Backstrom, L., Marlow, C., and Kleinberg, J. 2012. Structural diversity in social contagion. Proceedings of the National Academy of Sciences 109(16):5962–5966.
White, R., and Horvitz, E. 2008. Cyberchondria: Studies of the escalation of medical concerns in web search. Technical Report MSR-TR-2008-177, Microsoft Research. Appearing in ACM Transactions on Information Systems, 27(4), Article 23, November 2009, DOI 10.1145/1629096.1629101.