Upload a new file

Risk Map Project, Version Alpha 2


The Polio Risk Map Project lets you upload case data and run it through our workflow to generate a risk map.

The workflow will go through the following steps:

Support files

Below you will find an example file for each supported country. A hierarchy file is also provided for each country; it lists all the location names the system expects for that country.

Afghanistan Example file
    568 districts (75 infected) - 241 cases - 7 years of data - Endemic
Hierarchy file
Chad Example file
    61 districts (48 infected) - 344 cases - 13 years of data - Endemic
Hierarchy file
Democratic Republic of the Congo Example file
    509 districts (98 infected) - 298 cases - 11 years of data - Outbreaks
Hierarchy file
Ethiopia Example file
    78 districts (76 infected) - 3025 cases - 11 years of data - Endemic, then stop
Hierarchy file
Guinea Example file
    37 districts (30 infected) - 180 cases - 13 years of data - Outbreaks
Hierarchy file
Haiti Example file
    41 districts (30 infected) - 4746 cases - 22 years of data - Endemic to extinction
Hierarchy file
India Example file
    659 districts (240 infected) - 5176 cases - 12 years of data - Endemic to extinction
Hierarchy file
Liberia Example file
    15 districts (15 infected) - 1512 cases - 22 years of data - Endemic to Flare-up
Hierarchy file
Nigeria Example file
    774 districts (57 infected) - 1076 cases - 14 years of data - Endemic, very local
Hierarchy file
Pakistan Example file
    163 districts (144 infected) - 1305 cases - 16 years of data - Endemic with Flare-ups
Hierarchy file
Sierra Leone Example file
    14 districts (14 infected) - 1529 cases - 22 years of data - Endemic to Flare-up
Hierarchy file
South Africa Example file
    53 districts (53 infected) - 9400 cases - 27 years of data - Endemic
Hierarchy file
South Sudan Example file
    78 districts (33 infected) - 7717 cases - 22 years of data - Endemic
Hierarchy file
United Republic of Tanzania Example file
    198 districts (168 infected) - 50595 cases - 27 years of data - Endemic
Hierarchy file
Zambia Example file
    74 districts (60 infected) - 4213 cases - 22 years of data - Endemic
Hierarchy file

Input file format

The system expects the file you upload to follow a specific format. The requirements are:

For example, the following is a correctly formatted file for Nigeria:

            PolIS Case ID, Case_Date,  admin0,  admin1,  admin2
            NGA10-353,     9/10/2010,  NIGERIA, BORNO,   MAIDUGURI
            NGA10-4312,    27/09/2010, NIGERIA, KANO,    DAMBATTA
            NGA11-1372,    29/11/2011, NIGERIA, JIGAWA,  BABURA
            NGA11-1387,    29/10/2011, NIGERIA, JIGAWA,  BIRNIN KUDU
            NGA11-1564,    28/07/2011, NIGERIA, KANO,    DAWAKIN KUDU
            NGA11-1641,    8/6/2011,   NIGERIA, KEBBI,   BIRNIN KEBBI
            NGA11-1787,    29/11/2011, NIGERIA, KATSINA, MANI
            NGA11-1796,    2/10/2011,  NIGERIA, KATSINA, MASHI
            NGA11-1733,    27/08/2011, NIGERIA, KANO,    NASSARAWA
            NGA12-6291,    27/03/2012, NIGERIA, KATSINA, BATSARI
            NGA11-3897,    25/08/2011, NIGERIA, JIGAWA,  RINGIM
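
As a rough illustration of how a file in this layout could be checked before upload, the sketch below parses the text and validates each row. It is not the project's own validator: the function name `parse_case_file` is hypothetical, and it assumes that the five columns in the example header are all required and that dates are written day/month/year.

```python
import csv
import io
from datetime import datetime

# Column names are copied from the example header; treating all five as
# required is an assumption of this sketch.
EXPECTED_COLUMNS = ["PolIS Case ID", "Case_Date", "admin0", "admin1", "admin2"]


def parse_case_file(text):
    """Parse case data and return a list of records (hypothetical helper).

    Raises ValueError if the header or any row does not match the layout.
    """
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    # Trim padding around each field and skip blank lines.
    rows = [[cell.strip() for cell in row] for row in reader
            if any(cell.strip() for cell in row)]
    if not rows:
        raise ValueError("file is empty")
    header, body = rows[0], rows[1:]
    if header != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {header}")
    cases = []
    for row_no, row in enumerate(body, start=1):
        if len(row) != len(EXPECTED_COLUMNS):
            raise ValueError(
                f"data row {row_no}: expected {len(EXPECTED_COLUMNS)} "
                f"fields, got {len(row)}")
        record = dict(zip(EXPECTED_COLUMNS, row))
        # Assumed date layout: day/month/year, zero padding optional.
        record["Case_Date"] = datetime.strptime(
            record["Case_Date"], "%d/%m/%Y").date()
        cases.append(record)
    return cases
```

A row with a missing comma after the date yields four fields instead of five, so this sketch rejects it with a `ValueError` rather than silently misreading the location columns.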
        

Visualizations

The AUC (area under the curve) is a common evaluation metric for binary classification problems. Consider a plot of the true positive rate against the false positive rate as the threshold for classifying an item as “True” or “False” is swept from 0 to 1.
If the classifier is very good, the true positive rate rises quickly while the false positive rate is still low, and the area under this curve is close to 1.
If the classifier is no better than random guessing, the true positive rate increases linearly with the false positive rate and the area under this curve is around 0.5.


For more details, see: https://en.wikipedia.org/wiki/Receiver_operating_characteristic
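
To make the definition concrete, here is a minimal sketch that computes the AUC from predicted scores and true labels using the trapezoidal rule. The function name `roc_auc` is hypothetical and this is not the code used by the workflow; it only illustrates the metric described above.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the trapezoidal rule.

    labels: 1 for "True" items, 0 for "False" items.
    scores: predicted probabilities, in the same order as labels.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    # Visit items from the highest score to the lowest, which corresponds
    # to lowering the classification threshold step by step.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    true_pos = false_pos = 0
    prev_fpr = prev_tpr = 0.0
    area = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Items tied at the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        tpr = true_pos / positives
        fpr = false_pos / negatives
        # Add the trapezoid between the previous and current ROC points.
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_fpr, prev_tpr = fpr, tpr
    return area
```

A classifier that ranks every positive above every negative gets an area of 1, while one that gives every item the same score gets 0.5, matching the two cases described above.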

Models

Spatial Binomial Model

The probability of at least one case in a district during a 6-month period is modeled as a function of an overall level of risk as well as a set of independent and spatially structured random effects, also known as the convolution model [1]. In the first stage of this hierarchical model, we assume the presence or absence of cases in district $i$ and period $t$ ($X_{it}$) is distributed $X_{it} \sim \mathrm{Bern}(q_{it})$, where $q_{it}$ is the underlying rate of interest. We consider the logit linear model:

$$\operatorname{logit}(q_{it}) = \mu + \beta_1 X_{i,t-1} + \beta_2 Z_{i,t-1} + \theta_i + \phi_i$$

where $\mu$ is the overall risk level; $\beta_1$ is the coefficient for at least one case in district $i$ in the previous period $t-1$; $\beta_2$ is the coefficient for the indicator $Z_{i,t-1} = I\left[0 < \sum_{j \sim i} X_{j,t-1}\right]$, a binary variable describing whether any district neighboring district $i$ had at least one case in $t-1$, where $j \sim i$ denotes the districts that share a boundary with district $i$; $\theta_i$ is the spatially structured effect of district $i$; and $\phi_i$ is the independent (unstructured) effect of district $i$.
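
The linear predictor can be evaluated directly once the coefficients and random effects are known. The sketch below does this for a single district; the function names, the dictionary-based adjacency structure, and any numeric values passed in are illustrative assumptions, not the fitted model or the project's code.

```python
import math


def neighbor_indicator(district, neighbors, had_case_prev):
    """Z_{i,t-1}: 1 if any district bordering `district` had a case in t-1."""
    return int(any(had_case_prev[j] for j in neighbors[district]))


def predicted_probability(district, neighbors, had_case_prev,
                          mu, beta1, beta2, theta, phi):
    """Evaluate q_it by inverting the logit of the linear predictor.

    neighbors: dict mapping each district to the districts it borders.
    had_case_prev: dict mapping each district to True/False for period t-1.
    theta, phi: dicts of structured and independent effects per district.
    """
    x_prev = int(had_case_prev[district])
    z_prev = neighbor_indicator(district, neighbors, had_case_prev)
    eta = mu + beta1 * x_prev + beta2 * z_prev + theta[district] + phi[district]
    # Inverse logit maps the linear predictor back to a probability.
    return 1.0 / (1.0 + math.exp(-eta))
```

With all coefficients and effects set to zero the function returns 0.5; positive values of `beta1` or `beta2` raise the probability for districts that had a case, or border a district that had one, in the previous period.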

At the second stage of the hierarchical model, we assign priors to the random effects. The independent effects are assigned the prior $\phi_i \mid \sigma_\phi^2 \sim N(0, \sigma_\phi^2)$ for $i = 1, \ldots, I$. The spatially structured effect is assigned the intrinsic conditional autoregressive (ICAR) prior [2], where $\theta_i \mid \theta_{-i}, \sigma_\theta^2 \sim N\left(\sum_{j \sim i} \theta_j / m_i,\ \sigma_\theta^2 / m_i\right)$, $\theta_{-i}$ is the vector of $\theta$s excluding $\theta_i$, and $m_i$ is the number of districts that share a boundary with district $i$. Additional details about the spatially structured priors can be found in Gaussian Markov Random Fields [3].
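
The ICAR conditional distribution says that each district's structured effect is centered on the average of its neighbors' effects, with a variance that shrinks as the number of neighbors grows. A minimal sketch of that calculation follows; the function name and the dictionary-based adjacency structure are assumptions made for illustration.

```python
def icar_conditional(district, neighbors, theta, sigma2_theta):
    """Conditional mean and variance of theta_i given the other thetas.

    neighbors: dict mapping each district to the districts it borders.
    theta: dict of current structured effects per district.
    sigma2_theta: variance parameter of the structured effect.
    """
    adjacent = neighbors[district]
    m_i = len(adjacent)
    if m_i == 0:
        raise ValueError("conditional is undefined for a district "
                         "with no neighbors")
    mean = sum(theta[j] for j in adjacent) / m_i  # average neighbor effect
    variance = sigma2_theta / m_i                 # more neighbors, less spread
    return mean, variance
```

For a district with two neighbors whose effects are 1.0 and 3.0, the conditional mean is 2.0 and the conditional variance is half of `sigma2_theta`.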

This model was fit in R [4] using the Integrated Nested Laplace Approximation (INLA) [5,6] as implemented in the INLA package [7].

Bibliography

  1. Besag J, York J, Mollié A. Bayesian image restoration with two applications in spatial statistics. Ann Inst Stat Math 1991; 43: 1–59.
  2. Besag J. Spatial interaction and the statistical analysis of lattice systems. J R Stat Soc Ser B 1974; 36: 192–236.
  3. Rue H, Held L. Gaussian Markov Random Fields: Theory and Applications. Boca Raton: Chapman and Hall/CRC Press, 2005.
  4. R Core Development Team. R: a language and environment for statistical computing, version 3.2.1. 2016. Available at http://www.r-project.org.
  5. Rue H, Martino S, Chopin N. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J R Stat Soc Ser B 2009; 71: 319–92.
  6. Lindgren F, Rue H, Lindström J. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach (with discussion). J R Stat Soc Ser B 2011; 73: 423–98.
  7. Lindgren F, Rue H. Bayesian spatial modelling with R-INLA. J Stat Softw 2015; 63.

Logistic Regression

The probability of at least one case in a district during a 6-month period is modeled as a function of an overall level of risk as well as the presence of cases in the previous period. The presence of a case in district $i$ and period $t$ ($X_{it}$) is distributed $X_{it} \sim \mathrm{Bern}(q_{it})$, where $q_{it}$ is the underlying rate of interest. We consider the logit linear model:

$$\operatorname{logit}(q_{it}) = \mu + \beta_1 X_{i,t-1} + \beta_2 Z_{i,t-1}$$

where $\mu$ is the overall risk level, $\beta_1$ is the coefficient for at least one case in district $i$ in the previous period $t-1$, and $\beta_2$ is the coefficient for $Z_{i,t-1} = \sum_{j \sim i} X_{j,t-1}$, the total number of districts neighboring district $i$ with at least one case in $t-1$.
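
Before a model of this form can be fit, the case history has to be turned into one row per district and period, pairing the current outcome with the two lagged covariates. The sketch below shows one way to build those rows. The function name `lagged_design_rows` and the input layout (a time-ordered list of per-period dictionaries) are assumptions for illustration, not the project's data pipeline.

```python
def lagged_design_rows(presence, neighbors):
    """Build (district, t, y, x_prev, z_prev) rows for the logistic model.

    presence: list of dicts in time order, one per 6-month period, mapping
              each district to 1 (at least one case) or 0 (no cases).
    neighbors: dict mapping each district to the districts it borders.
    """
    rows = []
    # The first period has no predecessor, so rows start at t = 1.
    for t in range(1, len(presence)):
        previous, current = presence[t - 1], presence[t]
        for district in sorted(current):
            x_prev = previous[district]
            # Z_{i,t-1}: number of neighboring districts with a case in t-1.
            z_prev = sum(previous[j] for j in neighbors[district])
            rows.append((district, t, current[district], x_prev, z_prev))
    return rows
```

Note that here the neighbor covariate is a count, whereas the spatial binomial model uses a 0/1 indicator of whether any neighbor had a case.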

This model was fit in R [1] using the Integrated Nested Laplace Approximation (INLA) [2,3] as implemented in the INLA package [4].

Bibliography

  1. R Core Development Team. R: a language and environment for statistical computing, version 3.2.1. 2016. Available at http://www.r-project.org.
  2. Rue H, Martino S, Chopin N. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J R Stat Soc Ser B 2009; 71: 319–92.
  3. Lindgren F, Rue H, Lindström J. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach (with discussion). J R Stat Soc Ser B 2011; 73: 423–98.
  4. Lindgren F, Rue H. Bayesian spatial modelling with R-INLA. J Stat Softw 2015; 63.

Random Forest Model

The probability of at least one case in a district during the upcoming 6-month period is modeled using a random forest classifier [1]. Seven covariates are available to the ensemble: the total case count in the previous time period in the district and in its neighbors, the total and average historical case counts in the district and in its neighbors, and a dummy variable for whether the time period is the first or second half of the year as a proxy for seasonality. The model was fit in R [2], using the randomForest package [3].
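
The seven covariates can be derived from per-period case counts and the district adjacency. The sketch below is one possible reading of the description, not the project's feature code. It assumes that "neighbor" counts are summed over all bordering districts, that "average" means the mean per period over the available history, and that even-numbered periods fall in the first half of a year. The function name, dictionary keys, and input layout are all hypothetical.

```python
def random_forest_covariates(district, period_index, counts, neighbors):
    """Return the seven covariates for predicting period `period_index`.

    counts: list of dicts in time order, one per 6-month period, mapping
            each district to its case count in that period.
    neighbors: dict mapping each district to the districts it borders.
    """
    history = counts[:period_index]  # only periods before the target
    if not history:
        raise ValueError("at least one earlier period is required")
    own = [period[district] for period in history]
    # Assumption: neighbor counts are summed across bordering districts.
    nearby = [sum(period[j] for j in neighbors[district])
              for period in history]
    return {
        "prev_cases_own": own[-1],
        "prev_cases_neighbors": nearby[-1],
        "total_cases_own": sum(own),
        "total_cases_neighbors": sum(nearby),
        # Assumption: the average is taken per period of history.
        "mean_cases_own": sum(own) / len(own),
        "mean_cases_neighbors": sum(nearby) / len(nearby),
        # Assumption: period 0 is a first half-year, so odd = second half.
        "second_half_of_year": period_index % 2,
    }
```

A table of such rows, one per district and period, together with the observed presence or absence of cases, is the kind of input a random forest classifier would be trained on.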

Bibliography

  1. Breiman L. Random forests. Machine Learning 2001; 45(1): 5–32. doi:10.1023/A:1010933404324.
  2. R Core Development Team. R: a language and environment for statistical computing, version 3.2.1. 2016. Available at http://www.r-project.org.
  3. Liaw A, Wiener M. Classification and regression by randomForest. R News 2002; 2(3): 18–22. Available at https://cran.r-project.org/web/packages/randomForest/randomForest.pdf.

Software

This software is distributed as is, completely without warranty or service support. The Institute for Disease Modeling and its employees are not liable for the condition or performance of the software.

This software uses the following technologies and libraries: