This document is intended to give you a brief introduction to the basic process of pulling data from NSSP ESSENCE using APIs. It doesn’t cover everything you might ever need to know, but it does cover a lot of the most important bits. If you have additional questions, please feel free to reach out to the RHINO team at our shared mailbox, RHINO@doh.wa.gov.
Because RHINO data is so identifiable and contains many, many, many sensitive conditions, the data are carefully protected. Just as you have to log into ESSENCE when you’re using the interface directly, API calls will require you to verify your identity.
The first way you can choose to set your credentials is to just enter your username in the code and then enter your password in the popup window.
key_set(service = "essence", username = "amorse01")
The second option is to set your credentials in the Windows Credential Manager. I like this method because I’m lazy and hate entering my password every time I run a script.
This process can be a bit tricky and you need to remember to update your password every 90 days when you update it in the AMC or your scripts will throw errors. To do this, I just open each of the listed credentials and click “edit” to update my password. You shouldn’t need to create completely new credentials.
Because I use both ESSENCE and ESSENCE 2, I have credentials set for them both in my credential manager. To set for “normal ESSENCE” just alter Step 2 to be :essence: instead of :essence2:
Open Windows Credential Manager and click the windows credentials tab (not the web credentials tab)
Internet or network address (use your own username!) –> :essence2:amorse01
Password –> your password exactly as you entered it in the AMC
Persistence –> Enterprise
Now that you have your credentials set, you can start pulling data! The basic structure for an API call looks like this
url <- "https://essence2.syndromicsurveillance.org/nssp_essence/api/tableBuilder/csv?endDate=31Jul2021&percentParam=ccddCategory&datasource=va_hosp&startDate=1Jan2020&medicalGroupingSystem=essencesyndromes&userId=611&site=934&hospFacilityType=emergency%20care&aqtTarget=TableBuilder&ccddCategory=all%20traffic%20related%20v2&geographySystem=hospital&detector=nodetectordetector&timeResolution=monthly&hasBeenE=1&rowFields=sex&rowFields=age&rowFields=hasBeenI&columnField=site"
api_response <- GET(url, authenticate(key_list("essence")[1,2],
key_get("essence",
key_list("essence")[1,2])))
api_response_csv <- content(api_response, by= "csv/text")
visit_table <- read_csv(api_response_csv)
##
## -- Column specification --------------------------------------------------------
## cols(
## sex = col_character(),
## age = col_character(),
## hasBeenI = col_character(),
## Washington = col_double(),
## `Washington Data Count` = col_double(),
## `Washington All Count` = col_double()
## )
visit_table <- visit_table %>%
rename(visit_count = `Washington Data Count`,
percent = Washington,
all_count = `Washington All Count`) %>%
add_column(rate = NA) %>%
mutate(rate = (visit_count/all_count) * 100000)
There may be times when you want to pull down time series graphs directly from ESSENCE rather than creating your own using a package like ggplot. The process is very similar to pulling a csv.
url <- "https://essence2.syndromicsurveillance.org/nssp_essence/api/timeSeries/graph?endDate=24Jul2021&percentParam=ccddCategory&datasource=va_hosp&startDate=3Jan2021&medicalGroupingSystem=essencesyndromes&userId=611&hospFacilityType=emergency%20care&aqtTarget=TimeSeries&ccddCategory=all%20traffic%20related%20v2&geographySystem=hospital&detector=nodetectordetector&timeResolution=weekly&hasBeenE=1&graphTitle=Sample%20Time%20Series%20of%20the%20Weekly%20Percentage%20of%20Visits%20for%20a%20Condition&xAxisLabel=CDC%20Week&yAxisLabel=Percentage&yAxisLabelRight=&yAxisAutoScale=true&yAxisMin=0.00&yAxisMax=2.69"
api_response <- GET(url, authenticate(key_list("essence")[1,2],
key_get("essence",
key_list("essence")[1,2])),
write_disk("sample.png", overwrite=TRUE))
##Include image
knitr::include_graphics('sample.png')
If you would like to be able to run your script regularly with new dates (e.g., a weekly/monthly/quarterly report), you can create “objects” of your desired start and end points. You’ll probably want these at the top of your script and in your setup chunk (if you’re using Markdown).
R will apply these objects to your query within the command gsub() and then your queries will be altered to meet your specifications.
If you would like a rolling 90 day period, you can use code like what’s below to set your end date and the current date (Sys.Date()) and the starting date as 90 days prior to the current date (Sys.Date() - 90).
# your query's end date --> the current system date
end_date <- format(Sys.Date(), "%d%b%Y")
# your query's start date --> 90 days prior to the current system date
start_date <- format((Sys.Date() - 90), "%d%b%Y")
# API URL
url <- "https://essence2.syndromicsurveillance.org/nssp_essence/api/tableBuilder/csv?endDate=24Jul2021&percentParam=ccddCategory&datasource=va_hosp&startDate=3Jan2021&medicalGroupingSystem=essencesyndromes&userId=611&hospFacilityType=emergency%20care&aqtTarget=TableBuilder&ccddCategory=all%20traffic%20related%20v2&geographySystem=hospital&detector=nodetectordetector&timeResolution=weekly&hasBeenE=1&rowFields=age&columnField=hasBeenE"
# Update original URL with specified dates
url <- gsub(pattern="endDate=[0-9]{1,2}[A-Z][a-z][a-z][0-9]{2,4}", replace=paste0('endDate=', end_date), x=url)
url <- gsub(pattern="startDate=[0-9]{1,2}[A-Z][a-z][a-z][0-9]{2,4}", replace=paste0('startDate=', start_date), x=url)
# Run API call
api_response <- GET(url,
authenticate(key_list("essence")[1,2],
key_get("essence",
key_list("essence")[1,2])))
api_data <- content(api_response, by = "csv/text") %>%
read_csv()
##
## -- Column specification --------------------------------------------------------
## cols(
## age = col_character(),
## Yes = col_double(),
## `Yes Data Count` = col_double(),
## `Yes All Count` = col_double()
## )
This method uses a similar structure to the rolling 90 day window, but with both the start and end point specified. This is great for tasks like running pulls for very long timeframes for which you just want to tell it the exact timeframe.
# start date of 2020 timeseries
s_start_date <- "21Jul2019"
# end date (Saturday of the ending week)
s_end_date <- "24Jul2021"
# API URL
url <- "https://essence2.syndromicsurveillance.org/nssp_essence/api/tableBuilder/csv?endDate=24Jul2021&percentParam=ccddCategory&datasource=va_hosp&startDate=3Jan2021&medicalGroupingSystem=essencesyndromes&userId=611&hospFacilityType=emergency%20care&aqtTarget=TableBuilder&ccddCategory=all%20traffic%20related%20v2&geographySystem=hospital&detector=nodetectordetector&timeResolution=weekly&hasBeenE=1&rowFields=age&columnField=hasBeenE"
# Update original URL with specified dates
url <- gsub(pattern="endDate=[0-9]{1,2}[A-Z][a-z][a-z][0-9]{2,4}", replace=paste0('endDate=', end_date), x=url)
url <- gsub(pattern="startDate=[0-9]{1,2}[A-Z][a-z][a-z][0-9]{2,4}", replace=paste0('startDate=', start_date), x=url)
# Run API call
api_response <- GET(url,
authenticate(key_list("essence")[1,2],
key_get("essence",
key_list("essence")[1,2])))
api_data <- content(api_response, by = "csv/text") %>%
read_csv()
##
## -- Column specification --------------------------------------------------------
## cols(
## age = col_character(),
## Yes = col_double(),
## `Yes Data Count` = col_double(),
## `Yes All Count` = col_double()
## )
This method comes from our friend Sara Chronister in Arizona and requires you to pull down information from her GitHub. She’s amazing and we love her forever.
# GET FUNCTIONS FROM GITHUB
# Call in set-dates function
#source("https://raw.githubusercontent.com/sara-chronister/syndromic-surveillance/master/API/Set-Dates")
# Call in get csv function
#source("https://raw.githubusercontent.com/sara-chronister/syndromic-surveillance/master/API/Get-CSV")
# Call in looping date function from Github repo
#source("https://raw.githubusercontent.com/sara-chronister/syndromic-surveillance/master/API/Loop-Successive-Time-Periods")
#lstartdate <- format(Sys.Date()-90, "%Y-%m-%d")
#lenddate <- format(Sys.Date(), "%Y-%m-%d")
#vis <- return_longterm_query(url2, loop_start = lstartdate, loop_end = lenddate, by = 30) #RUN QUERY IN 30 DAY CHUNKS
There are times when you might run the same report or product for a single county. In these case, you can specify the “region” in ESSENCE you’d like to include in your pull.
Important Notes:
Your county names must be lowercase or ESSENCE will get angry.
If your county has a space in the name, you will need to include a %20 in the space. For example San Juan County becomes san%20juan.
# specify your single county
LHJ_selection <- "snohomish"
# API URL
url <- "https://essence2.syndromicsurveillance.org/nssp_essence/api/tableBuilder/csv?endDate=24Jul2021&percentParam=ccddCategory&datasource=va_hosp&startDate=3Jan2021&medicalGroupingSystem=essencesyndromes&userId=611&hospFacilityType=emergency%20care&geography=wa_adams&aqtTarget=TableBuilder&ccddCategory=all%20traffic%20related%20v2&geographySystem=hospital&detector=nodetectordetector&timeResolution=weekly&hasBeenE=1&rowFields=age&columnField=hasBeenE"
## Include the desired dates and LHJ set above
url <- gsub(pattern= "geography=wa_adams", replace=paste0('geography=wa_', LHJ_selection), x=url)
# Run API call
api_response <- GET(url,
authenticate(key_list("essence")[1,2],
key_get("essence",
key_list("essence")[1,2])))
api_data <- content(api_response, by = "csv/text") %>%
read_csv()
##
## -- Column specification --------------------------------------------------------
## cols(
## age = col_character(),
## Yes = col_double(),
## `Yes Data Count` = col_double(),
## `Yes All Count` = col_double()
## )
There may be other times when you want to specify a group of counties. We use this most frequently for local health jurisdictions that include multiple counties and accountable communities of health.
Important Notes:
Your county names must be lowercase or ESSENCE will get angry.
If your county has a space in the name, you will need to include a %20 in the space. For example San Juan County becomes san%20juan.
# specify your group of counties
LHJ_selection <- "island&geography=wa_san%20juan&geography=wa_skagit&geography=wa_snohomish&geography=wa_whatcom"
# API URL
url <- "https://essence2.syndromicsurveillance.org/nssp_essence/api/tableBuilder/csv?endDate=24Jul2021&percentParam=ccddCategory&datasource=va_hosp&startDate=3Jan2021&medicalGroupingSystem=essencesyndromes&userId=611&hospFacilityType=emergency%20care&aqtTarget=TableBuilder&ccddCategory=all%20traffic%20related%20v2&geography=wa_adams&geographySystem=hospital&detector=nodetectordetector&timeResolution=weekly&hasBeenE=1&rowFields=age&columnField=hasBeenE"
## Include the desired dates and LHJ set above
url <- gsub(pattern= "geography=wa_adams", replace=paste0('geography=wa_', LHJ_selection), x=url)
# Run API call
api_response <- GET(url,
authenticate(key_list("essence")[1,2],
key_get("essence",
key_list("essence")[1,2])))
api_data <- content(api_response, by = "csv/text") %>%
read_csv()
##
## -- Column specification --------------------------------------------------------
## cols(
## age = col_character(),
## Yes = col_double(),
## `Yes Data Count` = col_double(),
## `Yes All Count` = col_double()
## )
When you are pulling data from the data details output, there will likely be times when you don’t want or need every field that ESSENCE will want to send you. In these cases, you can specify just the fields that you want and pull only those.
# Specify the data fields you want to pull
fieldlist <- list("HospitalName","Date", "Time","CCDDCategory_flat","C_BioSense_ID","HospitalZip","ZipCode","Sex", "HospitalRegion","Age", "AgeGroup", "Region", "Race_flat", "Ethnicity_flat", "C_Death","SubCategory_flat", "Category_flat", "ChiefComplaintUpdates", "Diagnosis_Combo", "DischargeDiagnosis", "TriageNotesOrig", "ClinicalImpression", "Admit_Reason_Combo", "DischargeDisposition", "DispositionCategory", "Discharge_Date_Time", "MinutesFromVisitToDischarge")
# Add &field= before each field
field <- paste(paste0("&field=",fieldlist), collapse="")
# API URL
url <- "https://essence2.syndromicsurveillance.org/nssp_essence/api/dataDetails/csv?endDate=24Jul2021&percentParam=ccddCategory&datasource=va_hosp&startDate=3Jan2021&medicalGroupingSystem=essencesyndromes&userId=611&hospFacilityType=emergency%20care&aqtTarget=TableBuilder&ccddCategory=all%20traffic%20related%20v2&geographySystem=hospital&detector=nodetectordetector&timeResolution=weekly&hasBeenE=1&rowFields=age&columnField=hasBeenE"
# Paste URL and field list together
url2 <- paste0(url, field)
# Run API call
api_response <- GET(url2,
authenticate(key_list("essence")[1,2],
key_get("essence",
key_list("essence")[1,2])))
api_data <- content(api_response, by = "csv/text") %>%
read_csv()
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_character(),
## Time = col_time(format = ""),
## HospitalZip = col_double(),
## Age = col_double()
## )
## i Use `spec()` for the full column specifications.
This method is also taken from our friend Sara Chronister in Arizona and calls to her GitHub. This type of call may be particularly useful if you are, for example, pulling hospitalization records from WDRS and need to cross-reference ESSENCE.
# GET FUNCTIONS FROM GITHUB
# CALL DOWN SPECIFIC ESSENCE IDS
#library(glue)
#source("https://raw.githubusercontent.com/PHSKC-APDE/DOHdata/master/essence/essence_query_functions.R")
# PULL IN DATA FOR LINKED RECORDS FROM ESSENCE
#essenceresults <- bind_rows(event_query(event_id = api_data$C_BioSense_ID, bulk = T, group_size = 1000))