In this R Markdown document, we provide an overview of the ESSENCE APIs and how to access them through RStudio. We begin with a brief explanation of application programming interfaces (APIs), then list the APIs available in ESSENCE, and finally walk through basic examples of R code and packages so that you can start using RStudio to access ESSENCE data, create your own R Markdown reports and Shiny applications, or do exploratory analyses not possible within ESSENCE.
An application programming interface, or API, is a structured and consistent way for one machine to exchange information with another machine. ESSENCE has APIs that allow you to programmatically access and further manipulate your data from outside the system. You may select the CSV format to export data, and in some instances JSON is also supported. More information about the APIs can be found in ESSENCE under "More", then "User Guide", and then "API Documentation". You may write API URL syntax on your own after reading the documentation, or you can let ESSENCE create the API URL by clicking the "API URL" button on an ESSENCE page after completing a query.
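To make the anatomy of an API URL concrete, the sketch below splits one into its endpoint and its query parameters using base R only. The URL is a shortened version of the time series URL used later in this document, and the object names (endpoint, params) are illustrative choices rather than part of ESSENCE or any package.

```r
# Shortened time series URL (several parameters omitted for readability)
url <- paste0(
  "https://essence.syndromicsurveillance.org/nssp_essence/api/timeSeries?",
  "endDate=9Feb2021&medicalGrouping=injury&startDate=11Nov2020&timeResolution=daily"
)

# Everything before "?" is the endpoint; everything after is the query string
parts <- strsplit(url, "?", fixed = TRUE)[[1]]
endpoint <- parts[1]

# Parameters are separated by "&", and each one is a name=value pair
pairs <- strsplit(strsplit(parts[2], "&", fixed = TRUE)[[1]], "=", fixed = TRUE)
params <- data.frame(
  name = vapply(pairs, `[`, character(1), 1),
  value = vapply(pairs, `[`, character(1), 2),
  stringsAsFactors = FALSE
)

endpoint
params
```

Reading a generated URL this way is a quick check that the dates, data source, and groupings match the query you intended.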
At the time of this writing, ESSENCE offered seven APIs:
Time series data table,
Time series png image,
Table builder results,
Data details (line level),
Summary stats on the number of unique facilities or regions in your query results,
Alert list detection table,
Time series data table with stratified, historical alerts (New - from ESSENCE2)
The first step is to create an R script or R Markdown file and load the necessary packages. By default, some packages might already be in your system library. But if not, you can install them yourself by clicking on the packages tab in the bottom-right quadrant of the RStudio interface and selecting the Install tab. Here are the package names and configuration statement you will need to get started:
library(tidyverse)
library(magrittr) # provides extract2(), used below to pull elements out of parsed JSON
library(httr)
library(jsonlite)
library(keyring)
# key_set(service = "essence", username = "msheppardoa01")
library(Rnssp)
To extract data from ESSENCE, you start by passing authentication information from your RStudio session to ESSENCE so that it knows what data you are allowed to see. It is bad practice to explicitly include your username and password in your code. To provide flexibility, we present two examples that use either the Rnssp or keyring package to handle credentials. Whenever possible, we prefer that users adopt the more secure methodology presented in Rnssp (option 1 below).
Rnssp
As of June 2021, the Rnssp library is installed system-wide on the instance of RStudio Server Pro hosted on the BioSense Platform. If you are using a local instance of RStudio, you can currently install the development version of Rnssp from GitHub by running devtools::install_github("cdcgov/Rnssp") in the console. The Rnssp GitHub repository can be accessed at https://github.com/CDCgov/Rnssp, with additional documentation and vignettes located at https://cdcgov.github.io/Rnssp/. Rnssp provides functionality to securely save AMC credentials and interact with ESSENCE APIs. When you run the following code, a pop-up will appear in RStudio where you will need to enter your AMC username and password. This will create a user profile object of the class Credentials, designed with the R6 object system, which integrates classical object-oriented programming concepts into R. To knit an R Markdown document, you will need to save the myProfile object as an .rds file in your home directory. Note that this only needs to be done once. The following code chunk (save credentials) is presented for demonstration purposes only and does not need to be included in your actual R Markdown code.
library(Rnssp)
myProfile <- Credentials$new(
username = askme("Enter your username: "),
password = askme()
)
save(myProfile, file = "~/myProfile.rds")
myProfile.rds can then be loaded by including the following in your introductory code chunks:
load("~/myProfile.rds")
Note that your username and password are fully encrypted in your user profile and are not visible when viewing or inspecting the object:
myProfile
## <NSSPCredentials>
## Public:
## clone: function (deep = FALSE)
## get_api_data: function (url, fromCSV = FALSE)
## get_api_response: function (url)
## get_api_tsgraph: function (url)
## initialize: function (username, password)
## Private:
## ..__: NSSPContainer, R6
## ..password: NSSPContainer, R6
## ..username: NSSPContainer, R6
The myProfile object comes with the following methods:
$get_api_response(): Retrieves the information requested in the API URL from ESSENCE
$get_api_data(): Extracts the content (data) from the API response and parses it into an R data frame
$get_api_tsgraph(): Retrieves an ESSENCE time series graph and saves it as a PNG to a temporary directory
keyring
If you are using a local instance of RStudio, we suggest that you use the keyring library to save your AMC credentials. keyring will save your credentials to hidden, background environment variables that will persist for the duration of your R session. When you run the following line of code (with your username entered in the username quotes) a pop-up will appear in RStudio where you will need to enter your password. Note that you only need to save your credentials once per session. Once you enter your password in the pop-up, be sure to “comment out” this line of code by adding a hash mark before it, like you see in the code chunk loading libraries above.
key_set(service = "essence", username = "msheppardoa01")
In each example that follows, you will see a common pattern emerge. First, define the URL as an object in your RStudio session. Then retrieve the API response and extract the content into an R data frame for further analysis, using either the Rnssp $get_api_data() method or GET from httr. With this approach, R Markdown gives you an easily reproducible workflow in which you integrate report text with code that reads in data, manipulates data as needed, and produces analyses and visualizations that can be handed off to colleagues without having to document manual actions (point/click, etc.).
In this example, we will show you how to pull a time series data table from ESSENCE into RStudio. (Note: all examples use the limited details data sources available in NSSP - ESSENCE.) In the code below, the first object created is the URL for the ESSENCE API endpoint of interest, which for this example is a national trend of the injury syndrome. The second object, api_response, sends the request to the ESSENCE URL along with your credentials so that ESSENCE knows you are permitted to access these data. The next two objects convert the JSON-formatted response into an R data frame. The glimpse function (from dplyr) provides a quick sense of every variable in the data frame:
url <- "https://essence.syndromicsurveillance.org/nssp_essence/api/timeSeries?endDate=9Feb2021&medicalGrouping=injury&percentParam=noPercent&geographySystem=hospitaldhhsregion&datasource=va_hospdreg&detector=probrepswitch&startDate=11Nov2020&timeResolution=daily&medicalGroupingSystem=essencesyndromes&userId=455&aqtTarget=TimeSeries"
# Rnssp option
api_response <- myProfile$get_api_response(url)
api_response_json <- content(api_response, as = "text")
api_data <- fromJSON(api_response_json) %>%
extract2("timeSeriesData")
# keyring option
api_response <- GET(url,
authenticate(key_list("essence")[1,2],
key_get("essence",
key_list("essence")[1,2])))
api_response_json <- content(api_response, as = "text")
api_data <- fromJSON(api_response_json) %>%
extract2("timeSeriesData")
glimpse(api_data)
## Rows: 91
## Columns: 8
## $ date <chr> "2020-11-11", "2020-11-12", "2020-11-13", "2020-11-14", "202…
## $ count <dbl> 40173, 38805, 38920, 37441, 35059, 40392, 37783, 37295, 3732…
## $ expected <chr> "40693.321", "40691.679", "40675.857", "40601.071", "40525.7…
## $ levels <chr> "0.226", "0.731", "0.912", "0.991", "1", "0.615", "0.952", "…
## $ colorID <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ color <chr> "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blu…
## $ altText <chr> "Data: Date: 11Nov20, Level: 0.226, Count: 40173, Expected: …
## $ details <chr> "/nssp_essence/api/dataDetails?medicalGrouping=injury&percen…
Alternatively, with Rnssp we can pull these data with two lines of code by using the $get_api_data() method, which implicitly pulls and extracts the data using the steps outlined in the example above. There may be scenarios in which it is beneficial to retrieve the API response first, such as when the API response status code needs to be inspected because the data are not pulling as expected.
api_data <- myProfile$get_api_data(url) %>%
extract2("timeSeriesData")
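As a rough illustration of inspecting a status code, the sketch below maps a code to a short troubleshooting hint. The helper explain_status() is hypothetical, and its hints follow general HTTP conventions rather than anything documented by ESSENCE. With a live response, the code would come from httr::status_code(api_response).

```r
# Hypothetical helper: translate an HTTP status code into a troubleshooting hint
explain_status <- function(code) {
  if (code >= 200 && code < 300) {
    "success: the response body should contain the requested data"
  } else if (code %in% c(401, 403)) {
    "authentication problem: check the saved credentials and your data access"
  } else if (code == 404) {
    "not found: check the endpoint portion of the URL"
  } else if (code >= 500) {
    "server error: retry later or simplify the query"
  } else {
    "unexpected status: inspect the response body"
  }
}

explain_status(200)
explain_status(401)

# With a live response object (not run here):
# explain_status(httr::status_code(api_response))
```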
This example shows how to retrieve the ESSENCE graph itself instead of the underlying data for the graph, as shown previously. Here, we use the same national injury syndrome trend as before, but notice that the URL now includes "api/timeSeries/graph?…". You can add a title or axis labels by adding other parameters to the URL: "&graphTitle=Injury%20Syndrome&xAxisLabel=Date&yAxisLabel=Count".
url <- "https://essence.syndromicsurveillance.org/nssp_essence/api/timeSeries/graph?endDate=9Feb2021&medicalGrouping=injury&percentParam=noPercent&geographySystem=hospitaldhhsregion&datasource=va_hospdreg&detector=probrepswitch&startDate=11Nov2020&timeResolution=daily&medicalGroupingSystem=essencesyndromes&userId=455&aqtTarget=TimeSeries&graphTitle=National%20-%20Injury%20Syndrome%20Daily%20Counts&xAxisLabel=Date&yAxisLabel=Count"
# Rnssp option
api_png <- myProfile$get_api_tsgraph(url)
knitr::include_graphics(api_png$tsgraph)
# keyring option
api_response <- GET(url, authenticate(key_list("essence")[1,2],
key_get("essence",
key_list("essence")[1,2])),
write_disk("timeseries1.png", overwrite=TRUE))
knitr::include_graphics('timeseries1.png')
The table builder API is well suited for having ESSENCE do the heavy lifting of summarizing your query and presenting results in tabular format where you can define rows, nested rows, and column variables for output. So long as the query is supported in the ESSENCE query manager, you should be able to use table builder to summarize the output, which is usually more efficient than manipulating large amounts of line-level data yourself. In this example, we use the CDC Opioid Overdose v3 CCDD Category and create a table of counts per month by U.S. Department of Health & Human Services (HHS) region. Visits are limited to emergency department visits by selecting Has been Emergency = “Yes”. Output formats for table builder results are CSV and JSON. The CSV option will pull in data that matches the tabular format seen in the ESSENCE interface, while the JSON option will pull in data that is transformed to a long, pivoted format. The latter is recommended because it avoids the initial transformation to a long format, which is the format expected by functions and libraries based on tidyverse principles (e.g., ggplot2).
url <- "https://essence.syndromicsurveillance.org/nssp_essence/api/tableBuilder/csv?endDate=31Dec2020&ccddCategory=cdc%20opioid%20overdose%20v3&percentParam=noPercent&geographySystem=hospitaldhhsregion&datasource=va_hospdreg&detector=nodetectordetector&startDate=1Oct2020&ageNCHS=11-14&ageNCHS=15-24&ageNCHS=25-34&ageNCHS=35-44&ageNCHS=45-54&ageNCHS=55-64&ageNCHS=65-74&ageNCHS=75-84&ageNCHS=85-1000&ageNCHS=unknown&timeResolution=monthly&hasBeenE=1&medicalGroupingSystem=essencesyndromes&userId=455&aqtTarget=TableBuilder&rowFields=timeResolution&rowFields=geographyhospitaldhhsregion&columnField=ageNCHS"
# Rnssp option
api_data <- myProfile$get_api_data(url, fromCSV = TRUE)
# keyring option
api_response <- GET(url,
authenticate(key_list("essence")[1,2],
key_get("essence",
key_list("essence")[1,2])))
api_response_csv <- content(api_response, as = "text") # return the body as a raw CSV string
api_data <- read_csv(api_response_csv)
glimpse(api_data)
## Rows: 33
## Columns: 12
## $ timeResolution <chr> "2020-10", "2020-10", "2020-10", "2020-10…
## $ geographyhospitaldhhsregion <chr> "OTHER_REGION", "Region 1", "Region 10", …
## $ `11-14` <dbl> 0, 1, 2, 0, 3, 6, 7, 1, 1, 2, 5, 0, 0, 4,…
## $ `15-24` <dbl> 0, 129, 118, 133, 246, 602, 347, 158, 46,…
## $ `25-34` <dbl> 0, 478, 277, 426, 803, 1921, 1153, 326, 8…
## $ `35-44` <dbl> 0, 408, 163, 316, 660, 1567, 822, 273, 46…
## $ `45-54` <dbl> 0, 327, 125, 287, 491, 909, 669, 150, 33,…
## $ `55-64` <dbl> 0, 261, 167, 264, 464, 895, 679, 149, 36,…
## $ `65-74` <dbl> 0, 70, 145, 87, 193, 441, 294, 66, 15, 55…
## $ `75-84` <dbl> 0, 25, 45, 13, 39, 211, 52, 32, 11, 24, 4…
## $ `85+` <dbl> 0, 39, 18, 11, 51, 82, 35, 9, 4, 6, 14, 0…
## $ Unknown <dbl> 0, 4, 10, 6, 6, 37, 19, 4, 3, 1, 3, 0, 8,…
The following example demonstrates the necessary data transformations to achieve the long format that is output from the JSON option by default.
api_data_long <- api_data %>%
pivot_longer(cols = -c(timeResolution, geographyhospitaldhhsregion), names_to = "ageNCHS", values_to = "count")
api_data_long
## # A tibble: 330 x 4
## timeResolution geographyhospitaldhhsregion ageNCHS count
## <chr> <chr> <chr> <dbl>
## 1 2020-10 OTHER_REGION 11-14 0
## 2 2020-10 OTHER_REGION 15-24 0
## 3 2020-10 OTHER_REGION 25-34 0
## 4 2020-10 OTHER_REGION 35-44 0
## 5 2020-10 OTHER_REGION 45-54 0
## 6 2020-10 OTHER_REGION 55-64 0
## 7 2020-10 OTHER_REGION 65-74 0
## 8 2020-10 OTHER_REGION 75-84 0
## 9 2020-10 OTHER_REGION 85+ 0
## 10 2020-10 OTHER_REGION Unknown 0
## # … with 320 more rows
url <- "https://essence2.syndromicsurveillance.org/nssp_essence/api/tableBuilder?endDate=31Dec2020&ccddCategory=cdc%20opioid%20overdose%20v3&percentParam=noPercent&geographySystem=hospitaldhhsregion&datasource=va_hospdreg&detector=nodetectordetector&startDate=1Oct2020&ageNCHS=11-14&ageNCHS=15-24&ageNCHS=25-34&ageNCHS=35-44&ageNCHS=45-54&ageNCHS=55-64&ageNCHS=65-74&ageNCHS=75-84&ageNCHS=85-1000&ageNCHS=unknown&timeResolution=monthly&hasBeenE=1&medicalGroupingSystem=essencesyndromes&userId=2362&aqtTarget=TableBuilder&rowFields=timeResolution&rowFields=geographyhospitaldhhsregion&columnField=ageNCHS"
# Rnssp option
api_data <- myProfile$get_api_data(url)
# keyring option
api_response <- GET(url,
authenticate(key_list("essence")[1,2],
key_get("essence",
key_list("essence")[1,2])))
api_response_json <- content(api_response, as = "text")
api_data <- fromJSON(api_response_json)
glimpse(api_data)
## Rows: 330
## Columns: 4
## $ timeResolution <chr> "2020-10", "2020-10", "2020-10", "2020-10…
## $ geographyhospitaldhhsregion <chr> "OTHER_REGION", "OTHER_REGION", "OTHER_RE…
## $ ageNCHS <chr> "11-14", "15-24", "25-34", "35-44", "45-5…
## $ count <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 129, 478…
You may have noticed that the table builder in the ESSENCE user interface is limited to creating tables of up to 30,000 cells. This is fairly large and should suffice in most cases. However, it may be useful to know that the table builder API does not impose a limit on the output table size. To exceed the interface limit, you will need to write the API URL yourself instead of having ESSENCE create it for you with the "API URL" button. To write your own API URL, start by familiarizing yourself with its structure, and then add parameters that follow that structure. To help with this, look at examples where ESSENCE has created API URLs for you using the available buttons.
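One way to assemble a table builder URL by hand is to keep the endpoint and the parameters separate and paste them together at the end. The sketch below uses base R only. The helper build_essence_url() is hypothetical, and the parameter list is an abbreviated subset of the opioid overdose example above, so it illustrates the structure rather than a complete, validated query.

```r
# Hypothetical helper: combine an endpoint and a named list of parameters.
# Names may repeat (e.g. several ageNCHS values), mirroring ESSENCE-generated URLs.
build_essence_url <- function(endpoint, params) {
  # Percent-encode each value (for example, spaces become %20)
  values <- vapply(
    params,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  paste0(endpoint, "?", paste0(names(params), "=", values, collapse = "&"))
}

url <- build_essence_url(
  "https://essence.syndromicsurveillance.org/nssp_essence/api/tableBuilder/csv",
  list(
    endDate = "31Dec2020",
    startDate = "1Oct2020",
    ccddCategory = "cdc opioid overdose v3",
    ageNCHS = "11-14",
    ageNCHS = "15-24",
    rowFields = "timeResolution",
    columnField = "ageNCHS"
  )
)

url
```

Keeping parameters in a list makes it easier to add or remove row fields and filters than editing one long string.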
Sometimes you just need line-level data, and this example describes how to extract those data from ESSENCE. This can create a very large data set quickly. As with any query that has the potential to create large data sets, first test the query on a small amount of data. Consider making multiple calls over smaller time ranges and then combining the separate data frames to create a final data set. This API gives you options: you can specify the variables to include and whether you want a data set with raw or reference values. Downloads with reference values take longer to stream into RStudio because ESSENCE has to create those reference values. You can reduce the file size by specifying only the variables you need, adding parameters to the URL such as "&field=age&field=ChiefComplaintParsed". Output formats for data details are CSV and JSON.
url <- "https://essence.syndromicsurveillance.org/nssp_essence/api/dataDetails/csv?medicalGrouping=injury&geography=region%20i&percentParam=noPercent&geographySystem=hospitaldhhsregion&datasource=va_hospdreg&detector=probrepswitch&timeResolution=daily&medicalGroupingSystem=essencesyndromes&userId=455&aqtTarget=TimeSeries&startDate=31Jan2021&endDate=31Jan2021"
# Rnssp option
api_data <- myProfile$get_api_data(url, fromCSV = TRUE)
# keyring option
api_response <- GET(url,
authenticate(key_list("essence")[1,2],
key_get("essence",
key_list("essence")[1,2])))
api_data <- content(api_response, as = "text") %>% # return the body as a raw CSV string
read_csv()
glimpse(api_data)
## Rows: 2,049
## Columns: 19
## $ Date <chr> "01/31/2021", "01/31/2021", "01/31/2021", "01/3…
## $ Category_flat <chr> ";Injury;", ";Injury;", ";Injury;", ";Injury;",…
## $ SubCategory_flat <chr> ";Electrocution;", ";Fall;", ";BiteOrSting;", "…
## $ Patient_Class <chr> "E", "E", "E", "E", "E", "E", "E", "E", "E", "E…
## $ HospitalDHHSRegion <chr> "Region I", "Region I", "Region I", "Region I",…
## $ dhhsregion <chr> "Region I", "Region I", "Region I", "OTHER_REGI…
## $ AgeGroup <chr> "Unknown", "Unknown", "Unknown", "Unknown", "00…
## $ Sex <chr> "F", "F", "F", "M", "M", "F", "M", "M", "F", "F…
## $ DispositionCategory <chr> "DISCHARGED", "DISCHARGED", "none", "DISCHARGED…
## $ AdmissionTypeCategory <chr> "E", "E", "NR", "NR", "NR", "NR", "NR", "NR", "…
## $ HasBeenE <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ HasBeenI <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ HasBeenO <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ DDAvailable <dbl> 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ DDInformative <dbl> 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ CCAvailable <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ CCInformative <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ FirstDateTimeAdded <dttm> 2021-01-31 23:12:22, 2021-01-31 23:12:22, 2021…
## $ HasBeenAdmitted <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
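For larger pulls, the advice about making several calls over smaller time ranges could be sketched as follows. The helper split_date_range() is hypothetical, and the base URL is a shortened stand-in for the data details URL above (a real query would keep all of its parameters). The retrieval step is left commented out because it needs saved credentials.

```r
# Hypothetical helper: split a date range into consecutive windows of at most `days` days
split_date_range <- function(start, end, days = 7) {
  start <- as.Date(start)
  end <- as.Date(end)
  starts <- seq(start, end, by = days)
  ends <- pmin(starts + days - 1, end)
  data.frame(start = starts, end = ends)
}

windows <- split_date_range("2021-01-01", "2021-01-31", days = 7)

# Shortened stand-in for the data details URL, without its date parameters
base_url <- paste0(
  "https://essence.syndromicsurveillance.org/nssp_essence/api/dataDetails/csv?",
  "medicalGrouping=injury&geography=region%20i&datasource=va_hospdreg"
)

# One URL per window, with dates formatted as day, abbreviated month, and year
urls <- paste0(
  base_url,
  "&startDate=", format(windows$start, "%d%b%Y"),
  "&endDate=", format(windows$end, "%d%b%Y")
)

windows
length(urls)

# With saved credentials, each window could then be pulled and combined (not run here):
# pieces <- lapply(urls, function(u) myProfile$get_api_data(u, fromCSV = TRUE))
# api_data <- dplyr::bind_rows(pieces)
```

Month abbreviations produced by format() depend on the R session's locale, so this sketch assumes an English locale.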
url <- "https://essence.syndromicsurveillance.org/nssp_essence/api/dataDetails?endDate=31Jan2021&medicalGrouping=injury&percentParam=noPercent&geographySystem=hospitaldhhsregion&datasource=va_hospdreg&detector=probrepswitch&startDate=31Jan2021&timeResolution=daily&medicalGroupingSystem=essencesyndromes&userId=455&aqtTarget=DataDetails"
# Rnssp option
api_data <- myProfile$get_api_data(url)
# keyring option
api_response <- GET(url,
authenticate(key_list("essence")[1,2],
key_get("essence",
key_list("essence")[1,2])))
api_response_json <- content(api_response, as = "text")
api_data <- fromJSON(api_response_json) %>%
extract2("dataDetails")
glimpse(api_data)
## Rows: 31,884
## Columns: 19
## $ Date <chr> "01/31/2021", "01/31/2021", "01/31/2021", "01/3…
## $ Category_flat <chr> ";Injury;", ";Injury;", ";Injury;", ";Injury;",…
## $ SubCategory_flat <chr> ";Fall;", ";CutOrPierce;", ";SuicideOrSelfInfli…
## $ Patient_Class <chr> "I", "E", "E", "E", "E", "E", "E", "E", "E", "E…
## $ HospitalDHHSRegion <chr> "Region V", "Region V", "Region V", "Region V",…
## $ dhhsregion <chr> "Region V", "Region V", "Region V", "Region V",…
## $ AgeGroup <chr> "Unknown", "Unknown", "Unknown", "Unknown", "Un…
## $ Sex <chr> "F", "M", "F", "M", "M", "F", "M", "M", "M", "F…
## $ DispositionCategory <chr> "TRANSFERRED", "DISCHARGED", "DISCHARGED", "DIS…
## $ AdmissionTypeCategory <chr> "NR", "NR", "NR", "NR", "NR", "NR", "NR", "NR",…
## $ HasBeenE <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1…
## $ HasBeenI <chr> "1", "0", "0", "0", "0", "0", "0", "0", "0", "0…
## $ HasBeenO <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0…
## $ DDAvailable <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1…
## $ DDInformative <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1…
## $ CCAvailable <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1…
## $ CCInformative <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1…
## $ FirstDateTimeAdded <chr> "2021-02-01 06:13:51.63", "2021-02-01 06:13:51.…
## $ HasBeenAdmitted <chr> "1", "0", "0", "0", "0", "0", "0", "0", "0", "0…
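Comparing the two outputs, the JSON pull returns the indicator columns (HasBeenE, HasBeenI, and so on) as character, whereas the CSV pull returns them as numeric. A minimal sketch of aligning the types is shown below, using a small hand-made data frame as a stand-in for the JSON result. The name api_data_json and the choice of columns are illustrative.

```r
# Small stand-in for a JSON data details result (values follow the output above)
api_data_json <- data.frame(
  Date = c("01/31/2021", "01/31/2021"),
  Sex = c("F", "M"),
  HasBeenE = c("1", "1"),
  HasBeenI = c("1", "0"),
  stringsAsFactors = FALSE
)

# Convert the indicator columns from character to integer
flag_cols <- c("HasBeenE", "HasBeenI")
api_data_json[flag_cols] <- lapply(api_data_json[flag_cols], as.integer)

# Parse the month/day/year strings into Date objects
api_data_json$Date <- as.Date(api_data_json$Date, format = "%m/%d/%Y")

str(api_data_json)
```

Converting only named columns avoids accidental conversions, for example a Sex column containing only "F" being read as logical by automatic type guessing.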
The Summary Stats ESSENCE API counts regions (or, "counties," in ESSENCE) and facilities in your query by whatever time resolution you define (daily, weekly, monthly, quarterly, or yearly). One difference between this API and others is that it is only available on full details data sources (the only data sources that expose this level of information). This API is particularly useful for understanding the number of hospitals with results for your query.
url <- "https://essence.syndromicsurveillance.org/nssp_essence/api/summaryData?endDate=31Jan2021&medicalGrouping=injury&geography=region%20i&percentParam=noPercent&geographySystem=hospitaldhhsregion&datasource=va_hosp&detector=probrepswitch&startDate=29Jan2021&timeResolution=daily&medicalGroupingSystem=essencesyndromes&userId=455&aqtTarget=TimeSeries"
# Rnssp option
api_data <- myProfile$get_api_data(url)
# keyring option
api_response <- GET(url,
authenticate(key_list("essence")[1,2],
key_get("essence",
key_list("essence")[1,2])))
api_response_json <- content(api_response, as = "text")
api_data <- fromJSON(api_response_json) %>%
extract2("summaryData")
glimpse(api_data)
## Rows: 3
## Columns: 5
## $ date <chr> "29Jan21", "30Jan21", "31Jan21"
## $ HospitalState <dbl> 6, 6, 6
## $ State <dbl> 16, 18, 19
## $ Region <dbl> 91, 93, 97
## $ Hospital <dbl> 204, 203, 200
The Alert List API gives you programmatic access to the Alert List table in the ESSENCE user interface. The results in this table are updated a few times daily and are run by patient region (or, "county" in ESSENCE) and by hospital. The first example below shows the results by patient region and the second by hospital.
url <- "https://essence2.syndromicsurveillance.org/nssp_essence/api/alerts/regionSyndromeAlerts?end_date=30Apr2021&start_date=28Apr2021"
# Rnssp option
api_data <- myProfile$get_api_data(url) %>%
extract2("regionSyndromeAlerts")
# keyring option
api_response <- GET(url,
authenticate(key_list("essence")[1,2],
key_get("essence",
key_list("essence")[1,2])))
api_response_json <- content(api_response, as = "text")
api_data <- fromJSON(api_response_json) %>%
extract2("regionSyndromeAlerts")
glimpse(api_data)
## Rows: 24,684
## Columns: 12
## $ date <chr> "2021-04-30", "2021-04-30", "2021-04-28", "2021-0…
## $ datasource <chr> "va_er", "va_er", "va_er", "va_er", "va_er", "va_…
## $ age <chr> "65-1000", "65-1000", "65-1000", "all", "all", "a…
## $ sex <chr> "all", "all", "all", "all", "all", "all", "all", …
## $ detector <chr> "probrepswitch", "probrepswitch", "probrepswitch"…
## $ level <dbl> 0.005630151, 0.046790074, 0.026056932, 0.01475552…
## $ count <int> 2, 3, 3, 12, 5, 8, 11, 3, 21, 22, 6, 7, 3, 2, 1, …
## $ expected <dbl> 0.1785714, 1.3571429, 1.1785714, 5.6785714, 1.250…
## $ region <chr> "NC_Duplin", "NC_Duplin", "NC_Duplin", "NC_Duplin…
## $ syndrome <chr> "Rash", "Neuro", "Neuro", "Fever", "ILI", "Neuro"…
## $ timeResolution <chr> "daily", "daily", "daily", "daily", "daily", "dai…
## $ `observed/expected` <dbl> 11.200000, 2.210526, 2.545455, 2.113208, 4.000000…
url <- "https://essence2.syndromicsurveillance.org/nssp_essence/api/alerts/hospitalSyndromeAlerts?end_date=30Apr2021&start_date=28Apr2021"
# Rnssp option
api_response <- myProfile$get_api_response(url)
api_data <- myProfile$get_api_data(url) %>%
extract2("hospitalSyndromeAlerts")
# keyring option
api_response <- GET(url,
authenticate(key_list("essence")[1,2],
key_get("essence",
key_list("essence")[1,2])))
api_response_json <- content(api_response, as = "text")
api_data <- fromJSON(api_response_json) %>%
extract2("hospitalSyndromeAlerts")
glimpse(api_data)
## list()
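The hospital query above returned an empty list rather than a data frame. In an automated report, a small guard such as the sketch below can keep downstream code from failing on empty results. The helper has_rows() is hypothetical, and the one-row data frame is a hand-made stand-in for an alert table.

```r
# Hypothetical helper: TRUE only for a data frame with at least one row
has_rows <- function(x) is.data.frame(x) && nrow(x) > 0

empty_result <- list()
alerts <- data.frame(region = "NC_Duplin", syndrome = "Rash")

has_rows(empty_result) # FALSE
has_rows(alerts)       # TRUE

# Typical use before summarizing or plotting:
if (has_rows(alerts)) {
  message("Alerts returned: ", nrow(alerts))
} else {
  message("No alerts returned for this query.")
}
```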
Prior to this update, there was not an efficient way of pulling in daily stratified alerts. As of August 2020, users can now pull historical alerts across stratifications in a long, tabular format. Note that this functionality is available from ESSENCE2: https://essence2.syndromicsurveillance.org/. As an example, a user could choose multiple CCDD categories, Has been Emergency = "Yes", CCDD Category for "As Percent Parameter", and within the time series interface choose a geography level such as Hospital HHS Region for “Across Graphs Stratification” and CCDD Category for “Within Graph Stratification”. After selecting these configurations, the corresponding API URL will populate the API URL box under the “Query Options” dropdown menu. Note: There is no need to select the “Update” button in the user interface to generate the stratified time series or to build the time series data table; the API URL is populated immediately. Additionally, if “As Percent Parameter” is specified, the data pulled into RStudio will contain p-values and alert indicators specific to both counts and percentages. Indicators specific to counts are specified with "_dataCount" tags in the column names.
url <- "https://essence2.syndromicsurveillance.org/nssp_essence/api/timeSeries?endDate=9Feb2021&ccddCategory=cdc%20pneumonia%20ccdd%20v1&ccddCategory=cdc%20coronavirus-dd%20v1&ccddCategory=cli%20cc%20with%20cli%20dd%20and%20coronavirus%20dd%20v2&percentParam=ccddCategory&geographySystem=hospitaldhhsregion&datasource=va_hospdreg&detector=probrepswitch&startDate=11Nov2020&timeResolution=daily&hasBeenE=1&medicalGroupingSystem=essencesyndromes&userId=2362&aqtTarget=TimeSeries&stratVal=ccddCategory&multiStratVal=geography&graphOnly=true&numSeries=3&graphOptions=multipleSmall&seriesPerYear=false&nonZeroComposite=false&removeZeroSeries=true&startMonth=January&stratVal=ccddCategory&multiStratVal=geography&graphOnly=true&numSeries=3&graphOptions=multipleSmall&seriesPerYear=false&startMonth=January&nonZeroComposite=false"
# Rnssp option
api_data <- myProfile$get_api_data(url) %>%
extract2("timeSeriesData")
# keyring option
api_response <- GET(url,
authenticate(key_list("essence")[1,2],
key_get("essence",
key_list("essence")[1,2])))
api_response_json <- content(api_response, as = "text")
api_data <- fromJSON(api_response_json) %>%
extract2("timeSeriesData")
glimpse(api_data)
## Rows: 2,730
## Columns: 21
## $ date <chr> "2020-11-11", "2020-11-12", "2020-11-13", …
## $ count <dbl> 1.713413, 1.828619, 1.752491, 1.775689, 2.…
## $ expected <chr> "1.483", "1.492", "1.788", "1.731", "1.804…
## $ levels <chr> "0.216", "0.177", "0.524", "0.47", "0.316"…
## $ colorID <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ color <chr> "blue", "blue", "blue", "blue", "blue", "b…
## $ altText <chr> "Data: Date: 11Nov20, Level: 0.216, Count:…
## $ details <chr> "/nssp_essence/servlet/DataDetailsServlet?…
## $ graphType <chr> "percent", "percent", "percent", "percent"…
## $ dataCount <dbl> 303, 309, 292, 272, 304, 385, 360, 313, 30…
## $ expected_dataCount <dbl> 238.8571, 307.2781, 301.6989, 278.2173, 29…
## $ levels_dataCount <dbl> 0.001257941, 0.464353595, 0.688298523, 0.6…
## $ colorID_dataCount <int> 3, 1, 1, 1, 1, 3, 2, 1, 1, 1, 1, 1, 1, 1, …
## $ color_dataCount <chr> "red", "blue", "blue", "blue", "blue", "re…
## $ allCount <dbl> 17684, 16898, 16662, 15318, 14556, 17987, …
## $ lineLabel <chr> "CDC Pneumonia CCDD v1 - Region 1", "CDC P…
## $ title <chr> "CDC Pneumonia CCDD v1 - Region 1", "CDC P…
## $ ccddCategory_id <chr> "CDC Pneumonia CCDD v1", "CDC Pneumonia CC…
## $ ccddCategory_display <chr> "CDC Pneumonia CCDD v1", "CDC Pneumonia CC…
## $ hospitaldhhsregion_id <chr> "Region I", "Region I", "Region I", "Regio…
## $ hospitaldhhsregion_display <chr> "Region 1", "Region 1", "Region 1", "Regio…
names(api_data)
## [1] "date" "count"
## [3] "expected" "levels"
## [5] "colorID" "color"
## [7] "altText" "details"
## [9] "graphType" "dataCount"
## [11] "expected_dataCount" "levels_dataCount"
## [13] "colorID_dataCount" "color_dataCount"
## [15] "allCount" "lineLabel"
## [17] "title" "ccddCategory_id"
## [19] "ccddCategory_display" "hospitaldhhsregion_id"
## [21] "hospitaldhhsregion_display"
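To separate the count-specific indicators from the others, the "_dataCount" suffix can be matched directly. The sketch below works on the column names copied from the output above, so it runs without an ESSENCE connection. Treating the un-suffixed counterparts as the percent-based indicators is an inference from the graphType value of "percent" in this example, not something confirmed by the API documentation.

```r
# Column names copied from the names(api_data) output above
cols <- c(
  "date", "count", "expected", "levels", "colorID", "color", "altText",
  "details", "graphType", "dataCount", "expected_dataCount",
  "levels_dataCount", "colorID_dataCount", "color_dataCount", "allCount",
  "lineLabel", "title", "ccddCategory_id", "ccddCategory_display",
  "hospitaldhhsregion_id", "hospitaldhhsregion_display"
)

# Count-specific detection columns carry the "_dataCount" suffix
count_cols <- grep("_dataCount$", cols, value = TRUE)

# Stripping the suffix gives the names of their un-suffixed counterparts
percent_cols <- sub("_dataCount$", "", count_cols)

count_cols
percent_cols
```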
A benefit of pulling the data and alerts in this manner is that you can create customized figures with ggplot2 or plotly that can be incorporated into static or interactive R Markdown reports:
hhs_region_data <- api_data %>%
select(
date,
hhs_region = hospitaldhhsregion_display,
ccdd_category = ccddCategory_display,
percent = count,
color
) %>%
mutate(
date = as.Date(date),
hhs_region = factor(hhs_region, levels = c("Region 1", "Region 2", "Region 3", "Region 4", "Region 5",
"Region 6", "Region 7", "Region 8", "Region 9", "Region 10"))
) %>%
filter(ccdd_category == "CLI CC with CLI DD and Coronavirus DD v2") %>%
arrange(date, hhs_region)
ggplot(hhs_region_data, aes(x = date, y = percent)) +
geom_line(size = 0.7, color = "#046C9A") +
geom_point(data = subset(hhs_region_data, color == "red"), color = "red", size = 0.5) +
geom_point(data = subset(hhs_region_data, color == "yellow"), color = "yellow", size = 0.5) +
theme_bw() +
labs(title = "CLI v2 by HHS Region",
x = "Date",
y = "Percent of ED Visits") +
facet_wrap(facets = ~hhs_region)
ESSENCE2 now includes facility county FIPS and patient county FIPS as available query fields (note that these are technically approximations since ESSENCE regions are populated by zip codes). This allows a user to choose FIPS codes as a row (or column) field in the table builder. The following example assumes that a user has chosen the Facility Location (Full Details) data source, the CDC Coronavirus-DD v1, CDC Pneumonia CCDD v1, and CLI CC with CLI DD and Coronavirus DD v2 CCDD categories, “Yes” for “Has Been Emergency”, “CC and DD Category” for “As Percent Query”, and all counties within their state for Facility County FIPS Approximation. By selecting Date and Facility County FIPS Approximation for row fields and CC and DD Category for column fields, the API URL generated in the user interface will have the following structure:
url <- "https://essence2.syndromicsurveillance.org/nssp_essence/api/tableBuilder/csv?endDate=9Feb2021&facilityfips=...&percentParam=ccddCategory&datasource=va_hosp&startDate=11Nov2020&medicalGroupingSystem=essencesyndromes&userId=2362&aqtTarget=TableBuilder&ccddCategory=cdc%20coronavirus-dd%20v1&ccddCategory=cdc%20pneumonia%20ccdd%20v1&ccddCategory=cli%20cc%20with%20cli%20dd%20and%20coronavirus%20dd%20v2&geographySystem=hospital&detector=nodetectordetector&timeResolution=daily&hasBeenE=1&rowFields=timeResolution&rowFields=facilityfips&columnField=ccddCategory"
After the specification of endDate, all facility FIPS codes will be defined with the following syntax: “&facilityfips=fipscode1&facilityfips=fipscode2&…&facilityfips=fipscodeN&”. Note: Currently, users need to manually insert “&refValues=false” after specification of facilityfips as a row field in order to pull in the actual codes instead of the county names. This URL and API pull should be defined as follows:
url <- "https://essence2.syndromicsurveillance.org/nssp_essence/api/tableBuilder/csv?endDate=9Feb2021&facilityfips=...&percentParam=ccddCategory&datasource=va_hosp&startDate=11Nov2020&medicalGroupingSystem=essencesyndromes&userId=2362&aqtTarget=TableBuilder&ccddCategory=cdc%20coronavirus-dd%20v1&ccddCategory=cdc%20pneumonia%20ccdd%20v1&ccddCategory=cli%20cc%20with%20cli%20dd%20and%20coronavirus%20dd%20v2&geographySystem=hospital&detector=nodetectordetector&timeResolution=daily&hasBeenE=1&rowFields=timeResolution&rowFields=facilityfips&refValues=false&columnField=ccddCategory"
# Rnssp option
api_data <- myProfile$get_api_data(url, fromCSV = TRUE)
# keyring option
api_response <- GET(url,
authenticate(key_list("essence")[1,2],
key_get("essence",
key_list("essence")[1,2])))
api_response_csv <- content(api_response, as = "text")
api_data <- read_csv(api_response_csv)
Occasionally, the API URL you created for the ESSENCE query will be very, VERY long. For example, you might have reason to explicitly include all facilities, all counties, or all ZIP codes for a site. In this situation, the character length of your URL might be too long to assign to the "url <-" object as shown in the preceding examples. When this occurs, all you have to do is split the URL into two or more strings when creating objects, and then pull them together later. In the code chunk shown below, we start with one long URL (use your imagination here) and break it into two pieces. Then, to create the object "url3", we paste them together. Finally, to create "url", we clean up "url3" by removing any carriage returns or line breaks (added when you separated the original URL by pressing Enter on your keyboard) and replacing them with nothing. This final URL can be passed to ESSENCE along with your credentials to return your results.
url1 <- "https://essence.syndromicsurveillance.org/nssp_essence/api/very_very_very_long_URL_ver_long_use_your_imagination_here...."
url2 <- "still_going_even_longer_here..........."
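url3 <- paste0(url1, url2)
# Remove any carriage returns or line feeds left over from splitting the URL
url <- gsub("[\r\n]", "", url3)
# Resulting url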
print(url)
## [1] "https://essence2.syndromicsurveillance.org/nssp_essence/api/tableBuilder/csv?endDate=9Feb2021&facilityfips=...&percentParam=ccddCategory&datasource=va_hosp&startDate=11Nov2020&medicalGroupingSystem=essencesyndromes&userId=2362&aqtTarget=TableBuilder&ccddCategory=cdc%20coronavirus-dd%20v1&ccddCategory=cdc%20pneumonia%20ccdd%20v1&ccddCategory=cli%20cc%20with%20cli%20dd%20and%20coronavirus%20dd%20v2&geographySystem=hospital&detector=nodetectordetector&timeResolution=daily&hasBeenE=1&rowFields=timeResolution&rowFields=facilityfips&columnField=ccddCategory"
For reports that are run weekly or daily, it is convenient to set the start and end dates automatically rather than changing them in the API URL by hand before knitting. There are multiple ways to do this, such as splitting the URL into three pieces, much as the URL in the previous example was split, or using str_extract() and str_replace() to substitute the appropriate dates. For example, if the report is based on the most recent 90 days, the start and end dates can be determined with base R's Sys.Date() and format(), which ensures the appropriate date formatting. format(Sys.Date(), "%d%b%Y") gives today's date (17Jun2021 at the time of writing), while format(Sys.Date() - 90, "%d%b%Y") gives the start date of the most recent 90-day period (19Mar2021). To insert the dates by splitting, the URL can be split into three pieces and then pasted back together:
endDate <- format(Sys.Date(), "%d%b%Y")
startDate <- format(Sys.Date()- 90, "%d%b%Y")
url1 <- "https://essence.syndromicsurveillance.org/nssp_essence/api/timeSeries/graph?"
url2 <- paste0("endDate=", endDate, "&medicalGrouping=injury&percentParam=noPercent&geographySystem=hospitaldhhsregion&datasource=va_hospdreg&detector=probrepswitch&")
url3 <- paste0("startDate=", startDate, "&timeResolution=daily&medicalGroupingSystem=essencesyndromes&userId=455&aqtTarget=TimeSeries&graphTitle=National%20-%20Injury%20Syndrome%20Daily%20Counts&xAxisLabel=Date&yAxisLabel=Count")
url <- paste0(url1, url2, url3)
# Rnssp option
api_png <- myProfile$get_api_tsgraph(url)
knitr::include_graphics(api_png$tsgraph)
# keyring option
api_response <- GET(url, authenticate(key_list("essence")[1,2],
key_get("essence",
key_list("essence")[1,2])),
write_disk("timeseries2.png", overwrite=TRUE))
knitr::include_graphics('timeseries2.png')
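One caveat with format(x, "%b") is that the month abbreviation depends on the locale of the R session, while the URLs in this document use English abbreviations such as Jun and Mar. The sketch below is one way to make the date strings locale-independent by using base R's month.abb constant. The helpers format_essence_date() and essence_date_window() are hypothetical names introduced here for illustration and are not part of ESSENCE or any package.

```r
# Hypothetical helper: format a date as ddMonYYYY using English month
# abbreviations, regardless of the session locale
format_essence_date <- function(d) {
  d <- as.Date(d)
  paste0(format(d, "%d"),
         month.abb[as.integer(format(d, "%m"))],
         format(d, "%Y"))
}

# Hypothetical helper: start and end dates for the most recent n_days
essence_date_window <- function(n_days, end = Sys.Date()) {
  end <- as.Date(end)
  list(startDate = format_essence_date(end - n_days),
       endDate   = format_essence_date(end))
}

# Fixed end date so the result is reproducible; omit `end` to use today
window <- essence_date_window(90, end = "2021-06-17")
window$startDate  # "19Mar2021"
window$endDate    # "17Jun2021"
```

The two list elements can be passed to paste0() in place of startDate and endDate when assembling the URL pieces.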
Alternatively, the start and end dates in the URL can remain fixed, and the old dates can be extracted and replaced with new dates. The following code yields the same final URL as the splitting approach.
url <- "https://essence.syndromicsurveillance.org/nssp_essence/api/timeSeries/graph?endDate=01Jun2020&medicalGrouping=injury&percentParam=noPercent&geographySystem=hospitaldhhsregion&datasource=va_hospdreg&detector=probrepswitch&startDate=04Mar2020&timeResolution=daily&medicalGroupingSystem=essencesyndromes&userId=455&aqtTarget=TimeSeries&graphTitle=National%20-%20Injury%20Syndrome%20Daily%20Counts&xAxisLabel=Date&yAxisLabel=Count"
# Pull out the "endDate=...&" parameter, then extract just the date from it
endDateOld <- regmatches(url, regexpr("endDate=.+?&", url))
endDateOld <- str_extract(endDateOld, "[0-9]{1,2}[A-Za-z]{3}[0-9]{2,4}")
endDateNew <- format(Sys.Date(), "%d%b%Y")
# Repeat for the "startDate=...&" parameter
startDateOld <- regmatches(url, regexpr("startDate=.+?&", url))
startDateOld <- str_extract(startDateOld, "[0-9]{1,2}[A-Za-z]{3}[0-9]{2,4}")
startDateNew <- format(Sys.Date() - 90, "%d%b%Y")
# Swap the old dates for the new ones
url <- str_replace(url, endDateOld, endDateNew)
url <- str_replace(url, startDateOld, startDateNew)
url
## [1] "https://essence.syndromicsurveillance.org/nssp_essence/api/timeSeries/graph?endDate=17Jun2021&medicalGrouping=injury&percentParam=noPercent&geographySystem=hospitaldhhsregion&datasource=va_hospdreg&detector=probrepswitch&startDate=19Mar2021&timeResolution=daily&medicalGroupingSystem=essencesyndromes&userId=455&aqtTarget=TimeSeries&graphTitle=National%20-%20Injury%20Syndrome%20Daily%20Counts&xAxisLabel=Date&yAxisLabel=Count"
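The extract-then-replace steps can also be collapsed into a single substitution that targets the parameter name directly. The following is a sketch of that alternative using base R's sub(). The helper set_url_param() is a hypothetical name introduced here, and the URL is a shortened placeholder rather than a working ESSENCE query. It assumes the parameter appears only once in the URL and that its name contains no regular expression metacharacters.

```r
# Hypothetical helper: replace the value of one query parameter in a URL.
# The "([?&])" group ensures the whole parameter name is matched, and
# "[^&]*" consumes the old value up to the next "&" or the end of the URL.
set_url_param <- function(url, name, value) {
  pattern <- paste0("([?&])", name, "=[^&]*")
  sub(pattern, paste0("\\1", name, "=", value), url)
}

# Shortened placeholder URL used only for illustration
url_demo <- "https://example.org/api/timeSeries/graph?endDate=01Jun2020&detector=probrepswitch&startDate=04Mar2020&timeResolution=daily"

# Fixed dates keep the example reproducible; in a report, pass
# format(Sys.Date(), "%d%b%Y") and format(Sys.Date() - 90, "%d%b%Y") instead
url_demo <- set_url_param(url_demo, "endDate", "17Jun2021")
url_demo <- set_url_param(url_demo, "startDate", "19Mar2021")
url_demo
```

Because the pattern matches on the parameter name rather than on the old date, this approach still works when the start and end dates stored in the URL happen to be identical.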
The preceding examples will help you start pulling data into your RStudio environment, but the next steps are really up to you. If you are unfamiliar with R and RStudio, here are some openly available resources that will help you to move forward in your analysis.