Working with individual mapping files
library(ROMOPMappingTools)
#> Warning: replacing previous import 'SqlRender::render' by 'rmarkdown::render'
#> when loading 'ROMOPMappingTools'
Intro
This vignette shows how to use some of the functions of the ROMOPMappingTools package to work with a single Usagi mapping file: reading a Usagi file, validating its format, and updating it after a vocabulary update. To transform a single Usagi file into the C&CR tables of the OMOP vocabulary, we recommend following the same steps as in the Work with multiple mapping files vignette. This is because the process needs some additional information that is not included in the Usagi file, but in the ‘vocabularies.csv’ file. To automate the whole process in a GitHub repository, please refer to the Work as a github repository vignette.
Example files are included in the package. In the inst/testdata folder you can find the files used in this example.
Reading a Usagi file
To read the Usagi file, we can use the readUsagiFile function, which returns a tibble with correctly formatted columns. It can read a standard Usagi file or an extended Usagi file; see the Usagi file format vignette for more details. In this example we will read an extended Usagi file from the test data. This file contains the mappings for the ICD10fi vocabulary.
pathToUsagiFile <- system.file("testdata/VOCABULARIES/ICD10fi/ICD10fi.usagi.csv", package = "ROMOPMappingTools")
usagiTibble <- readUsagiFile(pathToUsagiFile)
usagiTibble |> dplyr::glimpse()
#> Rows: 3,945
#> Columns: 25
#> $ sourceCode <chr> "A01.0+G01", "A01.0+I39.8", "A01.0+J…
#> $ sourceName <chr> "Meningitis (in) typhoid fever", "En…
#> $ sourceFrequency <int> -1, -1, -1, -1, -1, -1, -1, -1, -1, …
#> $ sourceAutoAssignedConceptIds <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ `ADD_INFO:sourceConceptId` <dbl> 2000500101, 2000500102, 2000500103, …
#> $ `ADD_INFO:sourceName_fi` <chr> "Lavantautiin liittyvä aivokalvotule…
#> $ `ADD_INFO:sourceConceptClass` <chr> "ICD10fi Hierarchy", "ICD10fi Hierar…
#> $ `ADD_INFO:sourceDomain` <chr> "Condition", "Condition", "Condition…
#> $ `ADD_INFO:sourceValidStartDate` <date> 1900-01-01, 1900-01-01, 1900-01-01,…
#> $ `ADD_INFO:sourceValidEndDate` <date> 2099-12-31, 2099-12-31, 2099-12-31,…
#> $ `ADD_INFO:sourceParents` <chr> "A01|A01.0|G01", "A01|A01.0|I39.8", …
#> $ `ADD_INFO:sourceParentVocabulary` <chr> "ICD10|ICD10|ICD10", "ICD10|ICD10|IC…
#> $ matchScore <dbl> 0.00, 0.00, 0.00, 0.78, 0.00, 0.00, …
#> $ mappingStatus <chr> "APPROVED", "APPROVED", "APPROVED", …
#> $ equivalence <chr> "EQUAL", "EQUAL", "EQUAL", "EQUIVALE…
#> $ statusSetBy <chr> "PKo", "PKo", "PKo", "PKo", "PKo", "…
#> $ statusSetOn <dbl> 1.666794e+12, 1.666794e+12, 1.666794…
#> $ conceptId <int> 4100102, 4111401, 4166072, 80316, 43…
#> $ conceptName <chr> "Meningitis due to typhoid fever", "…
#> $ domainId <chr> "Condition", "Condition", "Condition…
#> $ mappingType <chr> "MAPS_TO", "MAPS_TO", "MAPS_TO", "MA…
#> $ comment <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ createdBy <chr> "TAYS", "TAYS", "TAYS", "PKo", "TAYS…
#> $ createdOn <dbl> 1.623974e+12, 1.623974e+12, 1.623974…
#> $ assignedReviewer <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
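Once the file is loaded, two quick checks are often useful: counting mappings per status and turning the numeric timestamps into readable dates. The sketch below uses a small hand-made data frame as a stand-in for the tibble above, so it runs without the package; the source codes are placeholders, and it assumes that statusSetOn and createdOn hold milliseconds since 1970-01-01, which is how the values above appear to be encoded.

```r
# Toy stand-in for a few columns of the tibble returned by readUsagiFile().
# The source codes are placeholders, not codes from the ICD10fi test file.
usagiExample <- data.frame(
  sourceCode = c("CODE_A", "CODE_B", "CODE_C"),
  mappingStatus = c("APPROVED", "APPROVED", "UNCHECKED"),
  statusSetOn = c(1.666794e+12, 1.666794e+12, 1.623974e+12),
  stringsAsFactors = FALSE
)

# Number of mappings per status
statusCounts <- table(usagiExample$mappingStatus)
print(statusCounts)

# Assumption: timestamps are milliseconds since the Unix epoch,
# so divide by 1000 before converting to a date-time
statusSetOnDateTime <- as.POSIXct(
  usagiExample$statusSetOn / 1000,
  origin = "1970-01-01",
  tz = "UTC"
)
print(statusSetOnDateTime)
```

The same two calls can be applied directly to usagiTibble$mappingStatus and usagiTibble$statusSetOn.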
Validating a Usagi file
To check whether all the information in the Usagi file is correct, we can use the validateUsagiFile function. This function takes a Usagi or Usagi-extended file and performs a series of validations; see the function help, ?validateUsagiFile, for more details. The function also needs a connection to the OMOP vocabulary database and the schema where the vocabulary tables are stored, in order to perform some of the validations. The function also needs the number used to offset the source concept ids in the Usagi file. The function returns a tibble with a summary of the validations conducted and, if errors are found, a new Usagi file with the errors will be created in the specified path.
For this example, we will use the test database in DuckDB format included in the package. This test database contains only the ICD10 vocabulary with all the keys in other tables (see inst/testdata/createTestData.R for more details).
pathToOMOPVocabularyDuckDBfile <- helper_createATemporaryCopyOfTheOMOPVocabularyDuckDB()
connectionDetails <- DatabaseConnector::createConnectionDetails(
dbms = "duckdb",
server = pathToOMOPVocabularyDuckDBfile
)
connection <- DatabaseConnector::connect(connectionDetails)
#> Connecting using DuckDB driver
vocabularyDatabaseSchema <- "main"
pathToValidatedUsagiFile <- tempfile(fileext = "usagi_validated.csv")
validationsSummary <- validateUsagiFile(
pathToUsagiFile,
connection,
vocabularyDatabaseSchema,
pathToValidatedUsagiFile = pathToValidatedUsagiFile,
sourceConceptIdOffset = 2000500000
)
knitr::kable(validationsSummary)
type | step | message |
---|---|---|
SUCCESS | Missing default columns | |
SUCCESS | SourceCode is empty | |
SUCCESS | SourceCode and conceptId are not unique | |
SUCCESS | SourceName is empty | |
SUCCESS | SourceName is more than 255 characters | |
SUCCESS | SourceFrequency is not empty | |
SUCCESS | MappingStatus is empty | |
SUCCESS | MappingStatus is not valid | |
SUCCESS | APPROVED mappingStatus conceptId is 0 | |
SUCCESS | APPROVED mappingStatus with concepts outdated | |
SUCCESS | Not APPROVED mappingStatus with concepts outdated | |
SUCCESS | Missing C&CR columns | |
SUCCESS | SourceConceptId is empty | |
SUCCESS | SourceConceptId is not a number on the range | |
SUCCESS | SourceConceptClass is empty | |
SUCCESS | SourceConceptClass is more than 20 characters | |
SUCCESS | SourceDomain is empty | |
SUCCESS | SourceDomain is not a valid domain | |
SUCCESS | Not APPROVED mappingStatus with valid domain combination | |
SUCCESS | APPROVED mappingStatus with valid domain combination | |
SUCCESS | Missing date columns | |
SUCCESS | SourceValidStartDate is after SourceValidEndDate | |
SUCCESS | Missing parent columns | |
SUCCESS | Invalid parent concept code | |
In this case the Usagi file is valid and no errors are found. Hence, the new validated Usagi file remains unchanged.
However, we can see what happens with a Usagi file that contains errors. In this case we use another Usagi file with all types of errors, which is included in the package for unit testing purposes.
pathToUsagiFileWithErrors <- system.file("testdata/VOCABULARIES/ICD10fi/ICD10fi_with_errors.usagi.csv", package = "ROMOPMappingTools")
usagiTibbleWithErrors <- readUsagiFile(pathToUsagiFileWithErrors)
validationsSummaryWithErrors <- validateUsagiFile(
pathToUsagiFileWithErrors,
connection,
vocabularyDatabaseSchema,
pathToValidatedUsagiFile = pathToValidatedUsagiFile,
sourceConceptIdOffset = 2000500000
)
knitr::kable(validationsSummaryWithErrors)
type | step | message |
---|---|---|
SUCCESS | Missing default columns | |
ERROR | SourceCode is empty | Number of failed rules: 1 |
ERROR | SourceCode and conceptId are not unique | Number of failed rules: 2 |
ERROR | SourceName is empty | Number of failed rules: 1 |
ERROR | SourceName is more than 255 characters | Number of failed rules: 1 |
SUCCESS | SourceFrequency is not empty | |
SUCCESS | MappingStatus is empty | |
SUCCESS | MappingStatus is not valid | |
ERROR | APPROVED mappingStatus conceptId is 0 | Number of failed rules: 1 |
ERROR | APPROVED mappingStatus with concepts outdated | 3 conceptIds do not exist on the target vocabularies, 3 conceptNames are outdated, 3 domainIds are outdated, 8 standardConcepts have changed to non-standard |
WARNING | Not APPROVED mappingStatus with concepts outdated | 3 conceptIds do not exist on the target vocabularies, 1 conceptNames are outdated, 3 domainIds are outdated, 5 standardConcepts have changed to non-standard |
SUCCESS | Missing C&CR columns | |
ERROR | SourceConceptId is empty | Number of failed rules: 1 |
ERROR | SourceConceptId is not a number on the range | Number of failed rules: 1 |
ERROR | SourceConceptClass is empty | Number of failed rules: 1 |
ERROR | SourceConceptClass is more than 20 characters | Number of failed rules: 1 |
ERROR | SourceDomain is empty | Number of failed rules: 1 |
ERROR | SourceDomain is not a valid domain | Number of failed rules: 1 |
WARNING | Not APPROVED mappingStatus with valid domain combination | Found 1 codes with invalid domain combinations |
ERROR | APPROVED mappingStatus with valid domain combination | Found 1 codes with invalid domain combinations |
SUCCESS | Missing date columns | |
ERROR | SourceValidStartDate is after SourceValidEndDate | Number of failed rules: 1 |
SUCCESS | Missing parent columns | |
ERROR | Invalid parent concept code | Found 3 codes with invalid parent concept codes |
In this case, if we open the new validated Usagi file with the Usagi software, the mappings with errors will appear as FLAGGED. Additionally, the ADD_INFO:validationMessages column will indicate the exact error or errors found.
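When validation runs inside a script, it can help to reduce the summary to the steps that need attention. The following is a minimal sketch in base R, using a hand-made data frame with the same three columns (type, step, message) as the summary tables above; the rows are illustrative and the object names are assumptions, not part of the package.

```r
# Toy stand-in for the tibble returned by validateUsagiFile()
validationsSummaryExample <- data.frame(
  type = c("SUCCESS", "ERROR", "WARNING"),
  step = c(
    "Missing default columns",
    "SourceCode is empty",
    "Not APPROVED mappingStatus with concepts outdated"
  ),
  message = c("", "Number of failed rules: 1", "1 conceptNames are outdated"),
  stringsAsFactors = FALSE
)

# Keep only the steps that did not pass
problems <- validationsSummaryExample[validationsSummaryExample$type != "SUCCESS", ]
print(problems)

# Flag the run if at least one step is an ERROR
hasErrors <- any(validationsSummaryExample$type == "ERROR")
if (hasErrors) {
  message("Validation found errors; review the validated Usagi file.")
}
```

With the real summaries, the same filter can be applied to validationsSummaryWithErrors.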

Updating a Usagi file
If the vocabulary has been updated since the Usagi file was created, some of the mappings may be outdated. This will be detected by validateUsagiFile and shown as an error in the “APPROVED mappingStatus with concepts outdated” step.
In this case we will use another Usagi file with outdated concept ids, which is included in the package for unit testing purposes.
pathToOutdatedUsagiFile <- system.file("testdata/VOCABULARIES/ICD10fi/ICD10fi_outdated.usagi.csv", package = "ROMOPMappingTools")
validationsSummaryWithErrors <- validateUsagiFile(
pathToOutdatedUsagiFile,
connection,
vocabularyDatabaseSchema,
pathToValidatedUsagiFile = pathToValidatedUsagiFile,
sourceConceptIdOffset = 2000500000
)
knitr::kable(validationsSummaryWithErrors)
type | step | message |
---|---|---|
SUCCESS | Missing default columns | |
SUCCESS | SourceCode is empty | |
SUCCESS | SourceCode and conceptId are not unique | |
SUCCESS | SourceName is empty | |
SUCCESS | SourceName is more than 255 characters | |
SUCCESS | SourceFrequency is not empty | |
SUCCESS | MappingStatus is empty | |
SUCCESS | MappingStatus is not valid | |
SUCCESS | APPROVED mappingStatus conceptId is 0 | |
ERROR | APPROVED mappingStatus with concepts outdated | 122 conceptNames are outdated, 92 domainIds are outdated, 24 standardConcepts have changed to non-standard |
SUCCESS | Not APPROVED mappingStatus with concepts outdated | |
SUCCESS | Missing C&CR columns | |
SUCCESS | SourceConceptId is empty | |
SUCCESS | SourceConceptId is not a number on the range | |
SUCCESS | SourceConceptClass is empty | |
SUCCESS | SourceConceptClass is more than 20 characters | |
SUCCESS | SourceDomain is empty | |
SUCCESS | SourceDomain is not a valid domain | |
SUCCESS | Not APPROVED mappingStatus with valid domain combination | |
SUCCESS | APPROVED mappingStatus with valid domain combination | |
SUCCESS | Missing date columns | |
SUCCESS | SourceValidStartDate is after SourceValidEndDate | |
SUCCESS | Missing parent columns | |
SUCCESS | Invalid parent concept code | |
If outdated concepts are detected, we can attempt to update the Usagi file automatically using the updateUsagiFile function. This function takes a Usagi or Usagi-extended file, a connection to the database, the schema with the vocabulary tables, and the path of the file where the updated Usagi file will be stored.
pathToUpdatedUsagiFile <- tempfile(fileext = "usagi_updated.csv")
updateSummary <- updateUsagiFile(
pathToOutdatedUsagiFile,
connection,
vocabularyDatabaseSchema,
pathToUpdatedUsagiFile,
skipValidation = TRUE
)
#> Note: method with signature 'DBIConnection#SQL' chosen for function 'dbQuoteIdentifier',
#> target signature 'DatabaseConnectorDbiConnection#SQL'.
#> "DatabaseConnectorConnection#character" would also be valid
knitr::kable(updateSummary)
type | step | message |
---|---|---|
INFO | Updated conceptIds | Updated 20 conceptIds that don’t need review |
WARNING | Updated conceptIds | 29 conceptIds could not be updated automatically, remapping needed |
INFO | Updated domains | Updated 38 domains |
INFO | Updated concept names | Updated 100 concept names |
This function updates changes in domain_id and concept_name, and if the mapped concept_id points to a non-standard concept it will try to find a new mapping for it. This is done by looking in the relationship table for relationships of the old concept_id by “Maps to”, “Concept replaced by”, “Concept same_as to” and “Concept poss_eq to”, in that order. A new column, ADD_INFO:autoUpdatingInfo, is added to the updated Usagi file to show the specific changes made to the file.
Sometimes, as in this case, a new concept_id cannot be found; this is shown as a warning.
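The ordered lookup described above can be illustrated with a small base R sketch. This is not the package's implementation: the relationship table is a toy data frame with made-up concept ids, findReplacementConceptId is a hypothetical helper, and the sketch simply takes the first match, whereas a real concept may have several “Maps to” targets.

```r
# Toy stand-in for the OMOP concept_relationship table (made-up ids)
conceptRelationship <- data.frame(
  concept_id_1 = c(101, 101, 102, 103),
  relationship_id = c("Concept replaced by", "Maps to", "Concept same_as to", "Is a"),
  concept_id_2 = c(201, 202, 203, 204),
  stringsAsFactors = FALSE
)

# Relationships to try, in the order given in the text
relationshipPriority <- c(
  "Maps to",
  "Concept replaced by",
  "Concept same_as to",
  "Concept poss_eq to"
)

# Hypothetical helper: return the first replacement found, or NA if none exists
findReplacementConceptId <- function(oldConceptId, relationships, priority) {
  for (relationshipId in priority) {
    hit <- relationships$concept_id_2[
      relationships$concept_id_1 == oldConceptId &
        relationships$relationship_id == relationshipId
    ]
    if (length(hit) > 0) {
      return(hit[1])
    }
  }
  NA_real_
}

replacements <- vapply(
  c(101, 102, 103),
  findReplacementConceptId,
  numeric(1),
  relationships = conceptRelationship,
  priority = relationshipPriority
)
print(replacements)
```

Concept 101 has both a “Maps to” and a “Concept replaced by” relationship, so the higher-priority “Maps to” target is chosen. Concept 103 only has an “Is a” relationship, so no replacement is found and the result is NA, which corresponds to the warning case above.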
The updated Usagi file can be validated again with the validateUsagiFile function to check whether any errors remain.
validationsSummaryWithErrors <- validateUsagiFile(
pathToUpdatedUsagiFile,
connection,
vocabularyDatabaseSchema,
pathToValidatedUsagiFile = pathToValidatedUsagiFile,
sourceConceptIdOffset = 2000500000
)
knitr::kable(validationsSummaryWithErrors)
type | step | message |
---|---|---|
SUCCESS | Missing default columns | |
SUCCESS | SourceCode is empty | |
SUCCESS | SourceCode and conceptId are not unique | |
SUCCESS | SourceName is empty | |
SUCCESS | SourceName is more than 255 characters | |
SUCCESS | SourceFrequency is not empty | |
SUCCESS | MappingStatus is empty | |
SUCCESS | MappingStatus is not valid | |
SUCCESS | APPROVED mappingStatus conceptId is 0 | |
SUCCESS | APPROVED mappingStatus with concepts outdated | |
SUCCESS | Not APPROVED mappingStatus with concepts outdated | |
SUCCESS | Missing C&CR columns | |
SUCCESS | SourceConceptId is empty | |
SUCCESS | SourceConceptId is not a number on the range | |
SUCCESS | SourceConceptClass is empty | |
SUCCESS | SourceConceptClass is more than 20 characters | |
SUCCESS | SourceDomain is empty | |
SUCCESS | SourceDomain is not a valid domain | |
SUCCESS | Not APPROVED mappingStatus with valid domain combination | |
ERROR | APPROVED mappingStatus with valid domain combination | Found 6 codes with invalid domain combinations |
SUCCESS | Missing date columns | |
SUCCESS | SourceValidStartDate is after SourceValidEndDate | |
SUCCESS | Missing parent columns | |
SUCCESS | Invalid parent concept code | |
Unfortunately, updateUsagiFile can sometimes introduce new errors. In this case, the updates in the vocabulary have introduced invalid domain combinations. Moreover, some of the mappings could not be updated because a new concept_id was not found. These issues need to be fixed by the user by reviewing the Usagi file.