Tables description and rules
filesFormat.Rmd
Usagi file format
Usagi default columns
One row per sourceCode + conceptId combination.
Name | Type | Description | Rules checked by validateUsagiFile |
---|---|---|---|
sourceCode | character | The source code to be mapped | not empty, unique sourceCode + conceptId combinations |
sourceName | character | The name/description of the source code | not empty, less than 255 characters |
sourceFrequency | integer | How frequently this code appears in the source data | not empty, set to -1 if not known |
sourceAutoAssignedConceptIds | integer | Automatically assigned concept IDs | |
matchScore | double | Score indicating quality of the mapping match (0-1) | |
mappingStatus | character | Status of the mapping (APPROVED, UNCHECKED, etc.) | Required, one of the following: APPROVED, UNCHECKED, FLAGED, INEXACT |
equivalence | character | Type of equivalence (EQUIVALENT, BROADER, etc.) | |
statusSetBy | character | User who set the mapping status | |
statusSetOn | double | Timestamp when status was set | |
conceptId | integer | Target OMOP concept ID | not empty, if 0 the mappingStatus cannot be APPROVED |
conceptName | character | Name of target OMOP concept | |
domainId | character | Domain of target concept (Condition, Drug, etc.) | |
mappingType | character | Type of mapping (MAPS_TO, etc.) | |
comment | character | Comments about the mapping | |
createdBy | character | User who created the mapping | |
createdOn | double | Timestamp when mapping was created | |
assignedReviewer | character | User assigned to review the mapping |
Usagi extended columns
The Usagi file is considered a C&CR file if it has the following columns: ADD_INFO:sourceConceptId, ADD_INFO:sourceConceptClass and ADD_INFO:sourceDomain.
The pair ADD_INFO:sourceValidStartDate and ADD_INFO:sourceValidEndDate are optional. If not included, the respective columns in the CONCEPT table will be set to the default values, which are 1900-01-01 and 2099-12-31.
The pair ADD_INFO:sourceParents and ADD_INFO:sourceParentVocabulary are optional. If included, they will be use to populate the CONCEPT_RELATIONSHIP table with the ‘Is a’ and ‘Subsumes’ relationships.
The ADD_INFO:validationMessages column is added by validateUsagiFile and contains the messages from the validation checks.
Name | Type | Description | Rules |
---|---|---|---|
ADD_INFO:sourceConceptId | double | Source vocabulary concept ID | not empty, number on the range given by sourceConceptIdOffset, must be unique per each sourceCode |
ADD_INFO:sourceConceptClass | character | Concept class in source vocabulary | not empty, less than 20 characters |
ADD_INFO:sourceDomain | character | Domain in source vocabulary | not empty, value exist in the DOMAIN table, when the code maps to more than one concept the combined domain is valid |
ADD_INFO:sourceValidStartDate | date | Start date of validity in source | if empty, the default value is 1900-01-01, the value must be before ADD_INFO:sourceValidEndDate |
ADD_INFO:sourceValidEndDate | date | End date of validity in source | if empty, the default value is 2099-12-31, the value must be after ADD_INFO:sourceValidStartDate |
ADD_INFO:sourceParents | character | Parent codes in source vocabulary | not empty, if more that one parent, separated by, combination of sourceParents and sourceParentVocabulary must exits in the CDM or in the usagi file |
ADD_INFO:sourceParentVocabulary | character | Vocabularies of parent codes | if empty, the vocabulary is itself, if more that one parent, separated by |
ADD_INFO:validationMessages | character | Column added by validateUsagiFile | Optional |
ADD_INFO:autoUpdatingInfo | character | Column added by updateUsagiFile | Optional |
vocabularies.csv file format
The vocabularies.csv
file is used to describe the
vocabularies to be processed. It is a csv file with the following
columns:
Name | Type | Description | Rules |
---|---|---|---|
source_vocabulary_id | character | The id of the vocabulary | not empty, less than 20 characters |
source_vocabulary_name | character | A description of the vocabulary | not empty, less than 255 characters |
source_concept_id_offset | integer | The offset of the source concept id | not empty, number over 2 billion |
path_to_usagi_file | character | The path to the vocabulary’s Usagi file | not empty, file must exist |
path_to_news_file | character | The path to the vocabulary’s news file | not empty, file must exist |
ignore | boolean | Indicates if the vocabulary should be ignored in processing | not empty |
SOURCE_TO_CONCEPT_MAP_EXTENDED table format
The SOURCE_TO_CONCEPT_MAP_EXTENDED is an extension of the SOURCE_TO_CONCEPT_MAP table, see CDM. It is used to store the source to concept map extended information.
The SOURCE_TO_CONCEPT_MAP_EXTENDED table has the following columns:
Name | Type | Description | Rules |
---|---|---|---|
source_code | character | Source code for the concept | not empty |
source_vocabulary_id | character | Source vocabulary the concept was mapped from | not empty, must exist in VOCABULARY table |
source_code_description | character | Description of source code | not empty |
target_concept_id | integer | Concept ID of the target concept | not empty, must exist in CONCEPT table |
target_vocabulary_id | character | Target vocabulary the concept was mapped to | not empty, must exist in VOCABULARY table |
valid_start_date | date | Date when mapping became valid | not empty, must be before valid_end_date |
valid_end_date | date | Date when mapping became invalid | not empty, must be after valid_start_date |
invalid_reason | character | Reason why mapping was invalidated | empty if valid_end_date is 2099-12-31 |
source_concept_id | integer | Source concept ID | not empty |
source_concept_class | character | Concept class in source vocabulary | not empty, less than 20 characters |
source_domain | character | Domain in source vocabulary | not empty, must exist in DOMAIN table |
source_parents_concept_ids | character | Parent concept IDs in source vocabulary | optional, comma-separated list |