Usagi file format

Usagi default columns

One row per sourceCode + conceptId combination.

| Name | Type | Description | Rules checked by validateUsagiFile |
| --- | --- | --- | --- |
| sourceCode | character | The source code to be mapped | not empty, unique sourceCode + conceptId combinations |
| sourceName | character | The name/description of the source code | not empty, less than 255 characters |
| sourceFrequency | integer | How frequently this code appears in the source data | not empty, set to -1 if not known |
| sourceAutoAssignedConceptIds | integer | Automatically assigned concept IDs | |
| matchScore | double | Score indicating quality of the mapping match (0-1) | |
| mappingStatus | character | Status of the mapping (APPROVED, UNCHECKED, etc.) | required, one of the following: APPROVED, UNCHECKED, FLAGGED, INEXACT |
| equivalence | character | Type of equivalence (EQUIVALENT, BROADER, etc.) | |
| statusSetBy | character | User who set the mapping status | |
| statusSetOn | double | Timestamp when status was set | |
| conceptId | integer | Target OMOP concept ID | not empty, if 0 the mappingStatus cannot be APPROVED |
| conceptName | character | Name of target OMOP concept | |
| domainId | character | Domain of target concept (Condition, Drug, etc.) | |
| mappingType | character | Type of mapping (MAPS_TO, etc.) | |
| comment | character | Comments about the mapping | |
| createdBy | character | User who created the mapping | |
| createdOn | double | Timestamp when mapping was created | |
| assignedReviewer | character | User assigned to review the mapping | |
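
As an illustration of the rules listed for the default columns, here is a minimal Python sketch of three of the checks. It is not the implementation of validateUsagiFile. The function name `check_usagi_rows` is hypothetical, and it assumes rows have already been read into dictionaries with `conceptId` parsed as an integer.

```python
from collections import Counter


def check_usagi_rows(rows):
    """Return a list of (row_index, message) tuples for rule violations.

    Hypothetical helper: `rows` is assumed to be a list of dicts keyed by
    the Usagi column names, with conceptId already converted to int.
    """
    problems = []
    # Count each sourceCode + conceptId pair so duplicates can be reported.
    pair_counts = Counter((r["sourceCode"], r["conceptId"]) for r in rows)
    for i, r in enumerate(rows):
        if not r["sourceCode"]:
            problems.append((i, "sourceCode is empty"))
        if pair_counts[(r["sourceCode"], r["conceptId"])] > 1:
            problems.append((i, "duplicated sourceCode + conceptId"))
        # sourceName must be present and shorter than 255 characters.
        if not r["sourceName"] or len(r["sourceName"]) >= 255:
            problems.append((i, "sourceName is empty or too long"))
        # A row mapped to concept 0 cannot carry the APPROVED status.
        if r["conceptId"] == 0 and r["mappingStatus"] == "APPROVED":
            problems.append((i, "conceptId 0 cannot be APPROVED"))
    return problems
```

Returning every violation with its row index, instead of stopping at the first one, is similar in spirit to collecting messages per row in a validation column.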

Usagi extended columns

The Usagi file is considered a C&CR file if it has the following columns: ADD_INFO:sourceConceptId, ADD_INFO:sourceConceptClass and ADD_INFO:sourceDomain.

The columns ADD_INFO:sourceValidStartDate and ADD_INFO:sourceValidEndDate are optional. If they are not included, the corresponding columns in the CONCEPT table are set to the default values 1900-01-01 and 2099-12-31, respectively.

The columns ADD_INFO:sourceParents and ADD_INFO:sourceParentVocabulary are optional. If included, they are used to populate the CONCEPT_RELATIONSHIP table with the ‘Is a’ and ‘Subsumes’ relationships.
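
The following Python sketch shows one way a list of parent codes could be expanded into relationship pairs. It assumes the usual OMOP direction, where the child ‘Is a’ parent and the parent ‘Subsumes’ the child. The function name `parent_relationships` is hypothetical, the delimiter is passed in as an argument rather than assumed, and the sketch works on codes, whereas the real CONCEPT_RELATIONSHIP table stores concept IDs.

```python
def parent_relationships(source_code, parents, delimiter):
    """Return (code_1, relationship_id, code_2) tuples for each parent code.

    Hypothetical helper: `parents` is the raw text of the sourceParents
    cell and `delimiter` is whatever separator the file uses.
    """
    relationships = []
    for parent in parents.split(delimiter):
        parent = parent.strip()
        if not parent:
            continue  # skip empty fragments such as a trailing delimiter
        # Each parent yields one pair of mirrored relationships.
        relationships.append((source_code, "Is a", parent))
        relationships.append((parent, "Subsumes", source_code))
    return relationships
```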

The ADD_INFO:validationMessages column is added by validateUsagiFile and contains the messages from the validation checks.

| Name | Type | Description | Rules |
| --- | --- | --- | --- |
| ADD_INFO:sourceConceptId | double | Source vocabulary concept ID | not empty, number in the range given by sourceConceptIdOffset, must be unique for each sourceCode |
| ADD_INFO:sourceConceptClass | character | Concept class in source vocabulary | not empty, less than 20 characters |
| ADD_INFO:sourceDomain | character | Domain in source vocabulary | not empty, value exists in the DOMAIN table; when the code maps to more than one concept, the combined domain must be valid |
| ADD_INFO:sourceValidStartDate | date | Start date of validity in source | if empty, the default value is 1900-01-01; the value must be before ADD_INFO:sourceValidEndDate |
| ADD_INFO:sourceValidEndDate | date | End date of validity in source | if empty, the default value is 2099-12-31; the value must be after ADD_INFO:sourceValidStartDate |
| ADD_INFO:sourceParents | character | Parent codes in source vocabulary | not empty; if more than one parent, separated by a delimiter; the combination of sourceParents and sourceParentVocabulary must exist in the CDM or in the Usagi file |
| ADD_INFO:sourceParentVocabulary | character | Vocabularies of parent codes | if empty, the code's own vocabulary is used; if more than one parent, separated by a delimiter |
| ADD_INFO:validationMessages | character | Column added by validateUsagiFile | optional |
| ADD_INFO:autoUpdatingInfo | character | Column added by updateUsagiFile | optional |
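
To make the rules for the extended columns concrete, here is a small Python sketch that detects the three required columns and applies the default validity dates. The names `has_extended_columns` and `resolve_validity` are hypothetical, and the sketch assumes dates are written in ISO format (YYYY-MM-DD).

```python
from datetime import date

# The three columns that mark a file as using the extended format.
REQUIRED_EXTENDED = {
    "ADD_INFO:sourceConceptId",
    "ADD_INFO:sourceConceptClass",
    "ADD_INFO:sourceDomain",
}
DEFAULT_START = date(1900, 1, 1)
DEFAULT_END = date(2099, 12, 31)


def has_extended_columns(column_names):
    """True when all three required extended columns are present."""
    return REQUIRED_EXTENDED.issubset(column_names)


def resolve_validity(start_text, end_text):
    """Apply the default dates to empty values and check the ordering."""
    start = date.fromisoformat(start_text) if start_text else DEFAULT_START
    end = date.fromisoformat(end_text) if end_text else DEFAULT_END
    if start >= end:
        raise ValueError("start date must be before end date")
    return start, end
```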

vocabularies.csv file format

The vocabularies.csv file describes the vocabularies to be processed. It is a CSV file with the following columns:

| Name | Type | Description | Rules |
| --- | --- | --- | --- |
| source_vocabulary_id | character | The id of the vocabulary | not empty, less than 20 characters |
| source_vocabulary_name | character | A description of the vocabulary | not empty, less than 255 characters |
| source_concept_id_offset | integer | The offset of the source concept id | not empty, number over 2 billion |
| path_to_usagi_file | character | The path to the vocabulary’s Usagi file | not empty, file must exist |
| path_to_news_file | character | The path to the vocabulary’s news file | not empty, file must exist |
| ignore | boolean | Indicates if the vocabulary should be ignored in processing | not empty |
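
The rules for a vocabularies.csv row can be sketched in Python as follows. The function name `check_vocabulary_row` is hypothetical, and the `file_exists` parameter is an assumption added so that the path check can be swapped out when the files are not on disk. Rows are expected as dictionaries of strings, for example from `csv.DictReader`.

```python
import os


def check_vocabulary_row(row, file_exists=os.path.isfile):
    """Return a list of messages for the rules a vocabularies.csv row breaks."""
    problems = []
    vocab_id = row["source_vocabulary_id"]
    if not vocab_id or len(vocab_id) >= 20:
        problems.append("source_vocabulary_id must be 1 to 19 characters")
    name = row["source_vocabulary_name"]
    if not name or len(name) >= 255:
        problems.append("source_vocabulary_name must be 1 to 254 characters")
    # The offset must parse as an integer above 2 billion.
    try:
        offset = int(row["source_concept_id_offset"])
    except ValueError:
        offset = None
    if offset is None or offset <= 2_000_000_000:
        problems.append("source_concept_id_offset must be over 2 billion")
    # Both paths must be filled in and point to existing files.
    for column in ("path_to_usagi_file", "path_to_news_file"):
        if not row[column] or not file_exists(row[column]):
            problems.append(f"{column} must point to an existing file")
    if not row["ignore"]:
        problems.append("ignore must not be empty")
    return problems
```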

SOURCE_TO_CONCEPT_MAP_EXTENDED table format

The SOURCE_TO_CONCEPT_MAP_EXTENDED table is an extension of the SOURCE_TO_CONCEPT_MAP table (see CDM). It stores the source-to-concept mappings together with additional information about the source concept.

The SOURCE_TO_CONCEPT_MAP_EXTENDED table has the following columns:

| Name | Type | Description | Rules |
| --- | --- | --- | --- |
| source_code | character | Source code for the concept | not empty |
| source_vocabulary_id | character | Source vocabulary the concept was mapped from | not empty, must exist in VOCABULARY table |
| source_code_description | character | Description of source code | not empty |
| target_concept_id | integer | Concept ID of the target concept | not empty, must exist in CONCEPT table |
| target_vocabulary_id | character | Target vocabulary the concept was mapped to | not empty, must exist in VOCABULARY table |
| valid_start_date | date | Date when mapping became valid | not empty, must be before valid_end_date |
| valid_end_date | date | Date when mapping became invalid | not empty, must be after valid_start_date |
| invalid_reason | character | Reason why mapping was invalidated | empty if valid_end_date is 2099-12-31 |
| source_concept_id | integer | Source concept ID | not empty |
| source_concept_class | character | Concept class in source vocabulary | not empty, less than 20 characters |
| source_domain | character | Domain in source vocabulary | not empty, must exist in DOMAIN table |
| source_parents_concept_ids | character | Parent concept IDs in source vocabulary | optional, comma-separated list |
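
The date and invalid_reason rules, and the comma-separated parent list, can be illustrated with a short Python sketch. Both function names are hypothetical, and rows are assumed to be dictionaries whose date columns are already `datetime.date` values. Checks that need the VOCABULARY, CONCEPT or DOMAIN tables are left out.

```python
from datetime import date


def check_mapping_row(row):
    """Return messages for the date-related rules a row breaks."""
    problems = []
    if row["valid_start_date"] >= row["valid_end_date"]:
        problems.append("valid_start_date must be before valid_end_date")
    # A mapping that is still valid must not carry an invalid_reason.
    if row["valid_end_date"] == date(2099, 12, 31) and row["invalid_reason"]:
        problems.append("invalid_reason must be empty when valid_end_date is 2099-12-31")
    return problems


def parse_parent_concept_ids(text):
    """Split the optional comma-separated list into integers."""
    if not text:
        return []
    return [int(part) for part in text.split(",") if part.strip()]
```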