-
Notifications
You must be signed in to change notification settings - Fork 457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Instructions for using sqlite DDLs for non-R users #693
Comments
Thanks for your script, saved me a lot of effort. I extended your script to replace tabs with spaces: #!/bin/bash
#
# File: update-ddl.sh
#
# Converts the OMOP-provided DDL database definition scripts into a format
# that can be loaded by sqlite3.
#
# This has been tested against the provided sqlite 5.4 DDL files.
#
# To run this, place copies of the provided .sql files into this directory
# and run the script. It will modify the sql files and create an empty
# database called cdm.db
#
set -e
set -o pipefail
set -u
# @thisismexp: replace \t with spaces
d="$(mktemp -d)"
for filename in *.sql; do
expand -t 4 "$filename" > "$d"/"$filename"
done
mv "$d"/* .
rmdir "$d"
# There should be only one DDL file in the local directory.
DDL=$( ls OMOPCDM*_ddl.sql )
[ ! -e ${DDL} ] && echo "Failed to find an OMOPCDM*_ddl.sql schema!" && exit 1
# Awk scriptlet to add a PRIMARY KEY declaration to a table.column in a table
# definition.
#
# Assumptions:
# 'table' is padded with spaces
# 'column' begins with a tab and ends with a space
#
AWK_PK='
/^CREATE TABLE / {
if ($0 ~ table) { target=1 } else { target=0 }
}
{
if (target && $0 ~ column && (!($0 ~ / PRIMARY KEY,/))) {
sub(/,/, " PRIMARY KEY,");
}
print $0
}
'
# Awk scriptlet to add a FOREIGN KEY declaration to a table.column in a table
# definition.
#
# Assumptions:
# 'table' is padded with spaces
# 'column' begins with a tab and ends with a space
# 'ref' is multiple words long
#
AWK_FK='
/^CREATE TABLE / {
if ($0 ~ table) { target=1 } else { target=0 }
}
{
if (target && $0 ~ column && (!($0 ~ / REFERENCES /))) {
if ($0 ~ /,$/) {
sub(/,$/, " REFERENCES " ref ",");
} else {
sub(/ );$/, " REFERENCES " ref ");");
}
}
print $0
}
'
##
# Gets the table names and PK columns from constraints.
#
function get_pks() {
grep '^ALTER TABLE ' OMOPCDM*_primary_keys.sql \
| sed -e 's:.* \(.*\) ADD CONSTRAINT .* PRIMARY KEY .\(.*\).;:\1 \2:'
}
##
# Decorate the PK columns with PRIMARY KEY if not already decorated.
#
function mark_pk() {
# Apply the change to the ddl file content, saving into a temp file.
cat ${DDL} | awk -v table=" ${1} " -v column=" ${2} " "${AWK_PK}" \
> ${DDL}.tmp
# diff the files to assert diff count is 1 changed line only.
(( $( diff -by --suppress-common-lines ${DDL} ${DDL}.tmp | wc -l ) == 1 )) || exit 1
# Apply the temp file to the DDL file.
mv ${DDL}.tmp ${DDL}
}
function get_fks() {
grep '^ALTER TABLE ' OMOPCDM*_constraints.sql \
| sed -e 's:ALTER TABLE \(.*\) ADD CONSTRAINT .* FOREIGN KEY .\([^ ]*\). REFERENCES:\1 \2:' \
| sed -e 's:;::'
}
##
# Decorate the PK columns with PRIMARY KEY if not already decorated.
#
function mark_fk() {
# Apply the change to the ddl file content, saving into a temp file.
cat ${DDL} \
| awk -v table=" ${1} " -v column=" ${2} " -v ref="${3}" "${AWK_FK}" \
> ${DDL}.tmp
# diff the files to assert diff count is 1 changed line only.
(( $( diff -by --suppress-common-lines ${DDL} ${DDL}.tmp | wc -l ) == 1 )) || exit 1
# Apply the temp file to the DDL file.
mv ${DDL}.tmp ${DDL}
}
# Remove the template schema reference, which isn't supported in sqlite.
sed -i -e 's:@cdmDatabaseSchema.::g' OMOPCDM*.sql && rm -f OMOPCDM*.sql-e
# Update the table definitions to insert the PRIMARY KEY declarations.
echo "Adding PRIMARY KEYS to DDL"
get_pks | while read table column; do
mark_pk ${table} ${column}
done
# Insert the FK constraints into the table.
echo "Adding FOREIGN KEYS to DDL"
get_fks | while read table column reference; do
mark_fk ${table} ${column} "${reference}"
done
# HACK: no PK in the cohort table, so this is needed to satisfy FK checks.
mark_pk cohort cohort_definition_id
# Populate the DB tables and indices.
echo "Creating empty cdm.db database"
rm -f cdm.db
touch cdm.db
sqlite3 cdm.db < ${DDL}
sqlite3 cdm.db < OMOPCDM*_indices.sql
sqlite3 cdm.db "pragma foreign_key_check"
# Success!
echo "DONE!"```
</details> |
This is at an early stage of development- but would be grateful for any feedback - but this might help efforts with SQLite. See https://github.com/wardle/iort - it can now dynamically generate and execute DDL and handle the vocab import and is working for SQLite and PostgreSQL. I need to add customisation to choose different databases in a similar way to schema in PostgreSQL for example, but might help? |
For those on this thread, or anyone who lands here. I'm working with a sqlite cdm database for a demo, which uses an experimental new Demo: https://github.com/barabo/fhir-to-omop-demo Both are still WIP, but this is what motivated me to update the DDL like this. |
Hi there!
I've been playing around trying to get the DDLs loaded for sqlite3, version 5.4 using the
sqlite3
command line and I've been having a pretty rough go of it using the provided definitions.I didn't find any documentation here or elsewhere about the
@cdmDatabaseSchema
notation in the ddl files, so I went ahead and removed them. I assume they are something used by the sqlrender utility - but I'm not an R user - I'm just trying to create a working OMOPCDM database in sqlite.I also discovered that sqlite does not allow you to apply primary or foreign keys to tables using the
ALTER TABLE
commands provided. So, I wrote a brittle little script to apply the PK and FK definitions to the main_ddl.sql
file.This allows me to use the patched
_ddl.sql
file to define tables with PKs and FKs. The_indices.sql
file seems fine as-is.I did notice that the
COHORT
table is missing aPRIMARY KEY
, so the foreign key check fails in a clean install for just that table. It almost seems likeCOHORT
should havecohort_id integer PRIMARY KEY
andCOHORT_DEFINITION
should havecohort_definition_id integer PRIMARY KEY
- and then theCOHORT
table should have a FK onCOHORT_DEFINITION.cohort_definition_id
(instead, it seems backwards). But as it is now, if I makeCOHORT.cohort_definition_id
the PK of theCOHORT
table - it gets me past the failing foreign key checks. Anyway, I expect some of that will change in future versions - my script includes a hack for this.For posterity, I'm providing my script here. However, are there instructions for people who don't want to use R but just want to create a working DB out there somewhere?
Click to see my script
This updates the
_ddl.sql
file to look like this:When the script runs correctly, you see output like this:
The text was updated successfully, but these errors were encountered: