Forward’s database

Relational databases are automatically built to hold the analysis results of every experiment. This is done using SQLAlchemy, an Object Relational Mapper (ORM) for Python. SQLAlchemy makes it easy to use any of its many supported RDBMS backends while keeping the code technology-agnostic. It also maps database rows to Python objects, which makes the integration seamless and simplifies common tasks such as storing and querying results.
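The sketch below shows this backend-agnostic workflow. It maps a hypothetical two-column `Record` class, which is not one of Forward's tables, and runs it against an in-memory SQLite database. It assumes SQLAlchemy 1.4 or later. Only the URL passed to `create_engine` should need to change to target another backend.

```python
# Minimal sketch: the same ORM code can target any backend SQLAlchemy
# supports; only the database URL changes. `Record` is a hypothetical
# stand-in, not one of Forward's actual models.
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Record(Base):
    """Hypothetical two-column table used only for illustration."""

    __tablename__ = "records"
    pk = Column(Integer, primary_key=True)
    name = Column(String(25))


def open_session(url: str) -> Session:
    """Create the tables (if needed) on whatever backend `url` points to."""
    engine = create_engine(url)
    Base.metadata.create_all(engine)
    return Session(engine)


# "sqlite://" is an in-memory SQLite database. A PostgreSQL or MySQL URL
# (e.g. "postgresql://user:password@host/dbname") should work unchanged,
# provided the matching DBAPI driver is installed.
with open_session("sqlite://") as session:
    session.add(Record(name="rs12356"))
    session.commit()
    # Rows come back as Record instances, not as plain tuples.
    rec = session.execute(select(Record)).scalars().one()
    print(type(rec).__name__, rec.name)  # Record rs12356
```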

This section describes the different Python classes that will get translated to database tables.

Variant table

class forward.genotype.Variant(**kwargs)

ORM Variant object.

Column         Description                                       Type
-------------  ------------------------------------------------  ----------
name           The variant’s name (e.g. rs12356)                 String(25)
chrom          The chromosome (e.g. ‘3’ or ‘X’)                  String(15)
pos            The variant’s position on the chromosome          Integer
mac            Minor allele count (e.g. 125)                     Integer
minor          Least common allele (e.g. ‘T’)                    String(10)
major          Most common allele (e.g. ‘C’)                     String(10)
n_missing      Number of missing genotypes for this variant      Integer
n_non_missing  Number of non-missing genotypes for this variant  Integer
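A declarative mapping for these columns might look like the following sketch. It is not Forward's actual source. The table name `variants` and the use of `name` as the primary key are assumptions, and the two rows contain made-up values.

```python
# Sketch of an ORM class carrying the documented Variant columns.
# Assumptions: table name "variants" and `name` as primary key.
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Variant(Base):
    """Sketch of a table with the documented Variant columns."""

    __tablename__ = "variants"  # hypothetical table name
    name = Column(String(25), primary_key=True)  # e.g. "rs12356"
    chrom = Column(String(15))  # e.g. "3" or "X"
    pos = Column(Integer)
    mac = Column(Integer)  # minor allele count
    minor = Column(String(10))  # least common allele
    major = Column(String(10))  # most common allele
    n_missing = Column(Integer)
    n_non_missing = Column(Integer)


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Made-up values for illustration only.
    session.add_all([
        Variant(name="rs12356", chrom="3", pos=1234567, mac=125,
                minor="T", major="C", n_missing=10, n_non_missing=990),
        Variant(name="var_example", chrom="X", pos=42, mac=3,
                minor="A", major="G", n_missing=0, n_non_missing=1000),
    ])
    session.commit()

    # Mapped attributes double as SQL expressions in queries.
    on_chrom3 = session.execute(
        select(Variant).where(Variant.chrom == "3")
    ).scalars().all()
    names = [v.name for v in on_chrom3]
    print(names)  # ['rs12356']
```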

Computed fields:

  • maf (minor allele frequency): mac / (2 \(\cdot\) n_non_missing)
  • completion_rate: n_non_missing / (n_non_missing + n_missing)
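The two computed fields reduce to the arithmetic below. This plain-Python sketch uses a hypothetical `VariantCounts` class rather than the ORM model. It assumes diploid genotypes, which is where the factor of 2 in the maf formula comes from. The counts are made up.

```python
from dataclasses import dataclass


@dataclass
class VariantCounts:
    """Hypothetical stand-in holding only the counts the fields need."""

    mac: int
    n_missing: int
    n_non_missing: int

    @property
    def maf(self) -> float:
        # Each non-missing diploid genotype contributes two alleles.
        return self.mac / (2 * self.n_non_missing)

    @property
    def completion_rate(self) -> float:
        # Share of samples with a called genotype.
        return self.n_non_missing / (self.n_non_missing + self.n_missing)


v = VariantCounts(mac=125, n_missing=10, n_non_missing=990)
print(round(v.maf, 4))    # 0.0631  (125 / 1980)
print(v.completion_rate)  # 0.99    (990 / 1000)
```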

Results tables

class forward.experiment.ExperimentResult(**kwargs)

SQLAlchemy class to handle experimental results.

Column                   Description                                             Type
-----------------------  ----------------------------------------------------  ----------
pk                       The primary key (autoincrement)                         Integer
tested_entity            Either ‘variant’ or ‘snp-set’, depending on the test  Enum
results_type             Polymorphic identity (default: ‘GenericResults’)      String(25)
task_name                Name of the task                                      String(25)
entity_name              Variant or SNP-set identifier                         String(25)
phenotype                Tested outcome                                        String(30)
significance             Significance value (e.g. p-value)                     Float
coefficient              Effect coefficient (e.g. beta)                        Float
test_statistic           Test statistic (e.g. t statistic)                     Float
standard_error           Standard error of the coefficient                     Float
confidence_interval_min  Lower bound of the 95% CI on the coefficient          Float
confidence_interval_max  Upper bound of the 95% CI on the coefficient          Float
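The sketch below is one way to map these columns, with `results_type` acting as SQLAlchemy's polymorphic discriminator. It is not Forward's source. The table name, the task and phenotype strings, the 5e-8 threshold and the result values are all hypothetical.

```python
from sqlalchemy import Column, Enum, Float, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class ExperimentResult(Base):
    """Sketch of the generic results table; the table name is hypothetical."""

    __tablename__ = "results"
    pk = Column(Integer, primary_key=True, autoincrement=True)
    tested_entity = Column(Enum("variant", "snp-set", name="tested_entity"))
    results_type = Column(String(25))
    task_name = Column(String(25))
    entity_name = Column(String(25))
    phenotype = Column(String(30))
    significance = Column(Float)
    coefficient = Column(Float)
    test_statistic = Column(Float)
    standard_error = Column(Float)
    confidence_interval_min = Column(Float)
    confidence_interval_max = Column(Float)

    # results_type is the discriminator: rows created from this base
    # class are tagged "GenericResults" automatically.
    __mapper_args__ = {
        "polymorphic_on": results_type,
        "polymorphic_identity": "GenericResults",
    }


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Made-up results for two variants and one hypothetical outcome.
    session.add_all([
        ExperimentResult(tested_entity="variant", task_name="linreg",
                         entity_name="rs12356", phenotype="height",
                         significance=3e-9, coefficient=0.42),
        ExperimentResult(tested_entity="variant", task_name="linreg",
                         entity_name="var_example", phenotype="height",
                         significance=0.2, coefficient=0.01),
    ])
    session.commit()

    # Entities below a hypothetical significance threshold, most significant first.
    hits = [
        tuple(row)
        for row in session.execute(
            select(ExperimentResult.entity_name, ExperimentResult.results_type)
            .where(ExperimentResult.significance < 5e-8)
            .order_by(ExperimentResult.significance)
        )
    ]
    print(hits)  # [('rs12356', 'GenericResults')]
```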

Todo

This could potentially be refactored so that tested_entity is a foreign key to the forward.genotype.Variant class. This would require using concrete table inheritance.

class forward.tasks.LinearTestResults(**kwargs)

Table holding additional statistics reported for linear regression tests.

Column              Description                                          Type
------------------  ---------------------------------------------------  -------
pk                  Primary key, same as the parent ExperimentResult pk  Integer
adjusted_r_squared  Adjusted R squared, as reported by statsmodels       Float
std_beta            Standardized beta (x and y scaled to mean 0, SD 1)   Float
std_beta_min        Lower bound of the 95% CI on the standardized beta   Float
std_beta_max        Upper bound of the 95% CI on the standardized beta   Float
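A shared primary key like this is what SQLAlchemy calls joined-table inheritance. The sketch below assumes Forward uses that pattern. The parent is trimmed to a few columns, and the table names, the subclass's polymorphic identity string and the stored values are hypothetical.

```python
from sqlalchemy import Column, Float, ForeignKey, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class ExperimentResult(Base):
    """Trimmed-down parent table (most columns omitted for brevity)."""

    __tablename__ = "results"  # hypothetical table name
    pk = Column(Integer, primary_key=True)
    results_type = Column(String(25))
    entity_name = Column(String(25))
    coefficient = Column(Float)

    __mapper_args__ = {
        "polymorphic_on": results_type,
        "polymorphic_identity": "GenericResults",
    }


class LinearTestResults(ExperimentResult):
    """Child table: its pk is also a foreign key to the parent row."""

    __tablename__ = "linear_test_results"  # hypothetical table name
    pk = Column(Integer, ForeignKey("results.pk"), primary_key=True)
    adjusted_r_squared = Column(Float)
    std_beta = Column(Float)
    std_beta_min = Column(Float)
    std_beta_max = Column(Float)

    # Hypothetical identity string stored in results.results_type.
    __mapper_args__ = {"polymorphic_identity": "LinearTestResults"}


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # One made-up result; SQLAlchemy writes a row to each table,
    # both rows sharing the same pk.
    session.add(LinearTestResults(entity_name="rs12356", coefficient=0.42,
                                  adjusted_r_squared=0.05, std_beta=0.23,
                                  std_beta_min=0.15, std_beta_max=0.31))
    session.commit()

    # Querying the parent class returns an instance of the subclass.
    res = session.execute(select(ExperimentResult)).scalars().one()
    kind = type(res).__name__
    adj_r2 = res.adjusted_r_squared  # loaded from the child table on access
    print(kind, adj_r2)  # LinearTestResults 0.05
```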

Reporting the standardized beta makes it easy to compare effect sizes across outcomes measured in different units. We also report the coefficient of determination (\(R^2\)), the fraction of the outcome's variance explained by the model. The table stores its adjusted version, which accounts for the number of predictors.
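For a single predictor, there are two ways to get the standardized beta. You can refit the model on standardized variables, or you can rescale the raw slope by \(s_x / s_y\). The plain-Python sketch below checks that both give the same result. It also computes the adjusted \(R^2\) as \(1 - (1 - R^2)(n - 1)/(n - p - 1)\), which for a model with an intercept should match statsmodels' `rsquared_adj`. This is an illustration on made-up data, not Forward's code, and the helper names are hypothetical.

```python
from statistics import mean, stdev


def ols_slope(x, y):
    """Least-squares slope of y on x (simple regression with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardize(v):
    """Rescale to mean 0 and sample standard deviation 1."""
    m, s = mean(v), stdev(v)
    return [(a - m) / s for a in v]


def r_squared(x, y):
    """Fraction of the variance of y explained by the fitted line."""
    beta = ols_slope(x, y)
    intercept = mean(y) - beta * mean(x)
    my = mean(y)
    ss_res = sum((b - (intercept + beta * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalize R^2 for the number of predictors p (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Made-up data: exposure x and an outcome y in arbitrary units.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta = ols_slope(x, y)
std_beta = ols_slope(standardize(x), standardize(y))
r2 = r_squared(x, y)
adj_r2 = adjusted_r_squared(r2, n=len(x), p=1)

# Refitting on standardized data equals rescaling the raw slope.
print(abs(std_beta - beta * stdev(x) / stdev(y)) < 1e-12)  # True
print(beta, std_beta, r2, adj_r2)
```

With a single predictor, the standardized beta is Pearson's correlation coefficient, so its square equals \(R^2\).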