Forward’s database
Relational databases are automatically built to hold the analysis results of every experiment. This is done using SQLAlchemy, an Object Relational Mapper (ORM) for Python. This package makes it easy to use any of its many supported RDBMS backends while keeping the code technology-agnostic. It also maps database entries to Python objects, which makes the integration seamless and simplifies some technical aspects.
This section describes the Python classes that are translated to database tables.
Variant table
class forward.genotype.Variant(**kwargs)
ORM Variant object.
=============  ================================================  ==========
Column         Description                                       Type
=============  ================================================  ==========
name           The variant’s name (e.g. rs12356)                 String(25)
chrom          The chromosome (e.g. ‘3’ or ‘X’)                  String(15)
pos            The variant’s position on the chromosome          Integer
mac            Minor allele count (e.g. 125)                     Integer
minor          Least common allele (e.g. ‘T’)                    String(10)
major          Most common allele (e.g. ‘C’)                     String(10)
n_missing      Number of missing genotypes for this variant      Integer
n_non_missing  Number of non-missing genotypes for this variant  Integer
=============  ================================================  ==========

Computed fields:

maf: mac / (2 \(\cdot\) n_non_missing)
completion_rate: n_non_missing / (n_non_missing + n_missing)
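The two computed fields follow directly from the stored counts. The sketch below is a plain-Python illustration rather than Forward's own implementation: the `VariantCounts` class and its methods are hypothetical, only the column names come from the column list above, and the factor of 2 assumes two alleles per genotyped sample.

```python
from dataclasses import dataclass


@dataclass
class VariantCounts:
    """Hypothetical stand-in holding the count columns of a variant."""

    mac: int
    n_missing: int
    n_non_missing: int

    def maf(self) -> float:
        # Each non-missing sample contributes two alleles (assumption).
        return self.mac / (2 * self.n_non_missing)

    def completion_rate(self) -> float:
        # Fraction of samples with a non-missing genotype.
        return self.n_non_missing / (self.n_non_missing + self.n_missing)


counts = VariantCounts(mac=125, n_missing=10, n_non_missing=990)
print(counts.maf())
print(counts.completion_rate())
```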
Results tables
class forward.experiment.ExperimentResult(**kwargs)
SQLAlchemy class to handle experimental results.
=======================  ====================================================================  ==========
Column                   Description                                                           Type
=======================  ====================================================================  ==========
pk                       The primary key (autoincrement)                                       Integer
tested_entity            Either ‘variant’ or ‘snp-set’ depending on the test.                  Enum
results_type             Polymorphic identity of the result class (default: ‘GenericResults’)  String(25)
task_name                Name of the task                                                      String(25)
entity_name              Variant or SNP set identifier                                         String(25)
phenotype                Tested outcome                                                        String(30)
significance             Significance value (e.g. p-value)                                     Float
coefficient              Effect coefficient (e.g. beta)                                        Float
test_statistic           Test statistic (e.g. t statistic)                                     Float
standard_error           Standard error of the coefficient                                     Float
confidence_interval_min  Lower bound of the 95% CI on the coefficient                          Float
confidence_interval_max  Upper bound of the 95% CI on the coefficient                          Float
=======================  ====================================================================  ==========

Todo
This could potentially be refactored so that tested_entity is a foreign key to the forward.genotype.Variant class. This would require using concrete table inheritance.
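The `confidence_interval_min` and `confidence_interval_max` columns bracket the `coefficient` column. As a rough illustration only, and assuming a normal approximation (the actual tasks may instead store the bounds reported by their statistics backend, for instance based on a t distribution), a 95% interval can be derived from `coefficient` and `standard_error`. The function name `normal_ci` is hypothetical.

```python
from statistics import NormalDist


def normal_ci(coefficient, standard_error, level=0.95):
    """Two-sided confidence interval under a normal approximation."""
    # Quantile of the standard normal; about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return coefficient - z * standard_error, coefficient + z * standard_error


ci_min, ci_max = normal_ci(coefficient=0.5, standard_error=0.1)
print(ci_min, ci_max)
```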
class forward.tasks.LinearTestResults(**kwargs)
Table for extra statistical reporting for linear regression.
==================  ========================================================================  =======
Column              Description                                                               Type
==================  ========================================================================  =======
pk                  The primary key, the same as that of experiment.ExperimentResult          Integer
adjusted_r_squared  The adjusted R squared as reported by statsmodels                         Float
std_beta            The standardized effect size. This is for \(x, y \sim \mathcal{N}(0,1)\)  Float
std_beta_min        Lower bound of the 95% CI for the standardized beta                       Float
std_beta_max        Upper bound of the 95% CI for the standardized beta                       Float
==================  ========================================================================  =======

Reporting the standardized beta makes it easy to compare effect sizes between outcomes that have different units. The coefficient of determination (\(R^2\)), which gives the fraction of explained variance, is also reported.
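To make the unit-independence of the standardized beta concrete, here is a minimal sketch for the single-predictor case without covariates. It does not reproduce how `forward.tasks` computes `std_beta`; the helper names and the toy data are hypothetical. It shows that fitting on z-scored variables gives the same value as rescaling the raw slope by the ratio of standard deviations.

```python
from statistics import mean, stdev


def ols_slope(x, y):
    """Slope of the least-squares line y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def zscore(values):
    """Center to mean 0 and scale to unit standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Toy data: x could be a genotype dosage, y an outcome in arbitrary units.
x = [0, 0, 1, 1, 2, 2, 1, 0]
y = [10.1, 9.8, 12.2, 11.9, 14.3, 13.8, 12.5, 10.4]

beta = ols_slope(x, y)                      # depends on the units of y
std_beta = ols_slope(zscore(x), zscore(y))  # unit-free

# Rescaling the raw slope gives the same standardized value.
rescaled = beta * stdev(x) / stdev(y)
print(beta, std_beta, rescaled)
```

Changing the units of `y` (for example multiplying every value by 1000) changes `beta` by the same factor but leaves `std_beta` unchanged, which is what makes it comparable across outcomes.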