Available Components
Forward experiments use three modular components to standardize access to genotypes, phenotypes and statistical testing. Some components are built into Forward and are compatible with common data formats (described here).
You can also write and use your own implementations by following the instructions in the Extending Forward section.
Tasks
class | parameters | variant type | outcome type | reference |
---|---|---|---|---|
forward.tasks.LinearTest | | common (MAF > 0.05) | continuous | |
forward.tasks.LogisticTest | | common (MAF > 0.05) | discrete | |
forward.tasks.SKATTest | | Sets of variants. Can test rare or common. | discrete or continuous | website |
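To make the "continuous outcome" row concrete, here is a minimal sketch of the kind of model a linear test fits: an ordinary least squares regression of the phenotype on the genotype. This is not Forward's implementation; the function name `simple_linear_regression`, the additive 0/1/2 genotype coding and the absence of covariates are all assumptions made for illustration.

```python
def simple_linear_regression(genotypes, phenotype):
    """Fit phenotype = intercept + beta * genotype by ordinary least squares.

    Hypothetical helper for illustration only (not part of Forward).
    Genotypes are assumed to be coded as 0, 1 or 2 copies of an allele.
    """
    n = len(genotypes)
    if n != len(phenotype) or n < 3:
        raise ValueError("need matching vectors with at least 3 samples")

    mean_x = sum(genotypes) / n
    mean_y = sum(phenotype) / n

    # Sum of squares of the genotype and cross-product with the phenotype.
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    if sxx == 0:
        raise ValueError("genotype vector has no variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotype))

    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    return beta, intercept


genotypes = [0, 0, 1, 1, 2, 2]
phenotype = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
beta, intercept = simple_linear_regression(genotypes, phenotype)
print("effect per allele copy:", round(beta, 3))
```

A real association test would also report a standard error and p-value for `beta` and would usually adjust for covariates; those parts are left out here to keep the sketch short.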
Genotype containers
class | parameters | file type | Notes |
---|---|---|---|
forward.genotype.MemoryImpute2Geno | | Small IMPUTE2 files | This container loads the genotype file in memory. It is fast, but not suitable for large files. IMPUTE2 file parsing is done using gepyto. |
forward.genotype.PlinkGenotypeDatabase | | Binary PLINK files (bed, bim, fam) | This container uses pyplink to parse the binary PLINK files. |
Phenotype containers
class | parameters | file type | Notes |
---|---|---|---|
CSVPhenotypeDatabase | | Delimited files (e.g. CSV, TSV) | This is an implementation of forward.phenotype.db.PandasPhenotypeDatabase. Most of the parameters are passed to the pandas parser; refer to the pandas documentation for more information. |
ExcelPhenotypeDatabase | | Excel files | This is an implementation of forward.phenotype.db.PandasPhenotypeDatabase. |
Python documentation
Tasks
This module provides the actual implementations of the genetic tests.
Genotype containers
class forward.genotype.MemoryImpute2Geno(filename, samples, filter_probability=0, **kwargs)
Container for small(ish) IMPUTE2 files.
Parameters:
- filename (str) – The filename for the IMPUTE2 file.
- samples – A list containing a single column and no header. The rows are the ordered sample IDs.
- filter_probability (float) – A cutoff for imputation probability. Only genotypes with an imputation probability above this threshold will be used for the analysis.
Warning
This implementation loads the whole file in memory (hence the name). Make sure that you have enough RAM to hold everything.
It would be fairly easy to subclass this container to read genotype data lazily from disk. Feel free to contribute this feature if you need it.
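The effect of a probability cutoff such as filter_probability can be sketched as follows. This is an illustration and not the code used by MemoryImpute2Geno (which relies on gepyto for parsing). It assumes the standard IMPUTE2 layout of five leading columns followed by three probabilities (AA, AB, BB) per sample. The helper name `call_genotypes`, and the choice to code calls as the number of copies of the second allele with None for filtered genotypes, are hypothetical.

```python
def call_genotypes(line, threshold=0.0):
    """Turn one IMPUTE2 line into hard-called genotypes.

    Hypothetical helper for illustration only. Returns (variant_name, calls),
    where each call is 0, 1 or 2 (copies of the second allele, an assumed
    coding) or None when no probability is above the threshold.
    """
    fields = line.split()
    # Leading columns: SNP ID, rsID, position, allele A, allele B.
    name = fields[1]
    probs = [float(p) for p in fields[5:]]
    if len(probs) % 3 != 0:
        raise ValueError("expected three probabilities per sample")

    calls = []
    for i in range(0, len(probs), 3):
        triple = probs[i:i + 3]  # P(AA), P(AB), P(BB)
        best = max(triple)
        # Keep the most likely genotype only if it clears the cutoff.
        calls.append(triple.index(best) if best > threshold else None)
    return name, calls


line = "chr1 rs123 1000 A G 0.98 0.02 0 0.5 0.4 0.1 0 0.05 0.95"
name, calls = call_genotypes(line, threshold=0.9)
print(name, calls)  # rs123 [0, None, 2]
```

The second sample is set to missing because its best probability (0.5) does not exceed the cutoff of 0.9.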
exclude_samples(samples_list)
Exclude samples in the list.
Parameters: samples_list (list) – A list of samples to exclude. The returned genotype vectors will not have genotypes for excluded samples (i.e. they will be n elements shorter, n = len(samples_list)).
This is a configuration option.
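The length relationship described above can be illustrated with a small standalone sketch. The function `drop_excluded` is hypothetical and is not how the container stores its data; it only shows that removing n known samples shortens each genotype vector by n.

```python
def drop_excluded(samples, genotypes, excluded):
    """Return the samples and genotypes that remain after exclusion.

    Hypothetical helper for illustration only. `samples` and `genotypes`
    are parallel lists: genotypes[i] belongs to samples[i].
    """
    excluded = set(excluded)
    kept = [(s, g) for s, g in zip(samples, genotypes) if s not in excluded]
    kept_samples = [s for s, _ in kept]
    kept_genotypes = [g for _, g in kept]
    return kept_samples, kept_genotypes


samples = ["s1", "s2", "s3", "s4"]
genotypes = [0, 1, 2, 1]
to_exclude = ["s2", "s4"]

kept_samples, kept_genotypes = drop_excluded(samples, genotypes, to_exclude)
print(kept_samples, kept_genotypes)  # ['s1', 's3'] [0, 2]
```

Note that the vector is exactly n elements shorter only when every excluded ID is actually present in the sample list.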
experiment_init(experiment, batch_insert_n=100000)
Experiment-specific initialization.
This takes care of initializing the database and filtering variants. It is automatically called by the Experiment.
filter_completion(rate)
Apply a filter on completion rate.
Parameters: rate (float) – The minimum completion rate for inclusion.
This is a configuration option.
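As a rough sketch of what a completion-rate filter computes, the snippet below keeps variants whose fraction of called genotypes reaches a minimum. It is not Forward's code: the helper names, the use of None for missing genotypes and the inclusive comparison (>=) are assumptions.

```python
def completion_rate(genotypes):
    """Fraction of samples with a called (non-missing) genotype."""
    if not genotypes:
        return 0.0
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def filter_by_completion(variants, min_rate):
    """Keep variants whose completion rate is at least min_rate.

    Hypothetical helper; `variants` maps a variant name to its genotypes.
    """
    return {
        name: genotypes
        for name, genotypes in variants.items()
        if completion_rate(genotypes) >= min_rate
    }


variants = {
    "rs1": [0, 1, 2, 1],        # 4 of 4 called
    "rs2": [0, None, None, 1],  # 2 of 4 called
    "rs3": [None, 1, 2, 2],     # 3 of 4 called
}
kept = filter_by_completion(variants, min_rate=0.75)
print(sorted(kept))  # ['rs1', 'rs3']
```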
filter_maf(maf)
Apply a filter on minor allele frequency.
Parameters: maf (float) – The minimum MAF for inclusion.
This is a configuration option.
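The quantity behind a minor allele frequency filter can be sketched as below. This is an illustration rather than Forward's implementation; the function name, the 0/1/2 allele-count coding with None for missing genotypes and the inclusive threshold are assumptions.

```python
def minor_allele_frequency(genotypes):
    """Compute the MAF from allele counts (0, 1 or 2) per diploid sample.

    Hypothetical helper for illustration only. Missing genotypes (None)
    are ignored.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    # Each sample carries two alleles.
    freq = sum(called) / (2 * len(called))
    # The minor allele is whichever allele is less frequent.
    return min(freq, 1 - freq)


variants = {
    "rs1": [0, 0, 1, 0],     # 1 of 8 alleles
    "rs2": [0, 0, 0, 0],     # monomorphic
    "rs3": [2, 2, 1, None],  # counted allele is the major one
}
min_maf = 0.05
kept = [name for name, g in variants.items()
        if minor_allele_frequency(g) >= min_maf]
print(kept)  # ['rs1', 'rs3']
```

Taking the minimum of the frequency and its complement means the result does not depend on which allele was counted.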
class forward.genotype.PlinkGenotypeDatabase(prefix, **kwargs)
Container for binary PLINK files.
Parameters: prefix (str) – The prefix of the PLINK bed, bim and fam files.
This container relies on pyplink.
experiment_init(experiment)
Initialization method called by the Experiment.
Applies filtering, then creates and fills the database.
filter_completion(rate)
Filters variants by completion rate (rate of no-calls).
This is a configuration option.
Phenotype containers
This module formalizes the phenotype structure expected by Forward. Its role is to provide a reusable interface to feed phenotype (and covariate) data to the statistical engine.