Available Components¶

Forward experiments use three modular components to standardize access to genotypes, phenotypes and statistical testing. Some components ship with Forward and are compatible with common data formats; they are described here.

You can also write and use your own implementations by following the instructions in the Extending Forward section.

Tasks¶

class	parameters	variant type	outcome type	reference
`forward.tasks.LinearTest`	outcomes covariates variants correction alpha	common (MAF > 0.05)	continuous
`forward.tasks.LogisticTest`	outcomes covariates variants correction alpha	common (MAF > 0.05)	discrete
`forward.tasks.SKATTest`	outcomes covariates variants correction alpha snp_set_file	Sets of variants. Can test rare or common.	discrete or continuous	website

Genotype containers¶

class	parameters	file type	Notes
`forward.genotype.MemoryImpute2Geno`	filter_name filter_maf filter_completion filename samples filter_probability	Small IMPUTE2 files	This container loads the genotype file in memory. It is fast, but not suitable for large files. IMPUTE2 file parsing is done using gepyto.
`forward.genotype.PlinkGenotypeDatabase`	prefix filter_maf filter_completion	Binary PLINK files (bed, bim, fam)	This container uses pyplink to parse the binary PLINK files.

Phenotype containers¶

class	parameters	file type	Notes
`CSVPhenotypeDatabase`	filename sample_column sep compression header skiprows names na_values decimal exclude_correlated	delimited files (e.g. CSV, TSV)	This is an implementation of `forward.phenotype.db.PandasPhenotypeDatabase`. Most of the parameters are passed to the Pandas parser. You can refer to their docs for more information.
`ExcelPhenotypeDatabase`	filename sample_column missing_values exclude_correlated	Excel files	This is an implementation of `forward.phenotype.db.PandasPhenotypeDatabase`.
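Since most `CSVPhenotypeDatabase` parameters are forwarded to the Pandas parser, it can help to see what those options do in `pandas.read_csv` itself. The following is a minimal sketch, not Forward's implementation: the file content is made up, and mapping `sample_column` to `index_col` is an assumption made for illustration only.

```python
import io

import pandas as pd

# Made-up tab-delimited phenotype file. The "sample" column plays the role
# that sample_column would have in the configuration (assumption).
raw = io.StringIO(
    "sample\tage\tbmi\n"
    "s1\t54\t27,5\n"
    "s2\t61\tmissing\n"
    "s3\t47\t31,2\n"
)

phenotypes = pd.read_csv(
    raw,
    sep="\t",               # field delimiter
    na_values=["missing"],  # extra strings to treat as missing values
    decimal=",",            # decimal separator used in the file
    index_col="sample",     # index rows by sample ID (illustrative choice)
)

print(phenotypes)
```

The same keyword names (`sep`, `na_values`, `decimal`, `skiprows`, `names`, `header`, `compression`) appear in the parameter list above, so the Pandas documentation is the place to look for their exact semantics.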

Python documentation¶

Tasks¶

This module provides actual implementations of the genetic tests.

class forward.tasks.LinearTest(*args, **kwargs)[source]¶: Linear regression genetic test.

class forward.tasks.LogisticTest(*args, **kwargs)[source]¶

Logistic regression genetic test.

run_task(experiment, task_name, work_dir)[source]¶: Run the logistic regression.

class forward.tasks.SKATTest(*args, **kwargs)[source]¶

Binding to SKAT (using rpy2).

static check_skat()[source]¶: Check if SKAT is installed.

run_task(experiment, task_name, work_dir)[source]¶: Run the SKAT analysis.

Genotype containers¶

class forward.genotype.MemoryImpute2Geno(filename, samples, filter_probability=0, **kwargs)[source]¶

Container for small(ish) IMPUTE2 files.

Parameters:	filename (str) – The filename for the IMPUTE2 file. samples – A list containing a single column and no header. The rows are the ordered sample IDs. filter_probability (float) – A cutoff for imputation probability. Only genotypes with an imputation probability above this threshold will be used for the analysis.

Warning

This implementation loads the whole file in memory (hence the name). Be careful and make sure that you have enough RAM to hold everything.

It would be fairly easy to subclass this and support lazily reading genotype data from the disk. Feel free to contribute this feature if you need it.
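As a rough illustration of what lazy reading could look like, here is a standalone sketch that parses an IMPUTE2 file one line at a time. It is not part of Forward and does not subclass `MemoryImpute2Geno`; the function name `iter_impute2` is hypothetical. It assumes the usual IMPUTE2 layout (five leading columns, then three genotype probabilities per sample) and assumes the first listed allele is the reference, so the index of the most likely genotype is the number of non-reference alleles. Calls that do not exceed the cutoff are returned as `None`.

```python
import io


def iter_impute2(handle, filter_probability=0.0):
    """Lazily yield (variant_name, genotypes) pairs, one line at a time."""
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name = fields[1]  # second column: variant identifier
        probs = [float(p) for p in fields[5:]]
        genotypes = []
        # Three probabilities per sample: P(AA), P(AB), P(BB).
        for start in range(0, len(probs), 3):
            triple = probs[start:start + 3]
            best = max(triple)
            # The index of the best call equals the count of the second allele.
            call = triple.index(best) if best > filter_probability else None
            genotypes.append(call)
        yield name, genotypes


# Two made-up variants for three samples.
example = io.StringIO(
    "--- rs1 1000 A G 0.98 0.02 0 0.1 0.8 0.1 0 0.05 0.95\n"
    "--- rs2 2000 C T 0.4 0.35 0.25 0 0 1 1 0 0\n"
)
reader = iter_impute2(example, filter_probability=0.9)
first_variant = next(reader)  # only the first line has been parsed here
remaining = list(reader)      # the rest is read on demand
```

Because the function is a generator, memory use is bounded by a single line of the file rather than by the whole file.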

exclude_samples(samples_list)[source]¶

Exclude samples in the list.

Parameters:	samples_list (list) – A list of samples to exclude.

The returned genotype vectors will not have genotypes for excluded samples (i.e. they will be n elements shorter, where n = len(samples_list)).

This is a configuration option.
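The effect of sample exclusion on a genotype vector can be sketched as follows. The helper `drop_samples` is hypothetical and only mimics the documented behaviour on plain lists; it is not how the container stores or filters its data.

```python
def drop_samples(sample_order, genotypes, samples_list):
    """Return the sample order and genotype vector without excluded samples."""
    excluded = set(samples_list)
    keep = [i for i, sample in enumerate(sample_order) if sample not in excluded]
    return [sample_order[i] for i in keep], [genotypes[i] for i in keep]


# Made-up data: five samples and their genotypes for one variant.
sample_order = ["s1", "s2", "s3", "s4", "s5"]
genotypes = [0, 1, 2, 0, 1]

kept_samples, kept_genotypes = drop_samples(sample_order, genotypes, ["s2", "s5"])
```

Here the resulting vector is two elements shorter than the input, which matches n = len(samples_list) as long as every excluded sample is actually present in the data.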

experiment_init(experiment, batch_insert_n=100000)[source]¶

Experiment specific initialization.

This takes care of initializing the database and filtering variants. It is automatically called by the Experiment.

filter_completion(rate)[source]¶

Apply a filter on completion rate.

Parameters:	rate (float) – The minimum completion rate for inclusion.

This is a configuration option.
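For reference, the completion rate of a variant is the fraction of samples with a called genotype. The sketch below is a hypothetical helper operating on a list where `None` marks a missing call. Treating the threshold as inclusive (`>=`) is an assumption based on the word "minimum".

```python
def completion_rate(genotypes):
    """Fraction of samples with a non-missing genotype call."""
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


# Made-up genotype vectors keyed by variant name (None = no call).
variants = {
    "rs1": [0, 1, None, 2],
    "rs2": [0, None, None, 1],
}

min_rate = 0.75
passing = [name for name, g in variants.items() if completion_rate(g) >= min_rate]
```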

filter_maf(maf)[source]¶

Apply a filter on minor allele frequency.

Parameters:	maf (float) – The minimum MAF for inclusion.

This is a configuration option.
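The minor allele frequency can be computed directly from a genotype vector coded as 0, 1 or 2 copies of the non-reference allele. The helper below is a hypothetical sketch on plain lists, with `None` marking missing calls; it is not taken from Forward's source.

```python
def minor_allele_frequency(genotypes):
    """Compute the MAF from allele counts, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # undefined when no genotype was called
    # Each called sample carries two alleles.
    freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is less frequent.
    return min(freq, 1 - freq)


maf_rare_alt = minor_allele_frequency([0, 1, 0, None, 0])
maf_rare_ref = minor_allele_frequency([2, 2, 1, 2])
```

Both vectors give the same MAF because the computation is symmetric in the two alleles: one alternate allele out of eight in the first case, one reference allele out of eight in the second.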

filter_name(names_list)[source]¶

Only include variants that are in a list.

Parameters:	names_list (list or str) – Either a list of variant names or the path to a file containing a single column of variant names.

This is a configuration option.

get_genotypes(variant_name)[source]¶

Get a vector of genotypes for a variant.

Parameters:	variant_name (str) – The variant name (e.g. rs123456)
Returns:	A vector of genotypes (g = 0, 1 or 2; the number of non-reference alleles).
Return type:	np.ndarray

class forward.genotype.PlinkGenotypeDatabase(prefix, **kwargs)[source]¶

Container for binary PLINK files.

Parameters:	prefix (str) – The prefix of the PLINK bed, bim, fam files.

This container relies on pyplink.

experiment_init(experiment)[source]¶

Initialization method called by the Experiment.

Applies filtering, creates and fills the database.

filter_completion(rate)[source]¶

Filters variants by completion rate (the complement of the no-call rate).

This is a configuration option.

filter_maf(maf)[source]¶

Filters variants by allele frequency (MAF).

This is a configuration option.

filter_name(variant_list)[source]¶

Filter by variant name.

This is a configuration option.

get_genotypes(variant_name)[source]¶: Returns a genotype vector for the given variant.

get_sample_order()[source]¶: Return a list of the (ordered) samples as represented in the database.

Phenotype containers¶

This module is used to formalize the expected phenotype structure for Forward. Its role is to provide a reusable interface to feed phenotype (and covariate) data to the statistical engine.

class forward.phenotype.db.CSVPhenotypeDatabase(filename, sample_column, **kwargs)[source]¶: Collection of phenotypes based on a CSV file.

class forward.phenotype.db.ExcelPhenotypeDatabase(filename, sample_column, missing_values=None, **kwargs)[source]¶

Collection of phenotypes based on an Excel file.

Only the first sheet is considered.
