Available Components

Forward experiments use three modular components to standardize access to genotypes, phenotypes and statistical testing. Some components are built into Forward and are compatible with common data formats (described here).

You can also write and use your own implementations by following the instructions in the Extending Forward section.

Tasks

forward.tasks.LinearTest
Parameters:
  • outcomes
  • covariates
  • variants
  • correction
  • alpha
Variant type: common (MAF > 0.05)
Outcome type: continuous

forward.tasks.LogisticTest
Parameters:
  • outcomes
  • covariates
  • variants
  • correction
  • alpha
Variant type: common (MAF > 0.05)
Outcome type: discrete

forward.tasks.SKATTest
Parameters:
  • outcomes
  • covariates
  • variants
  • correction
  • alpha
  • snp_set_file
Variant type: sets of variants (rare or common)
Outcome type: discrete or continuous
Reference: website

Genotype containers

forward.genotype.MemoryImpute2Geno
Parameters:
  • filter_name
  • filter_maf
  • filter_completion
  • filename
  • samples
  • filter_probability
File type: small IMPUTE2 files
Notes: This container loads the genotype file in memory. It is fast, but not suitable for large files. IMPUTE2 file parsing is done using gepyto.

forward.genotype.PlinkGenotypeDatabase
Parameters:
  • prefix
  • filter_maf
  • filter_completion
File type: binary PLINK files (bed, bim, fam)
Notes: This container uses pyplink to parse the binary PLINK files.

Phenotype containers

CSVPhenotypeDatabase
Parameters:
  • filename
  • sample_column
  • sep
  • compression
  • header
  • skiprows
  • names
  • na_values
  • decimal
  • exclude_correlated
File type: delimited files (e.g. CSV, TSV)
Notes: This is an implementation of forward.phenotype.db.PandasPhenotypeDatabase. Most of the parameters are passed to the pandas parser. Refer to the pandas documentation for more information.

ExcelPhenotypeDatabase
Parameters:
  • filename
  • sample_column
  • missing_values
  • exclude_correlated
File type: Excel files
Notes: This is an implementation of forward.phenotype.db.PandasPhenotypeDatabase.

Python documentation

Tasks

This module provides actual implementations of the genetic tests.

class forward.tasks.LinearTest(*args, **kwargs)

Linear regression genetic test.
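As a rough illustration of what a linear regression genetic test computes, the sketch below regresses a continuous outcome on a genotype vector and one covariate with ordinary least squares. It uses NumPy directly and is not Forward's implementation; the variable names and the toy data are assumptions made for the example.

```python
import numpy as np

# Toy data (hypothetical): additive genotype coding and one covariate.
genotypes = np.array([0, 1, 2, 1, 0, 2], dtype=float)
covariate = np.array([0.1, 0.4, 0.3, 0.9, 0.5, 0.7])

# Noise-free outcome built from known coefficients so the fit is easy to check.
outcome = 1.0 + 0.5 * genotypes + 2.0 * covariate

# Design matrix: intercept, genotype, covariate.
X = np.column_stack([np.ones_like(genotypes), genotypes, covariate])

# Ordinary least squares; beta[1] is the per-allele effect on the outcome.
beta, _, _, _ = np.linalg.lstsq(X, outcome, rcond=None)
```

Because the outcome was generated without noise, the fitted coefficients match the ones used to build it. With real data the genotype coefficient would come with a standard error and a p-value.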

class forward.tasks.LogisticTest(*args, **kwargs)

Logistic regression genetic test.

run_task(experiment, task_name, work_dir)

Run the logistic regression.
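For intuition about the model behind a logistic regression genetic test, here is a minimal sketch that fits a discrete (0/1) outcome against a genotype vector using Newton's method. It is an independent NumPy illustration, not the code run by run_task; the fit_logistic helper and the toy data are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression coefficients with Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))    # predicted probabilities
        w = p * (1.0 - p)                      # variance weights
        hessian = X.T @ (X * w[:, None])       # X' W X
        gradient = X.T @ (y - p)               # X' (y - p)
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

# Toy data: 1/4, 2/4 and 3/4 of samples are cases for genotypes 0, 1 and 2.
genotypes = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2], dtype=float)
outcome = np.array([0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1], dtype=float)

X = np.column_stack([np.ones_like(genotypes), genotypes])
beta = fit_logistic(X, outcome)
odds_ratio = np.exp(beta[1])   # per-allele odds ratio
```

In this toy data set the odds of being a case go from 1:3 to 1:1 to 3:1 as the genotype goes from 0 to 2, so the per-allele odds ratio is 3.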

class forward.tasks.SKATTest(*args, **kwargs)

Binding to SKAT (using rpy2).

static check_skat()

Check if SKAT is installed.

run_task(experiment, task_name, work_dir)

Run the SKAT analysis.

Genotype containers

class forward.genotype.MemoryImpute2Geno(filename, samples, filter_probability=0, **kwargs)

Container for small(ish) IMPUTE2 files.

Parameters:
  • filename (str) – The filename for the IMPUTE2 file.
  • samples – A list containing a single column and no header. The rows are the ordered sample IDs.
  • filter_probability (float) – A cutoff for imputation probability. Only genotypes with an imputation probability above this threshold will be used for the analysis.
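The effect of filter_probability can be sketched as follows. IMPUTE2 files store three genotype probabilities per sample (AA, AB, BB). The example assumes hard calls are made from the most likely genotype and set to missing when that probability does not exceed the cutoff. The call_genotypes helper is hypothetical, and neither gepyto's parser nor Forward's internals are reproduced here.

```python
import numpy as np

def call_genotypes(probs, filter_probability=0.0):
    """Turn (n_samples, 3) probabilities P(AA), P(AB), P(BB) into calls.

    The call is the index of the most likely genotype, i.e. the number of
    copies of allele B. Calls whose probability is not above the cutoff
    are set to NaN (missing).
    """
    probs = np.asarray(probs, dtype=float)
    calls = probs.argmax(axis=1).astype(float)
    calls[probs.max(axis=1) <= filter_probability] = np.nan
    return calls

probs = [
    [0.98, 0.02, 0.00],   # confident AA
    [0.10, 0.85, 0.05],   # confident AB
    [0.40, 0.35, 0.25],   # ambiguous
]
calls = call_genotypes(probs, filter_probability=0.8)
```

With a cutoff of 0.8 the first two samples keep their calls and the third becomes missing. With the default of 0, every genotype is used.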

Warning

This implementation loads the whole file in memory (hence the name). Make sure that you have enough RAM to hold everything.

It would be fairly easy to subclass this and support lazily reading genotype data from the disk. Feel free to contribute this feature if you need it.

exclude_samples(samples_list)

Exclude samples in the list.

Parameters: samples_list (list) – A list of samples to exclude.

The returned genotype vectors will not have genotypes for excluded samples (i.e. they will be n elements shorter, n = len(samples_list)).

This is a configuration option.
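The length relationship described for exclude_samples can be illustrated with a boolean mask over the ordered sample IDs. This is a sketch with made-up sample names, not the container's actual code, and it assumes every excluded sample is present in the data.

```python
import numpy as np

samples = ["s1", "s2", "s3", "s4"]     # hypothetical ordered sample IDs
genotypes = np.array([0, 1, 2, 1])     # one genotype per sample
samples_list = ["s2", "s4"]            # samples to exclude

excluded = set(samples_list)
# True for samples that are kept, in the same order as the genotype vector.
keep = np.array([s not in excluded for s in samples])
filtered = genotypes[keep]
```

The filtered vector holds the genotypes of s1 and s3 only, so it is len(samples_list) elements shorter than the original.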

experiment_init(experiment, batch_insert_n=100000)

Experiment specific initialization.

This takes care of initializing the database and filtering variants. It is automatically called by the Experiment.

filter_completion(rate)

Apply a filter on completion rate.

Parameters: rate (float) – The minimum completion rate for inclusion.

This is a configuration option.
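A completion rate filter can be pictured as below, taking the completion rate to be the fraction of samples with a non-missing genotype. The helper, the variant names and the use of NaN for missing calls are assumptions for the example. Whether Forward treats the threshold as inclusive is not stated here.

```python
import numpy as np

def completion_rate(genotypes):
    """Fraction of samples with a non-missing (non-NaN) genotype."""
    g = np.asarray(genotypes, dtype=float)
    return 1.0 - np.isnan(g).mean()

variants = {
    "rs1": [0, 1, 2, 1],             # fully genotyped
    "rs2": [0, np.nan, np.nan, 1],   # half of the calls are missing
}

min_rate = 0.9
kept = [name for name, g in variants.items() if completion_rate(g) >= min_rate]
```

Only rs1 passes a minimum completion rate of 0.9, since rs2 has a completion rate of 0.5.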

filter_maf(maf)

Apply a filter on minor allele frequency.

Parameters: maf (float) – The minimum MAF for inclusion.

This is a configuration option.
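To make the MAF threshold concrete, the sketch below computes the minor allele frequency from a vector of 0/1/2 genotypes, ignoring missing values. The helper name is hypothetical and the computation is a standard one, not copied from Forward.

```python
import numpy as np

def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as the number of non-reference alleles."""
    g = np.asarray(genotypes, dtype=float)
    # Each sample carries two alleles, so the mean genotype is twice the
    # non-reference allele frequency. NaN entries are ignored.
    alt_freq = np.nanmean(g) / 2.0
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1.0 - alt_freq)

maf_rare_alt = minor_allele_frequency([0, 0, 0, 1])
maf_rare_ref = minor_allele_frequency([2, 2, 2, 1])
```

Both vectors give the same MAF because the minor allele is defined by frequency, not by which allele is the reference.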

filter_name(names_list)

Only includes variants in a list.

Parameters: names_list (list or str) – Either a list of variant names or the path to a file containing a single column of variant names.

This is a configuration option.
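Since names_list accepts either a list or a file path, one way to handle both forms is to normalize them to a list first. The load_variant_names helper below is hypothetical and only illustrates the two accepted inputs.

```python
import os
import tempfile

def load_variant_names(names_list):
    """Return variant names from a list or from a single-column file."""
    if isinstance(names_list, str):
        # A string is treated as a path: one variant name per line.
        with open(names_list) as f:
            return [line.strip() for line in f if line.strip()]
    return list(names_list)

# Write a small single-column file to demonstrate the path form.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("rs123456\nrs234567\n")

from_file = load_variant_names(tmp.name)
os.remove(tmp.name)

from_list = load_variant_names(["rs123456", "rs234567"])
```

Both calls produce the same list of names.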

get_genotypes(variant_name)

Get a vector of genotypes for a variant.

Parameters: variant_name (str) – The variant name (e.g. rs123456).
Returns: A vector of genotypes (g = 0, 1 or 2; the number of non-reference alleles).
Return type: np.ndarray

class forward.genotype.PlinkGenotypeDatabase(prefix, **kwargs)

Container for binary PLINK files.

Parameters: prefix (str) – The prefix of the PLINK bed, bim, fam files.

This container relies on pyplink.

experiment_init(experiment)

Initialization method called by the Experiment.

Applies filtering, creates and fills the database.

filter_completion(rate)

Filters variants by completion rate (the proportion of samples with a called genotype, i.e. 1 minus the no-call rate).

This is a configuration option.

filter_maf(maf)

Filters variants by allele frequency (MAF).

This is a configuration option.

filter_name(variant_list)

Filter by variant name.

This is a configuration option.

get_genotypes(variant_name)

Returns a genotype vector for the given variant.

get_sample_order()

Return a list of the (ordered) samples as represented in the database.
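A sample order like the one returned here is typically what lets phenotype values be lined up with genotype vectors. The sketch below shows that alignment with pandas; the sample IDs and values are made up, and how Forward uses the order internally is an assumption.

```python
import pandas as pd

# Hypothetical result of get_sample_order().
sample_order = ["s3", "s1", "s2"]

# Phenotype values indexed by sample ID, in arbitrary order.
phenotype = pd.Series({"s1": 1.2, "s2": 0.7, "s3": 2.5})

# Reorder the phenotype so that position i matches genotype position i.
aligned = phenotype.reindex(sample_order)
```

Samples missing from the phenotype data would appear as NaN after reindexing.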

Phenotype containers

This module is used to formalize the expected phenotype structure for Forward. Its role is to provide a reusable interface to feed phenotype (and covariate) data to the statistical engine.

class forward.phenotype.db.CSVPhenotypeDatabase(filename, sample_column, **kwargs)

Collection of phenotypes based on a CSV file.
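Because most options are handed to the pandas parser, it can help to see what they do in pandas itself. The sketch below calls pandas.read_csv directly on an in-memory file. The column names and values are made up, and using the sample column as the index is an assumption about how sample_column is used, not a description of Forward's code.

```python
import io
import pandas as pd

# A small semicolon-delimited file with decimal commas and -9 as missing.
raw = io.StringIO(
    "sample;bmi;diabetes\n"
    "s1;24,5;0\n"
    "s2;-9;1\n"
    "s3;31,2;NA\n"
)

phenotypes = pd.read_csv(
    raw,
    sep=";",            # field delimiter
    decimal=",",        # decimal separator used in the file
    na_values=["-9"],   # extra strings to treat as missing
    header=0,           # the first row holds the column names
)

# Index by the sample column so rows can be looked up by sample ID.
phenotypes = phenotypes.set_index("sample")
```

The bmi value for s2 and the diabetes value for s3 are both read as missing, the first because of na_values and the second because pandas treats NA as missing by default.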

class forward.phenotype.db.ExcelPhenotypeDatabase(filename, sample_column, missing_values=None, **kwargs)

Collection of phenotypes based on an Excel file.

Only the first sheet is considered.