Title: | Assessing Predisposition Between Phenotypes using Polygenic Scores |
---|---|
Description: | Using polygenic scores (PGS, or PRS/GRS for binary outcomes), this package lets you investigate shared predisposition between different conditions, run fast association analyses, and export plots and views of the PGS distribution as 'ggplot2' objects. |
Authors: | Vincent Pascat [aut, cre] |
Maintainer: | Vincent Pascat <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.4.0 |
Built: | 2025-02-21 06:01:46 UTC |
Source: | https://github.com/vp-biostat/comorbidpgs |
assoc()
takes a distribution of PGS, a Phenotype and optional Confounders.
Returns a data frame showing the association of the PGS with the Phenotype
assoc( df = NULL, prs_col = "SCORESUM", phenotype_col = "Phenotype", scale = TRUE, covar_col = NA, verbose = TRUE, log = "" )
df |
a dataframe with individuals on each row, and at least the following columns:
|
prs_col |
a character specifying the PGS column name |
phenotype_col |
a character specifying the Phenotype column name |
scale |
a boolean specifying if scaling of PGS should be done before testing |
covar_col |
a character vector specifying the covariate column names (optional) |
verbose |
a boolean (TRUE by default) to write in the console/log messages. |
log |
a connection, or a character string naming the file to print to. If "" (by default), it prints to the standard output connection, the console unless redirected by sink. |
return a data frame showing the association of the PGS on the Phenotype with the following columns:
PGS: the name of the PGS
Phenotype: the name of Phenotype
Phenotype_type: either 'Continuous', 'Ordered Categorical', 'Categorical' or 'Cases/Controls'
Stat_method: the association function detects the phenotype type and picks the matching analysis, either 'Linear regression', 'Binary logistic regression', 'Ordinal logistic regression' or 'Multinomial logistic regression'
Covar: list all the covariates used for this association
N_cases: if Phenotype_type is Cases/Controls, gives the number of cases
N_controls: if Phenotype_type is Cases/Controls, gives the number of controls
N: the number of individuals/samples
Effect: if Phenotype_type is Continuous, the Beta coefficient of the linear regression; otherwise, the odds ratio (OR) of the logistic regression
SE: standard error of the Beta coefficient (if Phenotype_type is Continuous)
lower_CI: lower confidence interval of the related Effect (Beta or OR)
upper_CI: upper confidence interval of the related Effect (Beta or OR)
P_value: associated P-value
results <- assoc(
  df = comorbidData,
  prs_col = "ldl_PGS",
  phenotype_col = "log_ldl",
  scale = TRUE,
  covar_col = c("age", "sex", "gen_array")
)
print(results)
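To make the returned columns concrete, here is a minimal base-R sketch of the kind of model the documentation describes: a linear regression for a Continuous Phenotype and a binary logistic regression for a Cases/Controls Phenotype. It is not the package's implementation. The simulated data, the column names (`pgs`, `age`, `pheno`, `case`) and the 95% confidence level are assumptions made for illustration.

```r
# Simulated data; all column names here are hypothetical
set.seed(1)
n <- 500
sim <- data.frame(
  pgs = rnorm(n, mean = 2, sd = 3),
  age = round(runif(n, 30, 70))
)

# Scale the PGS so effects are per standard deviation (the idea behind scale = TRUE)
sim$pgs_scaled <- as.numeric(scale(sim$pgs))
sim$pheno <- 0.3 * sim$pgs_scaled + 0.01 * sim$age + rnorm(n)
sim$case <- rbinom(n, 1, plogis(-1 + 0.5 * sim$pgs_scaled))

# Continuous phenotype: linear regression adjusted for one covariate
fit <- lm(pheno ~ pgs_scaled + age, data = sim)
coefs <- summary(fit)$coefficients
ci <- confint(fit)  # 95% interval by default (assumed level)

manual <- data.frame(
  Effect   = coefs["pgs_scaled", "Estimate"],
  SE       = coefs["pgs_scaled", "Std. Error"],
  lower_CI = ci["pgs_scaled", 1],
  upper_CI = ci["pgs_scaled", 2],
  P_value  = coefs["pgs_scaled", "Pr(>|t|)"],
  N        = nobs(fit)
)
print(manual)

# Cases/Controls phenotype: logistic regression, effect reported as an odds ratio
fit_bin <- glm(case ~ pgs_scaled + age, data = sim, family = binomial())
odds_ratio <- unname(exp(coef(fit_bin)["pgs_scaled"]))
or_ci <- exp(confint.default(fit_bin)["pgs_scaled", ])  # Wald interval on the OR scale
print(c(OR = odds_ratio, or_ci))
```

Comparing such a hand-built row with the output of `assoc()` on the same data is a quick way to check how the Effect and confidence interval columns are scaled.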
assocplot()
takes a data frame of associations from assoc(). Returns a plot of the associations (a ggplot2 object or a list of ggplot2 objects)
assocplot(score_table = NULL, axis = "vertical", pval = FALSE)
score_table |
a dataframe with association results with at least the following columns:
|
axis |
a character, |
pval |
a parameter specifying information on how to display P-value
|
return either:
a ggplot object representing the association results.
a list of two ggplot objects, accessible by $continuous_phenotype and $discrete_phenotype, if there are both Continuous Phenotypes and Discrete Phenotypes (i.e. "Categorical" or "Cases/Controls")
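A short usage sketch for the return value described above, using only the signatures and column names shown in this manual. It assumes the comorbidPGS package is installed and that `comorbidData` is available once the package is attached; the call is skipped otherwise.

```r
covariates <- c("age", "sex", "gen_array")

if (requireNamespace("comorbidPGS", quietly = TRUE)) {
  library(comorbidPGS)

  # Association table for one PGS and one Continuous Phenotype
  results <- assoc(
    df = comorbidData,
    prs_col = "ldl_PGS",
    phenotype_col = "log_ldl",
    scale = TRUE,
    covar_col = covariates,
    verbose = FALSE
  )

  p <- assocplot(score_table = results, axis = "vertical", pval = FALSE)

  # assocplot() returns either one plot or a list of two named plots
  is_pair <- all(c("continuous_phenotype", "discrete_phenotype") %in% names(p))
  if (is_pair) {
    print(p$continuous_phenotype)
    print(p$discrete_phenotype)
  } else {
    print(p)
  }
}
```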
centileplot()
takes a distribution of PGS and a Phenotype.
Returns a plot (ggplot2 object) with centiles (or deciles if there are not enough individuals) of PGS on the x axis and Prevalence/Median/Mean of the Phenotype on the y axis
centileplot( df = NULL, prs_col = "SCORESUM", phenotype_col = "Phenotype", decile = FALSE, continuous_metric = NA )
df |
a dataframe with individuals on each row, and at least the following columns:
|
prs_col |
a character specifying the PGS column name |
phenotype_col |
a character specifying the Phenotype column name |
decile |
a boolean specifying if centiles or deciles should be used |
continuous_metric |
an optional character specifying which metric to use for a continuous Phenotype, only three options: |
return a plot of the results as a ggplot2 object
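The quantity on each axis can be illustrated in base R: individuals are binned by PGS quantile, then the phenotype is summarised within each bin. This is a sketch of the idea only, on simulated data with hypothetical variable names, and it does not reproduce how `centileplot()` builds its bins.

```r
# Simulated PGS and a binary phenotype whose risk rises with the PGS
set.seed(2)
n <- 2000
pgs <- rnorm(n)
case <- rbinom(n, 1, plogis(-2 + 0.6 * pgs))

# Assign each individual to a PGS decile (set n_bins <- 100 for centiles)
n_bins <- 10
breaks <- quantile(pgs, probs = seq(0, 1, length.out = n_bins + 1))
bin <- cut(pgs, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Prevalence of the phenotype within each bin (the y value of each point)
prevalence <- tapply(case, bin, mean)
print(round(prevalence, 3))
```

For a continuous Phenotype, the same binning applies with `mean` replaced by a median or mean of the phenotype values.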
A dataset with sets of PGSs, Phenotypes and Covariates to demo the comorbidPGS package
comorbidData
A data frame with 10,000 rows (individuals) and 16 columns:
Individual's identifier, characters
Sex of the individuals, binary numeric values
Age of the individuals, numeric value
The genotypic array used for those individuals, factor values
The ethnicity of individuals, can be also used as Categorical Phenotype, factor values
Three distributions of PGS for Breast Cancer, Type 2 Diabetes and Hypertension respectively; numeric values
Three Cases/Controls Phenotypes, representing Breast Cancer, Type 2 Diabetes and Hypertension respectively; binary values
Three Continuous Phenotypes, representing low-density lipoprotein, body-mass index, and systolic blood pressure respectively; numeric values
A continuous Phenotype, based on log(ldl) to have a normal distribution; numeric values
An Ordered Categorical Phenotype, with 3 possible outcomes: low, normal or high systolic blood pressure; factor values
https://github.com/VP-biostat/comorbidPGS
decileboxplot()
takes a distribution of PGS and a Continuous Phenotype.
Returns a plot with deciles of PGS on the x axis and a boxplot of the Phenotype on the y axis
decileboxplot(df = NULL, prs_col = "SCORESUM", phenotype_col = "Phenotype")
df |
a dataframe with individuals on each row, and at least the following columns:
|
prs_col |
a character specifying the PGS column name |
phenotype_col |
a character specifying the Continuous Phenotype column name |
return a ggplot object (ggplot2)
densityplot()
takes a distribution of PGS and a Phenotype.
Returns a plot with the density of PGS on the x axis, by category of the Phenotype
densityplot( df = NULL, prs_col = "SCORESUM", phenotype_col = "Phenotype", scale = TRUE, threshold = NA )
df |
a dataframe with individuals on each row, and at least the following columns:
|
prs_col |
a character specifying the PGS column name |
phenotype_col |
a character specifying the Phenotype column name |
scale |
a boolean specifying if scaling of PGS should be done before plotting |
threshold |
an optional numeric specifying, for a Continuous Phenotype, the threshold used to classify individuals as Cases/Controls, as follows:
|
return a ggplot object (ggplot2)
mr_2sls()
takes a distribution of PGS, an Exposure (Phenotype) and an Outcome (Phenotype).
Returns a data frame with the result of the Mendelian Randomization two-stage least squares (2SLS) method using the PGS
mr_2sls( df = NULL, prs_col = "SCORESUM", exposure_col = NA, outcome_col = NA, scale = TRUE, verbose = TRUE, log = "" )
df |
a dataframe with individuals on each row, and at least the following columns:
|
prs_col |
a character specifying the PGS column name |
exposure_col |
a character specifying the Exposure (Phenotype) column name |
outcome_col |
a character specifying the Outcome (Phenotype) column name |
scale |
a boolean specifying if scaling of PGS should be done before testing |
verbose |
a boolean (TRUE by default) to write in the console/log messages. |
log |
a connection, or a character string naming the file to print to. If "" (by default), it prints to the standard output connection, the console unless redirected by sink. |
return a data frame with the Mendelian Randomization association result using 2SLS method with the following columns:
PGS: the name of the PGS used
Exposure: the name of Phenotype used as Exposure
Outcome: the name of Phenotype used as Outcome
Method: the MR method used (here 2SLS)
N_cases: if Phenotype_type is Cases/Controls, the number of cases
N_controls: if Phenotype_type is Cases/Controls, the number of controls
N: the number of individuals/samples
MR_estimate: the MR estimate (beta) using the 2SLS method
SE: the associated standard error (second order)
F_stat: the F-statistic of the Exposure ~ PGS association
result <- mr_2sls(
  df = comorbidData,
  prs_col = "ldl_PGS",
  exposure_col = "log_ldl",
  outcome_col = "bmi",
  scale = TRUE
)
print(result)
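For readers unfamiliar with two-stage least squares, the sketch below shows the two regressions on simulated data in base R. It is a generic illustration with hypothetical variable names and continuous exposure and outcome, not the code behind `mr_2sls()`. It also shows why the method is used: the plain regression of outcome on exposure is biased by the shared confounder, while the instrumented estimate is not.

```r
# Simulated data: the PGS affects the outcome only through the exposure,
# and an unobserved confounder affects both exposure and outcome
set.seed(3)
n <- 1000
pgs <- as.numeric(scale(rnorm(n)))
confounder <- rnorm(n)
exposure <- 0.5 * pgs + confounder + rnorm(n)
outcome <- 0.4 * exposure + confounder + rnorm(n)  # true causal effect: 0.4

# Stage 1: regress the exposure on the PGS (the instrument)
stage1 <- lm(exposure ~ pgs)
f_stat <- unname(summary(stage1)$fstatistic["value"])  # instrument strength

# Stage 2: regress the outcome on the exposure predicted by the PGS
exposure_hat <- fitted(stage1)
stage2 <- lm(outcome ~ exposure_hat)
mr_estimate <- unname(coef(stage2)["exposure_hat"])

# Naive regression for comparison, biased upwards by the confounder
ols_estimate <- unname(coef(lm(outcome ~ exposure))["exposure"])

print(c(MR_estimate = mr_estimate, OLS = ols_estimate, F_stat = f_stat))
```

The standard error printed by the second `lm()` is not a valid 2SLS standard error, because it ignores the uncertainty of the first stage; dedicated routines correct for this.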
mr_ratio()
takes a distribution of PGS, an Exposure (Phenotype) and an Outcome (Phenotype).
Returns a data frame with the result of the Mendelian Randomization ratio method using the PGS
mr_ratio( df = NULL, prs_col = "SCORESUM", exposure_col = NA, outcome_col = NA, scale = TRUE, verbose = TRUE, log = "" )
df |
a dataframe with individuals on each row, and at least the following columns:
|
prs_col |
a character specifying the PGS column name |
exposure_col |
a character specifying the Exposure (Phenotype) column name |
outcome_col |
a character specifying the Outcome (Phenotype) column name |
scale |
a boolean specifying if scaling of PGS should be done before testing |
verbose |
a boolean (TRUE by default) to write in the console/log messages. |
log |
a connection, or a character string naming the file to print to. If "" (by default), it prints to the standard output connection, the console unless redirected by sink. |
return a data frame with the Mendelian Randomization association result using the ratio method with the following columns:
PGS: the name of the PGS used
Exposure: the name of Phenotype used as Exposure
Outcome: the name of Phenotype used as Outcome
Method: the MR method used (here Ratio)
N_cases: if Phenotype_type is Cases/Controls, the number of cases
N_controls: if Phenotype_type is Cases/Controls, the number of controls
N: the number of individuals/samples
MR_estimate: the MR estimate (beta) using the ratio method
SE: the associated standard error (second order)
F_stat: the F-statistic of the Exposure ~ PGS association
result <- mr_ratio(
  df = comorbidData,
  prs_col = "ldl_PGS",
  exposure_col = "log_ldl",
  outcome_col = "bmi",
  scale = TRUE
)
print(result)
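The ratio method divides the PGS-outcome coefficient by the PGS-exposure coefficient. The base-R sketch below computes it on simulated data, together with first-order and second-order delta-method standard errors. The second-order formula used here is a commonly quoted expansion that ignores the covariance between the two coefficients; whether `mr_ratio()` uses exactly this expression is an assumption. All variable names are hypothetical.

```r
# Simulated data with a true causal effect of 0.4
set.seed(4)
n <- 1000
pgs <- as.numeric(scale(rnorm(n)))
confounder <- rnorm(n)
exposure <- 0.5 * pgs + confounder + rnorm(n)
outcome <- 0.4 * exposure + confounder + rnorm(n)

# PGS -> exposure and PGS -> outcome regressions
fit_x <- lm(exposure ~ pgs)
fit_y <- lm(outcome ~ pgs)
bx <- unname(coef(fit_x)["pgs"])
by <- unname(coef(fit_y)["pgs"])
se_x <- summary(fit_x)$coefficients["pgs", "Std. Error"]
se_y <- summary(fit_y)$coefficients["pgs", "Std. Error"]

# Ratio estimate and delta-method standard errors
ratio <- by / bx
se_first <- se_y / abs(bx)
se_second <- sqrt(se_y^2 / bx^2 + by^2 * se_x^2 / bx^4)

# F-statistic of the exposure ~ PGS regression
f_stat <- unname(summary(fit_x)$fstatistic["value"])

# With a single instrument, the two-stage point estimate is the same number
two_stage <- unname(coef(lm(outcome ~ fitted(fit_x)))[2])

print(c(ratio = ratio, two_stage = two_stage,
        se_first = se_first, se_second = se_second, F_stat = f_stat))
```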
multiassoc()
takes a data frame with distribution(s) of PGS and Phenotype(s),
and a table of associations to make from this data frame.
Returns a data frame showing the association results
multiassoc( df = NULL, assoc_table = NULL, scale = TRUE, covar_col = NA, verbose = TRUE, log = "", parallel = FALSE, num_cores = NA )
df |
a dataframe with individuals on each row, and at least the following columns:
|
assoc_table |
a dataframe or matrix specifying the associations to make from df, with 2 columns: PGS and Phenotype (in this order) |
scale |
a boolean specifying if scaling of PGS should be done before testing |
covar_col |
a character vector specifying the covariate column names (optional) |
verbose |
a boolean (TRUE by default) to write in the console/log messages. |
log |
a connection, or a character string naming the file to print to. If "" (by default), it prints to the standard output connection, the console unless redirected by sink. If parallel = TRUE, the log will be incomplete |
parallel |
a boolean, if TRUE, |
num_cores |
an integer, if parallel = TRUE (default), |
return a data frame showing the association of the PGS(s) on the Phenotype(s) with the following columns:
PGS: the name of the PGS
Phenotype: the name of Phenotype
Phenotype_type: either 'Continuous', 'Ordered Categorical', 'Categorical' or 'Cases/Controls'
Stat_method: the association function detects the phenotype type and picks the matching analysis, either 'Linear regression', 'Binary logistic regression', 'Ordinal logistic regression' or 'Multinomial logistic regression'
Covar: list all the covariates used for this association
N_cases: if Phenotype_type is Cases/Controls, gives the number of cases
N_controls: if Phenotype_type is Cases/Controls, gives the number of controls
N: the number of individuals/samples
Effect: if Phenotype_type is Continuous, it represents the Beta coefficient of linear regression, OR of logistic regression otherwise
SE: standard error of the related Effect (Beta or OR)
lower_CI: lower confidence interval of the related Effect (Beta or OR)
upper_CI: upper confidence interval of the related Effect (Beta or OR)
P_value: associated P-value
assoc_table <- expand.grid(
  c("t2d_PGS", "ldl_PGS"),
  c("ethnicity", "brc", "t2d", "log_ldl", "sbp_cat")
)
results <- multiassoc(
  df = comorbidData,
  assoc_table = assoc_table,
  covar_col = c("age", "sex", "gen_array"),
  parallel = FALSE,
  verbose = FALSE
)
print(results)
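The `assoc_table` argument is a two-column table of PGS and Phenotype names. The base-R sketch below shows what `expand.grid()` produces for the names used in the example, and how to keep only some pairs. Naming the columns and setting `stringsAsFactors = FALSE` are choices made here for readability; the manual only requires two columns in the order PGS, Phenotype.

```r
pgs_cols <- c("t2d_PGS", "ldl_PGS")
phenotype_cols <- c("ethnicity", "brc", "t2d", "log_ldl", "sbp_cat")

# Every PGS x Phenotype combination: 2 x 5 = 10 rows, first column varies fastest
assoc_table <- expand.grid(
  PGS = pgs_cols,
  Phenotype = phenotype_cols,
  stringsAsFactors = FALSE
)
print(assoc_table)

# Keep only selected pairs: all ldl_PGS rows, plus any PGS paired with t2d
keep <- assoc_table$PGS == "ldl_PGS" | assoc_table$Phenotype == "t2d"
assoc_subset <- assoc_table[keep, ]
print(assoc_subset)
```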
multiphenassoc()
takes a distribution of PGS, multiple Phenotypes and optional Confounders.
Returns a data frame showing the association results
multiphenassoc( df = NULL, prs_col = "SCORESUM", phenotype_col = "Phenotype", scale = TRUE, covar_col = NA, verbose = TRUE, log = "" )
df |
a dataframe with individuals on each row, and at least the following columns:
|
prs_col |
a character specifying the PGS column name |
phenotype_col |
a character vector specifying the Phenotype column names |
scale |
a boolean specifying if scaling of PGS should be done before testing |
covar_col |
a character vector specifying the covariate column names (optional) |
verbose |
a boolean (TRUE by default) to write in the console/log messages. |
log |
a connection, or a character string naming the file to print to. If "" (by default), it prints to the standard output connection, the console unless redirected by sink. |
return a data frame showing the association of the PGS on the Phenotypes with the following columns:
PGS: the name of the PGS
Phenotype: the name of Phenotype
Phenotype_type: either 'Continuous', 'Ordered Categorical', 'Categorical' or 'Cases/Controls'
Stat_method: the association function detects the phenotype type and picks the matching analysis, either 'Linear regression', 'Binary logistic regression', 'Ordinal logistic regression' or 'Multinomial logistic regression'
Covar: list all the covariates used for this association
N_cases: if Phenotype_type is Cases/Controls, gives the number of cases
N_controls: if Phenotype_type is Cases/Controls, gives the number of controls
N: the number of individuals/samples
Effect: if Phenotype_type is Continuous, the Beta coefficient of the linear regression; otherwise, the odds ratio (OR) of the logistic regression
SE: standard error of the Beta coefficient (if Phenotype_type is Continuous)
lower_CI: lower confidence interval of the related Effect (Beta or OR)
upper_CI: upper confidence interval of the related Effect (Beta or OR)
P_value: associated P-value
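This section has no example of its own, so here is a usage sketch built from the signature above and the column names that appear in the other examples of this manual. It assumes the comorbidPGS package is installed and that `comorbidData` is available once the package is attached; the call is skipped otherwise.

```r
# One PGS tested against several phenotypes in a single call
phenotypes <- c("ethnicity", "brc", "t2d", "log_ldl", "sbp_cat")
covariates <- c("age", "sex", "gen_array")

if (requireNamespace("comorbidPGS", quietly = TRUE)) {
  library(comorbidPGS)

  results <- multiphenassoc(
    df = comorbidData,
    prs_col = "t2d_PGS",
    phenotype_col = phenotypes,  # character vector, one entry per Phenotype
    scale = TRUE,
    covar_col = covariates,
    verbose = FALSE
  )
  print(results)
}
```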