Title: | 'Shiny' Application for Whole Genome Duplication Analysis |
---|---|
Description: | Provides a comprehensive 'Shiny' application for analyzing Whole Genome Duplication ('WGD') events. The application helps inexperienced researchers prepare input data and execute command lines for several well-known 'WGD' analysis tools, including 'wgd', 'ksrates', 'i-ADHoRe', 'OrthoFinder', and 'Whale'. The source code is also provided so that experienced researchers can adapt the package and install it on their own server. Key features: 1) Input data preparation: users can conveniently upload and format their data, making it compatible with various 'WGD' analysis tools. 2) Command line generation: the package automatically generates the necessary command lines for the selected 'WGD' analysis tools, reducing manual errors and saving time. 3) Visualization: interactive visualizations help users explore and interpret 'WGD' results, facilitating in-depth 'WGD' analysis. 4) Comparative genomics: users can study and compare 'WGD' events across different species, aiding evolutionary and comparative genomics studies. 5) User-friendly interface: the 'Shiny' web application provides an intuitive interface, making 'WGD' analysis accessible to researchers and bioinformaticians of all levels. |
Authors: | Jia Li [aut, cre], Zhen Li [ctb], Arthur Zwaenepoel [ctb] |
Maintainer: | Jia Li <[email protected]> |
License: | GPL-3 |
Version: | 1.0.0 |
Built: | 2025-02-17 05:32:19 UTC |
Source: | https://github.com/li081766/shinywgd |
This function performs synteny analysis for clusters identified by hierarchical clustering.
analysisEachCluster( segmented_file, segmented_anchorpoints_file, genes_file, cluster_info_file, identified_cluster_file, hcheight = 0.3 )
segmented_file |
The path to the segmented chromosome file. |
segmented_anchorpoints_file |
The path to the segmented anchorpoints file. |
genes_file |
genes.txt created by i-ADHoRe. |
cluster_info_file |
The path to the clustering information file. |
identified_cluster_file |
The path to the output file for identified clusters. |
hcheight |
The cutoff height for cluster identification (default: 0.3). |
A list containing information about identified clusters and their p-values.
This function performs bootstrapping on a given Ks (synonymous substitution rates) distribution to estimate peaks within the distribution.
bootStrapPeaks( ksRaw, binWidth = 0.1, maxK = 5, m = 3, peak.index = 1, peak.maxK = 2, spar = 0.25, rep = 1000, from = 0, to = maxK )
ksRaw |
A numeric vector representing the raw Ks distribution to be bootstrapped. |
binWidth |
A numeric value indicating the bin width for histogram calculation. |
maxK |
A numeric value indicating the maximum Ks value to consider in the distribution. |
m |
An integer specifying the parameter for peak detection. |
peak.index |
An integer indicating the index of the peak to be estimated. |
peak.maxK |
A numeric value indicating the maximum Ks value for peak estimation. |
spar |
A numeric value controlling the smoothness of spline fitting. |
rep |
An integer specifying the number of bootstrap repetitions. |
from |
A numeric value indicating the lower bound for peak estimation. |
to |
A numeric value indicating the upper bound for peak estimation. |
A numeric vector containing bootstrapped peak estimates.
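As a rough illustration of the bootstrapping idea described for bootStrapPeaks, the sketch below resamples a simulated Ks vector with replacement and re-estimates the main peak each time. It is not the package's implementation: the helper names (peak_of, bootstrap_peak), the use of stats::density() to locate the peak, and the simulated data are all assumptions made for this example.

```r
# Hypothetical helper: position of the highest density peak within [from, to].
peak_of <- function(x, bw = 0.1, from = 0, to = 5) {
  d <- stats::density(x, bw = bw, from = from, to = to)
  d$x[which.max(d$y)]
}

# Hypothetical helper: resample with replacement and re-estimate the peak.
bootstrap_peak <- function(ks, rep = 200, bw = 0.1, from = 0, to = 5) {
  vapply(seq_len(rep), function(i) {
    resampled <- sample(ks, length(ks), replace = TRUE)
    peak_of(resampled, bw = bw, from = from, to = to)
  }, numeric(1))
}

set.seed(1)
# Simulated Ks values (assumption: a single log-normal component).
ks_demo <- rlnorm(500, meanlog = log(0.8), sdlog = 0.2)
peaks <- bootstrap_peak(ks_demo, rep = 200)

# Approximate 95% interval for the peak position.
ci <- stats::quantile(peaks, c(0.025, 0.975))
```

The spread of the returned estimates gives an approximate confidence interval for the peak position; a larger rep narrows the Monte Carlo error at the cost of run time.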
This function takes a list of data files, calculates the Ks distribution, and returns the results.
calculateKsDistribution4wgd_multiple( files_list, binWidth = 0.1, maxK = 5, plot.mode = "weighted", include.outliers = FALSE, minK = 0, minAlnLen = 0, minIdn = 0, minCov = 0 )
files_list |
A list of file paths containing Ks data. |
binWidth |
The width of Ks bins for the distribution. |
maxK |
The maximum Ks value to consider. |
plot.mode |
The mode for plotting ("weighted", "average", "min", or "pairwise"). |
include.outliers |
Whether to include outliers in the calculation. |
minK |
The minimum Ks value to include in the distribution. |
minAlnLen |
The minimum alignment length to include in the distribution. |
minIdn |
The minimum alignment identity to include in the distribution. |
minCov |
The minimum alignment coverage to include in the distribution. |
A list containing two data frames: "bar" for Ks distribution and "density" for density data.
This function calculates the -log10 of the p-value of a Poisson distribution given the parameters.
CalHomoConcentration(m, n, q, k)
m |
The total number of trials. |
n |
The total number of possible outcomes. |
q |
The observed number of successful outcomes. |
k |
The expected number of successful outcomes. |
The -log10 of the p-value.
This function computes the P-value of a cluster using the Poisson distribution.
CalPvalue(m, n, q, k)
m |
The total number of all anchored points. |
n |
The product of the remapped gene number of the query species and subject species. |
q |
The number of anchored points in the cluster. |
k |
The product of the remapped gene number of the segmented chromosomes of the query species and subject species. |
The computed P-value.
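One plausible reading of the Poisson test behind CalPvalue and CalHomoConcentration can be sketched in base R. The sketch assumes that anchored points fall uniformly over the n possible gene pairs, so the expected count inside a cluster region of size k is m * k / n, and that the P-value is the upper tail probability of seeing at least q points. The function name poisson_cluster_pvalue, the expected-count formula, and the tail convention are assumptions; the package may define the statistic differently.

```r
# Hypothetical sketch, not the package's code.
# m: total anchored points; n: total gene pairs;
# q: anchored points observed in the cluster; k: gene pairs in the cluster region.
poisson_cluster_pvalue <- function(m, n, q, k) {
  lambda <- m * k / n                      # expected points under uniformity
  stats::ppois(q - 1, lambda, lower.tail = FALSE)  # P(X >= q)
}

# Toy numbers: lambda = 2000 * 2500 / 1e6 = 5, observed q = 15.
p <- poisson_cluster_pvalue(m = 2000, n = 1e6, q = 15, k = 2500)

# The -log10 transform turns small P-values into large scores.
neg_log10_p <- -log10(p)
```

Working on the -log10 scale is convenient when P-values for dense clusters become too small to compare directly.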
This function checks the type of GFF input file specified by its path and processes it accordingly.
check_gff_from_file(gff_input_name, gff_input_path, working_wd)
gff_input_name |
The informal name of the GFF input file. |
gff_input_path |
The path to the GFF input file. |
working_wd |
A character string specifying the working directory to be used. |
A string containing the processed GFF file's path.
This function checks the file format of a GFF/GTF input file and prepares it for analysis. It can handle both uncompressed and compressed formats.
check_gff_input(gff_input_name, gff_input_path, working_wd)
gff_input_name |
A descriptive name for the GFF/GTF file. |
gff_input_path |
The file path to the GFF/GTF file. |
working_wd |
A character string specifying the working directory to be used. |
The path to the prepared GFF file for analysis.
This function checks the type of proteome input file and processes it accordingly.
check_proteome_from_file(proteome_name, proteome_input, working_wd)
proteome_name |
The informal name of the proteome input file. |
proteome_input |
The proteome input data. |
working_wd |
A character string specifying the working directory to be used. |
A string containing the processed proteome file's path.
This function checks the type of proteome input file and processes it accordingly.
check_proteome_input(proteome_name, proteome_input, working_wd)
proteome_name |
The informal name of the proteome input file. |
proteome_input |
The proteome input data. |
working_wd |
A character string specifying the working directory to be used. |
A string containing the processed proteome file's path.
This function checks the existence of files specified in a data table.
checkFileExistence(data_table, working_wd)
data_table |
A data table with file paths in columns V2 and V3. |
working_wd |
The path to the working directory. |
This function has no return value. It prints messages to the console.
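The kind of check described for checkFileExistence can be sketched as follows. This is a minimal example, assuming that columns V2 and V3 hold file names relative to working_wd and that empty or missing entries are skipped; the helper name report_missing_files and the message text are hypothetical, and unlike the documented function it returns the missing paths invisibly so they can be inspected.

```r
# Hypothetical sketch, not the package's code.
report_missing_files <- function(data_table, working_wd) {
  paths <- unlist(data_table[, c("V2", "V3")], use.names = FALSE)
  paths <- paths[!is.na(paths) & nzchar(paths)]   # skip empty entries
  full <- file.path(working_wd, paths)
  missing <- full[!file.exists(full)]
  for (f in missing) message("File not found: ", f)
  invisible(missing)
}

# Toy working directory containing only one of the three listed files.
wd <- tempfile("wgd_demo_")
dir.create(wd)
file.create(file.path(wd, "a.fa"))
tbl <- data.frame(
  V1 = c("sp1", "sp2"),
  V2 = c("a.fa", "b.fa"),
  V3 = c("a.gff", NA),
  stringsAsFactors = FALSE
)
missing <- report_missing_files(tbl, wd)
```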
This function clusters synteny data based on calculated p-values and generates trees for both column-based and row-based clustering. It then saves the cluster information and trees to output files.
cluster_synteny( segmented_file, segmented_anchorpoints_file, genes_file, out_file )
segmented_file |
A character string specifying the file path for segmented data. |
segmented_anchorpoints_file |
A character string specifying the file path for segmented anchorpoints. |
genes_file |
A character string specifying the file path for genes information created by i-ADHoRe. |
out_file |
A character string specifying the output file path for saving cluster information. |
NULL (output files are generated with the specified information).
This function calculates the depth of anchored points based on the provided parameters.
computing_depth( anchorpoint_ks_file, multiplicon_id, selected_query_chr, selected_subject_chr = NULL )
anchorpoint_ks_file |
The file containing anchorpoint and Ks data. |
multiplicon_id |
The ID of the multiplicon to consider. |
selected_query_chr |
A list of selected query chromosomes. |
selected_subject_chr |
A list of selected subject chromosomes (optional). |
A list containing depth data frames, including "query_depth" and "subject_depth" if subject chromosomes are specified, or "depth" if not.
This function computes the depth of anchored points in a paranome comparison based on the provided parameters.
computing_depth_paranome( anchorpoint_ks_file, multiplicon_id, selected_query_chr )
anchorpoint_ks_file |
The file containing anchor point and Ks value data. |
multiplicon_id |
The IDs of the multiplicons to consider. |
selected_query_chr |
The list of selected query chromosomes. |
A list containing the depth dataframe.
This function counts ortholog genes in a given species based on input data.
CountOrthologs(atomic.df, species)
atomic.df |
A data frame containing information about ortholog genes. It should have the following columns:
- multiplicon: The multiplicon identifier.
- geneX: The gene identifier in speciesX.
- speciesX: The species name for geneX.
- listX: The chromosome or list identifier for geneX.
- coordX: The coordinate information for geneX.
- geneY: The gene identifier in speciesY.
- speciesY: The species name for geneY.
- listY: The chromosome or list identifier for geneY.
- coordY: The coordinate information for geneY.
- level: The orthology level.
- num_anchors: The number of anchors.
- is_real: A flag indicating if the data is real.
- Ks: The Ks value. |
species |
The species for which ortholog gene counts should be computed. |
A data frame summarizing the counts of ortholog genes for each chromosome.
Create Ksrates Command Files from Shiny Input
create_ksrates_cmd(input, ksratesconf, cmd_file)
input |
The Input object of Shiny. |
ksratesconf |
The path to the Ksrates configuration file. |
cmd_file |
The path to the main Ksrates command file to be generated. |
This function generates command files for running Ksrates and related analyses based on a data table and configuration file.
create_ksrates_cmd_from_table(data_table, ksratesconf, cmd_file, focal_species)
data_table |
The data table containing information about species. |
ksratesconf |
The path to the Ksrates configuration file. |
cmd_file |
The path to the main Ksrates command file to be generated. |
focal_species |
The name of the focal species. |
This function generates a Ksrates configuration file based on a data table and other parameters.
create_ksrates_configure_file_based_on_table( data_table, focal_species, newick_tree_file, ksrates_conf_file, species_info_file, working_wd )
data_table |
The data table containing information about species, proteomes, and GFF files. |
focal_species |
The name of the focal species. |
newick_tree_file |
The path to the Newick tree file. |
ksrates_conf_file |
The path to the Ksrates configuration file to be generated. |
species_info_file |
The path to the species information file. |
working_wd |
A character string specifying the working directory to be used. |
This function generates a configuration file for the Ksrates pipeline based on Shiny input.
create_ksrates_configure_file_v2(input, ksrates_conf_file, species_info_file)
input |
The Input object of Shiny. |
ksrates_conf_file |
The path to the Ksrates configuration file. |
species_info_file |
The path to the species information file. |
Create ksrates Expert Parameter File
create_ksrates_expert_parameter_file(ksrates_expert_parameter_file)
ksrates_expert_parameter_file |
The path to the file used to store the ksrates expert parameters. |
This function computes the default bandwidth range for kernel density estimation.
dfltBWrange(x, tau)
x |
The input data, which can be a numeric vector or matrix. |
tau |
A parameter used in bandwidth calculation. |
A list of bandwidth ranges for each dimension of the input data.
This function bins the input data into a regular grid.
dfltCounts( x, gridsize = rep(64, NCOL(x)), h = rep(0, NCOL(x)), supp = 3.7, range.x, w )
x |
The input data, which should be a numeric matrix. |
gridsize |
A vector specifying the number of bins along each dimension. |
h |
A vector specifying the bandwidth (smoothing parameter) along each dimension. |
supp |
A parameter for determining the range of the bins. |
range.x |
A list specifying the range of values for each dimension. |
w |
A vector of weights for the data points. |
A list containing the binned counts and the range of values for each dimension.
Use this function to create a custom download button or link. When clicked, it will initiate a browser download. The filename and contents are specified by the corresponding downloadHandler() defined in the server function.
downloadButton_custom( outputId, label = "Download", class = NULL, status = "primary", ..., icon = shiny::icon("download") )
outputId |
The name of the output slot that the downloadHandler is assigned to. |
label |
The label that should appear on the button. |
class |
Additional CSS classes to apply to the tag, if any. Default NULL. |
status |
The status of the button; default is "primary". |
... |
Other arguments to pass to the container tag function. |
icon |
An icon() to appear on the button; default is icon("download"). |
An HTML tag to allow users to download the object.
Compute the mth derivative of a binned d-variate kernel density estimate based on grid counts.
drvkde(x, drv, bandwidth, gridsize, range.x, binned = FALSE, se = TRUE, w)
x |
The input data. |
drv |
The order of the derivative to compute. |
bandwidth |
The bandwidth (smoothing parameter) along each dimension. |
gridsize |
The size of the grid. |
range.x |
A list specifying the range of values for each dimension. |
binned |
A logical indicating whether the input data is already binned. |
se |
A logical indicating whether to compute standard errors. |
w |
A vector of weights for the data points. |
A list containing the estimated density or derivative, and optionally, standard errors.
This function takes a string and splits it at tab characters. It then returns the first part of the resulting character vector.
extract_first_part(name)
name |
The input string to be split. |
Returns the first part of the input string.
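The behaviour described for extract_first_part amounts to a tab split followed by taking the first field. A minimal base R sketch, with the hypothetical name first_tab_field so as not to suggest it is the package's own code:

```r
# Hypothetical sketch: split at tab characters and keep the first field.
first_tab_field <- function(name) {
  strsplit(name, "\t", fixed = TRUE)[[1]][1]
}

first_field <- first_tab_field("gene001\tchr1\t+")   # "gene001"
no_tab <- first_tab_field("gene002")                  # unchanged: "gene002"
```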
This function extracts clusters based on the specified scaffolds for both query and subject species. It filters the data frames containing segment information and atomic anchorpoints to retain only the relevant clusters.
extractCluster(segs.df, atomic.df, scaf.bycol, scaf.byrow)
segs.df |
A data frame containing segment information. |
atomic.df |
A data frame containing atomic anchorpoints. |
scaf.bycol |
A character vector specifying scaffolds for the query species. |
scaf.byrow |
A character vector specifying scaffolds for the subject species. |
A list containing two data frames: "segs" for segment information and "atomic" for atomic anchorpoints.
This function identifies peaks in a numeric vector by analyzing the shape of the curve.
find_peaks(x, m = 3)
x |
A numeric vector in which peaks will be identified. |
m |
An integer indicating the half-width of the neighborhood to consider when identifying peaks. |
A numeric vector containing the indices of the identified peaks in the input vector x.
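To make the role of the half-width m concrete, here is a small sketch of neighborhood-based peak detection. It assumes a point counts as a peak when it is strictly larger than every other value within m positions on either side; the function name local_maxima and this exact rule (including the treatment of end points, which are compared against a shortened window) are assumptions, not the package's implementation.

```r
# Hypothetical sketch: indices of values that exceed all neighbours
# within m positions to the left and right.
local_maxima <- function(x, m = 3) {
  n <- length(x)
  idx <- integer(0)
  for (i in seq_len(n)) {
    lo <- max(1, i - m)
    hi <- min(n, i + m)
    neighbours <- x[setdiff(lo:hi, i)]
    if (length(neighbours) > 0 && all(neighbours < x[i])) {
      idx <- c(idx, i)
    }
  }
  idx
}

x <- c(1, 3, 2, 5, 9, 4, 2, 6, 1)
narrow <- local_maxima(x, m = 1)   # 2 5 8
wide <- local_maxima(x, m = 3)     # 5
```

With m = 1 every small bump is reported, while m = 3 keeps only the value that dominates a wider window, which is why larger half-widths give fewer peaks.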
This function generates Kernel Density Estimates (KDE) for the Ks (synonymous substitution rates) distribution.
generate_ksd(ks_df, bin_width = 0.01, maxK = 5)
ks_df |
A data frame containing Ks values. |
bin_width |
The width of each bin for KDE calculation. |
maxK |
The maximum Ks value for the distribution. |
A list containing the following components:
Ks: A numeric vector representing the KDE values.
bin_width: The width of each bin used for KDE calculation.
maxK: The maximum Ks value for the distribution.
This function generates a Ks (synonymous substitution rates) distribution from raw Ks values.
generateKsDistribution(ksraw, speciesName = NULL, maxK = 5)
ksraw |
A numeric vector containing raw Ks values. |
speciesName |
(Optional) A character string specifying the species name associated with the Ks values. |
maxK |
A numeric value indicating the maximum Ks value to consider in the distribution. |
A numeric vector containing the binned Ks distribution.
This function extracts segmented data from anchorpoints and Ks (synonymous substitution rate) values, based on specified criteria, and writes the results to output files.
get_segments( genes_file, anchors_ks_file, multiplicons_file, segmented_file, segmented_anchorpoints_file, num_anchors = 10 )
genes_file |
A character string specifying the file path for genes information created by i-ADHoRe. |
anchors_ks_file |
A character string specifying the file path for anchorpoints Ks values data. |
multiplicons_file |
A character string specifying the file path for multiplicons information created by i-ADHoRe. |
segmented_file |
A character string specifying the output file path for segmented data. |
segmented_anchorpoints_file |
A character string specifying the output file path for segmented anchorpoints. |
num_anchors |
An integer specifying the minimum number of anchorpoints required. |
NULL (output files are generated with the specified information).
This function checks whether a given file is in FASTA format with cds sequences.
is_fasta_cds(file_path)
file_path |
The path to the input file. |
TRUE if the file is in FASTA format with cds sequences, FALSE otherwise.
This function checks if the provided object is of class "ksv".
is.ksv(x)
x |
The object to be checked. |
Returns TRUE if the object is of class "ksv"; otherwise, returns FALSE.
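A class check of this kind is usually a one-liner around inherits(). The sketch below uses the hypothetical name is_ksv_sketch and a toy object; whether the package uses inherits() or another test is an assumption.

```r
# Hypothetical sketch of an S3 class check.
is_ksv_sketch <- function(x) inherits(x, "ksv")

# Toy object carrying the "ksv" class attribute.
obj <- structure(list(ks_df = data.frame()), class = "ksv")

is_obj <- is_ksv_sketch(obj)      # TRUE
is_list <- is_ksv_sketch(list())  # FALSE
```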
This function checks if an object is not NULL.
is.not.null(x)
x |
An R object to check. |
A logical value indicating whether the object is not NULL.
A wrapper to run emmix modeling using the mclust package.
ks_mclust_v2(input_data)
input_data |
The input data for clustering and modeling. |
A data frame containing clustering and modeling results.
This function reads information from an Excel file (XLS) containing the columns "latin_name", "informal_name", and "gff". It extracts the "latin_name" and "informal_name" columns, performs some data manipulation, and returns a data frame with these two columns.
map_informal_name_to_latin_name(sp_gff_info_xls)
sp_gff_info_xls |
The path to the Excel file containing species information. |
A data frame with "latin_name" and "informal_name" columns.
Log-normal mixture analysis of a Ks distribution for the whole paranome.
mix_logNormal_Ks(ksv, G = 1:5, k.nstart = 500, maxK = 5)
ksv |
A |
G |
An integer vector specifying the range of mixture components. A BIC is calculated for each number of components. The default is G=1:5. For a formal analysis, it is recommended to use 1:10. |
k.nstart |
How many random sets should be chosen in the k-means clustering. For a formal analysis, it is recommended to use 500. |
maxK |
Maximum Ks value used in the mixture modeling analysis. |
A data frame with seven variables.
Find the mode (peak) of a univariate distribution.
modeFinder(x, bw = 0.1, from = 0, to = 5)
x |
A numeric vector or a kernel density estimate (KDE). |
bw |
Bandwidth for the KDE. Default is 0.1. |
from |
Starting point for mode search. Default is 0. |
to |
Ending point for mode search. Default is 5. |
The mode (peak) of the distribution.
Process species information file and extract chromosome lengths and mRNA counts from GFF files.
obtain_chromosome_length(species_info_file)
species_info_file |
A character string specifying the path to the species information file. |
A list containing two data frames: len_df for chromosome lengths and num_df for mRNA counts.
Process a data frame containing species information and extract chromosome lengths and mRNA counts from GFF files.
obtain_chromosome_length_filter(species_info_df)
species_info_df |
A data frame containing species information with columns "sp", "cds", and "gff". |
A list containing two data frames: len_df for chromosome lengths and num_df for mRNA counts.
This function takes a file containing anchorpoints, GFF files for two species, and species names, and retrieves the coordinates of anchorpoints and associated genes from the GFF files.
obtain_coordiantes_for_anchorpoints( anchorpoints, species1, gff_file1, out_file, species2 = NULL, gff_file2 = NULL )
anchorpoints |
A file containing anchorpoints information with columns like gene_x, gene_y, and other relevant data. |
species1 |
The name of the first species. |
gff_file1 |
The path to the GFF file for the first species. |
out_file |
The output file where the results will be saved. |
species2 |
(Optional) The name of the second species. Specify this parameter and gff_file2 if working with two species. |
gff_file2 |
(Optional) The path to the GFF file for the second species. |
None. The function saves the results to the specified out_file.
This function extracts coordinates and Ks (synonymous substitution rate) values for anchorpoints from input data and merges them into a single output file.
obtain_coordiantes_for_anchorpoints_ks( anchorpoints, anchorpoints_ks, genes_file, out_file, out_ks_file, species )
anchorpoints |
A character string specifying the file path for anchorpoints data. |
anchorpoints_ks |
A character string specifying the file path for anchorpoints Ks values data. |
genes_file |
A character string specifying the file path for genes information. |
out_file |
A character string specifying the output file path for coordinates. |
out_ks_file |
A character string specifying the output file path for Ks values. |
species |
A character string specifying the species name. |
NULL (output files are generated with the specified information).
This function retrieves the coordinates for segments in a comparison based on the provided parameters.
obtain_coordiantes_for_segments( seg_file, sp1, gff_file1, out_file, sp2 = NULL, gff_file2 = NULL )
seg_file |
The file containing segment data. |
sp1 |
The species name for the first genome. |
gff_file1 |
The GFF file for the first genome. |
out_file |
The output file to store the merged position data. |
sp2 |
The species name for the second genome (optional). |
gff_file2 |
The GFF file for the second genome (optional). |
NULL (the results are saved in the output file).
This function extracts coordinates for segments within multiple synteny blocks based on input dataframes.
obtain_coordinates_for_segments_multiple(seg_df, gff_df, input, out_file)
seg_df |
A dataframe containing information about synteny segments. |
gff_df |
A dataframe containing GFF (General Feature Format) information. |
input |
A list containing input data, typically multiple synteny query chromosomes. |
out_file |
A character string specifying the output file path. |
A dataframe with coordinates for segments within multiple synteny blocks.
This function takes as input a multiplicon file, an anchorpoint file, Ks values, and other relevant information. It calculates the mean of Ks values for each multiplicon and associates them with the corresponding data.
obtain_mean_ks_for_each_multiplicon( multiplicon_file, anchorpoint_file, species1, ks_file, outfile, anchorpointout_file, species2 = NULL )
multiplicon_file |
A file containing multiplicon information. |
anchorpoint_file |
A file containing anchorpoints information with columns like geneX, geneY, and other relevant data. |
species1 |
The name of the first species. |
ks_file |
A file containing Ks values. |
outfile |
The output file where the results will be saved. |
anchorpointout_file |
The output file for anchorpoint data with Ks values. |
species2 |
(Optional) The name of the second species. Specify this parameter and ks_file if working with two species. |
None. The function saves the results to the specified outfile and anchorpointout_file.
Read the EMMIX output for a range of components
parse_EMMIX(emmix.out, G = 1:3)
emmix.out |
The output file from EMMIX software. |
G |
An integer vector specifying the range of the mixture components. The default is G=1:3. |
A data frame with seven variables.
Read the EMMIX output for a specified number of components
parse_one_EMMIX(emmix.out, ncomponent = 3)
emmix.out |
The output file from EMMIX software. |
ncomponent |
Number of components to read from the file. |
A data frame with seven variables.
This function identifies peaks in a distribution of Ks (synonymous substitution rates) values.
PeaksInKsDistributionValues( ks, binWidth = 0.1, maxK = 5, m = 3, peak.maxK = 2, spar = 0.25 )
ks |
A numeric vector containing Ks values for which peaks will be identified. |
binWidth |
A numeric value specifying the bin width for creating the histogram. |
maxK |
A numeric value indicating the maximum Ks value to consider. |
m |
An integer indicating the half-width of the neighborhood to consider when identifying peaks. |
peak.maxK |
A numeric value specifying the maximum Ks value to consider when identifying peaks. |
spar |
A numeric value controlling the smoothness of the spline fit. Higher values make the fit smoother. |
A numeric vector containing the identified peaks in the Ks distribution.
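The parameters binWidth, spar, and peak.maxK suggest a histogram that is smoothed with a spline before peaks are read off. The sketch below shows one way such a pipeline could look in base R. It is an assumption-laden illustration, not the package's code: the function name ks_peaks_sketch, the use of graphics::hist() and stats::smooth.spline(), the simple sign-change rule for local maxima, and the simulated data are all choices made for this example.

```r
# Hypothetical sketch: histogram -> smoothing spline -> interior local maxima.
ks_peaks_sketch <- function(ks, binWidth = 0.1, maxK = 5,
                            peak.maxK = 2, spar = 0.25) {
  ks <- ks[ks > 0 & ks < maxK]                     # keep values inside the bins
  h <- graphics::hist(ks, breaks = seq(0, maxK, by = binWidth), plot = FALSE)
  fit <- stats::smooth.spline(h$mids, h$counts, spar = spar)
  y <- fit$y
  # A bin is a local maximum when the slope changes from rising to falling.
  is_max <- c(FALSE, diff(sign(diff(y))) == -2, FALSE)
  peaks <- fit$x[is_max]
  peaks[peaks <= peak.maxK]                        # report only peaks up to peak.maxK
}

set.seed(42)
# Simulated Ks values: a log-normal "WGD" component plus an older background.
ks_demo <- c(rlnorm(2000, log(0.7), 0.15), rexp(1000, rate = 1) + 2.5)
peaks <- ks_peaks_sketch(ks_demo)
```

With light smoothing (small spar) the spline can also wiggle in nearly empty bins and report minor maxima there, so in practice a larger spar or a minimum height filter may be needed.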
This function reads data from an uploaded file in a Shiny application and returns it as a data frame.
read_data_file(uploadfile)
uploadfile |
The object representing the uploaded file obtained through the Shiny upload function. |
A data frame containing the data from the uploaded file.
Read the output file of wgd ksd
read.wgd_ksd( file, include_outliers = FALSE, min_ks = 0, min_aln_len = 0, min_idn = 0, min_cov = 0 )
file |
The output file of |
include_outliers |
Include outliers or not, default FALSE. |
min_ks |
Minimum Ks value, default 0. |
min_aln_len |
Minimum alignment length, default 0. |
min_idn |
Minimum alignment identity, default 0. |
min_cov |
Minimum alignment coverage, default 0. |
A ksv object, which is a list including:
ks_df: the data frame used for subsequent analysis
ks_dist: a list including a vector of Ks values in the distribution
raw_df: the raw data
filters: the filters applied to the raw data
Compute relative rates using input data files and statistical computations.
relativeRate( ksv2out_1_file, ksv2out_2_file, ksv_between_file, KsMax, low = 0.025, up = 0.975, bs = 1000 )
ksv2out_1_file |
A character string specifying the path to the first input data file. |
ksv2out_2_file |
A character string specifying the path to the second input data file. |
ksv_between_file |
A character string specifying the path to the third input data file. |
KsMax |
A numeric value representing a maximum threshold for Ks values. |
low |
A numeric value specifying the lower quantile for bootstrapping. Default is 0.025. |
up |
A numeric value specifying the upper quantile for bootstrapping. Default is 0.975. |
bs |
An integer specifying the number of bootstrap iterations. Default is 1000. |
A list containing computed relative rates and their confidence intervals.
This function removes a gene whose sequence contains internal stop codons (TAA, TAG, TGA, taa, tag, tga).
remove_inner_stop_codon_sequence(sequence)
sequence |
A nucleotide sequence as a character string. |
A character string or NULL.
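A check for internal stop codons can be sketched as below. The example assumes the sequence is read in frame from its first base, that the final codon is allowed to be a stop, and that NULL signals a rejected sequence; the name drop_if_internal_stop and these conventions are assumptions rather than the package's documented behaviour.

```r
# Hypothetical sketch: return the sequence unless an in-frame stop codon
# occurs before the final codon, in which case return NULL.
drop_if_internal_stop <- function(sequence) {
  n_codons <- nchar(sequence) %/% 3
  if (n_codons < 2) return(sequence)
  starts <- seq(1, by = 3, length.out = n_codons - 1)  # skip the last codon
  codons <- toupper(substring(sequence, starts, starts + 2))
  if (any(codons %in% c("TAA", "TAG", "TGA"))) NULL else sequence
}

kept <- drop_if_internal_stop("ATGAAATAA")        # terminal stop only: kept
dropped <- drop_if_internal_stop("ATGTAAAAATAA")  # TAA at codon 2: NULL
dropped_lower <- drop_if_internal_stop("atgtgaaaa")  # lower-case tga: NULL
```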
This function removes directories in the specified base directory that are older than a specified maximum age in days. It logs the removed directories and any errors encountered during removal.
remove_old_dirs( base_dir, max_age_in_days = 3, log_file = "remove_old_dirs.log", verbose = FALSE )
base_dir |
The base directory to search for old directories. |
max_age_in_days |
The maximum age (in days) for directories to be considered old. |
log_file |
The name of the log file to store information about removed directories and errors. |
verbose |
A logical value indicating whether to print messages to the console. |
The function does not return anything. It logs information about removed directories and errors.
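The age-based cleanup can be sketched with base R file utilities. This is a simplified illustration that assumes directory age is judged by modification time and that only the immediate subdirectories of base_dir are considered; logging is omitted, and the name prune_old_dirs is hypothetical. Unlike the documented function, the sketch returns the removed paths invisibly so the result can be checked.

```r
# Hypothetical sketch: delete subdirectories older than max_age_in_days.
prune_old_dirs <- function(base_dir, max_age_in_days = 3) {
  dirs <- list.dirs(base_dir, full.names = TRUE, recursive = FALSE)
  if (length(dirs) == 0) return(invisible(character(0)))
  age_days <- as.numeric(
    difftime(Sys.time(), file.info(dirs)$mtime, units = "days")
  )
  old <- dirs[age_days > max_age_in_days]
  # unlink() returns 0 on success; test each directory separately.
  ok <- vapply(old, function(d) unlink(d, recursive = TRUE) == 0, logical(1))
  invisible(old[ok])
}

# Toy layout: one directory back-dated by five days, one fresh.
base <- tempfile("wgd_base_")
dir.create(base)
dir.create(file.path(base, "old_run"))
dir.create(file.path(base, "new_run"))
Sys.setFileTime(file.path(base, "old_run"), Sys.time() - 5 * 86400)

removed <- prune_old_dirs(base, max_age_in_days = 3)
```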
This function takes a data frame names_df containing "latin_name" and "informal_name" columns and an input string as input. It replaces informal species names in the input string with their corresponding Latin names based on the information in names_df. If the input string contains underscores ("_"), it assumes a comparison between two species and replaces both informal names. Otherwise, it replaces the informal name in the input string.
replace_informal_name_to_latin_name(names_df, input)
names_df |
A data frame with "latin_name" and "informal_name" columns. |
input |
The input string that may contain informal species names. |
A modified input string with informal names replaced by Latin names.
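The lookup described above can be sketched as follows. This is a hedged illustration rather than the package's implementation: `to_latin_name` is a hypothetical name, unknown names are left unchanged, and rejoining the two species with "_" is an assumption about the output format.

```r
# Hypothetical helper: map informal species names to Latin names.
to_latin_name <- function(names_df, input) {
  lookup <- setNames(as.character(names_df$latin_name),
                     as.character(names_df$informal_name))
  # Replace one name if it is known, otherwise return it unchanged.
  swap <- function(x) if (x %in% names(lookup)) lookup[[x]] else x
  if (grepl("_", input, fixed = TRUE)) {
    # An underscore is read as a comparison between two species.
    parts <- strsplit(input, "_", fixed = TRUE)[[1]]
    paste(vapply(parts, swap, character(1)), collapse = "_")
  } else {
    swap(input)
  }
}

names_df <- data.frame(
  latin_name    = c("Arabidopsis thaliana", "Vitis vinifera"),
  informal_name = c("Ath", "Vvi")
)
single <- to_latin_name(names_df, "Ath")
pair   <- to_latin_name(names_df, "Ath_Vvi")
```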
This function resamples a given Ks (synonymous substitution rates) distribution.
resampleKsDistribution(ks, maxK = 5)
ks |
A numeric vector representing the Ks distribution to be resampled. |
maxK |
A numeric value indicating the maximum Ks value to consider in the distribution. |
A numeric vector containing a resampled Ks distribution.
A wrapper to run EM analysis of \(ln\) Ks values with k-means
run_emmix_kmeas(v, k.centers = 2, k.nstart = 500)
v |
A list that includes a vector of Ks values, namely |
k.centers |
Number of k-means centers, default 2. |
k.nstart |
Number of random starts for k-means clustering, default 500 (as in the usage above), which is the value recommended for a formal analysis. |
A list, i.e., the original output of mclust::emV
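The k-means step on log-transformed Ks values can be sketched with base R alone. The example below is an assumption about how such a step might look, not the wrapper's actual code: it uses simulated data, stops before the EM fit, and takes a small `nstart` only to keep the run short.

```r
set.seed(42)
# Simulated Ks values drawn from two log-normal components (illustration only).
ks <- c(rlnorm(300, meanlog = log(0.3), sdlog = 0.2),
        rlnorm(300, meanlog = log(1.5), sdlog = 0.2))

# Cluster on ln(Ks); zero or negative Ks values would need filtering first.
km <- stats::kmeans(log(ks), centers = 2, nstart = 25)

# Back-transform the cluster centres to the Ks scale.
centers_ks <- sort(exp(km$centers[, 1]))
```

The resulting cluster assignments in `km$cluster` could then seed an EM fit of a two-component mixture.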
The main function to launch the Shiny application for whole genome duplication analysis. This function initializes the app and opens a Shiny interface that allows users to interactively analyze whole-genome duplication data.
runshinyWGD()
No return value. This function is called for side effects, which include starting the Shiny application. The function launches a Shiny app in a web browser, where users can interact with the whole genome duplication analysis.
This function computes the significance of features based on gradient and curvature analysis.
SignifFeatureRegion( n, d, gcounts, gridsize, dest, bandwidth, signifLevel, range.x, grad = TRUE, curv = TRUE, neg.curv.only = TRUE )
n |
The sample size. |
d |
The dimensionality of the data. |
gcounts |
A numeric vector representing data counts. |
gridsize |
A numeric vector specifying the grid size. |
dest |
A kernel density estimate. |
bandwidth |
The bandwidth parameter. |
signifLevel |
The significance level. |
range.x |
The range of x values. |
grad |
A logical value indicating whether to compute the gradient significance. |
curv |
A logical value indicating whether to compute the curvature significance. |
neg.curv.only |
A logical value indicating whether to consider negative curvature only. |
A list containing the significance results for gradient and curvature.
The SiZer (Significant Zero Crossings) method assesses the statistical significance of zero crossings of the derivative of a kernel density estimate, which indicates whether the peaks and valleys of the estimated density are real features of the data.
SiZer(x, bw, gridsize, signifLevel = 0.05)
x |
A numeric vector containing the data for which you want to calculate SiZer. |
bw |
Bandwidth parameter for kernel density estimation. If not provided, default values are used. |
gridsize |
A vector specifying the grid size for SiZer. Default is c(401, 151). |
signifLevel |
The significance level for SiZer. Default is 0.05. |
A list containing SiZer results, including the SiZer curve, the SiZer map, and the bandwidth.
Perform symmetric convolution using FFT.
symconv.ks(rr, ss, skewflag)
rr |
The first input vector. |
ss |
The second input vector. |
skewflag |
A scalar value to apply skew correction. |
A vector representing the result of the symmetric convolution.
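The general idea of convolving two vectors through the FFT can be sketched in base R. This is a generic linear convolution under stated assumptions, not the body of `symconv.ks`: `fft_convolve` is a hypothetical name, and the symmetric padding and skew correction are not reproduced.

```r
# Hypothetical helper: linear convolution of two numeric vectors via the FFT.
fft_convolve <- function(a, b) {
  n <- length(a) + length(b) - 1      # length of the full linear convolution
  P <- 2^ceiling(log2(n))             # pad to a power of two
  A <- fft(c(a, rep(0, P - length(a))))
  B <- fft(c(b, rep(0, P - length(b))))
  # Multiply in the frequency domain, invert, and undo R's unnormalised inverse.
  Re(fft(A * B, inverse = TRUE))[seq_len(n)] / P
}

a <- c(1, 2, 3)
b <- c(0, 1, 0.5)
out <- fft_convolve(a, b)
```

Zero-padding to at least `length(a) + length(b) - 1` prevents the circular wrap-around that an unpadded FFT product would introduce.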
Perform symmetric 2D convolution using FFT.
symconv2D.ks(rr, ss, skewflag = rep(1, 2))
rr |
The first input matrix. |
ss |
The second input matrix. |
skewflag |
A vector of two scalar values for skew correction along each dimension. |
A matrix representing the result of the symmetric 2D convolution.
Perform symmetric 3D convolution using FFT.
symconv3D.ks(rr, ss, skewflag = rep(1, 3))
rr |
The first input 3D array. |
ss |
The second input 3D array. |
skewflag |
A vector of three scalar values for skew correction along each dimension. |
A 3D array representing the result of the symmetric 3D convolution.
Perform symmetric 4D convolution using FFT.
symconv4D.ks(rr, ss, skewflag = rep(1, 4), fftflag = rep(TRUE, 2))
rr |
The first input 4D array. |
ss |
The second input 4D array. |
skewflag |
A vector of four scalar values for skew correction along each dimension. |
fftflag |
A vector of two logical values used as FFT flags. |
A 4D array representing the result of the symmetric 4D convolution.
This function takes a file of species names as input and a prefix used to name the output.
TimeTreeFecher(input_file, prefix)
input_file |
A character string specifying the path to the file containing species names. |
prefix |
A character string providing the prefix for the output file. |
A timetree object representing the estimated divergence times between species.