find.points {seq2R}    R Documentation

Simple method to detect compositional changes in genomic sequences

Description

find.points is used to detect changes in the composition of genomic sequences. The method is based on fitting nonparametric models using local linear kernel smoothers.

Usage

find.points(x, kbin = 300, p = 3, bandwidth = -1, weights = 1, nboot = 100,
            kernel = "gaussian", n.bandwidths = 20, seed = NULL, ...)

Arguments

x

Sequences to be analyzed, in binary form (previously converted with the change.binary function).

kbin

The number of binning nodes over which the function is to be estimated.

p

Degree of the polynomial. By default p=3.

bandwidth

The kernel bandwidth or smoothing parameter. Larger values of bandwidth give smoother estimates; smaller values give rougher estimates. With the default, bandwidth = -1, the bandwidth is computed by cross-validation.

weights

Weights.

nboot

Number of bootstrap repeats.

kernel

Character string denoting the kernel function (a symmetric density). By default kernel = "gaussian", that is, the Gaussian density function. The Epanechnikov and triangular kernels are also available, with kernel = "Epanech" and kernel = "triang", respectively.

n.bandwidths

Number of bandwidths in the grid, which spans the range between 0 and 1. The optimum bandwidth is selected from this grid by cross-validation. If the optimum bandwidth is close to 0, the estimates are rough; if it is close to 1, the estimates are smooth.

seed

Seed to be used in the bootstrap procedure.

...

Other options.

Details

For each genomic sequence, the AT and CG skew profiles are calculated as A vs. T = (A-T)/(A+T) and C vs. G = (C-G)/(C+G), where A, T, C and G denote the number of occurrences of the corresponding nucleotide.
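
The two skew formulas can be illustrated with a minimal base R sketch. The helper skew_values below is hypothetical and is not part of seq2R; it assumes the sequence is given as a character vector of single bases and computes a single pair of skews from plain nucleotide counts. It does not reproduce the binary coding or the kernel smoothing that find.points applies.

```r
# Hypothetical helper (not part of seq2R): AT and CG skews of a set of bases.
# Assumes `bases` is a character vector with one nucleotide per element.
skew_values <- function(bases) {
  bases <- toupper(bases)
  # Count the occurrences of each nucleotide
  n <- vapply(c("A", "C", "G", "T"),
              function(b) sum(bases == b),
              numeric(1))
  c(AT = (n[["A"]] - n[["T"]]) / (n[["A"]] + n[["T"]]),
    CG = (n[["C"]] - n[["G"]]) / (n[["C"]] + n[["G"]]))
}

# Toy window with A = 3, T = 2, C = 2, G = 3
window <- strsplit("AATACGGGTC", "")[[1]]
skew_values(window)   # AT = 0.2, CG = -0.2
```

A positive value indicates an excess of the first nucleotide of the pair (A or C) and a negative value an excess of the second (T or G). If a window contains neither base of a pair, the denominator is zero and the result is NaN.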

Value

The function computes and returns a list with summary information for a fitted change.points object.

Number of A-T base pairs

Total number of adenine and thymine nucleotides contained in the analyzed sequence.

Number of C-G base pairs

Total number of cytosine and guanine nucleotides contained in the analyzed sequence.

Number of binning nodes

The number of binning nodes over which the function is to be estimated.

Number of bootstrap repeats

Number of bootstrap repeats.

Bandwidth

Value of the kernel bandwidth or smoothing parameter used when fitting the A vs. T and C vs. G profiles.

Exists any critical point

Indicates whether or not any critical point was found.

Author(s)

Nora M. Villanueva and Marta Sestelo.

References

N. M. Villanueva, M. Sestelo, M. M. Fonseca and J. Roca-Pardinas (2023). seq2R: An R package to detect change points in DNA sequences. Mathematics, 11 (10), 2299.

Examples

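library(seq2R)

# Alternatively, download the sequence from GenBank:
# mtDNAhum <- read.genbank("NC_012920")
data(mtDNAhum)                # load the mtDNAhum data set
DNA <- transform(mtDNAhum)    # convert the sequence for use with find.points
seq1 <- find.points(DNA)      # fit the model with the default arguments
seq1                          # print the summary of the fitted object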

[Package seq2R version 2.0.1 Index]