Package 'cord'

Title: Community Estimation in G-Models via CORD
Description: Partitions data points (variables) into communities/clusters, in the manner of clustering algorithms such as k-means and hierarchical clustering. The package implements a clustering algorithm based on a new metric, CORD, defined for high-dimensional parametric or semi-parametric distributions. See http://arxiv.org/abs/1508.01939 for more details.
Authors: Xi (Rossi) LUO, Florentina Bunea, Christophe Giraud
Maintainer: Xi (Rossi) LUO <[email protected]>
License: GPL-3
Version: 0.1.1
Built: 2024-10-24 03:42:59 UTC
Source: https://github.com/cran/cord

Help Index


Community estimation in G-models via CORD

Description

Partition data points (variables) into clusters/communities. Reference: Bunea, F., Giraud, C., & Luo, X. (2015). Community estimation in G-models via CORD. arXiv preprint arXiv:1508.01939. http://arxiv.org/abs/1508.01939.

Usage

cord(X, tau = 2 * sqrt(log(ncol(X))/nrow(X)), kendall = T,
  input = c("data", "cor", "dist"))

Arguments

X

Input data matrix. It should be an n (samples) by p (variables) matrix when input is "data" (the default). It can also be a p by p symmetric correlation matrix or distance matrix, provided input is set to "cor" or "dist" accordingly.

tau

Threshold to use at each iteration. A theoretical choice is about $2 n^{-1/2} \log^{1/2} p$.
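As a small sketch of this choice, the threshold can be computed by hand in base R. The dimensions n = 200 and p = 10 are illustrative values chosen for this example, not package defaults.

```r
# Illustrative dimensions (assumed for this sketch only)
n <- 200  # number of samples
p <- 10   # number of variables

# Theoretical threshold: 2 * n^(-1/2) * log(p)^(1/2)
tau <- 2 * sqrt(log(p) / n)

# The same quantity written as in the default argument shown under Usage,
# evaluated on an n by p placeholder matrix
X <- matrix(0, n, p)
tau_default <- 2 * sqrt(log(ncol(X)) / nrow(X))

print(c(tau = tau, tau_default = tau_default))
```

Both expressions agree whenever X is the n by p data matrix, since ncol(X) is p and nrow(X) is n.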

kendall

Whether to compute Kendall's tau correlation matrix from X when input is set to "data". If FALSE, Pearson's correlation is computed instead, which is usually faster for large p.
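For orientation, the two correlation types named here can be computed directly with stats::cor in base R. This is only a sketch on simulated data; whether cord() applies any further transformation to these matrices internally is not stated on this page.

```r
set.seed(1)
# Simulated n = 100 by p = 5 data matrix, for illustration only
X <- matrix(rnorm(100 * 5), 100, 5)

# Rank-based Kendall's tau correlation matrix (p by p)
K <- cor(X, method = "kendall")

# Pearson correlation matrix (p by p), the default method of cor()
P <- cor(X)

print(round(K, 2))
print(round(P, 2))
```

Both results are symmetric p by p matrices with ones on the diagonal.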

input

Type of input X. It should be set to "data" when X is an n (samples) by p (variables) matrix. If X is a correlation matrix or a distance matrix, it should be set to "cor" or "dist" respectively.
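A minimal sketch of the "cor" input follows, assuming the cord package is installed (the call is skipped otherwise). The data are simulated for illustration. tau is passed explicitly here because the default shown under Usage is written in terms of nrow(X), which for a p by p matrix would presumably be p rather than the sample size n; this is an inference from the usage line, not documented behavior.

```r
set.seed(1)
n <- 200  # number of samples (illustrative)
p <- 10   # number of variables (illustrative)
X <- matrix(rnorm(n * p), n, p)

# p by p Pearson correlation matrix computed outside the package
C <- cor(X)

# Threshold based on the actual sample size n, not on nrow(C)
tau <- 2 * sqrt(log(p) / n)

# Call cord() only when the package is available
if (requireNamespace("cord", quietly = TRUE)) {
  fit <- cord::cord(C, tau = tau, input = "cor")
  print(fit)
}
```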

Value

A list with one element: an integer vector giving the cluster/community to which each point (variable) is assigned.

Examples

set.seed(100)
X <- 2*matrix(rnorm(200*2), 200, 10)+matrix(rnorm(200*10), 200, 10)
cord(X)
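The sketch below unpacks the example above. Because rnorm(200*2) supplies only 400 values for a 200 by 10 matrix, R recycles them column by column, so the odd-numbered columns share one signal and the even-numbered columns share another. The cord() call is guarded and assumes the package is installed; the result is accessed by position since this page does not name the list element.

```r
set.seed(100)

# 400 values recycled into 2000 cells: columns 1, 3, 5, ... repeat the
# first 200 draws, and columns 2, 4, 6, ... repeat the last 200 draws
signal <- matrix(rnorm(200 * 2), 200, 10)
print(identical(signal[, 1], signal[, 3]))  # TRUE
print(identical(signal[, 2], signal[, 4]))  # TRUE
print(identical(signal[, 1], signal[, 2]))  # FALSE

# Add independent noise to every entry, as in the example above
X <- 2 * signal + matrix(rnorm(200 * 10), 200, 10)

if (requireNamespace("cord", quietly = TRUE)) {
  fit <- cord::cord(X)
  # First (and only) list element: cluster label for each of the 10 variables
  print(table(fit[[1]]))
}
```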