Correspondence Analysis Implementation? #110

Open
Heladio-ac opened this issue Oct 13, 2019 · 6 comments

Comments

@Heladio-ac

Are there any plans to implement Correspondence Analysis and Multiple Correspondence Analysis?

@Heladio-ac Heladio-ac changed the title Correspondence Analysis Implementation Correspondence Analysis Implementation? Oct 13, 2019
@wildart
Collaborator

wildart commented Oct 14, 2019

It is not in the plans, as I have no knowledge of it. Is this some kind of variation of PCA? If you have something written, your PR is always welcome.

@Heladio-ac
Author

It is related to PCA: it makes a PCA-style analysis possible for categorical data by working on contingency tables. I'll try to work on it!
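
For readers unfamiliar with the input format, here is a minimal sketch of how a contingency table can be built from two categorical variables. The variable names and observations (`hair`, `eyes`) are made up for illustration and do not come from this thread.

```julia
# Two hypothetical categorical variables observed on the same 8 individuals
hair = ["dark", "fair", "dark", "red", "fair", "dark", "red", "fair"]
eyes = ["brown", "blue", "brown", "green", "blue", "blue", "green", "brown"]

# Sorted category levels define the row and column order of the table
rowlevels = sort(unique(hair))
collevels = sort(unique(eyes))

# Cross-tabulate: N[i, j] counts individuals with row level i and column level j
N = zeros(Int, length(rowlevels), length(collevels))
for (h, e) in zip(hair, eyes)
    i = findfirst(==(h), rowlevels)
    j = findfirst(==(e), collevels)
    N[i, j] += 1
end
```

Correspondence analysis then operates on a table of counts like `N` rather than on the raw categorical observations.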

@atantos

atantos commented May 24, 2021

Any news on that front?

@ZekeMarshall

ZekeMarshall commented Feb 7, 2024

I'm working on this at the moment and will submit a pull request when I find the time to finish it off. In the meantime, here is a barebones function. It follows the computational algorithm outlined in Appendix A of Greenacre (2017) and implemented in the R function ca::ca() (Nenadic and Greenacre, 2007).

I've checked in a Quarto notebook, using base::all.equal(), that the standard coordinates returned by this function match those produced by ca::ca() on the dune dataset bundled with the R package vegan.

using NamedArrays
using LinearAlgebra

function correspondence_analysis(N::NamedMatrix)

  # A.1 Create the correspondence matrix
  P = N / sum(N)

  # A.2 Calculate row and column masses
  r = vec(sum(P, dims = 2))
  c = vec(sum(P, dims = 1))

  # A.3 Diagonal matrices of row and column masses
  Dr = Diagonal(r)
  Dc = Diagonal(c)

  # A.4 Calculate the matrix of standardized residuals
  SR = Dr^(-1/2) * (P - r * transpose(c)) * Dc^(-1/2)

  # A.5 Calculate the singular value decomposition (SVD) of SR.
  # The result is bound to `decomp` so that the name `svd` is not shadowed.
  decomp = LinearAlgebra.svd(SR)
  U = decomp.U
  V = decomp.V
  S = decomp.S
  D = Diagonal(S)

  # The thin SVD returns min(rows, columns) dimensions, so the labels are
  # derived from length(S). Using size(N, 1) fails when N has more rows
  # than columns.
  dimlabels = ["Dim$i" for i in 1:length(S)]

  # A.6 Standard coordinates Φ of rows
  Φ = NamedArray(Dr^(-1/2) * U, names = (names(N)[1], dimlabels), dimnames = ("Row", "Dimension"))

  # A.7 Standard coordinates Γ of columns
  Γ = NamedArray(Dc^(-1/2) * V, names = (names(N)[2], dimlabels), dimnames = ("Column", "Dimension"))

  # A.8 Principal coordinates F of rows
  F = Dr^(-1/2) * U * D

  # A.9 Principal coordinates G of columns
  G = Dc^(-1/2) * V * D

  # F and G are returned as well, so the values computed in A.8 and A.9
  # are not discarded.
  results = (sv = D,
             rownames = names(N)[1],
             rowmass = r,
             rowcoord = Φ,
             rowprincipal = F,
             colnames = names(N)[2],
             colmass = c,
             colcoord = Γ,
             colprincipal = G
            )

  return results

end
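
As a sanity check on steps A.1 to A.5, the sketch below uses only LinearAlgebra and plain matrices (no NamedArrays) to confirm a standard identity: the sum of the squared singular values of the standardized residual matrix (the total inertia) equals Pearson's chi-square statistic divided by the grand total. The 3×4 table of counts is made up for illustration; it is not the dune dataset mentioned above.

```julia
using LinearAlgebra

# Hypothetical 3×4 contingency table (made-up counts, for illustration only)
N = [20 10  5  5;
      8 16 12  4;
      2  6 14 18]

n = sum(N)
P = N ./ n                       # A.1 correspondence matrix
r = vec(sum(P, dims = 2))        # A.2 row masses
c = vec(sum(P, dims = 1))        # A.2 column masses

# A.4 standardized residuals: (P - r c') scaled by 1/sqrt(r_i * c_j)
SR = Diagonal(r .^ -0.5) * (P .- r * c') * Diagonal(c .^ -0.5)

# Total inertia as the sum of squared singular values
sv = svdvals(SR)
inertia_svd = sum(abs2, sv)

# Total inertia as Pearson's chi-square statistic divided by n
E = n .* (r * c')                # expected counts under independence
chi2 = sum((N .- E) .^ 2 ./ E)
inertia_chi2 = chi2 / n

@assert isapprox(inertia_svd, inertia_chi2; atol = 1e-12)
```

The table has more columns than rows, so `svdvals` returns three values here, and the smallest is numerically zero because the residual matrix loses one dimension to centering.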

References

Greenacre, Michael. 2017. Correspondence Analysis in Practice, Third Edition. CRC Press.
Nenadic, Oleg, and Michael Greenacre. 2007. “Correspondence Analysis in R, with Two- and Three-dimensional Graphics: The ca Package.” Journal of Statistical Software 20 (February): 1–13. https://doi.org/10.18637/jss.v020.i03.

@atantos

atantos commented Feb 11, 2024

Looking forward to seeing the commit! Until then, I will be experimenting with the function here. Thanks!

@FlyingWorkshop

I developed a package (ExpFamilyPCA.jl) with Trevor Hastie and Mykel Kochenderfer that implements exponential family PCA (EPCA), which is similar to correspondence analysis when used with Poisson loss. The Poisson EPCA objective is the generalized KL divergence, which makes it appropriate for compressing frequency data, percentage data, and discrete distribution data (i.e., probability profiles), much like correspondence analysis.
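
To make the objective concrete, here is a minimal sketch of the generalized KL divergence itself, written as a standalone function. The name `generalized_kl` is hypothetical and this is not the ExpFamilyPCA.jl API; it only illustrates the quantity being minimized, assuming nonnegative data `x` and a strictly positive reconstruction `y`.

```julia
# Generalized KL divergence: sum over entries of x*log(x/y) - x + y.
# `x` holds nonnegative data, `y` a strictly positive reconstruction.
# Hypothetical helper for illustration, not part of any package.
function generalized_kl(x, y)
    total = 0.0
    for (xi, yi) in zip(x, y)
        # Convention: 0 * log(0 / y) is taken to be 0
        term = xi > 0 ? xi * log(xi / yi) : 0.0
        total += term - xi + yi
    end
    return total
end

# The divergence is zero when the reconstruction matches the data exactly
@assert generalized_kl([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) == 0.0
```

Unlike the ordinary KL divergence, the extra `- x + y` terms keep the value nonnegative even when `x` and `y` do not sum to one, which is why it can be applied to raw counts.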

5 participants