Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

showRaw() not working when rows argument is vector containing row names #18

Open
ellisifnygaard opened this issue Nov 11, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@ellisifnygaard
Copy link
Collaborator

When I use showRaw() with a rows argument consisting of a vector containing row names, I get the following error:

Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent

Running traceback() results in:

2: rownames<- (*tmp*, value = data_summary$rownames[rows])

It appears to stem from this line:
https://github.com/jromanowska/HaplinMethyl/blob/7fc4afdc0c45f38321896fb55323bf1d54dc9372/R/showRaw.R#L98C50-L98C54

There are no issues when rows is integers, or when columns is integers or a vector with column names.

Perhaps rownames(show_matrix) <- data_summary$rownames[rows] should be replaced by an if-else statement conditional on whether all(is.character(rows)) is TRUE or FALSE. If TRUE, use rownames(show_matrix) <- rows, and if FALSE, then use rownames(show_matrix) <- data_summary$rownames[rows].

@ellisifnygaard ellisifnygaard added the bug Something isn't working label Nov 11, 2023
@ellisifnygaard
Copy link
Collaborator Author

It looks like the problem might stem from how Haplin's internal f.get.gen.data.cols() works, which is used on line 86. It returns an ff_matrix that has column names, but no row names.
https://github.com/cran/Haplin/blob/8bb309fc06002300850c4cf0a2237867e39eae69/R/f.get.gen.data.cols.R

It makes sense for Haplin:::f.get.gen.data.cols() to do this, considering that the ff reference manual states that

It is recommended not to assign row.names to a large ffdf object.

and I assume this may apply to ff objects as well.

@jromanowska I'm guessing that it is simpler and faster to make changes to functions in HaplinMethyl rather than make changes to Haplin, so perhaps we should either

  1. tweak showRaw so that it can manage to select rows when the user supplies a vector with row names (if possible - I can't think of a specific solution at present), or
  2. change the documentation/description of showRaw to state that you have to use row numbers when extracting particular rows and not row names.

@ellisifnygaard
Copy link
Collaborator Author

I have now done some tweaks locally so that I can use showRaw in one of preprocessing stages of the pipeline that I'm developing.

I customised Haplin:::f.get.gen.data.cols() so that it is capable of returning an ff_matrix with row names when desirable instead of just column names. I added an argument called include.rownames.

In showRaw, I commented out line 98,

rownames(show_matrix) <- data_summary$rownames[rows] ,

so that the function does not break when rows is a vector with row names/CpG IDs.

Let me know if you would like this fix implemented in Haplin and HaplinMethyl. Alternatively (if it is all right in terms of copyright), we can fix this bug without requiring changes to be made to the Haplin package by adding my customised version of Haplin:::f.get.gen.data.cols() as an internal function in HaplinMethyl.

@jromanowska
Copy link
Owner

We need to wait for the new Haplin release to be able to fix that... :-/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants