-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Make explicit in pandas IO doc the imports and options #28089
Conversation
- Removes code header in IO doc page. - Uses explicit imports in the IO page of user guide on first occurence of requirement. - In reference to pandas-dev#28038
…c-remove-header
Thanks for the work on this @TanyaaCJain, looks great. Did you check if the pandas and numpy options are being used? It'd probably be useful to generate a diff of the current (master) html generated for that page, and the one with this changes, and without the options. So we know for sure what's being affected by this change. @TomAugspurger @jorisvandenbossche @jreback this is the option that makes more sense to me. I think having a block with the imports and the options at the top of every file doesn't really look great. Thought? |
This (suppressing the imports) seems fine for rendered HTML on the majority of the pages. I think we intentionally don't suppress them on the first few of the getting started. For the binder version, presumably these will be visible / executed? Or are we not thinking that far ahead yet? |
The main motivation of this change is not having hidden code, so the integration with Binder is easier and there is no magic for the users. For the API pages I guess it will be more controversial. But in my opinion the changes to the user guide are very small and surely worth. |
Note that in the current diff of this PR, it is still "hidden" to the user for the online html docs |
doc/source/user_guide/io.rst
Outdated
.. ipython:: python | ||
:suppress: | ||
|
||
import pandas as pd | ||
pd.options.display.max_rows = 15 | ||
clipdf = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['p', 'q', 'r']}, | ||
index=['x', 'y', 'z']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't realize this was hidden, thanks @jorisvandenbossche for pointing it out.
@TanyaaCJain this block is not shown to the user, can you move the clipdf
data where it's first used, and the import to the first next block (assuming there are no more hidden blocks before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the changes in the commit 18a9b7c.
- imported pandas and added its options in the first visible code block - moved the suppressed or invisible code block using clipdf just before the code block using the clipdf variable - imported `BytesIO` in its first occurrence of requirement unlike previously imported along with `StringIO` - imported os again for adding it on its first occurrence of requirement in a visible code block and is earlier imported in a suppressed code block.
…c-remove-header
The new commit does this:
I need help with:
Please suggest if there are any other changes as well. |
doc/source/user_guide/io.rst
Outdated
@@ -137,7 +128,9 @@ usecols : list-like or callable, default ``None`` | |||
|
|||
.. ipython:: python | |||
|
|||
from io import StringIO, BytesIO | |||
import pandas as pd | |||
pd.options.display.max_rows = 15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can actually remove this line.
I quickly checked and there are only a few longer dataframes shown in this page, and they are all longer. So with the current default of only showing first/last 5 rows when truncated, this option has no effect (it would only have the effect that dataframes longer than 15 but shorter than 60 would still be truncated, but such dataframes are not present in this file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the changes in commit d3036f3.
In general, do we care much about setting this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a preference regarding the seed. I think for the long term would be nice to avoid examples with random data. But for now I think it's perfectly fine to remove the seed and have different examples every time (I don't think anybody cares, including search engines, and we the production docs are frozen between releases).
It'd also be ok to me if we decide to remove the numpy printooptions. I think data looks nicer with it, but if setting it can cause confusion to users, not a problem for me to get rid of it.
doc/source/user_guide/io.rst
Outdated
:suppress: | ||
|
||
clipdf = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['p', 'q', 'r']}, | ||
index=['x', 'y', 'z']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is really needed. Do you mind having a look @TanyaaCJain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes.. I missed mentioning this. Thanks @datapythonista for the reminder! Even I couldn't find this assignment be of use anywhere in the file. So, should we go ahead with getting rid of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have he feeling that the read_clipboard
was an ipython
block before, and we had this clipdf
and a to_clipboard
before reading from the clipboard. But since not it's a python block and the code is written manually, I guess we can simply get rid of this.
Also, @datapythonista can you help me with this? |
My idea was to go to an updated master branch The, go to your branch, generate that page again and do a That should highlight everything that will be different for the user. Better to do it wihout removing the seed, otherwise all the data will be different. Does that make sense? |
This is the diff.txt generated, can be viewed as html if the extension is changed. The file compared to original did not have the |
didn't think about that, but the changes make the number of the |
@@ -137,7 +128,8 @@ usecols : list-like or callable, default ``None`` | |||
|
|||
.. ipython:: python | |||
|
|||
from io import StringIO, BytesIO | |||
import pandas as pd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize I missed the original discussion but just to be clear, is the assumption here that we only show the import the first time it shows up one of the rst files and that the rest of the code blocks use it from there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's an assumption. It is implicitly being proposed, for simplicity and for inertia on how the header worked, but it's surely open to discussion. I thought about that too, and I was also wondering if it'd make sense to split the long pages we have now in the user guide in shorter pages. I guess that could make things easier if we finally make the code runnable.
Answering this comment from @WillAyd on the issue: #28038 (comment) This is still open to discussion, but personally I much like this approach than the header. There are some advantages:
I agree with your point that in a huge documentation page, having the import just at the beginning doesn't necessarily make sense, the document will often not be accessed sequentially. But discussed with @jorisvandenbossche, and seems reasonable to split very long pages. And with binder in mind, probably worth having the imports repeated at the beginning of main sections, rather than hidden. So, while this PR may not be a perfect solution yet, I think it's a step forward in the right direction. |
Agreed with Marc.
…On Tue, Sep 10, 2019 at 10:41 AM Marc Garcia ***@***.***> wrote:
Answering this comment from @WillAyd <https://github.com/WillAyd> on the
issue: #28038 (comment)
<#28038 (comment)>
This is still open to discussion, but personally I much like this approach
than the header. There are some advantages:
- Simpler
- Runnable code with no magic for users
- If we make the code runnable with binder, not having headers/hidden
code will probably help
- Sphinx will report the correct line on errors
- If what we did in this PR can extrapolate to the rest, there is not
so much needed in the header anyway
I agree with your point that in a huge documentation page, having the
import just at the beginning doesn't necessarily make sense, the document
will often not be accessed sequentially. But discussed with
@jorisvandenbossche <https://github.com/jorisvandenbossche>, and seems
reasonable to split very long pages. And with binder in mind, probably
worth having the imports repeated at the beginning of main sections, rather
than hidden.
So, while this PR may not be a perfect solution yet, I think it's a step
forward in the right direction.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#28089?email_source=notifications&email_token=AAKAOIXHWWUQ3ALT6M46GLLQI65YZA5CNFSM4IOU7FTKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6LRWUQ#issuecomment-529996626>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKAOIQIAFX4ZXEVNEXKWOTQI65YZANCNFSM4IOU7FTA>
.
|
Would you all prefer to split up this file into smaller sections too or just want it for way more lengthier ones? Though I think splitting doc into sections would help to understand difference in file much easily, if preferred.
I tried for a few other ways that could possibly generate a difference ignoring the ipython lines but none generated the desired output. I’ll update if I do find one. |
If runnable code with binder is the main draw here (that's what I am inferencing at least, perhaps incorrectly) is there any reason why we don't defer moving things around like this until something like binder would actually get implemented? |
Not sure where Marc is at on that, but I stopped my prototype at
https://github.com/TomAugspurger/pandas-binder when I noticed that the `{{
header }}` stuff wasn't working. So from that attempt's POV, this is a
blocker.
…On Tue, Sep 10, 2019 at 2:58 PM William Ayd ***@***.***> wrote:
If runnable code with binder is the main draw here (that's what I am
inferencing at least, perhaps incorrectly) is there any reason why we don't
defer moving things around like this until something like binder would
actually get implemented?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#28089?email_source=notifications&email_token=AAKAOIT3J3HUHTBE6H4BZYDQI735VA5CNFSM4IOU7FTKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6MJYLQ#issuecomment-530095150>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKAOIRBFLK5GSV6QPQWRCLQI735VANCNFSM4IOU7FTA>
.
|
Sounds good. I'll defer to you all then as I think you have a better big picture view of this and I don't think anything here is a showstopper. Just wanted to offer an alternative view - thanks! |
@WillAyd to come back to your concern (as it was not fully addressed IMO):
Currently, all standard imports (pandas, numpy) are at the beginning of each rst file, but hidden for the user. Now we still do them at the top of the file, but visible. So a net improvement, and not arbitrary. Or at least, that's the case for the pandas import. But you are right that if the
And as Marc said, going to smaller files will also help with this. |
@TanyaaCJain I think we want to work towards that, but that's certainly for another PR (for the IO page we have #10446 about this) |
status on this PR @jorisvandenbossche @datapythonista |
If nobody objects to remove the header in the docs, the seeds, and make the imports explicit, this is ready to be merged, and we can start working on the rest of the docs. |
I am fully behind removing the header and moving the imports in actual code-blocks. One thing that might be good to decide on before starting other PRs on the rest of the docs, is where to put those imports. I think that is fine for less common imports, but it could be an option to still put eg numpy and pandas (and matplotlib.pyplot for those rst files that use it) in a visible code block in the beginning of each file. (but also fine to first try out with the approach as in this PR, and evaluate later about that question) |
@datapythonista @jorisvandenbossche still want to merge? |
Yeah, I think so.
Another point, maybe already mentioned, is that explicit imports will fix
the line-number reported by sphinx in errors.
…On Thu, Nov 7, 2019 at 3:11 PM William Ayd ***@***.***> wrote:
@datapythonista <https://github.com/datapythonista> @jorisvandenbossche
<https://github.com/jorisvandenbossche> still want to merge?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#28089?email_source=notifications&email_token=AAKAOIV7TW5CQC5DA6CRXS3QSSAABA5CNFSM4IOU7FTKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEDN2M4Y#issuecomment-551265907>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAKAOIXHK67JQH2H4M4JQT3QSSAABANCNFSM4IOU7FTA>
.
|
Thanks @TanyaaCJain |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Reference @datapythonista - 149