DOC: Make explicit in pandas IO doc the imports and options #28089

TanyaaCJain · 2019-08-22T13:25:11Z

- Works on DOC: Remove docs code header and use explicit imports in the user guide #28038
- Passes `black pandas`
- Passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
- Removes the code header from the IO doc page.
- Uses explicit imports in the IO doc page of the user guide, each placed where it is first required.

datapythonista · 2019-08-22T13:41:50Z

Thanks for the work on this @TanyaaCJain, looks great. Did you check if the pandas and numpy options are being used? It'd probably be useful to generate a diff of the current (master) html generated for that page and the one generated with these changes and without the options, so we know for sure what's being affected by this change.

@TomAugspurger @jorisvandenbossche @jreback this is the option that makes more sense to me. I think having a block with the imports and the options at the top of every file doesn't really look great. Thoughts?

TomAugspurger · 2019-08-22T13:46:23Z

This (suppressing the imports) seems fine for rendered HTML on the majority of the pages. I think we intentionally don't suppress them on the first few of the getting started.

For the binder version, presumably these will be visible / executed? Or are we not thinking that far ahead yet?

datapythonista · 2019-08-22T13:54:27Z

The main motivation of this change is not having hidden code, so the integration with Binder is easier and there is no magic for the users.

For the API pages I guess it will be more controversial. But in my opinion the changes to the user guide are very small and surely worth it.

jorisvandenbossche · 2019-08-22T14:02:28Z

The main motivation of this change is not having hidden code,

Note that in the current diff of this PR, it is still "hidden" to the user for the online html docs

datapythonista · 2019-08-22T14:08:53Z

doc/source/user_guide/io.rst

 .. ipython:: python
   :suppress:

+   import pandas as pd
+   pd.options.display.max_rows = 15
   clipdf = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['p', 'q', 'r']},
                         index=['x', 'y', 'z'])


Didn't realize this was hidden, thanks @jorisvandenbossche for pointing it out.

@TanyaaCJain this block is not shown to the user. Can you move the clipdf data to where it's first used, and the import to the next visible block (assuming there are no more hidden blocks before it)?

Made the changes in the commit 18a9b7c.

TanyaaCJain · 2019-08-22T21:31:22Z

The new commit does this:

- Imported pandas and set its options in the first visible code block.
- Moved the suppressed (invisible) code block that defines clipdf to just before the code block that uses the clipdf variable.
- Imported BytesIO where it is first required, instead of importing it together with StringIO as before.
- Imported os again in the first visible code block that requires it; it is also imported earlier in a suppressed code block.

I need help with:

np.random.seed(123456) is currently (in this PR) called just before the first occurrence of np.random.randn(), which is in a suppressed block. What should be done?
- Add it again in a visible code block with the first occurrence of np.random.randn(), as done with import os?
- Leave it only in the suppressed code block?

I am not able to get any diff when comparing the current (master) html with the one from this PR. Can you help me with the code? I am probably not writing the right one.

Please suggest if there are any other changes as well.

jorisvandenbossche · 2019-08-23T08:00:16Z

doc/source/user_guide/io.rst

@@ -137,7 +128,9 @@ usecols : list-like or callable, default ``None``

  .. ipython:: python

-     from io import StringIO, BytesIO
+     import pandas as pd
+     pd.options.display.max_rows = 15


I think you can actually remove this line.

I quickly checked and there are only a few longer dataframes shown in this page, and they are all longer. So with the current default of only showing first/last 5 rows when truncated, this option has no effect (it would only have the effect that dataframes longer than 15 but shorter than 60 would still be truncated, but such dataframes are not present in this file)

Made the changes in commit d3036f3.
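
To illustrate the point about `display.max_rows` made above, here is a minimal sketch. The 30-row frame `df` is a made-up example (no frame of that length appears in the IO page), chosen because it falls between 15 rows and the default limit of 60. The exact number of rows shown after truncation depends on the pandas version, so the sketch only compares line counts.

```python
import pandas as pd

# Hypothetical frame: longer than 15 rows, shorter than the default limit of 60.
df = pd.DataFrame({"A": range(30)})

# With display.max_rows at its default of 60, all 30 rows are printed.
with pd.option_context("display.max_rows", 60):
    full = repr(df)

# With display.max_rows lowered to 15, the same frame is truncated.
with pd.option_context("display.max_rows", 15):
    truncated = repr(df)

print(len(full.splitlines()), "lines without truncation")
print(len(truncated.splitlines()), "lines with max_rows = 15")
```

Frames longer than 60 rows are truncated under either setting, which is why the option made no visible difference on this page.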

jorisvandenbossche · 2019-08-23T08:02:50Z

np.random.seed(123456)

In general, do we care much about setting this?

datapythonista

I don't have a preference regarding the seed. In the long term I think it would be nice to avoid examples with random data. But for now I think it's perfectly fine to remove the seed and have different examples every time (I don't think anybody cares, including search engines, and the production docs are frozen between releases).

It'd also be ok with me if we decide to remove the numpy printoptions. I think data looks nicer with them, but if setting them can cause confusion to users, it's not a problem for me to get rid of them.
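
For context on what the seed changes, here is a small sketch using the same legacy `np.random.seed` call discussed in this thread. The variable names are illustrative only and do not come from the docs.

```python
import numpy as np

# Seeding before each draw reproduces the same "random" example data.
np.random.seed(123456)
first = np.random.randn(3)

np.random.seed(123456)
second = np.random.randn(3)

# Drawing again without re-seeding continues the stream and gives new values,
# which is what every docs build would show once the seed is removed.
unseeded = np.random.randn(3)

print(np.array_equal(first, second))
print(np.array_equal(first, unseeded))
```

Without the seed, the rendered examples differ between builds, which matters only when comparing builds or linking to specific output.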

datapythonista · 2019-08-23T13:18:21Z

doc/source/user_guide/io.rst

+   :suppress:
+
+   clipdf = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['p', 'q', 'r']},
+                         index=['x', 'y', 'z'])


I'm not sure if this is really needed. Do you mind having a look, @TanyaaCJain?

Oh yes, I missed mentioning this. Thanks @datapythonista for the reminder! I couldn't find this assignment being used anywhere in the file either. So, should we go ahead and get rid of it?

I have the feeling that the read_clipboard example was an ipython block before, and we had this clipdf and a to_clipboard call before reading from the clipboard. But since it's now a python block and the code is written manually, I guess we can simply get rid of this.

TanyaaCJain · 2019-08-23T15:10:18Z

I am not able to get any diff when comparing the current (master) html with the one from this PR. Can you help me with the code? I am probably not writing the right one.

Also, @datapythonista can you help me with this?

datapythonista · 2019-08-23T15:20:50Z

Also, @datapythonista can you help me with this?

My idea was to go to an updated master branch with `git checkout master && git fetch upstream && git merge upstream/master`, then generate the documentation (just this page if you want to save time) with `./doc/make.py --single=user_guide/io.rst`, and then save the resulting html file `doc/build/html/user_guide/io.html` somewhere.

Then, go to your branch, generate that page again and run `diff -u io_from_master.html io_without_header.html`.

That should highlight everything that will be different for the user. Better to do it without removing the seed, otherwise all the data will be different.

Does that make sense?

TanyaaCJain · 2019-08-23T16:40:48Z

This is the generated diff.txt; it can be viewed as html if the extension is changed. The file compared against the original did not include the clipdf assignment, but still had the numpy print option and seed. I'll commit the new changes after this.

datapythonista · 2019-08-23T20:30:38Z

I didn't think about that, but the changes make the In [x] and Out [x] numbers of jupyter change, so the diff is huge and it's not so easy to see what changed. But thank you for it.

WillAyd · 2019-08-27T22:05:43Z

doc/source/user_guide/io.rst

@@ -137,7 +128,8 @@ usecols : list-like or callable, default ``None``

  .. ipython:: python

-     from io import StringIO, BytesIO
+     import pandas as pd


I realize I missed the original discussion, but just to be clear: is the assumption here that we only show the import the first time it shows up in one of the rst files, and that the rest of the code blocks use it from there?

I don't think that's an assumption. It is implicitly being proposed, for simplicity and for inertia on how the header worked, but it's surely open to discussion. I thought about that too, and I was also wondering if it'd make sense to split the long pages we have now in the user guide in shorter pages. I guess that could make things easier if we finally make the code runnable.

datapythonista · 2019-09-10T15:40:52Z

Answering this comment from @WillAyd on the issue: #28038 (comment)

This is still open to discussion, but personally I much prefer this approach to the header. There are some advantages:

- Simpler
- Runnable code with no magic for users
- If we make the code runnable with binder, not having headers/hidden code will probably help
- Sphinx will report the correct line on errors
- If what we did in this PR can be extrapolated to the rest, there is not much needed in the header anyway

I agree with your point that in a huge documentation page, having the import just at the beginning doesn't necessarily make sense, since the document will often not be read sequentially. But I discussed it with @jorisvandenbossche, and it seems reasonable to split very long pages. And with binder in mind, it's probably worth having the imports repeated at the beginning of main sections, rather than hidden.

So, while this PR may not be a perfect solution yet, I think it's a step forward in the right direction.

TomAugspurger · 2019-09-10T16:16:45Z

Agreed with Marc.

TanyaaCJain · 2019-09-10T16:43:28Z

Would you all prefer to split this file into smaller sections too, or only the much longer ones? I think splitting the doc into sections would also make differences between files much easier to understand, if that is preferred.

I didn't think about that, but the changes make the In [x] and Out [x] numbers of jupyter change, so the diff is huge and it's not so easy to see what changed. But thank you for it.

I tried a few other ways to generate a diff that ignores the ipython lines, but none produced the desired output. I'll update if I do find one.
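
One possible way to get a diff that ignores the shifted prompt counters is to normalize them before comparing. The sketch below is not what was used in this PR. The helper names `normalize_prompts` and `diff_ignoring_prompts` are hypothetical, and it assumes the counters appear in the text as `In [12]` or `Out[12]`; real Sphinx output wraps the prompts in extra markup, so the pattern may need adjusting.

```python
import difflib
import re

# Matches IPython prompt counters such as "In [12]" or "Out[12]".
PROMPT_NUMBER = re.compile(r"\b(In|Out)\s?\[\d+\]")


def normalize_prompts(text):
    """Replace every prompt counter with a fixed placeholder."""
    return PROMPT_NUMBER.sub(r"\1 [N]", text)


def diff_ignoring_prompts(old_text, new_text):
    """Return unified-diff lines computed after normalizing prompt counters."""
    old_lines = normalize_prompts(old_text).splitlines()
    new_lines = normalize_prompts(new_text).splitlines()
    return list(difflib.unified_diff(
        old_lines, new_lines,
        fromfile="io_from_master.html",
        tofile="io_without_header.html",
        lineterm="",
    ))


# Toy stand-ins for the two generated pages: the new page adds one visible
# import, which shifts every later prompt number by one.
old = "In [5]: df.head()\nOut[5]: A\nIn [6]: 1 + 1\nOut[6]: 2"
new = ("In [1]: import pandas as pd\nIn [2]: df.head()\nOut[2]: A\n"
       "In [3]: 1 + 1\nOut[3]: 2")

for line in diff_ignoring_prompts(old, new):
    print(line)
```

With the counters normalized, only the newly visible import line shows up as a change in this toy input.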

WillAyd · 2019-09-10T19:58:10Z

If runnable code with binder is the main draw here (that's what I am inferring at least, perhaps incorrectly), is there any reason why we don't defer moving things around like this until something like binder actually gets implemented?

TomAugspurger · 2019-09-10T20:01:45Z

Not sure where Marc is at on that, but I stopped my prototype at https://github.com/TomAugspurger/pandas-binder when I noticed that the `{{ header }}` stuff wasn't working. So from that attempt's POV, this is a blocker.

WillAyd · 2019-09-10T20:05:57Z

Sounds good. I'll defer to you all then as I think you have a better big picture view of this and I don't think anything here is a showstopper. Just wanted to offer an alternative view - thanks!

jorisvandenbossche · 2019-09-11T08:29:19Z

@WillAyd to come back to your concern (as it was not fully addressed IMO):

I'm not sure I am on board with the concept. Associating each import with its first reference in the rst feels a little arbitrary and I think is just confusing. Seems like it could be somewhat of a nuisance when moving things around; nothing major but in any case a step backwards from just knowing standard imports are done in the header

Currently, all standard imports (pandas, numpy) are at the beginning of each rst file, but hidden for the user. Now we still do them at the top of the file, but visible. So a net improvement, and not arbitrary.

Or at least, that's the case for the pandas import. But you are right that if the from io import BytesIO import is somewhere in the middle of the file, and then re-used without an import near the end of the file in a totally different section, this can be confusing (not necessarily more confusing than not showing the import at all, as it is now, though):

- For writing docs, I don't think this is a problem; we have CI checking for this now to prevent errors.
- For really common imports (I think mainly numpy), it's maybe worth still putting them at the top of the file with the pandas import instead of at the location of first usage? (In the end, that is also what I do in my notebook workflow.)
- For less common imports that we want to do at the place of usage, I think we can actually repeat the import multiple times if the usages happen in a single file but in unrelated sections.

And as Marc said, going to smaller files will also help with this.
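
The import placement described above could look roughly like the following sketch. It is written as plain Python rather than as the `.. ipython:: python` blocks the docs use, and the section boundaries, data and variable names are made up for illustration.

```python
# Common import, shown once in a visible block at the top of the page.
import pandas as pd

# --- a section in the middle of the page ---
# Less common import, placed in the block that first needs it.
from io import StringIO

data = "a,b\n1,2\n3,4"
df = pd.read_csv(StringIO(data))

# --- a later, unrelated section ---
# The import is repeated so the section can be read and run on its own.
from io import StringIO

other = "a,b\n5,6"
df2 = pd.read_csv(StringIO(other))

print(df)
print(df2)
```

Repeating an import is harmless in Python, so the repetition only costs a line per section.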

jorisvandenbossche · 2019-09-11T08:30:24Z

Would you all prefer to split up this file into smaller sections too

@TanyaaCJain I think we want to work towards that, but that's certainly for another PR (for the IO page we have #10446 about this)

jreback · 2019-10-06T23:46:39Z

status on this PR @jorisvandenbossche @datapythonista

datapythonista · 2019-10-06T23:55:02Z

If nobody objects to removing the header in the docs and the seeds, and to making the imports explicit, this is ready to be merged, and we can start working on the rest of the docs.

jorisvandenbossche · 2019-10-07T07:45:45Z

I am fully behind removing the header and moving the imports into actual code blocks.

One thing that might be good to decide on before starting other PRs on the rest of the docs, is where to put those imports.
In the current PR, it is "the first code block where it is used" for everything.

I think that is fine for less common imports, but it could be an option to still put eg numpy and pandas (and matplotlib.pyplot for those rst files that use it) in a visible code block in the beginning of each file.

(but also fine to first try out with the approach as in this PR, and evaluate later about that question)

WillAyd · 2019-11-07T21:11:19Z

@datapythonista @jorisvandenbossche still want to merge?

TomAugspurger · 2019-11-07T21:13:42Z

Yeah, I think so. Another point, maybe already mentioned, is that explicit imports will fix the line-number reported by sphinx in errors.

WillAyd · 2019-11-07T21:14:32Z

Thanks @TanyaaCJain

TanyaaCJain added 3 commits August 22, 2019 18:28

DOC: Make explicit in pandas IO doc the iports and options

5e0fbe3

- Removes code header in IO doc page. - Uses explicit imports in the IO page of user guide on first occurence of requirement. - In reference to pandas-dev#28038

Merge branch 'master' of https://github.com/pandas-dev/pandas

3591894

Merge branch 'master' of https://github.com/pandas-dev/pandas into do…

035e35b

…c-remove-header

datapythonista added the Docs label Aug 22, 2019

datapythonista reviewed Aug 22, 2019

TanyaaCJain added 2 commits August 23, 2019 02:40

Merge branch 'master' of https://github.com/pandas-dev/pandas into do…

1ff8b3f

…c-remove-header

jorisvandenbossche reviewed Aug 23, 2019

TanyaaCJain added 2 commits August 23, 2019 18:12

DOC: Removed pandas row display option because it is not required at …

d3036f3

…the moment

Merge branch 'master' of https://github.com/pandas-dev/pandas into do…

37d420a

…c-remove-header

datapythonista reviewed Aug 23, 2019

TanyaaCJain added 3 commits August 23, 2019 22:27

DOC: Remove unused clipdf assignment, numpy print options and seed

f80331e

Merge branch 'master' of https://github.com/pandas-dev/pandas into do…

faec275

…c-remove-header

DOC: Remove unused clipdf assignment, numpy print options and seed

6487a68

WillAyd reviewed Aug 27, 2019

WillAyd mentioned this pull request Sep 5, 2019

DOC: Remove docs code header and use explicit imports in the user guide #28038

Open

WillAyd added this to the 1.0 milestone Nov 7, 2019

WillAyd merged commit 7adc14a into pandas-dev:master Nov 7, 2019

Reksbril pushed a commit to Reksbril/pandas that referenced this pull request Nov 18, 2019

DOC: Make explicit in pandas IO doc the imports and options (pandas-d…

566b48b

…ev#28089)

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

DOC: Make explicit in pandas IO doc the imports and options (pandas-d…

94a8cce

…ev#28089)

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

DOC: Make explicit in pandas IO doc the imports and options (pandas-d…

301403f

…ev#28089)
