
URLHeadBear.py: Use robots.txt #2891

Open

PrajwalM2212 wants to merge 1 commit into base: master

Conversation

PrajwalM2212 (Member)

Requests that are not allowed by robots.txt
are reported.

Closes #1782
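The check described above can be sketched with Python's standard `urllib.robotparser`. This is a minimal illustration of the idea, not the PR's actual diff; the function name and the example rules are assumptions made for the sketch. In the bear, the robots.txt body would be fetched from `<scheme>://<host>/robots.txt` before issuing the HEAD request:

```python
from urllib import robotparser

def robots_allows(url, robots_lines, user_agent="*"):
    """Return True if the given robots.txt rules permit fetching `url`.

    `robots_lines` is the robots.txt body as a list of lines (hypothetical
    helper; in practice the file would be fetched per host and cached).
    """
    rp = robotparser.RobotFileParser()
    rp.parse(robots_lines)
    return rp.can_fetch(user_agent, url)

# Example rules that disallow one sub-directory for all crawlers:
rules = ["User-agent: *", "Disallow: /private/"]
print(robots_allows("http://example.com/index.html", rules))      # True
print(robots_allows("http://example.com/private/a.html", rules))  # False
```

A URL for which `can_fetch` returns False would then be reported by the bear rather than HEAD-requested.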

For short term contributors: we understand that getting your commits as well defined as we require is a hard task and takes some learning. If you want to help without contributing long term, there's no need to learn this. Just drop us a message and we'll take care of brushing up your stuff for merge!

Checklist

  • I read the commit guidelines and I've followed
    them.
  • I ran coala over my code locally. (All commits have to pass
    individually.
    It is not sufficient to have "fixup commits" on your PR,
    our bot will still report the issues for the previous commit.) You will
    likely receive a lot of bot comments and build failures if coala does not
    pass on every single commit!

After you submit your pull request, DO NOT click the 'Update Branch' button.
When asked for a rebase, consult coala.io/rebase
instead.

Please consider helping us by reviewing other people's pull requests as well:

The more you review, the more your score will grow at coala.io and we will
review your PRs faster!

@PrajwalM2212 (Member Author) attached a screenshot: Screen Shot 2019-03-16 at 9 51 56 AM

PrajwalM2212 (Member Author) commented Mar 16, 2019

Tests have not been added.

I wanted to know if this approach is okay before modifying existing tests.

@frextrite (Contributor) left a comment

Changes look good. You can go ahead and add tests now.

But I have one suggestion. Do we really need to show a message to the user if robots.txt doesn't allow crawling a specific sub-directory? It could give the user the false impression that the mentioned link doesn't work and should be changed, when it actually means that no crawlers are allowed to visit that page.

PrajwalM2212 (Member Author) commented Mar 18, 2019

@frextrite The message has to be shown; the issue asks for this situation to be reported.
I'll also wait for the maintainers to confirm this approach is okay, since this bear is used by a lot of other bears.
Thanks for the review 😊

Development

Successfully merging this pull request may close these issues.

URLHeadBear should use robots.txt
3 participants