Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add length to progress bar #4100

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from
Open

Conversation

xsuchy
Copy link
Contributor

@xsuchy xsuchy commented Jan 16, 2025

so the progress bar shows real progress

Fixes #3342

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
    The testsuite is broken for me and produce lots of not relevant errors. I will appreciate testing.
  • Commits are in uniquely-named feature branch and has no merge conflicts 📁
  • [-] Updated documentation pages (if applicable)
  • Updated CHANGELOG.rst (if applicable)

Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Miroslav and thanks!
There is an issue wrt. to making this work with large codebase. Even a linux kernel tarball may be problematic... we may need to find a better way.

@@ -1176,10 +1176,11 @@ def run_scanners(

progress_manager = None
if not quiet:
resources = list(((r.location, r.path) for r in codebase.walk() if r.is_file))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a problem when we have a codebase with 10 million files, we may need to find a better way? ... otherwise the system will literally be stalled for minutes before displaying a progress bar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the kernel, the delay is 45 sec until it traverses 200k files. While the scan itself runs several hours.
With -q there is no delay.

echo_func(f'Scan files for: {scan_names} with {processes} process(es)...')
item_show_func = partial(path_progress_message, verbose=verbose)
progress_manager = partial(progressmanager,
item_show_func=item_show_func,
item_show_func=item_show_func, length=len(resources)*2,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the times 2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because #3344 Otherwise the progressbar go to 170/170 and then continue to 340/170. This *2 can be removed when we find when we find where is the count done twice. I have a suspicion, but it needs more testing.

@xsuchy
Copy link
Contributor Author

xsuchy commented Jan 16, 2025

Several things I considered as an option:

  1. instead resources = list(((r.location, r.path) for r in codebase.walk() if r.is_file)) use simpler resources = list((1 for r in codebase.walk() if r.is_file)) because we just need the count. This is consume less memory, but does not give any measurable time.

  2. pass the resources to scan_codebase so we do not need to count it twice. Hmm. I will do this. PR updated.

@xsuchy
Copy link
Contributor Author

xsuchy commented Jan 17, 2025

I tried to improve the performance of walk() in aboutcode-org/commoncode#78

so progress bar shows real progress

Signed-off-by: Miroslav Suchý <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

display total number of files being scanned above progress bar
2 participants