Review help/usage for cli commands #802

Merged: 8 commits merged into main from amrit/fix-typos on Jan 15, 2025

@amritghimire (Contributor) commented Jan 7, 2025

The pattern followed is:

  • Descriptions: Complete sentences with periods
  • Help messages: Concise phrases without periods
  • Consistent terminology ("Iterative Studio")
  • Clear, standardized format for similar arguments
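
The convention above can be sketched with Python's `argparse`. This is a minimal illustration, not DataChain's actual parser code: `build_parsers` and `convention_violations` are hypothetical names, the strings are borrowed from the output shown in this description, and the checker reads argparse's private `_actions` attribute, which is an implementation detail.

```python
import argparse


def build_parsers():
    """Return (top-level parser, `job` subparser) for a hypothetical CLI."""
    parser = argparse.ArgumentParser(
        prog="datachain",
        # Description: a complete sentence that ends with a period.
        description="DataChain: Wrangle unstructured AI data at scale.",
    )
    subparsers = parser.add_subparsers(title="Available Commands", metavar="command")
    job = subparsers.add_parser(
        "job",
        # Help message: a concise phrase without a trailing period.
        help="Manage jobs in Studio",
        # Description: a complete sentence with a period.
        description="Commands to manage job execution in Studio.",
    )
    job.add_argument("-v", "--verbose", action="store_true", help="Verbose")
    return parser, job


def convention_violations(parser):
    """List descriptions lacking a final period and help strings that have one."""
    problems = []
    if parser.description and not parser.description.endswith("."):
        problems.append(f"description: {parser.description!r}")
    # `_actions` is private to argparse; used here only for illustration.
    for action in parser._actions:
        if action.help and action.help.endswith("."):
            problems.append(f"help: {action.help!r}")
    return problems


top_parser, job_parser = build_parsers()
print(convention_violations(top_parser), convention_violations(job_parser))
```

A check along these lines could run in a unit test so that newly added arguments keep to the same style.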

Final output:

datachain --help
usage: datachain [-h] [-V] command ...

DataChain: Wrangle unstructured AI data at scale.

options:
  -h, --help     show this help message and exit
  -V, --version  show program's version number and exit

Available Commands:
  command        Use `datachain command --help` for command-specific help
    studio       Manage Studio authentication
    job          Manage jobs in Studio
usage: datachain cp [-h] [--aws-endpoint-url AWS_ENDPOINT_URL] [--anon] [-u] [-v] [-q] [-f] [-r] [--no-glob] sources [sources ...] output

Copy data files from the cloud.

positional arguments:
  sources               Data sources - paths to cloud storage dirs
  output                Output

options:
  -h, --help            show this help message and exit
  --aws-endpoint-url AWS_ENDPOINT_URL
                        AWS endpoint URL
  --anon                AWS anon (aka awscli's --no-sign-request)
  -u, --update          Update cache
  -v, --verbose         Verbose
  -q, --quiet           Be quiet
  -f, --force           Force creating outputs
  -r, -R, --recursive   Copy directories recursively
  --no-glob             Do not expand globs (such as * or ?)
usage: datachain clone [-h] [--aws-endpoint-url AWS_ENDPOINT_URL] [--anon] [-u] [-v] [-q] [-f] [-r] [--no-glob] [--no-cp] [--edatachain] [--edatachain-file EDATACHAIN_FILE] sources [sources ...] output

Copy data files from the cloud.

positional arguments:
  sources               Data sources - paths to cloud storage dirs
  output                Output

options:
  -h, --help            show this help message and exit
  --aws-endpoint-url AWS_ENDPOINT_URL
                        AWS endpoint URL
  --anon                AWS anon (aka awscli's --no-sign-request)
  -u, --update          Update cache
  -v, --verbose         Verbose
  -q, --quiet           Be quiet
  -f, --force           Force creating outputs
  -r, -R, --recursive   Copy directories recursively
  --no-glob             Do not expand globs (such as * or ?)
  --no-cp               Do not copy files, just create a dataset
  --edatachain          Create a .edatachain file
  --edatachain-file EDATACHAIN_FILE
                        Use a different filename for the resulting .edatachain file
usage: datachain studio [-h] [--aws-endpoint-url AWS_ENDPOINT_URL] [--anon] [-u] [-v] [-q] {login,logout,team,token,dataset} ...

Manage authentication and settings for Studio. Configure tokens for sharing datasets and using Studio features.

positional arguments:
  {login,logout,team,token,dataset}
                        Use `datachain studio CMD --help` to display command-specific help
    login               Authenticate with Studio
    logout              Log out from Studio
    team                Set default team for Studio operations
    token               View Studio authentication token
    dataset             List available Studio datasets

options:
  -h, --help            show this help message and exit
  --aws-endpoint-url AWS_ENDPOINT_URL
                        AWS endpoint URL
  --anon                AWS anon (aka awscli's --no-sign-request)
  -u, --update          Update cache
  -v, --verbose         Verbose
  -q, --quiet           Be quiet
usage: datachain job [-h] [--aws-endpoint-url AWS_ENDPOINT_URL] [--anon] [-u] [-v] [-q] {run,cancel,logs} ...

Commands to manage job execution in Studio.

positional arguments:
  {run,cancel,logs}     Use `datachain studio CMD --help` to display command-specific help
    run                 Run a job in Studio
    cancel              Cancel a job in Studio
    logs                Show job logs and status in Studio

options:
  -h, --help            show this help message and exit
  --aws-endpoint-url AWS_ENDPOINT_URL
                        AWS endpoint URL
  --anon                AWS anon (aka awscli's --no-sign-request)
  -u, --update          Update cache
  -v, --verbose         Verbose
  -q, --quiet           Be quiet
usage: datachain dataset [-h] [--aws-endpoint-url AWS_ENDPOINT_URL] [--anon] [-u] [-v] [-q] {pull,edit,ls,rm,remove,stats} ...

Commands for managing datasets.

positional arguments:
  {pull,edit,ls,rm,remove,stats}
                        Use `datachain dataset CMD --help` to display command-specific help

options:
  -h, --help            show this help message and exit
  --aws-endpoint-url AWS_ENDPOINT_URL
                        AWS endpoint URL
  --anon                AWS anon (aka awscli's --no-sign-request)
  -u, --update          Update cache
  -v, --verbose         Verbose
  -q, --quiet           Be quiet
usage: datachain ls [-h] [--aws-endpoint-url AWS_ENDPOINT_URL] [--anon] [-u] [-v] [-q] [-l] [--studio] [-L] [-a] [--team TEAM] [sources ...]

List storage contents.

positional arguments:
  sources               Data sources - paths to cloud storage dirs

options:
  -h, --help            show this help message and exit
  --aws-endpoint-url AWS_ENDPOINT_URL
                        AWS endpoint URL
  --anon                AWS anon (aka awscli's --no-sign-request)
  -u, --update          Update cache
  -v, --verbose         Verbose
  -q, --quiet           Be quiet
  -l, --long            List files in long format
  --studio              List the files in the Studio
  -L, --local           List local files only
  -a, --all             List all files including hidden files
  --team TEAM           The team to list datasets for. By default, it will use team from config
usage: datachain du [-h] [--aws-endpoint-url AWS_ENDPOINT_URL] [--anon] [-u] [-v] [-q] [-b] [-d N] [--si] sources [sources ...]

Display space usage.

positional arguments:
  sources               Data sources - paths to cloud storage dirs

options:
  -h, --help            show this help message and exit
  --aws-endpoint-url AWS_ENDPOINT_URL
                        AWS endpoint URL
  --anon                AWS anon (aka awscli's --no-sign-request)
  -u, --update          Update cache
  -v, --verbose         Verbose
  -q, --quiet           Be quiet
  -b, --bytes           Display sizes in bytes instead of human-readable sizes
  -d N, --depth N, --max-depth N
                        Display sizes up to N directory levels deep (default: 0, summarize provided directory only)
  --si                  Display sizes using powers of 1000 not 1024
usage: datachain find [-h] [--aws-endpoint-url AWS_ENDPOINT_URL] [--anon] [-u] [-v] [-q] [--name NAME] [--iname INAME] [--path PATH] [--ipath IPATH] [--size SIZE] [--type TYPE] [-c COLUMNS]
                      sources [sources ...]

Search in a directory hierarchy.

positional arguments:
  sources               Data sources - paths to cloud storage dirs

options:
  -h, --help            show this help message and exit
  --aws-endpoint-url AWS_ENDPOINT_URL
                        AWS endpoint URL
  --anon                AWS anon (aka awscli's --no-sign-request)
  -u, --update          Update cache
  -v, --verbose         Verbose
  -q, --quiet           Be quiet
  --name NAME           Match filename pattern
  --iname INAME         Match filename pattern (case insensitive)
  --path PATH           Path to match pattern
  --ipath IPATH         Like -path but case insensitive
  --size SIZE           Filter by size (+ is greater or equal, - is less or equal). Specified size is in bytes, or use a suffix like K, M, G for kilobytes, megabytes, gigabytes, etc
  --type TYPE           File type: "f" - regular, "d" - directory
  -c COLUMNS, --columns COLUMNS
                        A comma-separated list of columns to print for each result. Options are: du,name,path,size,type (Default: path)
datachain index --help
usage: datachain index [-h] [--aws-endpoint-url AWS_ENDPOINT_URL] [--anon] [-u] [-v] [-q] sources [sources ...]

Index storage location.

positional arguments:
  sources               Data sources - paths to cloud storage dirs

options:
  -h, --help            show this help message and exit
  --aws-endpoint-url AWS_ENDPOINT_URL
                        AWS endpoint URL
  --anon                AWS anon (aka awscli's --no-sign-request)
  -u, --update          Update cache
  -v, --verbose         Verbose
  -q, --quiet           Be quiet
usage: datachain show [-h] [--aws-endpoint-url AWS_ENDPOINT_URL] [--anon] [-u] [-v] [-q] [--version VERSION] [--schema] [--limit LIMIT] [--offset OFFSET] [--columns COLUMNS] [--no-collapse] name

Create a new dataset with a query script.

positional arguments:
  name                  Dataset name

options:
  -h, --help            show this help message and exit
  --aws-endpoint-url AWS_ENDPOINT_URL
                        AWS endpoint URL
  --anon                AWS anon (aka awscli's --no-sign-request)
  -u, --update          Update cache
  -v, --verbose         Verbose
  -q, --quiet           Be quiet
  --version VERSION     Dataset version
  --schema              Show schema
  --limit LIMIT         Number of rows to show
  --offset OFFSET       Number of rows to offset
  --columns COLUMNS     Columns to show
  --no-collapse         Do not collapse the columns
usage: datachain query [-h] [--aws-endpoint-url AWS_ENDPOINT_URL] [--anon] [-u] [-v] [-q] [--parallel [N]] [-p param=value] <script.py>

Create a new dataset with a query script.

positional arguments:
  <script.py>           Filepath for script

options:
  -h, --help            show this help message and exit
  --aws-endpoint-url AWS_ENDPOINT_URL
                        AWS endpoint URL
  --anon                AWS anon (aka awscli's --no-sign-request)
  -u, --update          Update cache
  -v, --verbose         Verbose
  -q, --quiet           Be quiet
  --parallel [N]        Use multiprocessing to run any query script UDFs with N worker processes. N defaults to the CPU count
  -p param=value, --param param=value
                        Query parameters
datachain clear-cache --help
usage: datachain clear-cache [-h] [--aws-endpoint-url AWS_ENDPOINT_URL] [--anon] [-u] [-v] [-q]

Clear the local file cache.

options:
  -h, --help            show this help message and exit
  --aws-endpoint-url AWS_ENDPOINT_URL
                        AWS endpoint URL
  --anon                AWS anon (aka awscli's --no-sign-request)
  -u, --update          Update cache
  -v, --verbose         Verbose
  -q, --quiet           Be quiet
datachain gc --help
usage: datachain gc [-h] [--aws-endpoint-url AWS_ENDPOINT_URL] [--anon] [-u] [-v] [-q]

Garbage collect temporary tables.

options:
  -h, --help            show this help message and exit
  --aws-endpoint-url AWS_ENDPOINT_URL
                        AWS endpoint URL
  --anon                AWS anon (aka awscli's --no-sign-request)
  -u, --update          Update cache
  -v, --verbose         Verbose
  -q, --quiet           Be quiet
datachain completion --help
usage: datachain completion [-h] [--aws-endpoint-url AWS_ENDPOINT_URL] [--anon] [-u] [-v] [-q] [-s {bash,zsh,tcsh}]

Output shell completion script.

options:
  -h, --help            show this help message and exit
  --aws-endpoint-url AWS_ENDPOINT_URL
                        AWS endpoint URL
  --anon                AWS anon (aka awscli's --no-sign-request)
  -u, --update          Update cache
  -v, --verbose         Verbose
  -q, --quiet           Be quiet
  -s {bash,zsh,tcsh}, --shell {bash,zsh,tcsh}
                        Shell syntax for completions
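
The same five options (`--aws-endpoint-url`, `--anon`, `-u`, `-v`, `-q`) appear with identical wording under most commands above. One way to get that consistency in `argparse` is a shared parent parser, sketched below. This is an assumption about a possible structure, not a copy of DataChain's source: the variable names, the `store_true`/`count` actions, and the `s3://bucket/dir` path are all hypothetical.

```python
import argparse

# Hypothetical shared parser; add_help=False avoids a duplicate -h/--help
# when it is passed through `parents=`.
common = argparse.ArgumentParser(add_help=False)
common.add_argument("--aws-endpoint-url", type=str, help="AWS endpoint URL")
common.add_argument(
    "--anon", action="store_true", help="AWS anon (aka awscli's --no-sign-request)"
)
common.add_argument("-u", "--update", action="store_true", help="Update cache")
common.add_argument("-v", "--verbose", action="count", default=0, help="Verbose")
common.add_argument("-q", "--quiet", action="count", default=0, help="Be quiet")

parser = argparse.ArgumentParser(prog="datachain")
subparsers = parser.add_subparsers(dest="command")

# Each subcommand inherits the shared options, so their help text is
# written once and stays identical everywhere.
gc = subparsers.add_parser(
    "gc", parents=[common], description="Garbage collect temporary tables."
)
index = subparsers.add_parser(
    "index", parents=[common], description="Index storage location."
)
index.add_argument(
    "sources", nargs="+", help="Data sources - paths to cloud storage dirs"
)

# Placeholder path, used only to exercise the parser.
args = parser.parse_args(["index", "--anon", "-v", "s3://bucket/dir"])
print(args.command, args.anon, args.verbose, args.sources)
```

With this layout, a wording change such as the ones in this PR only has to be made in one place.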

Closes #773

@amritghimire amritghimire self-assigned this Jan 7, 2025
@amritghimire amritghimire requested review from ilongin, dreadatour and a team January 7, 2025 14:41

codecov bot commented Jan 7, 2025

Codecov Report

Attention: Patch coverage is 64.70588% with 12 lines in your changes missing coverage. Please review.

Project coverage is 87.51%. Comparing base (6ec58f9) to head (fbd6eb2).
Report is 1 commit behind head on main.

Files with missing lines         Patch %   Lines
src/datachain/cli/__init__.py    0.00%     4 Missing and 2 partials ⚠️
src/datachain/studio.py          14.28%    4 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #802      +/-   ##
==========================================
- Coverage   87.61%   87.51%   -0.11%     
==========================================
  Files         128      128              
  Lines       11324    11326       +2     
  Branches     1530     1533       +3     
==========================================
- Hits         9922     9912      -10     
- Misses       1020     1028       +8     
- Partials      382      386       +4     
Flag         Coverage Δ
datachain    87.45% <64.70%> (-0.11%) ⬇️
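
The percentages in this report can be reproduced from the counts it lists. The sketch below makes two assumptions that the report does not state: that the 12 uncovered patch lines are the 8 missing plus the 4 partials, and that the patch therefore touches 34 lines in total (the only small total consistent with 64.70588%). The two-decimal project figures match truncation rather than rounding.

```python
import math

# Patch coverage: 12 uncovered lines (8 missing + 4 partials, assumed).
uncovered = (4 + 2) + (4 + 2)
patch_lines = 34  # inferred from the reported percentage, not stated in the report
patch_coverage = 100 * (patch_lines - uncovered) / patch_lines


def truncate2(value: float) -> float:
    """Truncate (not round) to two decimal places."""
    return math.floor(value * 100) / 100


# Project coverage: hits / lines, taken from the coverage diff table.
base_coverage = truncate2(100 * 9922 / 11324)
head_coverage = truncate2(100 * 9912 / 11326)

print(round(patch_coverage, 5), base_coverage, head_coverage)
```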

@shcheklein (Member)

@amritghimire thanks! Can you please add the end result (a dump of the CLI output) to the description?

cloudflare-workers-and-pages bot commented Jan 8, 2025

Deploying datachain-documentation with Cloudflare Pages

Latest commit: fbd6eb2
Status: ✅  Deploy successful!
Preview URL: https://022f8f52.datachain-documentation.pages.dev
Branch Preview URL: https://amrit-fix-typos.datachain-documentation.pages.dev


@dreadatour (Contributor) left a comment

👍

@shcheklein (Member)

@amritghimire thanks, I've started the document - https://docs.google.com/document/d/1_QeMQ1NsguHSRSyJpF2n-1s57SHSuiyGzl10q9d1UaE/edit?tab=t.0 . Let's iterate there please.

@amritghimire (Contributor, Author)

> @amritghimire thanks, I've started the document - https://docs.google.com/document/d/1_QeMQ1NsguHSRSyJpF2n-1s57SHSuiyGzl10q9d1UaE/edit?tab=t.0 . Let's iterate there please.

Updated the PR per the document

@shcheklein (Member) left a comment

Let's please update the Google Doc with the new set of outputs, merge this, and do the next iteration. Thanks for driving this, @amritghimire!

@amritghimire amritghimire merged commit 57899d2 into main Jan 15, 2025
37 of 38 checks passed
@amritghimire amritghimire deleted the amrit/fix-typos branch January 15, 2025 03:12