CompaniesHousePDFDownloader

A Python tool to download PDFs of various data from Companies House. Requires pdfkit and wkhtmltopdf (https://pypi.org/project/pdfkit/).

This tool will download, for each company:

Usage

  • Install pdfkit and wkhtmltopdf. Instructions here and here.
  • Set up an account on Companies House and get your API key here.
  • Store your Companies House API key in environment variables as CH_API_KEY.
  • Ensure you understand HTTP Basic Authentication and the method used by the CH API.
  • Use this tool or a similar tool to generate your basic authentication header.
  • Store your generated authentication token (the "xxx" in "Basic xxx") in environment variables as CH_ACCESS_TOKEN.
  • Reboot as necessary to ensure the system picks up the new environment variables.
  • Clone this repo and cd into it:
git clone https://github.com/ChrisKneller/CompaniesHousePDFDownloader.git
cd CompaniesHousePDFDownloader
  • Save a CSV file listing the companies you want to check, with the company number as the first entry on each row (e.g. 03977902,Google UK Ltd,other,data,here,doesnt,matter). Only the first column is used. This example row is included in the companies.csv file in the cloned repo.
  • Run the downloader on your chosen csv file:
python csv_run.py companies.csv
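
As a rough illustration of two of the steps above, the following sketch derives the Basic authentication token from an API key and reads company numbers from the first column of a CSV file. It is not taken from this repo's code: the function names `basic_auth_token` and `read_company_numbers` are hypothetical, and it assumes the Companies House convention of using the API key as the username with an empty password.

```python
import base64
import csv
import os
from typing import Iterable, List


def basic_auth_token(api_key: str) -> str:
    """Return the "xxx" part of an "Authorization: Basic xxx" header.

    Assumption: the API key is the username and the password is empty,
    so the credential string is the key followed by a single colon.
    """
    raw = f"{api_key}:".encode("ascii")
    return base64.b64encode(raw).decode("ascii")


def read_company_numbers(csv_lines: Iterable[str]) -> List[str]:
    """Collect the first field of every non-empty row.

    Values are kept as strings so that leading zeros
    (e.g. "03977902") are preserved.
    """
    numbers = []
    for row in csv.reader(csv_lines):
        if row and row[0].strip():
            numbers.append(row[0].strip())
    return numbers


if __name__ == "__main__":
    # CH_API_KEY is the environment variable named in the steps above.
    api_key = os.environ.get("CH_API_KEY", "")
    if api_key:
        print("CH_ACCESS_TOKEN value:", basic_auth_token(api_key))

    # companies.csv is the sample file shipped with the repo.
    if os.path.exists("companies.csv"):
        with open("companies.csv", newline="") as f:
            print(read_company_numbers(f))
```

The token printed by this sketch is the value to store as CH_ACCESS_TOKEN. Treat it like the API key itself, since base64 is an encoding and not encryption.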

Contributing

Pull requests are welcome. There is a lot to do to make this tool function better (e.g. tidying up the try/except statements, handling errors better, making requests asynchronous). For major changes, please open an issue first to discuss what you would like to change.

License

MIT
