Releases: spider-rs/spider

v2.26.19

18 Jan 15:09

What's Changed

  1. add auto find sitemap url on 404 or network error.
  2. fix chrome_cache_hybrid compile.
  3. add cache_chrome_hybrid_mem flag to use memory instead of disk.
  4. fix queue draining across website methods
  5. fix crawl depth handling

Full Changelog: v2.26.1...v2.26.19

v2.26.1

11 Jan 03:08

What's Changed

This release brings performance improvements by skipping URL parsing per page.
You can now also pass a second parameter to the page link methods to collect the links against a new domain target.
Targeting the correct root domain when parsing links is now handled across features.

If you call page::Page::take_url directly, you may need to call either page::Page::set_url_parsed_direct_empty() or page::Page::get_url_parsed() first.

  1. perf(cli): add page links direct return
  2. cli(scrape): now outputs full page links

Full Changelog: v2.24.15...v2.26.1

v2.24.15

04 Jan 17:53

What's Changed

Add a callback to perform validation using spider::page::Page.
You can now set "default-features = false" and enable the basic feature flag to disable io_uring on Linux while keeping the rest of the default features.

  1. feat(website): add on_should_crawl_callback [#241]
  2. feat(page): add blocked_crawl [#242]
  3. chore(disk): fix cfg aho_corasick
  4. chore(fs): remove tendril crate
  5. chore(page): fix crawling initial redirects
  6. chore(chrome): fix compile fs flag
  7. feat(cargo): add basic feature flag
  8. chore(connect): fix compile missing libc
  9. feat(page): add page_error_status_details
  10. perf(page): remove parsing url directly
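
The validation callback described above can be pictured with a small std-only sketch. The `Page` struct, the `ShouldCrawl` alias, and `filter_pages` are hypothetical stand-ins for illustration only; they are not the spider API, and the real `spider::page::Page` and callback signature should be checked in the crate docs.

```rust
/// Hypothetical stand-in for spider::page::Page (assumption: the real type differs).
#[derive(Debug, Clone)]
pub struct Page {
    pub url: String,
    pub status: u16,
    pub html: String,
}

/// Assumed callback shape: return true to keep crawling from this page.
pub type ShouldCrawl = fn(&Page) -> bool;

/// Example validator: accept only successful responses with a non-empty body.
pub fn only_ok_pages(page: &Page) -> bool {
    (200..300).contains(&page.status) && !page.html.trim().is_empty()
}

/// Minimal driver: returns the URLs of the pages the callback accepts.
pub fn filter_pages(pages: &[Page], should_crawl: ShouldCrawl) -> Vec<String> {
    pages
        .iter()
        .filter(|&p| should_crawl(p))
        .map(|p| p.url.clone())
        .collect()
}
```

A plain function pointer keeps the callback cheap to copy and easy to store in a configuration struct; a boxed closure would be the alternative if the validator needs captured state.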

Full Changelog: v2.23.7...v2.24.15

v2.23.7

31 Dec 11:50

What's Changed

Linux now uses io_uring for the DNS connect phase.

If you are not running a recent Linux kernel, disable the io_uring feature flag.

  • feat(io_uring): add io_uring for connect_phase linux
  • chore(fs): fix feature flag compile fs

Full Changelog: v2.22.19...v2.23.7

v2.22.19

24 Dec 15:46

What's Changed

This release brings in SQLite for improved memory handling with the feature flags disk_native_tls, disk, and disk_aws.
SQLite is set to be used in a hybrid manner with memory in order to maintain performance.

With disk handling and our string interning, the number of URLs crawled can reach into the billions of resources, or grow without a practical limit with EFS attached.
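
As a rough illustration of the hybrid idea, the sketch below keeps visited URLs in memory up to a cap and sends the overflow to a backing store. Everything here is an assumption made for the example: `DiskStore`, `FakeDisk`, `HybridVisited`, and the cap-based spill policy are hypothetical, the "disk" is an in-memory fake so the code runs without SQLite, and the crate's actual split between memory and disk may work differently.

```rust
use std::collections::HashSet;

/// Stand-in for the on-disk store (SQLite in the release notes).
pub trait DiskStore {
    /// Returns true if the URL was newly inserted.
    fn insert(&mut self, url: &str) -> bool;
    fn contains(&self, url: &str) -> bool;
}

/// In-memory fake so the sketch runs without a database.
#[derive(Default)]
pub struct FakeDisk {
    rows: HashSet<String>,
}

impl DiskStore for FakeDisk {
    fn insert(&mut self, url: &str) -> bool {
        self.rows.insert(url.to_owned())
    }
    fn contains(&self, url: &str) -> bool {
        self.rows.contains(url)
    }
}

/// Visited set that fills memory first, then spills to the disk store.
pub struct HybridVisited<D: DiskStore> {
    memory: HashSet<String>,
    memory_cap: usize,
    disk: D,
}

impl<D: DiskStore> HybridVisited<D> {
    pub fn new(memory_cap: usize, disk: D) -> Self {
        Self { memory: HashSet::new(), memory_cap, disk }
    }

    /// Checks the fast in-memory set before the slower disk store.
    pub fn contains(&self, url: &str) -> bool {
        self.memory.contains(url) || self.disk.contains(url)
    }

    /// Records a URL; returns true if it had not been seen before.
    pub fn insert(&mut self, url: &str) -> bool {
        if self.contains(url) {
            return false;
        }
        if self.memory.len() < self.memory_cap {
            self.memory.insert(url.to_owned())
        } else {
            self.disk.insert(url)
        }
    }

    /// Number of URLs currently held in memory.
    pub fn in_memory(&self) -> usize {
        self.memory.len()
    }
}
```

Checking memory first keeps the common lookup cheap, while the cap bounds resident memory no matter how many URLs the crawl encounters.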

Other Changes

  • chore(website,page): fix concurrent initial scoped access to lazy_static!
  • chore(chrome): add more network block layers for chrome
  • chore(chrome): remove smart mode default idle_dom usage
  • perf(page): dynamic rewriter chunk size
  • chore(website): add connect layer concurrency limit apply
  • perf(runtime): add dedicated thread for request connect

Full Changelog: v2.21.33...v2.22.19

v2.21.33

18 Dec 11:53

What's Changed

Fix http crawling past first page
Fix safe handling of absolute URLs

Full Changelog: v2.21.27...v2.21.33

v2.21.27

10 Dec 12:21

What's Changed

  • add balance feature flag to switch to global semaphores.
  • add remote_addr feature flag to get the page remote address / ip.
  • add re-usable chrome client and stream ws
  • fix chrome inline page navigations
  • fix constraining html only pages
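
To show what a global semaphore means in practice, here is a minimal std-only sketch of a process-wide counting semaphore. It is not the crate's implementation: `Semaphore`, `Permit`, `GLOBAL_LIMIT`, and the limit of 2 are hypothetical, and a blocking Mutex plus Condvar is swapped in for whatever async primitive the balance flag actually uses.

```rust
use std::sync::{Condvar, Mutex};

/// Counting semaphore built from a Mutex and a Condvar.
pub struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

/// RAII guard: the permit is returned when the guard is dropped.
pub struct Permit<'a> {
    sem: &'a Semaphore,
}

impl Semaphore {
    pub const fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Blocks the calling thread until a permit is free.
    pub fn acquire(&self) -> Permit<'_> {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.available.wait(permits).unwrap();
        }
        *permits -= 1;
        Permit { sem: self }
    }

    /// Takes a permit only if one is free right now.
    pub fn try_acquire(&self) -> Option<Permit<'_>> {
        let mut permits = self.permits.lock().unwrap();
        if *permits == 0 {
            return None;
        }
        *permits -= 1;
        Some(Permit { sem: self })
    }

    pub fn available_permits(&self) -> usize {
        *self.permits.lock().unwrap()
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        *self.sem.permits.lock().unwrap() += 1;
        self.sem.available.notify_one();
    }
}

/// One limit shared by every crawl in the process (the value 2 is arbitrary).
pub static GLOBAL_LIMIT: Semaphore = Semaphore::new(2);
```

Because the limit lives in a static, every caller in the process draws from the same pool of permits, rather than each crawl holding its own independent limit.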

Full Changelog: v2.20.6...v2.21.27

v2.20.6

05 Dec 14:11

What's Changed

  • fix chrome initial page return links
  • add hydration scripts ignore for next.js and astro
  • add base script targets for smart mode
  • add custom domain layer interception for giants
  • add interception analytics and ads blocking
  • fix chrome page timeout bytes transferring

Full Changelog: v2.16.0...v2.20.6

v2.16.0

04 Dec 18:11

What's Changed

  • Chrome crawls now get the total bytes used over the network.
  • Improved ignore list for unwanted crawling requests in chrome interception.

Full Changelog: v2.15.0...v2.16.0

v2.15.0

04 Dec 14:37

What's Changed

Potentially major performance increase for chrome crawling by blocking unwanted XHR requests and scripts.

  • perf(chrome): add xhr interception

Full Changelog: v2.14.0...v2.15.0