
Script in python made to scrape Linked Open Vocabularies


Matt-81/LOVScraper


LOVScraper

Description

A Python script that scrapes Linked Open Vocabularies (LOV) to collect:

  • the .n3 file of every vocabulary, all saved in the directory vocabs/
  • metadata about each vocabulary, saved across several sheets of LOV.xlsx
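As a rough sketch of the first step, the snippet below pulls every link ending in .n3 out of a vocabulary page's HTML. It is a hypothetical illustration, not the code in LOVScraper.py: it assumes the page exposes the .n3 download as an ordinary `<a href>`, it uses the standard-library `html.parser` instead of BeautifulSoup so that it stays self-contained, and the names `N3LinkParser` and `find_n3_links` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class N3LinkParser(HTMLParser):
    """Collect the href of every <a> tag that points to a .n3 file (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore the query string when checking the extension, and resolve
        # relative links against the URL of the page being parsed.
        if href and href.split("?")[0].lower().endswith(".n3"):
            self.links.append(urljoin(self.base_url, href))


def find_n3_links(html, base_url):
    """Return the absolute URLs of all .n3 links found in an HTML page."""
    parser = N3LinkParser(base_url)
    parser.feed(html)
    return parser.links
```

In the real script the HTML would come from downloading each vocabulary page; here it is just a string, so the parsing logic can be tried in isolation.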

How to use

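First of all we need to install Python and pip. The script then relies on the following modules:

  • time, json, re and urllib, which are part of the Python standard library and need no installation
  • requests
  • beautifulsoup4
  • xlsxwriter
  • xlrd
  • pandas

The last five are third-party packages and can be installed with pip, e.g. pip install requests beautifulsoup4 xlsxwriter xlrd pandas.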
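Before running the scraper, a small check like the one below can report which dependencies are still missing. It is a sketch that is not part of the repository; it relies only on `importlib.util.find_spec` and on the fact that beautifulsoup4 is imported under the name bs4.

```python
import importlib.util

# Import name -> name to pass to `pip install` (None for standard-library modules).
REQUIRED = {
    "time": None,
    "json": None,
    "re": None,
    "urllib": None,
    "requests": "requests",
    "bs4": "beautifulsoup4",  # beautifulsoup4 is imported as bs4
    "xlsxwriter": "xlsxwriter",
    "xlrd": "xlrd",
    "pandas": "pandas",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of the modules that cannot be found."""
    missing = []
    for module, pip_name in required.items():
        if importlib.util.find_spec(module) is None:
            missing.append(pip_name or module)
    return missing


if __name__ == "__main__":
    todo = missing_packages()
    if todo:
        print("pip install " + " ".join(todo))
    else:
        print("All dependencies are installed.")
```

`find_spec` only locates a module without importing it, so the check is fast and has no side effects.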

Results

After executing LOVScraper.py and letting it run for about half an hour, we get:

  • a folder named vocabs/ containing all the vocabularies available on LOV
  • a file LOV.xlsx with the metadata extracted from each scraped vocabulary page
  • a file log.txt recording what went wrong while scraping LOV (missing .n3 files or broken links to .n3 files)
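The vocabs/ folder and log.txt outputs can be sketched with the standard library alone. This is a hypothetical illustration, not the code in LOVScraper.py. It assumes the scraper has already built a mapping from vocabulary prefix to .n3 URL (None when a page had no .n3 link). It writes each file to vocabs/<prefix>.n3 and records missing or broken links in log.txt. The `fetch` parameter is an assumption added so that the downloader can be swapped out, for example in tests.

```python
import os
import urllib.request


def fetch_url(url, timeout=30):
    """Download a URL and return its body as bytes (raises on HTTP or network errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def save_vocabularies(n3_links, out_dir="vocabs", log_path="log.txt", fetch=fetch_url):
    """Save each vocabulary's .n3 file into out_dir and log problems to log_path.

    n3_links maps a vocabulary prefix to the URL of its .n3 file, or to None
    when no .n3 link was found (hypothetical input format).
    Returns the prefixes that were saved successfully.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    with open(log_path, "w", encoding="utf-8") as log:
        for prefix, url in n3_links.items():
            if url is None:
                log.write(f"{prefix}: missing .n3 file\n")
                continue
            try:
                data = fetch(url)
            except Exception as exc:  # broken link, HTTP error, timeout, ...
                log.write(f"{prefix}: bad link {url} ({exc})\n")
                continue
            with open(os.path.join(out_dir, f"{prefix}.n3"), "wb") as f:
                f.write(data)
            saved.append(prefix)
    return saved
```

Logging and continuing, instead of stopping at the first failure, matches the README's description: a single bad vocabulary ends up in log.txt and does not interrupt the half-hour run.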
