33 Commits (develop)
 

Author SHA1 Message Date
Sven Slootweg d98ee113bc Rewrite generic OCW parser, BeautifulSoup fix to allow exclusion of comments for string retrieval, and fix BS4 bug 11 years ago
Sven Slootweg 98340b38a0 Rewrite University of Reddit crawler - now with less hacks! 11 years ago
Sven Slootweg 8bbffb9429 Add topic_exists and item_exists methods to Scraper class 11 years ago
Sven Slootweg 0e4df4549f No need to import oursql from within the scrapers 11 years ago
Sven Slootweg 2c3bcc5418 Rewrite Khan Academy crawler 11 years ago
Sven Slootweg d9034b6215 Consistently use row_id, and not itemid or rowid 11 years ago
Sven Slootweg 8c0033074b Support both output logging and error logging in the Environment.log() method 11 years ago
Sven Slootweg b3edd35ecf Add support for lectures and sandboxes 11 years ago
Sven Slootweg d6d8eb70b9 Fix typo - it should be Khan Academy, not Khan University. 11 years ago
Sven Slootweg fb6c43a38f Rewrite scraper to be more modular, and convert the Coursera crawler to the new model 11 years ago
Sven Slootweg c2a8a66dac Update README to fix dependencies list 11 years ago
Sven Slootweg a690cb2c8f Add rudimentary first version of the OCW scraper 11 years ago
Sven Slootweg f188d443d1 Add README 11 years ago
Sven Slootweg 43c700ac2b Add list of various OCW sources for parser development 11 years ago
Sven Slootweg 26b68952fa Add table structure updates for new version of updater 11 years ago
Sven Slootweg a4e744f892 Add list of sources for book data 11 years ago
Sven Slootweg d3bd59f813 Add modified version of BeautifulSoup4 (nth-of-type pseudoselector and full-featured direct descendant support) 11 years ago
Sven Slootweg 8e951f6b27 Add simple script for searching from a terminal 11 years ago
Sven Slootweg d387541822 Support custom provider names 11 years ago
Sven Slootweg a6e350c0d9 Add dumping script 11 years ago
Sven Slootweg 0f5cade812 Simple dumper 11 years ago
Sven Slootweg fa74d394a7 Filter _ search terms 11 years ago
Sven Slootweg a9d2576eaf Add donation link 11 years ago
Sven Slootweg f57d45fa53 Add header message 11 years ago
Sven Slootweg 1503c1f75f Add 404 page 11 years ago
Sven Slootweg bfbfd821b5 Include a small preview in the search results 11 years ago
Sven Slootweg efeef5f70e Change search term requirements 11 years ago
Sven Slootweg 3f02174ba3 Implement some very basic methods to prevent overloading 11 years ago
Sven Slootweg 1fbb21e6d8 Properly use the password when connecting the crawlers 11 years ago
Sven Slootweg dd4c62bc4e Very basic error handling 11 years ago
Sven Slootweg 6ec1a2d90b Add crawlers for coursera and ureddit, get first quick and dirty version of frontend done, and fix bugs and stuff 11 years ago
Sven Slootweg 703a34bfa2 Reorganize updater code and add first design idea for frontend 11 years ago
Sven Slootweg 8152ec8dca First version of update script 11 years ago