16 Commits (develop)

Author SHA1 Message Date
Sven Slootweg d98ee113bc Rewrite generic OCW parser, BeautifulSoup fix to allow exclusion of comments for string retrieval, and fix BS4 bug 12 years ago
Sven Slootweg 98340b38a0 Rewrite University of Reddit crawler - now with less hacks! 12 years ago
Sven Slootweg 8bbffb9429 Add topic_exists and item_exists methods to Scraper class 12 years ago
Sven Slootweg 0e4df4549f No need to import oursql from within the scrapers 12 years ago
Sven Slootweg 2c3bcc5418 Rewrite Khan Academy crawler 12 years ago
Sven Slootweg d9034b6215 Consistently use row_id, and not itemid or rowid 12 years ago
Sven Slootweg 8c0033074b Support both output logging and error logging in the Environment.log() method 12 years ago
Sven Slootweg b3edd35ecf Add support for lectures and sandboxes 12 years ago
Sven Slootweg fb6c43a38f Rewrite scraper to be more modular, and convert the Coursera crawler to the new model 12 years ago
Sven Slootweg a690cb2c8f Add rudimentary first version of the OCW scraper 12 years ago
Sven Slootweg d3bd59f813 Add modified version of BeautifulSoup4 (nth-of-type pseudoselector and full-featured direct descendant support) 12 years ago
Sven Slootweg d387541822 Support custom provider names 12 years ago
Sven Slootweg 1fbb21e6d8 Properly use the password when connecting the crawlers 12 years ago
Sven Slootweg 6ec1a2d90b Add crawlers for coursera and ureddit, get first quick and dirty version of frontend done, and fix bugs and stuff 12 years ago
Sven Slootweg 703a34bfa2 Reorganize updater code and add first design idea for frontend 12 years ago
Sven Slootweg 8152ec8dca First version of update script 12 years ago