Sven Slootweg
|
d98ee113bc
|
Rewrite generic OCW parser, BeautifulSoup fix to allow exclusion of comments for string retrieval, and fix BS4 bug
|
12 years ago |
Sven Slootweg
|
98340b38a0
|
Rewrite University of Reddit crawler - now with fewer hacks!
|
12 years ago |
Sven Slootweg
|
8bbffb9429
|
Add topic_exists and item_exists methods to Scraper class
|
12 years ago |
Sven Slootweg
|
0e4df4549f
|
No need to import oursql from within the scrapers
|
12 years ago |
Sven Slootweg
|
2c3bcc5418
|
Rewrite Khan Academy crawler
|
12 years ago |
Sven Slootweg
|
d9034b6215
|
Consistently use row_id, and not itemid or rowid
|
12 years ago |
Sven Slootweg
|
8c0033074b
|
Support both output logging and error logging in the Environment.log() method
|
12 years ago |
Sven Slootweg
|
b3edd35ecf
|
Add support for lectures and sandboxes
|
12 years ago |
Sven Slootweg
|
fb6c43a38f
|
Rewrite scraper to be more modular, and convert the Coursera crawler to the new model
|
12 years ago |
Sven Slootweg
|
a690cb2c8f
|
Add rudimentary first version of the OCW scraper
|
12 years ago |
Sven Slootweg
|
d3bd59f813
|
Add modified version of BeautifulSoup4 (nth-of-type pseudoselector and full-featured direct descendant support)
|
12 years ago |
Sven Slootweg
|
d387541822
|
Support custom provider names
|
12 years ago |
Sven Slootweg
|
1fbb21e6d8
|
Properly use the password when connecting the crawlers
|
12 years ago |
Sven Slootweg
|
6ec1a2d90b
|
Add crawlers for coursera and ureddit, get first quick and dirty version of frontend done, and fix bugs and stuff
|
12 years ago |
Sven Slootweg
|
703a34bfa2
|
Reorganize updater code and add first design idea for frontend
|
12 years ago |
Sven Slootweg
|
8152ec8dca
|
First version of update script
|
12 years ago |