diff --git a/docs/developers.md b/docs/developers.md
new file mode 100644
index 0000000..bf4d93d
--- /dev/null
+++ b/docs/developers.md
@@ -0,0 +1,363 @@
+# Documentation for developers
+
+The majority of this document will apply to both third-party developers ("users") of the resolv library, and core/plugin developers writing code for resolv itself. Where necessary, a distinction is made using the terms "users" (for third-party developers that use resolv in their project as a library) and "developers" (for developers that work on either the core or the plugins for the resolv library).
+
+## Purpose
+
+The purpose of python-resolv is quite simple: to provide a reusable library for resolving URLs. "Resolving" in this context refers to various things; for example:
+
+ * The resolution of an obfuscated 'external' 1channel URL to the real URL of a stream.
+ * The resolution of a YouTube URL to directly streamable video files (in various qualities) that can, for example, be downloaded via wget or streamed via VLC Media Player.
+ * The resolution of a Mediafire page URL to a wget-able direct URL.
+ * The resolution of a Pastebin URL to a 'raw' version, including the fetching of the title.
+ * And so on, and so on...
+
+Basically, resolv's purpose is to turn any kind of URL into the most 'direct' URL that can be acquired for either streaming or downloading, in such a way that it can easily be integrated into third-party software (such as a download manager, media player, etc.).
+
+## Technical summary
+
+The resolv library is a Python module - this means it can be imported like any other module and used in any Python application or any application that supports Python scripting. Each "resolver" - a 'plugin' to resolve URLs for a certain service - is its own class, inheriting from the Task base class. A task may either be finished immediately, or require further user input (for example, a password or a CAPTCHA solution). The final result is a nested dictionary with the information that is necessary for downloading or streaming. The library can be kept up to date independently via its PyPI packages.
+
+## Currently supported services:
+
+### Task.state
+
+This variable holds the current state of the Task. It can be any of the following values:
+
+
+| Name | Description |
+| --- | --- |
+| blank | The run() method has not been called yet. |
+| finished | The URL was successfully resolved. |
+| need_password | A password is required to resolve this URL. |
+| password_invalid | The provided password was incorrect. |
+| need_captcha | A CAPTCHA needs to be solved to continue resolving. |
+| captcha_invalid | The given CAPTCHA response was incorrect. |
+| invalid | The URL is invalid for this resolver. |
+| unsupported | This type of URL is not supported by this resolver. |
+| failed | The resolution failed for some other reason. |
+
+
+How to handle these situations is up to your application.
+
+### Task.result_type
+
+This variable holds the type of result that the Task holds. It can be any of `url` (for deobfuscated and un-shortened URLs), `file` (for downloadable files), `text` (for pastebins and such), `video` (for streaming video), `audio` (for streaming audio), and `image` (for embeddable images). A special type is `dummy`, which is used by the `dummy` resolver, but may also appear in other resolvers for testing purposes. For all practical purposes, `dummy` results should be ignored.
+
+__Important:__ Do *not* use this variable to determine whether resolution was successful. A resolver may set this variable before doing any resolution, if the resolver only supports one kind of result.
+
+### Task.results
+
+This variable holds the results of the resolution. The format of these results will differ depending on the result type. When successfully resolving a URL, the results will always be in the form of a dictionary.
+
+Further documentation on the structure of these dictionaries for each result type can be found in structures.md.
+
+### Task.captcha
+
+If solving a CAPTCHA is required (as indicated by the `need_captcha` state), this variable will hold a Captcha object. The Captcha class is documented further down in this document.
+
+### Task.cookiejar
+
+The cookielib Cookie Jar that is used for this task.
+
+### Task.run()
+
+*Returns:* An instance of a Task-derived class, usually itself.
+
+Runs the task.
+
+### Task.fetch_page(url)
+
+*Returns:* A string containing the resulting data.
+
+Does a GET request to the specified `url`, using the Cookie Jar for the task. When manually making GET requests related to a task, always use this function to ensure that session information is retained.
+
+### Task.post_page(url, data)
+
+*Returns:* A string containing the resulting data.
+
+Does a POST request to the specified `url`, using the Cookie Jar for the task. The `data` argument should be a dictionary of POST fields. When manually making POST requests related to a task, always use this function to ensure that session information is retained.
+
+### Task.verify_password(password)
+
+*Returns:* An instance of a Task-derived class, usually itself.
+
+Continues the task, using the provided password. Essentially works the same as the run() method. Password validity is checked via the `state` variable. This function is only available for resolvers that support password-protected URLs.
+
+### Task.verify_image_captcha(solution)
+
+*Returns:* An instance of a Task-derived class, usually itself.
+
+Continues the task, using the provided image CAPTCHA solution. Essentially works the same as the run() method. CAPTCHA solution validity is checked via the `state` variable. This function is only available for resolvers that support CAPTCHA handling.
+
+### Task.verify_audio_captcha(solution)
+
+*Returns:* An instance of a Task-derived class, usually itself.
+
+Continues the task, using the provided audio CAPTCHA solution. Essentially works the same as the run() method. CAPTCHA solution validity is checked via the `state` variable. This function is only available for resolvers that support CAPTCHA handling.
+
+### Task.verify_text_captcha(solution)
+
+*Returns:* An instance of a Task-derived class, usually itself.
+
+Continues the task, using the provided text CAPTCHA solution. Essentially works the same as the run() method. CAPTCHA solution validity is checked via the `state` variable. This function is only available for resolvers that support CAPTCHA handling.
+
+## CAPTCHA handling
+
+If a site requires a CAPTCHA to be solved before you can fully resolve the URL, the state will be set to `need_captcha`. The resolv library does not process CAPTCHAs itself; it simply provides you with the CAPTCHA data so that you can figure out some way to solve it. The `Task.captcha` variable will hold a Captcha object that has everything you will need. To provide a solution for a CAPTCHA, use the appropriate method in the Task instance (see above).
+
+### Captcha.task
+
+This variable will hold a reference to the `Task` this CAPTCHA belongs to.
+
+### Captcha.text
+
+This variable will either be `None` (if no text version of the CAPTCHA was available) or the text challenge as a string.
+
+### Captcha.image
+
+This variable holds `None` or the URL for the image CAPTCHA. __Do NOT use this variable unless you know what you're doing - the majority of image CAPTCHAs are tied to an IP address and set of cookies. You should use the get_image() method for this.__
+
+### Captcha.audio
+
+This variable holds `None` or the URL for the audio CAPTCHA. __Do NOT use this variable unless you know what you're doing - the majority of audio CAPTCHAs are tied to an IP address and set of cookies. You should use the get_audio() method for this.__
+
+### Captcha.get_image()
+
+*Returns:* a tuple containing (file type, binary image data).
+
+You can save the output of this method to a file, or send it elsewhere, to further process the image CAPTCHA.
+
+### Captcha.get_audio()
+
+*Returns:* a tuple containing (file type, binary audio data).
+
+You can save the output of this method to a file, or send it elsewhere, to further process the audio CAPTCHA.
+
+### Some ideas for terminal-based CAPTCHA solving
+
+When writing a terminal-based download application, you often can't just display a CAPTCHA to the end user. A few suggestions to work around this:
+
+* Use a third-party CAPTCHA solving service to cover whatever CAPTCHAs can be covered.
+* Implement a web interface for the application in its entirety.
+* Convert the image CAPTCHA to colored text (the ASCII art approach) to display it on a terminal.
+* Start a temporary HTTP daemon that serves the CAPTCHA and terminates when the CAPTCHA has been solved.
+
+## Resolver-specific documentation
+
+### YouTube
+
+The YouTube resolver provides some specific custom keys for each video result: `itag` (a format identifier used by YouTube internally), `fallback_host`, and a YouTube-supplied `mimetype` definition containing encoding details.
+
+## Documentation specific to plugin (resolver) developers
+
+### Getting started
+
+1. Clone the repository.
+2. Look at existing resolvers, especially dummy.py, to see the basic format for a resolver.
+3. Modify a resolver or make your own.
+4. Create a pull request to have your changes merged into the main repository (if you want to).
+
+### Things to keep in mind
+
+* ResolverError exceptions must always contain a user-friendly description.
+* TechnicalError exceptions do not have to be user-friendly, but they must be clear.
+* Don't forget to set metadata in your resolver class!
+* Adhere to the standard formats for results - if you want to return something for which no suitable format exists, change the documentation to add your format and make a pull request to have it added in - this way you can be sure that applications can handle your format in the future.
+* For the sake of consistency, all code, comments, and error messages should be in English.
+* Always set the state of a Task to `failed`, `unsupported` or `invalid` depending on the problem, before raising an exception.
+* When specifying an HTTP method, always use *uppercase* characters (GET, POST).
+
+### Whether to use the failed, unsupported or invalid state
+
+The `invalid` state is intended for situations where it is *certain* that the input (URL) was invalid. For example, the homepage of a filehost instead of a URL to a certain file, or an entirely different site altogether. If the URL is malformed in some way, you may also use this state. If you cannot be entirely sure whether the URL is invalid or whether there was another problem, use the `failed` state. An example of this would be a 'not authorized' page - the URL may be invalid, but it may also be possible that there is simply no public access.
+
+The `unsupported` state is intended for situations where the URL that is provided cannot be resolved because a certain feature needed for this is not available. Examples include a CAPTCHA on a site for which the resolver has no CAPTCHA handling, or a file download on a site for which the resolver only supports resolving video streams. Use of this state should always be temporary - at some point the required functionality should be implemented.
+
+The `failed` state is for everything else.
\ No newline at end of file
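The states documented in docs/developers.md above leave the handling to the calling application. What follows is a minimal sketch of one way a caller might drive a password-protected task. `FakeTask` is a hypothetical stand-in so the example runs without resolv installed; only the `state`, `results`, `run()` and `verify_password()` names come from the documentation, while `handle`, `ask_password` and `max_attempts` are illustrative assumptions.

```python
class FakeTask(object):
    """Hypothetical stand-in for a resolv Task that wants a password."""

    def __init__(self, url, correct_password):
        self.url = url
        self.state = "blank"
        self.results = None
        self._correct_password = correct_password

    def run(self):
        # A real resolver would fetch the page here.
        self.state = "need_password"
        return self

    def verify_password(self, password):
        if password == self._correct_password:
            self.results = {"url": self.url + "/direct"}
            self.state = "finished"
        else:
            self.state = "password_invalid"
        return self


def handle(task, ask_password, max_attempts=3):
    """Drive a task until it finishes or cannot continue."""
    task = task.run()
    attempts = 0
    while task.state in ("need_password", "password_invalid"):
        if attempts >= max_attempts:
            return None
        task = task.verify_password(ask_password())
        attempts += 1
    if task.state == "finished":
        return task.results
    # invalid, unsupported, failed, or a CAPTCHA state this sketch ignores
    return None


passwords = iter(["wrong", "hunter2"])
result = handle(FakeTask("http://example.com/f", "hunter2"), lambda: next(passwords))
print(result)
```

Real resolvers also raise ResolverError or TechnicalError after setting a failure state, so an application would likely wrap the calls in a `try` block as well; that part is left out of this sketch.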
diff --git a/docs/structures.md b/docs/structures.md
new file mode 100644
index 0000000..3e6b30b
--- /dev/null
+++ b/docs/structures.md
@@ -0,0 +1,305 @@
+## URLs
+
+## Video
+
+
+| Key | Description |
+| --- | --- |
+| url | URL of the video file. |
+| method | The method to be used for retrieving this URL (either GET or POST). |
+| postdata | (optional) The POST data to send if the method to be used is POST. This data is in dictionary form. |
+| quality | A textual description of the video quality (this will typically be along the lines of `360p`, `720p`, `1080p`, `low`, `medium`, `high`, etc., but any value is possible). If the quality is not specified, this will be set to `unknown`. Don't parse this programmatically - use the `priority` field instead. |
+| format | The name of the file format for this video, along the lines of `webm`, `mp4`, `3gp`, `flv`, `wmv`, etc. While this value should typically be pretty consistent, different abbreviations may be used for different resolvers. It's probably not a good idea to automatically parse these unless you know the exact values a resolver will return. This may be set to `unknown`. |
+| priority | The priority for this video file. Higher quality video has a lower 'priority'. To always get the highest quality video, go for the URL with the lowest priority (this may not always be 1). |
+| extra | This is a dictionary that may contain any custom data provided by the specific resolver that is used. Refer to the resolver-specific documentation for this. |
+
+
+## Audio
+
+
+
+| Key | Description |
+| --- | --- |
+| url | URL of the audio file. |
+| method | The method to be used for retrieving this URL (either GET or POST). |
+| postdata | (optional) The POST data to send if the method to be used is POST. This data is in dictionary form. |
+| quality | A textual description of the audio quality (this will typically be along the lines of `low`, `medium`, `high`, `lossless`, etc., but any value is possible). If the quality is not specified, this will be set to `unknown`. Don't parse this programmatically - use the `priority` field instead. |
+| format | The name of the file format for this audio file, along the lines of `mp3`, `flac`, `midi`, `ogg`, etc. While this value should typically be pretty consistent, different abbreviations may be used for different resolvers. It's probably not a good idea to automatically parse these unless you know the exact values a resolver will return. This may be set to `unknown`. |
+| priority | The priority for this audio file. Higher quality audio has a lower 'priority'. To always get the highest quality audio file, go for the URL with the lowest priority (this may not always be 1). |
+| extra | This is a dictionary that may contain any custom data provided by the specific resolver that is used. Refer to the resolver-specific documentation for this. |
+
+
+## Images
+
+
+
+| Key | Description |
+| --- | --- |
+| url | URL of the image. |
+| method | The method to be used for retrieving this URL (either GET or POST). |
+| postdata | (optional) The POST data to send if the method to be used is POST. This data is in dictionary form. |
+| quality | A textual description of the image quality (this will typically be along the lines of `low`, `medium`, `high`, `lossless`, etc., but any value is possible). If the quality is not specified, this will be set to `unknown`. Don't parse this programmatically - use the `priority` field instead. |
+| format | The name of the file format for this image, along the lines of `jpg`, `png`, `psd`, `svg`, etc. While this value should typically be pretty consistent, different abbreviations may be used for different resolvers. It's probably not a good idea to automatically parse these unless you know the exact values a resolver will return. This may be set to `unknown`. |
+| priority | The priority for this image. Higher quality images have a lower 'priority'. To always get the highest quality image, go for the URL with the lowest priority (this may not always be 1). |
+| extra | This is a dictionary that may contain any custom data provided by the specific resolver that is used. Refer to the resolver-specific documentation for this. |
+
+
+## Files
+
+
+
+| Key | Description |
+| --- | --- |
+| url | URL of the file. |
+| method | The method to be used for retrieving this URL (either GET or POST). |
+| postdata | (optional) The POST data to send if the method to be used is POST. This data is in dictionary form. |
+| format | The name of the file format, along the lines of `zip`, `mp3`, `pdf`, `doc`, etc. While this value should typically be pretty consistent, different abbreviations may be used for different resolvers. It's probably not a good idea to automatically parse these unless you know the exact values a resolver will return. This may be set to `unknown`. |
+| priority | The priority for this URL. More important or faster URLs have a lower 'priority'. To always get the best result, go for the URL with the lowest priority (this may not always be 1). |
+| extra | This is a dictionary that may contain any custom data provided by the specific resolver that is used. Refer to the resolver-specific documentation for this. |
+
+
+## Text
+
+
+
+| Key | Description |
+| --- | --- |
+| url | URL of the file. |
+| method | The method to be used for retrieving this URL (either GET or POST). |
+| postdata | (optional) The POST data to send if the method to be used is POST. This data is in dictionary form. |
+| format | The name of the file format, along the lines of `zip`, `mp3`, `pdf`, `doc`, etc. While this value should typically be pretty consistent, different abbreviations may be used for different resolvers. It's probably not a good idea to automatically parse these unless you know the exact values a resolver will return. This may be set to `unknown`. |
+| priority | The priority for this URL. More important or faster URLs have a lower 'priority'. To always get the best result, go for the URL with the lowest priority (this may not always be 1). |
+| extra | This is a dictionary that may contain any custom data provided by the specific resolver that is used. Refer to the resolver-specific documentation for this. |
+
\ No newline at end of file
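The tables in docs/structures.md above say to select by `priority` rather than by parsing `quality`. A small sketch of that selection follows, assuming the entries arrive as a list of dictionaries with the documented keys. The helper name `best_stream` and the sample URLs are made up for illustration.

```python
def best_stream(streams):
    """Return the entry with the lowest 'priority' value, or None if empty."""
    if not streams:
        return None
    return min(streams, key=lambda s: s["priority"])


# Sample entries using the documented keys; the values are invented.
videos = [
    {"url": "http://example.com/v-360.mp4", "method": "GET",
     "quality": "360p", "format": "mp4", "priority": 3},
    {"url": "http://example.com/v-720.mp4", "method": "GET",
     "quality": "720p", "format": "mp4", "priority": 2},
]

best = best_stream(videos)
print(best["url"])
```

Using `min` instead of looking for priority 1 matches the note that the lowest priority may not always be 1.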
diff --git a/resolv/__init__.py b/resolv/__init__.py
index 6c0ac33..90507c7 100644
--- a/resolv/__init__.py
+++ b/resolv/__init__.py
@@ -1,23 +1,41 @@
 import re
-from resolvers import *
+import resolvers
+
+from resolv.shared import ResolverError
 def resolve(url):
     if re.match("https?:\/\/(www\.)?putlocker\.com", url) is not None:
-        return putlocker.resolve(url)
+        task = resolvers.PutlockerTask(url)
+        return task.run()
     elif re.match("https?:\/\/(www\.)?sockshare\.com", url) is not None:
-        return sockshare.resolve(url)
+        task = resolvers.SockshareTask(url)
+        return task.run()
     elif re.match("https?:\/\/(www\.)?1channel\.ch\/external\.php", url) is not None:
-        return onechannel.resolve(url)
+        task = resolvers.OneChannelTask(url)
+        return task.run()
     elif re.match("https?:\/\/(www\.)?youtube\.com\/watch\?", url) is not None:
-        return youtube.resolve(url)
+        task = resolvers.YoutubeTask(url)
+        return task.run()
     elif re.match("https?:\/\/(www\.)?filebox\.com\/[a-zA-Z0-9]+", url) is not None:
-        return filebox.resolve(url)
+        task = resolvers.FileboxTask(url)
+        return task.run()
+    elif re.match("https?:\/\/(www\.)?vidxden\.com\/[a-zA-Z0-9]+", url) is not None:
+        task = resolvers.VidxdenTask(url)
+        return task.run()
+    elif re.match("https?:\/\/(www\.)?vidbux\.com\/[a-zA-Z0-9]+", url) is not None:
+        task = resolvers.VidbuxTask(url)
+        return task.run()
+    elif re.match("https?:\/\/(www\.)?filenuke\.com\/[a-zA-Z0-9]+", url) is not None:
+        task = resolvers.FilenukeTask(url)
+        return task.run()
     elif re.match("https?:\/\/(www\.)?pastebin\.com\/[a-zA-Z0-9]+", url) is not None:
-        return pastebin.resolve(url)
+        task = resolvers.PastebinTask(url)
+        return task.run()
     elif re.match("https?:\/\/(www\.)?mediafire\.com\/\?[a-z0-9]+", url) is not None:
-        return mediafire.resolve(url)
+        task = resolvers.MediafireTask(url)
+        return task.run()
     else:
-        return {}
+        raise ResolverError("No suitable resolver found for %s" % url)
 def recurse(url):
     previous_result = {}
@@ -25,10 +43,10 @@ def recurse(url):
     while True:
         result = resolve(url)
-        if result == {}:
+        if result.state == "failed":
             return previous_result
-        elif 'url' not in result:
+        elif result.result_type != "url":
             return result
-        url = result['url']
+        url = result.results['url']
         previous_result = result
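The `resolve()` function above picks a resolver by trying one regular expression after another. The same matching can be sketched as a lookup table, which is an alternative layout and not how the commit implements it. The patterns are equivalent to three of those in the diff; the resolver names are plain strings standing in for the Task classes, and `ValueError` stands in for `ResolverError` so the example is self-contained.

```python
import re

# (pattern, resolver name) pairs, tried in order.
RESOLVERS = [
    (r"https?://(www\.)?youtube\.com/watch\?", "YoutubeTask"),
    (r"https?://(www\.)?pastebin\.com/[a-zA-Z0-9]+", "PastebinTask"),
    (r"https?://(www\.)?mediafire\.com/\?[a-z0-9]+", "MediafireTask"),
]


def find_resolver(url):
    """Return the name of the first resolver whose pattern matches the URL."""
    for pattern, name in RESOLVERS:
        # re.match anchors at the start of the string, as in resolve().
        if re.match(pattern, url) is not None:
            return name
    raise ValueError("No suitable resolver found for %s" % url)


print(find_resolver("http://www.youtube.com/watch?v=abc"))
```

With a table like this, adding a service would mean adding one entry instead of another `elif` branch.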
diff --git a/resolv/resolvers/__init__.py b/resolv/resolvers/__init__.py
index b514ffd..732ff75 100644
--- a/resolv/resolvers/__init__.py
+++ b/resolv/resolvers/__init__.py
@@ -6,3 +6,6 @@ from youtube import *
from filebox import *
from pastebin import *
from mediafire import *
+from vidxden import *
+from vidbux import *
+from filenuke import *
diff --git a/resolv/resolvers/dummy.py b/resolv/resolvers/dummy.py
index deeb57f..7adb959 100644
--- a/resolv/resolvers/dummy.py
+++ b/resolv/resolvers/dummy.py
@@ -1,2 +1,13 @@
-def resolve(input):
-    return {'dummy': input}
+from resolv.shared import Task
+
+class DummyTask(Task):
+    result_type = "dummy"
+
+    name = "Dummy Resolver"
+    author = "Sven Slootweg"
+    author_url = "http://cryto.net/~joepie91"
+
+    def run(self):
+        self.results = {'dummy': self.url}
+        self.state = "finished"
+        return self
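Building on the DummyTask pattern above, this is a rough sketch of a resolver that follows the "set the state, then raise" rule from the developer documentation. The `Task` and `ResolverError` classes below are simplified stand-ins so the example runs on its own, not the real `resolv.shared` implementations. `ExampleTask`, its metadata and the example.com URLs are hypothetical.

```python
import re


class ResolverError(Exception):
    """Stand-in for resolv.shared.ResolverError."""


class Task(object):
    """Stand-in for resolv.shared.Task with only what this sketch needs."""
    result_type = None

    def __init__(self, url):
        self.url = url
        self.state = "blank"
        self.results = None


class ExampleTask(Task):
    result_type = "url"

    # Metadata, as in the existing resolvers (values are placeholders).
    name = "Example resolver"
    author = "Jane Doe"
    author_url = "http://example.com/"

    def run(self):
        matches = re.search(r"https?://(www\.)?example\.com/go/([a-z0-9]+)", self.url)

        if matches is None:
            # Set the state first, then raise with a user-friendly message.
            self.state = "invalid"
            raise ResolverError("The provided URL is not a valid example.com URL.")

        self.results = {"url": "http://example.com/direct/%s" % matches.group(2)}
        self.state = "finished"
        return self


task = ExampleTask("http://example.com/go/abc1").run()
print(task.state)
```

Because the state is set before the exception is raised, a caller that catches the error can still inspect `task.state` to decide what to do next.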
diff --git a/resolv/resolvers/filebox.py b/resolv/resolvers/filebox.py
index c8bdff1..0fc14ed 100644
--- a/resolv/resolvers/filebox.py
+++ b/resolv/resolvers/filebox.py
@@ -1,70 +1,48 @@
 import re, time, urllib2
-from resolv.shared import ResolverError
+from resolv.shared import ResolverError, TechnicalError, Task
-def resolve(url):
-    matches = re.search("https?:\/\/(www\.)?filebox\.com\/([a-zA-Z0-9]+)", url)
-
-    if matches is None:
-        raise ResolverError("The provided URL is not a valid Filebox.com URL.")
-
-    video_id = matches.group(2)
-
-    try:
-        contents = urllib2.urlopen("http://www.filebox.com/embed-%s-970x543.html" % video_id).read()
-    except:
-        raise ResolverError("Could not retrieve the video page.")
-
-    matches = re.search("url: '([^']+)',", contents)
-
-    if matches is None:
-        raise ResolverError("No video was found on the specified URL.")
-
-    video_file = matches.group(1)
-
-    stream_dict = {
-        'url' : video_file,
-        'quality' : "unknown",
-        'priority' : 1,
-        'format' : "unknown"
-    }
-
-    return { 'title': "", 'videos': [stream_dict] }
-
-def resolve2(url):
-    # This is a fallback function in case no video could be found through the resolve() method.
-    # It's not recommended to use it, as it introduces a 5 second wait.
-
-    try:
-        import mechanize
-    except ImportError:
-        raise ResolverError("The Python mechanize module is required to resolve Filebox.com URLs.")
-
-    matches = re.search("https?:\/\/(www\.)?filebox\.com\/([a-zA-Z0-9]+)", url)
-
-    if matches is None:
-        raise ResolverError("The provided URL is not a valid Filebox.com URL.")
-
-    try:
-        browser = mechanize.Browser()
-        browser.set_handle_robots(False)
-        browser.open(url)
-    except:
-        raise ResolverError("The Filebox.com site could not be reached.")
-
-    time.sleep(6)
-
-    try:
-        browser.select_form(nr=0)
-        result = browser.submit()
-        page = result.read()
-    except Exception, e:
-        raise ResolverError("The file was removed, or the URL is incorrect.")
-
-    matches = re.search("this\.play\('([^']+)'\)", page)
-
-    if matches is None:
-        raise ResolverError("No video file was found on the given URL; the Filebox.com server for this file may be in maintenance mode, or the given URL may not be a video file. The Filebox.com resolver currently only supports video links.")
-
-    video_file = matches.group(1)
-
-    return { 'title': "", 'videos': { 'video': video_file } }
+class FileboxTask(Task):
+    result_type = "video"
+
+    name = "Filebox.com"
+    author = "Sven Slootweg"
+    author_url = "http://cryto.net/~joepie91"
+
+    def run(self):
+        matches = re.search("https?:\/\/(www\.)?filebox\.com\/([a-zA-Z0-9]+)", self.url)
+
+        if matches is None:
+            self.state = "invalid"
+            raise ResolverError("The provided URL is not a valid Filebox.com URL.")
+
+        video_id = matches.group(2)
+
+        try:
+            contents = self.fetch_page("http://www.filebox.com/embed-%s-970x543.html" % video_id)
+        except urllib2.URLError, e:
+            self.state = "failed"
+            raise TechnicalError("Could not retrieve the video page.")
+
+        matches = re.search("url: '([^']+)',", contents)
+
+        if matches is None:
+            self.state = "invalid"
+            raise ResolverError("No video was found on the specified URL. The Filebox.com resolver currently only supports videos.")
+
+        video_file = matches.group(1)
+
+        stream_dict = {
+            'url' : video_file,
+            'method' : "GET",
+            'quality' : "unknown",
+            'priority' : 1,
+            'format' : "unknown"
+        }
+
+        self.results = {
+            'title': "",
+            'videos': [stream_dict]
+        }
+
+        self.state = "finished"
+        return self
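The two regular expressions in `FileboxTask.run()` above can be tried in isolation. The sketch below reuses them (with unescaped slashes, which match the same strings) and takes the page fetcher as a parameter, so no network access is needed. `extract_video_url`, `fake_fetch` and the page body are assumptions made for illustration; the body is invented and does not reproduce a real Filebox.com embed page.

```python
import re


def extract_video_url(url, fetch_page):
    """Return the stream URL found on the embed page, or None."""
    matches = re.search(r"https?://(www\.)?filebox\.com/([a-zA-Z0-9]+)", url)
    if matches is None:
        return None

    video_id = matches.group(2)
    contents = fetch_page("http://www.filebox.com/embed-%s-970x543.html" % video_id)

    # The stream location is read from a "url: '...'," fragment in the page.
    matches = re.search(r"url: '([^']+)',", contents)
    return matches.group(1) if matches else None


def fake_fetch(embed_url):
    # Hypothetical page body used instead of a real HTTP request.
    return "player.setup({ url: 'http://cdn.example.com/abc123.flv', autoplay: false });"


found = extract_video_url("http://www.filebox.com/abc123", fake_fetch)
print(found)
```

In the resolver itself the fetch goes through `self.fetch_page()`, so that the task's cookie jar is used for the request.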
diff --git a/resolv/resolvers/filenuke.py b/resolv/resolvers/filenuke.py
new file mode 100644
index 0000000..bc6096d
--- /dev/null
+++ b/resolv/resolvers/filenuke.py
@@ -0,0 +1,93 @@
+import re, time, urllib2
+from resolv.shared import ResolverError, TechnicalError, Task, unpack_js
+
+# No such file or the file has been removed due to copyright infringement issues.
+
+class FilenukeTask(Task):
+    result_type = "video"
+
+    name = "Filenuke"
+    author = "Sven Slootweg"
+    author_url = "http://cryto.net/~joepie91"
+
+    def run(self):
+        matches = re.search("https?:\/\/(www\.)?filenuke\.com\/([a-zA-Z0-9]+)", self.url)
+
+        if matches is None:
+            self.state = "invalid"
+            raise ResolverError("The provided URL is not a valid Filenuke URL.")
+
+        video_id = matches.group(2)
+
+        try:
+            contents = self.fetch_page(self.url)
+        except urllib2.URLError, e:
+            self.state = "failed"
+            raise TechnicalError("Could not retrieve the video page.")
+
+        if 'Choose how to download' not in contents:
+            self.state = "invalid"
+            raise ResolverError("The provided URL does not exist.")
+
+        matches = re.search('