API
The wsinfo library bundles the power of the socket module, some urllib subpackages, XML parsing and regular expressions into one library that can retrieve a large amount of information about a specific website.
- class wsinfo.Info(url)
Class collecting some information about the website located at the given URL.
Parameters: url – Valid URL to the website (e.g. http://example.com/path/to/file.html).
- content
Get the website’s content.
Returns: Content of the website (e.g. HTML code). Return type: str
- content_type
Get the website’s content type.
Returns: Content-type of the website’s code (e.g. text/html). Return type: str or NoneType
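To illustrate what such a value looks like, here is a minimal sketch that extracts the media type from a raw Content-Type header value. The helper name media_type is hypothetical and not part of wsinfo, and whether wsinfo strips parameters such as the charset is an assumption.

```python
def media_type(header_value):
    """Return the media type of a Content-Type header value, or None.

    Hypothetical helper for illustration only; not part of wsinfo.
    """
    if not header_value:
        return None
    # Everything after the first ";" is a parameter such as charset.
    return header_value.split(";", 1)[0].strip().lower()


print(media_type("text/html; charset=utf-8"))  # text/html
print(media_type(None))                        # None
```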
- favicon_path
Get the path to the website’s icon.
The href attribute of the first <link> tag containing rel="icon" or rel="shortcut icon" is used.
Returns: The path to the icon of the website (known as favicon). Return type: str or NoneType
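The lookup rule described above can be sketched with the standard library's html.parser. This is only an illustration of the rule, not wsinfo's actual implementation; the names FaviconFinder and find_favicon are hypothetical.

```python
from html.parser import HTMLParser


class FaviconFinder(HTMLParser):
    """Remember the href of the first <link> whose rel marks an icon."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # Ignore everything once an icon link has been found.
        if tag != "link" or self.href is not None:
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if rel in ("icon", "shortcut icon"):
            self.href = attributes.get("href")


def find_favicon(html):
    """Return the favicon path found in html, or None (hypothetical helper)."""
    finder = FaviconFinder()
    finder.feed(html)
    finder.close()
    return finder.href


page = (
    '<html><head><link rel="stylesheet" href="a.css">'
    '<link rel="shortcut icon" href="/favicon.ico">'
    '<link rel="icon" href="/other.png"></head></html>'
)
print(find_favicon(page))  # /favicon.ico
```

A page without a matching tag yields None, which matches the documented return type of str or NoneType.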
- hierarchy
Get a list representing the heading hierarchy.
Returns: List of tuples containing the heading type (h1, h2, ...) and the heading’s text. Return type: list
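A list of this shape could be produced roughly as follows. The sketch uses html.parser, and the names HeadingCollector and heading_hierarchy are hypothetical; wsinfo may build its list differently.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (tag, text) tuples for every heading in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._open_tag = None  # heading tag currently being read
        self._parts = []       # text fragments inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._open_tag = tag
            self._parts = []

    def handle_data(self, data):
        if self._open_tag is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._open_tag is not None and tag == self._open_tag:
            self.headings.append((tag, "".join(self._parts).strip()))
            self._open_tag = None


def heading_hierarchy(html):
    """Return the headings of html as a list of tuples (hypothetical helper)."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.headings


sample = "<h1>Title</h1><p>intro</p><h2>Part <em>one</em></h2><h3> Deep </h3>"
print(heading_hierarchy(sample))
# [('h1', 'Title'), ('h2', 'Part one'), ('h3', 'Deep')]
```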
- http_header
Get the website’s HTTP header.
Returns: HTTP header of the website. Return type: str
- http_header_dict
Get the website’s HTTP header as a dictionary.
Returns: HTTP header of the website as a dictionary. Return type: dict
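The relationship between the string form and the dictionary form can be sketched as below. It assumes the header string consists of "Name: value" lines, which this reference does not confirm; header_to_dict is a hypothetical helper, not a wsinfo function.

```python
def header_to_dict(raw_header):
    """Turn a raw HTTP header string into a dict (hypothetical helper)."""
    headers = {}
    for line in raw_header.splitlines():
        # Split on the first colon only, so values such as dates stay intact.
        name, separator, value = line.partition(":")
        if separator:  # skip blank lines and lines without a colon
            headers[name.strip()] = value.strip()
    return headers


raw = (
    "Server: nginx/1.18.0\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Date: Mon, 01 Jan 2024 00:00:00 GMT\r\n"
)
print(header_to_dict(raw)["Date"])  # Mon, 01 Jan 2024 00:00:00 GMT
```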
- http_status_code
Get the website’s HTTP status code.
- 1xx: Information
- 2xx: Success
- 3xx: Redirection
- 4xx: Client error
- 5xx: Server error
See this Wikipedia article for reference.
Returns: HTTP status code of the website. Return type: int
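The classes listed above follow from the first digit of the code. A small sketch of that mapping, using the hypothetical helper status_class (not part of wsinfo):

```python
STATUS_CLASSES = {
    1: "Information",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code):
    """Map an integer HTTP status code to its class name (hypothetical helper)."""
    # Integer division by 100 keeps only the leading digit of a 3-digit code.
    return STATUS_CLASSES.get(code // 100, "Unknown")


print(status_class(200))  # Success
print(status_class(404))  # Client error
```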
- ip
Get the IP address of the website’s domain.
Note
This will not always return the IP address of the URL you’ve passed to the Info constructor. For example, the server may redirect to another page, and this function will then return the IP address of the redirected URL. If the website implements a client-side redirect, you will not be redirected and will get the IP address of the URL you originally passed.
Returns: IP address of the website’s domain. Return type: str
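Resolving a URL's host to an IP address can be sketched with socket and urllib.parse. The helper resolve_ip is hypothetical and resolves exactly the URL it is given; it does not follow redirects, so it does not reproduce the redirect behaviour described in the note.

```python
import socket
from urllib.parse import urlparse


def resolve_ip(url):
    """Return the IPv4 address of the host named in url (hypothetical helper)."""
    hostname = urlparse(url).hostname
    if hostname is None:
        raise ValueError("URL has no host part: %r" % url)
    # gethostbyname returns an IPv4 literal unchanged and resolves names via DNS.
    return socket.gethostbyname(hostname)


print(resolve_ip("http://127.0.0.1:8080/index.html"))  # 127.0.0.1
```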
- server
Get the server’s name/type and version.
Most common are Apache, nginx, Microsoft IIS and gws on Google servers.
Returns: A list containing the name or type of the server software and (if available) the version number. Return type: list or NoneType
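A list of this shape could be derived from the Server response header, for example. The sketch below assumes that header is the data source, which this reference does not state; parse_server is a hypothetical helper.

```python
def parse_server(server_header):
    """Split a Server header into [name] or [name, version], or return None.

    Hypothetical helper; assumes the first token has the form name/version.
    """
    tokens = (server_header or "").split()
    if not tokens:
        return None
    name, separator, version = tokens[0].partition("/")
    # The version is optional, e.g. "gws" carries none.
    return [name, version] if separator and version else [name]


print(parse_server("Apache/2.4.41 (Ubuntu)"))  # ['Apache', '2.4.41']
print(parse_server("gws"))                     # ['gws']
```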
- server_country
Get the country where the server is located.
Warning
This is currently not implemented; I need to do some more research on how to do this. I think whois is a buzzword...
Returns: The country where the server hardware is located. Return type: str
- server_os
Get the operating system the server is running on.
Returns: The name of the server’s OS. Return type: str or NoneType
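Many servers announce their platform as a parenthesised comment in the Server header, e.g. "Apache/2.4.41 (Ubuntu)". The sketch below reads that comment. It is an assumption that wsinfo uses this source, and parse_server_os is a hypothetical helper.

```python
import re


def parse_server_os(server_header):
    """Return the first parenthesised comment of a Server header, or None.

    Hypothetical helper; assumes the comment names the operating system.
    """
    if not server_header:
        return None
    match = re.search(r"\(([^)]+)\)", server_header)
    return match.group(1).strip() if match else None


print(parse_server_os("Apache/2.4.41 (Ubuntu)"))  # Ubuntu
print(parse_server_os("nginx/1.18.0"))            # None
```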
- server_software
Get a list of the server’s software stack.
Note
This generally only works for local servers (localhost), because most public servers don’t list any software configuration in the HTTP response header.
Returns: List of tuples containing both name and version for each software listed in the HTTP header. Return type: list
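A local XAMPP-style server typically sends a Server header listing several name/version tokens. As a sketch of how such a header maps to a list of tuples (parse_software is a hypothetical helper, and the sample header value is made up for illustration):

```python
import re


def parse_software(server_header):
    """Return (name, version) tuples for every name/version token.

    Hypothetical helper; parenthesised comments such as "(Win64)" are skipped
    because they contain no slash.
    """
    return re.findall(r"([^\s/()]+)/([^\s()]+)", server_header or "")


header = "Apache/2.4.58 (Win64) OpenSSL/3.1.3 PHP/8.2.12"  # illustrative value
print(parse_software(header))
# [('Apache', '2.4.58'), ('OpenSSL', '3.1.3'), ('PHP', '8.2.12')]
```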
- title
Get the website’s title.
The content of the first <title> tag in the HTML code is used.
Returns: The title of the website. Return type: str
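Picking the first <title> tag can be sketched with a regular expression. This only illustrates the rule; first_title is a hypothetical helper, not wsinfo's implementation.

```python
import re

TITLE_PATTERN = re.compile(r"<title\b[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def first_title(html):
    """Return the text of the first <title> element, or None (hypothetical helper)."""
    # search() stops at the first match, so later <title> tags are ignored.
    match = TITLE_PATTERN.search(html)
    return match.group(1).strip() if match else None


document = (
    "<html><head><title> Example Domain </title></head>"
    "<body><title>Second</title></body></html>"
)
print(first_title(document))  # Example Domain
```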
- url
Get the website’s URL.
Note
This will not always return the URL you’ve passed to the Info constructor. For example, the server may redirect to another page, and this function will then return the URL of the website you were redirected to. If the website implements a client-side redirect, you will not be redirected and will get the URL you originally passed.
Example for clarification:
Using a fresh install of a recent XAMPP, http://localhost will redirect to http://localhost/dashboard/:
>>> import wsinfo
>>> w = wsinfo.Info("http://localhost")
>>> w.url
'http://localhost/dashboard/'
The original URL you’ve passed to the Info constructor is stored in the attribute _url:
>>> w._url
'http://localhost'
Returns: URL of the website. Return type: str
-