API

The wsinfo library combines the socket module, several urllib subpackages, XML parsing and regular expressions into a single library that can retrieve a wide range of information about a specific website.

class wsinfo.Info(url)

A class that collects information about the website located at the given URL.

Parameters:url – Valid URL to the website (e.g. http://example.com/path/to/file.html).
content

Get the website’s content.

Returns:Content of the website (e.g. HTML code).
Return type:str
content_type

Get the website’s content type.

Returns:Content type of the website (e.g. text/html).
Return type:str or NoneType
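A raw Content-Type header value often carries parameters such as a charset. The sketch below assumes the property reports only the bare media type and yields None when the header is missing; `media_type` is a hypothetical helper for illustration, not part of wsinfo.

```python
from typing import Optional


def media_type(content_type_header: Optional[str]) -> Optional[str]:
    """Reduce a Content-Type header value to its bare media type.

    Hypothetical helper (not wsinfo's code): returns None when the
    header is missing or empty.
    """
    if not content_type_header:
        return None
    # Drop parameters such as "; charset=UTF-8" and normalise the case.
    value = content_type_header.split(";", 1)[0].strip().lower()
    return value or None


print(media_type("text/html; charset=UTF-8"))  # text/html
print(media_type(None))  # None
```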
favicon_path

Get the path to the website’s icon.

The href attribute of the first <link> tag containing rel="icon" or rel="shortcut icon" is used.

Returns:The path to the icon of the website (known as the favicon).
Return type:str or NoneType
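The lookup rule described above (first <link> whose rel is "icon" or "shortcut icon") can be sketched with the standard library's html.parser. This is an illustration under that rule, not wsinfo's actual implementation; `FaviconFinder` and `favicon_path` here are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional


class FaviconFinder(HTMLParser):
    """Record the href of the first <link> whose rel is "icon" or "shortcut icon"."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if self.found or tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").strip().lower()
        if rel in ("icon", "shortcut icon"):
            self.found = True  # only the first matching <link> counts
            self.href = attributes.get("href")


def favicon_path(source: str) -> Optional[str]:
    finder = FaviconFinder()
    finder.feed(source)
    return finder.href


page = '<head><link rel="stylesheet" href="a.css"><link rel="icon" href="/favicon.ico"></head>'
print(favicon_path(page))  # /favicon.ico
```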
hierarchy

Get a list representing the heading hierarchy.

Returns:List of tuples containing the heading type (h1, h2, ...) and the heading's text.
Return type:list
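The return format above, a list of (heading type, text) tuples in document order, can be produced with a small html.parser subclass. This is a sketch of the idea, not wsinfo's code; `HeadingCollector` and `hierarchy` are hypothetical names, and nested markup inside a heading is flattened to plain text.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (heading tag, heading text) pairs in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: List[Tuple[str, str]] = []
        self._current: Optional[str] = None  # heading tag we are currently inside
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._current = tag
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace inside the heading text.
            text = " ".join("".join(self._text).split())
            self.headings.append((tag, text))
            self._current = None


def hierarchy(source: str) -> List[Tuple[str, str]]:
    collector = HeadingCollector()
    collector.feed(source)
    return collector.headings


print(hierarchy("<h1>Title</h1><p>x</p><h2>Sub <em>part</em></h2>"))
# [('h1', 'Title'), ('h2', 'Sub part')]
```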
http_header

Get the website’s HTTP header.

Returns:HTTP header of the website.
Return type:str
http_header_dict

Get the website’s HTTP header as a dictionary.

Returns:HTTP header of the website as a dictionary.
Return type:dict
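To show how the string form of the header relates to the dictionary form, here is a minimal sketch that turns a raw response header block into a dict. It assumes the raw header looks like a standard HTTP/1.x response head; `parse_header_block` is hypothetical, and joining repeated fields with ", " is a design choice that suits most fields but not Set-Cookie.

```python
from typing import Dict


def parse_header_block(raw: str) -> Dict[str, str]:
    """Turn a raw HTTP response header block into a dict.

    Hypothetical helper: skips the status line and blank lines, splits each
    remaining line at the first colon, and joins repeated names with ", ".
    """
    headers: Dict[str, str] = {}
    for line in raw.splitlines():
        if line.startswith("HTTP/") or ":" not in line:
            continue  # status line such as "HTTP/1.1 200 OK", or a blank line
        name, value = line.split(":", 1)
        name, value = name.strip(), value.strip()
        headers[name] = f"{headers[name]}, {value}" if name in headers else value
    return headers


raw = "HTTP/1.1 200 OK\r\nServer: nginx\r\nContent-Type: text/html\r\n\r\n"
print(parse_header_block(raw))
# {'Server': 'nginx', 'Content-Type': 'text/html'}
```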
http_status_code

Get the website’s HTTP status code.

  • 1xx: Information
  • 2xx: Success
  • 3xx: Redirection
  • 4xx: Client error
  • 5xx: Server error

See the Wikipedia article on HTTP status codes for reference.

Returns:HTTP status code of the website.
Return type:int
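The five status classes in the list above are determined by the first digit of the code, so classifying the integer this property returns is a single integer division. `status_class` is a hypothetical helper, not part of wsinfo.

```python
def status_class(code: int) -> str:
    """Map an HTTP status code to the class names listed above."""
    classes = {
        1: "Information",
        2: "Success",
        3: "Redirection",
        4: "Client error",
        5: "Server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    # The hundreds digit selects the class (e.g. 404 // 100 == 4).
    return classes[code // 100]


print(status_class(200))  # Success
print(status_class(404))  # Client error
```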
ip

Get the IP address of the website’s domain.

Note

This does not always return the IP address of the URL you passed to the Info constructor. If the server issues an HTTP redirect, this property returns the IP address of the redirect target instead. Client-side redirects (e.g. JavaScript or a <meta> refresh) are not followed, so in that case you get the IP address of the URL you originally passed.

Returns:IP address of the website’s domain.
Return type:str
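Since wsinfo builds on the socket module, resolving a domain can be sketched with socket.gethostbyname on the URL's host part. This is an illustration, not wsinfo's code: `domain_ip` is hypothetical, it returns only one IPv4 address, and it resolves the URL you give it without following any redirects.

```python
import socket
from urllib.parse import urlsplit


def domain_ip(url: str) -> str:
    """Resolve the host part of a URL to an IPv4 address."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"URL has no host: {url!r}")
    # For a real domain name this performs a DNS lookup; an IP literal is returned as-is.
    return socket.gethostbyname(host)


print(domain_ip("http://127.0.0.1:8080/index.html"))  # 127.0.0.1
```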
server

Get the server’s name/type and version.

The most common are Apache, nginx, Microsoft IIS and gws (used on Google servers).

Returns:A list containing the name or type of the server software and (if available) the version number.
Return type:list or NoneType
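The name/version pair typically comes from the Server response header, whose first token usually has the form Name/Version. The sketch below assumes that source and format; `parse_server` is a hypothetical helper and the header values are made-up examples.

```python
from typing import List, Optional


def parse_server(header: Optional[str]) -> Optional[List[str]]:
    """Split the first product token of a Server header into name and version."""
    if not header or not header.strip():
        return None
    product = header.split()[0]  # e.g. "Apache/2.4.41" from "Apache/2.4.41 (Ubuntu)"
    return product.split("/", 1)  # ["Apache", "2.4.41"], or ["gws"] if no version


print(parse_server("Apache/2.4.41 (Ubuntu)"))  # ['Apache', '2.4.41']
print(parse_server("gws"))  # ['gws']
```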
server_country

Get the country where the server is located.

Warning

This is currently not implemented; I still need to research how to do this. I suspect whois is the keyword here.

Returns:The country where the server hardware is located.
Return type:str
server_os

Get the operating system the server is running on.

Returns:The name of the server's OS.
Return type:str or NoneType
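One plausible source for the operating system is the parenthesised comment some servers append to the Server header, as in Apache/2.4.41 (Ubuntu). This sketch assumes that convention; `server_os_from_header` is hypothetical, and many servers omit the comment entirely, hence the None result.

```python
import re
from typing import Optional


def server_os_from_header(header: Optional[str]) -> Optional[str]:
    """Return the first parenthesised comment of a Server header, if any."""
    if not header:
        return None
    match = re.search(r"\(([^)]+)\)", header)
    return match.group(1).strip() if match else None


print(server_os_from_header("Apache/2.4.41 (Ubuntu)"))  # Ubuntu
print(server_os_from_header("nginx"))  # None
```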
server_software

Get a list of the server’s software stack.

Note

In practice this only works for local servers, because most public servers don’t list any software configuration in the HTTP response header.

Returns:List of tuples containing both name and version for each software package listed in the HTTP header.
Return type:list
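A local development server may list several Name/Version product tokens in its Server header. The sketch below collects all of them as (name, version) tuples; the header string is an illustrative example rather than a captured response, and `software_stack` is a hypothetical helper, not wsinfo's implementation.

```python
import re
from typing import List, Tuple

# A product token looks like "Name/Version", e.g. "PHP/7.4.11".
PRODUCT = re.compile(r"([A-Za-z0-9._-]+)/([^\s/()]+)")


def software_stack(header: str) -> List[Tuple[str, str]]:
    """List every (name, version) product token found in a Server header."""
    # Parenthesised comments such as "(Win64)" contain no slash and are skipped.
    return PRODUCT.findall(header)


print(software_stack("Apache/2.4.46 (Win64) OpenSSL/1.1.1g PHP/7.4.11"))
# [('Apache', '2.4.46'), ('OpenSSL', '1.1.1g'), ('PHP', '7.4.11')]
```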
title

Get the website’s title.

The content of the first <title> tag in the HTML code is used.

Returns:The title of the website.
Return type:str
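In keeping with wsinfo's use of regular expressions, the first <title> element can be extracted with a non-greedy pattern. This is a simplified sketch, not wsinfo's code: `page_title` is hypothetical, a regex is not a full HTML parser, and unlike the documented str return type it yields None when no title exists.

```python
import html
import re
from typing import Optional

TITLE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def page_title(source: str) -> Optional[str]:
    """Return the text of the first <title> element, or None if there is none."""
    match = TITLE.search(source)
    if match is None:
        return None
    # Unescape entities such as &amp; and collapse whitespace.
    return " ".join(html.unescape(match.group(1)).split())


print(page_title("<head><title>Tom &amp; Jerry</title></head>"))  # Tom & Jerry
```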
url

Get the website’s URL.

Note

This does not always return the URL you passed to the Info constructor. If the server issues an HTTP redirect, this property returns the URL you were redirected to. Client-side redirects (e.g. JavaScript or a <meta> refresh) are not followed, so in that case you get the URL you originally passed.

Example for clarification:

Using a fresh install of a recent XAMPP, http://localhost will redirect to http://localhost/dashboard/:

>>> import wsinfo
>>> w = wsinfo.Info("http://localhost")
>>> w.url
'http://localhost/dashboard/'

The original URL you passed to the Info constructor is stored in the instance attribute _url:

>>> w._url
'http://localhost'

Returns:URL of the website.
Return type:str
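The redirect behaviour described above matches what urllib.request does by default: urlopen follows HTTP 3xx responses, and the response's geturl() reports the final URL, while page content is never interpreted. This is a minimal sketch of that mechanism, not wsinfo's own code; `final_url` is a hypothetical helper.

```python
from urllib.request import urlopen


def final_url(url: str, timeout: float = 10.0) -> str:
    """Return the URL reached after following HTTP (3xx) redirects.

    Client-side redirects (JavaScript, <meta http-equiv="refresh">) are not
    followed, because urllib never executes or interprets the page content.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.geturl()


# Needs a running server; with the XAMPP setup described earlier this
# would be expected to return "http://localhost/dashboard/".
# print(final_url("http://localhost"))
```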