Web Crawler Beta Released!

Web Crawler – first public beta release is out!

Crawler is a utility for testing and demonstrating the features of the open-source WebEngine library. The program gathers information about the resources of a specified web server by analyzing references in HTML markup, text, and JavaScript code. In addition, it queries the Web of Trust knowledge base for information about the analyzed site; this check demonstrates the analysis of web application vulnerabilities.
First and foremost, please do not be evil. Use Crawler only against services you own or have permission to test. This application is not a full-fledged web application security analyzer.

Furthermore, the library is currently not meant for scanning rogue or misbehaving HTTP servers; in such cases, correct and stable operation cannot be guaranteed.
The main features provided by the application are listed below:

  • JavaScript analysis that extracts references by simulating a DOM structure
  • Support for the Basic, Digest, and NTLM authentication schemes
  • Access to the contents of web servers via HTTP
  • Operation through proxy servers with various authentication schemes
  • A wide variety of options for describing the scan target (lists of scanned domains; restriction of scanning to a host, a domain, or a web server directory; etc.)
  • A modular structure that allows plug-ins to be implemented

Web Crawler GUI – Scan Results Example
Web Crawler GUI – Profiles, Plugins

Download Web Crawler beta ver.0.2 (command line + GUI)

Package Structure

The package consists of two main components: the crawler utility and a XUL-based GUI. To display the GUI, one can use the Firefox browser or a specialized application (e.g. xulrunner or prism). The application root directory contains the utility binaries and the XUL configuration file (application.ini). The nested directory structure follows the conventions for XUL-based applications.

A user may be interested in the chrome/skin directory, which contains the files describing the application's appearance. The package offers several pre-installed themes; to change the appearance, it is sufficient to replace the contents of the chrome/skin/classic directory with the chosen theme. A new theme can be created on the basis of an existing one, or by adapting themes from the site http://jqueryui.com/themeroller/. Themes downloaded from that site should be supplemented with some images and CSS descriptions by analogy with the existing ones.
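As an illustration, swapping a theme amounts to replacing the contents of one directory. The sketch below simulates this with a mock layout: the chrome/skin/classic path follows the package description, while the theme directory and its files are placeholders.

```shell
# Mock package layout (stand-in for the real application directory).
mkdir -p chrome/skin/classic mytheme
echo 'body { margin: 0; }' > mytheme/global.css

# To change the appearance, replace the contents of chrome/skin/classic
# with the chosen theme.
rm -rf chrome/skin/classic
mkdir -p chrome/skin/classic
cp -r mytheme/. chrome/skin/classic/
```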

Samples (command line)

First of all, it is necessary to create a configuration file for the utility. You can generate a sample file by calling the utility with the --generate parameter:
crawler --generate crawler.conf

The generated file can be modified with any text editor to specify the necessary operation parameters. First, set the path to the database file, the work directory containing the *.plg files, and the logging mode. Once the utility is configured, create the database and initialize the information about the installed plug-ins with the following command:
crawler --init_db

If the database has already been created, this command merely updates the plug-in information and does not affect other data (such as profiles and scan results).
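For reference, a minimal configuration covering the settings mentioned above might look like the fragment below. The key names here are hypothetical placeholders; the actual names are defined by the file that crawler --generate produces.

```
; Hypothetical crawler.conf sketch; actual key names come from
; the file produced by crawler --generate.
db_path    = /var/lib/crawler/crawler.db   ; path to the database file
plugin_dir = /usr/lib/crawler/plugins      ; work directory containing *.plg files
log_mode   = info                          ; logging mode
```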

Here is the simplest way to run the utility:
crawler --target my.sample.host

This command launches unrestricted gathering of references starting from the main page of the site http://my.sample.host. The arguments --depth, --dir, --host, --domain, etc. can be used to restrict the scan.
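For instance, restricted scans might look like the following. The flag names come from the list above, but the exact argument syntax is an assumption; consult the generated configuration file and the utility's help output for the authoritative form.

```
# Limit the depth of reference gathering (argument value assumed):
crawler --target my.sample.host --depth 3

# Restrict scanning to the starting host (syntax assumed):
crawler --target my.sample.host --host
```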

If the database contains a configured scan profile (e.g. one with the identifier 0), you can use it:
crawler --target my.sample.host --profile 0

Furthermore, you can change the task name (it will be saved in the database and used in the GUI when displaying the list of tasks):

crawler --target my.sample.host --profile 0 --name "First sample task"

If the scan results are required not only in the database but also as a separate text file, use the --result and --output arguments:
crawler --target my.sample.host --profile 0 --name "First sample task" --result report.txt --output 2

A similar file (corresponding to the report in mode 2) can be obtained from the GUI by clicking the “Export to text” button when viewing an individual task.



Plans for the Next Releases

GUI

  • Extend the settings on the Settings tab to remove the need for manual configuration of the utility
  • Implement viewing of log files with filtering and highlighting by message level

The Crawler utility and WebEngine library

  • Improve operation stability and performance
  • Broaden JavaScript support
  • Tests for web application vulnerabilities
  • And a number of others ☺

