Performance & Profiling

The performance of Sphinx-Needs can be tested with the script performance_test.py, located in the /performance folder of the checked-out GitHub repository.

Tests can be run with different numbers of needs, needtables and dummies (plain rst content unrelated to Sphinx-Needs).

Test series

To start a series of tests with some predefined values, run python performance_test.py series

Running 8 test configurations.
* Running on 5 pages with 50 needs, 5 needtables, 5 dummies per page. Using 1 cores.
  Duration: 6.66 seconds
* Running on 5 pages with 50 needs, 5 needtables, 5 dummies per page. Using 4 cores.
  Duration: 6.70 seconds
* Running on 1 pages with 50 needs, 5 needtables, 5 dummies per page. Using 1 cores.
  Duration: 1.57 seconds
* Running on 1 pages with 50 needs, 5 needtables, 5 dummies per page. Using 4 cores.
  Duration: 1.58 seconds
* Running on 5 pages with 10 needs, 1 needtables, 1 dummies per page. Using 1 cores.
  Duration: 1.64 seconds
* Running on 5 pages with 10 needs, 1 needtables, 1 dummies per page. Using 4 cores.
  Duration: 1.89 seconds
* Running on 1 pages with 10 needs, 1 needtables, 1 dummies per page. Using 1 cores.
  Duration: 1.11 seconds
* Running on 1 pages with 10 needs, 1 needtables, 1 dummies per page. Using 4 cores.
  Duration: 1.14 seconds

RESULTS:

  runtime      pages       needs      needs    needtables    dummies    parallel
  seconds    overall    per page    overall       overall    overall       cores
---------  ---------  ----------  ---------  ------------  ---------  ----------
     6.66          5          50        250            25         25           1
     6.7           5          50        250            25         25           4
     1.57          1          50         50             5          5           1
     1.58          1          50         50             5          5           4
     1.64          5          10         50             5          5           1
     1.89          5          10         50             5          5           4
     1.11          1          10         10             1          1           1
     1.14          1          10         10             1          1           4

Overall runtime: 22.29 seconds.

You can change these values or fix some of them by setting parameters. Run python performance_test.py series --help to get an overview:

Usage: performance_test.py series [OPTIONS]

  Generate and start a series of tests.

Options:
  --profile TEXT        Activates profiling for given area
  --needs INTEGER       Number of maximum needs.
  --needtables INTEGER  Number of maximum needtables.
  --dummies INTEGER     Number of standard rst dummies.
  --pages INTEGER       Number of additional pages with needs.
  --parallel INTEGER    Number of parallel processes to use. Same as -j for
                        sphinx-build
  --keep                Keeps the temporary src and build folders
  --browser             Opens the project in your browser
  --snakeviz            Opens snakeviz view for measured profiles in browser
  --debug               Prints more information, incl. sphinx build output
  --basic               Use only default config of Sphinx-Needs (e.g. no extra
                        options)
  --help                Show this message and exit.

If --needs, --pages or --parallel is given multiple times, one performance test is executed for each combination of the given values.

Example: python performance_test.py series --needs 1 --needs 10 --pages 1 --pages 10 --parallel 1 --parallel 4 --needtables 0 --dummies 0. This sets 2 values for needs, 2 for pages and 2 for parallel, so 8 test configurations are run (2 needs x 2 pages x 2 parallel = 8).

Running 8 test configurations.
* Running on 1 pages with 1 needs, 0 needtables, 0 dummies per page. Using 1 cores.
  Duration: 1.03 seconds
* Running on 1 pages with 1 needs, 0 needtables, 0 dummies per page. Using 4 cores.
  Duration: 1.05 seconds
* Running on 10 pages with 1 needs, 0 needtables, 0 dummies per page. Using 1 cores.
  Duration: 1.35 seconds
* Running on 10 pages with 1 needs, 0 needtables, 0 dummies per page. Using 4 cores.
  Duration: 1.54 seconds
* Running on 1 pages with 10 needs, 0 needtables, 0 dummies per page. Using 1 cores.
  Duration: 1.10 seconds
* Running on 1 pages with 10 needs, 0 needtables, 0 dummies per page. Using 4 cores.
  Duration: 1.15 seconds
* Running on 10 pages with 10 needs, 0 needtables, 0 dummies per page. Using 1 cores.
  Duration: 2.11 seconds
* Running on 10 pages with 10 needs, 0 needtables, 0 dummies per page. Using 4 cores.
  Duration: 2.36 seconds

RESULTS:

  runtime      pages       needs      needs    needtables    dummies    parallel
  seconds    overall    per page    overall       overall    overall       cores
---------  ---------  ----------  ---------  ------------  ---------  ----------
     1.03          1           1          1             0          0           1
     1.05          1           1          1             0          0           4
     1.35         10           1         10             0          0           1
     1.54         10           1         10             0          0           4
     1.1           1          10         10             0          0           1
     1.15          1          10         10             0          0           4
     2.11         10          10        100             0          0           1
     2.36         10          10        100             0          0           4

Overall runtime: 11.69 seconds.
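
The way repeated options multiply into test configurations can be sketched in a few lines of Python. This is a minimal illustration, not the code used by performance_test.py: the function name build_configurations and the dictionary layout are assumptions, and the iteration order simply mirrors the order seen in the output above.

```python
from itertools import product


def build_configurations(needs, pages, parallel, needtables=0, dummies=0):
    """Return one configuration per combination of the repeated options.

    Hypothetical helper: it only illustrates the cross product and is not
    taken from performance_test.py.
    """
    return [
        {
            "pages": p,
            "needs": n,
            "needtables": needtables,
            "dummies": dummies,
            "parallel": j,
        }
        for n, p, j in product(needs, pages, parallel)
    ]


# Same values as in the example command: 2 x 2 x 2 combinations.
configs = build_configurations(needs=[1, 10], pages=[1, 10], parallel=[1, 4])
print(f"Running {len(configs)} test configurations.")
for c in configs:
    print(f"* {c['pages']} pages, {c['needs']} needs per page, {c['parallel']} cores")
```

Adding a third value for any of the three options would raise the number of configurations from 8 to 12, so the overall runtime of a series grows quickly with each repeated option.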

Parallel execution

New in version 0.7.1.

Parallel execution on multiple cores can lower the build time.

It uses the -j option of sphinx-build. This mostly pays off when dozens or hundreds of files need to be read and written. In that case Sphinx starts several workers to process these files in parallel.

If the project contains only a few files, the benefit is not really measurable, as the small test series above show.

Here is an example of a 500-page project, built once on 1 core and once on 8 cores. Using 8 cores reduces the build time by ~40%.

  runtime s    pages #    needs per page    needs #    needtables #    dummies #    parallel cores
-----------  ---------  ----------------  ---------  --------------  -----------  ----------------
     169.46        500                10       5000               0         5000                 1
     103.08        500                10       5000               0         5000                 8

Used command: python performance_test.py series --needs 10 --pages 500 --dummies 10 --needtables 0 --parallel 1 --parallel 8
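
The ~40% figure follows directly from the two runtimes in the table. A small sketch of the arithmetic, with the function names runtime_reduction and speedup chosen here purely for illustration:

```python
def runtime_reduction(baseline_s, parallel_s):
    """Fraction of build time saved compared to the single-core baseline."""
    return (baseline_s - parallel_s) / baseline_s


def speedup(baseline_s, parallel_s):
    """How many times faster the parallel build is than the baseline."""
    return baseline_s / parallel_s


# Runtimes in seconds, taken from the table above.
single_core = 169.46
eight_cores = 103.08

print(f"Build time reduced by {runtime_reduction(single_core, eight_cores):.1%}")
print(f"Speedup factor: {speedup(single_core, eight_cores):.2f}x")
```

The reduction comes out at roughly 39%, which is far from an 8x speedup. This suggests that a considerable part of the build still runs serially, although the measurement alone does not show which part.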

Parallel execution can be used by any documentation build, just pass the -j option. Example using 4 parallel processes: sphinx-build -j 4 -b html . _build/html

Used rst template

For all performance tests the same rst-template is used:

index

Performance test
================

Config
------
:dummies: {{dummies}}
:needs: {{needs}}
:needtables: {{needtables}}
:keep: {{keep}}
:browser: {{browser}}
:debug: {{debug}}

Content
-------
.. contents::

.. toctree::

{% for page in range(pages) %}
   page_{{page}}
{% endfor -%}

pages

{{ title}}
{{ "=" * title|length }}

Test Data
---------

Dummies
~~~~~~~
Amount of dummies: **{{dummies}}**

{% for n in range(dummies) %}
**Dummy {{n}}**

.. note::  This is dummy {{n}}

And some **dummy** *text* for dummy {{n}}

{% endfor %}

Needs
~~~~~
Amount of needs: **{{needs}}**

{% for n in range(needs) %}
.. req:: Test Need Page {{ page }} {{n}}
   :id: R_{{page}}_{{n}}
{% if not basic %}   :number: {{n}}{% endif %}
   :links: R_{{page}}_{{needs-n-1}}
{% endfor %}

Needtable
~~~~~~~~~
Amount of needtables: **{{needtables}}**

{%  if basic %}
.. needtable::
   :show_filters:
   :columns: id, title, number, links
{% else %}
{% for n in range(needtables) %}
.. needtable::
   :show_filters:
   :filter: int(number)**3 > 0 or len(links) > 0
   :columns: id, title, number, links
{% endfor %}
{% endif %}

Profiling

The option --profile NAME activates profiling for a specific code area.

Currently supported are:

  • NEEDTABLE: Profiles the needtable processing (incl. printing)

  • NEED_PROCESS: Profiles the need processing (without printing)

  • NEED_PRINT: Profiles the need painting (creating final nodes)

If this option is used, a folder named profile is created in the current working directory, containing one file <NAME>.prof per activated profile. Each file contains cProfile stats information.

--profile can be used several times.

These profiles can also be created outside the performance test, for any documentation project. Simply set an environment variable called NEEDS_PROFILING to a comma-separated list of the needed profiles.

Example for Linux: export NEEDS_PROFILING=NEEDTABLE,NEED_PRINT.
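
On platforms without export, or when the build is started from a script, the variable can also be set from Python. The following is a minimal sketch under a few assumptions: the helper names profiling_env and build_with_profiling are made up for this example, the source and output folders are placeholders, and the validation only uses the three profile names listed above.

```python
import os
import subprocess

# Profile names as listed in this section.
SUPPORTED_PROFILES = ("NEEDTABLE", "NEED_PROCESS", "NEED_PRINT")


def profiling_env(profiles, base_env=None):
    """Return a copy of the environment with NEEDS_PROFILING set."""
    unknown = [p for p in profiles if p not in SUPPORTED_PROFILES]
    if unknown:
        raise ValueError(f"Unsupported profiles: {', '.join(unknown)}")
    env = dict(os.environ if base_env is None else base_env)
    env["NEEDS_PROFILING"] = ",".join(profiles)
    return env


def build_with_profiling(profiles, src_dir=".", out_dir="_build/html"):
    """Run sphinx-build with profiling enabled (hypothetical wrapper)."""
    cmd = ["sphinx-build", "-b", "html", src_dir, out_dir]
    return subprocess.run(cmd, env=profiling_env(profiles), check=True)


# Only shows the resulting value; no build is started here.
print(profiling_env(["NEEDTABLE", "NEED_PRINT"], base_env={})["NEEDS_PROFILING"])
```

Calling build_with_profiling(["NEEDTABLE", "NEED_PRINT"]) inside a documentation project would then start a normal HTML build with both profiles active, without changing the environment of the calling process.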

Analysing profile

Use --snakeviz together with --profile <NAME> to automatically open a graphical analysis of the generated profile file.

This requires snakeviz to be installed: pip install snakeviz.

Example:

python performance_test.py series --needs 10 --pages 10 --profile NEEDTABLE --profile NEED_PROCESS --snakeviz
_images/snakeviz_needtable.png
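
If snakeviz is not available, the generated .prof files can also be inspected with the pstats module from the Python standard library. The sketch below assumes nothing about Sphinx-Needs itself: summarize_profile is a hypothetical helper, and the demo profile is generated on the fly so that the example runs without an existing profile folder.

```python
import cProfile
import pstats
import tempfile
from pathlib import Path


def summarize_profile(prof_path, top=10):
    """Print the `top` entries of a .prof file, sorted by cumulative time."""
    stats = pstats.Stats(str(prof_path))
    stats.sort_stats("cumulative").print_stats(top)
    return stats


def _demo_workload():
    """Stand-in for real work, only used to create a sample profile."""
    return sum(i * i for i in range(10_000))


# Create a throwaway profile file, then read it back like a real one.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "DEMO.prof"
    profiler = cProfile.Profile()
    profiler.runcall(_demo_workload)
    profiler.dump_stats(str(demo_file))
    demo_stats = summarize_profile(demo_file, top=5)
```

For a real build, the same helper could be pointed at a generated file, for example summarize_profile("profile/NEEDTABLE.prof"), assuming the file was created in the current working directory as described above.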

Measurements

The measurements were performed with the following setup:

  • Sphinx-Needs 0.7.0 on 1 core, as parallel builds are not supported by this version.

  • Sphinx-Needs 0.7.1, with 1 core.

  • Sphinx-Needs 0.7.1, with 4 cores.

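Test details                                            0.7.0 with 1 core    0.7.1 with 1 core    0.7.1 with 4 cores
------------------------------------------------------  -------------------  -------------------  --------------------
30 pages with overall 1500 needs and 30 needtables      55.02 s              36.81 s              34.31 s
100 pages with overall 10,000 needs and 100 needtables  6108.26 s            728.82 s             564.76 s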