.. _spider_generic: Generic full-text extraction ---------------------------- The generic spider can transform already existing Atom or RSS feeds, which usually only contain a summary or a few lines of the content, into full content feeds. It is similar to `Full-Text RSS`_ but uses a port of an older version of Readability_ under the hood and currently doesn't support site_config files. It works best for blog articles. Some feeds already provide the full content but in a tag that is not used by your feed reader. E.g. feeds created by Wordpress usually have the full content in the "encoded" tag. In such cases it's best to add the URL to the ``fulltext_urls`` entry which extracts the content directly from the feed without Readability_. There is a little helper script in `scripts/check-for-fulltext-content`_ to detect if a feed contains full-text content. Configuration ~~~~~~~~~~~~~ Add ``generic`` to the list of spiders: .. code-block:: ini # List of spiders to run by default, one per line. spiders = generic Add the feed URLs (Atom or XML) to the config file. .. code-block:: ini # List of URLs to RSS/Atom feeds to crawl, one per line. [generic] urls = https://www.example.com/feed.atom https://www.example.org/feed.xml fulltext_urls = https://myblog.example.com/feed/ .. _Readability: https://github.com/mozilla/readability .. _`Full-Text RSS`: http://fivefilters.org/content-only/ .. _`scripts/check-for-fulltext-content`: https://github.com/PyFeeds/PyFeeds/blob/master/scripts/check-for-fulltext-content