diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index ef12c87a..9641dd16 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -3,3 +3,13 @@ Contributing is always welcome! I am no professional flask developer, if you know a better way that something can be done, please let me know! Otherwise, it's always best to PR into the `dev` branch. + +Please be sure that all new functionality has a matching test! + +Use `pytest` to validate/test, you can run the existing tests as `pytest tests/test_notifications.py` for example + +``` +pip3 install -r requirements-dev +``` + +this is from https://github.com/dgtlmoon/changedetection.io/blob/master/requirements-dev.txt diff --git a/README.md b/README.md index 9d1d7be8..297233a1 100644 --- a/README.md +++ b/README.md @@ -7,16 +7,21 @@ _Know when web pages change! Stay ontop of new information!_ -Live your data-life *pro-actively* instead of *re-actively*, do not rely on manipulative social media for consuming important information. +Live your data-life *pro-actively* instead of *re-actively*. Open source web page monitoring, notification and change detection. Self-hosted web page change monitoring -[![Deploy](https://www.herokucdn.com/deploy/button.svg)](https://dashboard.heroku.com/new?template=https%3A%2F%2Fgithub.com%2Fdgtlmoon%2Fchangedetection.io%2Ftree%2Fmaster) -Read the [Heroku notes and limitations wiki page first](https://github.com/dgtlmoon/changedetection.io/wiki/Heroku-notes) +**Get your own instance now on Lemonade!** + +[![Deploy to Lemonade](https://lemonade.changedetection.io/static/images/lemonade.svg)](https://lemonade.changedetection.io/start) + +- Automatic Updates, Automatic Backups, No Heroku "paused application", don't miss a change! +- Javascript browser included +- Pay with Bitcoin #### Example use cases @@ -37,10 +42,6 @@ Read the [Heroku notes and limitations wiki page first](https://github.com/dgtlm _Need an actual Chrome runner with Javascript support? 
We support fetching via WebDriver!_ -**Get monitoring now! super simple.** - -Deploy to Heroku for free, Run this python directly, or with docker and/or docker-compose - ## Screenshots Examining differences in content. @@ -91,10 +92,14 @@ docker run -d --restart always -p "127.0.0.1:5000:5000" -v datastore-volume:/dat ```bash docker-compose pull && docker-compose up -d ``` -### Filters + +See the wiki for more information https://github.com/dgtlmoon/changedetection.io/wiki + + +## Filters XPath, JSONPath and CSS support comes baked in! You can be as specific as you need, use XPath exported from various XPath element query creation tools. -### Notifications +## Notifications ChangeDetection.io supports a massive amount of notifications (including email, office365, custom APIs, etc) when a web-page has a change detected thanks to the apprise library. Simply set one or more notification URL's in the _[edit]_ tab of that watch. @@ -118,7 +123,7 @@ Just some examples Now you can also customise your notification content! -### JSON API Monitoring +## JSON API Monitoring Detect changes and monitor data in JSON API's by using the built-in JSONPath selectors as a filter / selector. @@ -128,7 +133,7 @@ This will re-parse the JSON and apply formatting to the text, making it super ea ![image](https://user-images.githubusercontent.com/275001/125165995-d9ea5580-e1dc-11eb-8030-f0deced2661a.png) -#### Parse JSON embedded in HTML! +### Parse JSON embedded in HTML! When you enable a `json:` filter, you can even automatically extract and parse embedded JSON inside a HTML page! Amazingly handy for sites that build content based on JSON, such as many e-commerce websites. 
@@ -142,19 +147,19 @@ When you enable a `json:` filter, you can even automatically extract and parse e `json:$.price` would give `23.50`, or you can extract the whole structure -### Proxy configuration +## Proxy configuration See the wiki https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configuration -### Raspberry Pi support? +## Raspberry Pi support? -Raspberry Pi and linux/arm/v6 linux/arm/v7 arm64 devices are supported! +Raspberry Pi and linux/arm/v6 linux/arm/v7 arm64 devices are supported! See the wiki for [details](https://github.com/dgtlmoon/changedetection.io/wiki/Fetching-pages-with-WebDriver) -### Windows native support? +## Windows native support? Sorry not yet :( https://github.com/dgtlmoon/changedetection.io/labels/windows -### Support us +## Support us Do you use changedetection.io to make money? does it save you time or money? Does it make your life easier? less stressful? Remember, we write this software when we should be doing actual paid work, we have to buy food and pay rent just like you. @@ -164,12 +169,12 @@ BTC `1PLFN327GyUarpJd7nVe7Reqg9qHx5frNn` Support us! 
-### Commercial Support +## Commercial Support I offer commercial support, this software is depended on by network security, aerospace , data-science and data-journalist professionals just to name a few, please reach out at dgtlmoon@gmail.com for any enquiries, I am more than glad to work with your organisation to further the possibilities of what can be done with changedetection.io -[release-shield]: https://img.shields.io/github/v/release/dgtlmoon/changedetection.io?style=for-the-badge +[release-shield]: https://img.shields.io:/github/v/release/dgtlmoon/changedetection.io?style=for-the-badge [docker-pulls]: https://img.shields.io/docker/pulls/dgtlmoon/changedetection.io?style=for-the-badge [test-shield]: https://github.com/dgtlmoon/changedetection.io/actions/workflows/test-only.yml/badge.svg?branch=master diff --git a/changedetection.py b/changedetection.py index ffb31015..90946089 100755 --- a/changedetection.py +++ b/changedetection.py @@ -14,6 +14,7 @@ from changedetectionio import store def main(): ssl_mode = False + host = '' port = os.environ.get('PORT') or 5000 do_cleanup = False @@ -21,9 +22,9 @@ def main(): datastore_path = os.path.join(os.getcwd(), "datastore") try: - opts, args = getopt.getopt(sys.argv[1:], "Ccsd:p:", "port") + opts, args = getopt.getopt(sys.argv[1:], "Ccsd:h:p:", "port") except getopt.GetoptError: - print('backend.py -s SSL enable -p [port] -d [datastore path]') + print('backend.py -s SSL enable -h [host] -p [port] -d [datastore path]') sys.exit(2) create_datastore_dir = False @@ -37,6 +38,9 @@ def main(): if opt == '-s': ssl_mode = True + if opt == '-h': + host = arg + if opt == '-p': port = int(arg) @@ -59,7 +63,7 @@ def main(): os.mkdir(app_config['datastore_path']) else: print ("ERROR: Directory path for the datastore '{}' does not exist, cannot start, please make sure the directory exists.\n" - "Alternatively, use the -d parameter.".format(app_config['datastore_path']),file=sys.stderr) + "Alternatively, use the -C 
parameter.".format(app_config['datastore_path']),file=sys.stderr) sys.exit(2) datastore = store.ChangeDetectionStore(datastore_path=app_config['datastore_path'], version_tag=changedetectionio.__version__) @@ -93,13 +97,13 @@ def main(): if ssl_mode: # @todo finalise SSL config, but this should get you in the right direction if you need it. - eventlet.wsgi.server(eventlet.wrap_ssl(eventlet.listen(('', port)), + eventlet.wsgi.server(eventlet.wrap_ssl(eventlet.listen((host, port)), certfile='cert.pem', keyfile='privkey.pem', server_side=True), app) else: - eventlet.wsgi.server(eventlet.listen(('', int(port))), app) + eventlet.wsgi.server(eventlet.listen((host, int(port))), app) if __name__ == '__main__': diff --git a/changedetectionio/__init__.py b/changedetectionio/__init__.py index e0393686..7366b734 100644 --- a/changedetectionio/__init__.py +++ b/changedetectionio/__init__.py @@ -11,24 +11,30 @@ # proxy per check # - flask_cors, itsdangerous,MarkupSafe -import time +import datetime import os -import timeago -import flask_login -from flask_login import login_required - +import queue import threading +import time +from copy import deepcopy from threading import Event -import queue - -from flask import Flask, render_template, request, send_from_directory, abort, redirect, url_for, flash - -from feedgen.feed import FeedGenerator -from flask import make_response -import datetime +import flask_login import pytz -from copy import deepcopy +import timeago +from feedgen.feed import FeedGenerator +from flask import ( + Flask, + abort, + flash, + make_response, + redirect, + render_template, + request, + send_from_directory, + url_for, +) +from flask_login import login_required __version__ = '0.39.7' @@ -64,6 +70,7 @@ app.config['LOGIN_DISABLED'] = False # Disables caching of the templates app.config['TEMPLATES_AUTO_RELOAD'] = True +notification_debug_log=[] def init_app_secret(datastore_path): secret = "" @@ -137,13 +144,21 @@ class User(flask_login.UserMixin): def 
get_id(self): return str(self.id) + # Compare given password against JSON store or Env var def check_password(self, password): - import hashlib import base64 + import hashlib + + # Can be stored in env (for deployments) or in the general configs + raw_salt_pass = os.getenv("SALTED_PASS", False) + + if not raw_salt_pass: + raw_salt_pass = datastore.data['settings']['application']['password'] + + raw_salt_pass = base64.b64decode(raw_salt_pass) + - # Getting the values back out - raw_salt_pass = base64.b64decode(datastore.data['settings']['application']['password']) salt_from_storage = raw_salt_pass[:32] # 32 is the length of the salt # Use the exact same setup you used to generate the key, but this time put in the password to check @@ -194,7 +209,7 @@ def changedetection_app(config=None, datastore_o=None): @app.route('/login', methods=['GET', 'POST']) def login(): - if not datastore.data['settings']['application']['password']: + if not datastore.data['settings']['application']['password'] and not os.getenv("SALTED_PASS", False): flash("Login not required, no password enabled.", "notice") return redirect(url_for('index')) @@ -221,8 +236,10 @@ def changedetection_app(config=None, datastore_o=None): @app.before_request def do_something_whenever_a_request_comes_in(): - # Disable password loginif there is not one set - app.config['LOGIN_DISABLED'] = datastore.data['settings']['application']['password'] == False + + # Disable password login if there is not one set + # (No password in settings or env var) + app.config['LOGIN_DISABLED'] = datastore.data['settings']['application']['password'] == False and os.getenv("SALTED_PASS", False) == False # For the RSS path, allow access via a token if request.path == '/rss' and request.args.get('token'): @@ -408,6 +425,7 @@ def changedetection_app(config=None, datastore_o=None): def get_current_checksum_include_ignore_text(uuid): import hashlib + from changedetectionio import fetch_site_status # Get the most recent one @@ -520,6 
+538,7 @@ def changedetection_app(config=None, datastore_o=None): 'notification_title': form.notification_title.data, 'notification_body': form.notification_body.data, 'notification_format': form.notification_format.data, + 'uuid': uuid } notification_q.put(n_object) flash('Test notification queued.') @@ -556,8 +575,7 @@ def changedetection_app(config=None, datastore_o=None): @login_required def settings_page(): - from changedetectionio import forms - from changedetectionio import content_fetcher + from changedetectionio import content_fetcher, forms form = forms.globalSettingsForm(request.form) @@ -573,8 +591,8 @@ def changedetection_app(config=None, datastore_o=None): form.notification_format.data = datastore.data['settings']['application']['notification_format'] form.base_url.data = datastore.data['settings']['application']['base_url'] - # Password unset is a GET - if request.values.get('removepassword') == 'yes': + # Password unset is a GET, but we can lock the session to always need the password + if not os.getenv("SALTED_PASS", False) and request.values.get('removepassword') == 'yes': from pathlib import Path datastore.data['settings']['application']['password'] = False flash("Password protection removed.", 'notice') @@ -608,7 +626,7 @@ def changedetection_app(config=None, datastore_o=None): else: flash('No notification URLs set, cannot send test.', 'error') - if form.password.encrypted_password: + if not os.getenv("SALTED_PASS", False) and form.password.encrypted_password: datastore.data['settings']['application']['password'] = form.password.encrypted_password flash("Password protection enabled.", 'notice') flask_login.logout_user() @@ -620,7 +638,10 @@ def changedetection_app(config=None, datastore_o=None): if request.method == 'POST' and not form.validate(): flash("An error occurred, please see below.", "error") - output = render_template("settings.html", form=form, current_base_url = datastore.data['settings']['application']['base_url']) + output = 
render_template("settings.html", + form=form, + current_base_url = datastore.data['settings']['application']['base_url'], + hide_remove_pass=os.getenv("SALTED_PASS", False)) return output @@ -635,10 +656,11 @@ def changedetection_app(config=None, datastore_o=None): if request.method == 'POST': urls = request.values.get('urls').split("\n") for url in urls: - url = url.strip() + url, *tags = url.split(" ") + # Flask wtform validators wont work with basic auth, use validators package if len(url) and validators.url(url): - new_uuid = datastore.add_watch(url=url.strip(), tag="") + new_uuid = datastore.add_watch(url=url.strip(), tag=" ".join(tags)) # Straight into the queue. update_q.put(new_uuid) good += 1 @@ -871,6 +893,15 @@ def changedetection_app(config=None, datastore_o=None): uuid=uuid) return output + @app.route("/settings/notification-logs", methods=['GET']) + @login_required + def notification_logs(): + global notification_debug_log + output = render_template("notification-log.html", + logs=notification_debug_log if len(notification_debug_log) else ["No errors or warnings detected"]) + + return output + @app.route("/api//snapshot/current", methods=['GET']) @login_required def api_snapshot(uuid): @@ -939,17 +970,33 @@ def changedetection_app(config=None, datastore_o=None): compresslevel=8) # Create a list file with just the URLs, so it's easier to port somewhere else in the future - list_file = os.path.join(datastore_o.datastore_path, "url-list.txt") - with open(list_file, "w") as f: - for uuid in datastore.data['watching']: - url = datastore.data['watching'][uuid]['url'] + list_file = "url-list.txt" + with open(os.path.join(datastore_o.datastore_path, list_file), "w") as f: + for uuid in datastore.data["watching"]: + url = datastore.data["watching"][uuid]["url"] f.write("{}\r\n".format(url)) + list_with_tags_file = "url-list-with-tags.txt" + with open( + os.path.join(datastore_o.datastore_path, list_with_tags_file), "w" + ) as f: + for uuid in 
datastore.data["watching"]: + url = datastore.data["watching"][uuid]["url"] + tag = datastore.data["watching"][uuid]["tag"] + f.write("{} {}\r\n".format(url, tag)) # Add it to the Zip - zipObj.write(list_file, - arcname="url-list.txt", - compress_type=zipfile.ZIP_DEFLATED, - compresslevel=8) + zipObj.write( + os.path.join(datastore_o.datastore_path, list_file), + arcname=list_file, + compress_type=zipfile.ZIP_DEFLATED, + compresslevel=8, + ) + zipObj.write( + os.path.join(datastore_o.datastore_path, list_with_tags_file), + arcname=list_with_tags_file, + compress_type=zipfile.ZIP_DEFLATED, + compresslevel=8, + ) # Send_from_directory needs to be the full absolute path return send_from_directory(os.path.abspath(datastore_o.datastore_path), backupname, as_attachment=True) @@ -1000,7 +1047,6 @@ def changedetection_app(config=None, datastore_o=None): @app.route("/api/delete", methods=['GET']) @login_required def api_delete(): - uuid = request.args.get('uuid') datastore.delete(uuid) flash('Deleted.') @@ -1075,7 +1121,6 @@ def changedetection_app(config=None, datastore_o=None): # Check for new version and anonymous stats def check_for_new_version(): import requests - import urllib3 urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning) @@ -1101,6 +1146,7 @@ def check_for_new_version(): app.config.exit.wait(86400) def notification_runner(): + global notification_debug_log while not app.config.exit.is_set(): try: # At the moment only one thread runs (single runner) @@ -1115,7 +1161,21 @@ def notification_runner(): notification.process_notification(n_object, datastore) except Exception as e: - print("Watch URL: {} Error {}".format(n_object['watch_url'], e)) + print("Watch URL: {} Error {}".format(n_object['watch_url'], str(e))) + + # UUID wont be present when we submit a 'test' from the global settings + if 'uuid' in n_object: + datastore.update_watch(uuid=n_object['uuid'], + update_obj={'last_notification_error': "Notification error detected, please see 
logs."}) + + log_lines = str(e).splitlines() + notification_debug_log += log_lines + + # Trim the log length + notification_debug_log = notification_debug_log[-100:] + + + # Thread runner to check every minute, look for new watches to feed into the Queue. def ticker_thread_check_time_launch_checks(): diff --git a/changedetectionio/fetch_site_status.py b/changedetectionio/fetch_site_status.py index 7f678657..d75c0c6e 100644 --- a/changedetectionio/fetch_site_status.py +++ b/changedetectionio/fetch_site_status.py @@ -57,8 +57,9 @@ class perform_site_check(): stripped_text_from_html = "" watch = self.datastore.data['watching'][uuid] + # Unset any existing notification error - update_obj = {} + update_obj = {'last_notification_error': False, 'last_error': False} extra_headers = self.datastore.get_val(uuid, 'headers') @@ -118,16 +119,21 @@ class perform_site_check(): if is_html: # CSS Filter, extract the HTML that matches and feed that into the existing inscriptis::get_text html_content = fetcher.content - if has_filter_rule: - # For HTML/XML we offer xpath as an option, just start a regular xPath "/.." - if css_filter_rule[0] == '/': - html_content = html_tools.xpath_filter(xpath_filter=css_filter_rule, html_content=fetcher.content) - else: - # CSS Filter, extract the HTML that matches and feed that into the existing inscriptis::get_text - html_content = html_tools.css_filter(css_filter=css_filter_rule, html_content=fetcher.content) - - # get_text() via inscriptis - stripped_text_from_html = get_text(html_content) + if not fetcher.headers.get('Content-Type', '') == 'text/plain': + + if has_filter_rule: + # For HTML/XML we offer xpath as an option, just start a regular xPath "/.." 
+ if css_filter_rule[0] == '/': + html_content = html_tools.xpath_filter(xpath_filter=css_filter_rule, html_content=fetcher.content) + else: + # CSS Filter, extract the HTML that matches and feed that into the existing inscriptis::get_text + html_content = html_tools.css_filter(css_filter=css_filter_rule, html_content=fetcher.content) + + # get_text() via inscriptis + stripped_text_from_html = get_text(html_content) + else: + # Don't run get_text or xpath/css filters on plaintext + stripped_text_from_html = html_content # Re #340 - return the content before the 'ignore text' was applied text_content_before_ignored_filter = stripped_text_from_html.encode('utf-8') @@ -136,7 +142,6 @@ class perform_site_check(): # in the future we'll implement other mechanisms. update_obj["last_check_status"] = fetcher.get_last_status_code() - update_obj["last_error"] = False # If there's text to skip # @todo we could abstract out the get_text() to handle this cleaner diff --git a/changedetectionio/notification.py b/changedetectionio/notification.py index 5c5a1fb1..54495685 100644 --- a/changedetectionio/notification.py +++ b/changedetectionio/notification.py @@ -25,9 +25,7 @@ default_notification_body = '{watch_url} had a change.\n---\n{diff}\n---\n' default_notification_title = 'ChangeDetection.io Notification - {watch_url}' def process_notification(n_object, datastore): - import logging - log = logging.getLogger('apprise') - log.setLevel('TRACE') + apobj = apprise.Apprise(debug=True) for url in n_object['notification_urls']: @@ -53,11 +51,22 @@ def process_notification(n_object, datastore): n_title = n_title.replace(token, val) n_body = n_body.replace(token, val) - apobj.notify( + # https://github.com/caronc/apprise/wiki/Development_LogCapture + # Anything higher than or equal to WARNING (which covers things like Connection errors) + # raise it as an exception + + with apprise.LogCapture(level=apprise.logging.DEBUG) as logs: + apobj.notify( body=n_body, title=n_title, - 
body_format=n_format, - ) + body_format=n_format) + + # Returns empty string if nothing found, multi-line string otherwise + log_value = logs.getvalue() + if log_value and 'WARNING' in log_value or 'ERROR' in log_value: + raise Exception(log_value) + + # Notification title + body content parameters get created here. def create_notification_parameters(n_object, datastore): diff --git a/changedetectionio/store.py b/changedetectionio/store.py index 7c1cceb3..8403edcc 100644 --- a/changedetectionio/store.py +++ b/changedetectionio/store.py @@ -133,7 +133,7 @@ class ChangeDetectionStore: self.add_watch(url='http://www.quotationspage.com/random.php', tag='test') self.add_watch(url='https://news.ycombinator.com/', tag='Tech news') self.add_watch(url='https://www.gov.uk/coronavirus', tag='Covid') - self.add_watch(url='https://changedetection.io', tag='Tech news') + self.add_watch(url='https://changedetection.io/CHANGELOG.txt') self.__data['version_tag'] = version_tag @@ -332,7 +332,7 @@ class ChangeDetectionStore: self.needs_write = True return changes_removed - def add_watch(self, url, tag, extras=None): + def add_watch(self, url, tag="", extras=None): if extras is None: extras = {} diff --git a/changedetectionio/templates/_common_fields.jinja b/changedetectionio/templates/_common_fields.jinja index ef5dd455..4d757086 100644 --- a/changedetectionio/templates/_common_fields.jinja +++ b/changedetectionio/templates/_common_fields.jinja @@ -10,9 +10,13 @@ AWS SNS - sns://AccessKeyID/AccessSecretKey/RegionName/+PhoneNo SMTPS - mailtos://user:pass@mail.domain.com?to=receivingAddress@example.com") }} -
Use AppRise - URLs for notification to just about any service! Please read the notification services wiki here for important configuration notes +
+
diff --git a/changedetectionio/templates/import.html b/changedetectionio/templates/import.html index 77bd9b40..943e580d 100644 --- a/changedetectionio/templates/import.html +++ b/changedetectionio/templates/import.html @@ -5,7 +5,14 @@
- One URL per line, URLs that do not pass validation will stay in the textarea. + + Enter one URL per line, and optionally add tags for each URL after a space, delimited by commas (,):
+ https://example.com tag1, tag2, last tag +
+ URLs which do not pass validation will stay in the textarea. +
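
The import format described in this hunk (a URL, then optional tags after the first space) can be sketched in isolation as below. This is a minimal sketch, not the project's code: `parse_import_line` is a hypothetical helper name, and stripping the line before splitting is an assumption added here so that a trailing `\r` or leading spaces do not end up in the URL. The split and re-join mirror the `url, *tags = url.split(" ")` and `" ".join(tags)` lines added to the import handler in this diff.

```python
def parse_import_line(line):
    """Split one import line into (url, tag_string).

    Hypothetical helper for illustration only; not part of changedetection.io.
    """
    # Assumption: strip surrounding whitespace (including a trailing "\r")
    # before splitting, so the URL never carries stray characters.
    url, *tags = line.strip().split(" ")
    # Re-join everything after the URL, so "tag1, tag2, last tag"
    # survives as a single comma-separated tag string.
    return url, " ".join(tags)


if __name__ == "__main__":
    print(parse_import_line("https://example.com tag1, tag2, last tag"))
    print(parse_import_line("https://example.com"))
```

A line with no tags yields an empty tag string, which lines up with the new `tag=""` default on `add_watch` in `store.py`.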
+