Commit Graph

68 Commits (b29fec0d9541694c76a87caa5fb0afb59585372e)

Author SHA1 Message Date
dgtlmoon f9387522ee
Fetching - Be sure that content-type detection works when the headers are a mixed case (#1604)
2 years ago
dgtlmoon 1aeafef910
Fetcher - Puppeteer experimental fetcher wasn't returning the status-code (#1585)
2 years ago
dgtlmoon e4f6d54ae2
BrowserSteps - Refactored to re-use playwright context which should solve some errors
2 years ago
dgtlmoon d939882dde
Fetcher - Experimental fetcher improvements (Code TidyUp, Improve tests, revert to old playwright when using BrowserSteps for now) (#1564)
2 years ago
dgtlmoon 5325918f29
Puppeteer fetcher, adding disk cache and other fixes (#1563)
2 years ago
dgtlmoon 316f28a0f2
Fetcher - Experimental fetcher fixes, now only enabled with 'USE_EXPERIMENTAL_PUPPETEER_FETCH' env var (default off) (#1561)
2 years ago
dgtlmoon 94f38f052e
Fetcher - playwright/browserless - Use builtin node puppeteer handler in browserless, scales way better, and is faster (#1559)
2 years ago
dgtlmoon 6e71088cde
New feature - Restock / stock / out of stock monitor option/mode
2 years ago
dgtlmoon 41856c4ed8
Re #1365 - Playwright - Browser "Service Workers" should be enabled by default but unset via env var PLAYWRIGHT_SERVICE_WORKERS=block (#1367)
2 years ago
dgtlmoon d47a25eb6d
Playwright - Removing old bug fix where playwright needed screenshot called twice to make the full screen screenshot be actually fullscreen (#1356)
2 years ago
dgtlmoon fcfd1b5e10
Ability to configure extra proxies via the UI (#1235)
2 years ago
dgtlmoon 13c4121f52
PDF File change detection - Initial PDF fetcher support with basic text extraction (#1244)
2 years ago
dgtlmoon 0c380c170f
Playwright - Better error reporting and re-try fetch on fail once (#1238)
2 years ago
dgtlmoon b76148a0f4
Fetcher - CPU usage - Skip processing if the previous checksum and the just fetched one was the same (#925)
2 years ago
dgtlmoon 93cc30437f
Playwright+BrowserSteps - Fetch changes - Fetch simply after page starts rendering + delay seconds, disable service workers
2 years ago
dgtlmoon 69756f20f2
VisualSelector & BrowserSteps - Scraper improvements, remove duplicate code
2 years ago
dgtlmoon fde7b3fd97
Remove dupe xpath finder prep code
2 years ago
dgtlmoon 5b530ff61c
Configurable "Browser Steps" when Playwright/Chrome is configured (enter text, scroll, wait for text, click button etc) (#478)
2 years ago
dgtlmoon df6e835035
Make VisualSelector show first available multiple selector, refactor to make more maintainable (#1132)
2 years ago
dgtlmoon 359fc48fb4
Filters can now accept a list/multiple filters (#1064) #623
2 years ago
dgtlmoon 669fd3ae0b
Dont use default Requests `user-agent` and `accept` headers in playwright+selenium requests, breaks sites such as united.com. (#1004)
2 years ago
dgtlmoon 3ebb2ab9ba
Selenium fetcher - screenshot should be taken after 'wait' time, not before #873
2 years ago
dgtlmoon 3705ce6681
Render Extract Configurable Delay Seconds should also apply after executing any JS #958
2 years ago
dgtlmoon f7ea99412f
Re #958 - remove change screensize, should be in 1280x720 default, was causing "Unable to retrieve content because the page is navigating and changing the content." on some sites
2 years ago
dgtlmoon 1193a7f22c
Playwright - Support proxy auth mechanisms (#859)
2 years ago
dgtlmoon e461c0b819
Playwright fetcher didn't report low level HTTP errors correctly (like Connection Refused) (#852)
2 years ago
dgtlmoon 9942107016
Massive improvements to error handling - show separate output for non HTTP 200 status replies
2 years ago
dgtlmoon 1eb5726cbf
Execute JS should happen after waiting seconds
2 years ago
dgtlmoon e6173357a9
Visual Selector direct element finder fix
2 years ago
dgtlmoon fae1164c0b
Ability to specify JS before running change-detection (#744)
3 years ago
dgtlmoon 169c293143
Playwright - log console errors to output
3 years ago
dgtlmoon 6553980cd5
Playwright - Use HTTP Request Headers override (Cookie, etc)
3 years ago
dgtlmoon 4a91505af5
Playwright screenshots - no need for high-res "bug workaround" screenshot, use lower quality/faster configurable image quality env var
3 years ago
dgtlmoon 82b900fbf4
Give more helpful error message when a page doesnt load
3 years ago
dgtlmoon 358a365303
Tweaks to playwright fetch code - better timeout handling
3 years ago
dgtlmoon 8294519f43
Content fetcher - Handle when a page doesnt load properly
3 years ago
dgtlmoon 8ba8a220b6
Playwright - Correctly close browser context/sessions on exceptions
3 years ago
dgtlmoon 5cefb16e52
Minor code cleanup
3 years ago
dgtlmoon 341ae24b73
Re #616 - content trigger - adding extra test (#620)
3 years ago
dgtlmoon 9d742446ab
Playwright - ByPass CSP for more reliable JS scraping, disable accept downloads
3 years ago
dgtlmoon e3e022b0f4
VisualSelector - Better handling of filter targets that are no longer available in the HTML
3 years ago
dgtlmoon 7983675325
Visual Selector - be more resilient when sites interfere with the xPath scraping
3 years ago
dgtlmoon eef56e52c6
Adding new Visual Selector for choosing the area of the webpage to monitor - playwright/browserless only (#566)
3 years ago
dgtlmoon 6734fb91a2
Option to control if pages with no renderable content are a change (example: JS webapps that dont render any text sometimes) (#608)
3 years ago
dgtlmoon 16809b48f8
Playwright - raise EmptyReply on empty reply, no need to process further
3 years ago
dgtlmoon 67c833d2bc
Re #214 - configurable wait extra seconds for webdriver requests before extracting text (#606)
3 years ago
weeix 31fea55ee4
Fix PLAYWRIGHT_DRIVER_URL default value (cf. #587) (#599)
3 years ago
dgtlmoon 18f0b63b7d
Ability to specify a list of proxies to choose from, always using the first one by default, See wiki (#591)
3 years ago
dgtlmoon 9807cf0cda
Playwright - code fix
3 years ago
dgtlmoon d4b5237103
Playwright fetcher - more reliable by just waiting arbitrary seconds after the last network IO
3 years ago