89 Commits (f45a1b485a81b1d4b4ec30d34bce71cc858f0999)

Author SHA1 Message Date
dgtlmoon 04b7d98e6c Removing experimental puppeteer fetching browser
11 months ago
dgtlmoon 03976cd0e8 Prefer to use SockPuppetbrowser
11 months ago
dgtlmoon e9a9790cb0
Fetching - Make an obvious error when using BrowserSteps with the simple text fetcher (#2145)
12 months ago
dgtlmoon 7d96b4ba83
Fetching - Always record `server` software reply headers (will be used in the future) (#2143)
12 months ago
dgtlmoon d31a45d49a
Fetcher - Improve status_code logging (#2130 #2122)
12 months ago
Constantin Hong 4be0fafa93
Support Loguru as a logger (#2036)
1 year ago
dgtlmoon e051b29bf2
Browser Steps - General error handling improvements (#2083)
1 year ago
dgtlmoon 273bd45ad7
Fetching - Custom browser on experimental/puppeteer fetcher - Don't switch to custom puppeteer mode if external browser URL is active (#2068)
1 year ago
dgtlmoon d8ee5472f1
Update playwright fetcher library and API calls
1 year ago
dgtlmoon 5229094e44
New functionality - Selectable browser / ability to add extra browser connections (good for using "scraping browsers" etc.) (#1943)
1 year ago
dgtlmoon c8dcc072c8
Code refactor for fetchers (#1941)
1 year ago
dgtlmoon a0665e1f18 Fetcher - experimental puppeteer fetch - don't rewrite the proxy protocol (fixes socks5 bug)
1 year ago
dgtlmoon 6a589e14f3
BrowserSteps - Wrong text taken from browser steps (#1911)
1 year ago
dgtlmoon 4ae27af511
Code cleanup - Browser Steps
1 year ago
dgtlmoon e1860549dc
Fetching - Browser Step enabled watches should also identify 404/non-200 status situations (#1907)
1 year ago
dgtlmoon 349111eb35
Fetching/BrowserSteps - Going to a page was using slightly different logic from the main way - make them use the same methods (#1890)
1 year ago
Marcelo Alencar 0aef5483d9
Upgrade selenium to 4.14.0 (latest) (#1783)
1 year ago
dgtlmoon 7debccca73
Fetching - Clarifying how fetchers work with SOCKS5 proxies
1 year ago
dgtlmoon e30b17b8bc
UI + Fetching - Be more helpful when a filter contains no text, suggest ways to deal with images in filters (#1819)
1 year ago
dgtlmoon 57de4ffe4f
Page fetching - Fixed possible incorrect browser user-agent header in playwright/puppeteer/browserless fetchers (#1811)
1 year ago
dgtlmoon 7cb7eebbc5 Browser Steps - When cleaning up old screenshots, check the file exists
2 years ago
dgtlmoon f9387522ee
Fetching - Be sure that content-type detection works when the headers are a mixed case (#1604)
2 years ago
dgtlmoon 1aeafef910
Fetcher - Puppeteer experimental fetcher wasn't returning the status-code (#1585)
2 years ago
dgtlmoon e4f6d54ae2 BrowserSteps - Refactored to re-use playwright context which should solve some errors
2 years ago
dgtlmoon d939882dde
Fetcher - Experimental fetcher improvements (Code TidyUp, Improve tests, revert to old playwright when using BrowserSteps for now) (#1564)
2 years ago
dgtlmoon 5325918f29
Puppeteer fetcher, adding disk cache and other fixes (#1563)
2 years ago
dgtlmoon 316f28a0f2
Fetcher - Experimental fetcher fixes, now only enabled with 'USE_EXPERIMENTAL_PUPPETEER_FETCH' env var (default off) (#1561)
2 years ago
dgtlmoon 94f38f052e
Fetcher - playwright/browserless - Use builtin node puppeteer handler in browserless, scales way better, and is faster (#1559)
2 years ago
dgtlmoon 6e71088cde New feature - Restock / stock / out of stock monitor option/mode
2 years ago
dgtlmoon 41856c4ed8
Re #1365 - Playwright - Browser "Service Workers" should be enabled by default but unset via env var PLAYWRIGHT_SERVICE_WORKERS=block (#1367)
2 years ago
dgtlmoon d47a25eb6d
Playwright - Removing old bug fix where playwright needed screenshot called twice to make the full screen screenshot be actually fullscreen (#1356)
2 years ago
dgtlmoon fcfd1b5e10
Ability to configure extra proxies via the UI (#1235)
2 years ago
dgtlmoon 13c4121f52
PDF File change detection - Initial PDF fetcher support with basic text extraction (#1244)
2 years ago
dgtlmoon 0c380c170f
Playwright - Better error reporting and re-try fetch on fail once (#1238)
2 years ago
dgtlmoon b76148a0f4
Fetcher - CPU usage - Skip processing if the previous checksum and the just fetched one was the same (#925)
2 years ago
dgtlmoon 93cc30437f
Playwright+BrowserSteps - Fetch changes - Fetch simply after page starts rendering + delay seconds, disable service workers
2 years ago
dgtlmoon 69756f20f2 VisualSelector & BrowserSteps - Scraper improvements, remove duplicate code
2 years ago
dgtlmoon fde7b3fd97 Remove dupe xpath finder prep code
2 years ago
dgtlmoon 5b530ff61c
Configurable "Browser Steps" when Playwright/Chrome is configured (enter text, scroll, wait for text, click button etc) (#478)
2 years ago
dgtlmoon df6e835035
Make VisualSelector show first available multiple selector, refactor to make more maintainable (#1132)
2 years ago
dgtlmoon 359fc48fb4
Filters can now accept a list/multiple filters (#1064) #623
2 years ago
dgtlmoon 669fd3ae0b
Don't use default Requests `user-agent` and `accept` headers in playwright+selenium requests, breaks sites such as united.com. (#1004)
2 years ago
dgtlmoon 3ebb2ab9ba Selenium fetcher - screenshot should be taken after 'wait' time, not before #873
2 years ago
dgtlmoon 3705ce6681 Render Extract Configurable Delay Seconds should also apply after executing any JS #958
2 years ago
dgtlmoon f7ea99412f Re #958 - remove change screensize, should be in 1280x720 default, was causing "Unable to retrieve content because the page is navigating and changing the content." on some sites
2 years ago
dgtlmoon 1193a7f22c Playwright - Support proxy auth mechanisms (#859)
2 years ago
dgtlmoon e461c0b819
Playwright fetcher didn't report low level HTTP errors correctly (like Connection Refused) (#852)
2 years ago
dgtlmoon 9942107016
Massive improvements to error handling - show separate output for non HTTP 200 status replies
2 years ago
dgtlmoon 1eb5726cbf Execute JS should happen after waiting seconds
2 years ago
dgtlmoon e6173357a9 Visual Selector direct element finder fix
2 years ago