
Scraper

Scraper is the abstract base class shared by Gemini and ChatGPT. All public methods documented here are available on both.

Bases: ABC

Abstract base class for LLM chatbot scrapers.

Manages the Chrome browser lifecycle, simulated typing, file downloads, and state polling. Subclasses implement the platform-specific DOM interactions for a particular chatbot (e.g. Gemini, ChatGPT).
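The design above puts shared lifecycle code in the base class and the DOM interactions in subclasses. That is Python's abstract-base-class pattern. The sketch below is hypothetical: MiniScraper and EchoBot are not part of Hermex, and the real abstract methods take more parameters. It only shows how the abc module refuses to instantiate a class until every abstract method is implemented. It also shows how a shared method such as query() can be written once against those hooks.

```python
from abc import ABC, abstractmethod


class MiniScraper(ABC):
    """Hypothetical stand-in for Scraper: shared logic lives here."""

    def __init__(self, typing_delay: float = 0.025) -> None:
        self.typing_delay = typing_delay

    @abstractmethod
    def send_message(self, message: str) -> None:
        """Platform-specific: put text into the chat input and submit it."""

    @abstractmethod
    def get_last_response(self) -> str:
        """Platform-specific: read the newest reply from the page."""

    def query(self, message: str) -> str:
        # Shared workflow written once, using only the abstract hooks.
        self.send_message(message)
        return self.get_last_response()


class EchoBot(MiniScraper):
    """Hypothetical subclass that 'answers' by echoing the last message."""

    def __init__(self) -> None:
        super().__init__()
        self._last = ""

    def send_message(self, message: str) -> None:
        self._last = message

    def get_last_response(self) -> str:
        return f"echo: {self._last}"
```

Calling MiniScraper() raises TypeError, while EchoBot().query("hi") returns "echo: hi".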

Call setup() once on the subclass you use (e.g. Gemini.setup()) to build a persistent browser profile and, if you log in, a saved session. Then instantiate the subclass directly for subsequent runs.

__init__(chrome_version=None, download_dir=Path('.'), headless=False, typing_delay=0.025, disable_web_security=True, data_dir=None)

Parameters:

- chrome_version (default None): Chrome major version number. Defaults to auto-detecting the installed Chrome version.
- download_dir (default Path('.')): Directory where downloaded files (e.g. generated images) are saved.
- headless (default False): Run the browser without a visible window.
- typing_delay (default 0.025): Seconds between each keystroke when typing character-by-character.
- disable_web_security (default True): Pass --disable-web-security to Chrome. Needed for some scrapers (e.g. ChatGPT, Gemini) but triggers bot detection on stricter sites; set it to False for those.
- data_dir (default None): Root directory where Hermex stores its data. Defaults to the platform-appropriate data directory (~/.local/share/hermex on Linux, ~/Library/Application Support/hermex on macOS). Browser profiles are stored as subdirectories within this path (e.g. data_dir/chrome_profile/).
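The documented data_dir defaults can be expressed as a small lookup. This is a sketch, not Hermex's code. The helper name default_data_dir is hypothetical, and it covers only the two platforms named above. The library may resolve the directory differently.

```python
import sys
from pathlib import Path
from typing import Optional


def default_data_dir(platform: str = sys.platform, home: Optional[Path] = None) -> Path:
    """Hypothetical helper returning the documented default data_dir."""
    home = home if home is not None else Path.home()
    if platform.startswith("linux"):
        # Documented Linux default: ~/.local/share/hermex
        return home / ".local" / "share" / "hermex"
    if platform == "darwin":
        # Documented macOS default: ~/Library/Application Support/hermex
        return home / "Library" / "Application Support" / "hermex"
    # The reference text names no default for other platforms.
    raise NotImplementedError(f"no documented default for {platform!r}")
```

On Linux, default_data_dir() / "chrome_profile" points at the profile location used in the example above.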

open_url(url=None, timeout=30)

Open a URL in the browser and wait for the page to be ready.

Parameters:

- url (default None): URL to navigate to.
- timeout (default 30): Maximum seconds to wait for the page to be ready before raising TimeoutException.
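This reference does not say what "ready" means for open_url(). Selenium-driven tools commonly treat a page as ready once document.readyState equals "complete", and the sketch below assumes that. It polls readyState against a deadline. Where Hermex raises TimeoutException, the sketch raises the builtin TimeoutError. wait_for_page is a hypothetical helper, and driver only needs a Selenium-style execute_script(script) method.

```python
import time
from typing import Any


def wait_for_page(driver: Any, timeout: float = 30, poll: float = 0.25) -> None:
    """Block until document.readyState is "complete" or `timeout` seconds pass.

    `driver` may be any object exposing a Selenium-style execute_script(script).
    """
    deadline = time.monotonic() + timeout
    while True:
        if driver.execute_script("return document.readyState") == "complete":
            return
        if time.monotonic() >= deadline:
            # Hermex raises Selenium's TimeoutException; the builtin stands in here.
            raise TimeoutError(f"page not ready after {timeout} s")
        time.sleep(poll)
```

With a real Selenium driver, this check would run right after driver.get(url).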

send_message(message, submit=True, images=None, paste=False, fake_typing=True, typing_delay=None) abstractmethod

Input a message into the chat, optionally attaching images.

Parameters:

- message (str, required): Text to send.
- submit (bool, default True): Whether to press Enter after composing the message.
- images (list[str | Path], default None): List of image file paths to attach before the message.
- paste (bool, default False): If True, paste the message instead of typing it character by character. Useful for long messages where typing is too slow.
- fake_typing (bool, default True): When paste=True, type dummy text first to avoid bot detection, then replace it with the real message.
- typing_delay (float, default None): Seconds between each keystroke. Overrides the instance-level default set in the constructor for this call only.
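Typing character by character takes at least len(message) * typing_delay seconds, which is why paste=True helps with long prompts. The hypothetical type_text helper below sketches the documented override rule. A per-call typing_delay takes precedence over the instance default, and None falls back to that default. send_key is a stand-in for whatever actually delivers a keystroke to the page.

```python
import time
from typing import Callable, Optional


def type_text(
    text: str,
    send_key: Callable[[str], None],
    typing_delay: Optional[float] = None,
    default_delay: float = 0.025,
) -> None:
    """Send `text` one character at a time, pausing between keystrokes."""
    # A per-call value overrides the default for this call only.
    delay = default_delay if typing_delay is None else typing_delay
    for ch in text:
        send_key(ch)
        time.sleep(delay)


def typing_seconds(text: str, delay: float = 0.025) -> float:
    # Lower bound on time spent typing, ignoring per-keystroke overhead.
    return len(text) * delay
```

At the default 0.025 s per keystroke, a 2,000-character prompt spends at least 50 seconds just being typed.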

query(message, timeout=None, images=None, paste=False, fake_typing=True, typing_delay=None, get_markdown=False, remove_watermark=False)

Send a message, wait for the response to complete, and return it.

Parameters:

- message (str, required): Text to send.
- timeout (float, default None): Maximum seconds to wait for the response before raising TimeoutException. Defaults to 5 minutes.
- images (list[str | Path], default None): List of image file paths to attach (platform-dependent).
- paste (bool, default False): If True, paste the message instead of typing it character by character. Useful for long messages where typing is too slow.
- fake_typing (bool, default True): When paste=True, type dummy text first to avoid bot detection, then replace it with the real message.
- typing_delay (float, default None): Seconds between each keystroke. Overrides the instance-level default.
- get_markdown (bool, default False): If True, return the raw markdown source instead of plain text.
- remove_watermark (bool, default False): If True, remove the watermark from any downloaded image.

Returns:

- AssistantMessage: the response, with text and image fields (either may be None, but not both).
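query() and get_last_response() both return an AssistantMessage. Its text and image fields may each be None, but never both. A minimal sketch of that contract as a dataclass follows. The class is a hypothetical mirror, not Hermex's definition, and typing image as a Path to the downloaded file is an assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class AssistantMessage:
    """Hypothetical mirror of the documented return type."""

    text: Optional[str] = None
    image: Optional[Path] = None  # assumed: path to the downloaded image file

    def __post_init__(self) -> None:
        # Documented invariant: either field may be None, but not both.
        if self.text is None and self.image is None:
            raise ValueError("AssistantMessage needs text, image, or both")


def summarize(msg: AssistantMessage) -> str:
    """Show how a caller might branch on which fields are present."""
    parts = []
    if msg.text is not None:
        parts.append(f"text ({len(msg.text)} chars)")
    if msg.image is not None:
        parts.append(f"image at {msg.image}")
    return ", ".join(parts)
```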

get_last_response(get_markdown=False, remove_watermark=False) abstractmethod

Retrieve the last response from the chat interface.

Parameters:

- get_markdown (bool, default False): If True, return the raw markdown source instead of plain text.
- remove_watermark (bool, default False): If True, remove the watermark from any downloaded image.

Returns:

- AssistantMessage: the last response, with text and image fields (either may be None, but not both).

get_state() abstractmethod

Return the current state of the chatbot UI.

Possible states:

- State.IDLE: the interface is ready and waiting for input.
- State.TYPING: the input box has content that has not been submitted yet.
- State.UPLOADING: a file upload is in progress.
- State.GENERATING: the model is actively generating a response.

Returns:

- State: a State value representing the current UI state.

Raises:

- Exception: If the state cannot be determined (e.g. expected DOM elements are missing). Callers that need to tolerate transient failures should use wait_until_idle() instead, which has built-in error tolerance.

wait_until_idle(timeout=None)

Block until the chatbot has finished generating its response.

Parameters:

- timeout (float, default None): Maximum seconds to wait before raising TimeoutException. Defaults to 5 minutes.
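wait_until_idle() blocks until generation finishes. It has a 5-minute default timeout and tolerates the transient errors that get_state() can raise. The sketch below shows one way to write such a loop. It assumes that "finished" means get_state() returns State.IDLE and treats every exception from get_state() as transient. The State enum mirrors the four documented values, and the builtin TimeoutError stands in for TimeoutException.

```python
import time
from enum import Enum, auto
from typing import Callable, Optional

DEFAULT_TIMEOUT = 5 * 60  # documented default: 5 minutes


class State(Enum):
    IDLE = auto()
    TYPING = auto()
    UPLOADING = auto()
    GENERATING = auto()


def wait_until_idle(
    get_state: Callable[[], State],
    timeout: Optional[float] = None,
    poll: float = 1.0,
) -> None:
    """Poll get_state() until it reports IDLE, tolerating transient errors."""
    limit = DEFAULT_TIMEOUT if timeout is None else timeout
    deadline = time.monotonic() + limit
    while True:
        try:
            if get_state() is State.IDLE:
                return
        except Exception:
            # get_state() may raise while the DOM is mid-update; retry.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still busy after {limit} s")
        time.sleep(poll)
```

Swallowing every exception keeps the loop alive through DOM churn. The tradeoff is that a permanently broken selector only surfaces as a timeout, so capping consecutive failures would be a reasonable refinement.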

simple_query(prompt, images=None, timeout=None) classmethod

Open the browser, send a prompt, and return the response.

Convenience method for one-shot scripts that don't need a persistent session. Opens the browser, sends the prompt, closes the browser, and returns the full AssistantMessage.

Parameters:

- prompt (required): The prompt text to send.
- images (default None): Optional list of image file paths to attach.
- timeout (default None): Maximum seconds to wait for the response. Defaults to 5 minutes.

Returns:

- AssistantMessage: the response, with text and image fields.
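simple_query() opens the browser, sends the prompt, closes the browser, and returns the response. The sketch below has the same shape and uses contextlib.closing, which calls close() on exit even when the action raises. run_once and the fake bot in the check are hypothetical. This reference does not say whether Hermex closes the browser on an error path.

```python
from contextlib import closing
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def run_once(factory: Callable[[], Any], action: Callable[[Any], T]) -> T:
    """Open a session, run one action on it, and always close it afterwards."""
    # closing() calls session.close() on exit, including when action() raises.
    with closing(factory()) as session:
        return action(session)
```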

setup(data_dir=None) classmethod

First-time setup required before using Hermex.

Opens a browser window so you can browse around briefly. This builds a browser profile that looks like a real user, which significantly reduces bot detection risk in subsequent automated runs. Everyone must run this at least once after installation.

If you need login-gated features (e.g. image upload), log in during this session. Hermex will reuse the saved session in all future runs — repeat setup only if your session expires.

Close the browser window when done.

Parameters:

- data_dir (default None): Must match the data_dir you pass to the constructor. Defaults to the platform-appropriate data directory.

Usage: Gemini.setup()
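Both setup() and later runs locate the browser profile through data_dir. If they are given different directories, a later run will not find the profile that setup() created. The hypothetical check below makes that visible. The chrome_profile subdirectory name comes from the example in the constructor parameters, so treating it as the real layout is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def profile_dir(data_dir: Path) -> Path:
    # Assumed layout, taken from the "e.g. data_dir/chrome_profile/" example.
    return data_dir / "chrome_profile"


def has_profile(data_dir: Path) -> bool:
    """Hypothetical pre-flight check: was setup() run against this data_dir?"""
    return profile_dir(data_dir).is_dir()


def simulate_mismatch() -> Tuple[bool, bool]:
    """Pretend setup() used one data_dir and a later run used another."""
    with tempfile.TemporaryDirectory() as setup_dir, tempfile.TemporaryDirectory() as run_dir:
        profile_dir(Path(setup_dir)).mkdir()  # what setup() would leave behind
        return has_profile(Path(setup_dir)), has_profile(Path(run_dir))
```

simulate_mismatch() returns (True, False): the profile exists only under the directory that setup() used.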

close()

Close the browser and clean up.

sleep(t)

Sleep for approximately t seconds, with a small random jitter to appear more human-like.

Parameters:

- t (required): Target sleep duration in seconds.

short_wait()

Wait for the default short duration (7 seconds). Use after UI interactions that need a moment to settle.

long_wait()

Wait for the default long duration (5 minutes). Use after sending a prompt that triggers image generation or a slow response.
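sleep() adds a small random jitter. short_wait() and long_wait() wait for 7 seconds and 5 minutes respectively. The sketch below assumes a jitter of ±10%, since the actual amount isn't documented, and assumes that both waits delegate to the jittered sleep. The injectable sleep parameter exists only so the behaviour can be checked without actually waiting.

```python
import random
import time
from typing import Callable, Optional

SHORT_WAIT = 7.0       # documented short_wait() duration, seconds
LONG_WAIT = 5 * 60.0   # documented long_wait() duration, seconds
JITTER = 0.10          # assumed +/-10% variation; the real amount is not documented


def human_sleep(
    t: float,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep for t seconds scaled by a random factor in [1 - JITTER, 1 + JITTER]."""
    rng = rng or random.Random()
    duration = t * rng.uniform(1 - JITTER, 1 + JITTER)
    sleep(duration)
    return duration


def short_wait(sleep: Callable[[float], None] = time.sleep) -> float:
    # Assumed to delegate to the jittered sleep.
    return human_sleep(SHORT_WAIT, sleep=sleep)


def long_wait(sleep: Callable[[float], None] = time.sleep) -> float:
    # Assumed to delegate to the jittered sleep.
    return human_sleep(LONG_WAIT, sleep=sleep)
```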

refresh_page()

Reload the current page.

get_current_url(only_base=False)

Return the current browser URL.

Parameters:

- only_base (default False): If True, strip query parameters and return only the base URL.
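get_current_url(only_base=True) strips query parameters. With the standard library, that is a urlsplit/urlunsplit round trip. The sketch below also drops the fragment, which is an assumption about what "base URL" means here. The example URLs are illustrative.

```python
from urllib.parse import urlsplit, urlunsplit


def base_url(url: str) -> str:
    """Drop the query string (and, by assumption, the fragment) from url."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

For example, base_url("https://gemini.google.com/app/abc123?hl=en") gives "https://gemini.google.com/app/abc123".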