How to extract a domain from a website URL

Task

A table contains full page URLs. The export needs a separate domain column that holds only the domain, without http://, https://, www., the path, query parameters, or the anchor.

Short Answer

You can do it without regular expressions by searching and removing substrings. For mixed URL formats, the regex option is shorter.

How to do it in Eofferix with substring search and removal

Create the final domain column from url.
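In Transformations, add two rules for the protocol: Contains substring https:// with the action Remove substring https://, then the same rule for http://.
Add a rule for the prefix: Starts with www. with the action Remove substring www. (including the dot).
To remove the path, query string, and anchor, add a pair of rules for each of /, ?, and #: first Remove after, then Remove substring with the same character. Place these pairs after the protocol rules; otherwise the // in https:// is treated as the start of the path.
Without regex, the protocol and www. are removed as substrings, and the path, query string, and anchor are trimmed at their separators.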
Save the column settings.
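
For readers who want to check the rule order outside Eofferix, the steps above can be sketched in Python. This is only an illustration: the function name extract_domain_plain is hypothetical, and it assumes that Remove after cuts at the first occurrence of the separator, which may not match how Eofferix evaluates its rules.

```python
def extract_domain_plain(url: str) -> str:
    """Strip protocol, leading www., and everything after the domain
    using plain substring operations only (no regular expressions)."""
    domain = url

    # Protocol rules: "Contains substring" followed by "Remove substring".
    for protocol in ("https://", "http://"):
        if protocol in domain:
            domain = domain.replace(protocol, "")

    # Prefix rule: "Starts with www." followed by "Remove substring www.".
    if domain.startswith("www."):
        domain = domain[len("www."):]

    # Separator rule pairs: keep only the text before the first /, ?, or #.
    # Assumption: "Remove after" cuts at the first occurrence; split() then
    # drops the separator itself, which covers the "Remove substring" step.
    for separator in ("/", "?", "#"):
        domain = domain.split(separator, 1)[0]

    return domain


if __name__ == "__main__":
    sample = "https://www.shop.example.com/catalog/jackets?utm_source=feed"
    print(extract_domain_plain(sample))
```

The separator loop runs after the protocol is removed, for the same reason the rule order matters in the column settings: the slashes in the protocol would otherwise be treated as the start of the path.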

How to do it in Eofferix with regular expressions

Create the final domain column from url.
Add ^https?:// with Remove substring. ^ anchors the match to the start of the string, https? matches http or https, and :// is the literal rest of the protocol prefix.
Add ^www\. with Remove substring. The dot is escaped as \. because an unescaped dot in a regex matches any single character.
Add [/?#].*$ with Remove substring. [/?#] matches the first path, query, or anchor separator, .* matches everything after it, and $ anchors the match to the end of the string.
The regex version is shorter: three rules, applied in this order, remove the protocol, www., and everything after the domain.
Save the column settings.
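
The same three patterns can be tried out in Python before they are saved as column rules. This is a minimal sketch that uses Python's re module; the names RULES and extract_domain_regex are hypothetical, and the regex engine in Eofferix may differ in details such as case sensitivity.

```python
import re

# The three patterns from the steps above, applied in the same order.
RULES = [
    re.compile(r"^https?://"),  # protocol at the start of the string
    re.compile(r"^www\."),      # literal "www." at the start of the string
    re.compile(r"[/?#].*$"),    # first separator and everything after it
]


def extract_domain_regex(url: str) -> str:
    """Apply each pattern as a 'Remove substring' rule, in order."""
    domain = url
    for pattern in RULES:
        # Replace the first match with an empty string, i.e. remove it.
        domain = pattern.sub("", domain, count=1)
    return domain


if __name__ == "__main__":
    sample = "https://www.shop.example.com/catalog/jackets?utm_source=feed"
    print(extract_domain_regex(sample))
```

Because the patterns are matched case-sensitively here, a URL that starts with HTTPS:// in capitals would keep its protocol; whether that also holds in Eofferix depends on its engine settings.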

Before / After

Before

source data

row_id	url
1	https://www.shop.example.com/catalog/jackets?utm_source=feed

After

result

row_id	domain
1	shop.example.com