Whether it's for outreach or link removals, you won't get far without searching for email addresses. This can be time consuming but it's well worth it, and a manual approach is always recommended: you're more likely to contact the right person first time and can tailor your approach when you do. There are, of course, ways to speed the process up.
There are times, though, when you can't find contact details; even legitimate sites don't always display a clear way to contact them. There are a number of ways you can try to get the details: checking Whois data, checking whether they have social accounts, or using an email validation service and guessing at common email addresses. I've also had some good success in the past using services such as spyonweb.com to see whether they have other properties with shared analytics that might display the contact details.
Recently I was doing a large link clean-up, removing all the poor quality links that had been built to a new client's site over the years, and I was left with around 100 domains that had no visible contact details or contact forms. Now, I could just disavow the links and hope that will be enough, but there's no fun in that, and I prefer to remove links rather than just disavow them to ensure they don't get caught in an algorithm update at some point in the future. The problem is that this is now a big task.
To solve this problem and speed up the process, I wrote a Python script that tests a list of URLs against common email names such as ‘email@example.com’, ‘firstname.lastname@example.org’ etc.
Here it is in action…(play it in full screen)
These names can be whatever you want. I have provided a few for you to start with, but if you really wanted you could check against a names database or registrar names. The script also checks against multiple URLs (line by line in the text file); I just added one to quickly show you an example.
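To give a rough idea of how the names and domains get combined, here is a minimal sketch (not the script itself, and written for Python 3). The function names and the sample data are my own placeholders, and I'm assuming a name may or may not carry a trailing ‘@’ in the text file.

```python
def read_lines(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def build_candidates(domains, names):
    """Pair every name with every domain, keeping the input order."""
    candidates = []
    for domain in domains:
        for name in names:
            # Assumption: a name may be stored as 'admin' or 'admin@'.
            mailbox = name.rstrip("@")
            candidates.append("{0}@{1}".format(mailbox, domain))
    return candidates


# Hypothetical sample data; in practice these lists would come from
# read_lines("common.txt") and read_lines("urls.txt").
names = ["admin@", "info"]
domains = ["example.com", "example.org"]
candidates = build_candidates(domains, names)
print(candidates)
```

Two names against two domains gives four addresses to test, which is why the run time grows quickly as you add more of either.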
The validation works by checking whether the host has an SMTP server and therefore actually exists. Once you have the list of valid emails, you’ll want to make a judgement call on which email is best. You can also use a tracking script to see whether your email has been opened; if not, try another address.
Now I know what you’re thinking: why not just stick the variations in an email and send them all through BCC? Well, I can think of a few reasons. It's potentially more time consuming to start with and less organised, and if one person receives the emails for multiple accounts you may come across as a spammer and simply have your emails deleted. You also might run into problems if you do this on a large scale, as Gmail, for example, has a send limit of 500 messages per day (1 message to 500 recipients counts as 500 emails).
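As a rough illustration of the check, here is a sketch in Python 3. It assumes the third-party validate_email package from PyPI and that its validate_email(address, verify=True) call returns True, False or None (None meaning it couldn't decide); treat that as my assumption rather than a guarantee. The classify and check_all helpers and the fake checker are placeholders of mine so the example runs without touching the network.

```python
def smtp_checker(address):
    """Ask the mail host about an address (assumed validate_email usage)."""
    # Imported here so the rest of the sketch runs without the package.
    from validate_email import validate_email
    return validate_email(address, verify=True)


def classify(result):
    """Turn a checker result into a label for the output file."""
    if result is True:
        return "valid"
    if result is None:
        return "unknown"
    return "invalid"


def check_all(addresses, checker):
    """Run the checker over every address and label each result."""
    return [(address, classify(checker(address))) for address in addresses]


def fake_checker(address):
    """Stand-in for smtp_checker: only 'info@' addresses pass."""
    return address.startswith("info@")


results = check_all(["admin@example.com", "info@example.com"], fake_checker)
print(results)
```

To run it for real you would pass smtp_checker instead of fake_checker, and expect it to be slow on hosts that don't answer.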
A few things to keep in mind…
The code speaks to two files: common.txt (this contains your ‘admin@’ type names) and urls.txt (this contains your URLs). There is a third file, output.csv, but don't worry about it too much: it will be created automatically in the directory where you have the program running (this should also be the location of the other two files).
This is using Python 2.7. If you’ve not used Python before you can just install it; it’s as easy as installing any software. Download it from here.
The more domains and names you load into it, the longer it will take to run, but just set it going and get on with something else.
If there are any issues, the program just keeps going and marks the address as an error in the CSV file. These will be timeout errors for whatever reason.
It works with and without the http protocol, as the URLs go through a cleaning process. For example, ‘http(s)://bronco.co.uk’, ‘http(s)://www.bronco.co.uk’ and ‘http(s)://www.bronco.co.uk/who-we-are.html’ will all work!
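The URL cleaning step could look something like the sketch below. This is my own guess at the idea using Python 3's standard library, not the exact cleaning the script does, and clean_domain is a name I've made up.

```python
from urllib.parse import urlparse


def clean_domain(url):
    """Reduce a URL (with or without a protocol) to its bare domain."""
    url = url.strip()
    # urlparse only finds the host when a scheme is present, so add one.
    if "://" not in url:
        url = "http://" + url
    host = urlparse(url).hostname or ""
    # Drop a leading 'www.' so every variation ends up the same.
    if host.startswith("www."):
        host = host[4:]
    return host


for example in [
    "bronco.co.uk",
    "http://bronco.co.uk",
    "https://www.bronco.co.uk",
    "https://www.bronco.co.uk/who-we-are.html",
]:
    print(example, "->", clean_domain(example))
```

Every variation comes out as the same bare domain, which is what the email names then get attached to.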
Validate Email. This module is needed to do the actual validation checks.
Interrupting Cow. When testing this script I was checking 25 email addresses, which was taking about 9 minutes! This was due to the length of time it was taking for errors to time out. I used this module to write in a timeout limit of 5 seconds, which cut the time down to about a minute and four seconds, so it was well worth my Sunday evening figuring it out!
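To show the timeout idea without any extra modules, here is a sketch in Python 3 that swaps Interrupting Cow for the standard library's concurrent.futures. It is a different technique from the one the script uses, and the function names, the CSV column headings and the two stand-in checkers are all placeholders of mine. Anything that times out or raises is marked as an error and the run carries on.

```python
import csv
import io
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def check_with_timeout(checker, address, seconds=5):
    """Run checker(address), giving up after the given number of seconds."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(checker, address)
    try:
        return "valid" if future.result(timeout=seconds) else "invalid"
    except TimeoutError:
        # The check took too long; mark it and move on.
        return "error"
    except Exception:
        # Any other failure inside the checker is also just an error.
        return "error"
    finally:
        # Don't wait for a slow check; it finishes in the background.
        executor.shutdown(wait=False)


def write_results(rows, handle):
    """Write (email, status) rows as CSV; the headings are hypothetical."""
    writer = csv.writer(handle)
    writer.writerow(["email", "status"])
    writer.writerows(rows)


def fast_checker(address):
    """Stand-in for a host that answers straight away."""
    return True


def slow_checker(address):
    """Stand-in for a host that takes too long to answer."""
    time.sleep(0.3)
    return True


rows = [
    ("fast@example.com", check_with_timeout(fast_checker, "fast@example.com", seconds=1)),
    ("slow@example.com", check_with_timeout(slow_checker, "slow@example.com", seconds=0.05)),
]
buffer = io.StringIO()  # stands in for output.csv
write_results(rows, buffer)
print(buffer.getvalue())
```

One trade-off to be aware of: the slow check isn't actually stopped here, it is just no longer waited for, so this caps how long each address holds up the run rather than killing the work itself.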
I hope you find this useful and if you have suggestions you can email me or add some comments below 🙂
Anyway enough chit-chat, here’s the code*!
*I’m not a developer and I’m sure there is a more efficient way of writing this code; Python is just something I have been exploring in my spare time. We have a software developer in-house who writes all our tools, custom to our needs, whether in Python or any other language, and I’m sure if he wrote this it would look very different.