Data scraping

Element Descriptor

Google is not the font of all knowledge. Data scraping is a form of information gathering in which data is collected and copied from the web. When done effectively, it allows a group to gather a large amount of publicly available information in a way that sidesteps the problems with Google search. It can be particularly helpful when timelines do not allow for FOIA requests, or when such requests would not be revealing.

Level descriptors

Novice: Manually extracts data from a variety of web pages and puts it into a usable, useful format (e.g. Excel spreadsheet).

Practitioner: Able to use intelligent automation to extract a large amount of data from HTML, outputting to a CSV or Excel. The practitioner data scraper is likely to be a competent user of Python.

Expert: Able to use a range of intelligent automation programmes (Python and others) to scrape hundreds or thousands of data points. The expert data scraper will be able to output to a range of formats, not only a CSV or Excel, but also JSON, which can be supported by an API. At this level, the expert is likely to have some formal data science training.

Ninja: Regularly trawls both the open and (shudder) dark and deep web to extract from billions of data points. Mastery of a range of intelligent automation and likely to have designed their own customised programme.
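
To give a feel for what the Practitioner and Expert levels describe, here is a minimal sketch of pulling a table out of HTML and outputting it as CSV and as JSON. It uses only Python's standard library. The sample HTML, the council names and figures, and the names TableParser, scrape_table, to_csv and to_json are all made up for illustration; a real page will usually be messier and need more handling than this.

```python
import csv
import io
import json
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_csv(rows):
    """Format rows as CSV text (opens directly in Excel)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Format rows as JSON records, treating the first row as the header."""
    if not rows:
        return "[]"
    header, *body = rows
    return json.dumps([dict(zip(header, row)) for row in body], indent=2)


# Hypothetical sample page; in practice this would be downloaded HTML.
SAMPLE_HTML = """
<table>
  <tr><th>Council</th><th>Budget</th></tr>
  <tr><td>Northtown</td><td>1200</td></tr>
  <tr><td>Southville</td><td>950</td></tr>
</table>
"""

rows = scrape_table(SAMPLE_HTML)
print(to_csv(rows))
print(to_json(rows))
```

The CSV text can be saved to a file and opened in a spreadsheet, while the JSON form is the sort of output an API could serve. Both come from the same scraped rows, so the scraping step does not need to change when the output format does.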

Element Overview Essay

This is a draft. If something doesn’t make sense, or you see typos, or if you have further ideas, please email us on contact@activecitizenshiptoolkit.net

The causes of data being scraped badly are quite simple. It’s a complicated business, and people receive zero training in how to do it. And if you don’t have the training, you’re going to have to learn the long, slow, hard way.

The consequences of not scraping data well are that you miss data that would be useful for your campaign. You waste time, which means you have less time for other campaigning, or even just for having a life. And you use up time, energy and morale that could be spent on movement building, or on sustaining yourself by relaxing and vegging out in front of Netflix, if that’s your thing and if you have that level of privilege.

So the fix is to find people who already scrape data from websites and learn the tricks of the trade from them. Also think about GDPR and staying GDPR compliant, because you really don’t want to fall foul of that. It would be a reputation suck, a money suck, etc.

Development Resources

Assessment Resources