Creativity and Problem Solving for Data Science (whatever it may mean...) | An experimental spin-off from Nektra Advanced Computing
The Call of the Web Scraper

Astrid, our Data Big Bang and Nektra content editor, is heading to Nepal on a birding and trekking quest. She needs bird sounds from xeno-canto and The Internet Bird Collection to identify the hundreds of species found in Nepal, but neither site offers batch downloads. We could not pass up the opportunity to offer a useful scraper for birders....

Wed Nov 20, 2013 17:53
Web Scraping 101: Pulling Stories from Hacker News

This is a guest post by Hartley Brody, whose book “The Ultimate Guide to Web Scraping” goes into much more detail on web scraping best practices. You can follow him on Twitter; it’ll make his day! Thanks for contributing, Hartley! Hacker News is a treasure trove of information on the hacker zeitgeist. There are all sorts of cool things you could do with...

Thu Sep 12, 2013 12:43
Scraping Web Sites which Dynamically Load Data

Preface: More and more sites are implementing dynamic updates of their contents. New items are added as the user scrolls down. Twitter is one of these sites. Twitter only displays a certain number of news items initially, loading additional ones on demand. How can sites with this behavior be scraped? In the previous article we played with Google Chrome...

Tue Jul 30, 2013 09:41
Precise Scraping with Google Chrome

Developers often search the vast corpus of scraping tools for one that is capable of simulating a full browser. Their search is pointless. Full browsers with extension capabilities are great scraping tools. Among extensions, Google Chrome’s are by far the easiest to develop, while Mozilla has less restrictive APIs. Google offers a second way to control...

Sat Jul 6, 2013 21:46
Scraping for Semi-automatic Market Research

It is easy to scrape Microsoft TechNet Forums and normalize the resulting information to get a better idea of each thread’s rank based on views and initial publication date. Knowing how issues are ranked can help a company choose what to focus on. This code can be used to scrape any of Microsoft TechNet’s forums. In the example below we scraped the...

Sat Jun 29, 2013 18:56
Letters from the Future: Challenging Google’s Search Engine

A previous version of this article was posted on Duck Duck Go reddit, where the user _zekiel pointed out that DDG currently uses the two-level search we had proposed. Google is the undisputed search leader (88% market share in the US[1]). Google is not only ahead of competitors in terms of quality of search results, infrastructure worthy of science fiction,...

Sat Apr 27, 2013 19:52
