The amount of public data available on the internet increases every minute. Open data cities alone share enough information to create entire ecosystems for innovative products and services. Most of the time, what keeps companies from combining internal and public data is working out how to collect the latter. According to Smartproxy – a public data access platform – current barriers to valuable information include predatory localisation practices as well as technological challenges that are solvable but not widely known in the business community.

In the current climate, using public data in combination with internal customer tracking offers the biggest advantage for most businesses. If you are asking yourself what public data can tell you – or if there’s a single data gathering solution to give you a data-driven edge – this article by Smartproxy might help you better understand the data market as a whole.

An Insider’s View into Public Data Collection With Smartproxy. Photo: Smartproxy

What can public data tell you about a new market

Is there a market at all? Many businesses use public data to scout market saturation and movements: from press monitoring to SEO statistics, there are clear indicators of an open market. Sporadic social media mentions, press releases, or interviews show the first moves in a market. Constant quotes from competitors, on the other hand, reveal an entrenched position, which can change the marketing budget estimates for a new market.

On top of that, a silent public data space might indicate that a country or region is not ready to use a new product or service. For instance, using Smartproxy’s bestselling product – the residential proxy network – to gather data from a particular country's public sources might show a business that it has no competitors in the public eye. This can mean that the local startup environment is weak, that the legislation is prohibitive, or that the public is uninterested.

By combining public data with legal analysis and other basic market evaluation steps, businesses can get a better sense of what challenges they’ll face in a new market. Discovering new market opportunities is a powerful way to use public information, but not the only one.

Data intelligence: competitors, you, and the customer

In Smartproxy’s experience, public data becomes even more valuable after a company discovers a new niche. Most of the time, the next step involves gathering pricing data and other information for competitor analysis. This is a standard part of business development and marketing strategies. Nonetheless, companies looking into new markets face some obstacles to getting this data: localisation, dynamic pricing, and technical protection.

The first obstacle, localisation, is clear-cut: many products have varying prices for different locales, so a bathtub in India costs less than the same product in the US. Smartproxy’s customers overcome it by setting up residential proxies to gather localised information from different locales. Some use automated methods (if they need data from e-commerce sites with thousands of products), while others simply change their IP address with a Chrome proxy extension and browse competitor websites.
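The automated approach to localised data could be sketched roughly as follows, using only Python's standard library. The gateway host, port, credentials, and the convention of encoding the country in the proxy username are all assumptions made for illustration; a real proxy provider documents its own endpoints and authentication format.

```python
import urllib.request

# Hypothetical gateway details: placeholders, not a real provider endpoint.
PROXY_HOST = "proxy.example.com"
PROXY_PORT = 8000


def proxy_url(country: str, user: str = "USER", password: str = "PASS") -> str:
    """Build a proxy URL that (by assumption) selects the exit country via the username."""
    return f"http://{user}-country-{country.lower()}:{password}@{PROXY_HOST}:{PROXY_PORT}"


def opener_for(country: str) -> urllib.request.OpenerDirector:
    """Return an opener that routes HTTP and HTTPS traffic through the country's proxy."""
    url = proxy_url(country)
    handler = urllib.request.ProxyHandler({"http": url, "https": url})
    return urllib.request.build_opener(handler)


def fetch_localised(page_url: str, countries: list[str], timeout: float = 30.0) -> dict[str, str]:
    """Fetch the same page once per country and return the HTML keyed by country code."""
    pages = {}
    for country in countries:
        with opener_for(country).open(page_url, timeout=timeout) as response:
            pages[country] = response.read().decode("utf-8", errors="replace")
    return pages
```

Calling `fetch_localised("https://shop.example.com/bathtub", ["in", "us"])` would then return one copy of the page per locale, ready for price extraction. The shop URL is likewise a placeholder.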

Dynamic pricing is a harder nut to crack, as it adjusts the content to whoever is viewing it. Automation is currently the best solution for dynamic pricing intelligence. For instance, you can set up an automated browser that uses different device signatures and, if required, accounts. You can connect to a website as a 20-something female with a new iPhone or a 60-something on an outdated PC. These ‘profiles’ then browse categories and see dynamic recommendations that can give you a better idea of competitor pricing solutions.
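As a simplified illustration of such profiles, the sketch below varies only the HTTP headers a site sees. It is not a full automated browser: real device signatures cover far more than two headers. The profile names and user-agent strings are illustrative examples chosen here, not values taken from Smartproxy.

```python
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """A minimal visitor profile: only the headers a site sees on each request."""
    name: str
    user_agent: str
    accept_language: str


# Illustrative profiles; the user-agent strings are examples, not a maintained list.
PROFILES = [
    Profile(
        name="new-iphone",
        user_agent=(
            "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
            "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 "
            "Mobile/15E148 Safari/604.1"
        ),
        accept_language="en-US,en;q=0.9",
    ),
    Profile(
        name="outdated-pc",
        user_agent="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0",
        accept_language="en-GB,en;q=0.8",
    ),
]


def build_request(url: str, profile: Profile) -> urllib.request.Request:
    """Create a request that presents the given profile's headers."""
    return urllib.request.Request(
        url,
        headers={
            "User-Agent": profile.user_agent,
            "Accept-Language": profile.accept_language,
        },
    )
```

Sending the same category URL once per profile and comparing the returned prices is the simplest form of the comparison described above.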

At this point, public data collection starts getting into a legal gray zone. It involves simulated avatars using unique IP addresses to access competitor websites and gather data from social media websites and forums. GDPR, the CCPA, and dozens of other local data protection laws make ‘public’ data a topic of debate, which is unlikely to be settled anytime soon. In the meantime, websites are adopting technical solutions to keep every bit of data for themselves, even if all of their content is user-generated or explicitly public.

Solving technical obstacles with technical solutions

Technical blocks are the most common obstacle businesses face when gathering information. If they access a public database too many times or browse their competitor’s website too often, their IP address can get banned. Most of the time this happens automatically. Hopping on a VPN is not an option: VPN providers use data centers for their IP addresses, so these addresses are often blacklisted. (This is why you get all those CAPTCHAs when using a VPN.)
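Because many of these bans are triggered by request frequency, one common precaution is to space requests out and to back off when a site starts refusing them. The sketch below is one possible way to do that; the class and function names, intervals, and retry counts are assumptions for illustration, not recommendations from Smartproxy.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be exercised without real waiting
        self._sleep = sleep
        self._last = None      # time at which the previous request was allowed to go out

    def wait(self) -> float:
        """Sleep just long enough to respect the interval; return the delay applied."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


def backoff_delays(base: float = 1.0, factor: float = 2.0, retries: int = 4) -> list[float]:
    """Return exponentially growing pauses to use after refused requests."""
    return [base * factor**attempt for attempt in range(retries)]
```

A collector would call `throttle.wait()` before every request and step through `backoff_delays()` whenever a response signals rate limiting, for example an HTTP 429 status.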

Residential proxies help solve the initial issue of having a reliable IP address for data access, but dynamic pricing intelligence and technical block avoidance require a bit more technical ingenuity. You need to switch browser user agents and device signatures, randomize MAC addresses, and use dozens of other techniques that the average marketer might not know.

The market has no single cookie-cutter solution for getting public data from all sources globally. Open data cities usually have robust APIs, while other sources hide data behind dynamically loading website designs, dozens of CAPTCHAs, browser fingerprinting, user tracking, and cookie solutions. No single software solution can avoid every obstacle in this ever-evolving landscape.

According to Smartproxy, most companies still succeed by using residential proxies in combination with open-source web scraping software or paid scraping tools. But they often need at least one in-house software developer to adapt quickly to new obstacles. The technical know-how for circumventing these barriers is extremely valuable, which is why few specialists are keen to share their insights.

Open data benefits everyone, but most are staying silent

Technical knowledge in the digital space is less about programming languages and more about knowing how data is served, protected, and acquired. US laws allow companies to use publicly available data as they please. No one is preventing competitors from reviewing your prices. But knowing that there are certain technical limitations and clear ways to avoid them makes a business stand out in the data collection sphere.

This article is a brief introduction to the fascinating and complex world of global data access, and it is Smartproxy’s clearly stated position that sharing and giving access to data pushes our civilization forward. The company wants to leave you with a note that it is looking to change the current public discourse, in which not enough professionals are willing to talk about the world of public web data scraping. Smartproxy also invites you to talk about gathering public data in a live chat on its site at any time.