Following recent announcements from Winton Group’s data business Hivemind, HFMTechnology takes a virtual tour inside the colony

BY CARLY MINSKY - February 2018

Last month, one of the world’s biggest hedge funds turned its quantitative resources inward. In a departure from the type of research and analytical projects commonly undertaken at data-driven firms, scientists at David Harding’s Winton Group made the hedge fund sector the subject of inquiry. But in answering the question ‘How big is the hedge fund industry?’, the resulting report was more than just a navel-gazing exercise for curiosity’s sake. Accepting that “all asset management firms sit on a spectrum, whether in terms of fees charged, strategies employed, or types of vehicles offered to investors” could produce significant shifts in allocation activity and hedge fund evaluation.

Beyond the headline result that firms meeting the typical hedge fund criteria manage far less than the assumed $3trn ($850bn, according to the report), Winton’s researchers conclude in no uncertain terms that “no distinct ‘hedge fund industry’ exists”.

“This matters on at least two counts,” the authors explain. “First, pension schemes and other big institutional investors usually have a discrete asset allocation bucket for hedge funds. Given the definitional mess that surrounds the relevant firms, it seems probable that sub-optimal investment decisions may be being taken as a result.

“A second issue concerns how asset management firm performance is assessed. There are many different hedge fund indices against which firms are benchmarked. Many of these indices, however, face similar definitional problems and can as a result be poor yardsticks for the firms that use them as a reference.”

Jonathan Levy, head of product research at Winton, adds: “This misconception is potentially dangerous… This approach of treating hedge funds as an asset class feels increasingly illogical.”


Into the hive

This project – unusual as it may be in its self-referential subject matter – is one of many initiatives running at Winton’s data division Hivemind, which now runs as a separate business offering services externally, although it is still wholly owned by Harding’s firm. Just 12 months after spinning out, the data science and technology company is used as a resource in three quarters of Winton’s research efforts, according to Daniel Mitchell, director for research data at Winton Group, speaking exclusively to HFMTechnology.

Many key innovations at Hivemind are not, as might be assumed, within the field of data analysis, but rather in building proprietary data sets from primary sources. In fact, Winton representatives say that the easiest way to understand the essence of Hivemind’s activities is to understand its genesis around three years ago.

Among the hypotheses scientists at Winton wanted to test back then was whether companies that make a lot of acquisitions underperform. The aim was to analyse “generally acquisitive” companies, and test a proposed definition of “recklessly acquisitive”. But the best dataset fit for the purpose only dated back to the 1990s, and so didn’t provide enough data for meaningful and reliable testing. So a team set to work scouring older newspaper headlines and stories, which created a new need for intelligent technology to systematically collect and organise the data.

By using off-the-shelf artificial intelligence tools, and building proprietary software, the group established what it called “an automated pipeline” of relevant, comprehensive data. Three years on and this approach remains fundamental to all of Hivemind’s work.

“Hivemind has produced 20 proprietary datasets over the past 12 months,” added Henrik Grunditz, head of business development at Hivemind.

A recent project to build a corporate governance dataset covering US companies is a prime example of Hivemind’s use of natural language processing (NLP) techniques to extract relevant but unstructured information. The rich dataset includes information on company boards: members’ age, gender and holdings, how many boards they currently sit on, their past appointments, and other social profiling, such as relevant family relationships.

In this case, NLP was applied to SEC filings and other publicly available information, doing the work that otherwise would require human readers. This level of intelligent automation allows Hivemind to focus its human intelligence on subsets of information sources which are particularly important, particularly difficult to extract information from or where the extracted information was flagged as unusual by data quality metrics.
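The triage described above can be illustrated with a minimal Python sketch. It is not Hivemind’s implementation: the `Extraction` fields, the `review_reasons` function and both thresholds are hypothetical, chosen only to show how the three criteria (important source, difficult extraction, unusual value) might route an item to a human reader.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Extraction:
    source_id: str
    value: float        # number pulled from the document
    confidence: float   # hypothetical extractor confidence, 0 to 1
    important: bool     # hypothetical flag for high-priority sources


def review_reasons(item, peer_values, min_confidence=0.8, max_z=3.0):
    """Return the reasons (possibly none) to send an item to a human reader.

    The thresholds are illustrative assumptions, not values from the article.
    """
    reasons = []
    if item.important:
        reasons.append("important source")
    if item.confidence < min_confidence:
        reasons.append("hard to extract")
    # Simple data quality metric: distance from comparable values,
    # measured in standard deviations.
    spread = pstdev(peer_values)
    if spread > 0 and abs(item.value - mean(peer_values)) / spread > max_z:
        reasons.append("unusual value")
    return reasons
```

An item that returns an empty list would stay in the automated pipeline; anything else would join the queue for human review.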

The results – which also used commercially available datasets from prominent data vendors – produced a number of new systems Winton is now trading, including both event-based signals and corporate governance signals. Ultimately, the work at Hivemind is driven by the understanding that the value of any information moves through a lifecycle, finally becoming redundant when that information is ubiquitous. The “edge” Hivemind produces for the firm is not in any particular dataset, but rather in the continued ability to source, create and enhance new, quality datasets using proprietary technology. In Winton’s 2016 annual report, released in October 2017, David Harding credited the firm’s success to its “technological bias”, and named other “tech-savvy” peers dominating the sector, including Renaissance, Two Sigma, Citadel and AQR.

“The global architecture of secondary market trading has been reconfigured as a tech business, dominated by these tech-savvy firms,” he wrote.


Quality data

While the firm is enthusiastic and proactive in its approach to cutting-edge artificial intelligence and machine learning tools, the key focus for Hivemind is the quality of training data. As it stands, its data scientists find that without large, well-labelled training data sets, any machine learning techniques they might employ are likely to be unreliable or ineffective. Looking forward, the business sees significant potential to provide training data services for external clients, by turning unstructured datasets into accurately categorised, machine-learning training data.

Even so, artificial intelligence and machine learning techniques only go so far. Hivemind’s researchers use machine learning for entity extraction through named entity recognition, for example to identify relevant nuances in news articles, but they say the technique still pales in comparison with the human ability to determine context easily.

This is why the data division also leverages human abilities on a huge scale through crowdsourcing initiatives on Amazon Mechanical Turk – a web-based marketplace through which companies can “programmatically access” a diverse, on-demand workforce for tasks that require human intelligence.

Many of Hivemind’s projects include a vast number of “trivial” tasks at an early stage, like labelling images. In these cases, crowdsourcing can get a substantial amount of the work, if not all of it, done in a few days, often less. While the company is perfectly comfortable harnessing a distributed workforce for tasks like collecting single well-defined numbers from unstructured documents, and basic data quality checks, anything that requires more complex human responses is put to Hivemind’s in-house data processing analysts.
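The article does not say how answers from several crowd workers are combined. One common approach for a task like collecting a single well-defined number is a majority vote with an agreement threshold, sketched below in Python. The function name `aggregate_answers` and the 60% threshold are assumptions for illustration only.

```python
from collections import Counter


def aggregate_answers(answers, min_agreement=0.6):
    """Combine several workers' answers to one question.

    Returns (value, accepted). `accepted` is True when the most common
    answer reaches the agreement threshold; otherwise the item would be
    escalated, for example to in-house analysts.
    """
    if not answers:
        return None, False
    value, votes = Counter(answers).most_common(1)[0]
    return value, votes / len(answers) >= min_agreement
```

With three workers, two matching answers clear a 60% threshold, while three different answers do not and the item is passed on for closer review.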

Hivemind’s enthusiasm for collaboration has also led to its involvement in a public data processing project which has few, if any, direct benefits to any of Winton’s strategies. Operation Weather Rescue asks the British public to contribute to the arduous task of digitising handwritten weather logs recorded at the summit of the UK’s highest mountain, Ben Nevis, between 1883 and 1904. Not only is the process of copying information from paper documents into a digital database a perfect candidate for Hivemind’s AI-driven tools, but the project also aligns with Winton’s broader interest in climate change and predicting the market impact.

Winton has the capacity to take over Operation Weather Rescue entirely; key figures at the firm were prepared to do so on a pro bono basis, a spokesperson told HFMTechnology. But since an essential part of the project’s mandate is to encourage public engagement with weather data, the meteorologists preferred not to outsource the operation to Winton alone. As it stands, Winton has instead offered to manage the data quality assurance process to validate public contributions.

Over the last year, the applications of Hivemind’s technology have been wide-ranging; a significant minority of these applications were requested by external clients, who were first invited to discuss potential use-cases with Hivemind’s scientists half a year ago.

Hivemind’s business development head Henrik Grunditz mentions that his team is “working with a cyberintelligence company that is helping banks with threat detection, including KYC”. Overall, Hivemind’s staff number around 65 people, forming a pyramid with a majority of data processing analysts, a significant technology team, a small number of data scientists and senior data scientists, and Grunditz leading the business strategy.


The future

On the development side, it’s clear that Hivemind has only just scratched the surface of what new opportunities it could uncover. One final project revealed to HFMTechnology worked to create a comprehensive history of ships traversing the Suez Canal by processing French documents stored in an Egyptian archive. At least in terms of geography and language, Hivemind’s team have not yet reached their limit.

Winton’s data business stands out not just for its innovation, but also for its unusual willingness to share details of its proprietary work in the public domain. Naturally, the team are more forthcoming with details about projects for external clients than with details of Winton’s internal initiatives, but in general the enthusiasm for new techniques and applications in data science trumps the need for secrecy.

