General questions about Data Science
+ What is unstructured data?
There is no hard and fast definition of the boundaries between structured, semi-structured and unstructured data. Indeed, the same data might be considered structured when it conforms to a useful model and is well organized for one purpose, yet be considered unstructured or semi-structured when you want to use it for another.
However, in broad terms unstructured data refers to data which does not fit into a pre-defined data model such as a table with rows and columns. It is often not computationally searchable in a sophisticated way and is rarely standardized or organized according to defined rules.
Examples of unstructured data include documents such as company reports, news articles and legal contracts; online material such as webpages and social media content; and multimedia sources such as images, infographics, audio recordings and video.
+ What does Data Collection mean?
When humans produce information to describe the world and its events, they tend not to do so in a focused, consistent and structured way. Because of this, incredibly useful information gets lost in dense text documents like annual reports, news articles, legal documents or multimedia sources such as images and video.
Data Collection is the process of creating useful, structured datasets from these kinds of unstructured sources.
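As a rough illustration of what this means in practice, the sketch below turns two sentences of free text into structured rows. The sample text, the company names, the field names and the sentence pattern are all invented for the example; real sources are far less regular, which is why human judgement is so often needed.

```python
import re

# Hypothetical snippet of unstructured text, e.g. from a news article.
TEXT = (
    "Acme Corp reported revenue of $12.5 million in 2019. "
    "Globex Ltd reported revenue of $3.2 million in 2020."
)

# Assumed sentence pattern; real documents vary far more than this.
PATTERN = re.compile(
    r"(?P<company>[A-Z][\w ]+?) reported revenue of "
    r"\$(?P<revenue>[\d.]+) million in (?P<year>\d{4})"
)


def extract_records(text):
    """Return one structured row (a dict) per sentence that fits the pattern."""
    return [
        {
            "company": match.group("company"),
            "revenue_musd": float(match.group("revenue")),
            "year": int(match.group("year")),
        }
        for match in PATTERN.finditer(text)
    ]


records = extract_records(TEXT)
```

Each dictionary in `records` is one row of a structured dataset that can be filtered, sorted or joined with other data, which the original sentences could not.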
+ What does Data Labelling mean?
Data labelling is the manual curation of data by humans for use in machine learning and AI applications.
Supervised machine learning is a powerful tool for classification, forecasting and much more, but success relies heavily on having access to high-quality labelled data, both for training and evaluation in a model’s initial construction, and throughout the model’s lifecycle to improve and adapt it.
+ What does Data Wrangling mean?
Data wrangling is the process of enriching data by cleaning, mapping and unifying it to produce datasets that are easy to access and analyse.
Data scientists and business analysts generally spend a lot of their time wrangling data into a useable shape before they can learn or create value from it.
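To make the cleaning, mapping and unifying steps concrete, here is a minimal sketch in plain Python. The two sources, their column names and the country-code mapping are hypothetical; a real project would typically face many more inconsistencies than these.

```python
# Hypothetical raw rows from two sources with inconsistent conventions.
SOURCE_A = [
    {"Country": " uk ", "Sales": "1,200"},
    {"Country": "USA", "Sales": "950"},
]
SOURCE_B = [{"country": "United Kingdom", "sales": 300}]

# Assumed mapping from the spellings seen above to one canonical code.
COUNTRY_CODES = {"uk": "GB", "united kingdom": "GB", "usa": "US"}


def clean_row(row):
    """Clean and map a single row into the unified schema."""
    # Cleaning: make column names consistent across sources.
    row = {key.lower(): value for key, value in row.items()}
    # Mapping: translate free-text country names to canonical codes.
    country = COUNTRY_CODES[row["country"].strip().lower()]
    # Cleaning: parse numbers that may arrive as text with separators.
    sales = int(str(row["sales"]).replace(",", ""))
    return {"country": country, "sales": sales}


def wrangle(*sources):
    """Unify several sources into one list of clean rows."""
    return [clean_row(row) for source in sources for row in source]


dataset = wrangle(SOURCE_A, SOURCE_B)
```

After this step every row in `dataset` shares the same keys, codes and number types, so the figures from both sources can be analysed together.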
+ What does Data Elicitation mean?
Data Elicitation is the process of extracting information from non-recorded data such as people’s thoughts, opinions and expertise.
For instance, you’d use Data Elicitation techniques if you wanted to compare opinions on the emotive impact of content on your website.
Questions about the benefits of Hivemind
+ We already outsource our manual data processing, so how can Hivemind add value?
The idea of outsourcing data processing to add scale or save cost is of course nothing new; but the traditional outsourcing model is often poorly suited to the needs of a modern data-driven business. It can feel like an opaque and disconnected area of activity that is hard to integrate into your workflows and processes rather than a secondment of resource which can add real scale within those processes. Hivemind fundamentally changes the way you interact with a distributed human team.
Our intuitive, flexible and self-contained user interface increases the productivity of the contributor teams and the quality of the data they produce.
The work the team does is made auditable and transparent: you can see the provenance of each data point, tracing it from the original source material through the judgements made by contributors to the final dataset, and the speed and accuracy of the work is clearly shown per contributor per task.
And Hivemind’s robust API makes it easy to integrate the team’s efforts into your processes, helping you construct automated workflows with humans in the loop.
+ Our data is extremely sensitive. What assurances does Hivemind offer?
Hivemind originates from the secretive world of quantitative hedge funds, where data and IP security are business-critical concerns. Your data is encrypted end-to-end and nobody has access to it except you and any Contributors to whom you explicitly grant access to help solve your Tasks. You can choose to use Contributors exclusively from inside your organisation, or a trusted outsourcer operating under an NDA.
Individual Contributors see only one Microtask at a time, containing just the information they need to see. All activity on the platform is logged and accessible to you.
Our enterprise-grade security protocols are why Hivemind is trusted by regulated financial institutions like Winton Group, Barclays PLC, and Fidelity International.
+ Aren't repetitive, manual tasks exactly the sort of thing I should be automating?
There are of course plenty of repetitive and manual tasks that can be fully automated very effectively. However, there are many others which, despite all the great advances in machine learning and artificial intelligence, require the flexibility and sophistication of human intelligence if they are to be completed well. This is particularly the case where you are dealing with unstructured or semi-structured sources such as text documents, images, websites and so on; after all, this content is designed specifically for human rather than computational consumption.
At Hivemind, we believe that the optimal solution is often a combination of man and machine: a collective intelligence approach where computational techniques do the heavy lifting or get to an 80% solution while human intelligence can provide the sophistication required to solve the trickier tasks and get to a high-quality solution.
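One common way to combine the two is to let a model handle the cases it is confident about and pass the rest to people. The sketch below illustrates that general idea only; it is not Hivemind's API, and the function name, the input format and the 0.8 confidence threshold are assumptions made for the example.

```python
def route(predictions, threshold=0.8):
    """Split model output into auto-accepted results and a human review queue.

    `predictions` is an iterable of (item_id, label, confidence) tuples.
    The threshold is an illustrative value, not a recommendation.
    """
    accepted, needs_review = [], []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:
            # The machine does the heavy lifting on the easy cases.
            accepted.append((item_id, label))
        else:
            # Trickier cases are left for human judgement.
            needs_review.append(item_id)
    return accepted, needs_review


# Hypothetical model output for three documents.
sample = [
    ("doc-1", "invoice", 0.97),
    ("doc-2", "contract", 0.55),
    ("doc-3", "invoice", 0.80),
]
auto_accepted, review_queue = route(sample)
```

Here `doc-1` and `doc-3` are accepted automatically while `doc-2` goes to the review queue. Raising the threshold sends more work to people and lowering it sends less, so the setting is a trade-off between cost and quality.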