Computer vision is playing an increasingly pivotal role across industries, from tracking progress on construction sites to powering smart barcode scanning in warehouses. But training the underlying AI models to accurately identify images can be a slow, expensive, resource-intensive endeavor, and one that isn’t guaranteed to produce results every time. That is where fledgling German startup Hasty is setting out to help, with the promise of “next-gen” tools that expedite the entire process of annotating images and training models on them.
Hasty, which was founded out of Berlin in 2019, today announced that it has raised $3.7 million in a seed round of funding led by Shasta Ventures, a Silicon Valley VC firm with a number of notable exits to its name, including Nest (acquired by Google), Eero (acquired by Amazon), and Zuora (IPO). Other participants in the round include iRobot Ventures and Coparion.
The global computer vision market is pegged at $11.4 billion in 2020, a figure projected to rise to more than $19 billion by 2027. Data preparation and processing is one of the most time-consuming tasks in the AI sphere, constituting around 80% of the time spent on related projects. In computer vision specifically, annotation, or labeling, is the technique used to mark and categorize images, giving machines the meaning and context behind a picture and enabling them to spot other similar objects. Much of this annotation work falls to trusty old humans.
The ultimate problem that Hasty is looking to fix is that the vast majority of data science projects never make it into production, with significant resources expended in the process.
“Current approaches to data labeling are too slow,” Tristan Rouillard, Hasty cofounder and CEO, told VentureBeat. “Machine learning engineers often have to wait three to six months for first results to see if their annotation strategy and approach is working, because of the delay between labeling and model training.”
Hasty ships with 10 built-in automated AI assistants, each dedicated to cutting down on human spadework. For example, Dextr lets the user mark an object by clicking just four extreme points, from which the tool suggests an annotation.
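The extreme-point approach echoes research such as Deep Extreme Cut (DEXTR): the four clicks on an object’s leftmost, rightmost, topmost, and bottommost pixels already pin down a tight bounding box, which a segmentation model can then use as a prior. A minimal Python sketch of that first step (the function name is illustrative, not Hasty’s actual API):

```python
def bbox_from_extreme_points(points):
    """points: four (x, y) clicks on the leftmost, rightmost,
    topmost, and bottommost pixels of the object.
    Returns a tight bounding box (x_min, y_min, x_max, y_max)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

# Four extreme clicks on an object:
print(bbox_from_extreme_points([(10, 50), (200, 60), (90, 5), (110, 140)]))
# prints (10, 5, 200, 140)
```

A model like DEXTR then refines this box into a full segmentation mask, which is why four clicks can replace tracing an outline by hand.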
And Hasty’s AI “instance segmentation” assistant creates swifter annotations where there are multiple instances of an object within an image.
The assistants observe while the user annotates and begin making label suggestions once they reach a specific confidence score. The user can correct these suggestions to improve the model as they go, while receiving feedback on how effective their annotation strategy is.
“This gives the neural network a learning curve — it learns on the project as you label,” Rouillard said.
There are already countless tools out there designed to help simplify this process, including Amazon’s SageMaker, Google-backed Labelbox, V7, and Dataloop, the latter of which announced a fresh $11 million round of funding just last month.
Hasty, for its part, claims it can make the entire model-training and annotation process significantly faster through the way it combines automation, model-training, and annotation.
As with similar platforms, Hasty uses an interface where humans and machines collaborate. Hasty can make suggested annotations after being exposed to just a few human-annotated images, with the user (e.g., the machine learning engineer) accepting, rejecting, or editing each suggestion. That real-time feedback is then used to improve the models, which produces better suggestions and thus expedites model training the more the tool is used, a dynamic often referred to as “the data flywheel.”
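The suggest-review-update loop behind that flywheel can be sketched in a few lines of Python. This is a hedged illustration of the general pattern only; the model interface, the `review` callback, and the confidence threshold are all assumptions, not Hasty’s actual API:

```python
CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff for surfacing a suggestion

def annotate_with_assistant(images, model, review):
    """Human-in-the-loop labeling loop.

    model:  assumed to expose predict(image) -> (label, confidence)
            and update(image, label) for incremental feedback.
    review: human step; receives (image, suggestion or None) and
            returns the final label (accept, edit, or supply one).
    """
    labeled = []
    for image in images:
        label, confidence = model.predict(image)
        # Only surface the suggestion once the model is confident enough.
        suggestion = label if confidence >= CONFIDENCE_THRESHOLD else None
        final = review(image, suggestion)
        # The corrected label feeds straight back into the model,
        # so suggestions improve as the dataset grows.
        model.update(image, final)
        labeled.append((image, final))
    return labeled
```

The point of the pattern is that every human correction is training signal, so labeling and model improvement happen in the same pass rather than in separate phases.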
“Everyone is looking to build a self-improving data flywheel, the problem with (computer) vision AI is getting that flywheel to turn at all in the first place, [as] it’s super expensive and only works 50% of the time — this is where we come in,” Rouillard said.
In effect, Hasty’s neural networks learn while the engineers are building out their data sets, so that the “build,” “deploy,” and “evaluate” facets of the process all happen more or less concurrently, rather than in sequence. Indeed, a typical linear approach may take months to arrive at a first testable AI model, only to discover that it is deeply flawed due to errors in the data or “blind assumptions” made at the project’s inception. Hasty brings agility to the mix.
That in itself isn’t entirely novel, but digging into the weeds, Rouillard said that his company views automated labeling in a similar light to autonomous driving, insofar as different technologies operate at different “levels” — in the self-driving vehicle sphere, some cars can brake or change lanes, while others are pretty much capable of near full autonomy. Translated to annotation, Rouillard said that Hasty goes further than many of its rivals regarding automation, in terms of minimizing the number of clicks required to label an image or entire batches of images.
“Everyone preaches automation, but it is not obvious what is being automated,” Rouillard explained. “Almost all tools have good implementations of level 1 automation, but only a few of us take the trouble of providing level 2 and 3 in a way that produces meaningful results.”
Data is essentially the fuel for machine learning, and so getting more (accurate) data into an AI model at scale is key.
In addition to a manual error-finding tool, Hasty also offers an AI-powered error finder, which automatically identifies likely issues in a project’s training data. It’s a quality control feature designed to catch common annotation mistakes, circumventing the need to comb through the data for errors.
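One common heuristic for this kind of automated QA, sketched here as an assumption rather than a description of Hasty’s internals, is to flag annotations where a trained model confidently disagrees with the human label:

```python
def find_likely_errors(dataset, model, threshold=0.9):
    """Flag suspect annotations for human review.

    dataset: iterable of (image, human_label) pairs.
    model:   assumed to expose predict(image) -> (label, confidence).
    Returns (image, human_label, predicted_label) triples where the
    model confidently disagrees with the annotation.
    """
    suspects = []
    for image, human_label in dataset:
        predicted, confidence = model.predict(image)
        if predicted != human_label and confidence >= threshold:
            suspects.append((image, human_label, predicted))
    return suspects
```

The output is a short review queue, which is what lets annotators spend their time fixing errors rather than hunting for them.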
“This allows you to spend your time fixing errors instead of looking for them, and helps you to build confidence in your data quickly while you annotate,” Rouillard said.
Hasty claims around 4,000 users, constituting a fairly even mix of corporations, universities, startups, and app developers, spanning just about every industry. “We have 3 of the top 10 German companies in logistics, agriculture and retail using Hasty,” Rouillard added.
A typical use case in agriculture might involve an AgTech company training an AI model to identify crops, pests, or diseases, while in logistics it can be used to train machines to automatically sort parcels by type. Rouillard added that it’s also being used in the sports realm to provide real-time game analysis and stats for soccer coverage.
With $3.7 million in the bank, the company plans to accelerate product development and expand its customer base across Europe and North America.