Along with the huge and growing demand for AI applications, there is a complementary thirst for the infrastructure and supporting software that makes AI applications possible. From data prep and training to deployment and beyond, a number of startups have stepped onto the scene to guide you through the burgeoning world of MLops. Here is a look at some of the most interesting startups that could make your AI initiatives more successful.
Weights & Biases
Weights & Biases is becoming a big presence in the machine learning space, especially among data scientists who want a well-designed, comprehensive experiment tracking service. First and foremost, W&B has out-of-the-box integration with almost all of the popular machine learning libraries (plus it’s pretty easy to add custom metrics).
Second, you can use W&B in as many ways as you want: as a turbocharged version of TensorBoard, as a way to manage and report on hyperparameter tuning, or as a collaboration hub where everyone on your data science team can see results or replicate experiments run by other team members. For the enterprise, W&B can even serve as a governance and provenance platform, providing an audit trail of which inputs, transformations, and experiments were used to build a model as it moves from development to production.
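To make the hyperparameter-tuning use case concrete, here is a minimal, self-contained sketch of the kind of random-search sweep a tool like W&B automates. Everything here is illustrative: the toy objective, the search space, and the run log stand in for a real training job and W&B's own sweep and logging APIs.

```python
import random

# Toy objective standing in for a model's validation score; in a real sweep
# this would be a full training run whose metrics the tracker records.
def validation_accuracy(lr, batch_size):
    return 1.0 - abs(lr - 0.01) * 10 - abs(batch_size - 64) / 1000

search_space = {
    "lr": [0.001, 0.01, 0.1],
    "batch_size": [32, 64, 128],
}

random.seed(0)
runs = []
for trial in range(5):
    # Sample one config per trial, the way a sweep agent would.
    config = {k: random.choice(v) for k, v in search_space.items()}
    score = validation_accuracy(config["lr"], config["batch_size"])
    runs.append({"config": config, "score": score})  # a tracker would log each run

best = max(runs, key=lambda r: r["score"])
print(best["config"])
```

The value a hosted tracker adds on top of this loop is exactly what the sketch lacks: persistent run history, dashboards, and team-wide reproducibility.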
Your data scientists probably already know W&B, and if they don’t use it within the company, they certainly want to. If OpenAI, GitHub, Salesforce, and Nvidia use W&B, why can’t you?
Seldon

Seldon is another company with an open core offering, layering additional enterprise functionality on top of an open source foundation. The open source component is Seldon Core, a cloud-native way to deploy models with advanced features such as arbitrary model chains for inference, canary deployments, A/B testing, and multi-armed bandits, with out-of-the-box support for frameworks like TensorFlow, scikit-learn, and XGBoost. Seldon also offers the open source Alibi library for inspecting and explaining machine learning models, containing a variety of methods for better understanding how model predictions are formed.
One cool feature of Seldon Core is that it’s incredibly flexible in how it fits into your tech stack. You can use Seldon Core on its own or insert it into a Kubeflow deployment. You can deploy models created through MLflow or use Nvidia’s Triton inference server, which lets you leverage Seldon in different ways for maximum gain.
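The multi-armed bandit routing mentioned above is worth unpacking: instead of a fixed traffic split, the router learns which model variant performs better and sends it more traffic. Below is a minimal epsilon-greedy sketch of the idea, not Seldon's implementation; the model names, success rates, and reward function are all made up for illustration.

```python
import random

# Two hypothetical model variants; reward() stands in for observed feedback
# (e.g. a click or a confirmed correct prediction) on a routed request.
true_success_rate = {"model-a": 0.60, "model-b": 0.75}

def reward(model):
    return 1 if random.random() < true_success_rate[model] else 0

counts = {m: 0 for m in true_success_rate}
values = {m: 0.0 for m in true_success_rate}  # running mean reward per model
epsilon = 0.1                                 # fraction of traffic spent exploring

random.seed(42)
for _ in range(2000):
    if random.random() < epsilon:             # explore: pick a random variant
        model = random.choice(list(true_success_rate))
    else:                                     # exploit: pick the best variant so far
        model = max(values, key=values.get)
    r = reward(model)
    counts[model] += 1
    values[model] += (r - values[model]) / counts[model]  # incremental mean

print(values)  # the estimates typically converge toward the true success rates
```

In production, the router does this continuously and online, which is why a bandit usually wastes less traffic on a losing variant than a fixed 50/50 A/B test would.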
For the enterprise, there’s Seldon Deploy, which provides a full suite of tools for model governance, including dashboards, audited workflows, and performance monitoring. This offering is aimed at data scientists and SREs, as well as managers and auditors. You won’t be entirely surprised to learn that Seldon’s focus on auditing and explainability has made this UK-based startup a hit with banks, with Barclays and Capital One among its customers.
While there are many competitors in the model deployment space, Seldon provides a full feature set and a very strong focus on Kubernetes deployment in its core offering, along with useful enterprise additions for customers that want a more complete solution.
Pinecone / Zilliz
Vector search is hot right now. With recent advances in machine learning on text, images, and audio, vector search can have a transformative effect on search itself. For example, a search for “Kleenex” may return a retailer’s selection of tissues without the need for custom synonym-replacement rules, because the language model used to generate a vector embedding will place the search query in the same region of vector space as the product listings. And the exact same process can be used to locate sounds or perform facial recognition.
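The "Kleenex" example comes down to nearest-neighbor search over embedding vectors. Here is a minimal sketch using cosine similarity over toy three-dimensional vectors; a real language model would produce embeddings with hundreds of dimensions, and the terms and vector values below are invented for illustration.

```python
import math

# Toy "embeddings": semantically related terms get geometrically close vectors.
embeddings = {
    "kleenex":   [0.90, 0.10, 0.00],
    "tissues":   [0.85, 0.15, 0.05],
    "lawnmower": [0.00, 0.20, 0.95],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query, k=2):
    # Brute-force nearest neighbors: score every stored vector against the query.
    q = embeddings[query]
    scores = {w: cosine(q, v) for w, v in embeddings.items() if w != query}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(search("kleenex"))  # "tissues" ranks ahead of "lawnmower"
```

Libraries like FAISS, NMSLib, and Annoy exist because this brute-force scan does not scale: they build approximate indexes so that the nearest neighbors of a query can be found without comparing against every stored vector.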
While mainstream search engine software is not yet optimized for vector search, work continues in Elasticsearch and Apache Lucene, and a host of open source libraries provide high-speed, high-scale vector search (for example, NMSLib, FAISS, and Annoy). In addition, several startups have emerged to lift some of the burden of setting up and maintaining vector search engines from your overworked operations department. Pinecone and Zilliz are two such startups bringing vector search to the enterprise.
Pinecone is a pure SaaS offering: you upload the embeddings produced by your machine learning models to its servers and send queries through its API. All aspects of hosting, including security, scaling, speed, and other operational concerns, are handled by the Pinecone team, which means you can be up and running with a similarity search engine in a few hours.
Although Zilliz will soon have a managed cloud solution in the form of Zilliz Cloud, the company is taking the open core approach with an open source library called Milvus. Milvus wraps commonly used libraries such as NMSLib and FAISS, providing a simple deployment of a vector search engine with an expressive, easy-to-use API that developers can use to build and maintain their own vector indexes.
Grid.ai

Grid.ai is the brainchild of the folks behind PyTorch Lightning, a popular high-level framework built on PyTorch that abstracts away much of the standard PyTorch boilerplate and makes it easy to train on one GPU or 1,000 GPUs with a few configuration switches. Grid.ai builds on the simplification PyTorch Lightning provides, allowing data scientists to train their models on transient GPU resources as seamlessly as running code locally.
Do you want to run a hyperparameter scan on 200 GPUs at once? Grid.ai lets you do this, handling all the provisioning (and decommissioning) of infrastructure resources in the background, ensuring that your datasets are optimized for large-scale use, and providing metrics reports, all bundled with an easy-to-use web user interface. You can also use Grid.ai to spin up instances for interactive development, either at the console or attached to a Jupyter notebook.
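The core pattern here, fanning independent trials out across a pool of workers and collecting the results, can be sketched with Python's standard library. Threads stand in for the GPU machines Grid.ai would provision and tear down, and the toy training function is invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Each "trial" stands in for one GPU job; a platform like Grid.ai would
# provision a real machine, run the training, and decommission it afterwards.
def run_trial(lr):
    return {"lr": lr, "loss": (lr - 0.01) ** 2}  # toy training result

learning_rates = [0.001, 0.005, 0.01, 0.05, 0.1]

# Five workers play the role of five provisioned GPU instances.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(run_trial, learning_rates))

best = min(results, key=lambda r: r["loss"])
print(best)  # {'lr': 0.01, 'loss': 0.0}
```

What a managed platform adds is everything outside the `pool.map` call: acquiring the machines, shipping code and data to them, and tearing them down so you stop paying the moment the scan finishes.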
Grid.ai’s efforts to simplify large-scale model training will be useful for companies that regularly need to launch training runs occupying 100 or more GPUs at a time, but it remains to be seen how many such customers there are. Still, if you need a streamlined training pipeline for your data scientists that minimizes cloud costs, you should definitely take a close look at Grid.ai.
DataRobot

DataRobot would love to own your business’s entire AI lifecycle, from data readiness to production deployment, and the company makes a good case that it can. DataRobot’s data preparation pipeline has all the web UI features you would expect, making it a snap to enrich data before it is fed into a model.
DataRobot has an automated machine learning system that will train a series of models against your targets, allowing you to select the best-performing generated model or one of your own uploaded to the platform. When it comes to deployment, the platform’s built-in MLops module tracks everything from availability to data drift over time, so you can always see your models’ performance at a glance. There is also a feature called Humble AI that lets you put additional guardrails on your models in case low-probability events occur at prediction time, and of course these too can be tracked through the MLops module.
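Data drift monitoring of the kind described above usually compares the distribution of a feature in production against what the model saw at training time. Here is a minimal sketch using a hand-rolled two-sample Kolmogorov-Smirnov statistic; the threshold and sample data are invented for illustration and do not reflect how any particular platform computes drift.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap between
    the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))

    def ecdf(sorted_sample, x):
        return sum(1 for v in sorted_sample if v <= x) / len(sorted_sample)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)

training   = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # feature values at training time
production = [0.6, 0.7, 0.8, 0.9, 1.0, 1.1]  # same feature observed in production

drift = ks_statistic(training, production)
print(drift > 0.5)  # large distribution gap: flag this feature as drifting
```

A monitoring module runs checks like this continuously per feature, which is how it can surface a degrading model before the drop shows up in business metrics.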
Departing slightly from most of the other startups on this list, DataRobot will deploy to bare metal in your own data centers and Hadoop clusters, as well as offering private and managed cloud options, showing that it is committed to competing across the entire enterprise AI platform landscape, serving customers from fast-growing startups to established Fortune 500 enterprises.
MLops is one of the hottest areas in AI right now, and the need for accelerators, platforms, management, and monitoring will only increase as more businesses enter the AI space. If you’re joining the AI gold rush, these five startups can supply your picks and shovels!
Copyright © 2021 IDG Communications, Inc.