Covariant.ai and applying deep learning to robotics (indexventures.com)
86 points by wojtczyk on May 6, 2020 | hide | past | favorite | 30 comments


Here are two example sentences that catch my attention:

1. "Most robots these days make use of some form of Deep Learning." This is not obvious. What is the basis for it?

2. "Robots themselves have been around forever, but, with a few exceptions, have been disappointments." In historical context, this is hardly true. Look at assembly lines, at automation, just to start.


The author of the piece seems to conflate industrial automation with academic and research robotics. Most high-end research/service robots have some form of Deep Learning baked into them, the vast majority of it Computer Vision-related object/feature detection & classification.

The purpose of the CV part might vary, but for mobile robots (delivery, drones, etc.) it is usually to enhance localization & mapping (SLAM) by detecting features in the environment. For static robots (arms, P&P) it is usually object detection for picking, and segmentation + classification for placing/stacking correctly.
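As a toy illustration of the feature-detection-for-SLAM idea above: detect repeatable landmarks, match them between frames, and use the matches to constrain the robot's pose estimate. The descriptors here are made-up 3-vectors; real front-ends use ORB, SIFT, or learned features.

```python
# Nearest-neighbour descriptor matching, the core of a visual-SLAM
# front-end: match landmarks between two frames so their apparent
# motion can constrain the pose estimate. Descriptors are toy 3-vectors.

def match(desc_a, desc_b):
    """For each descriptor in frame A, index of its nearest in frame B."""
    def sqdist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))
    return [min(range(len(desc_b)), key=lambda j: sqdist(a, desc_b[j]))
            for a in desc_a]

frame_a = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.5)]
frame_b = [(1.0, 0.1, 0.4), (0.1, 0.9, 0.0)]  # same landmarks, reordered
print(match(frame_a, frame_b))                # → [1, 0]
```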

Covariant.ai, Osaro and other CV-in-robotics startups are simply trying to sell this as a module for existing arms, to make them slightly more flexible.


Venturing a guess, a possible way for #1 to hold true (in some sense) might be the maturity of camera-based object detection and classification, which even the most risk-averse industrial robot builders might have felt justified in adopting in their latest offerings.


Maybe if they have a narrow definition of robot that requires it to have computer vision, as opposed to just a machine that repeats a fixed task exactly.


Most industrial robots are relatively simple machines that repeat a task described in detail. The only inputs may be waiting until a new part comes in, or moving down the stack of parts until the gripper runs into something to grip.
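That control logic can be sketched in a few lines; all names and sensors below are hypothetical stand-ins, simulated with plain functions.

```python
# Minimal sketch of the "dumb" pick cycle described above: no vision,
# no learning, just a fixed program with two binary sensor inputs.

def run_cell(part_present, gripper_contact, cycles=100):
    """Repeat the taught motion; the only inputs are two sensors."""
    picks = 0
    for _ in range(cycles):
        if not part_present():          # input 1: wait for a new part
            continue
        z = 0
        while not gripper_contact(z):   # input 2: descend until contact
            z += 1
        picks += 1                      # then grip/lift/place: fixed path
    return picks

# Simulated sensors: a part arrives every other cycle; stack top at z=5.
parts = iter([i % 2 == 0 for i in range(100)])
print(run_cell(lambda: next(parts), lambda z: z >= 5))   # → 50
```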


When I read the post, I interpret it as more of a PR piece than a neutral analysis.


Just for some context from someone who is involved in robotics: both Google X and Samsung Research have research teams working on robotic arms. I would expect to see a lot more of these companies in the coming years, weaving a narrative of RL (currently getting hyped a lot in academia, again) and factory automation.

Manipulation is another task that appears deceptively simple but is actually very complex for machines, similar to autonomous driving. Personally, I don't believe any solution involving manipulation with fingers can be viable. Thankfully, their approach appears to use a simple gripper. Most of their publications are around general RL (https://covariant.ai/our-approach). And, again similar to AVs, the sim-to-real gap is pretty big here too.

One good thing is that warehouses are a more constrained environment and can be further structured around specific robots. Amazon has internal robotics teams and has deployed robotic arms in limited settings. It works there because the entire warehouse is structured around robots; that's what it takes.


Just curious, do you think OpenAI's approach (https://openai.com/blog/solving-rubiks-cube/) will work? They were able to use fingers in a non-trivial setting. It has a long way to go before it can be deployed in any useful capacity, but to me it challenges the idea that robot fingers won't ever be viable.


The variation amongst Rubik's cubes is far smaller than the variation amongst arbitrary objects. I would put that in the same category as dialogue systems generating responses using RL (I've actually worked on this), which incidentally is also at the top of the page today. These things are mostly dog-and-pony shows. RL's application in practice is limited; even for Covariant.ai, I'm not convinced that most of the papers they have posted aren't just marketing. Academics are quite good at playing this game; see this paper on their site [1]. It has nothing to do w/ grippers.

In practice, once you touch hardware, all that policy-gradient goodness goes out the window; hardware considerations, domain specification, etc. will dominate how good your solution is. "Domain spec and design" means you need to work closely w/ your customers and have a say in how their warehouse is designed. Amazon doesn't run into this issue because everything is done in-house. But if you try to deploy the same system at Walmart w/o strong institutional support, the system will fail.

Thus companies such as this are a pump-and-flip play. There's a direct comp in Amazon's internal division; everyone is trying to copy Amazon's supply-chain efficiency these days, so the best outcome is a strategic investment from Walmart and the like, followed by an acquisition.

You saw similar companies come out in the nascent days of the deep learning hype; Socher from Stanford comes to mind. His company MetaMind was sold to Salesforce for some xx-million amount; all the engineers got a fine payday, but they didn't really release a product. They certainly published some nice papers along the way, though.

[1] https://openreview.net/attachment?id=ByeWogStDS&name=origina...


They posted their double-blind submission on arxiv? Seems very shady. Is this common in AI conferences?

https://arxiv.org/abs/1906.05862


Yeah, in CS/ML it's pretty common. Many papers are on arxiv before submission, with changes in between.


I think in some fields conference editors may grumble about this, but from what I've seen in CS they don't mind at all. They just ask the authors not to cite the preprint in a way that gives it away, and ask the reviewers not to try to figure out who the authors are (as always).


I think https://www.alexirpan.com/2019/10/29/openai-rubiks.html is a pretty good take. tldr: unfortunately, even tiny discrepancies between the simulation and reality (like the size of the cube's bezels or a few extra parts in the robotic hand) can significantly degrade your model's performance in the real world, even with tons of domain randomization during training. So, either we need to improve our simulations to randomize a lot more details, or we need to improve our neural nets to generalize more effectively.
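A minimal sketch of what "randomizing a lot more details" looks like in code: every training episode runs in a simulator whose parameters are perturbed at random, in the hope that reality falls inside the randomized distribution. Parameter names and ranges below are illustrative (57 mm is the standard Rubik's cube size; the rest is made up).

```python
# Toy sketch of domain randomization for sim-to-real transfer.
# Each episode gets a differently perturbed "world"; a policy trained
# across all of them should be less sensitive to any one discrepancy.
import random

def sample_sim_params(rng):
    return {
        "cube_size_mm": rng.uniform(55.0, 60.0),  # nominal 57 mm
        "friction":     rng.uniform(0.7, 1.3),
        "latency_ms":   rng.uniform(0.0, 40.0),
    }

rng = random.Random(0)
for _ in range(3):
    print(sample_sim_params(rng))  # a new perturbed sim per episode
```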


It's worth noting that warehouses are indeed nice for constrained tasks, but they themselves are quite unconstrained environments, as they are constantly changing shape. In the context of mobile robots, mapping and localization without any kind of concrete features to track is a hard problem.

Just some food for thought, I suppose.


Another impressive startup in this area is nomagic.ai. From what I know, they are more advanced than Covariant, have been in production for more than a year, and recently raised a decent Seed round.

Good luck to both teams!


Osaro and Dexterity have also both raised Series Bs to work on the same problem.


Meta/off-topic: I think that for many people the use of all these cool-sounding "mathy" names (covariant, differential, tensor (flow), etc.) may in fact be more irritating and confusing than justified to any meaningful degree.



I like mathy company names. In fact, as a math person from long before it was cool, I love that others like them too now. I still own a few mathy domain names that friends warned me against using years ago when I was deciding on startup names. Maybe their time has come.

What does irk me a bit is when programmers abuse math words. Like implementing correlation and calling it convolution. Or using Tensor to describe a nested array of numbers. Sort of like nerdrage over an inauthentic movie portrayal of a comic.

But TensorFlow is otherwise just kinda descriptive: an iteration of the "dataflow graph" idea that operates on "tensors".


What’s wrong with using mathy names to describe mathy products?


It's annoying when the products aren't what the names describe. It would be like naming a brand of facial tissues "Motor Vehicle" or a brand of computers "Apple." ;)


From my research, maybe the most interesting characteristic is that all these companies seem dependent on the suction gripper. I haven't worked in this field, but you'd think it'd be easy to get other grippers to work well, especially since Covariant is combining simulated training with non-simulated training.


I think that the community's experience has led it to the belief that the suction gripper is simply the most effective gripper currently available. It's possible to get other grippers working in theory, but if you want reliability, suction on constrained packages is the way to go.


As someone in robotics (but not on the hardware side), my humble opinion is that it is the easiest (software-wise), most durable, and most robust one (and probably the cheapest) available.

With two-finger grippers, you have to consider how wide they can open (too narrow and usability suffers; too wide and the mechanical complexity increases), how many joints they have (too many make them heavy, less durable, and expensive), and the torque/force you can put on the fingers (too high and it crushes the object; too low and it slips out; also, the more force you want, the more expensive the gripper becomes).

On top of that, on the software side: if you use a suction pad you only need to estimate _one point_ on the object to be picked, and you can ignore the "grasping" problem altogether. For other grippers you have to estimate several points (two for a simple gripper), and this increases the complexity of both detection (labeling of data, sensitivity to the object's center of mass, etc.) and grasping.
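The software asymmetry above can be shown with a toy example: suction needs one target point (here, crudely, the centroid of the object's segmentation mask), while a two-finger gripper needs a pair of opposing contact points. Everything below is a simplified, hypothetical sketch.

```python
# One grasp point for suction vs. a contact pair for a two-finger
# gripper, estimated from a binary segmentation mask (1 = object pixel).

mask = [[1 if 3 <= y < 7 and 2 <= x < 8 else 0 for x in range(10)]
        for y in range(10)]
pixels = [(y, x) for y in range(10) for x in range(10) if mask[y][x]]

# Suction: a single point estimate suffices (mask centroid).
suction_point = (sum(y for y, _ in pixels) / len(pixels),
                 sum(x for _, x in pixels) / len(pixels))

# Two-finger gripper: crudely, two opposing extremes along one axis.
# Real grasp planners also check approach angle, width, collisions...
row = suction_point[0]
contact_pair = ((row, min(x for _, x in pixels)),
                (row, max(x for _, x in pixels)))

print(suction_point)   # → (4.5, 4.5)
print(contact_pair)    # → ((4.5, 2), (4.5, 7))
```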


The advantage of suction grippers is that they are naturally compliant, and thus forgiving of positioning errors.

There are some cool-looking combinations as well: Righthand Robotics makes grippers that retract a suction cup into a set of fingers. It looks a bit like a Xenomorph.


I was under the impression that reinforcement learning already tackled the "picking" problem sufficiently well.


I believe your impression is wrong. Amazon shut down its picking challenge in 2018 because it was clear to them the technology wasn't ready to replace humans in their distribution centers yet.

I’d be happy to be proven wrong though if you have examples.


We are pretty far away. Perception-based manipulation is a very active area of research. It will help a lot of underfed grad students get their PhD's over the next decade.


Maybe on a research level in still relatively controlled settings.

It's only recently that you could use this with a high success rate in "the wild".


> the technology was shockingly advanced ... we were blown away

This is a very impressive step on the road, but this kind of hyperbole always sets off my Segway early-warning system.



