This article doesn’t address when cameras are blocked, which is the obvious issue with camera-only self driving. Teslas have crashed when cameras were blinded by the sun. Now throw in snow, rain, dust… Is that solvable with lots of cameras and different types? …Does it need to be solved?
Maybe the bigger question - anyone know the status of low cost lidar? Dozens of startups and larger companies were working on it 10 years ago, yet Lidar still costs “thousands” according to the article
There's something bizarre going on with lidar manufacturing/pricing. When I was looking into it for a specific application, it was much cheaper to buy entire made-in-China products containing the exact lidar module than to buy the part separately, even at bulk pricing.
This is very common with all kinds of components. There are economies of scale your vendor can achieve when they sell someone a million of the same thing. Also, the company buying a million of the same thing is going to pay the vendor a significant sum, even if they get all kinds of discounts, and that puts them in a much better negotiating position than you buying a single one.
A hobbyist buying a few units of a component, even at a significant margin, will net the producer peanuts. So it’s not surprising they don’t worry much about serving that market.
Yes, I'm aware of that. That's why I added the bit about bulk pricing.
In my case I was looking at buying quite a number of units, outside of a hobbyist application. In fact, I would say it was a higher number than the cheaper China-made products could possibly sell (different market sizes).
It seemed to me that they didn't want to sell for any price really but would make an exception if they could really, really rip me off.
As others have pointed out, "lidar" by itself doesn't denote a level of capability.
1D lidars with an indoor range of 8 meters are quite cheap, under $15 in volume.
2D lidars, i.e. ones measuring depth in a single plane, are generally a lot more costly. Not only that, they are bigger and eat more power. Again, indoor only.
3D lidars are more expensive still, and if you want them to work outdoors, even more so.
What does a human driver do when their vision is obstructed?
1. Attempt to use the vehicle's built-in windscreen wiper to remove the obstruction.
2. Failing that, stop the car. Preferably before the vision gets so badly obstructed that the car cannot safely be brought to a stop. But stop the car even so.
3. Get out and clear the obstruction. Admittedly the AI will have trouble with this, but it is vanishingly rare anyway, and if the car is carrying passengers, this task can be given to the passengers.
When the human continues driving in inclement weather, runs into the back of a van full of kids, and kills all of them, we put them in jail for making a bad judgement call.
How do we handle the AI mowing over a pedestrian when it makes a bad judgement call? Right now, the status quo is that we do jack and shit, and I can't help but feel like that's not a good plan.
It’s an interesting conundrum, but in a full AI world the hope is that it’s so rare we don’t feel the need to be punitive at all and can chalk it up to bad luck and try to learn from it. Perhaps more similar to when an airplane crashes and people die.
People and ADAS have their own, different, and critical weaknesses. Neither is a panacea. (Which is why mass transit investment should be prioritized over scifi fantasy ADAS.)
> Teslas have crashed when cameras were blinded by the sun. Now throw in snow, rain, dust…
I used to think the more sensors the better, but after listening to George Hotz talk about it I can see the logic of focusing on ambient spectrum in visual and near range. Of course, he will talk up his approach as best, but here it is as best as I recall:
1. more sensors ~= more signal
2. more sensors means
a. longer processing pipeline for fusing data streams (timing, registration; see the toy sketch after this list)
b. more software, thus more surface area for defects
c. decisions about response when 1 sensor modality fails
3. the visual-range spectrum
a. is well adapted to the environment
b. has inexpensive and high-quality sensors
c. is sufficient for humans, so should be sufficient to get to human-like driving by a computer
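To make point 2a concrete: even the simplest fusion step, lining up two sensor streams that run at different rates on different clocks, adds a real chunk of pipeline before you can combine anything. A toy sketch in Python (the sensor rates, names, and skew tolerance are made up for illustration, not anyone's actual stack):

```python
import bisect

# Hypothetical example: pair lidar sweeps (10 Hz) with the nearest camera
# frame (30 Hz) by timestamp. Real pipelines also need spatial registration
# (extrinsic calibration) and interpolation, which is where complexity grows.

def nearest_match(camera_ts, lidar_ts, max_skew=0.05):
    """For each lidar timestamp, find the closest camera frame within max_skew seconds."""
    pairs = []
    for t in lidar_ts:
        i = bisect.bisect_left(camera_ts, t)
        neighbors = camera_ts[max(0, i - 1):i + 1]   # at most two candidates
        if not neighbors:
            continue
        best = min(neighbors, key=lambda c: abs(c - t))
        if abs(best - t) <= max_skew:
            pairs.append((t, best))
    return pairs

camera_ts = [k / 30.0 for k in range(90)]   # 3 s of 30 Hz frames
lidar_ts = [k / 10.0 for k in range(30)]    # 3 s of 10 Hz sweeps
print(nearest_match(camera_ts, lidar_ts)[:3])
```

And that's before deciding what the planner should do when one of the two streams disagrees with, or drops out from under, the other.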
The answer to blocked cameras is:
1. have protocols to slow down and stop gracefully
2. maintain enough of a spatial model of the vehicle's surroundings to perform the above (Simultaneous Localization and Mapping, SLAM)
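As a toy illustration of those two points together: when camera confidence drops, keep planning against the last good local map for a short horizon while decelerating, and hold position once that map goes stale. This is a made-up sketch with invented names and thresholds, not any vendor's logic:

```python
# Hypothetical sketch: degrade gracefully when the camera stream is blocked.
# `local_map_age_s` stands in for how stale the short-horizon spatial model
# (e.g. from SLAM) is; the thresholds are invented for illustration.

MAP_HORIZON_S = 2.0       # how long we trust a stale local map
CONFIDENCE_FLOOR = 0.3    # below this, treat the cameras as blocked

def control_step(camera_confidence, local_map_age_s, speed_mps):
    if camera_confidence >= CONFIDENCE_FLOOR:
        return "drive_normally"
    if local_map_age_s < MAP_HORIZON_S and speed_mps > 0:
        # Cameras blocked but the recent spatial model is still fresh:
        # decelerate along the last known-clear path.
        return "controlled_stop_on_map"
    # Map too stale (or already stopped): hold still and alert.
    return "hold_and_alert"

print(control_step(camera_confidence=0.1, local_map_age_s=0.5, speed_mps=12.0))
# -> controlled_stop_on_map
```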
Our eyeballs are not cameras; the way they operate gives us far more depth info than just two arrays of pixels you can derive parallax from. All the claims that "humans only use their eyes" fundamentally ignore all the other parts we use, up to and including an intrinsic simulation of physics in our brain.
Yes, sure. Cameras and biological light sensing have different tradeoffs. My layperson's understanding is that the eye-brain neural pathway's bandwidth is not theoretically sufficient for what we perceive, and so our brain is effectively running an ongoing simulation of the future a few milliseconds ahead of now and correcting it based on sensory input.
The book "An Immense World: How Animal Senses Reveal the Hidden Realms Around Us" by Ed Yong [0] is really great for understanding how sensory input informs but isn't the same as a mental model of the world built into the operations of a living thing.
Likewise, ADAS and similar systems do not operate simply on what is sensed at any particular moment. Even setting aside things like being blinded by a sunset, there are occlusions where one object moves behind another and cannot be directly detected, but can be inferred by an object model that predicts future positions given the earlier known velocity and acceleration. [1]
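For the occlusion case, the usual trick is to keep propagating the object's state forward from its last observed kinematics. A minimal constant-acceleration extrapolation, as a generic sketch (not tied to any particular tracker), looks like this:

```python
# Predict where an occluded object should be, given its last observed
# position, velocity, and acceleration (1D along the lane for simplicity).
# Generic kinematics for illustration, not any specific ADAS implementation.

def predict_position(p0, v0, a0, dt):
    """Constant-acceleration extrapolation: p = p0 + v0*dt + 0.5*a0*dt^2."""
    return p0 + v0 * dt + 0.5 * a0 * dt * dt

# A car last seen 30 m ahead, doing 15 m/s and braking at 2 m/s^2, disappears
# behind a truck. Where do we expect it 1.5 s later?
print(predict_position(p0=30.0, v0=15.0, a0=-2.0, dt=1.5))  # 50.25 m ahead
```

A real tracker wraps this in something like a Kalman filter so the prediction's uncertainty grows the longer the object stays hidden.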
More than that, I mean eyes have more data than just what light is hitting their retinas. The work that the brain and neurons do to aim and focus your eyes at a distant object essentially solves several math problems that give you very direct distance info. Your brain knows that if the angular deviation of your eyes away from parallel is X to aim at an object, then it is ~Y distance away. It also knows that these muscles have to flex this much to focus on that object, which ALSO provides depth info to your brain. Solid-state image sensors cannot provide either of those datasets.
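The geometry behind the vergence part is plain triangulation: a known baseline between the eyes plus a measured convergence angle gives the distance directly. A rough sketch, assuming idealized symmetric fixation and a ~6.5 cm interpupillary distance:

```python
import math

# Idealized vergence triangulation: both eyes rotate inward by `half_angle`
# degrees from parallel to fixate a point straight ahead. With an
# interpupillary distance (IPD) of ~0.065 m, the fixation distance is
#   d = (IPD / 2) / tan(half_angle)

IPD_M = 0.065

def distance_from_vergence(half_angle_deg):
    return (IPD_M / 2) / math.tan(math.radians(half_angle_deg))

for angle in (10.0, 3.0, 1.0):
    print(f"{angle:>5.1f} deg -> {distance_from_vergence(angle):5.2f} m")
# 10.0 deg -> ~0.18 m, 3.0 deg -> ~0.62 m, 1.0 deg -> ~1.86 m
```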
These two processes are actually why VR can be difficult on the eyes, because while the main way your brain senses depth is the parallax (the classic "binocular vision" way people think of), the sense of focus is telling your brain that everything is right in front of your eyes.
The first rangefinder, mimicking this process mechanically, was invented in 1769. You’re essentially arguing for Lidar / sensor fusion.
Do you have any sources for this being a significant factor in human depth estimation? “Infinity” focus starts at 6 meters, yet we’re able to estimate much larger distances with great accuracy.
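To put rough numbers on that: with a ~6.5 cm baseline, the convergence angle each eye needs barely changes once you get past a few meters, so whatever lets us judge large distances well, it probably isn't vergence or accommodation doing the heavy lifting at those ranges. Back-of-envelope arithmetic under idealized assumptions (symmetric fixation, ~6.5 cm interpupillary distance):

```python
import math

# Back-of-envelope: vergence half-angle needed to fixate a point straight
# ahead at distance d, assuming an idealized ~6.5 cm interpupillary distance.

IPD_M = 0.065

def vergence_half_angle_deg(distance_m):
    return math.degrees(math.atan((IPD_M / 2) / distance_m))

for d in (6, 20, 60, 200):
    print(f"{d:>4} m -> {vergence_half_angle_deg(d):.4f} deg")
# 6 m -> ~0.31 deg, 200 m -> ~0.0093 deg: the angular difference between
# "far" and "very far" is tiny.
```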
I looked up the history of the rangefinder and the work of Watt in the 1770s is kind of obscure. For one, he called it a “micrometer” [0] even though he also created something like what is called a micrometer today, only he called it an “end measuring machine.” Additional confusion comes from “telemeter” as an early term for a rangefinder. Only Watt was also there at the beginning of what we now call telemetry: “additions to his steam engines for monitoring from a (near) distance such as the mercury pressure gauge and the fly-ball governor.” [2]
Watt's micrometer, designed between 1770 and 1771, was what we would now call a 'rangefinder'. It was used for measuring distances, and was essential for his canal surveying work.
Adapted from a telescope, with adjustable cross-hairs in the eye-piece, it was particularly useful for measuring distances between hills or across water.