The tech stack in the splat world is still really young. For instance, I was thinking to myself: “Cool, MVSplat is pretty fast. Maybe I’ll use it to get some renderings of a field by my house.”

As far as I can tell, I will need to supply a bunch of photographs with camera pose data attached — okay, fair enough, the splat architecture exists to generate splats, not poses.

Now, what’s the best way to get camera pose data from arbitrary outdoor photos? … Cue a long wrangle through multiple papers. Maybe, as of today… FAR? (https://crockwell.github.io/far/). It claims up to 80% pose accuracy, depending on the source data.

I have no idea how MVSplat will deal with camera pose data that is only 80% accurate… And I also don’t understand whether I should use one of their pre-trained models, train my own, or fine-tune one of theirs on my photos… This is sounding like a long project.

I don’t say this to complain, only to note where the edges are right now and to think about the commercialization gap. There are iPhone apps that will get (shitty) splats together for you right now, and there are higher-end commercial products like Skydio that will work with a drone to fill in a three-dimensional representation of an object (or maybe some land; I’m not sure about the outdoor support), but those run to multi-thousand-dollar-per-month subscriptions plus hardware, as far as I can tell.

Anyway, interesting. I expect that over the next few years we’ll have push-button stacks based on ‘good enough’ open models, and those will iterate and go through cycles of being upsold / improved / etc. We are still a ways away from a trawl through an iPhone/Google Photos library and a “hey, I made some environments for you!” type of feature. But not infinitely far away.



Use COLMAP to generate pose data via structure-from-motion; if you use Nerfstudio to make your splat (with the Splatfacto method), it includes a command that will run the COLMAP alignment for you. This is definitely a weak spot, though: a lot goes wrong in the alignment process unless you have a smooth walkthrough video of your subject with no other moving objects.
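
If you want to script it yourself, here’s a minimal sketch using the pycolmap bindings (the database/output paths and the images/ directory are placeholders, and some attribute names have shifted between pycolmap releases, so treat this as a starting point rather than a recipe):

    import pycolmap  # pip install pycolmap

    # Classic SfM pipeline: detect features, match them across images,
    # then run incremental mapping to recover camera poses.
    pycolmap.extract_features(database_path="colmap.db", image_path="images/")
    pycolmap.match_exhaustive(database_path="colmap.db")
    reconstructions = pycolmap.incremental_mapping(
        database_path="colmap.db",
        image_path="images/",
        output_path="sparse/",
    )

    # Each reconstruction holds the registered images and their poses.
    for image in reconstructions[0].images.values():
        print(image.name, image.cam_from_world)

Nerfstudio’s ns-process-data images command wraps essentially this same pipeline and writes the poses out in the format ns-train splatfacto expects.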

On iPhone, Scaniverse (owned by Niantic) produces far more accurate splats than pipelines that reconstruct from 2D video/images alone, because it uses LiDAR to gather the depth information needed for good alignment. I think that even on older iPhones without LiDAR, it can estimate depth if the phone has multiple camera lenses. Like ryandamm said above, the main issue seems to be low value/demand for novel technology like this. Most of the use cases I can think of (real estate? shopping?) are usually better served by 2D video and imagery.


I think the barrier to commercialization is the lack of demonstrated economic value in push-button splats. There's no shortage of small teams wiring together open-source splat / NeRF / whatever papers; there's a dearth of valuable, repeatable businesses that could make use of what those small teams are building.

Would it be cool to just have content in 3D? Undoubtedly. But figuring out a use case, that's where people need to be focusing. I think there are a lot of opportunities, but it's still early days -- and not just for the technology.


Yes - agreed. There’s a clear use case for indie content, but tooling around editing/modifying/color/lighting has to improve, and rendering engines or converters need to get better. FWIW it doesn’t seem like a dead-end tech to me though; more likely a gateway tech to cost improvements. We’ll see.



