
Perfect fidelity isn't there, but all popular browsers have "save page" functionality, which seems to work really well.


That's exactly what the parent is criticizing. The problem with save page is that the HTML you save still contains tons of links to server resources, particularly CSS and JS. Of course those links will work if you look at the saved page immediately after you save it. The problem is that if you come back later, sometimes even just the next day, they no longer work. A lot of JS file names are auto-generated random numbers, produced by packaging systems rather than humans, which change whenever the developers edit their JS. They aren't designed to be stable.

There are tools that try to fetch those links and update the HTML to point to the local copy. But those tools can only go so far. JS is allowed to fetch new files dynamically, and there's no reliable way to look at a piece of code and automatically figure out what it's going to fetch when you run it.
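To make that limitation concrete, here is a minimal sketch of what the static part of such a tool might look like. It only uses Python's standard library. The names `ResourceCollector` and `static_resources`, the sample page, and the `example.com` URLs are all made up for illustration; real archiving tools do a good deal more than this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collects URLs of scripts, images, and stylesheets named in the markup."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "img") and attrs.get("src"):
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            # Only stylesheet links count as page resources here.
            if "stylesheet" in (attrs.get("rel") or "").lower():
                self.resources.append(urljoin(self.base_url, attrs["href"]))


def static_resources(html, base_url):
    """Return absolute URLs of resources referenced directly in the HTML."""
    parser = ResourceCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.resources


# Hypothetical page: three static references plus one URL built at runtime.
PAGE = """
<html><head>
<link rel="stylesheet" href="/css/site.css">
<script src="/js/app.8f3a1c.js"></script>
</head><body>
<img src="logo.png">
<script>
  var name = "chunk-" + (1 + 2) + ".js";
  fetch("/js/" + name);
</script>
</body></html>
"""

found = static_resources(PAGE, "https://example.com/article/")
for url in found:
    print(url)
```

The three references written directly in the markup are found, but the file requested by the inline script is not, because its name only exists once the script runs. A tool would have to execute the page (and exercise every code path) to see it.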


> JS is allowed to fetch new files dynamically, and there's no reliable way to look at a piece of code and automatically figure out what it's going to fetch when you run it.

You've diverged from the context and are no longer doing an apples-to-apples comparison. The things you're describing are all opt-in and amount to having to deal with an adversarial input. There's nothing inherent to the medium that requires those things.

In other words, a person publishing a PDF is already abstaining from certain things. (Namely, the sorts of things you're bringing up that would make for a pathological case.) If the person who publishes a PDF does a straightforward translation into a web page, then you end up with something that doesn't exhibit any of the downsides you're discussing.


No, but the medium allows these things. And that's a problem.


Good point, and also relevant user name :)


No, most browsers will save the resources as well and rewrite the HTML to reference them. You can have problems with dynamically loaded things but I have found that it works very well in practice. I have had maybe one page that was significantly broken saving from Firefox over the years.


Thanks dude, it's nice to see that there are still people who can read a text and understand the point.


I've found the best way to save a page on a browser is to print it ... to PDF.


Absolutely. Depending on how much I care about the content, I either print it directly from reader mode (which gives pretty bland results) or I touch up the page itself with things like "column-count: 2" and a few changes to headlines, to give it the look of a proper print article. Either way, printing to PDF is a great way to archive/save web content for later.


It's quite nice this way. Much better than the old .mht file, even. It skips the junk.


This is brilliant... I hadn't thought about it.



