NASA can't figure out what's causing computer issues on the Hubble telescope

rajandatta · on June 24, 2021

Anyone else look at the headline and feel this is one of the dumbest headlines ever. It makes it sound like NASA's incompetent. 'Why haven't you fixed Hubble yet?'.

There doesn't seem to be any nuance or respect that they're trying to repair an orbiting telescope that was launched 30 years ago and designed 40 years ago - and that people are patiently trying to sort through a fully autonomous system 400 miles above the surface of the Earth with a very large set of failure options.

For me - huge props to NASA and other organizations that do this kind of work and keep these systems running for decades. I need to reboot Windows every 2-3 days.

davesque · on June 24, 2021

Yeah, I hear you on that. Another possibility is that NPR is trying to manufacture a sense of mystery or surprise as journalists often do with science stories. A bit less nefarious but also sort of annoying in its own way.

kevin_thibedeau · on June 24, 2021

It's more than annoying. It comes part and parcel with the dumbing down of society.

geoduck14 · on June 24, 2021

Most of my work could benefit from a journalist

> Overworked Engineer misses semicolon. All night review session finds it, data gets loaded!

> Management insists Jira stories be routed to new Epic. Team Lead spends hours learning Jira API before giving up and doing it "the hard way"

tharkun__ · on June 24, 2021

Completely not related to the topic at hand but ;)

I love this "spend hours figuring out the API" then giving up.

Maybe it's just me but I have noticed that a lot (and I mean a lot) of devs will greatly overestimate the effort of anything manual they need to do that they don't like. And spend hours if not days trying to automate it. Which is fine if it's gonna be needed again soon or over and over. But really, what is so bad in spending literally 5 minutes doing the above manually with a Jira filter and bulk edit? And by extension sometimes there's not even a bulk edit and you need to do something by clicking the same 5 steps 50 times to achieve something. Again 5 minutes of actual work. Just put on a nice fast song from whatever music genre you happen to love and do it. Done.

Is it just me?

ravel-bar-foo · on June 25, 2021

I discovered this in my PhD. I spent 3 months trying to automate a process (running a finite element model atop a very fragile software platform). I couldn't figure out how to automate my way around the errors the system would give me, every run was unpredictable in duration due to the errors, and I ended up spending a lot of time holding the system's hand.

Eventually gave up on the automation and resolved to spend a few weeks clicking every 15 minutes to complete the simulations. It was very hard to stay motivated, but I got two papers out of it.

The disadvantage is now my CV says I'm an expert at an FEM framework which I never want anything to do with ever again.

xwolfi · on June 25, 2021

Normally if your eyes and hands are synchronized to do an operation, it can be automated.

Sometimes problems come from trying to automate at the lowest level, when maybe screenshotting a vm and sending a click at a specific position if the pixel colors change to the error condition could be done a bit faster.
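A minimal sketch of the pixel-watching idea, in Python. The two callables `grab_pixel` and `click` are hypothetical stand-ins for whatever screenshot and input tool is actually available (none is named in the comment), and the coordinates, colors, and tolerance are made up for illustration.

```python
def dismiss_error_if_present(grab_pixel, click, watch_xy, error_rgb,
                             button_xy, tolerance=10):
    """Click button_xy if the watched pixel looks like the error color.

    grab_pixel(x, y) -> (r, g, b) and click(x, y) are supplied by the caller;
    they are assumptions here, not the API of any particular library.
    Returns True if the error color was seen and a click was sent.
    """
    r, g, b = grab_pixel(*watch_xy)
    er, eg, eb = error_rgb
    # Allow a small per-channel difference so anti-aliasing or compression
    # in the screenshot does not hide the error condition.
    is_error = max(abs(r - er), abs(g - eg), abs(b - eb)) <= tolerance
    if is_error:
        click(*button_xy)
    return is_error
```

Calling this in a loop with a sleep between polls would replace the "click every 15 minutes" routine, provided the error dialog always shows up in the same place.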

rebuilder · on June 25, 2021

The real engineers spend weeks analyzing the pros and cons of a manual, one-time fix vs. building an automated solution.

And the even realer ones will spend years coming up with a universal solution to figuring out which is more optimal.

DangitBobby · on June 25, 2021

At least automating it can be kind of fun and educational.

not2b · on June 25, 2021

Journalists don't choose the clickbait headlines and generally don't care for them; they are business decisions, especially for any site that makes money via ad clicks. They even do A/B testing: initially some visitors are shown headline A, others see headline B. If B gets more clicks, A is dropped.
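A rough simulation of that headline A/B test, as a sketch only. The click probabilities, visitor count, and function name are invented for illustration; a real newsroom tool would use live traffic and probably a significance test before dropping a headline.

```python
import random


def run_headline_test(click_rates, visitors=10_000, seed=0):
    """Randomly show one headline per visitor and return (winner, ctr).

    click_rates maps headline text to an assumed click probability.
    """
    rng = random.Random(seed)
    headlines = list(click_rates)
    shown = {h: 0 for h in headlines}
    clicks = {h: 0 for h in headlines}
    for _ in range(visitors):
        h = rng.choice(headlines)           # random assignment to A or B
        shown[h] += 1
        if rng.random() < click_rates[h]:   # simulated click
            clicks[h] += 1
    # Click-through rate per headline; the one with the lower rate is dropped.
    ctr = {h: (clicks[h] / shown[h] if shown[h] else 0.0) for h in headlines}
    winner = max(headlines, key=lambda h: ctr[h])
    return winner, ctr
```

For example, `run_headline_test({"A": 0.02, "B": 0.10})` returns the winning headline together with the observed click-through rates.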

dennis_jeeves · on June 26, 2021

> Management insists Jira stories be routed to new Epic.

Agile expert called in, who saves the day(night?)...

jcun4128 · on June 24, 2021

I find a lot of content on YT is like that. The actual work is skipped or hand-waved, then it's a cut scene every other second with some music on top... Idk. Started unsubscribing from channels lately. Still some good ones.

embeddTrway690 · on June 25, 2021

That's internet for most people. We are in a bubble and the bubble is actually much better than that.

rajandatta · on June 24, 2021

I'm not sure how much more mystery one needs than 'Only orbiting telescope in human history has a fault our best engineers haven't diagnosed as yet!' :-)

imoverclocked · on June 24, 2021

Interestingly, we humans actually have more than just Hubble floating around up there in the way of telescopes.

https://en.wikipedia.org/wiki/List_of_space_telescopes

femto · on June 25, 2021

Missing from the list are the space telescopes that look inwards:

https://en.wikipedia.org/wiki/KH-11_Kennen

slipframe · on June 25, 2021

Fascinating buried lead:

> A NASA history of the Hubble,[24] in discussing the reasons for switching from a 3-meter main mirror to a 2.4-meter design, states: "In addition, changing to a 2.4-meter mirror would lessen fabrication costs by using manufacturing technologies developed for military spy satellites."

Hubble was derived from these spy satellites.

perl4ever · on June 25, 2021

What's mindboggling to me is the sheer number of spy satellites that were more or less each equivalent to Hubble.

There are estimated to have been something like seventeen of the KH-11 satellites, each one of which cost something like 2/3s of a Nimitz class aircraft carrier.

https://en.wikipedia.org/wiki/KH-11_Kennen

And there are an unknown number of its successor, but probably at least 3, where they updated it for stealth.

https://en.wikipedia.org/wiki/Misty_(satellite)

It feels like a parallel universe where what is a technical struggle and tour de force here, is churned out by the dozens.

slipframe · on June 25, 2021

Hubble was botched initially, they had to send astronauts up to fix it before they could get usable images out of it. I wonder what the success rate for those spy satellites was. Maybe some of them were replacements for others, replaced instead of repaired?

wmf · on June 25, 2021

If you go back a little more, they used to launch one or two film-based disposable spy satellites per month.

mancerayder · on June 24, 2021

> trying to manufacture a sense of mystery or surprise

That's a positive spin on clickbait.

JJMcJ · on June 24, 2021

Either mystery or bad puns based on the science.

Extra points if you can throw "Einstein" in the headline.

mhh__ · on June 24, 2021

> Anyone else look at the headline and feel this is one of the dumbest headlines ever. It makes it sound like NASA's incompetent. 'Why haven't you fixed Hubble yet?'.

I didn't get that impression.

This does intrigue me - I like browsing hackernews, but I often get the impression that some people (not the PC specifically) here are either ridiculously anal about English or genuinely do not parse sentences the way I do.

skissane · on June 24, 2021

> I often get the impression that some people (not the PC specifically) here are either ridiculously anal about English or genuinely do not parse sentences the way I do.

People with autistic traits sometimes parse sentences in an overly literal or precise manner. I rarely do this anymore, but when I was a child and teenager I did it more often.

When I'm speaking of "autistic traits", I'm not speaking just of people diagnosed with autism/ASD (who are of course represented here), but also people with broad autism phenotype (BAP), the subclinical manifestation of ASD. BAP is when you have more of the symptoms of ASD than the average person does, but not enough to justify an actual diagnosis of ASD. BAP is quite common in software engineers, and STEM professionals more generally, so I think there are likely a lot of people on this site with BAP (albeit most of them have probably never heard of it.) The people you are talking about quite possibly do have some degree of BAP, and this behaviour is quite possibly a manifestation of their BAP.

not2b · on June 25, 2021

But the headline is literally and precisely true, at least at present. At some point they may be able to figure out the issues enough to restore operation, but not yet.

skissane · on June 25, 2021

Sometimes, how this trait manifests itself, is in a difficulty in seeing that a sentence could have multiple meanings; the mind focuses on one particular meaning and struggles to perceive the other possibilities. Usually there are several different ways to read something in an overly literal/precise manner; the fact that one of them is true doesn't do much good if one's mind has decided to fixate on one of the others that isn't.

spfzero · on June 25, 2021

They use “can’t”, meaning “can not”. Unable. That’s misleading, and no doubt intentional, to create more drama. The more accurate word would be “haven’t”, or “haven’t yet”, but that’s not alarming enough.

spockz · on June 24, 2021

To me this title comes across as just factual and not diminutive in any way.

foxpurple · on June 24, 2021

Can’t work out gives a sense that they have tried everything and failed. “Haven’t worked out yet” is still factual and implies that they are still working on it.

Waterluvian · on June 25, 2021

I hear you. But I’ve also learned as an engineer that nobody really cares to hear from you how difficult and impressive it is.

bamboozled · on June 24, 2021

No, I just read it like they must have a really hard problem and hope they find a solution soon.

IncRnd · on June 25, 2021

> There doesn't seem to be any nuance or respect that they're trying to repair an orbiting telescope that was launched 30 years ago and designed 40 years ago - and that people are patiently trying to sort through a fully autonomous system 400 miles above the surface of the Earth with a very large set of failure options.

That's what the article says. The headline is clickbait.

asymptosis · on June 24, 2021

I thought they were riffing on recent UFO hype.

tambourine_man · on June 25, 2021

Is Windows still like that or was that tongue in cheek? I run my Mac for months, only restart for updates.

wruza · on June 25, 2021

Still? It got exponentially worse in the last few years. I deliberately turn off unplanned auto-updates by any means because a Windows 10 reboot/shutdown (when I really need one – drivers, etc) takes minutes because of mandatory updates. When I still had a Windows laptop and had to bring it somewhere, I hibernated it instead of turning it off, because turning it off would mean “please wait for 20 minutes of battery stress test”, or a few hours if it’s that lucky day of not-a-service-pack-4.

TideAd · on June 25, 2021

I've worked on some legacy code in my time but debugging this thing has gotta be completely nuts.

code_duck · on June 25, 2021

Saying “can’t figure out” gives me the impression that they’ve tried their hardest and have concluded they are unable to diagnose or solve the problem, which is not accurate.

jacobwilliamroy · on June 24, 2021

I think you may be projecting some self esteem issues here.

qzw · on June 24, 2021

*pushes up glasses* I would watch the heck out of a Twitch stream of their debugging/brainstorming sessions. I always loved the movie Apollo 13, especially the technical troubleshooting parts.

kabdib · on June 24, 2021

Henry S F Cooper Jr.'s book The Evening Star describes some of the remote debugging and other problem solving that was necessary when the Magellan probe experienced computer problems while orbiting Venus. It's been a few decades since I read it, but it was pretty detailed and rather exciting.

kevmo · on June 24, 2021

This sort of comment is why I still read HN.

barkingcat · on June 24, 2021

And the people downvoting this comment is why I will stop reading HN.

NeutronStar · on June 24, 2021

What did that comment actually bring to the conversation?

gentleman11 · on June 24, 2021

Occasionally, very occasionally, it’s nice to read somebody just expressing enthusiasm instead of just posting a clever counter argument. It’s like a spice that you only want a little of but that’s still nice

Dylan16807 · on June 24, 2021

Does that spice go bad if it turns gray?

Teever · on June 24, 2021

"this"

jorvi · on June 24, 2021

Eh, I can understand both sides of the fence. 'this is why I read HN' is nothing but a slightly more verbose '+1', but as you stated it does humanize HN and makes it feel more social.

In terms of downvotes, what really irks me and what I often see is people posting factually correct information, but still being sent into faded oblivion because some sect of the community's worldview doesn't agree with the facts.

ben0x539 · on June 24, 2021

I don't understand this viewpoint. Information being factually correct is a low bar. I have a lot of factually correct information that is irrelevant or misleading, or that I could state in a way that drags down the level of discourse more than it illuminates truth. Factually correct information is usually involved in tu quoque fallacies, or used to goad people into drawing false, non-sequitur conclusions. The Hacker News guidelines lay out a list of expectations for comments that go beyond factual correctness.

If someone uses factually correct information to make a comment thread worse, I can see how downvotes could be justified.

jorvi · on June 25, 2021

The fact (heh) that people instantly rammed the downvote button to get what I said to -2 ironically proves my point. Oh well.

eitland · on June 24, 2021

It encourages others to post more of these comments.

Since comment scores were removed this is the only way to signal this to others besides the original commenter.

That said it should not be overused. If it annoys someone I guess they should downvote it but I don't think there is a need to reflexively downvote every time someone adds a friendly meta comment.

(And if people start gaming it for karma farming I guess it should be downvoted relentlessly until that stops :-)

RHSeeger · on June 24, 2021

It didn't necessarily bring anything to this one conversation. It did, however, communicate that "this is the type of information that that person finds valuable on Hacker News". And knowing what other people in your social group like to hear/discuss is an important part of keeping that group vibrant and wonderful.

So no, it's probably not as useful as the comment it was referring to, but it was useful (to some of us) as it pertains to the community as a whole.

kbelder · on June 24, 2021

Conversation.

pc86 · on June 24, 2021

It's arguably even worse than just commenting "This." At least that is small enough you can scan over it and barely even register its existence. But this fedora-tipping "Thank you kind sir this is the type of Internet Content I enjoy!" doesn't even afford you that luxury.

jbuhbjlnjbn · on June 24, 2021

I really dislike the downvote function because it reinforces self-censoring. And I completely loathe the implementation of it, you need xxxx upvotes to downvote posts....I have no words.

Well, in opposing it I especially read the faded comments and upvote any of those that are not completely abhorrent.

Take that, ycombinator.

beerandt · on June 24, 2021

>I especially read the faded comments and upvote

This seems to happen a lot more frequently here than anywhere else.

I'm not really sure what that says, other than people still read comments that are faded. Also that people shouldn't worry about self-censoring.

I don't have a problem with downvotes or the karma needed to do it, but I do sometimes wish it were possible to reply to a dead comment, especially if you vouch for it and it's still dead.

Sometimes they're worth defending, or are relevant in a non-obvious way, and sometimes the comment itself is discussion worthy, as it relates to the topic, even if it's wrong or seems trollish.

omikun · on June 24, 2021

How could it have been reworded to avoid the "fedora-tipping" connotation?

I'm being sincere here since I also appreciate book recommendations and I get probably half my book recommendations from HN.

Freestyler_3 · on June 24, 2021

I think the point is to instead of using the keyboard, use the mouse to click the up arrow, and leave it at that. (I know how tempting it is to reply quickly to something, I have the urge to just post what's going on in my mind right away unfiltered. So I am very forgiving, but not everyone is)

pbhjpbhj · on June 24, 2021

That only tells the person who owns the comment that you appreciate it. With visible comment scores you're correct that almost all "I like this" comments are wrong; without comment scores they become useful again.

amalcon · on June 24, 2021

Worth considering that comment scores were hidden for a reason. Exposing that information to everyone, as opposed to just the comment author, does not necessarily improve the discussion.

bee_rider · on June 24, 2021

This sort of comment is why I still read HN!

ArcticCelt · on June 24, 2021

This YouTube series of videos follows a group that restored an Apollo Guidance Computer that a collector basically pulled from the trash.

https://www.youtube.com/watch?v=2KSahAoOLdU&list=PL-_93BVApb...

My favorite part is when they needed the version of the software that was used for the moon landing but they only had the source code for a previous version (scanned from giant binder) and the hash value of the version of the landing. By a series of educated guesses, by reading memos and by analysis of the source code they modified the old code the exact way so it gave them the correct hash, confirming that they correctly and exactly recreated the original code.

It's been a while and I'm going from memory, so I might have some details wrong. See this video for this story. https://www.youtube.com/watch?v=-JTa1RQxU04
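The "modify the old code until the hash matches" approach can be sketched roughly as below. This is not the restoration team's actual tooling: SHA-256 stands in for whatever checksum the original listing used, and the candidate edits are hypothetical placeholders for the guesses drawn from memos and source analysis.

```python
import hashlib
from itertools import product


def checksum(lines):
    """Stand-in checksum over a program listing (assumption: SHA-256)."""
    return hashlib.sha256("\n".join(lines).encode()).hexdigest()


def reconstruct(old_lines, candidate_edits, target):
    """Try combinations of guessed edits until the checksum matches target.

    candidate_edits maps a line index to a list of alternative texts for
    that line; the unmodified line is always tried as well.
    Returns the matching listing, or None if no combination matches.
    """
    indices = sorted(candidate_edits)
    options = [[old_lines[i]] + list(candidate_edits[i]) for i in indices]
    for combo in product(*options):
        trial = list(old_lines)
        for i, text in zip(indices, combo):
            trial[i] = text
        if checksum(trial) == target:
            return trial
    return None
```

The search space grows multiplicatively with every uncertain line, which is why narrowing the candidates by reading memos first matters so much.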

trothamel · on June 24, 2021

If you want to see debugging a computer in space, check out Apollo 13's sequel, Apollo 14. The moon landing is being held up by a shorted-out switch that's causing the LM to abort the landing, and it's up to the programmers back home to figure out how to work around it in time to allow the landing.

Apollo 13 was the story of a 'successful failure', while Apollo 14 shows how hard work and creative thinking can turn failure into success.

cameldrv · on June 24, 2021

Don Eyles' book Sunburst and Luminary has a chapter on this, and Don was primarily responsible for the Apollo 14 workaround. The book is also generally just a fantastic account of what it was like to develop software for the Apollo Guidance Computer.

trothamel · on June 24, 2021

Also about living through the sexual revolution. It's a really interesting book, but as much of a memoir as a technical book.

geocrasher · on June 24, 2021

Scott Manley to the rescue: "The Computer Hack That Saved Apollo 14" https://www.youtube.com/watch?v=wSSmNUl9Snw

NikolaNovak · on June 24, 2021

Is there actually an Apollo 14 movie? I can find Apollo 18 but not an Apollo 14 feature movie.

I saw this but it's a short documentary and may not be what you meant: https://www.amazon.com/Apollo-14-Complete-Downlink-Edition/d...

cratermoon · on June 24, 2021

There's the HBO series "From the Earth to the Moon", which covers this with the Apollo 14 mission. Highly recommended.

NikolaNovak · on June 24, 2021

Ahh, yes; I've seen the series multiple times - agreed that it's great, especially for those who enjoyed Apollo 13. I wasn't sure if there was a different Apollo-14 movie OP/Trothamel was referring to...

trothamel · on June 24, 2021

No, or at least not that I know of. I was having a bit of fun by declaring Apollo 14 (the mission) the sequel to Apollo 13 (the mission).

bumby · on June 24, 2021

Was this the scenario where there was a false positive warning light but they had no way to test if it was truly a false alarm? I remember attending a talk by an Apollo engineer who convinced the control room that the switch design had a propensity to short, and it really came down to a probability-based judgement call.

macksd · on June 24, 2021

Honestly that movie is a lot of why I got into math and engineering.

Jim Lovell was undeniably a badass but I watched that movie and thought the heroes were the ones reading telemetry off a computer screen and using their slide rule to figure out what to do. I hope Hidden Figures does that for another generation.

nonameiguess · on June 24, 2021

Not was. Is!

I noticed on the last season of the Expanse that Luna headquarters was named after him and bothered to look him up. Dude's 93 and still kicking!

It's amazing how well the astronaut medical screening worked. Unless they get killed in the line of duty, these guys are all living incredibly long.

seanc · on June 24, 2021

Like John Aaron, the steely eyed missile man!

keanebean86 · on June 24, 2021

This would be cool for earth satellites.

On the other hand watching a stream involving something on Mars, let alone voyager, would be pretty boring!

Send: ls

Ok let's take a 20 minute break.

diamondo25 · on June 24, 2021

From my work experience it's like this:

1. Assemble commands to run

2. Run the commands and see results in the 15 minute window

3. See if you can do more commanding in the minutes you have left

4. Make a new plan, wait for next pass, and goto 1

For LEO satellites, that usually means you have 2 blocks of 3 13-minute passes when the ground station is in the Netherlands. For a Svalbard ground station, you get a lot more, but still passes of 13 minutes or less.

sfink · on June 24, 2021

I've worked on something like this, just a lot more mundane. We had Linux PCs strapped to the ceiling of various locations, mostly malls, together with a camera and projector to produce an interactive display on the floor. I had a couple of times where somebody would be onsite and the projector would be off or the display would be mangled. And it takes quite a while to get a lift to get up to the box (if it would even be allowed at that time of day), there was no network at that time, and all they had was a wireless IR keyboard that occasionally dropped keypresses.

Imagine dictating shell commands, over the phone, to a salesperson who has no idea what half the characters are that you're asking him to type, and the only output signal I could come up with was ejecting the CD tray, which was just visible from the ground...

(Note that the goal wasn't usually to fix things on the spot, it was more to triage things like whether we needed to have a replacement projector on hand, which was a big deal.)

abnry · on June 24, 2021

Job Posting: NASA programmer, needs at least 1 wpm typing speed and experience with compiling large projects.

diamondo25 · on June 24, 2021

The amount of preparation is much, much more than for any other "accessible" installation. Typos are the worst to recover from, and backspace usually doesn't exist. When I've sent commands to our Linux-running satellites, it's usually meant prepending your commands with ctrl-c characters and appending a couple of newlines, just to make sure it runs and nothing is left in the buffer. There is also a possibility that commands get executed multiple times, and there are usually limits in transmission speed, processing speed, and frame length. Sending a lot of characters over a terminal can cause characters to be eaten, creating typos you can't see, affecting the commanding immensely.
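The framing trick described here (ctrl-c first, extra newlines last) might look something like this in Python. The counts of two interrupts and two newlines are assumptions for illustration, as is the function name; the real values would depend on the link and the shell on board.

```python
CTRL_C = b"\x03"  # ASCII ETX, what a terminal sends for ctrl-c


def frame_command(command: str, interrupts: int = 2,
                  trailing_newlines: int = 2) -> bytes:
    """Wrap one shell line so stale input is cancelled before it runs.

    Leading ctrl-c bytes discard anything half-typed in the remote buffer;
    the extra trailing newline is a harmless empty command if the first
    newline gets dropped on the way up.
    """
    if "\n" in command or "\r" in command:
        raise ValueError("send one line at a time")
    return (CTRL_C * interrupts
            + command.encode("ascii")
            + b"\n" * trailing_newlines)
```

Because a frame may be executed more than once, the commands sent this way are best kept idempotent (for example `ls`, or a write that overwrites rather than appends).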

bentcorner · on June 24, 2021

Isn't there some way to ensure that what you typed is what is being executed? Dropping characters from the terminal sounds terrifying.

I don't know enough about ssh and terminals to know if it's possible to type "12345" and see "12345" echoed back to me but really what the remote session sees is "1245".

diamondo25 · on June 24, 2021

Yes, terminals usually echo back the characters. In our case this would be buffered and we could request the buffer. But that would still take some operations. Best way, usually, is to send a bunch of commands in a way you ensure proper order of execution (eg write a file, check checksum of file, execute file), and make sure you can pull the logs afterwards.
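As a sketch of the "write a file, check checksum of file, execute file" sequence, the helper below builds the shell lines to send. It assumes the remote side has `printf`, `base64`, `sha256sum`, and `sh` available, which the comment does not state; the remote path and chunk size are also made up.

```python
import base64
import hashlib


def build_upload_commands(script: str, remote_path: str = "/tmp/cmd.sh",
                          chunk: int = 48):
    """Return (commands, digest) for uploading and gating a script.

    The script is sent base64-encoded in short chunks to stay under frame
    length limits, then decoded, and only executed if its SHA-256 matches.
    """
    data = script.encode()
    digest = hashlib.sha256(data).hexdigest()
    encoded = base64.b64encode(data).decode()
    b64_path = remote_path + ".b64"
    commands = [f": > {b64_path}"]  # start from an empty staging file
    for i in range(0, len(encoded), chunk):
        piece = encoded[i:i + chunk]
        commands.append(f"printf '%s' '{piece}' >> {b64_path}")
    commands.append(f"base64 -d {b64_path} > {remote_path}")
    # sha256sum -c exits non-zero on mismatch, so the script never runs
    # if a character was dropped or a chunk was appended twice.
    commands.append(
        f"echo '{digest}  {remote_path}' | sha256sum -c - && sh {remote_path}"
    )
    return commands, digest
```

The checksum gate is what makes dropped or duplicated characters survivable: a damaged upload fails the check and can simply be resent on the next pass.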

Nowadays, links and systems get easier to work with, and you can sometimes have a literal TTY open to the system, like Reaktor Hello World has ( https://reaktorspace.com/reaktor-hello-world/ ). However, this is over S-band, which is a 2Mbit/s link, so overhead for a stable TTY (or ethernet connection) is a lot less than using UHF/VHF.

abnry · on June 24, 2021

Very fascinating! You haven't happened to write a blog post or something on this, have you? I am sure HN would love reading about it.

diamondo25 · on June 24, 2021

Sorry, I did not. There are plenty of stories on the internet about cubesats, they get launched by universities even :)

3pt14159 · on June 24, 2021

I'm surprised there's no error correction in your uplink. Crazy.

nonameiguess · on June 24, 2021

One of the more interesting projects I ever had to complete was while awaiting clearance when I first started working for the NRO years back in ground processing algorithms (no longer work there, by the way). One of the long time guys gave us an assignment to implement the BCH error-correcting code, and then iteratively optimize it until we were able to implement the algorithm for computing the code published in a company proprietary whitepaper that was more efficient than any of the publicly published algorithms. That was just everything that is fun about programming. The actual production implementation corrected up to 8 bits per block, though I can't remember the block size any more.

Ground processing was a totally separate program and contract from mission control, though, so my team only ever received data from the satellites. We never sent data to them.
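For readers who have not met error-correcting codes, here is a toy example of the idea. It is deliberately not BCH as described above: it is the much simpler Hamming(7,4) code, usually presented as the single-error-correcting member of the same family, and it fixes only 1 flipped bit per 7-bit block rather than 8. Function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(word):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(word)  # work on a copy so the caller's list is untouched
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome, read as a binary number, is the 1-based position
    # of the flipped bit (0 means no error detected).
    position = s1 + 2 * s2 + 4 * s4
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

Real BCH decoders generalize this syndrome idea to several errors per block using finite-field arithmetic, which is where most of the optimization effort goes.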

diamondo25 · on June 24, 2021

It depends on the API. If your API is "put this data over uart to the TTY", and the uart of the device is overloaded and drops characters... Or maybe mangles characters due to bitflips. Or what have you. It's all possible!

gundul · on June 24, 2021

Best programmer. -1 wpm.

mywittyname · on June 24, 2021

I'm so fast that I can do -127wpm. Only in certain software though.

Koshkin · on June 24, 2021

Only needs a keyboard with one key.

robocat · on June 24, 2021

A telegraph key: https://en.m.wikipedia.org/wiki/Telegraph_key

mkr-hn · on June 24, 2021

Time to bring back flowchart templates.

throwawayboise · on June 24, 2021

If you ever worked on a busy mainframe your compile jobs could easily be queued for 20-30 minutes. Made you much more careful to check for typos and do test runs of the code "in your head" before submitting.

idreyn · on June 24, 2021

In case you haven't seen it: https://apolloinrealtime.org/13/

bfeist · on June 25, 2021

Hey there. I'm the author of the website. Thanks for the post. Happy to answer any questions.

nevster · on June 25, 2021

Wow - so here I am an hour later... Thanks for that link!

asavadatti · on June 25, 2021

What a fantastic website. Reminds me of Encarta 97

1911z · on June 24, 2021

Thank you for sharing, this is amazing

belter · on June 24, 2021

Fantastic site. Thanks for sharing.

Izkata · on June 24, 2021

Fun weirdness of even limited multilingualness: For some reason my brain first parsed this as "a pollo in real time" - or, from Spanish, "a chicken in real time".

falcrist · on June 24, 2021

It's worth noting that the people in that movie were WAY more loud and emotional than the real NASA engineers and operators.

You can see how NASA people react to tough situations by watching the videos of mission control during the Challenger and Columbia disasters. No shouting. No arguments. Just cool professionalism and restrained emotions.

They have a job to do, and they do it well even under stress. "Steely-eyed missile men/women" indeed.

bumby · on June 24, 2021

>the people in that movie were WAY more loud and emotional than the real NASA engineers and operators.

There are plenty of NASA engineers and leaders who lose their cool. I’m only saying that so people don’t overly lionize them in a way that prevents them from pursuing a similar job because they feel they are somehow cut from a different cloth.

erosenbe0 · on June 24, 2021

Everybody knows that when presented with the irrefutable evidence that the Challenger o-rings would fail, they more or less just let the astronauts die. Definitely cut from same cloth as any other org.

bumby · on June 24, 2021

That’s not quite accurate. It wasn’t that there was “irrefutable” evidence that the o-rings would fail, it was there wasn’t data that they would, or wouldn’t, fail.

“The O-rings were never tested in extreme cold.”[1]

There wasn’t data which led to discussions about uncertainty, but that shouldn’t be conflated with irrefutable evidence of failure.

The obviousness of it (like many engineering failures) was only apparent in hindsight.

“Evidence, in retrospect, points to a long period of time, especially based on post-flight inspections when the joint design weakness was ‘sending a message’ and the true potential of this message was not perceived and reacted to.”[2]

“Not perceived” isn’t compatible with “irrefutable evidence that it would fail”.

[1] https://www.space.com/31732-space-shuttle-challenger-disaste...

[2] https://www.govinfo.gov/content/pkg/GPO-CRPT-99hrpt1016/pdf/...

eschneider · on June 24, 2021

I dunno. That sort of thing is exactly my job, except the remote devices are still on earth somewheres. What you'd see is me sitting in a library drinking coffee and looking at source code and schematics until I had an answer that matched the evidence.

Satisfying, but not exactly must watch tv.

Izkata · on June 24, 2021

> Satisfying, but not exactly must watch tv.

What's in your head could be though. That's my pet theory on the movie Hackers, what we're seeing on the computer screens isn't what's actually there, it's the characters' mental constructs visualized.

moocowtruck · on June 24, 2021

you just killed any future dramatic space troubleshooting film scenes for me

Koshkin · on June 24, 2021

Debugging on a computer that is down is not a very exciting process.

qzw · on June 24, 2021

In the movie Hero[0], two kung fu masters fight a battle purely in their minds. And when the mental fight was over, they only execute one physical move to finish the battle.

Think of this as the computer equivalent of that scene.

[0] https://www.imdb.com/title/tt0299977/?ref_=fn_al_tt_1

bshep · on June 24, 2021

In case anyone wants to watch a clip of the fight:

https://youtu.be/AeeoEpmyb2Y

fishtoaster · on June 24, 2021

_Hero_'s always been one of my favorites. A lot of kung fu movies try to strike a balance between aesthetics and realism - I really enjoy a movie that picks one (in this case the former) and goes all in on it. It's got a fight that takes place entirely on the surface of a lake, and another that takes place in a forest of falling leaves that change color several times throughout the scene. It's an incredibly beautiful movie.

barbazoo · on June 24, 2021

Both: Dance around for 10 minutes trying to out-physics each other

Guy 1: Why don't I just poke him with the pointy bit

Great scene though, makes me want to watch the whole movie.

qzw · on June 24, 2021

Yeah, this is strictly artistic kung fu, which is basically high-mortality ballet. There are also many "realistic" martial arts films, if that's your thing. I enjoy both styles, depending on mood.

the_af · on June 24, 2021

Agreed. Wuxia to be specific, which is kinda like fantasy kung fu and has a long tradition.

It includes powers like becoming weightless, killing with a single movement, flying, etc.

avaldes · on June 24, 2021

Like the battle between Sherlock and Moriarty in Sherlock Holmes: A Game of Shadows?

the_af · on June 24, 2021

Yes. A lot of Western action movies owe their inspiration to Chinese movies (and I suppose, vice versa). In this case Hero (2002) -- or a similar movie, since I doubt it invented this trope -- is likely an inspiration for A Game of Shadows (2011).

jasonwatkinspdx · on June 24, 2021

HK movies clearly inspired the action movies of the 90s.

the_af · on June 24, 2021

Yes, with some cases like John Woo even explicitly crossing over to Hollywood :)

A friend just reminded me the same "battle of the minds" trope is used in Takeshi Kitano's version of "Zatoichi": https://www.youtube.com/watch?v=SlVK7ogwyUI

ISL · on June 24, 2021

What a wonderful visual representation of the notion that, "a battle is won before it begins."

pbhjpbhj · on June 24, 2021

Which is presumably a Sun Tzu reference; he says don't enter a battle unless you have 'already' won (through preparation, numerical supremacy, etc.).

gautamcgoel · on June 24, 2021

Ugh, such a good movie... If you haven't seen it, do yourself a favor and go see it.

NetOpWibby · on June 24, 2021

This sounds amazing!

meepmorp · on June 24, 2021

No, but "Debugging on a computer that is down... in space," does sound more interesting, right?

You have a computer that you can only interact with over a radio link, and need to make it start working again with only what you know about how the system is built and a limited set of remote commands. Sounds like something I'd get obsessed with solving.

amelius · on June 24, 2021

There is a game in here, somewhere, somehow.

zomglings · on June 24, 2021

Paging Zachtronics (https://www.zachtronics.com/).

pbourke · on June 24, 2021

Speak for yourself

zepearl · on June 24, 2021

I did something like that when trying to boot my brand new root server in Finland a few weeks ago (tried ~50 times while having UEFI enabled plus mdadm raid1 on GPT partitions, never worked, asked support to disable UEFI, worked).

Confirming that not being able to ping/connect to it during the failed attempts was absolutely not exciting :)

only_as_i_fall · on June 24, 2021

I'll settle for the post-mortem

mkarr · on June 24, 2021

pg down

Sigh.

pg down

Sigh

...

Repeat for hours.

smileysteve · on June 24, 2021

especially with extremely long response times.

afterburner · on June 24, 2021

I think what you're really saying is you'd watch the movie version of this.

And considering there are no life or death stakes, it still wouldn't be as exciting as Apollo 13.

hungryforcodes · on June 24, 2021

That might not be true though.

This guy took about 30 hours of video of him porting an 80s version of unix to the ESP8266. Warts and all -- live!

I've started to watch it and it's fascinating!

https://www.youtube.com/watch?v=cDHcGY7EzUM&t=62s

You could have a whole channel with different teams debugging satellite technology and if you're bored, it would probably be quite interesting. The bigger problem is most likely concerns about IP and secret protocols and so on.

"And now Bob will log into TeleSat123 via SSH." <We see bob type in root / password123>

"Oups, uh..gosh we'll just go to a commercial break!"

stackbutterflow · on June 24, 2021

I guess the last thing you want when you're debugging something during your work is for the whole world to watch over your shoulder.

loufe · on June 25, 2021

I am halfway through the second season of For All Mankind, it's a series based on an accelerated (versus the decelerated) series of events around the time of the Apollo missions. I think you may enjoy it!

dehrmann · on June 24, 2021

I have no idea if it would be more or less exciting than a tech company warroom.

geoduck14 · on June 24, 2021

Me, earlier this week:

Did that work... no. Well, what about... THIS... still no. 3 hours later... clear the cache?!? Aww crap

caycep · on June 24, 2021

Someone is going to suggest unplugging it and plugging it in again, I'm sure

fouric · on June 24, 2021

I know that, at different times, NASA has used Forth[1] and Lisp[2] in some of their space applications. Both of these languages offer REPLs that generally accelerate the debugging process, and while your "average" Lisp might be unsuitable for hard real-time applications (due to the presence of a garbage collector, usually without the hard real-time constraints that you can get out of garbage collectors with extreme effort), I wonder if they have some equivalently interactive system on-board the Hubble.

> Most of Hubble's components have redundant back-ups, so once scientists figure out the specific component that's causing the computer problem, they can remotely switch over to its back-up part.

Wait, then why don't they just switch over each component in turn? The "divide and conquer" debugging strategy.

[1] https://www.forth.com/resources/space-applications/

[2] https://flownet.com/gat/jpl-lisp.html

etskinner · on June 24, 2021

> Wait, then why don't they just switch over each component in turn? The "divide and conquer" debugging strategy.

My guess would be that they want to try that method only if this debugging doesn't work. Imagine that there's an electrical issue in item 1 that fries item 2. If you switch over to item 2b, then you fry item 2b too!

This is exactly what happened with the Soviet Salyut 7 station. They tripped an over-current protection, didn't fix the root issue, and remotely turned the circuit back on. A series of electrical shorts then rendered the entire station without power, resulting in the need for one of the most daring station rescue stories of all time:

https://arstechnica.com/science/2014/09/the-little-known-sov...

voldacar · on June 24, 2021

Wow, that's an amazing story. Thanks for posting, I had no idea something like that ever took place.

dmckeon · on June 24, 2021

> Mission controllers, very tired now that the end of their 24-hour shift was approaching

Are shifts this long still common practice in US or RU space programs?

beerandt · on June 24, 2021

>"The rule of thumb is when something is working you don't change it," Hertz said. "We'd like to change as few things as possible when we bring Hubble back into service."

AnIdiotOnTheNet · on June 24, 2021

A lesson not taught to any modern software developer. Instead they change things all the time for no real reason other than that they want to change things.

35fbe7d3d5b9 · on June 24, 2021

One of the best senior engineers I worked with taught me how to run an outage. The most important thing? Stop what you are doing, take charge, and get everyone else to stop what they are doing.

The best case scenario of a bunch of engineers flailing about on a bridge turning knobs is that you luck into a fix but don't know how you got there. But you're more likely to make things worse.

boardwaalk · on June 24, 2021

Sounds like “locking the doors” (Space Shuttle disasters). Although, there was really not much to recover from there.

etskinner · on June 24, 2021

I hadn't heard of this before, chilling but cool: https://www.theguardian.com/world/2003/feb/13/columbia.space...

londons_explore · on June 24, 2021

In my experience, the best strategy depends a lot on the severity of the outage.

If all the alarms are going off because of a loss of redundancy, then currently there is no outage. The correct move should be carefully considered, and maybe tested in the sandbox environment.

If there is currently a 100% outage, it's best to go all out on trying every possible fix, because typically you'll restore service quicker that way. Sure, occasionally you dig yourself a deeper hole, but usually it's the best strategy.

CGamesPlay · on June 24, 2021

> If there is currently a 100% outage, it's best to go all out on trying every possible fix, because typically you'll restore service quicker that way. Sure, occasionally you dig yourself a deeper hole, but usually it's the best strategy.

Almost assuredly not. If a system hits a 100% outage, there are about to be a series of cascading failures by dependent systems. If you don't even understand which system is the root cause, all you're doing is testing a bunch of never-before-tried combinations in production and hoping something works.

ComputerGuru · on June 24, 2021

> If there is currently a 100% outage, it's best to go all out on trying every possible fix, because typically you'll restore service quicker that way. Sure, occasionally you dig yourself a deeper hole, but usually it's the best strategy.

Maybe. Once an outage has happened, an additional 30 minutes or hour of outage, depending 100% on the service in question, might be a bearable cost if it means preventing a domino effect of future issues caused by measures taken to restore the outage in a hurry.

sunstone · on June 25, 2021

I envy those innocent souls who have never heard of the concept of configuration management.

grumple · on June 24, 2021

This is not true. If something's working, you don't change it for no reason.

Business requirements and requests change all the time. 90% of our work is done in response to that. The other 10% is fixing up technical problems due to increased scaling or bugs found, and then basically never do we upgrade a system to keep up with security updates or change to a more modern tech.

AnIdiotOnTheNet · on June 24, 2021

> This is not true. If something's working, you don't change it for no reason

Sure, there's often a reason like "we wanted to write it in a different language" or "we've overhauled the UI to be slower and more cumbersome".

nonameiguess · on June 24, 2021

This highly depends. I believe what Grumple is talking about is contracted work for a specific customer that is asking you for something different, even though what you've already delivered works. You're talking about consumer-facing product teams that make whatever they want and try to sell it after the fact, coming to their own decisions regarding what their "requirements" should be. When your work is funded by a specific customer, the requirements are whatever that customer says they are. And they're almost never going to let you change something just because you wanted to in order to learn a technology you can put on your resume.

Nextgrid · on June 24, 2021

"we need to justify our front-end developers' and designers' jobs"

edgeform · on June 24, 2021

> Wait, then why don't they just switch over each component in turn? The "divide and conquer" debugging strategy.

Let's say the CPU is the actual issue, but the problem manifests itself in the memory module. You swap over to the backup memory module, and suddenly the problem vanishes!

Two months later, the problem manifests again. Identical presentation. This time, there is no backup to switch over to test.

You fly a Very Expensive Mission to the telescope only to find out the CPU was the issue, and if you had figured that out originally you wouldn't be up here with four memory modules.

nucleardog · on June 24, 2021

Let's say the memory is the actual issue, but it's manifested itself by data being corrupted and triggering undesired behaviour. Unfortunately, the corrupted state has already been written back to persistent storage.

So you swap in the backup storage module and all your problems go away. Until it happens again and corrupts _that_ too.

baryphonic · on June 24, 2021

> while your "average" Lisp might be unsuitable for hard real-time applications (due to the presence of a garbage collector, usually without the hard real-time constraints that you can get out of garbage collectors with extreme effort), I wonder if they have some equivalently interactive system on-board the Hubble.

This is fascinating to me. Do you have any pointers to information/research/projects focused on hard real-time garbage collection? A Lisp with hard real-time garbage collection (even if Herculean to implement) would be fantastic.

mietek · on June 24, 2021

How about a Lisp without the need for garbage collection at all?

http://web.archive.org/web/20020331165324/http://home.pipeli...

baryphonic · on June 24, 2021

This is also fascinating. Thank you! :)

retrac · on June 24, 2021

Ah, Rust's grandparent!

retrac · on June 24, 2021

I've seen at least one implementation that uses explicit regions; before doing something with say a bunch of cons operations, you allocate a region, all the evaluation is done in the region, it returns a result to the parent region, and the region is then manually dropped, freeing the space used. Set up another region for the next large evaluation and so on.

Almost C style, and I'm sure just as error prone, but it seems like it could work.

https://github.com/wolfgangj/bone-lisp
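
To make the region idea concrete, here is a minimal sketch in Python. It is a toy model under stated assumptions, not how bone-lisp itself is implemented: `Cell`, `Region`, `copy_into`, `squares_upto`, and `to_list` are all hypothetical names made up for illustration. Cons cells are allocated in a scratch region, the result is copied back to the parent, and the scratch region is then dropped in one step.

```python
class Cell:
    """A cons cell: a pair of (car, cdr)."""
    __slots__ = ("car", "cdr")

    def __init__(self, car, cdr):
        self.car = car
        self.cdr = cdr


class Region:
    """Hypothetical region: owns every cell allocated through it."""

    def __init__(self):
        self.cells = []
        self.alive = True

    def cons(self, car, cdr):
        if not self.alive:
            raise RuntimeError("allocation in a dropped region")
        cell = Cell(car, cdr)
        self.cells.append(cell)
        return cell

    def drop(self):
        """Release every cell at once; no tracing and no per-cell bookkeeping."""
        self.cells.clear()
        self.alive = False


def copy_into(region, value):
    """Copy a cons structure into `region` so it can outlive its old region."""
    if isinstance(value, Cell):
        return region.cons(copy_into(region, value.car),
                           copy_into(region, value.cdr))
    return value  # atoms (ints, None, ...) are passed through unchanged


def squares_upto(parent, n):
    """Run a cons-heavy evaluation in a scratch region; hand the result to parent."""
    scratch = Region()
    try:
        numbers = None
        for i in range(n, 0, -1):        # scratch list (1 2 ... n)
            numbers = scratch.cons(i, numbers)
        result = None
        node = numbers
        while node is not None:          # squares, built in reverse order
            result = scratch.cons(node.car * node.car, result)
            node = node.cdr
        return copy_into(parent, result)  # only the copied result survives
    finally:
        scratch.drop()                   # everything else is freed together


def to_list(cell):
    """Flatten a cons list into a Python list for inspection."""
    out = []
    while cell is not None:
        out.append(cell.car)
        cell = cell.cdr
    return out
```

The error-prone part shows up if `squares_upto` returned `result` directly instead of the copy: in a real implementation that would be a dangling pointer into freed memory, which this Python model cannot reproduce because Python keeps the objects alive.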

guenthert · on June 24, 2021

Not exactly real time (as in proven bounded response time), but a noteworthy effort: https://franz.com/services/conferences_seminars/jlugm00/conf...

astrange · on June 24, 2021

As long as there aren't cycles of course you can do deterministic GC, it's just reference counting. It also helps if the program is single-threaded since otherwise any memory allocation/freeing can be unpredictable (since it probably locks.)

baryphonic · on June 25, 2021

I've seen Prescheme (a subset of Scheme used to bootstrap a full-fledged dialect of Scheme) that seems to allow only stack-based allocation and therefore forbids certain types of closures that cannot be reduced via hoisting at compile time.[0]

Lisp with ref counting (assuming acyclic references) instead of GC could also be interesting to try to hack together. I have a feeling closures would be a particularly quick way to get reference cycles, so some concept of weak references may be necessary.

[0] https://thintz.com/resources/prescheme-documentation
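
A rough way to see the cycle problem and the weak-reference workaround, sketched in Python rather than Lisp. This assumes the standard CPython implementation, which reclaims objects by reference counting and only needs its separate cycle collector for cyclic garbage; `Node` and `freed_without_gc` are hypothetical names for illustration.

```python
import gc
import weakref


class Node:
    """Tiny cons-like cell; `other` may point at another Node."""

    def __init__(self, name):
        self.name = name
        self.other = None


def freed_without_gc(make_cycle, use_weak):
    """Return True if node `a` is reclaimed by reference counting alone.

    Assumes CPython: the cycle collector is switched off for the duration,
    so only reference counts decide when an object dies.
    """
    gc.disable()
    try:
        a, b = Node("a"), Node("b")
        a.other = b                      # a -> b (strong)
        if make_cycle:
            # b -> a closes the loop, either strongly or weakly
            b.other = weakref.ref(a) if use_weak else a
        probe = weakref.ref(a)           # observes `a` without keeping it alive
        del a, b                         # drop the only external references
        return probe() is None
    finally:
        gc.enable()
        gc.collect()                     # clean up any leaked cycle afterwards
```

With no cycle, or with a weak back-reference, the node is freed as soon as the last strong reference goes away. With a strong cycle it lingers until a tracing collector runs, which is exactly the non-determinism a hard real-time system would want to avoid.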

skylanh · on June 24, 2021

Another comment keyed onto a concrete example of why not, so I'll go with a presumptive reason:

Some of those elements will be part of the major service windows, and have expected operational and standby lifetimes.

So if a component with two elements has a service window of 10 years, and each element contributes to meeting that service window, then you've bumped your major service window from 10 years to a significant factor less than that.

e.g. the expected use profile might be: use element 1 for 6 years or 60% of service, switch to element 2 for 4 years, replace both during 10 year maintenance window. Interrupting that by bringing element 2 up reduces that window and contingency plans if the service window cannot be met.

I don't know, and I'm just talking out my you-know-what.

rtkwe · on June 24, 2021

> Wait, then why don't they just switch over each component in turn? The "divide and conquer" debugging strategy.

It's better to understand the problem than to just start changing stuff hoping you find the right thing, even systematically. There's not a huge rush to fix this since it's the payload computer and the telescope is still being maintained by other systems. A lot of bad things could happen: if the switching system is flaky you could get stuck in a bad state, or if there are a number of faults you might damage one of the backups. Without the shuttle there's no plan to service it any more, so why take the risk of rushing through to the most simplistic debugging method?

rurban · on June 24, 2021

Forth yes, lisp not. Lisp was only used on ground to simulate the rover.

Also a repl in space only makes sense in earth orbit, but not farther away, with 8-20min waiting time for a packet roundtrip to Mars. Those machines really need proper and faster decision making (AI, think lots of `if` statements and proper modeling) on board, e.g. to perform landing or docking maneuvers. Or to detect and work around radiation damage in its own circuits.

dchristian · on June 25, 2021

You might check on how New Horizons was implemented. They initially wanted to fly LISP, but (I think) they eventually used the LISP to generate C++. This was to allow dynamic, model based re-planning when the craft was in the outer reaches of the solar system.

dylan604 · on June 24, 2021

What if your backup is failing in the exact same way?

Koshkin · on June 24, 2021

That would probably mean that the whole thing just isn't there anymore.

dylan604 · on June 24, 2021

Or much more likely the same component was used as a back up, and is failing in a similar fashion. It's obvious the thing is still there.

belter · on June 24, 2021

They have two computers:

- a flight computer, originally a DF-224
- the Science Instrument Control and Data Handling (SI C&DH) unit

Initially, the DF-224 got a coprocessor installed between missions: https://asd.gsfc.nasa.gov/archive/hubble/a_pdf/news/facts/Co...

During another servicing mission they replaced it with something called the Advanced Computer, with an Intel 80486: https://asd.gsfc.nasa.gov/archive/hubble/a_pdf/news/facts/FS...

It looks like it's about 50,000 lines of code in the C and Assembly programming languages.

https://www.nasa.gov/pdf/327688main_09_SM4_Media_Guide_rev1....

Fig 5-10 is the Data Management Subsystem

https://asd.gsfc.nasa.gov/archive/sm3a/downloads/sm3a_media_...

belter · on June 24, 2021

There is also a Help Desk but it's probably busy right now...

"Welcome to the Hubble Space Telescope Help Desk"

https://stsci.service-now.com/hst?id=hst_index

tyingq · on June 24, 2021

From a NASA post 2 days ago[1]:

"The operations team is investigating whether the Standard Interface (STINT) hardware, which bridges communications between the computer’s Central Processing Module (CPM) and other components, or the CPM itself is responsible for the issue. The team is currently designing tests that will be run in the next few days to attempt to further isolate the problem and identify a potential solution."

So "can't figure out" sounds more like "haven't yet figured out", but they have remaining ideas to play through.

[1] https://www.nasa.gov/feature/goddard/2021/operations-underwa...

nashashmi · on June 24, 2021

> Most of Hubble's components have redundant back-ups, so once scientists figure out the specific component that's causing the computer problem, they can remotely switch over to its back-up part.

Of course they do! I wonder if they ever had to put another part out of service. I also wonder whether the twin of the part could also suffer the same failure at the same time without being used.

mikeytown2 · on June 24, 2021

Gyroscopes are in short supply on Hubble currently

wongarsu · on June 24, 2021

And that despite regular servicing back when we still had the Space Shuttle.

Hubble started out with 6 gyroscopes, in 1996 they replaced four of them, by 1999 four had failed so they replaced all six, by 2009 three had failed again, so they replaced all six. Now they are again down to three, and one of the remaining ones has a defect that required some workarounds. The last three gyros are at least a new design that should last a bit longer.

pbhjpbhj · on June 24, 2021

It sounds like a specific failure of gyros, what characteristic causes that failure? Are they more susceptible to cosmic rays or something? Do you know how they've mitigated that failure?

dangrossman · on June 24, 2021

This webpage describes Hubble's gyros and the reason some of the earlier ones failed: https://esahubble.org/about/general/gyroscopes/

Animats · on June 24, 2021

NASA has a Satellite Servicing Capabilities Office developing on-site robotic servicing capabilities for the Hubble and other large satellites. This is connected to the On Orbit Servicing, Assembly, and Manufacturing program at NASA Goddard. They've been working toward this for a decade.[1]

No part of that effort has actually repaired anything in space.

[1] https://nexis.gsfc.nasa.gov/osam/index.html

chias · on June 24, 2021

I can't figure out what's causing computer issues on my desk. I'm not even in space.

ssully · on June 24, 2021

Responding to this comment while debugging a CI/CD pipeline failure for the last hour. I'll toast to the NASA engineers with my cup of coffee.

testingcodehere · on June 24, 2021

Have you tried turning it off and turning it on again?

mcc1ane · on June 24, 2021

everything's in space

sharkweek · on June 24, 2021

“It’s all in your head, but so is everything.”

slver · on June 24, 2021

Technically the universe is projected into our brain and we only perceive that projection. The problem is that it's a very shitty projection.

Koshkin · on June 24, 2021

Good enough not to miss the bowl.

snowwrestler · on June 24, 2021

“Everything in space is trying to kill you. And everything is in space.”

ruined · on June 24, 2021

have you tried turning it off and then on again

chias · on June 24, 2021

Yup! Nevertheless, it remains a macbook ;)

dave_sid · on June 24, 2021

This thing? https://static.wikia.nocookie.net/villains/images/0/07/Tumbl...

scoutt · on June 24, 2021

> At first NASA scientists wondered if a "degrading memory module" on Hubble was to blame.

Funny enough, nobody posted the link to the article that says "70% of bugs are memory issues" (or something like that) yet.

lamontcg · on June 24, 2021

70% of all security fixes Microsoft releases are memory safety bugs.

https://news.hitb.org/content/microsoft-70-percent-all-secur...

This isn't a security issue, NASA isn't Microsoft, and physically degraded memory isn't the same as a memory safety programming bug.

I'll certainly bet that article is super popular with the rust crowd though.

steveklabnik · on June 24, 2021

That article specifically is popular with the Rust crowd, yes, but moreover, roughly that number (70%-80%) has been replicated by multiple big tech companies, not just Microsoft. Chrome was another large name.

(And yes you're right none of this has to do with this stuff, for sure.)

TwoBit · on June 24, 2021

"memory issue" seems overly broad or vague.

Black101 · on June 24, 2021

There's no way that's true... maybe 70% of bugs that crash your computer though.

jpeter · on June 24, 2021

Sounds like the plot of the three body problem

hacker_homie · on June 24, 2021

Tri-Solaris hacked the telescope and they're covering it up?

behnamoh · on June 24, 2021

Off-topic but it reminded me of that story about using LISP for debugging a spacecraft remotely from the Earth:

https://baltazaar.wordpress.com/2009/07/20/a-story-about-lis...

ortusdux · on June 24, 2021

Time to bust out the tinkertoys!

https://www.nasa.gov/feature/goddard/hubble-memorable-moment...

desktopninja · on June 24, 2021

I wonder if NASA would welcome a live worldwide collaboration to try to solve this problem.

lamontcg · on June 24, 2021

The probably 5 people in the world who are domain experts in the Hubble's control system don't really need a hundred backseat drivers, that won't help any.

And as someone who has been invited into "war rooms" by managers who do the "you're smart so of course you can help these other smart people stuck on a hard problem" there's a real skill to being able to read the room and know when to back the fuck out of it or just shut the fuck up -- which most intellectuals don't have. Sit and listen for awhile and take it in. Then maybe take your best idea and ask a very toned down question. If the person who seems to be leading the troubleshooting instructs you on why that's wrong, throws in 3 neighboring ideas that also don't work, with 5 reasons you haven't considered for why that's the entirely wrong path, then just nod in agreement and be quiet and see what you can learn from the domain experts.

Peppering that team with a dozen outside "experts" is going to be useless because they'll just start getting really defensive after awhile, and even if someone winds up throwing out the right solution they'll probably reflexively reject it.

OTOH if that team ASKS for someone who has expertise the team lacks and needs, then go assemble a team skilled at the use of cellphones and the internet to hunt that person down and drag them into the conversation.