Hacker News | feznyng's comments

The dose makes the poison.

Even caffeine has an LD50!

Did you get a chance to evaluate coding performance?


Yes, nothing to write home about. It's all relative of course: stack, goal and approach all affect which models perform best. But for regular day-to-day coding, I do not find it usable given the alternatives.

Kimi, MiniMax and GLM models provide far more robust coding assistance, sometimes at no cost (financed via data sharing) or very cheaply. Output quality, tool-calling reliability and task adherence tend to be far better across all three than with Mercury 2, so if you consider the end-to-end time to get usable code, including reviews, manual fixes, different prompting attempts, etc., you'll be faster.

The only "coding" task where I have found a place for Mercury 2 is a browser desktop with simple generated applets. Think artefacts/canvas output, but via a search field if the applet has been generated previously.

With other models, I need to hide the load behind a splash screen, but with Mercury 2 it is so fast that it can feel frictionless. The demo at this point is limited by the fact that venturing beyond a simple calculator or todo list, the output becomes unpredictable and I struggle to get Mercury 2 to rely on pre-made components, etc. to ensure consistent appearance and a11y.

Despite the benchmarks, cost and speed figures suggesting something different, I have had the best overall results with Haiku 4.5, simply because GPT-5.4-nano is still unwilling to play nice with my approach to UI components. I am currently experimenting with some routing, using different models for different complexity levels and showing loading spinners only for certain models. But even if that works reliably, any model that I cannot force to rely on UI components in a consistent manner isn't gonna work, so for the time being it'd just route between less expensive and more expensive Anthropic models.
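
For anyone wondering what that kind of routing might look like, here is a minimal sketch. Everything in it is an assumption for illustration: the model names are placeholders, the word-count heuristic and the threshold are stand-ins for whatever complexity signal you actually trust, and none of it reflects the setup described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str          # placeholder name, not a real API identifier
    show_spinner: bool  # whether the UI should hide latency behind a spinner


# Hypothetical tiers: a fast, cheaper model and a slower, more capable one.
CHEAP = Route(model="cheap-model", show_spinner=False)
EXPENSIVE = Route(model="expensive-model", show_spinner=True)


def estimate_complexity(prompt: str) -> int:
    """Crude stand-in heuristic: the number of words in the request."""
    return len(prompt.split())


def pick_route(prompt: str, threshold: int = 12) -> Route:
    """Send long requests to the expensive tier, everything else to the cheap one."""
    return EXPENSIVE if estimate_complexity(prompt) > threshold else CHEAP


if __name__ == "__main__":
    print(pick_route("simple calculator"))
```

The spinner flag travels with the route so the UI only shows a loading state for the slower tier.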

Coding-wise, one more exception can be in-line suggestions, though I have no way to fairly compare that because the tab models I know about (like Cursor's) are not available via API. Still, Mercury 2 seems to perform solidly there, at least in Zed for a TS code base.

Basically, whether code or anything else, unless your task is truly latency dependent, I believe there are better options out there. If it is, Mercury 2 can enable some amazing things.


This is cool stuff, have you considered submitting any of these exploits to https://hackmyclaw.com/? Email being the only allowed injection vector might be tricky though.


Thanks!

I did try hackmyclaw (not extensively) but had no success. The challenge is a complete black box and the user intent (e.g., "summarize my emails") is not known, which is critical for the prompt injection payload. I also suspect that batch processing of "malicious" emails (every 3 hours) adds a bias to the model behaviour (a lot of potential and detected prompt injection payloads are injected into the context). That's why I always start my experiments with a fresh context. Moreover, "hacking" the VPS is not allowed.

Imho the author should disclose more info about the setup (version, user intent, exact config) to make it more realistic. I read people saying "OpenClaw is secure against prompt injection" because nobody was able to solve the challenge. It's not.


You could maybe rearchitect the backend to be provider-agnostic and sell this as a native client for users of Discord, Matrix, Zulip, Stoat, etc. That way you can capture a larger market and you're fine even if Discord kicks you out for ToS violations. Although, I suspect there's a bunch of complexity in papering over each platform especially when you factor in voice/streaming.


> I think maybe there's a mid-ground with buy forever, 1 year updates, so people get the product they paid for, and if they want updates or support the development they can re-buy, however I'm yet to hear opinions on this model.

As far as desktop software is concerned, I think this is a commonly accepted approach. Sublime Text is probably the most notable example.


Can you treat kqueue as infallible? I've found FSEvents sometimes drops events at high volume (unless I'm misunderstanding how to use it).


That's the exciting thing about macOS development, right? There's always some funny riddles because Apple doesn't care to document stuff for 3rd parties (or at all).


Could still be useful; maybe for overnight async workloads? Tell your agent research xyz at night and wake up to a report.


Assuming 1 token per second and "overnight" being 12 hours, that's 43 200 tokens. I'm not sure what you can meaningfully achieve with that.
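
The arithmetic, as a quick sketch you can rerun with other rates (the function name is just for illustration):

```python
def overnight_tokens(tokens_per_second: float, hours: float) -> int:
    """Total tokens generated at a constant rate over the given number of hours."""
    return int(tokens_per_second * hours * 3600)


if __name__ == "__main__":
    print(overnight_tokens(1, 12))  # 43200
```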


Sure, but if long-term throughput is a real limitation there's plenty of ways to address that while still not needing to keep anywhere close to all model weights in RAM (which is still the conventional approach with MoE). So the gain of a smaller memory footprint is quite real.


How? There's a bunch of annoying problems here:

- Where do you source real time traffic data, ferry schedules, etc? Google APIs get you part of the way there but you'd need to crawl public transit sites for the rest.

- How do you keep track of what went into the fridge, what was consumed/thrown away?

- How do you track real world events like buying a physical pass?


Feeding everything into a secure local environment with intelligence injected and then push things to your phone.

Oh wait. That might be a little insecure!

Hmm.


The issue is that it isn't secure: the more things you have it hooked up to, the more havoc it can cause. The environment being locked down doesn't help when you're giving it access to potentially destructive actions. And once you remove those actions, you've neutered it.


The openclaw security model is the equivalent of running as root - i.e. full access. If that is insecure the inverse of it is running without any access as default and adding the things that you need.

This is pretty much standard security 101.

We don't need to reinvent the wheel.
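
As a rough illustration of "no access by default, add what you need", here is a minimal sketch. The `Agent` class and the action names are hypothetical and have nothing to do with how OpenClaw actually works.

```python
class PermissionDenied(Exception):
    """Raised when the agent attempts an action it was never granted."""


class Agent:
    def __init__(self, granted=()):
        # Default deny: with no arguments the agent can do nothing.
        self._granted = frozenset(granted)

    def perform(self, action: str) -> str:
        if action not in self._granted:
            raise PermissionDenied(action)
        return f"performed {action}"


if __name__ == "__main__":
    agent = Agent(granted=["read_calendar"])  # hypothetical action name
    print(agent.perform("read_calendar"))
    try:
        agent.perform("delete_files")
    except PermissionDenied as exc:
        print(f"denied: {exc}")
```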


The unsolved security challenge is how to give one of these agents access to private data while also enabling other features that could potentially leak data to an attacker (see the lethal trifecta.)

That's the product people want - they want to use a Claw with the ability to execute arbitrary code and also give it access to their private data.


How do you make your win32 app look good to the average person?


Depends what you mean by "look good".

The main function of the app being discussed here is to draw solid black rectangles on the screen.

Don't forget that the "average person" (I'm assuming someone relying on software as a tool) doesn't care about the stuff "designers" seem to obsess over, and will actively hate it if you break their workflow by doing things like adding useless padding that makes them scroll more or shows less information in the name of "modernity". There's a lot of specialized niche software for various industries, often very expensive too, which looks like it came out in the early 90s. As long as it works well, users won't complain.


Oh, how I hate it when vendors bring "modern web" aesthetics to desktop utility programs. For example, Docker Desktop could go a long way in terms of usability if it just stuck to Win32 common controls, the kind of buttons, labels and list views that have been around since Windows 95. Maybe I wouldn't even have to wait 10 seconds for the main window to show up every time.


There's a pretty simple settings window: https://github.com/domenic/display-blackout?tab=readme-ov-fi...

Would that UI be hard to accomplish?


You mean conceptually or to match it? Native components are pretty much impossible to match without actually using the native framework which provides them, so you need WinUI/WPF.

Win32 provides its own components which are basically Win95 style apps, and you can draw the components using some graphics APIs by yourself.

The whole native development area is a mess exactly because making your own (decent) renderer is a huge undertaking.


Agreed. The Qt framework, which is a cross-platform UI framework, does a decent job mimicking the native Win32 looks. Inside, the code is a giant mess. But on the outside, the API is very well thought out and easy to use.


But you are making a false equivalence: the Win32 GUI API is decades behind modern UIs. I can use Flutter and make a pixel-perfect equivalent of the above UI in an hour, with the exact same responsiveness behavior on both Windows tablets and desktop, and it scales perfectly on high-DPI displays. 3 hours if you want the toggle animation timing to be exactly the same.

I came from the WinForms world so don't pretend I don't understand Win32 programming. The fault lies with Microsoft for not investing in it more.


You talk like that is a bad thing. Win32 UI works, is fast, and works everywhere, even on ancient 640x480 server screens, in safe mode and over VNC in 16 colors, without OpenGL, DirectX, ANGLE or Vulkan.

Flutter is nicer to scale and maybe design but it is a massive overhead. Skia still has trouble with some drivers and causes lag or falls back to software rasterization. Hot replacement while coding is pretty neat though. It runs much better on mobile devices imho.


It works, and fast, but it is not portable. I would argue something like Qt is much more viable in $current_year for cross-platform development. Or if you're really dead-set on actual native components, then I guess wxWidgets works too.


I'd rather tell Linux and Mac users to use WINE.


WinForms is Win32. There is just a managed API wrapper around it.


The functionality of that is not hard at all. A few checkboxes, a trackbar, and a hotkey control (there is actually a standard Win32 control for this: https://learn.microsoft.com/en-us/windows/win32/controls/hot... ), with "pushlike" checkboxes at the top to be drawn replicating the monitor layout.

But that "modern" style is... disgusting and repulsive. That whole dialog is bigger than one of my monitors due to how much wasted space it has.


My favourite example of "Modern" style is the toggle switch, shown even in that image. I laugh most times I see one. It's the 'replacement' for the checkbox, but it's so awful at telegraphing its current state in a consistent way (the entire purpose of the control!) that it has to have a label indicating whether it's on or off. I find it so absurd that people genuinely put this stuff into their programs and have no problem with it, because apparently we are just supposed to accept this type of poorly designed component because it's more "Modern".


But think of the poor users expecting consistency with their phones where confusion is expected!


I used to work in finance. Screens very densely packed with text is the preferred user interface.

We did a UI refresh at one point. It looked much nicer. People hated it. We had to hastily redesign it again and it looked far more like the original.


If your application saves me time (is intuitive) or enables me to do tasks that I couldn't do before (is powerful) then I don't care one whit what it looks like. As long as it doesn't actively hurt my eyes to stare at you can do whatever you want.


Sure, if I'm building something for myself or fellow hobbyists this approach works (though in that case I'd prefer a good TUI/CLI). But if you're building an app for the average person, how it looks has a big effect on whether they choose it over an alternative.


It's funny, the "modern" look has become a countersignal for me. If the app looks like a webpage, I instinctively don't want it. Not because of aesthetics, but simply because I've come to associate that style of appearance with a lack of (or awkward) keyboard shortcuts, featuresets dumbed down to a level appropriate for chimps, various nags injecting friction against getting work done (ads, feature tours, logins, update reminders, etc) and laggy, resource-squandering performance thanks to some kind of bloated rendering framework like Electron with multiple V8 hosting processes sprawled across chrome.exe instances or whatever.

Case in point, the Dropbox Simplified Desktop App was a huge improvement for me. It nails just about everything I ever needed their app to do, and removes all the user-hostile fluff I never asked for. Similarly, I found Windows 11 Enterprise IoT LTSC to offer an improved desktop experience compared to traditional Windows, thanks to its exclusion of a lot of the cruft Microsoft otherwise shoves down the throats of users who, as far as I can tell from frank discussions with many of them, likewise actively don't want it.

I'm not saying your desire to make your app look polished means it's crap, but beauty is in the eye of the beholder. Just like fashion, I wouldn't be surprised if we see a shift in the aesthetics trend as more people discover a retro feel sometimes signals a better user experience.


Programmers and designers thinking the average person is a moron is one of the two reasons almost no good software is written today.


What's the other reason?


That most programmers are not that great at programming, and wouldn't be able to produce high quality software even if that was their stated goal.


They really don't. Though if Microsoft wanted to, they could solve that too. For example, the OS control panel used to be extendable. Technically still is, just not the new one (and of course both remain). Then you could have this UI or something very similar to it right in there.

Several integration points like this have been removed, supposedly because third party software was just too bad in how it used them, causing issues. Or at least so goes Microsoft's perspective. Personally, I find that very believable! If the only kind of integration point one can imagine is regularly calling into random third party code that the user has installed, smack bang in the middle of critical user flows, on the same thread and in the same process as itself, that's exactly the kind of grief I'd expect to occur...

If only there was a way to craft e.g. custom flyout menus for the taskbar or custom pages in the Settings app, without invoking arbitrary third party code and possibly causing crashes and hangs in system apps and menus... or just not letting crashes and hangs affect the application (e.g. Windows Explorer) calling them in the first place.


Disable borders and design your app nicely with images to replace standard user input elements.


That sounds like a great way to make a mess. Look at Microsoft's own apps shunning proper File dialogs and instead presenting a giant, bizarre pane of mostly text and a few crudely-drawn boxes in order to save a file. You have no idea what you're looking at or where you are in the file system.

Then there's the removal of title bars from Windows. You often have no idea what app you're looking at. Pull up a PDF in Acrobat and also in Edge. Now, at a glance, which is which?

Regressive garbage.


> removal of title bars

they did what??? [i'm still on 10 because 11 won't run on my laptop </3]


Yes. It's an absolute mess.


God, i finally understand why people hate on windows xD

