Hacker News
Now Blocking 56,037,235 IP Addresses, and Counting (cheapskatesguide.org)
43 points by devinjon on Jan 5, 2024 | 59 comments


>It means that more than one percent of the IPv4 real estate on the Internet (and probably much more) is occupied by people and organizations who are either clueless or just do not care how much the rest of us are paying to keep our websites on line

There's a significant mental leap here: "I block these IPs to conserve my resources, therefore they belong to clueless or malicious organisations". It's wrong in both directions:

* I don't think Google, Bing and other crawlers are inherently malicious, and certainly not clueless. Search engines serve a very important role in the internet. Ditto archive.org, and probably dozens of other bots.

* IP-based blocklists work well for honest bots (not malicious, or at least not illegal). Malicious bot operators just buy SIM cards and use regular mobile internet for the crawling (basically unblockable, because the IP may be renewed every day or every hour). And the really malicious actors use residential proxies, i.e. botnets that proxy traffic through normal users' computers. Anyway, I wonder how many of those 56MM IP addresses are regular dynamically allocated addresses from consumer-grade ISPs.

>1-5-2024

For the love of all that is holy, what is this date format.


"the IP may be renewed every day or every hour)"

What a coincidence! Right now, from my phone on my carrier's network while traveling in the UK, I am unable to reach https://cheapskatesguide.org. My phone's IP address is likely on this guy's blacklist of 56M addresses. So I am forever going to remember that whatever service this website provides may be arbitrarily unavailable unless I'm on a known good IP address. Overly aggressive blacklists like this are lame, make you lose customers, and go against the Internet's fundamental principle, which is about the ability to exchange information with anyone in the world.


I will also add that the mentality of choosing to allow or deny access to your service for millions of IP addresses at a time, based on some ill-defined rule or on the service operator's whim, is EXACTLY the same mentality that led us to the current difficulty of running self-hosted email/outgoing SMTP servers.

"Oh because your dynamic IP address was assigned to and used by shady guys 2 years ago, now you cannot access my website, or all your emails will be flagged as spam."

Don't do that, people.


Not everyone on the internet is trying to get customers though.


I don't think the IP blocking refers to Google, Bing etc. The site has a robots.txt that allows for these. Of course if robots.txt isn't honored, an IP block is in order.

For-profit corporations aren't inherently malicious or benevolent. They are legal structures that maximize return on capital without moral judgement or care about what it does to others. A bit like Cthulhu.

https://cheapskatesguide.org/robots.txt


Corporations are legal entities incapable of any action. The actions carried out in their name are performed by people who very well can be malicious.


Seeing month-day-year is one of my biggest pet peeves. Why not just use day/month/year or year/month/day, in a logical order?


Month: 1-12 Day: 1-31 Year: unbounded

I come from a dd/mm/yyyy country, but you can’t say mm/dd/yyyy is completely illogical.


mm/dd/yyyy is the most common format in the US. When speaking, I might say December 5, 2023, and this format matches that.

dd/mm/yyyy isn't objectively better IMO. It's what you're used to, making it easier for you to read. It's not what I'm used to, making it harder for me to read.

yyyy-mm-dd is the best format because it's sortable and unambiguous and standard: https://en.wikipedia.org/wiki/ISO_8601
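To illustrate the "sortable" claim, here is a minimal Python sketch. The three dates are made up for illustration; the only point is that plain string sorting agrees with chronological order for yyyy-mm-dd and not for mm-dd-yyyy.

```python
from datetime import date

# Hypothetical sample dates, chosen only for illustration.
dates = [date(2024, 1, 5), date(2023, 12, 5), date(2023, 2, 11)]

iso = [d.isoformat() for d in dates]           # yyyy-mm-dd
us = [d.strftime("%m-%d-%Y") for d in dates]   # mm-dd-yyyy

# Sort the strings as plain text, with no date parsing.
iso_sorted = sorted(iso)
us_sorted = sorted(us)

# Reference order: sort the real date objects, then format them.
chronological = [d.isoformat() for d in sorted(dates)]

print(iso_sorted == chronological)  # ISO strings sort chronologically
print(us_sorted)                    # month-first strings group by month instead
```

With the month first, the January 2024 entry sorts ahead of both 2023 entries.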

yyyy/mm/dd isn't standard or commonly used AFAIK, so best to avoid it.

mm-dd-yyyy also isn't standard or commonly used AFAIK, so best to avoid it.

https://xkcd.com/1179/


dd.mm.yyyy is better because it’s in an order, but that’s beside the point. If lots of native English speakers (Americans) use one way, and lots (U.K. and bros) use another, the only logical thing to do is use the unambiguous one (yyyy.mm.dd).

People who use mm.dd.yyyy in English text with no indication that they’re American, writing for Americans, have no place on this planet. Joking not joking.


>dd.mm.yyyy is better because it’s in an order,

DD/MM/YYYY vs MM/DD/YYYY each have pros and cons. I don't think we can objectively say 1 is better than the other. I agree YYYY-MM-DD is best. The standard is YYYY-MM-DD, not YYYY.MM.DD, so the dashed version is better than the dot version.

>People who use mm.dd.yyyy in English text with no indication that they’re American, writing for Americans, have no place on this planet. Joking not joking.

The same would apply to dd.mm.yyyy or dd/mm/yyyy with people who don't indicate what country they're from. A few other countries use DD/MM/YYYY:

https://en.wikipedia.org/wiki/Date_format_by_country


> the dashed version is better than the dot version

:) yeah I'm fine with either. Just typed with dots out of habit.

> The same would apply to dd.mm.yyyy or dd/mm/yyyy with people who don't indicate what country they're from. A few other countries use DD/MM/YYYY

I thought maybe that was a typo and you meant "A few other countries use MM/DD/YYYY"? But then I looked on that page and only saw a few places that use a variety. Usually when a country uses multiple standards for anything, it's a sign that one of them is "token". For example English is one of two official languages in India:

https://en.wikipedia.org/wiki/India

English would be widely spoken there, but it's widely spoken everywhere without being an official language.

So if we take out countries that say "yeah whatever, we'll do MM/DD/YYYY too", that leaves America ;)


Oops, it was a typo, I did mean "A few other countries use MM/DD/YYYY".

According to the page, the Philippines, Panama, the Federated States of Micronesia, and the Marshall Islands use it as the primary format.

And Ghana and Togo use MM/DD/YYYY as the primary format in Ewe.

According to [1], Canada's official format is YYYY-MM-DD; and both MM/DD/YYYY and DD/MM/YYYY will lead to confusion.

[1] https://en.wikipedia.org/wiki/Date_and_time_notation_in_Cana...


Don't bots also get covertly installed on regular folks' machines? And those machines will be running from domestic ISP IP address blocks, which are commonly shared/cycled between the ISP's customers. Block those and you are blocking legit customers.


> Block those and you are blocking legit customers.

The block doesn't need to be permanent. There are people out there publishing lists of IPs known to belong to botnets, and they're regularly updated. You can ban an IP for, say, 72 hours, and update your ipset regularly.
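A rough sketch of the temporary-ban idea in Python, for illustration only. `TempBanList` is a hypothetical in-memory helper, not part of ipset or any firewall; the 72-hour default is the figure suggested above, and the addresses in the usage lines are documentation-only examples.

```python
import time

BAN_SECONDS = 72 * 3600  # the 72-hour figure suggested in the comment


class TempBanList:
    """Hypothetical in-memory ban list whose entries expire on their own."""

    def __init__(self, ttl=BAN_SECONDS):
        self.ttl = ttl
        self._expires = {}  # ip string -> unix time at which the ban ends

    def ban(self, ip, now=None):
        now = time.time() if now is None else now
        self._expires[ip] = now + self.ttl

    def is_banned(self, ip, now=None):
        now = time.time() if now is None else now
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if now >= expiry:
            del self._expires[ip]  # ban has lapsed; forget the address
            return False
        return True


bans = TempBanList()
bans.ban("203.0.113.7", now=0)
print(bans.is_banned("203.0.113.7", now=3600))       # one hour in: still banned
print(bans.is_banned("203.0.113.7", now=72 * 3600))  # 72 hours in: ban expired
```

The `now` parameter exists only so the expiry logic can be exercised without waiting; a real setup would let the firewall handle the timeout.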

But anyway I've got a philosophical question...

If a customer has its computer owned by a botnet operator and that computer connects to a banking website, is the customer legit?


Well you'd need to know if the customer or the bot is connecting. Both are on the same IP which was my point. Rationally I'd want to block any compromised device regardless of the customer, but it's a complex problem for sure.


>>1-5-2024

>For the love of all that is holy, what is this date format.

Most commonly used in the US. 'Cause we gotta show it like we say it. I think.


With dashes? We almost always use slashes here in the US. The only time I see dashes is for ISO format, and that's year-first.


Sure. It was the dominant written style throughout the 20th cent and folks kept on with it.


Yes. It’s less common but looks unremarkable to me.


I see mm/dd/yyyy much more often, though I did see mm-dd-yyyy or mm-dd-yy a fair amount in government and military work.

See https://eforms.state.gov/Forms/ds5507.PDF for example, page two. Though again, it's not as common as slashes.


No, you have it confused with 1/5/2024. We don't use dashes with this order.


I've seen dashes. Much less common but I have seen it used and I live in NYC.


People sometimes get used to dashes in dates when / is an illegal character in a filename.


After reading his three-part, multi-month series about how he can't set up a firewall, I don't think this guy is someone who should be providing any useful information on how to use the internet (or anything attached to it).


Now I read it too (well, skimmed).

>> -A INPUT -j DROP

Yep, that's all you need to know about this guy.


"I imagine that if this article makes its way onto Hacker News, I will be criticized."


I think it’s a healthy thing to read others’ opinions, even if you disagree with them. HN does a good job of keeping the trolls out.

Edit: FWIW I think you’re fighting the good fight and I wish I could implement something similar at work, where I constantly have bots trying to log in to my corporate VPN.


> I think it’s a healthy thing to read others’ opinions, even if you disagree with them. HN does a good job of keeping the trolls out.

I like to think world class nits are picked here but you're right - this is one of the more rational neighborhoods.

> I wish I could implement something similar at work, where I constantly have bots trying to log in to my corporate VPN.

I kludged some bash together to grep the logs and give me IP hit-counts and network owners. I next struggled to make decisions about the IPs - due to 1) the prevalence of residential proxies and 2) learning one of our widgets refreshes 3x/min.
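The hit-count part of that approach might look like the Python sketch below (swapped in for the bash the commenter describes, which is not shown). It assumes the client address is the first whitespace-separated field of each log line, as in the common access log format; the sample lines are invented and use documentation-only addresses. The network-owner lookup is left out.

```python
from collections import Counter


def ip_hit_counts(log_lines):
    """Count requests per client address.

    Assumes the address is the first whitespace-separated field,
    which holds for common/combined access logs but not every format.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return counts


# Made-up log lines, for illustration only.
sample = [
    '203.0.113.7 - - [05/Jan/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.7 - - [05/Jan/2024:10:00:20 +0000] "GET / HTTP/1.1" 200 512',
    '198.51.100.9 - - [05/Jan/2024:10:00:21 +0000] "GET /login HTTP/1.1" 403 0',
]

# Busiest addresses first.
top = ip_hit_counts(sample).most_common()
for ip, hits in top:
    print(ip, hits)
```

As the comment notes, raw counts alone do not say whether an address is hostile: a widget that refreshes three times a minute will top the list too.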


Using http://nginx.org/r/deny is a very inefficient way to block a large number of IPs/nets. It's mentioned right in the documentation:

> In case of a lot of rules, the use of the ngx_http_geo_module module variables is preferable.


> I don't care.

Just do your own thing, learn from it, and repeat. That's all we can really want in our projects, on our limited lease on planet Earth. Kudos that you found something to work on.


There are some real bad actors behind IP blocks, and hosting providers that have no problem hosting them and take no action on abuse reports. Referrer spamming, searching for vulnerabilities (some with a very big list of URLs to try), misbehaving crawlers, or just plain DoS are some of the ways they may hurt sites, especially the ones serving dynamic content. This address space is usually fixed and used by servers or VPN exit points. Blocking all the blocks associated with their autonomous systems would avoid putting a lot of /24s in the rules.

But then there are residential IP blocks, especially those with frequently changing dynamic IPs or NATed ISPs. Some people in those blocks may show hostile or clueless behaviour; some machines may be used as proxies because of malware or because their owners intentionally installed a residential proxy agent. There you may be blocking legitimate visitors: if a few clients of some ISP are very active, you may end up blocking a lot of innocent people. And in this case too, you can target the IP blocks of the autonomous system if you feel that you only get bad traffic from there.

But in the end, it is your site. You are free to block what you consider to be bad neighbourhoods.
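On the point about replacing many /24 rules with fewer, larger blocks: a small sketch using Python's standard `ipaddress` module. The four networks are an invented example from private address space, not taken from the site's list, and this only merges adjacent ranges; it does not look up autonomous systems.

```python
import ipaddress

# Four adjacent /24s from private address space (illustrative only).
rules = [ipaddress.ip_network(f"10.20.{i}.0/24") for i in range(4)]

# collapse_addresses merges adjacent or overlapping networks
# into the smallest equivalent set of CIDR blocks.
merged = list(ipaddress.collapse_addresses(rules))

print(merged)  # a single /22 covering all four /24s
print(ipaddress.ip_address("10.20.2.15") in merged[0])
```

The trade-off raised in the comment still applies: a wider block is cheaper to match but sweeps in every address in the range, wanted or not.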


What is the issue they are trying to solve?

It seems to be a static site. Bots should cause only a negligible amount of traffic per month. My guess would be less than $1.

And aren't there free CDNs for static sites these days? I guess you can just push the whole frontend data (html+assets) into a public git repo, put it behind a GitHub page with a custom domain, and call it a day?


> What is the issue they are trying to solve?

Apparently, he self-hosts on a Raspberry Pi 3B+. I guess you need to block half the internet if your server has about the same performance as a decade-old smartphone.

It's charming in a way, like someone who daily-drives a classic car.

https://cheapskatesguide.org/articles/self-hosters-nightmare...


> After a loss of power, my routers and web server must be restarted in a specific sequence for my home network to function properly.

There's a certain charm to Rube Goldberg machines as well.


LOL, the first comment of that article.

"To worldofmathew on Hacker News,

While I do sincerely appreciate the fact that you have taken the time to post several cheapskatesguide articles to Hacker News, this may have caused them to designate you as a blog spammer and block your posts. In other words, they may think that you own this site and are posting your own articles. May I suggest that you contact a Hacker News moderator and set him straight before you post another cheapskatesguide article?"


Just because there are things that we can do about symptoms of a problem doesn't change the fact that the problem itself might bother some of us.

For instance, it bothers me that certain networks quite literally do nothing about malicious actors attempting intrusion from those networks. Abuse complaints are ignored. Some people might say, "just run blocklistd", "use a non-standard port", et cetera, but the real issue is that people shouldn't be allowed to attempt intrusion, and when I send an email to a netblock's contacts with 1,000 login attempts, they should remove the accounts linked to that attempted intrusion.

They don't, so I block whole netblocks based on the lack of response and action. Should I also run blocklistd? Sure, and I do, but that's orthogonal to blocking.

So while the person running this site could do things differently and/or better, for certain definitions of better, they're doing what works for them.

Is reporting spam going to make a change in the world? No, but it FEELS better to do it, and it does affect my mailbox. Likewise, is blocking IPs making a change in the world? No, but it probably makes Cheapskate feel more in control of their server.


    when I send an email to a netblock's contacts with 1,000
    login attempts, they should remove the accounts linked
    to that attempted intrusion.
What proof can you give them that those login attempts really took place?


Logs.

If you're seriously suggesting that others shouldn't care because I could be making them up, then I suppose there's always vigilantism. After all, it can't be illegal if other people can't prove they didn't make up their own logs.

Are logs not real, like some people think birds and the Moon landings aren't real?


No, this craziness(!) here suggests that it's not a static site:

> "I added hyphens to the opening and closing PHP statements to prevent my web server from interpreting them as code."

It could be a static site, and one could also automate the job that this person spends their precious time doing every day, copying and pasting text (IP addresses) that matches a pattern from one place to another. But they seem content, so...


If they read this: You can use &lt; instead of < to solve the "parsed as code" issue.

Anyhow, it does not matter how they produce the frontend html+assets. They can still take it all and push it somewhere where it is hosted for cheap or free. "wget -r 127.0.0.1" and a few more lines of code should be all that is needed to automate it.


The approach is wrong on so many levels that I'm wondering if it's just rage clickbait.


I block as well with geoblocking and based on source behavior.

As long as you understand the limitations, ramifications, and futility in doing this, I have no problems with it as one of the many tactics to defend your footprint and de-noise your logs, etc.

It's a never ending endeavor and you will see the shifting attack sources from the baddies along with the games on stub blocks, prefix broker IP block swaps, and more.

Just know there is automation and horsepower behind that entire attack infrastructure that you can't possibly compete with, but maybe you can mitigate it with the limited time and resources you have, and that will be enough to get you through.


Could it be that the slight delay between opening this page and my browser receiving the first bytes is nginx checking these 50 million IPs? How is this delay so small if there are really 50 million deny statements?

Is there a reason why they don't use a firewall?


>How is this delay so small if there are really 50 million deny statements?

More a testament to the years of optimization nginx has undergone.

>Is there a reason why they don't use a firewall?

Some other comment said they have a custom error page for blocked requests, although I think you could still have the firewall rewrite the destination port and have a listener that serves the custom access-denied page there.


It's a crazy way of staying online, but it does work, therefore it's not crazy.


Someone should tell this guy about bogons so he can block 500 million more IPs.

And if you want to do the same? For the love of god get a firewall and subscribe to some RBLs like a sane person.


I wouldn’t put something on the public internet without geoblocking China, Russia, and the UAE. You should too! Stop their bad behavior by removing them from the internet.


This would harm the governments so bad


Blocking bots is always going to be an uphill battle. But if the owner is worried about wasting meagre resources, why not serve static HTML files instead of running a PHP server for a simple blog?


well he should directly block everything and unblock those one by one. this would be more impressive to me :>


> I imagine that if this article makes its way onto Hacker News, I will be criticized.

> Maybe they will call me naive or compare me to Don Quixote fighting windmills.

> Maybe they will call me stupid or paranoid for not using some centralized block list.

> Maybe they will object to how I characterize those who employ web-crawling robots.

Ha. He is so wrong. We're going to criticize his nginx config and his use of PHP.

I mean yeah. We'll probably get after that other stuff too but still.


PHP on hackernews also caught my attention.


I can't read this, I'm blocked


Indeed, for whatever reason I appear to be blocked as well. Never even heard of this site, and I am not a bot. Just caught in the crossfire I guess.

¯\_(ツ)_/¯


> It means that more than one percent of the IPv4 real estate on the Internet (and probably much more) is occupied by people and organizations who are either clueless or just do not care how much the rest of us are paying to keep our websites on line.

Oh, tell me, how much? The whopping $5/month? Oh, maybe this is a high load WordPress/like CMS running on LAMP stack... so $8/month?

> I wrote the following small PHP script to search though my Nginx configuration file and tally up the number of IP addresses that I am blocking.

Holy shit. Blocking bots through nginx configuration, more so, blocking 56M addresses through nginx configuration...
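For a sense of what such a tally involves, here is a rough Python stand-in (the article's script is PHP and is not reproduced here). It assumes the config uses one `deny <address-or-CIDR>;` directive per line, sums the IPv4 addresses each one covers, and double counts any overlapping ranges. The sample config is invented and uses documentation-only addresses. The last line checks the "more than one percent" claim against the headline figure of 56,037,235.

```python
import ipaddress
import re

# Matches lines such as "deny 203.0.113.0/24;" and skips "deny all;".
DENY_RE = re.compile(r"^\s*deny\s+([0-9./]+)\s*;")


def count_denied_ipv4(config_text):
    """Sum the IPv4 addresses covered by deny directives.

    Simplification: overlapping ranges are counted more than once.
    """
    total = 0
    for line in config_text.splitlines():
        match = DENY_RE.match(line)
        if match:
            net = ipaddress.ip_network(match.group(1), strict=False)
            total += net.num_addresses
    return total


# Invented sample config, for illustration only.
sample = """
deny 203.0.113.0/24;
deny 198.51.100.7;
deny all;
"""

blocked = count_denied_ipv4(sample)  # one /24 plus one single address
share = 56_037_235 / 2**32 * 100     # headline figure as a % of all IPv4 space

print(blocked)
print(f"{share:.2f}%")
```

The percentage is taken over the full 2^32 address space, including reserved ranges, so the share of publicly routable addresses would be somewhat higher.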

Okay, for those of you who never did the thing or have no idea:

Just use the firewall (most of the time it is built into your OS), use some way to tell the firewall about the 'offenders' (e.g. fail2ban, though there are other options), and don't ever block something indefinitely. It's totally meaningless; just use timeouts.

If some Bob got his computer infected in 2015 and that computer tried to access /wp-admin.php, then there is absolutely no reason to assume that in 2024 the IP address Bob's computer had in 2015 is still 'malicious'.

Automated activity like scans, brute forcing and so on is all about opportunity. They are searching for easy opportunities to exploit, and scanning a server that actively blocks you, even for 30 minutes at a time, is just pointless. There are way, way more opportunities in other places than wasting ~4 weeks trying to scan this server.

> I have custom 403 and 404 error pages that explain to those who may care why they are being blocked and how to regain access to the website

https://cheapskatesguide.org/custom404really.html


Maybe the custom "blocked" 403/404 error page only shows up if your IP is blocked?


They could run a web server (or listener/virtual host) on a separate port and use a firewall to rewrite the destination port for "blocked" requests.


Nobody owns their IP; banning IPs today means banning innocent people tomorrow.

You should time out IPs for a few days instead.



