@adri avatar
UTC

Atypical Canadian
2009 Vespa S50(LX150 motor swap), 2006 Vespa GTS250ie
Joined: UTC
Posts: 2319
Location: Toronto, Canada
 
jess wrote:
There's no BBCode processing on attachment titles. Emoticons are not expected to work there.
Guess that's a feature request then? lol
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
adri wrote:
Guess that's a feature request then? lol
Unlikely.
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
jess wrote:
I have some ideas. Let's see which of us is smarter -- the bots or the humans.
I've tried out a few ideas, which are already live on the site. And I can certainly filter out some of the inauthentic traffic. But quite a bit of it remains, and it's really well disguised.

So, day 1: the bots are winning.
@besupa avatar
UTC

Hooked
GTS 300 HPE (2020); V-Strom 650 XT (2019)
Joined: UTC
Posts: 184
Location: SF Bay Area, California
 
jess wrote:
So, day 1: the bots are winning.
Would it be fair to say that it's the load on the database that's the problem, or are you looking for an overall solution to bad actors?

If you don't mind me asking, since you're using CloudFront elsewhere, could that be used to deal with some of this? It looks like the live site is separate from the assets (static), but a strategy for database load might be to join them a little. I think either an Apache-first or CF-first could work. It wouldn't really be fixing the problem directly, but may be piling sandbags more efficiently.
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
besupa wrote:
Would it be fair to say that it's the load on the database that's the problem, or are you looking for an overall solution to bad actors?

If you don't mind me asking, since you're using CloudFront elsewhere, could that be used to deal with some of this? It looks like the live site is separate from the assets (static), but a strategy for database load might be to join them a little. I think either an Apache-first or CF-first could work. It wouldn't really be fixing the problem directly, but may be piling sandbags more efficiently.
Using CloudFront for all of the static assets is an obvious win -- it takes the load off the main server, and moving everything to edge locations makes it faster for everyone to get the data.

CloudFront is not nearly as obvious a win for dynamic content, though. I know it's possible, but... I would really prefer that the main server be the only source for the actual posts and topics themselves.

I've spent years optimizing the living daylights out of the original forum software, and generally it's acceptable. The main server is only lightly loaded, and the database (which is a separate server) is similarly unchallenged. Both are running on burstable classes of hardware (T4g, specifically), and that's generally been sufficient for most of the use cases.

The main traffic problems in terms of potential overwhelming traffic loads are generally the high-speed scrapers, which we can (mostly) detect and thwart before they do too much damage. We see them show up frequently, but they don't get very far.

So really, I've got no pressing reason for this bot-hunting adventure. Except... well, I'm a bit paranoid. I can see that someone is using residential VPN addresses to slowly and methodically scrape the site, and that worries me. I don't know whether they intend to publish an imposter site, or whether it's some kind of litigation-hungry compliance service, or...? I just really don't know. But it makes me queasy.

Also, it's a technical challenge, and now that I'm retired, I need the mental exercise of solving these kinds of puzzles.
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
There are three "official" session classes on Modern Vespa: registered users, known crawlers (e.g. Google), and anonymous users. In theory, the anonymous class should be mostly humans who just haven't registered accounts. In practice, though, I am guessing that more than 75% of the anonymous traffic is inauthentic. By that I mean, not actually human.

This statistic keeps me up at night.
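As an illustration of how the three session classes might be told apart, here is a minimal Python sketch. It assumes a request object carrying the client IP, the User-Agent, and a user ID when someone is logged in (all hypothetical names). For the crawler class it uses the forward-confirmed reverse DNS check that Google documents for verifying Googlebot; the post doesn't say how Modern Vespa actually recognizes crawlers. The DNS functions are injectable so the logic can be exercised without network access.

```python
import socket
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Request:
    ip: str
    user_agent: str
    user_id: Optional[int] = None  # hypothetical: set only for logged-in members

def is_verified_googlebot(
    ip: str,
    reverse: Callable = socket.gethostbyaddr,
    forward: Callable = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS: IP -> hostname -> IPs must round-trip."""
    try:
        hostname = reverse(ip)[0]
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        # gethostbyname_ex returns (name, aliases, addresses); IPv4 only.
        return ip in forward(hostname)[2]
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return False

def classify(req: Request, verify: Callable[[str], bool] = is_verified_googlebot) -> str:
    if req.user_id is not None:
        return "registered"
    # Only trust a crawler User-Agent if the IP really belongs to the crawler.
    if "googlebot" in req.user_agent.lower() and verify(req.ip):
        return "crawler"
    return "anonymous"
```

Everything that falls through to "anonymous" is the bucket the 75% guess above is about.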
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
In the continuing saga of Operation FdaBots, I've been watching the logs and doing a lot of cross-referencing of IP addresses. I can see a persistent pattern of requests, but they're nearly impossible to prevent. A single request, every 10-20 seconds, from a different residential IP address each time. Never any additional navigation around the site, only the one request. Oh, and always claiming to be referred by a Google search.

Blocking these would be suicide for the site. And yet, I can't escape the idea that these are inauthentic requests.

I actually put a novel defense in place this afternoon to try to slow down this specific kind of request, and it did... SFA. Either the request is genuinely from a human, or the bots are smart enough to navigate through the additional requirements.

Either way, on day 2, the score is: Bots 2, jess 0.
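Spotting the pattern described above (one request per IP, a claimed Google referrer, roughly 10 to 20 seconds apart) in parsed logs could look like this. It's a minimal sketch assuming the access log has already been parsed into records with the field names shown, all of which are hypothetical. Finding these requests doesn't make blocking them any safer, since a genuine search visitor produces exactly the same single-request pattern.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse

@dataclass
class LogEntry:
    ts: float      # request time, seconds
    ip: str
    path: str
    referer: str   # empty string when absent

def one_shot_google_hits(entries: List[LogEntry]) -> List[LogEntry]:
    """Requests from IPs seen exactly once whose Referer claims a Google search."""
    per_ip = Counter(e.ip for e in entries)
    hits = [
        e for e in entries
        if per_ip[e.ip] == 1 and "google." in urlparse(e.referer).netloc
    ]
    return sorted(hits, key=lambda e: e.ts)

def gaps(hits: List[LogEntry]) -> List[float]:
    """Seconds between consecutive one-shot hits, to check the 10-20 s cadence."""
    return [b.ts - a.ts for a, b in zip(hits, hits[1:])]
```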
@adri avatar
UTC

Atypical Canadian
2009 Vespa S50(LX150 motor swap), 2006 Vespa GTS250ie
Joined: UTC
Posts: 2319
Location: Toronto, Canada
 
jess wrote:
Unlikely.
Too seldom used?
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
adri wrote:
Too seldom used?
Seldom used, and also part of an archaic attachment system that is unlikely to survive the year before being replaced with something slimmer.

Also, generally speaking, text that isn't in the message body (i.e. posts) doesn't get BBCode formatting. Topic titles don't get formatting. Your list of scooters doesn't get formatting. The one exception besides posts is signatures.

I also don't even want to think about what happens when someone includes an inline image or a YouTube video in an attachment comment.

And finally, BBCode formatting comes at a processing cost. The code is surprisingly gnarly, and performs a ton of operations. Consider this MV-exclusive feature, where a link to a topic:
https://modernvespa.com/forum/topic187082
Turns into a shorthand BBCode:
[topic187082]
And then is displayed as a fully-formed topic link inline in the post:

[NSR] What's Pissing you off Today? III

All that is done as part of the BBCode processing.
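The two-step topic-link transformation can be sketched with regular expressions. This is a hypothetical reconstruction, not the forum's code: the function names and link markup are illustrative, and the title lookup is a plain callable standing in for what would presumably be a database query. A full BBCode pipeline would normally escape the rest of the post as well; only the title is escaped here.

```python
import html
import re
from typing import Callable, Optional

# Full topic URLs as pasted by members, e.g. https://modernvespa.com/forum/topic187082
TOPIC_URL = re.compile(r"https?://(?:www\.)?modernvespa\.com/forum/topic(\d+)\b")
# The shorthand form kept in the post source.
TOPIC_TAG = re.compile(r"\[topic(\d+)\]")

def shorten_topic_links(text: str) -> str:
    """Step 1 (on save): rewrite full topic URLs into [topicNNN] shorthand."""
    return TOPIC_URL.sub(lambda m: f"[topic{m.group(1)}]", text)

def render_topic_tags(text: str, lookup_title: Callable[[int], Optional[str]]) -> str:
    """Step 2 (on render): expand [topicNNN] into a link showing the topic title."""
    def expand(m: "re.Match[str]") -> str:
        topic_id = int(m.group(1))
        title = lookup_title(topic_id)
        if title is None:  # unknown or deleted topic: leave the tag untouched
            return m.group(0)
        return f'<a href="/forum/topic{topic_id}">{html.escape(title)}</a>'
    return TOPIC_TAG.sub(expand, text)
```

Even this toy version needs a regex pass plus a title lookup per link, which hints at why doing it on every page view adds up.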

The original forum software re-rendered every single post for every single topic on every single topic page view, which was a nightmare for server load. I've added caching of the post-processed HTML, which sacrifices database storage space to achieve fast page loads. Signatures are similarly rendered to HTML only once (or whenever you edit your sig) and kept in the database both as the original string and the post-processed HTML.

The complexity level of all of this extra plumbing is high. I do not especially want to extend that mechanism to another field, especially given how little use that field gets.
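The render-once idea described above can be sketched as a store that keeps both the original source and the rendered HTML, re-rendering only when a post is saved or edited. The `render` callable stands in for the BBCode processor; class and method names are hypothetical, and an in-memory dict takes the place of the database table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class StoredPost:
    source: str  # original BBCode, kept so the post can be edited
    html: str    # pre-rendered HTML, served on every page view

class PostStore:
    def __init__(self, render: Callable[[str], str]):
        self._render = render
        self._posts: Dict[int, StoredPost] = {}

    def save(self, post_id: int, source: str) -> None:
        """Called on create or edit: the only place BBCode rendering happens."""
        self._posts[post_id] = StoredPost(source, self._render(source))

    def page_html(self, post_ids: List[int]) -> List[str]:
        """Topic page view: read cached HTML only, no rendering on the read path."""
        return [self._posts[pid].html for pid in post_ids]

    def source_for_edit(self, post_id: int) -> str:
        return self._posts[post_id].source
```

The trade-off is the one in the post: every post is stored twice, in exchange for rendering once per edit instead of once per view.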
@jimc avatar
UTC

Moderaptor
The Hornet (GT200, aka Love Bug) and 'Dimples' - a GTS 300
Joined: UTC
Posts: 44337
Location: Pleasant Hill, CA
 
I have a feature request - having entered a search, and received the results, I'd like to post a link to that search, so that others could see the same results.
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
jimc wrote:
I have a feature request - having entered a search, and received the results, I'd like to post a link to that search, so that others could see the same results.
Agreed this would be a useful feature. I'm not entirely sure how hard it would be to implement. I'll have to stew on it.
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
I spent the day watching the logs again. It's about as much fun as watching paint dry, if there were a chance the paint would catch fire suddenly without warning. You'll just be sitting there, zoning out, watching the slow and steady crawl of text on the screen, when all of a sudden BAM BAM! BAM! BAM BAM BAM1!1!! and 100 requests go by in 3 seconds, usually from some shady network operator in SE Asia.

I've made a little bit of progress on Operation FdaBots, but not against the bots, per se. I've managed to identify some of the anonymous traffic that (a) makes a single request at a time at odd intervals, and (b) doesn't read like a human. It turns out that Google has a fleet of "prefetch proxy" servers whose sole job it is to fetch the content of search result links that might be clicked by a user.

They do this from a proxy server in order to preserve the privacy of whoever is searching. No cookies are sent, and in fact they won't do the prefetch if the user has any cookies to the site in question. So these particular requests are still very much anonymous, and not entirely human, but probably genuine nonetheless.

I managed to track down the header values that Google adds to these prefetch requests, which ostensibly tell the site that it's a prefetch. There's very little I can do in real time to validate that it is in fact Google, though -- mostly, I just have to take their word for it, and it's probably spoofable.

In any case, once I subtract out these prefetch proxy requests, I'm left with slightly fewer requests that seem suspicious -- but only slightly.
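Subtracting the prefetch traffic might look like the sketch below. The post doesn't name the headers it checks; this assumes the value Chrome's documentation describes for its private prefetch proxy, `Sec-Purpose: prefetch;anonymous-client-ip`, which is worth verifying against current docs. Function names are hypothetical, and because it's only a header, this filters requests rather than authenticating them.

```python
from typing import Iterable, List, Mapping

def is_google_prefetch(headers: Mapping[str, str]) -> bool:
    """True if the request claims to come through Chrome's private prefetch proxy.

    Assumption: the proxy sends `Sec-Purpose: prefetch;anonymous-client-ip`.
    Anyone can send this header, so it is a filter, not proof.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    tokens = [t.strip().lower() for t in lowered.get("sec-purpose", "").split(";")]
    return "prefetch" in tokens and "anonymous-client-ip" in tokens

def remaining_suspects(requests: Iterable[Mapping[str, str]]) -> List[Mapping[str, str]]:
    """Drop claimed prefetch-proxy requests, leaving the ones still worth a look."""
    return [h for h in requests if not is_google_prefetch(h)]
```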

So, let's see... Day 3: Bots 4, jess 1.
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
One of the obvious ways to thwart some of the mysteriously non-human anonymous requests would be to check the incoming IP address against a list of known-problem-proxy IPs -- in particular, IP addresses that are acting as a VPN egress point but are in a residential setting. I honestly don't care if people use VPNs, but the legitimate ones generally egress from a data center. VPNs that egress from a residential IP seem (to me) quite a bit shadier.

The problem with checking IP addresses in real time, though, is that it adds a lot of latency to every anonymous request. Or, at least, the first anonymous request we see from that IP address. Attempting to do reverse-DNS (to find the hostname) or using an IP proxy identification service would add tens or hundreds of milliseconds to the first request. For reference, I am pretty alarmed when any request on the site takes more than about 50ms to fulfill, and I'd prefer something below 30ms.

As an experiment, though, I am considering doing a post-processing step on these mysterious one-off anonymous requests to gather some information and see what percentage of them are egressing from residential VPN IP addresses. That might at least give me some sense of how big a problem I actually have.
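An offline post-processing pass like the one described could be sketched as a batch job that reverse-resolves each one-off IP in parallel and tallies rough host types, so lookup latency never touches a live request. The hostname hints are hypothetical heuristics, and the sketch only answers "residential or data center?"; telling whether a residential IP is acting as a VPN egress point would need a proxy-detection service, which isn't shown.

```python
import socket
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional

# Hypothetical hostname hints; a real study would use an IP-intelligence feed.
DATACENTER_HINTS = ("compute", "cloud", "vps", "server")
RESIDENTIAL_HINTS = ("dsl", "cable", "dynamic", "broadband", "pool", "dhcp")

def reverse_dns(ip: str) -> Optional[str]:
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # no PTR record, timeout, etc.
        return None

def classify_host(hostname: Optional[str]) -> str:
    if not hostname:
        return "unknown"
    h = hostname.lower()
    if any(hint in h for hint in DATACENTER_HINTS):
        return "datacenter"
    if any(hint in h for hint in RESIDENTIAL_HINTS):
        return "residential"
    return "unknown"

def summarize(
    ips: Iterable[str],
    resolve: Callable[[str], Optional[str]] = reverse_dns,
    workers: int = 16,
) -> Dict[str, int]:
    """Batch job: runs after the fact, so slow lookups never add request latency."""
    unique = sorted(set(ips))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hostnames = list(pool.map(resolve, unique))
    return dict(Counter(classify_host(h) for h in hostnames))
```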
@adri avatar
UTC

Atypical Canadian
2009 Vespa S50(LX150 motor swap), 2006 Vespa GTS250ie
Joined: UTC
Posts: 2319
Location: Toronto, Canada
 
jess wrote:
Seldom used, and also part of an archaic attachment system that is unlikely to survive the year before being replaced with something slimmer.

Also, generally speaking, text that isn't in the message body (i.e. posts) doesn't get BBCode formatting. Topic titles don't get formatting. Your list of scooters doesn't get formatting. The one exception besides posts is signatures.

I also don't even want to think about what happens when someone includes an inline image or a YouTube video in an attachment comment.

And finally, BBCode formatting comes at a processing cost. The code is surprisingly gnarly, and performs a ton of operations. Consider this MV-exclusive feature, where a link to a topic:
https://modernvespa.com/forum/topic187082
Turns into a shorthand BBCode:
[topic187082]
And then is displayed as a fully-formed topic link inline in the post:

[NSR] What's Pissing you off Today? III

All that is done as part of the BBCode processing.

The original forum software re-rendered every single post for every single topic on every single topic page view, which was a nightmare for server load. I've added caching of the post-processed HTML, which sacrifices database storage space to achieve fast page loads. Signatures are similarly rendered to HTML only once (or whenever you edit your sig) and kept in the database both as the original string and the post-processed HTML.

The complexity level of all of this extra plumbing is high. I do not especially want to extend that mechanism to another field, especially given how little use that field gets.
Damn. Forget I even mentioned it
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
The latest threat actor that seems intent on attacking Modern Vespa is originating from Alibaba's data center, in the following CIDR ranges:

47.76.0.0/17
47.76.128.0/17
47.102.0.0/15
47.116.128.0/17

They are using a large number of IP addresses to avoid detection, but the requests are all grouped together within the same 1-2 second span, and all in similar IP ranges, so it's kind of obvious.
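Checking a client against the four ranges listed above is straightforward with Python's standard `ipaddress` module. This is a minimal sketch; the post doesn't say where the real check runs (application, web server, or firewall).

```python
import ipaddress

# The four Alibaba ranges listed in the post.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("47.76.0.0/17", "47.76.128.0/17", "47.102.0.0/15", "47.116.128.0/17")
]

def is_blocked(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    # Compare only like with like: an IPv6 client never matches an IPv4 range.
    return any(addr.version == net.version and addr in net for net in BLOCKED_NETWORKS)

# The two /17s are adjacent halves of one /16, so the list collapses to three entries.
COLLAPSED = list(ipaddress.collapse_addresses(BLOCKED_NETWORKS))
```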

They are attempting to access (and being rejected from) a bunch of resources that are explicitly forbidden by robots.txt, including the login page, the PM page, the posting page, and so on. Clearly whatever entity is using Alibaba's hosting is not a legitimate crawler -- not even close.

Fuck those guys.
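Since these requests target paths that robots.txt disallows, one cheap signal is to check requested paths against the site's own robots.txt with the standard library's `urllib.robotparser`. The robots.txt below is a hypothetical stand-in: the post says the login, PM, and posting pages are disallowed but doesn't quote the file, so the exact paths are guesses.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; only "posting.php" is confirmed by the logs in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/login.php
Disallow: /forum/privmsg.php
Disallow: /forum/posting.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def disallowed_requests(paths: Iterable[str], user_agent: str = "*") -> List[str]:
    """Return the requested paths that robots.txt forbids for this user agent."""
    return [p for p in paths if not parser.can_fetch(user_agent, p)]
```

A well-behaved crawler should never produce a non-empty result here, so any client that does is a reasonable candidate for closer scrutiny.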
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
Here's a snippet from the logs, showing the Alibaba datacenter request. The action they are attempting (the "posting.php" part) is definitely NOT allowed for unregistered users, and the fact that they tried the same sequence twice in a row tells me that they know fuck-all about what they are doing.
Forum member supplied image with no explanatory text
@pigletpilot avatar
UTC

Molto Verboso
Gina, 1965 Vespa 180SS, Bella,1968 Vespa 150 Super, Mia, 2017 Vespa Primavera 70th Anniversary 150ie, Gabriella, 2017 GTS300 ABS
Joined: UTC
Posts: 1931
Location: Hamilton/Kirikiriroa, NZ
 
For the first time ever that I have seen, we are showing as having 0 guests on the forum. Is this a problem or an odd coincidence?
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
pigletpilot wrote:
For the first time ever that I have seen, we are showing as having 0 guests on the forum. Is this a problem or an odd coincidence?
It's a bug. Working on it.
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
pigletpilot wrote:
For the first time ever that I have seen, we are showing as having 0 guests on the forum. Is this a problem or an odd coincidence?
Should be fixed now. Thanks for the tip!

And for finding (and most importantly reporting) this forum bug, I hereby bestow upon you the MV Entomologist Award. Wear it with pride.
@berto avatar
UTC

Molto Verboso
2006 LX150 (carbed) | 2007 GT200
Joined: UTC
Posts: 1954
Location: Toronto
 
jess wrote:
Here's a snippet from the logs, showing the Alibaba datacenter request. The action they are attempting (the "posting.php" part) is definitely NOT allowed for unregistered users, and the fact that they tried the same sequence twice in a row tells me that they know fuck-all about what they are doing.
Are these the "Blocked Bots" showing up in the page statistics section?

Looking there, as a couple of suggestions: add number formatting to make the large user counts more readable? Update timeseries heading from "recent activity" to "activity over past X hours"?
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
berto wrote:
Are these the "Blocked Bots" showing up in the page statistics section?
Yes.
berto wrote:
Looking there, as a couple of suggestions: add number formatting to make the large user counts more readable? Update timeseries heading from "recent activity" to "activity over past X hours"?
Done!
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
In the never-ending quest to understand which "guests" are human and which are bots, I've started to take a closer look at the origins of the guests who show up, ask for only one (or sometimes two) pages, don't read like humans, and then never make another request.

I get these requests from all over the world, and the usual suspects are represented: China, Russia, and India. And some in the US as well.

What's surprising is how many of these are coming from the UK. What's more surprising is that most of these random one-off guests from the UK appear to be originating from residential (i.e. not data center or VPN) hosts -- Virgin Media, BTCentral, and Sky Broadband seem to reappear quite a bit, each time at a fresh IP address.

Hmmmm.
@shebalba avatar
UTC

Molto Verboso
2009 GTS250, Ducati Monster M900, KTM 390 Adventure, Honda CR125
Joined: UTC
Posts: 1695
Location: Oceanside, CA
 
jess wrote:
What's surprising is how many of these are coming from the UK. What's more surprising is that most of these random one-off guests from the UK appear to be originating from residential (i.e. not data center or VPN) hosts --
Hmmmm.
Tisk tisk.. Bill Dog... Is there something you want to tell us?
@jimc avatar
UTC

Moderaptor
The Hornet (GT200, aka Love Bug) and 'Dimples' - a GTS 300
Joined: UTC
Posts: 44337
Location: Pleasant Hill, CA
 
jess wrote:
What's surprising is how many of these are coming from the UK. What's more surprising is that most of these random one-off guests from the UK appear to be originating from residential (i.e. not data center or VPN) hosts -- Virgin Media, BTCentral, and Sky Broadband seem to reappear quite a bit, each time at a fresh IP address.

Hmmmm.
VM, BT, and Sky have the vast majority of broadband customers in the UK, over 75% between them, so that's not surprising. I'll add, the more clueful customers won't touch any of those with a bargepole.
Forum member supplied image with no explanatory text
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
jimc wrote:
VM, BT, and Sky have the vast majority of broadband customers in the UK, over 75% between them, so that's not surprising. I'll add, the more clueful customers won't touch any of those with a bargepole.
It's not the specific providers that are surprising, though -- it's that there are so many hits that don't register as human coming from those providers.
@jimc avatar
UTC

Moderaptor
The Hornet (GT200, aka Love Bug) and 'Dimples' - a GTS 300
Joined: UTC
Posts: 44337
Location: Pleasant Hill, CA
 
jess wrote:
It's not the specific providers that are surprising, though -- it's that there are so many hits that don't register as human coming from those providers.
It could be that they (along with Talktalk and Vodafone) have nearly all of the more clueless customers, whose systems are oh-so-easily compromised (with default passwords etc etc) so are the favourites for more nefarious entities to hitch a ride on.

The remaining ISPs are more niche, and tend to have the more technically minded customers, who may run their own servers etc.
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
Putting aside the UK queries for a moment -- and to be clear, they aren't overwhelming, they're just a drip-drip-drip in the background -- today's real attack vector appears to be originating out of Singapore.

Singapore shows up as a source of nuisance crawling from time to time. Today, I am getting overwhelmed by requests. Enough that I've had to put an emergency stopgap into place to keep the server from melting down.

Guest clients arriving from Singapore (and a handful of other countries in the region) who immediately ask for a topic page (which is the usual target) will be asked to verify they are human in a very, very rudimentary (i.e. not very secure) fashion.

Once they declare themselves human, we leave the session alone and let them browse in peace.
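The gate described above can be sketched as a small per-session state machine: a guest from a listed country whose very first request is a topic page gets challenged, keeps getting challenged until they answer, and is then left alone. Everything here is hypothetical, including the assumption that a GeoIP country code is available on the session and the topic-URL pattern; the challenge page itself isn't shown.

```python
import re
from dataclasses import dataclass

# Singapore is named in the post; the "handful of other countries in the region"
# are not, so they are deliberately left out of this hypothetical set.
CHALLENGE_COUNTRIES = {"SG"}
TOPIC_PAGE = re.compile(r"^/forum/topic\d+")  # assumed topic URL shape

@dataclass
class GuestSession:
    country: str               # ISO code from a GeoIP lookup (assumed available)
    requests: int = 0
    challenged: bool = False
    declared_human: bool = False

def next_action(session: GuestSession, path: str) -> str:
    """Return 'serve' or 'challenge' for a guest request."""
    first = session.requests == 0
    session.requests += 1
    if session.declared_human:
        return "serve"  # verified sessions are left alone
    if session.challenged or (
        first and session.country in CHALLENGE_COUNTRIES and TOPIC_PAGE.match(path)
    ):
        session.challenged = True  # keep challenging until the guest answers
        return "challenge"
    return "serve"

def declare_human(session: GuestSession) -> None:
    """Called when the guest passes the (rudimentary) are-you-human page."""
    session.declared_human = True
```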

So far, it seems to be keeping them at bay -- I'm still getting tons of requests, but they never make it past the "are you a human" page.

Fingers crossed.
Forum member supplied image with no explanatory text
@berto avatar
UTC

Molto Verboso
2006 LX150 (carbed) | 2007 GT200
Joined: UTC
Posts: 1954
Location: Toronto
 
jess wrote:
Yes.
Am I a blocked bot?!

I was catching up on Favorite threads, opening several (say 10) to new tabs in quick succession. After the first few, they all returned 403 Forbidden. Seems I got locked out?

I'm not from Singapore, if that matters
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
berto wrote:
Am I a blocked bot?!

I was catching up on Favorite threads, opening several (say 10) to new tabs in quick succession. After the first few, they all returned 403 Forbidden. Seems I got locked out?

I'm not from Singapore, if that matters
Heh. I was watching the logs when it happened. I've already un-blocked that browser/ip combo.

You definitely triggered the overlimit protection, which is there for both anonymous clients and registered users, from any country. The exact thresholds are looser for registered users, but they still exist.

The exact algorithm I am using is new, but if anything, it should be a bit more permissive than the old algorithm, as it is now counting unique requests from a client -- so reloading the same page repeatedly won't trigger the over limit ban.

If it does happen again, it will self-correct after some amount of time -- 10 minutes to an hour, depending on... reasons.
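One way to get the behaviour described (count unique requests, looser limits for registered users, a ban that expires on its own) is a sliding-window limiter over distinct URLs. This is a minimal sketch, not the site's algorithm: the thresholds and the 600-second ban are hypothetical placeholders, since the post only says registered limits are looser and bans last 10 minutes to an hour.

```python
import time
from collections import deque
from typing import Deque, Optional, Tuple

WINDOW_SECONDS = 10.0
# Hypothetical numbers; the real thresholds aren't given in the thread.
LIMITS = {"anonymous": 8, "registered": 20}
BAN_SECONDS = 600.0

class UniqueRequestLimiter:
    """Sliding-window limit on *distinct* URLs, so reloading one page is free."""

    def __init__(self, limit: int, window: float = WINDOW_SECONDS, ban: float = BAN_SECONDS):
        self.limit, self.window, self.ban = limit, window, ban
        self.events: Deque[Tuple[float, str]] = deque()  # (time, url), oldest first
        self.banned_until = 0.0

    def allow(self, url: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if now < self.banned_until:
            return False
        # Forget requests that have slid out of the trailing window.
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()
        self.events.append((now, url))
        if len({u for _, u in self.events}) > self.limit:
            self.banned_until = now + self.ban  # temporary; expires on its own
            self.events.clear()
            return False
        return True

def limiter_for(user_class: str) -> UniqueRequestLimiter:
    return UniqueRequestLimiter(LIMITS[user_class])
```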
@berto avatar
UTC

Molto Verboso
2006 LX150 (carbed) | 2007 GT200
Joined: UTC
Posts: 1954
Location: Toronto
 
jess wrote:
Heh. I was watching the logs when it happened. I've already un-blocked that browser/ip combo.

You definitely triggered the overlimit protection, which is there for both anonymous clients and registered users, from any country. The exact thresholds are looser for registered users, but they still exist.

The exact algorithm I am using is new, but if anything, it should be a bit more permissive than the old algorithm, as it is now counting unique requests from a client -- so reloading the same page repeatedly won't trigger the over limit ban.

If it does happen again, it will self-correct after some amount of time -- 10 minutes to an hour, depending on... reasons.
Thanks. Confirmed it's working again.

I typically use my mobile, where I assume this would never be a problem (fingers are slow!).

But if I'm on my PC, it does happen that I'll open my Favorites synthetic forum and mouse-click all the ones I haven't read into new tabs. Then I go and read through them one at a time. That's how I got to maybe 10 tabs in a few seconds. I've definitely never had the old algorithm lock me out before. (Not a complaint, just an observation.)
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
berto wrote:
But if I'm on my PC, it does happen that I'll open my Favorites synthetic forum and mouse-click all the ones I haven't read into new tabs. Then I go and read through them one at a time. That's how I got to maybe 10 tabs in a few seconds. I've definitely never had the old algorithm lock me out before. (Not a complaint, just an observation.)
There are some subtle differences between the algorithms that could account for the different behavior.

The old algorithm was a fixed-window algorithm, meaning it counted requests in fixed, back-to-back time periods -- in this case, 10-second windows. But if you straddled a window boundary -- opening LIMIT/2 in the last few seconds of one window and LIMIT/2+1 more at the start of the next -- you'd get away with it, as long as you didn't open many more in the remaining seconds of the second window.

The new algorithm uses a sliding window, which means it counts how many requests are made in any 10-second period. So if you make LIMIT+1 requests within any 10 seconds, you trigger the overlimit protection regardless of where the window boundaries fall.

In your case, you were clicking on "jump to first unread post" links. From your perspective, each click is one request. From the server's perspective, though, each of those is two requests, since the first request actually looks up the current status of the thread and then forwards you to the appropriate place in a second operation.
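The boundary-straddling difference is easy to see by running the same burst through both kinds of counter. A minimal sketch with a hypothetical LIMIT of 10 per 10-second window (the real threshold isn't given): five requests near the end of one window, then six at the start of the next.

```python
from collections import deque
from typing import Deque, Optional

class FixedWindowCounter:
    """Counts requests in back-to-back windows [0, w), [w, 2w), ..."""

    def __init__(self, limit: int, window: float = 10.0):
        self.limit, self.window = limit, window
        self.window_index: Optional[int] = None
        self.count = 0

    def hit(self, now: float) -> bool:
        index = int(now // self.window)
        if index != self.window_index:  # a new window starts: reset the count
            self.window_index, self.count = index, 0
        self.count += 1
        return self.count <= self.limit  # True means "allowed"

class SlidingWindowCounter:
    """Counts requests in the trailing `window` seconds at every request."""

    def __init__(self, limit: int, window: float = 10.0):
        self.limit, self.window = limit, window
        self.times: Deque[float] = deque()

    def hit(self, now: float) -> bool:
        while self.times and self.times[0] <= now - self.window:
            self.times.popleft()
        self.times.append(now)  # rejected attempts still count against the client
        return len(self.times) <= self.limit

LIMIT = 10  # hypothetical threshold
# LIMIT/2 requests at the end of window 0, then LIMIT/2 + 1 at the start of window 1.
burst = [8.0 + 0.2 * i for i in range(LIMIT // 2)] + [10.0 + 0.2 * i for i in range(LIMIT // 2 + 1)]

fixed, sliding = FixedWindowCounter(LIMIT), SlidingWindowCounter(LIMIT)
fixed_results = [fixed.hit(t) for t in burst]
sliding_results = [sliding.hit(t) for t in burst]
```

Every request in the burst passes the fixed-window counter, while the sliding window rejects the eleventh. And since each "jump to first unread post" click costs two server requests (a lookup, then a redirect), ten such tabs amount to about 20 requests, twice what the click count suggests.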

Which is all to say that I should probably revisit the specific thresholds I have set. I really don't want to have registered users triggering the overlimit protection unless it's egregious.
@safis avatar
UTC

Ossessionato
1979 P150X, 1983 P200E, 1987 PK125XL Elestart, 1988 T5, 1995 PX200E, 2011 Yamaha Fazer 600 S2
Joined: UTC
Posts: 4495
Location: Veria, Greece
 
I think it was working before, but tapping on a pic no longer opens it full screen on my iPad & iPhone (using Safari)…
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
SaFiS wrote:
I think it was working before, but tapping on a pic no longer opens it full screen on my iPad & iPhone (using Safari)…
Someone threw a fit back on page 14, and now that feature is disabled on mobile. It still works on desktop systems, unless said desktop system is touch screen, in which case it doesn't.

I don't have a good compromise here.
@safis avatar
UTC

Ossessionato
1979 P150X, 1983 P200E, 1987 PK125XL Elestart, 1988 T5, 1995 PX200E, 2011 Yamaha Fazer 600 S2
Joined: UTC
Posts: 4495
Location: Veria, Greece
 
jess wrote:
Someone threw a fit back on page 14, and now that feature is disabled on mobile. It still works on desktop systems, unless said desktop system is touch screen, in which case it doesn't.

I don't have a good compromise here.
Oh, I missed that. It was fine though…
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
SaFiS wrote:
Oh, I missed that. It was fine though…
Yeah, I tend to agree.
@safis avatar
UTC

Ossessionato
1979 P150X, 1983 P200E, 1987 PK125XL Elestart, 1988 T5, 1995 PX200E, 2011 Yamaha Fazer 600 S2
Joined: UTC
Posts: 4495
Location: Veria, Greece
 
jess wrote:
Yeah, I tend to agree.
Could it maybe be made OS-specific instead of touch vs. no-touch??
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
SaFiS wrote:
Could it maybe be made OS-specific instead of touch vs. no-touch??
I don't think the OS is really the problem. As I understand the complaint, tap-to-enlarge makes it impossible to pinch-zoom into the image any further than full screen, and some would like to zoom farther than that.

I concede that pinch-zoom, where available, does give more flexibility than tap-to-enlarge, which is why I've made tap-to-enlarge contingent on the lack of touch.
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
besupa wrote:
If you don't mind me asking, since you're using CloudFront elsewhere, could that be used to deal with some of this? It looks like the live site is separate from the assets (static), but a strategy for database load might be to join them a little. I think either an Apache-first or CF-first could work. It wouldn't really be fixing the problem directly, but may be piling sandbags more efficiently.
I'm rethinking my stance on CloudFront as a front door for the main site. It solves several problems, and of course the caching can be turned off for dynamic content. I had been considering using an Application Load Balancer to solve some of those same issues, but with only one instance behind the load balancer, it really doesn't warrant the added cost. As you point out, I'm already using CloudFront, and the costs have been quite reasonable.

I've got a test instance running behind CF right now (which I'm using to post this message) and it is working better than I expected it to.

Thanks for mentioning it, I had dismissed it and you got me to reconsider it.
@besupa avatar
UTC

Hooked
GTS 300 HPE (2020); V-Strom 650 XT (2019)
Joined: UTC
Posts: 184
Location: SF Bay Area, California
 
jess wrote:
Thanks for mentioning it, I had dismissed it and you got me to reconsider it.
I'm glad my armchair musings were some inspiration in your ongoing bot saga.
Happy hunting out there -- it looks like there's been an uptick in "automated traffic" recently, and the scramble to snapshot the whole web for various kinds of training is probably just going to make it worse.
OP
@jess avatar
UTC

Petty Tyrant
0:7 And counting
Joined: UTC
Posts: 37896
Location: Bay Area, California
 
The bots are annoying me. It's not just that they are constantly trying to scrape the site, though -- it's that they are so incredibly bad at it that I am offended by their incompetence at what is actually a very straightforward task.

I've taken to taunting them in the error messages out of sheer boredom and disgust. I know that there is approximately a 0% chance of a human ever seeing this message. Still, it vents some of my anger at these idiots just to type the message.
Forum member supplied image with no explanatory text

Modern Vespa is the premier site for modern Vespa and Piaggio scooters. Vespa GTS300, GTS250, GTV, GT200, LX150, LXS, ET4, ET2, MP3, Fuoco, Elettrica and more.

 

Shop on Amazon with Modern Vespa

Modern Vespa is a participant in the Amazon Services LLC Associates Program, an affiliate advertising program designed to provide a means for sites to earn advertising fees by advertising and linking to amazon.com


All Content Copyright 2005-2024 by Modern Vespa.
All Rights Reserved.

