OP
@jess avatar
UTC

Petty Tyrant
0:7 and counting
Joined: UTC
Posts: 39725
Location: Bay Area, California
 
UTC quote
Over the past few weeks, I've been building out a fairly extensive reputation system that allows me some advance insight into which clients are bots and which are actually human. The catch is that bots have gotten clever over the years, and the current strategy is to make one single request (typically for a topic page) from one single IP address, and then never use that IP address again. Or, at least, not very often.

This seems like an unsustainable approach for the bots. There are, after all, a finite number of IP addresses available. But I've amassed a list of 87,000 IP addresses that have done a single page request and then (importantly) not executed the javascript embedded in that page -- which strongly suggests that those 87,000 different IP addresses were bots.
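
A minimal sketch of the heuristic described above, in Python. This is not the forum's actual code: the `IpRecord` fields and the idea of a "beacon" call fired by the embedded JavaScript are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class IpRecord:
    """Per-IP counters. Field names are illustrative, not the forum's schema."""
    page_requests: int = 0
    js_beacons: int = 0  # times the page's embedded JavaScript reported back


def record_page_request(records: dict[str, IpRecord], ip: str) -> None:
    """Count one page request from this IP."""
    records.setdefault(ip, IpRecord()).page_requests += 1


def record_js_beacon(records: dict[str, IpRecord], ip: str) -> None:
    """Count one sign that the client actually executed the page's script."""
    records.setdefault(ip, IpRecord()).js_beacons += 1


def likely_bots(records: dict[str, IpRecord]) -> set[str]:
    """IPs that asked for exactly one page and never ran its script."""
    return {
        ip for ip, rec in records.items()
        if rec.page_requests == 1 and rec.js_beacons == 0
    }
```

Note that the verdict is only available after the page has been served, which is the limitation the next paragraph points out.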

Unfortunately, I don't know that until after I've served them a page.

Fortunately, I can track all the pertinent details about their IP address and the client software they are using to help make an informed analysis about the next request. Typically, that would mean screening the statistically suspicious requests based on past behavior, making the visitor answer an "Are you human?" question that (surprisingly) thwarts most bots.

I don't want to make every new visitor go through that process, but if some humans see it that's probably okay. I'm currently getting it right about 80% of the time, with a roughly even split of the remainder being either over-screening humans or under-screening bots.

Anyway, this is just a very long introduction to the point I was really trying to make.

One of the details I have available about each IP address is its country of origin. This isn't too useful, except for a handful of notoriously bot-friendly countries -- China and Hong Kong being the most extreme examples. These are mostly unsophisticated bots, however -- they don't seem to do the one-request-one-IP-address approach.

But I graphed the data anyway, even though I'm not using it to make screening decisions. The results were... curious, to say the least.

The graph below is ordered by overall number of sessions, so the US is where the majority of our traffic comes from, with UK in second place. The validated:unvalidated column is the interesting one -- this roughly translates to humans vs bots, and is the best gauge I have available to make screening decisions when applied to groups of IP addresses or client software signatures. Here, though, it seems to show that a disproportionate number of these bots are using IP addresses in commonwealth countries plus Ireland.

(Edited to add: these are, for the most part, residential IP addresses, used by consumers on consumer networks. These are not data centers, which I screen out wholesale in a separate process.)
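
For readers curious how a per-country table like this could be built, here is a rough Python sketch. The input shape (one `(country_code, validated)` pair per session) is an assumption, and the function name is hypothetical.

```python
from collections import defaultdict


def country_table(sessions: list[tuple[str, bool]]) -> list[tuple[str, int, int, int]]:
    """Rows of (country, total, validated, unvalidated), busiest country first.

    Each input item is (country_code, validated), one per session.
    """
    # country -> [validated count, unvalidated count]
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for country, validated in sessions:
        counts[country][0 if validated else 1] += 1
    rows = [(c, v + u, v, u) for c, (v, u) in counts.items()]
    # Order by overall number of sessions, as in the graph.
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

The validated and unvalidated columns are kept as separate counts rather than a single ratio, so a country with zero unvalidated sessions does not cause a division by zero.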


Which is weird. And not at all what I expected. I would have expected the human:bot split to be roughly similar to the US and UK.

I don't really have a very good explanation for why the split is so lopsided for these specific countries.
Forum member supplied image with no explanatory text
⚠️ Last edited by jess on UTC; edited 1 time
@25bikez avatar
UTC

Molto Verboso
2009 Genuine Stella 2T (Sold). Helix Hunting.
Joined: UTC
Posts: 1294
Location: Texas
 
UTC quote
Forgive my ignorance, but is it possible the commonwealth requests are not originating there, but merely being routed through them, because of some combination of lax enforcement and/or financial or tax incentives to host data centers?

In any case, interesting demographic dive! Thanks for posting.
OP
@jess avatar
UTC

Petty Tyrant
0:7 and counting
Joined: UTC
Posts: 39725
Location: Bay Area, California
 
UTC quote
25BIKEZ wrote:
Forgive my ignorance, but is it possible the commonwealth requests are not originating there, but merely being routed through them, because of some combination of lax enforcement and/or financial or tax incentives to host data centers?
I've already filtered out most of the large data centers (they are just outright banned), so these statistics largely show the IP addresses that are originating on consumer networks. You can see that in this other graph I've attached below, which is composed mostly of legitimate consumer networks.

This means that these requests are being routed through IP addresses belonging to ordinary people. Yes, possibly or even likely from other countries, but that's not the curious part (to me). The curious part is why are these countries disproportionately represented, and what is the mechanism being used to route the requests?

My best guess is that this is a "free VPN" situation -- when you install free VPN software on your device, I am pretty sure your network connection is then used by other people in other countries, possibly for nefarious purposes.
Forum member supplied image with no explanatory text
UTC

Addicted
MP3 500 HPE 2019
Joined: UTC
Posts: 630
 
UTC quote
Think you will find that most of the unvalidated ones originate from 4 countries using VPNs etc., so pretty hard to track all that, jess. Lot of work to stop them.

China
North Korea
Russia
Iran
OP
@jess avatar
UTC

Petty Tyrant
0:7 and counting
Joined: UTC
Posts: 39725
Location: Bay Area, California
 
UTC quote
flybynight wrote:
Think you will find that most of the unvalidated ones originate from 4 countries using VPNs etc., so pretty hard to track all that, jess. Lot of work to stop them.

China
North Korea
Russia
Iran
I have no doubt that the requests are originating elsewhere (though I doubt North Korea or Iran care one bit about our content). The curious thing for me is why are the commonwealth countries + Ireland more likely to be used as exit nodes? And what are the financial / social / cultural aspects that drive this?

These IP addresses, after all, appear to be owned by ordinary people. What are the factors that led to these ordinary people in these seemingly upstanding countries allowing their internet connections to be used for shady purposes?

This is the part that's surprising to me. The commonwealth countries + Ireland are decidedly not where I would expect to find people installing shady free VPN services (if that is indeed the routing mechanism).
@rrider avatar
UTC

Ossessionato
Triumph Bonneville 2022, Triumph Street Scrambler 2018 (sold), Suzuki VanVan200 (sold), 2015 Sprint 125 (sold)
Joined: UTC
Posts: 3308
Location: Finland
 
UTC quote
Interesting statistics. Ireland made me first think about data centers too, but as these are private addresses...well, interesting.

What I've noticed is that many European ICT / software service providers have 'expert centers' in Ireland - nothing to do with the data center business, just normal consulting/project services, cyber security etc....but this does not explain the potential use of wild private VPNs.
OP
@jess avatar
UTC

Petty Tyrant
0:7 and counting
Joined: UTC
Posts: 39725
Location: Bay Area, California
 
UTC quote
As an aside, I've built a scoring system into the reputation system as well, so I can gauge how well the system is working. This shows each new session and whether or not the system chose to screen that client. After the fact, we can determine if the screening was warranted or not.

"Over" means we screened a human (who passed the screening test). "Under" means we didn't screen a request, and that request turned out to (likely) be a bot.
Forum member supplied image with no explanatory text
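
The "over" and "under" labels described in this post can be sketched in a few lines of Python. This is an illustration under assumptions, not the forum's implementation: the `classify` and `accuracy` names are hypothetical, and "validated" stands in for whatever after-the-fact signal marks a session as probably human.

```python
from collections import Counter


def classify(screened: bool, validated_human: bool) -> str:
    """Label one session after the fact.

    screened        -- was the "Are you human?" challenge shown?
    validated_human -- did the client later look human?
    """
    if screened and validated_human:
        return "over"     # challenged a human
    if not screened and not validated_human:
        return "under"    # let a likely bot through
    return "correct"


def accuracy(sessions: list[tuple[bool, bool]]) -> dict[str, float]:
    """Fraction of sessions that fall into each outcome bucket."""
    if not sessions:
        return {}
    counts = Counter(classify(s, v) for s, v in sessions)
    total = len(sessions)
    return {label: counts[label] / total for label in ("correct", "over", "under")}
```

Tracking the two error types separately matters because they have different costs: an "over" annoys a human, while an "under" serves a page to a bot.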
@jbacklund avatar
UTC

Ossessionato
Sadly,the Vespa is gone.Triumph Rocket 3R/2019 Triumph Speedmaster/2013 BMW R1200R/1998 BMW K1200RS
Joined: UTC
Posts: 2592
Location: Black Hills South Dakota USA
 
UTC quote
jess wrote:
I have no doubt that the requests are originating
This is the part that's surprising to me. The commonwealth countries + Ireland are decidedly not where I would expect to find people installing shady free VPN services (if that is indeed the routing mechanism).
Leprechauns.
OP
@jess avatar
UTC

Petty Tyrant
0:7 and counting
Joined: UTC
Posts: 39725
Location: Bay Area, California
 
UTC quote
JBacklund wrote:
Leprechauns.
I mean, obviously.
@znomit avatar
UTC

Hobbitus Moderatorus
S50, R1100s, way too many pushbikes
Joined: UTC
Posts: 11591
Location: Hermit Kingdom
 
UTC quote
Not shady VPN, just for watching The Goodies on BBC.
@jimc avatar
UTC

Moderaptor
The Hornet (GT200, aka Love Bug) and 'Dimples' - a GTS 300
Joined: UTC
Posts: 46046
Location: Pleasant Hill, CA
 
UTC quote
znomit wrote:
Not shady VPN, just for watching The Goodies on BBC.
This. I use a (not-free) VPN for accessing bbc.co.uk (NOT bbc.com), streaming from iPlayer and other UK suppliers, and streaming from (mumble mumble) other places via (mumble mumble) certain apps. As I still pay the BBC licence fee I'm damn well going to get some of my money's worth.

I suspect loads of other ex-pats and commonwealth residents do the same.
⚠️ Last edited by jimc on UTC; edited 1 time
OP
@jess avatar
UTC

Petty Tyrant
0:7 and counting
Joined: UTC
Posts: 39725
Location: Bay Area, California
 
UTC quote
jimc wrote:
This. I use a (not-free) VPN for accessing bbc.co.uk (NOT bbc.com), streaming from iPlayer and other UK suppliers, and streaming from (mumble mumble) other places via (mumble mumble) certain apps.

I suspect loads of other ex-pats and commonwealth residents do the same.
It's certainly possible that paid VPNs are selling access to their users' internet connections, but honestly that would surprise me a bit.

I'm virtually certain that free VPN providers do so, however.
@jimc avatar
UTC

Moderaptor
The Hornet (GT200, aka Love Bug) and 'Dimples' - a GTS 300
Joined: UTC
Posts: 46046
Location: Pleasant Hill, CA
 
UTC quote
jess wrote:
It's certainly possible that paid VPNs are selling access to their users' internet connections, but honestly that would surprise me a bit.

I'm virtually certain that free VPN providers do so, however.
The Scottish and Yorkshire ex-pats would probably use the free ones.
OP
@jess avatar
UTC

Petty Tyrant
0:7 and counting
Joined: UTC
Posts: 39725
Location: Bay Area, California
 
UTC quote
jimc wrote:
The Scottish and Yorkshire ex-pats would probably use the free ones.
Ah.
@az_slynch avatar
UTC

Molto Verboso
'07 GTS250, '07 LX150, '81 P200E, '78 P200E, '74 VBC1, '64 V90 and 3 Ciaos
Joined: UTC
Posts: 1995
Location: Tucson, AZ
 
UTC quote
jimc wrote:
The Scottish and Yorkshire ex-pats would probably use the free ones.
Lot of skinflints they are. ROFL emoticon
UTC

Addicted
MP3 500 HPE 2019
Joined: UTC
Posts: 630
 
UTC quote
Think you will find that the Ireland of today has many immigrants. Most workers in the hotels are Eastern European or from Brazil etc., and then you have many other sectors. Most of these people probably use copies of Windows and computers infected to death with viruses and spyware, due to poor knowledge of cybersecurity. Ireland is very liberal and does not really care, hence it all coming from private names etc. And to balance the act, many Irish are poor and in the same situation.
jess wrote:
I have no doubt that the requests are originating elsewhere (though I doubt North Korea or Iran care one bit about our content). The curious thing for me is why are the commonwealth countries + Ireland more likely to be used as exit nodes? And what are the financial / social / cultural aspects that drive this?

These IP addresses, after all, appear to be owned by ordinary people. What are the factors that led to these ordinary people in these seemingly upstanding countries allowing their internet connections to be used for shady purposes?

This is the part that's surprising to me. The commonwealth countries + Ireland are decidedly not where I would expect to find people installing shady free VPN services (if that is indeed the routing mechanism).
@crazycarl avatar
UTC

Ossessionato
2007 250 GTS, 1980 P200E, 2010 ThunderFly 190 (SOLD) 2015 Yamaha SMax (SOLD)
Joined: UTC
Posts: 3743
Location: Springboro, OH
 
UTC quote
Most of this discussion is above my understanding. But I'm glad that Jess is taking steps to keep this place safe from bad actors.

I can tell that it's a lot of effort, though, and I'm hopeful that the level of success continues and improves.
OP
@jess avatar
UTC

Petty Tyrant
0:7 and counting
Joined: UTC
Posts: 39725
Location: Bay Area, California
 
UTC quote
flybynight wrote:
Think you will find that the Ireland of today has many immigrants. Most workers in the hotels are Eastern European or from Brazil etc., and then you have many other sectors. Most of these people probably use copies of Windows and computers infected to death with viruses and spyware, due to poor knowledge of cybersecurity. Ireland is very liberal and does not really care, hence it all coming from private names etc. And to balance the act, many Irish are poor and in the same situation.
While that may or may not be the driving factor, can't you say the same about much of Eastern Europe? Why do I not see the same level of unvalidated activity out of (say) Bulgaria or Romania?

Here's a snippet of the current stats that shows a group of countries that all contribute about the same level of traffic as each other, showing remarkably good validation rates. Surely the factors that you're imagining are happening in Ireland would be very prevalent in most of these countries too?
Forum member supplied image with no explanatory text
UTC

Addicted
MP3 500 HPE 2019
Joined: UTC
Posts: 630
 
UTC quote
Not really, simply because a lot of these countries have families working in Ireland who need a computer to connect to home, maybe. After that, I can only imagine that maybe the ISPs themselves are hacked.
jess wrote:
While that may or may not be the driving factor, can't you say the same about much of Eastern Europe? Why do I not see the same level of unvalidated activity out of (say) Bulgaria or Romania?

Here's a snippet of the current stats that shows a group of countries that all contribute about the same level of traffic as each other, showing remarkably good validation rates. Surely the factors that you're imagining are happening in Ireland would be very prevalent in most of these countries too?
@centersmith avatar
UTC

Enthusiast
Lola: a 2010 Vespa GTS 300 (formerly belonged to Ivana Tinkle)
Joined: UTC
Posts: 57
Location: San Rafael, CA
 
UTC quote
Is it common for everyone to have to answer the "human" question on nearly every visit to the mv landing page? It's been my experience in recent weeks. I don't mind the extra click, but I wonder if I'm skewing your percentages? Also maybe my relatively new sonic.net isp (since early March, '25) sends confusing IP address data… or even though I don't use a VPN, maybe my IP address has been compromised?
OP
@jess avatar
UTC

Petty Tyrant
0:7 and counting
Joined: UTC
Posts: 39725
Location: Bay Area, California
 
UTC quote
centersmith wrote:
Is it common for everyone to have to answer the "human" question on nearly every visit to the mv landing page? It's been my experience in recent weeks. I don't mind the extra click, but I wonder if I'm skewing your percentages? Also maybe my relatively new sonic.net isp (since early March, '25) sends confusing IP address data… or even though I don't use a VPN, maybe my IP address has been compromised?
Roughly 5-12% of anonymous humans (the exact percentage fluctuates over the course of 24 hours) will be screened -- in other words, asked to click the "human" button. Similarly, about 5-12% of bots get away with their requests, without being challenged. So, roughly 80% of the time, the forum makes the right call as to who to screen and who not to screen.

In your case, the ASN your ISP uses (AS44092) has a positive score (i.e. good reputation), so you're not being screened because of that. It's possible that the browser signature you're using (technically called JA4, but that's a fairly arcane subject) is being used by a lot of bots, or being used by bots at your specific ISP. Either one of those would trigger a request to confirm you are human.

Unfortunately (or maybe fortunately) I can't correlate registered users to their JA4 signature -- I just don't collect that data on a per-user basis -- so I can't tell you if yours is on the suspect list or not.

All that said, these measures only apply to anonymous (i.e. not logged in) users. If you remain logged in, you shouldn't see the human request.

There's no particular advantage to logging out of MV -- unless you're using a shared device and don't want someone else posting cat memes under your name.
@hilton avatar
UTC

Enthusiast
ET4-150 ( Called ET8 in HKG )
Joined: UTC
Posts: 62
Location: Lisbon
 
UTC quote
Hi Jess. Not sure if this is the correct thread ........
Mac user from the late 80's
Just began using a rotating private address.
A setting on WiFi that I have never used before.
Join using my home WiFi. NOS Router.
I think this is OK and understand the idea.. just.

Noted about logging out, but old habits die hard.
Don't mind identifying as a human each time I read MV.

Interested to know if this will skew your results.

Obrigado
OP
@jess avatar
UTC

Petty Tyrant
0:7 and counting
Joined: UTC
Posts: 39725
Location: Bay Area, California
 
UTC quote
Hilton wrote:
Hi Jess. Not sure if this is the correct thread ........
Mac user from the late 80's
Just began using a rotating private address.
I honestly don't know much about that feature or what effect it will have on your experience here. Generally, the forum depends on a stable IP address, but only on the scale of a single session. In other words, if your IP address changes during a session (i.e. a single visit to the site, over the course of an hour or so) then you will get a new session. If you're not logged in when the IP changes, you won't notice any difference, unless you get flagged for being potentially suspicious, in which case you'll be asked if you are human again.

If you're logged in (and browsing) when the IP changes, you will remain logged in, but using a new session. This might cause a hiccup if you were in the middle of something that depends on keeping track of state from one request to the next -- doing a search might be impacted, for instance. If you were in the middle of a post, going back and forth between editing and preview, or uploading photos, those might glitch out.
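
The session behavior described above might look roughly like this in Python. It is only a sketch under assumptions: `SessionStore` and its fields are hypothetical, and login state is assumed to live elsewhere (for example in a separate cookie), which is why a logged-in user would stay logged in while losing per-session state such as a search in progress.

```python
import secrets
from typing import Optional


class SessionStore:
    """Toy session table in which a session is only valid for the IP that created it."""

    def __init__(self) -> None:
        self._sessions: dict[str, dict] = {}

    def get_or_create(self, session_id: Optional[str], ip: str) -> tuple[str, dict]:
        """Return the existing session if the IP matches, else a fresh one."""
        session = self._sessions.get(session_id) if session_id else None
        if session is not None and session["ip"] == ip:
            return session_id, session  # same visit, same IP
        # Unknown session, or the IP changed mid-visit: start fresh.
        new_id = secrets.token_hex(16)
        self._sessions[new_id] = {"ip": ip, "state": {}}
        return new_id, self._sessions[new_id]
```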

Personally, knowing a bit how the larger web works, I don't know if rotating IP addresses would be something I would use, unless the IP rotation was well controlled and infrequent.

Do you have any information on how the IP rotation works?
@hilton avatar
UTC

Enthusiast
ET4-150 ( Called ET8 in HKG )
Joined: UTC
Posts: 62
Location: Lisbon
 
UTC quote
Not really, sorry.
Reading about it, used to change the IP address to make it harder to
track an individual device.
Think, the newest updates change more frequently.

I'll note any changes from a user perspective.

I always like to know I'm not a robot.

Cheers
@hilton avatar
UTC

Enthusiast
ET4-150 ( Called ET8 in HKG )
Joined: UTC
Posts: 62
Location: Lisbon
 
UTC quote
Seems it changes every 2 weeks.

From... "support.apple.com"
OP
@jess avatar
UTC

Petty Tyrant
0:7 and counting
Joined: UTC
Posts: 39725
Location: Bay Area, California
 
UTC quote
Hilton wrote:
Seems it changes every 2 weeks.

From... "support.apple.com"
Oh, I do know what this is. I was thinking this was something your home wifi router was doing.

The Apple private wifi address is a rotation of your MAC address (not to be confused with Mac). A MAC address is kind of like a hardware serial number, uniquely identifying your hardware. Nefarious, evil people (namely advertising cretins) have figured out that they can use that hardware serial number as a unique identifier to track you when they have hardware in your vicinity. Apple thwarts this to preserve your privacy by changing the supposedly-fixed hardware serial number every two weeks.

This won't affect MV at all.

(And anyone that is interested can read more about the feature here)
OP
@jess avatar
UTC

Petty Tyrant
0:7 and counting
Joined: UTC
Posts: 39725
Location: Bay Area, California
 
UTC quote
We hit a milestone today. We have now collected over 100,000 IP addresses that do not appear to be human.

When these IP addresses make a return visit (if they ever do -- most do not) we will screen them automatically, even if the networks they originate from appear to be legitimate.

Not sure if this is a positive or negative thing, frankly.

One technical detail: all of these IP addresses, as well as all the other reputation data, are held in RAM on the server in order to make screening decisions without adding more than a millisecond or two to the process. It's backed up on disk, of course, in case the server resets for some reason. But we hold everything in RAM at all times.
Forum member supplied image with no explanatory text
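
As a rough illustration of the "everything in RAM, backed up on disk" arrangement described in this post, here is a small Python sketch. The `SuspectList` class, the JSON file format, and the write-then-rename snapshot are all assumptions chosen for the example; the post does not say how the forum actually persists its data.

```python
import json
import os
import tempfile


class SuspectList:
    """Set of suspicious IPs kept in memory, with a JSON snapshot on disk."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.ips: set[str] = set()
        # Reload the last snapshot, if any, after a restart.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.ips = set(json.load(fh))

    def is_suspect(self, ip: str) -> bool:
        """In-memory set lookup; no disk access on the request path."""
        return ip in self.ips

    def add(self, ip: str) -> None:
        self.ips.add(ip)

    def snapshot(self) -> None:
        """Write to a temp file, then rename, so a crash cannot leave a half-written file."""
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(sorted(self.ips), fh)
        os.replace(tmp, self.path)
```

The point of the design is that the screening decision only ever touches the in-memory set; the disk copy exists purely for recovery.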
⬆️    About 1 month elapsed    ⬇️
OP
@jess avatar
UTC

Petty Tyrant
0:7 and counting
Joined: UTC
Posts: 39725
Location: Bay Area, California
 
UTC quote
A new milestone: 300,000 IP addresses on our suspicious list. Any client visiting from one of these IP addresses will automatically go through "human" screening to weed out the bots.

And we are weeding out a lot of bots.
Forum member supplied image with no explanatory text
@steelbytes avatar
UTC

Veni, Vidi, Posti
2019 GTS 300 HPE w Malossi cylinder & cam
Joined: UTC
Posts: 8656
Location: Batmania aka Melbourne, Aus
 
UTC quote
iirc I got a challenge yesterday when I turned on an old laptop not used for a year
OP
@jess avatar
UTC

Petty Tyrant
0:7 and counting
Joined: UTC
Posts: 39725
Location: Bay Area, California
 
UTC quote
SteelBytes wrote:
iirc I got a challenge yesterday when I turned on an old laptop not used for a year
That could have been for a number of different reasons. Public IP address would be one, the specific ISP you are connected with would be another, and then finally the client software signature (see: JA4 signals).

A lot of the more malicious bots go out of their way to impersonate well-known browsers, right down to their JA4 signature. Generally, though, the bots impersonate slightly older browsers, probably because the process of replicating a well-known JA4 is nontrivial and doesn't keep up with the latest browser releases.

The behavior of those bots, in turn, tends to pollute a given JA4 signature in our database, such that MV starts automatically distrusting anyone using that signature.
@steelbytes avatar
UTC

Veni, Vidi, Posti
2019 GTS 300 HPE w Malossi cylinder & cam
Joined: UTC
Posts: 8656
Location: Batmania aka Melbourne, Aus
 
UTC quote
jess wrote:
That could have been for a number of different reasons. Public IP address would be one, the specific ISP you are connected with would be another, and then finally the client software signature...
home ip which is static which is where I mostly used MV. ping my website for the ip.

browser was brave (as always) but I guess it was 12 months old and hadn't yet auto updated.
OP
@jess avatar
UTC

Petty Tyrant
0:7 and counting
Joined: UTC
Posts: 39725
Location: Bay Area, California
 
UTC quote
SteelBytes wrote:
home ip which is static which is where I mostly used MV. ping my website for the ip.

browser was brave (as always) but I guess it was 12 months old and hadn't yet auto updated.
Yep. Fourteen hours ago. The logs say that the specific ASN (network) and JA4 combo is mostly unvalidated. Anything over 50% unvalidated will get screened on the first request of a fresh session.

The scoring system correctly registered it as over-screened, meaning that after the screening decision was made, the client (that would be you) subsequently registered as probably human. We over-screen somewhere between 5% and 10% of fresh sessions, but it's my goal to get that percentage as low as possible.

Also, that particular JA4, independent of the ASN, has a pretty lousy reputation, which strongly suggests that the bots are replicating that specific client.
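
The "over 50% unvalidated gets screened" rule for an ASN and JA4 combination could be sketched like this in Python. Only the 50% threshold comes from the post; the `Reputation` class and the choice to leave never-seen combinations unscreened are assumptions for illustration.

```python
from collections import defaultdict

SCREEN_THRESHOLD = 0.5  # "over 50% unvalidated", per the post


class Reputation:
    """Validated/unvalidated session counts per (ASN, JA4) pair."""

    def __init__(self) -> None:
        # (asn, ja4) -> [validated count, unvalidated count]
        self._counts: dict[tuple[int, str], list[int]] = defaultdict(lambda: [0, 0])

    def record(self, asn: int, ja4: str, validated: bool) -> None:
        """Add one finished session to the tally for this combination."""
        self._counts[(asn, ja4)][0 if validated else 1] += 1

    def unvalidated_share(self, asn: int, ja4: str) -> float:
        validated, unvalidated = self._counts.get((asn, ja4), (0, 0))
        total = validated + unvalidated
        # Assumption: a combination with no history counts as 0% unvalidated.
        return unvalidated / total if total else 0.0

    def should_screen(self, asn: int, ja4: str) -> bool:
        return self.unvalidated_share(asn, ja4) > SCREEN_THRESHOLD
```

A separate tally keyed on the JA4 alone, independent of the ASN, would follow the same pattern.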

(Note the label of "Windows Edge / Chrome" isn't necessarily correct -- it was just my best guess based on actual registered MV user traffic).
Log entry acknowledging the request was over-screened.
Forum member supplied image with no explanatory text
OP
@jess avatar
UTC

Petty Tyrant
0:7 and counting
Joined: UTC
Posts: 39725
Location: Bay Area, California
 
UTC quote
More about scoring: I am scoring each client (upon first request of a fresh session) to determine if they need to be screened. But I am also scoring the scoring process itself. This helps me tune the scoring system for better accuracy.

These are the last few minutes of scored requests, and statistics about how well it's worked over the last hour:
Forum member supplied image with no explanatory text
@25bikez avatar
UTC

Molto Verboso
2009 Genuine Stella 2T (Sold). Helix Hunting.
Joined: UTC
Posts: 1294
Location: Texas
 
UTC quote
Well, I've been challenged every single time for over a week. It's easy enough, but I'm starting to feel like a pair of brown shoes in a tuxedo world ( extra points for identifying the reference).
OP
@jess avatar
UTC

Petty Tyrant
0:7 and counting
Joined: UTC
Posts: 39725
Location: Bay Area, California
 
UTC quote
25BIKEZ wrote:
Well, I've been challenged every single time for over a week. It's easy enough, but I'm starting to feel like a pair of brown shoes in a tuxedo world ( extra points for identifying the reference).
Again, could be your browser is out of date. Or your ISP is on my shit list.

If you remain logged-in, though, you shouldn't get challenged. We only challenge anonymous users.

(There's no particular benefit to logging out, unless you share your device with someone else).
@nautiker avatar
UTC

Molto Verboso
'14 Piaggio BV 350
Joined: UTC
Posts: 1054
Location: New Hampshire
 
UTC quote
Jess, this is very interesting.
May I pose a question? As you've mentioned 'free' VPNs, I have a copy of the 'Opera' browser on my Macs - they supply a 'free' VPN with the browser.
Should I assume that my IP address may be compromised if I were to occasionally use that VPN 'feature'?

Edit: A quick internet search indicates that the Opera VPN is not a true VPN - is 'Ok' (at best), but there are far better options...
OP
@jess avatar
UTC

Petty Tyrant
0:7 and counting
Joined: UTC
Posts: 39725
Location: Bay Area, California
 
UTC quote
Nautiker wrote:
As you've mentioned 'free' VPNs, I have a copy of the 'Opera' browser on my Macs - they supply a 'free' VPN with the browser.
Should I assume that my IP address may be compromised if I were to occasionally use that VPN 'feature'?
It's a fine question, but I don't really know. If I had to guess, I would guess that Opera is probably not using your IP address maliciously -- but that's only a guess.
OP
@jess avatar
UTC

Petty Tyrant
0:7 and counting
Joined: UTC
Posts: 39725
Location: Bay Area, California
 
UTC quote
Today, I've been seeing a lot of requests all coming from different IP addresses, but all within a specific Amazon AWS network (AS14618, if anyone cares). These are blocked, as that network is known to be an AWS network and thus not an authentic source of user traffic.

But despite consistently returning a 403 (Forbidden) error to all these requests, they kept coming. All day. For many hours.

What's worse, every single request was asking for the home page -- the page you get when you first arrive at Modern Vespa. It wasn't even requesting a topic page (as most bots do) -- just the homepage. Always.

Interestingly, while there were many requests, they weren't quite as rapid as the usual high-speed scrapers that issue 100 requests within the span of 1 second. These were fairly well paced, running at the rate of maybe 4-5 requests per second.

Since I had many hours to ponder this situation, it eventually occurred to me that whatever was doing the requesting was probably not massively multithreaded -- even though it was coming from thousands of different IP addresses, it seemed likely that there were only a handful of "workers" (i.e. processes) behind the requests.

So I decided to conduct an experiment, to see how many concurrent requests I was getting from that network as a whole. I did this by introducing a long delay when responding to that specific request (which wasn't hard, since it was already a 403 error, and I just had to distinguish where it was coming from).

I started with 1 second. Then 3 seconds. Then 10 seconds. Then 20 seconds.

The answer: Three. There were exactly three concurrent "workers" issuing requests. In other words, there were never more than three outstanding requests at a time, despite the long delay.

So now, instead of making hundreds of requests per minute, the malignant shitbag fuckbot can only make 9 requests per minute. Which still returns an error, of course, but the server spends much less time serving bogus requests.
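
The arithmetic behind that number, as a small Python sketch. It assumes each worker waits for a response before sending its next request and ignores network time; the function names are hypothetical.

```python
def max_requests_per_minute(workers: int, delay_seconds: float) -> float:
    """Upper bound on throughput when every worker blocks on a delayed response."""
    return workers * 60.0 / delay_seconds


def estimate_workers(in_flight_samples: list[int]) -> int:
    """The most requests ever seen outstanding at once is a lower bound on the worker count."""
    return max(in_flight_samples, default=0)
```

With three workers and a 20 second delay, `max_requests_per_minute(3, 20)` gives 9.0, which matches the figure above.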

One side effect is that the MV server uses one process per request, and that process is tied up during the long 20 second delay. Anticipating that this won't scale very well, I built in a safeguard that keeps an eye on the number of outstanding requests we are currently serving, and scales back the delay if the server is busy (for whatever reason). And while using an entire process for the duration of the 20 second delay is not exactly a good thing, it's still much better than the churn caused by responding to hundreds of requests per minute.
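
One way such a safeguard could work is sketched below in Python. The linear ramp and the `soft_limit` / `hard_limit` values are invented for the example; the post only says that the delay is scaled back when the server is busy, with 20 seconds as the longest delay tried.

```python
def tarpit_delay(in_flight: int, max_delay: float = 20.0,
                 soft_limit: int = 10, hard_limit: int = 40) -> float:
    """Seconds to stall a blocked request, shrinking as the server gets busier.

    in_flight  -- requests the server is handling right now
    soft_limit -- at or below this, use the full delay (hypothetical value)
    hard_limit -- at or above this, do not delay at all (hypothetical value)
    """
    if in_flight <= soft_limit:
        return max_delay
    if in_flight >= hard_limit:
        return 0.0
    # Linear ramp between the two limits.
    remaining = (hard_limit - in_flight) / (hard_limit - soft_limit)
    return max_delay * remaining
```

The request handler would sleep for the returned number of seconds before sending the 403, so a quiet server stalls bots for the full delay and a busy one frees its processes quickly.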

Fucking bots.
@bvbob avatar
UTC

Molto Verboso
'95 Yamaha Riva 125- '05 Piaggio BV200-'05 Honda Reflex-'08 Honda Metropolitan
Joined: UTC
Posts: 1917
Location: Ohio
 
UTC quote
Jess, I don't understand what you're going through with this but I'm amazed by your knowledge. This stuff is so complicated that it boggles the mind. It takes an analytical brain to work in this space. Good on you!
OP
@jess avatar
UTC

Petty Tyrant
0:7 and counting
Joined: UTC
Posts: 39725
Location: Bay Area, California
 
UTC quote
BVBob wrote:
Jess, I don't understand what you're going through with this but I'm amazed by your knowledge. This stuff is so complicated that it boggles the mind. It takes an analytical brain to work in this space. Good on you!
The truth is that I've never really had any idea what I'm doing. I'm an Operating System engineer, not a webdev.

But each day, and each week, and each year, I know slightly more than I did the day / week / year before.

It's a journey.

Modern Vespa is the premier site for modern Vespa and Piaggio scooters. Vespa GTS300, GTS250, GTV, GT200, LX150, LXS, ET4, ET2, MP3, Fuoco, Elettrica and more.



All Content Copyright 2005-2026 by Modern Vespa.
All Rights Reserved.

