2024-02-23

Parkrun has a news blog with an RSS feed at https://www.parkrun.com/feed/, which I subscribe to with Akregator, my RSS reader of choice.

While searching for some info on Parkrun recently, I came across a couple of recent articles which hadn't shown up in my news feed. Looking at the feed more closely, I noticed that the most recent fetches had failed with some kind of error.

I normally ignore feed errors. Feeds are sometimes unavailable (occasionally because my own computer isn't connected to the internet), and it's not a problem because I'll pick up any missed posts the next time the feed is downloaded. But these articles were a week or two old, so I should have received them by now.

I checked the page source to see if the feed URL had changed (which some websites do from time to time) and couldn't find a feed listed at all. Interesting. Maybe the feed had been deleted altogether? I tried accessing the feed in Firefox... and there it was. Present in its original location, and seemingly free of errors.

So I tried downloading the feed manually to have a closer look at it, and to see if it contained anything that might make my feed reader think it was broken, which would be one reason why the articles didn't show up.

$ wget https://www.parkrun.com/feed/
--2024-02-23 12:00:00--  https://www.parkrun.com/feed/
Resolving www.parkrun.com (www.parkrun.com)... 34.248.148.22, 52.210.236.124, 54.171.25.67
Connecting to www.parkrun.com (www.parkrun.com)|34.248.148.22|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2024-02-23 12:00:00 ERROR 403: Forbidden.

Huh.

That's weird. Reload in Firefox, feed is fine. Re-run wget command, 403 Forbidden. Let's try...

$ wget --user-agent="Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/123.0" https://www.parkrun.com/feed/
--2024-02-23 12:01:00--  https://www.parkrun.com/feed/
Resolving www.parkrun.com (www.parkrun.com)... 54.171.25.67, 34.248.148.22, 52.210.236.124
Connecting to www.parkrun.com (www.parkrun.com)|54.171.25.67|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39816 (39K) [text/xml]
Saving to: ‘index.html’
index.html                    100%[==============================================>]  38.88K   200KB/s    in 0.2s    
2024-02-23 12:01:00 (200 KB/s) - ‘index.html’ saved [39816/39816]

It does the same thing with curl. So, Parkrun is blocking access to the RSS feed of its news blog by User-Agent.
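For anyone who wants to repeat the comparison, here is a minimal sketch using curl. The function names status_for_agent and compare_agents are my own, made up for illustration. It only prints the HTTP status code returned for each User-Agent string, so it shows whether the server treats agents differently, not why.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from any existing tool).

# Print the HTTP status code the server returns for a given User-Agent.
# Usage: status_for_agent URL USER_AGENT
status_for_agent() {
    # -s: silent; -o /dev/null: discard the body;
    # -w '%{http_code}': print only the status code; -A: set the User-Agent.
    curl -s -o /dev/null -w '%{http_code}' -A "$2" "$1"
}

# Print one "status<TAB>agent" line for each User-Agent given after the URL.
# Usage: compare_agents URL AGENT...
compare_agents() {
    url=$1
    shift
    for agent in "$@"; do
        printf '%s\t%s\n' "$(status_for_agent "$url" "$agent")" "$agent"
    done
}
```

Something like the following should then show the two cases side by side (the "Wget/1.21" string is just an example of a wget-style User-Agent, not necessarily the one my wget sends):

$ compare_agents https://www.parkrun.com/feed/ "Wget/1.21" "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/123.0"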

It's allowing Firefox, a web browser which doesn't actually provide a useful view of RSS feeds (unless you have an extra extension installed), but blocking actual RSS readers (like Akregator) or home-grown news-reading scripts which use standard cross-platform utilities like wget or curl.

W. T. F?!?!

Edit 2024-03-09: The feed seems to be working again. Yay?

Profile

grok_mctanys
