Aggregating blog comments
=========================
:date: 2022-03-09
IT-related stuff written in English tends to live a life of its own: when people
find something interesting, it ends up linked on various news
aggregators.footnote:[For example: I was writing an article about my attempts at implementing X11 GUIs with Go, https://blog.rfox.eu/[my friend] got the idea to post it on r/programming, and very soon it spontaneously ended up on more subreddits, Hacker News, and Lobsters, with lots of people reading it. To my mild surprise, the Acer laptop serving it over a 5 Mbit/s uplink on a residential DSL connection survived the traffic just fine, though I didn't have many images on my site at the time.]
Some years back, it could have been
http://hackles.org/cgi-bin/archives.pl?request=129[Slashdot]; nowadays it's mainly
Hacker News, Reddit, or Lobsters. Aside from assigning internet points, people
there like to discuss the contents, often disregarding whether it already has
a comment section of its own.
When I noticed Jeff Kaufman
https://www.jefftk.com/p/designing-low-upkeep-software[importing comments]
from various sources directly into his blog posts, I simply had to try this
out as well. It turns out it's rather easy!
So, how do you aggregate comments back from news aggregators?
Step 1: Know where the discussion happens
-----------------------------------------
One way to achieve this is to post your articles on all appropriate media
yourself, i.e., to be the first. There are two issues with that. First, on
both Hacker News and Lobsters this is generally frowned upon, in particular if
you do it too often, and you may be rewarded with a ban. Second, you don't
control subsequent reposts, and everything on these sites has its own, limited
lifetime.
The more sustainable approach is to search for your domain's appearances on
those sites of interest in an automated fashion:
- *Hacker News* has an easy-to-use
https://hn.algolia.com/api[specialized JSON search API].
- **Reddit**'s https://www.reddit.com/dev/api/[general-purpose API] doesn't
like cURL's default User-Agent, and uses very awkward data structures,
yet it's also fine to use.
- *Lobsters* doesn't have an API per se, but you can easily parse
  the search page.
In my case, I simply extended the link:personal-intelligence-service.html[
web-watching] script I launch once a day through systemd with a few more
requests. It reduces to:
```sh
#!/bin/sh -e
# Watch a handful of URLs for changes, and mail any diffs to root.
status=0 workdir=watch
mkdir -p $workdir

check() {
	local url=$1 f=$workdir/$(echo "$1" | sed 's|/|\\|g')
	if ! curl -A Skynet --no-progress-meter -Lo "$f.download" "$url"; then
		status=1
	else
		# Run the download through the passed filter command,
		# then compare the result against the previous run's.
		shift
		"$@" <"$f.download" >"$f.filtered" || status=1
		if [ -f "$f" ] && ! diff "$f.filtered" "$f" >"$f.diff"; then
			mail -s "$url updated" root <"$f.diff" || status=1
		fi
		mv "$f.filtered" "$f"
	fi
}

check 'https://hn.algolia.com/api/v1/search_by_date?query=p.janouch.name' \
	jq -r '.hits[] | "https://news.ycombinator.com/item?id=\(.objectID) \(.url)"'
check 'https://www.reddit.com/search.json?q=site%3Ap.janouch.name&sort=new' \
	jq -r '"https://reddit.com" + .data.children[].data.permalink'
check 'https://lobste.rs/domain/p.janouch.name' \
	perl -lne 'print "https://lobste.rs$&" if m|/s/\w+| && !$seen{$&}++'

exit $status
```
Thus, I get diffs by mail. You just have to love the Bourne shell, Perl, and jq.

Of course, the results can also be further processed directly, and with the
exception of Lobsters, which I'll talk about in a moment, the requests can even
be run from your reader's browser alone. That is, if you don't mind excluding
people with Javascript disabled.
So far I'm quite content with adding links to my static pages manually,
in the manner of this illustrative snippet:

```html
<!-- Hypothetical markup; the exact form is whatever your loader script
     looks for. -->
<p id="comments">Discussed on
<a href="https://news.ycombinator.com/item?id=12345678">Hacker News</a> and
<a href="https://lobste.rs/s/1a2b3c">Lobsters</a>.</p>
```
or declaratively, for my custom static site generator based on libasciidoc:
```
:hacker-news: 12345678
:lobsters: 1a2b3c
:reddit: 1a2b3c, 4d5e6f
:reddit-subs: r/linux, r/programming
```
Step 2: Steal all the replies
-----------------------------
Having enumerated the story/post IDs, the rest is easy: you just fetch some JSON
documents, slightly reprocess them to form a DOM tree to insert into your page,
come up with appropriate styling, and you're done. Roughly speaking, that is,
because each of the respective APIs comes with its own quirks.
Lobsters
~~~~~~~~
Such as Lobsters' CORS settings giving you the middle finger, which forces you
to set up an appropriately rate-limited and cached
proxy.footnote:[The relevant GitHub issues are: https://github.com/lobsters/lobsters/issues/1029[#1029], https://github.com/lobsters/lobsters/issues/361[#361].]
nginx has proven to be quite awkward and fiddly to configure, so I won't bother
quoting my hodge-podge reverse proxy rules in full. The most important lines
were:
```nginx
server {
	listen 443 ssl http2;
	server_name lobsters.your.domain;

	location / {
		proxy_pass https://lobste.rs;
		add_header Access-Control-Allow-Origin 'https://blog.your.domain';
		add_header Access-Control-Allow-Methods 'GET, OPTIONS';
	}
}
```
Luckily, the only remaining way in which this aggregator resists being
aggregated is that the comments you'll find at
_+++https://lobsters.your.domain/s/1a2b3c.json+++_ are all flattened into
a common array, along with their computed depth. Curiously, this makes the
obvious transformation code non-recursive, the exception among the three sites.
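For illustration, here's a minimal sketch of that transformation, keeping
a stack of lists indexed by depth and written against the proxy from above;
take the field names (`comments`, `depth`, `comment`) as assumptions to verify
against a live JSON document:

```js
// Sketch only: build nested lists from Lobsters' flat comment array.
async function lobstersComments(storyID) {
	const response = await fetch(
		`https://lobsters.your.domain/s/${storyID}.json`)
	const story = await response.json()

	const root = document.createElement('ul')
	const stack = [root]
	for (const c of story.comments) {
		// Unwind the stack to this comment's parent level.
		while (stack.length > c.depth + 1)
			stack.pop()

		const li = document.createElement('li')
		li.innerHTML = c.comment  // already rendered HTML

		// Each item gets a (possibly empty) list for its children.
		const children = document.createElement('ul')
		li.append(children)
		stack[stack.length - 1].append(li)
		stack.push(children)
	}
	return root
}
```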
Hacker News
~~~~~~~~~~~
The https://github.com/HackerNews/API[Hacker News API] is a bit awkward in that
you need to fetch _every single comment separately_. The good news is that even
for a modestly sized comment section, trivially spawning `fetch` calls in
a massively asynchronous fashion turns out to work acceptably. Also beware that
the API returns even deleted comments, and those don't have all fields set.
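That massively asynchronous fashion can be as simple as the following sketch;
the `kids`, `title`, and `descendants` fields are documented in the API
repository:

```js
// Sketch: recursively fetch an item and all of its descendants in parallel.
// Deleted comments still come back, just with a `deleted` flag set
// and most other fields missing.
const hnItem = async id => {
	const response = await fetch(
		`https://hacker-news.firebaseio.com/v0/item/${id}.json`)
	const item = await response.json()
	item.replies = await Promise.all((item.kids || []).map(hnItem))
	return item
}

hnItem(12345678).then(story => console.log(
	`fetched ${story.descendants} comments under "${story.title}"`))
```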
Note that https://news.ycombinator.com/item?id=32543023[this API may be
replaced with something else in the future].
Reddit
~~~~~~
Except for the already-mentioned insane data structures, confusing
documentation, and the `raw_json` oddity where their JSON normally contains
SGML-quoted HTML fields, there isn't much to talk about. I'm not sure if the
API returns all discussions in full, but it doesn't seem to be worth spending
much time on. It returns _enough_.
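To give an idea of those structures, here's a sketch that peels off the
Listing/thing envelopes; the URL shape and field names follow Reddit's public
JSON endpoints, but verify them before relying on this:

```js
// Sketch: unwrap Reddit's Listing/thing envelopes around a comment tree.
// Passing raw_json=1 stops the API from SGML-quoting its HTML fields.
async function redditComments(postID) {
	const response = await fetch(
		`https://www.reddit.com/comments/${postID}.json?raw_json=1`)
	// The first Listing describes the post itself, the second its comments.
	const [, listing] = await response.json()

	const unwrap = thing => ({
		author: thing.data.author,
		body: thing.data.body_html,
		// Missing replies are an empty string rather than an empty Listing.
		replies: (thing.data.replies || {data: {children: []}})
			.data.children
			.filter(t => t.kind === 't1')  // skip "more" stubs
			.map(unwrap),
	})
	return listing.data.children.filter(t => t.kind === 't1').map(unwrap)
}
```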
Summing up the parts
~~~~~~~~~~~~~~~~~~~~
Have a look at
https://p.janouch.name/common.js[the short Javascript code used on this page].
I'm an incorrigible code-golfer, and I try to make do without libraries, so it
should be quite readable, and even reusable. As a side note, barebones web
development is _easy_ these days, with all those browser APIs and ES6.
To see it actually load nearly 200 comments from various sources, jump to
link:/article-xgb.html#comments[the end of my X11 article]. Seeing as it made
the comment section much, much longer than the article itself, I decided to roll
them up once the loaded item count exceeds a threshold.
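The roll-up itself doesn't need to be anything fancier than the following
sketch; the 50-item threshold is arbitrary:

```js
// Sketch: collapse an oversized comment section behind a <details> element.
function maybeRollUp(container, count) {
	if (count <= 50)
		return

	const details = document.createElement('details')
	const summary = document.createElement('summary')
	summary.textContent = `${count} comments`
	details.append(summary, ...container.childNodes)
	container.append(details)
}
```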
Step 3: Profit
--------------
There's no profit, you're the commodity.
Is this a good idea?
--------------------
I don't know. Reading imported comments gets mildly confusing, because people
assume you can't see their reactions. I suppose it will be a bit better if you
clearly identify yourself as an intelligence agency at the end of your articles.
Then there's the problem of inviting pseudonymous _masses_ into your _home_ to
express opinions, shitpost, and ask weird questions. You will get a lot of
effectively worthless comments, though they tend to be outweighed by
interesting feedback.

In any case, it is fairly low-effort, and makes your place look a bit more
'`cozy`'.
And is this for me?
~~~~~~~~~~~~~~~~~~~
On a personal, tangential, mildly philosophical note: some of the past gossip
that I've reconnected with its origin this way may portray me unfavourably (or
other people, for that matter), and it is now right there for everyone to see,
presumably indexed by search engines as well. Interspersing otherwise sterile
works with strong views, immoral jokes, and/or rants also has a way of dragging
people into pointless, indignant discussions, all of which I've now pulled in.
But without these peculiarities, my writing would be inauthentic, lifeless, and
boring to read as well as to write. Do I embrace the resulting chaos?
All things considered... YOLO, I guess.