*Commonly called a folksonomy.* Yeah, it's not, really. And thankfully.

dredmorbius · on May 23, 2022

How so not?

del.icio.us is specifically listed as an example.

(I'm aware that you're more than casually familiar with del.icio.us. I'm just confused by your response.)

pvg · on May 23, 2022

It is, but nobody involved in making del.icio.us used it (del predates it, too). The term was popular for a bit at the time and has now mostly gone out of use. Unsurprisingly, being a bit of a clunker.

dredmorbius · on May 23, 2022

So ... del.icio.us's tagging wasn't used? Or the make-up-your-own tagging wasn't used?

Are you doing anything related these days?

pvg · on May 23, 2022

It's two things, really. One is that 'folksonomy' was a term of its time, like, I dunno, 'blogosphere' or 'microformats', and is similarly obsolete. The other is that there were lots of people who thought or hoped that del.icio.us-style tags would lead to some sort of useful or interesting taxonomy or ontology, either emergent or by more prescriptive means (as in the initiating comment above), and that didn't really happen, perhaps because it wasn't (for the most part) what the tags were for. 'Folksonomy' is terminology that came from that line of thinking.

dredmorbius · on May 23, 2022

Fair enough, thanks.

I've been generally inclined toward adopting an existing taxonomy (or at least a usefully-sized portion of it). Unfortunately, many of the more common ones are copyright-encumbered (e.g., Dewey classification). Library of Congress classification and subject headings are available. If somewhat unwieldy.

A few of my tilted windmills...

pvg · on May 23, 2022

If you haven't come across it before, this is still a fun read, a kind of manifesto of the 'emergent ontology is going to be better than designed ontology' notion.

https://web.archive.org/web/20050601013309/http://shirky.com...

It made librarians pretty mad. At the end of the day, though, putting things like 'ontology' and 'keywords you type so you can find your bookmarks again later' in tension was a category error.

dredmorbius · on May 23, 2022

Thanks, excellent reference. I have seen that before, and Shirky hits it on the head here:

What's being optimized is number of books on the shelf.

Libraries organise their content. And in a print-and-paper world, that content has locational specificity.

You don't have to organise materials by topic, but in an open-stacks model that's almost always preferable, and it has utility with closed stacks as well.[1] The indexing system can provide a space-transgressing capability through cross-references, but even that was originally bound into printed volumes or journals and, as the 19th century progressed, increasingly held on index cards within cabinets: more open to random access, but still locationalised.

I'd recently submitted the Hathi Trust archives of the Annual Report of the Librarian of Congress, 1866--2007 to HN: https://catalog.hathitrust.org/Record/000072049 (https://news.ycombinator.com/item?id=31421398)

Having read A.R. Spofford's reports, the physicality of the archive dominates --- during his tenure the collection was housed in the North wing of the US Capitol, adjacent to what is now the Old Senate Chambers (directly west, best I can tell, with three floors and presumably some basement space). The collection had burnt shortly previously (1851), and beginning in the 1870s, Spofford was urging Congress to dedicate a building to the archive, and complained incessantly of the challenges in even enumerating the holdings due to crowding of books and other materials --- 800,000 volumes in a space meant for 200,000.

The new building opened the year of Spofford's retirement, in 1897. (Spofford himself lived to 1908, remaining as Chief Assistant Librarian, and presumably enjoying the main fruit of his labours in the Jefferson Building.)

Following Spofford, attention seems to turn to cataloguing. I'm reading those reports now, which expand greatly from the earlier brief form (about 6 pages during Spofford's term). I'm interested to see how that discussion develops.

It is made clear that the organisation borrows heavily from Francis Bacon's trinary distinction of history (or memory), poetry (or creative works), and philosophy (or reason), acquired through the Thomas Jefferson collection's organisation. (Jefferson's personal library re-established the Library after an earlier fire of British origin in 1814, during the War of 1812.)

I've also looked at other ontologies --- Diderot, the Encyclopaedia Britannica, Wikipedia, and several library classifications (Dewey, LoC, Colon, ...).

As with principles of truth, classifications should be useful, serving a purpose. Organising and using a collection should presumably be key amongst those purposes.

Hierarchical classifications, like metaphors, melt if pushed loudly enough.

I've been kicking around some ideas for an information management ... thing ... which I variously call KFC (Krell Functional/Fucking Context), docfs, and/or webfs (the latter two should be self-evident; see Plan9's 9p for strong precedent). One notion is that search affords identity, in the sense that a search which is sufficiently defined to result in a single work is an identity function for that work. That's mated with the notion that a search can return different quantities of results: 0, 1, a few, and many. "Few" and "many" are relative to the ability to work with the results; they're not fixed quantities, and will vary by characteristics of the system and its user. For a skilled researcher with good tools, I'd suggest "few" might range from 10 to 1,000, capable of being winnowed further with effort. "Many" might be 100 and above (there's overlap, yes), up to millions, billions, or more.

For purposes of this discussion, values from 2 -- 9 equal 10 ;-)

A search result is nothing (empty set), one (identity), or > 1 (list). Where a list is presented, further subdivisions might be suggested: publication dates, authors, subjects, publishers, concepts, statistically significant words or phrases, titles... At some point, those subdivisions would likely provide an identity.
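A minimal Python sketch of that none / identity / list distinction, for illustration only. The names `classify`, `suggest_facets` and `FEW_LIMIT` are hypothetical, the records are toy data, and the hard cut-off between "few" and "many" is a simplification of the overlapping, user-dependent ranges described above.

```python
from collections import Counter

# Hypothetical threshold: "few" is whatever the user and tools can winnow by hand.
FEW_LIMIT = 1000


def classify(results, few_limit=FEW_LIMIT):
    """Label a result list as 'none', 'identity', 'few', or 'many'."""
    n = len(results)
    if n == 0:
        return "none"        # empty set
    if n == 1:
        return "identity"    # the search uniquely names one work
    return "few" if n <= few_limit else "many"


def suggest_facets(results, fields=("author", "date", "subject")):
    """Return, per field, the value counts for fields that actually split the list.

    A field whose value is the same for every result cannot narrow the
    search further, so it is left out.
    """
    facets = {}
    for field in fields:
        counts = Counter(r[field] for r in results if field in r)
        if len(counts) > 1:
            facets[field] = dict(counts)
    return facets


# Toy records, invented for illustration.
records = [
    {"author": "fitzgerald", "title": "the great gatsby",
     "date": 1925, "subject": "fiction"},
    {"author": "fitzgerald", "title": "tender is the night",
     "date": 1934, "subject": "fiction"},
]

print(classify(records))        # label for a two-item list
print(suggest_facets(records))  # only 'date' distinguishes these two works
```

Here only the publication date separates the two records, so it is the one subdivision worth offering; picking either date would reduce the list to an identity.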

My thinking is that a filesystem-like expression of qualifiers might identify a given work, or set of works. Or the empty set. Something like:

  /docfs/au:fitzgerald/ti:gatsby

Or:

  /docfs/dt:600-799/kw:transubstantiation

If you're interested in texts on a mediaeval theological concept.

There might be more paths to Gatsby:

  /docfs/dt:1915-1930/kw:west egg/kw:daisy

Again: so long as you can come up with constructs which usefully winnow down the possible results set, you can find what you're looking for.

  /docfs/dt:1980-1989/su:machine learning
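One way such paths might be resolved, sketched in Python under stated assumptions: the qualifier prefixes (`au`, `ti`, `dt`, `kw`, `su`) follow the examples above, but the matching rules (substring match for author and title, exact match for keywords and subjects, inclusive year ranges), the function names, and the tiny catalogue are all invented for illustration. Nothing here describes an existing docfs implementation.

```python
import re

# Toy catalogue, invented for illustration; field names mirror the path prefixes.
CATALOG = [
    {"au": "fitzgerald", "ti": "the great gatsby", "dt": 1925,
     "kw": {"west egg", "daisy"}, "su": {"fiction"}},
    {"au": "fitzgerald", "ti": "tender is the night", "dt": 1934,
     "kw": {"riviera"}, "su": {"fiction"}},
    {"au": "doe", "ti": "an invented survey", "dt": 1985,
     "kw": {"perceptron"}, "su": {"machine learning"}},
]


def parse_path(path, root="/docfs/"):
    """Split '/docfs/au:x/ti:y' into [('au', 'x'), ('ti', 'y')]."""
    if not path.startswith(root):
        raise ValueError(f"not a docfs path: {path!r}")
    pairs = []
    for segment in path[len(root):].split("/"):
        if not segment:
            continue
        key, sep, value = segment.partition(":")
        if not sep:
            raise ValueError(f"segment lacks 'key:value' form: {segment!r}")
        pairs.append((key, value.strip().lower()))
    return pairs


def matches(work, key, value):
    """Test one qualifier against one catalogue record."""
    if key == "dt":
        # Accept 'YYYY', 'YYYY-YYYY', or 'YYYY--YYYY' (inclusive range).
        m = re.fullmatch(r"(\d+)(?:-{1,2}(\d+))?", value)
        if not m:
            return False
        low = int(m.group(1))
        high = int(m.group(2) or m.group(1))
        return low <= work["dt"] <= high
    if key in ("kw", "su"):
        return value in work[key]   # exact membership in the term set
    if key in ("au", "ti"):
        return value in work[key]   # substring match
    return False                    # unknown qualifier matches nothing


def resolve(path, catalog=CATALOG):
    """Apply each qualifier in turn, winnowing the candidate set."""
    works = list(catalog)
    for key, value in parse_path(path):
        works = [w for w in works if matches(w, key, value)]
    return works


for p in ("/docfs/au:fitzgerald/ti:gatsby",
          "/docfs/dt:1915-1930/kw:west egg/kw:daisy",
          "/docfs/au:fitzgerald"):
    print(p, "->", [w["ti"] for w in resolve(p)])
```

Each path segment is just another filter, so the order of qualifiers does not change the result, and two quite different paths can arrive at the same single work.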

The advantage of a controlled vocabulary is not that it is a strict hierarchy, which seems to be what many people get bogged down in, but that it is a useful and defined vocabulary.

And the US LoC classification and subject headings are useful not because they are perfect and a single authority, but because over more than a century of use and adaptation they've acquired the institutional tools and processes to manage change and ambiguity reasonably well.

That, and the fact that they're freely available. Albeit in inconvenient forms.

(Another project I've been working on in fits and starts.)

________________________________

Notes:

1. Among alternatives I'm aware of is the SuDoc classification, used by the U.S. Superintendent of Documents, which is arranged by government department and date. Which turns out to be a useful way of grouping that corpus physically.