Sets and tags

OAI-PMH supports a system for creating sub-collections, which it calls "sets". oai4courts adds database support for a less formal system of tags. Sets are named according to a hierarchical system that implies an equally hierarchical partitioning of the database. Tags may be applied in any way you like.
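In OAI-PMH, the hierarchy is carried in the set identifier itself: a setSpec is a colon-separated path from the root of the hierarchy down to the set. The following is a minimal sketch of how such a name expands into its ancestors. The function name and the example value "supct:civil:rights" are hypothetical and are not part of oai4courts.

```python
def ancestor_specs(set_spec):
    """Return every set on the path to set_spec, root first."""
    parts = set_spec.split(":")
    # Rebuild each prefix of the path: "a", "a:b", "a:b:c", ...
    return [":".join(parts[: i + 1]) for i in range(len(parts))]


# Hypothetical setSpec, used only to show the expansion.
print(ancestor_specs("supct:civil:rights"))
# → ['supct', 'supct:civil', 'supct:civil:rights']
```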

What are sets sets of? What do tags tag?

Sets and tags are assigned to decisions, not writings. This may seem an odd choice if your court's output is organized so that every writing that is part of a decision (majority opinion, concurrences, dissents, etc.) is separately disseminated and hence is (in OAI terms) a separate item. In most cases, though, classificatory metadata will apply to the decision rather than to the individual writings. Thus sets and tags apply to the decision too.

Sets and the database

The OAI-PMH specification imposes, somewhat sneakily, a requirement that the population of sets be determined in advance rather than (e.g.) built dynamically from database queries. The oai4courts database structure tries, equally sneakily, to give as much support as possible to the notion of dynamic sets. This can be seen in the organization of the oai_sets table in the database, which has the following elements:

  • set_spec: the OAI <setSpec> element, which specifies an identifier for the set as well as showing its place in the hierarchy of sets
  • set_name: the OAI <setName> element, which provides a short human-readable name for the set
  • description: the OAI <setDescription> element, which describes the set
  • set_type: a hint about the process used to construct the set
  • query: a hint about the query (or other input) that process should run to find the set's members
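As a rough illustration, the table could be declared along the following lines. Only the column names come from the list above. The column types, the primary key, the NOT NULL constraint, the use of SQLite, and the sample row's set_spec, set_name and description are all assumptions made for the sake of a runnable sketch, not the actual oai4courts schema.

```python
import sqlite3

# Column names follow the list above; everything else is assumed.
SCHEMA = """
CREATE TABLE oai_sets (
    set_spec    TEXT PRIMARY KEY,  -- OAI <setSpec>
    set_name    TEXT NOT NULL,     -- OAI <setName>
    description TEXT,              -- OAI <setDescription>
    set_type    TEXT,              -- hint: which process builds the set
    query       TEXT               -- hint: what that process should run
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical row describing one set and how its members might be found.
conn.execute(
    "INSERT INTO oai_sets VALUES (?, ?, ?, ?, ?)",
    ("supct:civil-rights", "Civil rights",
     "Decisions matching a civil rights full-text search",
     "supct-full-text", "civil+rights"),
)
row = conn.execute("SELECT set_type, query FROM oai_sets").fetchone()
print(row)
```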

The set_type and query elements, which are specific to oai4courts, can be used to specify a processing regime for population of the set by an external utility program or script. For example, a set_type of "supct-full-text" and a query value of "civil+rights" might indicate that members of the set could be found by running a full-text search on the "supct" corpus using the query string "civil+rights". Or the set_type might indicate that a particular relational database holds relevant information, and the query element might contain a SQL query that, when run against that database, would retrieve a list of set members for incorporation into the decisions_oai_sets table. In other words, the two fields simply allow the repository operator to store hints about what process and query should be used to identify members of the set. The hints and processes are entirely arbitrary and are assumed to be run by external utilities when the decisions_oai_sets table is to be populated.
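An external utility of the kind described might look something like the sketch below. It reads the hints from oai_sets, dispatches on set_type, and rewrites the matching rows in decisions_oai_sets. Everything beyond the table names and the two hint values is hypothetical: the columns of decisions_oai_sets, the handler function, the toy corpus, and the reading of "+" as a separator between required terms are assumptions made for illustration.

```python
import sqlite3


def full_text_search(query):
    """Hypothetical stand-in for a full-text search of the "supct" corpus."""
    corpus = {
        "d1": "a civil rights case",
        "d2": "a tax case",
        "d3": "voting and civil rights",
    }
    terms = query.split("+")  # assumption: "+" separates required terms
    return sorted(d for d, text in corpus.items() if all(t in text for t in terms))


# Map each set_type hint to the routine that knows how to act on it.
HANDLERS = {"supct-full-text": full_text_search}


def populate_sets(conn):
    """Rebuild decisions_oai_sets from the hints stored in oai_sets."""
    rows = conn.execute("SELECT set_spec, set_type, query FROM oai_sets").fetchall()
    for set_spec, set_type, query in rows:
        handler = HANDLERS.get(set_type)
        if handler is None:
            continue  # unknown hint: leave that set's membership untouched
        conn.execute("DELETE FROM decisions_oai_sets WHERE set_spec = ?", (set_spec,))
        conn.executemany(
            "INSERT INTO decisions_oai_sets (decision_id, set_spec) VALUES (?, ?)",
            [(decision_id, set_spec) for decision_id in handler(query)],
        )
    conn.commit()


# Minimal in-memory tables; the column layout is assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oai_sets (set_spec TEXT, set_type TEXT, query TEXT);
    CREATE TABLE decisions_oai_sets (decision_id TEXT, set_spec TEXT);
    INSERT INTO oai_sets VALUES ('supct:civil-rights', 'supct-full-text', 'civil+rights');
""")
populate_sets(conn)
members = [r[0] for r in conn.execute(
    "SELECT decision_id FROM decisions_oai_sets ORDER BY decision_id")]
print(members)
```

Because the set's membership is deleted and rebuilt on each run, the utility can be re-run whenever the underlying corpus changes, while the repository itself still serves a fixed, precomputed population for each set.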

Tags and the database

As of this writing, little has been done to support tagging other than the provision of appropriate tables in the database. It is not what you would call a fully thought-through feature.