Install BlackLab standalone on MacOS (by Dirk Roorda)
This guide was written by Dirk Roorda for the CLARIAH project General Missives project. This is a slightly edited (top heading, other heading levels) archived version; the original can be found here. It is licensed under the MIT license.
About
This document describes how to install a working Blacklab server and client on a standalone machine running macos.
TOC
- Prerequisites
- Requirements
- Java
- Tomcat
- Run Tomcat
- Manage TomCat
- Set up managers
- Blacklab
- File organization
- Explanation
- Server
- Configure
- Example corpus
- Deploy within TomCat
- Test the query tool
- Client
- File organization
Prerequisites
Requirements:
- macos 10.15.7 or higher (Catalina), earlier probably works as good
- Commandline tools installed (part of XCode)
xcode-select --install - HomeBrew (a macos package manager)
JAVA
We use Homebrew to install a Java Development Kit (JDK), from an open source, using a mechanism of HomeBrew that is optimized for large binaries.
brew update
brew tap homebrew/cask
brew cask install adoptopenjdk
java -versionResults in:
openjdk version "15" 2020-09-15
OpenJDK Runtime Environment AdoptOpenJDK (build 15+36)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 15+36, mixed mode, sharing)TOMCAT
We install both TomCat and its native library, in order to get access to SSL.
brew install tomcat
brew install tomcat-nativeWe get a notification:
In order for tomcat's APR lifecycle listener to find this library, you'll
need to add it to java.library.path. This can be done by adding this line
to $CATALINA_HOME/bin/setenv.sh
CATALINA_OPTS="$CATALINA_OPTS -Djava.library.path=/usr/local/opt/tomcat-native/lib"
If $CATALINA_HOME/bin/setenv.sh doesn't exist, create it and make it executable.N.B This CATALINA is a code name for TomCat and has nothing to do with the current
macos version 10.15, also named Catalina.
OK, it appears that
CATALINA_HOME is /usr/local/Cellar/tomcat/9.0.40/libexec
although there is no CATALINA_HOME visible to the shell.
You can either set such a variable and use it in the ocmmands below, or spell
the value out.
It seems that it is not needed to set this variable for TomCat to work.
Depending on whether you have chosen to set CATALINA_HOME in your shell
say either
vim $CATALINA_HOME/bin/setenv.shor
vim /usr/local/Cellar/tomcat/9.0.40/libexec/bin/setenv.shand add the line
CATALINA_OPTS="$CATALINA_OPTS -Djava.library.path=/usr/local/opt/tomcat-native/lib"Then
either
chmod ugo+x $CATALINA_HOME/bin/setenv.shor
chmod ugo+x /usr/local/Cellar/tomcat/9.0.40/libexec/bin/setenv.shRun TOMCAT
There are several ways to run TomCat:
As a service that starts when the Mac starts:
brew services start tomcatManually
catalina runand stop it by Ctrl-C.
As a background process:
catalina startand stop it with
catalina stopSee also
catalina -hManage TOMCAT
In the browser, navigate to
You should see a page that says that you have successfully installed TomCat.
Set up managers
See the users in
cd /usr/local/Cellar/tomcat/9.0.40/libexec
vim conf/tomcat-users.xmland add the lines
<role rolename="manager-gui"/>
<user username="dirk" password="dirk" roles="manager-gui"/>where you can replace dirk and dirk by whatever you like.
Blacklab
To see what Blacklab is, see blacklab intro.
A working blacklab installation consists of a server, a client, and corpus data.
File organization
The following bit of file organization is not rigidly prescribed. You can also choose another organization. Whatever you do, it needs to be reflected in subsequent config files and shell commands.
This is an organization that I find convenient at this stage:
blacklab/
data/
incoming/
indexes/
program/
installation/I have put it all under ~/local i.e. my home directory and then
a subdirectory blacklab.
Contents and downloads
The data directory will receive corpus data.
The incoming subdir receives downloaded data, the indexes subdir is the destination
of the blacklab indexer.
The installation directory receives the downloaded blacklab-server-2.1.0 war file.
This file is attached to a release of the Blacklab repo.
The releases are listed
here
and we pick release 2.1.0.
You see a file
blacklab-server-2.1.0.war
there, download it and place it in the installation directory.
We will unzip it in place, and copy its WEB-INF/lib directory to the program directory.
Over there, we move the blacklab-2.1.0.jar file one level up, so that it is directly beneath
the program dir.
When we cd to the program dir, we can easily run the java program in the blacklab jar file,
supported by the libraries in the jar files under the lib subdirectory.
We'll need the blacklab program soon: for indexing the first corpus.
We also need to download a front-end, a.k.a. client.
This is in the
instituutnederlandsetaal/blacklab-frontend repo.
Again, move to the
releases
page and there you find release 2.1.0.
You see a file
corpus-frontend-2.1.0.war
there, download it and place it in the installation directory.
Server
Configure
Set an environment variable to point to the blacklab server config, do
this in your .zshrc file.
Note that ~ will not work properly, so spell out the complete path
from the root of your system to the directory where your blacklab config dir is:
BLACKLAB_CONFIG_DIR="/Users/dirk/local/blacklab"
export BLACKLAB_CONFIG_DIRThen edit/create a server config file:
cd ~/local/blacklab
vim blacklab-server.yamland add the contents
---
configVersion: 2
# Where indexes can be found
# (list directories whose subdirectories are indexes, or directories containing a single corpus)
indexLocations:
- /Users/dirk/local/blacklab/data/indexesN.B.
Note that in this config file you can not use the ~ abbreviation.
Deploying the blacklab war now leads to a friendly message from blacklab that there are no indexes. So, before we deploy, we create the indexes for an example corpus and put it in place.
Example data
We download the
Brown corpus,
a single XML file of 66 MB when unzipped, to be put in data/incoming.
Run the blacklab index tool by running the blacklab jar:
cd ~/local/blacklab/program/Then run the jar:
java -cp "blacklab-2.1.0.jar" nl.inl.blacklab.tools.IndexTool create ~/local/blacklab/data/indexes/brown ~/local/blacklab/data/incoming/brownCorpus.lemmatized.xml teiDeploy within Tomcat
In the terminal, give the command
catalina startThen, in the browser navigate (again) to
Click the manage app and login with dirk, dirk (which is what have have put in the
TomCat config file for users, above).
In the list of applications, click the blacklab-server-2.1.0 entry.
You should see something like:
<blacklabResponse>
<blacklabBuildTime>2020-06-22 16:01:41</blacklabBuildTime>
<blacklabVersion>2.1.0</blacklabVersion>
<indices>
<index name="brown">
<displayName>brown</displayName>
<description/>
<status>available</status>
<documentFormat>tei</documentFormat>
<timeModified>2020-11-24 11:47:37</timeModified>
<tokenCount>1008320</tokenCount>
</index>
</indices>
<user>
<loggedIn>false</loggedIn>
<canCreateIndex>false</canCreateIndex>
</user>
</blacklabResponse>which means that all is well and that the Brown corpus indexes have been found.
Test the query tool
Still in the program directory you can run the query tool:
java -cp blacklab-2.1.0.jar nl.inl.blacklab.tools.QueryTool ~/local/blacklab/data/indexes/brownYou get a prompt CorpusQL> .
Enter the query "egg" followed by a newline.
You get results.
Exit by giving the command exit.
CorpusQL> "egg"
1. [0106] is thick , much like an [egg] plant?s skin , so that poison
2. [0115] it with beaten yolk of [egg]
3. [0144] kitchen for coffee grounds and [egg] shells . All these materials and
4. [0303] On this , she builds an `` [egg] compartment ?? or `` egg cell ?? which
5. [0303] builds an `` egg compartment ?? or `` [egg] cell ?? which is filled with
6. [0303] the beebread loaf and the [egg] compartment is closed . The queen
7. [0303] retires to a life of [egg] laying . The first worker bees
8. [0303] in unobtrusively , to deposit an [egg] on a completed loaf of
9. [0303] before the bumblebees seal the [egg] compartment . The hosts never seem
10. [0303] which is provided with an [egg] plus a store of beebread
11. [0332] with the chicken and the [egg] . Which came first ? ? Is it
12. [0400] constructed a highboard around the [egg] case which he had placed
12 hits in 6 documents
105 ms elapsed
CorpusQL> exit
dirk:~/local/blacklab/program >Client
The main front-end for Blacklab is in a separate GitHub repo