Embracing uncertainty: version 2 of the earthworm multi-access key

Earthworm by Charlie Bell

When I identify earthworms (a hobby that I find increasingly compelling!) my main identification resource is the knowledge-base developed by Emma Sherlock for her excellent FSC AIDGAP publication - Key to the earthworms of UK and Ireland. But my first point of access to that knowledge-base isn't the book itself, it's the Tomorrow's Biodiversity multi-access key.

I like to use the multi-access key because it lets me home straight in on the features that grab my attention on any earthworm I'm trying to identify and puts me in charge of the identification process. I find it more empowering than using a dichotomous key and I think that it is speeding up the rate at which I am assimilating the knowledge of earthworm morphology. However, I always have the AIDGAP key close to hand - it's invaluable for confirming identifications or for trying to sort out identification when the observed features don't quite match the knowledge-base.

Earthworm with segments marked (by Charlie Bell)

But after using version 1 of the key for a while, I noticed that it was weakest (like any key) when the features on the earthworm I was trying to identify didn't match the knowledge-base well. This could happen for a number of reasons. For example, the segment immediately before the one on which the clitellum (saddle) starts can sometimes be intermediate in appearance between the segments that come before it and those on the clitellum itself. Depending on how similar it is to those on the clitellum, it can be hard to say whether or not such a segment should actually be counted as the first clitellum segment.

In the first version of the key, if I indicated that the clitellum of a tanylobic earthworm started on segment 25, then every earthworm that was not tanylobic, or whose clitellum start range did not encompass 25, would be shifted to the 'excluded' column, as shown in the picture below.

Version 1 of key showing tanylobic worm with saddle starting on segment 25

The trouble with version 1 of the key is that it considers all species not matching a particular character as 'equally wrong' for that character. But if we specify 25 for the clitellum start, is a species whose minimum clitellum start is 26 really as wrong as a species whose minimum clitellum start is 33? Clearly not: the first species is much more likely to be a match than the second when we account for the kinds of problems we've already discussed in interpreting characters like these (as well as other things like natural variation in morphology).
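To make the idea of 'how wrong' concrete, here is a minimal Python sketch of one way to measure the gap between a specified segment number and a species' range in the knowledge-base. It is only an illustration, not the code used in the key: the function name `range_distance` and the upper ends of the two ranges (32 and 36) are made up for the example; only the minimum values 26 and 33 come from the text above.

```python
def range_distance(value, low, high):
    """Return how far `value` lies outside the closed range [low, high].

    The result is 0 when the value falls inside the range.
    """
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0


specified_start = 25

# Hypothetical species whose clitellum start range is 26-32.
near_miss = range_distance(specified_start, 26, 32)

# Hypothetical species whose clitellum start range is 33-36.
far_miss = range_distance(specified_start, 33, 36)

print(near_miss, far_miss)
```

Under these assumptions the first species is one segment away from the specified value and the second is eight segments away, which is exactly the distinction that a simple match/no-match test throws away.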

Version 2 of the key, which we have recently published, radically changes the matching algorithms used to decide whether or not an earthworm is moved to the 'excluded' column and also how the earthworms are ranked (arranged from top to bottom) to indicate how far they are from matching the specified characters. The outward appearance of the key has changed very little, but the changes to matching and ranking have resulted in a much more useful and information-rich key.

The image below shows version 2 of the key with the same character specification as before, i.e. a tanylobic worm with the clitellum starting on segment 25, but you can see significant changes in the way the earthworms are arranged in the two columns.

Version 2 of key showing tanylobic worm with saddle starting on segment 25

The first difference you notice is that although Allolobophoridella eiseni is still at the top of the 'matching' column, the species Lumbricus rubellus and Lumbricus castaneus are now shown just below it instead of being pushed into the 'excluded' column. Eagle-eyed readers will have noticed that the columns have also changed their names, from 'candidate taxa' to 'possible species' and from 'excluded taxa' to 'unlikely species', partly to reflect the idea that the new matching and ranking methods embrace uncertainty. The new placement alerts us to the possibility that Lumbricus rubellus and Lumbricus castaneus might be matches if we are not sure of the clitellum start character. For the specified user input, these are the next most likely matches according to the knowledge-base. But if you look for these species in the figure showing version 1 of the key, it is not obvious that they are quite close matches. This is why version 2 is such an improvement over version 1.

That the species Lumbricus rubellus and Lumbricus castaneus are not relegated to the 'unlikely species' column reflects the fact that a 'tolerance' of two segments' difference has been specified for the segment-based characters (set via the new 'Options' button, which you can see in the image above). The bold number to the right of the species names indicates the total disagreement between all specified segment characters and the numbers in the knowledge-base for each species. 'Possible species' are always ranked, from top to bottom, in ascending order of this number, so the closest matches appear first. 'Unlikely species' are ranked first according to how well they match the two categorical characters - 'head type' and 'setae spacing' - since these characters are considered to be the least variable. After that, they are ranked according to the total disagreement between the segment characters.

Another change in version 2 of the key is that no earthworm is ever moved to the 'unlikely species' column on the basis of either the 'length' or 'diameter' characters - which reflects their extreme variability. The image below is from version 1 of the key where the only character specified is 'length' at 10 mm. On the basis of this, all the earthworms except two species have been discounted.

Version 1 of key specifying a small worm of length 10 mm

An image from version 2 of the key is shown below. The arrangement of the species is radically different. None of the earthworms have been excluded on the basis of length, but the difference between their length ranges in the knowledge-base and the specified length of 10 mm has been used to rank them.

Version 2 of key specifying a small worm of length 10 mm

In version 1 of the key, the indicators for specified characters were coloured either blue for species that matched or red for species that didn't. In version 2 of the key, the colours for segment and size characters grade between blue and red depending on how far the species' values are from the specified values. Colours for the two categorical characters - 'head shape' and 'setae spacing' - are still either blue or red.
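For readers curious how a graded colour like this might be produced, the sketch below blends linearly from blue to red as the difference grows. This is purely illustrative and is not taken from the key itself: the function name `disagreement_colour`, the linear blend and the `full_red_at` cut-off (the difference at which the indicator becomes fully red) are all assumptions of mine.

```python
def disagreement_colour(difference, full_red_at=5):
    """Return a hex colour between blue (no difference) and red (large difference).

    `full_red_at` is a hypothetical cut-off: any difference at or beyond it
    is shown as pure red.
    """
    # Clamp the difference into the range 0..1 as a fraction of the cut-off.
    fraction = min(max(difference, 0) / full_red_at, 1.0)
    red = round(255 * fraction)
    blue = round(255 * (1 - fraction))
    return f"#{red:02x}00{blue:02x}"


for diff in (0, 1, 3, 5, 8):
    print(diff, disagreement_colour(diff))
```

A categorical character would simply use the two ends of this scale: pure blue for a match and pure red for a mismatch.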

Because the two categorical characters are considered to be the most reliable, they are positioned first in the list of characters (this has meant moving 'setae spacing' higher up the list of characters compared to version 1). Any earthworm that fails to match a categorical character that has been specified is moved into the 'unlikely species' column - no half measures. So if there's any doubt about either one of these characters, it's best not to specify a value for it.

In summary, the sorting and ranking in version 2 work as follows:

  1. A disagreement between a specified categorical character and any earthworm results in the earthworm being moved to the 'unlikely species' column.
  2. A disagreement between a specified segment character and any earthworm will only result in the earthworm being moved to the 'unlikely species' column if the difference exceeds the 'tolerance' value specified in the options (default value is 2).
  3. Earthworms are never moved to the 'unlikely species' column based on values of either of the size characters.
  4. Earthworms in the 'possible species' column are ranked firstly on the basis of the total disagreement between all specified segment characters and the values in the knowledge-base and secondly on the basis of the difference between any specified size character values and their values in the knowledge-base.
  5. Earthworms in the 'unlikely species' column are ranked firstly on the basis of the number of matches with categorical characters, secondly on the total disagreement between all specified segment characters and the values in the knowledge-base and thirdly on the basis of the difference between any specified size character values and their values in the knowledge-base.
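
The five rules above can be sketched in a few lines of Python. This is a rough illustration under my own assumptions, not the code that runs the key: the `Species` class, the function names, the character names and the four species with their values are all hypothetical, and I have assumed that the tolerance in rule 2 is applied to each segment character separately.

```python
from dataclasses import dataclass, field


@dataclass
class Species:
    name: str
    categorical: dict                 # e.g. {"head": "tanylobic"}
    segments: dict                    # character name -> (min, max) segment
    sizes: dict = field(default_factory=dict)  # character name -> (min, max)


def range_distance(value, bounds):
    """Distance from value to the closed range bounds=(low, high); 0 if inside."""
    low, high = bounds
    return max(low - value, 0, value - high)


def sort_and_rank(species_list, categorical, segments, sizes, tolerance=2):
    """Split species into ranked 'possible' and 'unlikely' lists of names."""
    possible, unlikely = [], []
    for sp in species_list:
        cat_matches = sum(sp.categorical.get(k) == v for k, v in categorical.items())
        seg_diffs = [range_distance(v, sp.segments[k]) for k, v in segments.items()]
        size_diff = sum(range_distance(v, sp.sizes[k]) for k, v in sizes.items())
        # Rules 1 and 2: categorical mismatch, or a segment beyond the tolerance.
        # Rule 3: size characters never exclude a species.
        excluded = cat_matches < len(categorical) or any(d > tolerance for d in seg_diffs)
        if excluded:
            # Rule 5: most categorical matches first, then segments, then size.
            unlikely.append(((-cat_matches, sum(seg_diffs), size_diff), sp.name))
        else:
            # Rule 4: least segment disagreement first, then size.
            possible.append(((sum(seg_diffs), size_diff), sp.name))
    rank = lambda items: [name for _, name in sorted(items, key=lambda item: item[0])]
    return rank(possible), rank(unlikely)


# Entirely made-up knowledge-base entries for illustration.
knowledge_base = [
    Species("Species A", {"head": "tanylobic"}, {"clitellum_start": (24, 25)}),
    Species("Species B", {"head": "tanylobic"}, {"clitellum_start": (26, 27)}),
    Species("Species C", {"head": "tanylobic"}, {"clitellum_start": (33, 34)}),
    Species("Species D", {"head": "epilobic"}, {"clitellum_start": (25, 26)}),
]

possible, unlikely = sort_and_rank(
    knowledge_base,
    categorical={"head": "tanylobic"},
    segments={"clitellum_start": 25},
    sizes={},
)
print("Possible species:", possible)
print("Unlikely species:", unlikely)
```

With this made-up data, Species B stays in the 'possible species' list because its one-segment disagreement is within the tolerance, while Species C (eight segments out) and Species D (wrong head type) are moved to 'unlikely species', with Species C ranked higher because it still matches the categorical character.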

Give the new version a go. See what you think.

Further information

For more information on the Tom.bio project, visit the Tom.bio homepage. Check out upcoming FSC field courses. For upcoming earthworm ID events in 2016 run by the Earthworm Society of Britain (including two run jointly with FSC Tomorrow's Biodiversity) go to the ESB events page.