FreeLing is a library, so its results are stored in data structures and returned to the calling application, which knows what needs to be done with them (processing, printing, serializing, etc.) and in which format.
However, many applications need to dump and load the results of FreeLing analysis, so the library also offers some convenience classes to produce or load formats commonly used in the NLP community.
In this example, we will see how to use the class output_conll to dump FreeLing analysis into a column format similar to that of the CoNLL shared tasks.
This module has the advantage of being configurable, so we can decide which columns the output will contain and in which order.
We are going to use Example 9 as a starting point, and simply remove the ProcessSentences function and call the output handler to print the analysis instead.
First, make sure to include the output handler classes:
#include "freeling/output/output_conll.h"
Then, in Example 9, remove the function ProcessSentences and, instead of calling it after the analysis is completed, use the code:
// Create output handler and select desired output
freeling::io::output_conll out(L"out.cfg");
// print analysis results in conll format
out.PrintResults(wcout,ls);
That will create the output handler instance, and call it to print the results of the analysis.
Find here the whole code:
Assuming the input file contains the following sentences:
The big cat eats fresh fish. My neighbour's dog chased the cat.
The dog is eating meat. Some mice were hunted by the cat.
And if the file out.cfg contains, for instance:
<Type>
conll
</Type>
<Columns>
ID FORM LEMMA TAG DEPHEAD DEPREL SRL
</Columns>
we would obtain the output:
1 The the DT 3 NMOD - - -
2 big big JJ 3 NMOD - - -
3 cat cat NN 4 SBJ - A0 -
4 eats eat VBZ 0 ROOT eat.03|eat.01|eat.02 - -
5 fresh fresh JJ 6 NMOD - - -
6 fish fish NN 4 OBJ fish.01 A1 -
7 . . Fp 4 P - - -
1 My my PRP$ 2 NMOD - -
2 neighbour neighbour NN 4 NMOD - -
3 's 's POS 2 SUFFIX - -
4 dog dog NN 5 SBJ - A0
5 chased chase VBD 0 ROOT tail.01|trail.01|chase.01|track.01 -
6 the the DT 7 NMOD - -
7 cat cat NN 5 OBJ - A1
8 . . Fp 5 P - -
1 The the DT 2 NMOD - -
2 dog dog NN 3 SBJ - A0
3 is be VBZ 0 ROOT - -
4 eating eat VBG 3 VC eat.03|eat.01|eat.02 -
5 meat meat NN 4 OBJ - A1
6 . . Fp 3 P - -
1 Some some DT 2 NMOD - -
2 mice mouse NNS 3 SBJ - A1
3 were be VBD 0 ROOT - -
4 hunted hunt VBN 3 VC hunt.01 -
5 by by IN 4 LGS - A0
6 the the DT 7 NMOD - -
7 cat cat NN 5 PMOD - -
8 . . Fp 3 P - -
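To consume this output outside FreeLing, each row can be split on whitespace and matched against the column names listed in out.cfg. The sketch below is only an illustration: Row, split_fields and parse_row are hypothetical names, not part of the FreeLing API. It assumes that the fields are whitespace-separated and that, as the sample above suggests, the SRL entry expands into a variable number of trailing columns, so only the columns before SRL are addressed by name.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One parsed output row (hypothetical type, not a FreeLing class).
struct Row {
    std::map<std::string, std::string> fixed;  // columns addressed by name
    std::vector<std::string> extra;            // trailing columns (SRL in this example)
};

// Split a whitespace-separated line into its fields.
std::vector<std::string> split_fields(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> fields;
    std::string field;
    while (in >> field) fields.push_back(field);
    return fields;
}

// Pair the leading fields with the given column names, in order.
// Any field beyond the named columns is kept, in order, in `extra`.
Row parse_row(const std::vector<std::string>& columns, const std::string& line) {
    assert(!columns.empty());
    Row row;
    const std::vector<std::string> fields = split_fields(line);
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i < columns.size()) {
            row.fixed[columns[i]] = fields[i];
        } else {
            row.extra.push_back(fields[i]);
        }
    }
    return row;
}
```

For example, calling parse_row with the names ID, FORM, LEMMA, TAG, DEPHEAD and DEPREL on the row for "eats" stores "eat" under LEMMA and "0" under DEPHEAD, and leaves the three remaining fields in extra. If the column order in out.cfg changes, only the list of names passed to parse_row needs to change.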
We can customize the output at will, selecting different columns, or altering the order in which they are printed.
For instance, with this content for out.cfg:
<Type>
conll
</Type>
<Columns>
ID TAG LEMMA SENSE DEPHEAD DEPREL FORM SPAN_BEGIN SPAN_END
</Columns>
we would get the output below:
1 DT the - 3 NMOD The 0 3
2 JJ big 01382086-a 3 NMOD big 4 7
3 NN cat 02121620-n 4 SBJ cat 8 11
4 VBZ eat 01168468-v 0 ROOT eats 12 16
5 JJ fresh 01067694-a 6 NMOD fresh 17 22
6 NN fish 02512053-n 4 OBJ fish 23 27
7 Fp . - 4 P . 27 28
1 PRP$ my - 2 NMOD My 29 31
2 NN neighbour 10352299-n 4 NMOD neighbour 32 41
3 POS 's - 2 SUFFIX 's 41 43
4 NN dog 02084071-n 5 SBJ dog 44 47
5 VBD chase 02001858-v 0 ROOT chased 48 54
6 DT the - 7 NMOD the 55 58
7 NN cat 02121620-n 5 OBJ cat 59 62
8 Fp . - 5 P . 62 63
1 DT the - 2 NMOD The 64 67
2 NN dog 02084071-n 3 SBJ dog 68 71
3 VBZ be 02604760-v 0 ROOT is 72 74
4 VBG eat 01168468-v 3 VC eating 75 81
5 NN meat 07649854-n 4 OBJ meat 82 86
6 Fp . - 3 P . 86 87
1 DT some - 2 NMOD Some 88 92
2 NNS mouse 10335563-n 3 SBJ mice 93 97
3 VBD be 02604760-v 0 ROOT were 98 102
4 VBN hunt 01143838-v 3 VC hunted 103 109
5 IN by - 4 LGS by 110 112
6 DT the - 7 NMOD the 113 116
7 NN cat 02121620-n 5 PMOD cat 117 120
8 Fp . - 3 P . 120 121
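Judging from the sample above, SPAN_BEGIN and SPAN_END appear to be zero-based character offsets into the input text, with the end offset exclusive and the line break between the two input lines counted as one character. This is inferred from the numbers shown here, not from the FreeLing documentation. Under that assumption, a small helper (span_text is a hypothetical name, not a FreeLing function) can recover the original form of a token from the raw input:

```cpp
#include <string>

// Return the slice of `text` covered by the half-open span [begin, end).
// An empty string is returned if the span is inverted or out of range.
std::wstring span_text(const std::wstring& text, std::size_t begin, std::size_t end) {
    if (begin > end || end > text.size()) return L"";
    return text.substr(begin, end - begin);
}
```

With the two example lines joined by a single newline, span_text(text, 8, 11) yields "cat" and span_text(text, 103, 109) yields "hunted", matching the FORM column of the corresponding rows. Wide strings are used because the snippet from Example 9 above passes wide string literals to FreeLing.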
It is important to note that the CoNLL-like output produced by the class output_conll can be loaded again into FreeLing by the class input_conll instantiated with the same configuration file. The input handler will return a list of sentences with the same content as the original dumped data structures.
So, this pair of I/O handlers can be used to serialize and deserialize FreeLing data structures, either to store them on disk or to send them to other processes using FreeLing in a multi-process application.
Also, FreeLing provides other output handlers that can dump the produced analysis to formats such as XML, JSON, or NAF.