Third International Chinese Language Processing Bakeoff
Bakeoff 2006 Result Submission Instructions
You should submit your results by email to me at this address:
The subject of the message should be
Bakeoff 2006 Result Submission
The message should be sent by the primary investigator, i.e., the one who registered for participation. This is how I will match your submission to your registration.
Please use the following conventions:
Field | Meaning |
---|---|
X | the name of the corpus: ckip, cityu, msra, upuc, ldc |
TASK | whether the result is for Word Segmentation (WS) or Named Entity Recognition (NER) |
[OC] | whether the result is for the Open (O) or Closed (C) track |
Y | an optional identifier (a lower-case letter) for multiple runs of the system |
Z | the file suffix, .txt or .utf8 |
For example, the result file for a single run on the UTF-8 encoded cityu corpus in the closed track of the Word Segmentation task would be
cityu_test_result_WS_C.utf8
The result files for two alternate systems run on the GB version of the Microsoft corpus in the open track for Named Entity Recognition would be
msra_test_result_NER_O_a.txt msra_test_result_NER_O_b.txt
Note that 'a' and 'b' are used to distinguish two separate runs on this corpus.
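As an illustration, the naming convention can be sketched in a few lines of Python. The pattern used here (corpus, the literal `test_result`, task, track, optional run letter, then suffix) is inferred from the two examples above. The function name `result_filename` is hypothetical and is not part of any official bakeoff tooling.

```python
from typing import Optional

# Allowed values, copied from the conventions listed above.
CORPORA = {"ckip", "cityu", "msra", "upuc", "ldc"}
TASKS = {"WS", "NER"}
TRACKS = {"O", "C"}
SUFFIXES = {".txt", ".utf8"}


def result_filename(corpus: str, task: str, track: str, suffix: str,
                    run: Optional[str] = None) -> str:
    """Build a result file name (hypothetical helper, pattern inferred from the examples)."""
    if corpus not in CORPORA:
        raise ValueError(f"unknown corpus: {corpus!r}")
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    if track not in TRACKS:
        raise ValueError(f"unknown track: {track!r}")
    if suffix not in SUFFIXES:
        raise ValueError(f"unknown suffix: {suffix!r}")

    parts = [corpus, "test", "result", task, track]
    if run is not None:
        # The run identifier is a single lower-case letter.
        if len(run) != 1 or not ("a" <= run <= "z"):
            raise ValueError(f"run identifier must be one lower-case letter: {run!r}")
        parts.append(run)
    return "_".join(parts) + suffix


print(result_filename("cityu", "WS", "C", ".utf8"))
print(result_filename("msra", "NER", "O", ".txt", run="a"))
```

Checking a file name against the allowed values before sending the message may help avoid a submission that cannot be matched to a corpus, task, and track.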
For word segmentation, the results file should contain one line for each sentence/line in the test file, with words and punctuation separated by whitespace.
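A rough self-check for a word segmentation results file might look like the sketch below. It assumes, beyond what is stated here, that removing all whitespace from a result line should reproduce the characters of the corresponding test line. The function `check_segmentation` is a hypothetical helper, not the official scoring script.

```python
from typing import List


def check_segmentation(test_lines: List[str], result_lines: List[str]) -> List[str]:
    """Return a list of problems found; an empty list means the basic checks passed."""
    problems = []
    # One result line is expected for each line of the test file.
    if len(test_lines) != len(result_lines):
        problems.append(
            f"expected {len(test_lines)} lines, found {len(result_lines)}"
        )
    for number, (source, segmented) in enumerate(zip(test_lines, result_lines), start=1):
        # Assumption: segmentation only inserts whitespace and never changes characters.
        if "".join(source.split()) != "".join(segmented.split()):
            problems.append(f"line {number}: characters differ from the test file")
    return problems


print(check_segmentation(["我爱北京。"], ["我 爱 北京 。"]))
```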
For named entity recognition, the results file should be in CoNLL two-column format: one character per line, followed by the appropriate tag in the second column, separated by a single whitespace character. The primary format is that of the CoNLL 2002 NER task, adapted for Chinese. The data will be presented in the same two-column format, where the first column is the character and the second is its tag. The tags are specified as follows:
Tag | Meaning |
---|---|
0 (zero) | Not part of a named entity |
B-PER | Beginning character of a person name |
I-PER | Non-beginning character of a person name |
B-ORG | Beginning character of an organization name |
I-ORG | Non-beginning character of an organization name |
B-LOC | Beginning character of a location name |
I-LOC | Non-beginning character of a location name |
B-GPE | Beginning character of a geopolitical entity |
I-GPE | Non-beginning character of a geopolitical entity |
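To make the tagging scheme concrete, the sketch below turns one sentence and a list of entity spans into two-column lines. The span representation `(start, end, type)` with an exclusive end index and the function name `to_two_column` are assumptions made for illustration. The outside tag is written as `0`, following the table above.

```python
from typing import List, Tuple

OUTSIDE = "0"  # the table above lists the outside tag as 0 (zero)
ENTITY_TYPES = {"PER", "ORG", "LOC", "GPE"}


def to_two_column(sentence: str, entities: List[Tuple[int, int, str]]) -> List[str]:
    """Tag each character of `sentence`; `entities` holds (start, end, type), end exclusive."""
    tags = [OUTSIDE] * len(sentence)
    for start, end, entity_type in entities:
        if entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {entity_type!r}")
        if not (0 <= start < end <= len(sentence)):
            raise ValueError(f"span out of range: {(start, end)}")
        # First character of the entity gets B-, the rest get I-.
        tags[start] = f"B-{entity_type}"
        for index in range(start + 1, end):
            tags[index] = f"I-{entity_type}"
    # One character per line, a single space, then the tag.
    return [f"{char} {tag}" for char, tag in zip(sentence, tags)]


for line in to_two_column("李明在北京", [(0, 2, "PER"), (3, 5, "LOC")]):
    print(line)
```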
To minimize confusion, 14:00 GMT on 2006/05/17 is: