Third International Chinese Language Processing Bakeoff
Data Download
Four corpora are available for this bakeoff:
Corpus | Encoding | Zip Archive | Tar.gz Archive | Annotation guidelines | |
---|---|---|---|---|---|
Traditional Chinese | |||||
Academia Sinica | Unicode/Big Five Plus | Zip | tar.gz | ||
City University of Hong Kong | HKSCS Unicode/Big Five | Zip | tar.gz | ||
Simplified Chinese | |||||
Microsoft Research | gb18030/Unicode | Zip | tar.gz | Doc | |
University of Pennsylvania/University of Colorado | CP936/Unicode | Zip | tar.gz | HTML |
Corpus | Encoding | NE Types | Zip Archive | Tar.gz Archive | Annotation Guidelines |
---|---|---|---|---|---|
Traditional Chinese | |||||
City University of Hong Kong | HKSCS Unicode/Big Five | PER, LOC, ORG | Zip | tar.gz | |
Simplified Chinese | |||||
Microsoft Research | gb18030/Unicode | PER, ORG, LOC | Zip | tar.gz | |
Linguistic Data Consortium | CP936/Unicode | PER, LOC, ORG, GPE | Converts to CoNLL and XML formats | HTML |
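
The corpora above are distributed in several legacy encodings, so it may help to normalize files to a single encoding before processing. The following is a minimal Python sketch, not part of the bakeoff distribution. The mapping from the encoding labels in the tables to Python codec names is an assumption, as is the choice of UTF-8 for the files labelled "Unicode" (they could instead be UTF-16). "Big Five Plus" is left out because Python's standard library has no codec for it. The names `CODEC_FOR_LABEL`, `decode_corpus_bytes`, and `to_utf8` are hypothetical.

```python
# Assumed mapping from the encoding labels used in the tables to
# Python standard-library codec names. This is illustrative only.
CODEC_FOR_LABEL = {
    "Big Five": "big5",
    "HKSCS Big Five": "big5hkscs",
    "gb18030": "gb18030",
    "CP936": "cp936",
    "Unicode": "utf-8",  # assumption: the files may actually be UTF-16
}


def decode_corpus_bytes(raw: bytes, label: str) -> str:
    """Decode raw corpus bytes using the codec assumed for the given label."""
    codec = CODEC_FOR_LABEL[label]
    return raw.decode(codec)


def to_utf8(raw: bytes, label: str) -> bytes:
    """Re-encode raw corpus bytes as UTF-8."""
    return decode_corpus_bytes(raw, label).encode("utf-8")
```

For a real file, one would read it in binary mode (`open(path, "rb").read()`) and pass the bytes with the label that matches the corpus. A decoding error is a hint that the assumed codec does not match the file.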