The corpus contains complete Portuguese manuscripts published from 1500 to 1936, divided into five sub-corpora, one per century, as shown below. The corpus was POS-tagged using TreeTagger.
The texts are balanced by language variety: 48 are European Portuguese and 52 are Brazilian Portuguese. You can find more information in the paper that describes the corpus. The complete inventory of texts is here and more detail regarding annotation can be found here.
Accessing the Corpus
The corpus can be downloaded with POS annotation or queried through a CQP query interface.
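For readers working with the downloaded version, the following is a minimal Python sketch of how POS-annotated text might be read. It assumes the files follow TreeTagger's default output layout of one token per line with tab-separated word, tag and lemma columns; the actual layout of the corpus files is not described here and may differ. The names Token, parse_treetagger_lines and pos_frequencies are hypothetical, and the tags in the sample are illustrative only, not taken from the corpus tagset.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    """One annotated token (hypothetical helper type)."""
    word: str
    pos: str
    lemma: str


def parse_treetagger_lines(lines: Iterable[str]) -> Iterator[Token]:
    """Yield tokens from lines in the assumed word<TAB>tag<TAB>lemma layout."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip anything that is not a three-column token line
        yield Token(*parts)


def pos_frequencies(lines: Iterable[str]) -> Counter:
    """Count how often each POS tag occurs."""
    return Counter(token.pos for token in parse_treetagger_lines(lines))


# Illustrative sample only; the tags are placeholders, not the corpus tagset.
sample = [
    "O\tDET\to\n",
    "rei\tNOM\trei\n",
    "<s>\n",
    "chegou\tV\tchegar\n",
]

if __name__ == "__main__":
    for tag, count in pos_frequencies(sample).most_common():
        print(tag, count)
```

To apply the same functions to a downloaded file, pass an open file handle instead of the sample list, for example pos_frequencies(open(path, encoding="utf-8")), where path and the encoding are assumptions to adjust to the actual download.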
Citing the Corpus
Zampieri, M. and Becker, M. (2013) Colonia: Corpus of Historical Portuguese. In: ZSM Studien, Special Volume on Non-Standard Data Sources in Corpus-Based Research. Volume 5. Shaker. [pdf]