python - How do I download NLTK data?
Updated answer: NLTK works well with Python 2.7. I had Python 3.2; I uninstalled 3.2 and installed 2.7. Now it works!
I have installed NLTK and tried to download the NLTK data, following the instructions on the site: http://www.nltk.org/data.html

I downloaded NLTK, installed it, and then tried to run the following code:
>>> import nltk
>>> nltk.download()
It gave me the error message below:
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    nltk.download()
AttributeError: 'module' object has no attribute 'download'

Directory of C:\Python32\Lib\site-packages
I tried both nltk.download() and nltk.downloader(), and both gave me error messages.
I then used help(nltk) to pull up the package info, which shows the following:
NAME
    nltk

PACKAGE CONTENTS
    align
    app (package)
    book
    ccg (package)
    chat (package)
    chunk (package)
    classify (package)
    cluster (package)
    collocations
    corpus (package)
    data
    decorators
    downloader
    draw (package)
    examples (package)
    featstruct
    grammar
    inference (package)
    internals
    lazyimport
    metrics (package)
    misc (package)
    model (package)
    parse (package)
    probability
    sem (package)
    sourcedstring
    stem (package)
    tag (package)
    test (package)
    text
    tokenize (package)
    toolbox
    tree
    treetransforms
    util
    yamltags

FILE
    c:\python32\lib\site-packages\nltk
I can see downloader there, so I am not sure why it does not work. My Python version is 3.2.2, and the system is Windows Vista.
TL;DR
To download a particular dataset/model, use the nltk.download() function, e.g. if you are looking to download the punkt sentence tokenizer, use:
$ python3
>>> import nltk
>>> nltk.download('punkt')
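To quickly confirm the download worked, you can tokenize a sample sentence; word_tokenize relies on the punkt models under the hood (the output below is what current NLTK versions produce):

>>> from nltk import word_tokenize
>>> word_tokenize("NLTK is now set up.")
['NLTK', 'is', 'now', 'set', 'up', '.']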
If you're unsure which data/model you need, you can start out with the basic list of data + models with:
>>> import nltk
>>> nltk.download('popular')
It will download a list of "popular" resources, which includes:
<collection id="popular" name="Popular packages">
  <item ref="cmudict" />
  <item ref="gazetteers" />
  <item ref="genesis" />
  <item ref="gutenberg" />
  <item ref="inaugural" />
  <item ref="movie_reviews" />
  <item ref="names" />
  <item ref="shakespeare" />
  <item ref="stopwords" />
  <item ref="treebank" />
  <item ref="twitter_samples" />
  <item ref="omw" />
  <item ref="wordnet" />
  <item ref="wordnet_ic" />
  <item ref="words" />
  <item ref="maxent_ne_chunker" />
  <item ref="punkt" />
  <item ref="snowball_data" />
  <item ref="averaged_perceptron_tagger" />
</collection>
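If you are scripting this (e.g. in a project setup step) rather than working interactively, a minimal sketch is to loop over just the resources you need; nltk.download() accepts a quiet flag to suppress the per-package progress output:

import nltk

# Download only the resources this project actually needs;
# quiet=True suppresses the per-package progress messages.
for resource in ['punkt', 'stopwords', 'wordnet']:
    nltk.download(resource, quiet=True)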
EDITED
In case you are hitting errors when downloading the larger datasets with nltk, see https://stackoverflow.com/a/38135306/610569:
$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python

>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed'  # Trick the index into treating panlex_lite as if it's installed.
>>> dler.download('popular')
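Note that _update_index() and _status_cache are private Downloader internals, so this trick may break across versions. If your NLTK version supports the halt_on_error parameter, a less invasive sketch is to let the download skip the failing package instead of aborting:

>>> import nltk
>>> # halt_on_error=False skips packages that fail to download (e.g. panlex_lite)
>>> # instead of aborting the whole collection.
>>> nltk.download('popular', halt_on_error=False)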
And if anyone wants to find the nltk_data directory, see https://stackoverflow.com/a/36383314/610569
And to configure the nltk_data path, see https://stackoverflow.com/a/22987374/610569
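As a sketch of what those two links cover: you can download into a directory of your own choosing and register it with NLTK's search path (the /home/me/my_nltk_data path below is just a placeholder):

import nltk

CUSTOM_DIR = '/home/me/my_nltk_data'  # hypothetical location; pick your own

# Download into the custom directory instead of the default nltk_data.
nltk.download('punkt', download_dir=CUSTOM_DIR)

# Tell NLTK to search there; alternatively, set the NLTK_DATA
# environment variable before starting Python.
nltk.data.path.append(CUSTOM_DIR)

# Verify the resource is now resolvable (raises LookupError otherwise).
print(nltk.data.find('tokenizers/punkt'))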
UPDATED
From v3.2.5, NLTK has a more informative error message when an nltk_data resource is not found, e.g.:
>>> from nltk import word_tokenize
>>> word_tokenize('x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/alvas/git/nltk/nltk/tokenize/__init__.py", line 128, in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)
  File "/Users/alvas/git/nltk/nltk/tokenize/__init__.py", line 94, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "/Users/alvas/git/nltk/nltk/data.py", line 820, in load
    opened_resource = _open(resource_url)
  File "/Users/alvas/git/nltk/nltk/data.py", line 938, in _open
    return find(path_, path + ['']).open()
  File "/Users/alvas/git/nltk/nltk/data.py", line 659, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - '/Users/alvas/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************
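Because the failure is a plain LookupError, one common pattern (a convenience sketch, not an official NLTK API) is to download a resource lazily the first time it turns out to be missing:

import nltk

def ensure_resource(path, package):
    """Download `package` only if `path` is missing from nltk_data."""
    try:
        nltk.data.find(path)
    except LookupError:
        nltk.download(package)

ensure_resource('tokenizers/punkt', 'punkt')
print(nltk.word_tokenize('Lazy downloading works.'))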