python - counting word frequencies in a corpus is taking too long

- September 25, 2011

My task is to:

"Print out the list of words that occur in the corpus, with their frequencies. The list should be sorted by word frequency in descending order (most frequent word first)."

    def printwordfrequencies(index, vocab):
        # print("your task 3: print out the list of words that occur in the corpus, with their frequencies ...")
        newlist = []
        for i in index:
            newlist.append([i, len(index[i])])
            sorted(newlist)
        return newlist

index is a dictionary (associative array) that maps words to lists of positions, and vocab is an alphabetically sorted list of the vocabulary used in the corpus.

The function I have written has two problems. Firstly, it does not sort properly. It gives the right word frequencies, e.g. [.., ['plot', 128], ['two', 166], ..], but it does not sort them by frequency.

Secondly, it takes far too long to run. I am assuming this is because it is trying to append to and sort such a long list, but I am not sure how to fix the issue.

One small thing: I am not sure why I need vocab as an input parameter.

Firstly, you only need to sort newlist once, at the end. Sorting on each iteration of the loop is why it is running slowly. Also, sorted() does not change the original list; it returns a new sorted list. So in your code the sorting has no effect, because you are not storing the list that is returned.
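The difference between sorted() and list.sort() can be seen in a small sketch. The list pairs below is made-up sample data, not taken from the asker's corpus.

```python
# Hypothetical sample data: [word, frequency] pairs.
pairs = [["two", 166], ["plot", 128], ["film", 300]]

# sorted() returns a new list and leaves the original untouched,
# so calling it without keeping the result has no visible effect.
sorted(pairs)
unchanged = list(pairs)

# Keeping the return value gives the sorted copy.
ordered = sorted(pairs)

# list.sort() sorts in place and returns None.
in_place = list(pairs)
result = in_place.sort()
```

With no key given, the pairs are compared element by element, so ordered is sorted by word here, not by frequency.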

Secondly, to sort based on the second item in each entry, you can give sorted() a key to use. In this case, it sorts first on the frequency, x[1], and if two entries have the same frequency, it sorts on the word, x[0].

    def printwordfrequencies(index, vocab):
        newlist = []
        for i in index:
            newlist.append([i, len(index[i])])
        # Sort once, at the end: by frequency, then by word, highest first.
        return sorted(newlist, key=lambda x: [x[1], x[0]], reverse=True)

This can be further simplified using a list comprehension:

    def printwordfrequencies(index, vocab):
        return sorted([[i, len(index[i])] for i in index], key=lambda x: [x[1], x[0]], reverse=True)
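As a rough illustration of how this behaves, here is a sketch run on a tiny made-up index. The name word_frequencies and the sample data are assumptions for the example only, and the unused vocab parameter is left out.

```python
def word_frequencies(index):
    """Return [word, count] pairs, most frequent first (hypothetical helper)."""
    pairs = [[word, len(positions)] for word, positions in index.items()]
    # reverse=True applies to both parts of the key: frequency and word.
    return sorted(pairs, key=lambda x: [x[1], x[0]], reverse=True)


# Made-up index: word -> list of positions in the corpus.
index = {"plot": [3, 17, 42], "two": [1, 8, 20, 33], "film": [5, 9, 11], "a": [0]}

result = word_frequencies(index)
for word, count in result:
    print(word, count)

# Optional variant: negate the count instead of using reverse=True,
# so that words with equal counts come out in alphabetical order.
alphabetical_ties = sorted(result, key=lambda x: (-x[1], x[0]))
```

Because reverse=True reverses the whole key, words with the same frequency come out in reverse alphabetical order ("plot" before "film"). If ties should be alphabetical, the negated-count key shown at the end is one way to do it.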
