python - Counting word frequencies in a corpus is taking too long




My task is to:

"Print out a list of the words that occur in the corpus, with their frequencies. The list should be sorted by word frequency in descending order (most frequent word first)."

    def printwordfrequencies(index, vocab):
        # print("Your task 3: print out a list of the words that occur in the corpus, with their frequencies. The list should be sorted by word frequency in descending order (most frequent word first).")
        newlist = []
        for i in index:
            newlist.append([i, len(index[i])])
            sorted(newlist)
        return newlist

index is a dictionary (associative array) that maps words to a list of positions, and vocab is an alphabetically sorted list of the vocabulary used in the corpus.

The function I have written has two problems. Firstly, it does not sort properly. It gives the right word frequencies, e.g. [.., ['plot', 128], ['two', 166], ..], but it does not sort them by frequency.

Secondly, it takes far too long to run. I am assuming this is because it is trying to append to and sort such a long list, but I am not sure how to fix the issue.

One small thing: I am not sure why I need vocab as an input parameter.

Firstly, you only need to sort newlist once, at the end. Sorting on every iteration of the loop is why it is running slowly. Also, sorted() does not change the original list; it returns a new sorted list. In your code, the sorting has no effect because you are not storing the list it returns.
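As a minimal sketch of that difference, the snippet below compares sorted() with list.sort() on a tiny made-up list. The data and variable names are illustrative only and are not taken from the corpus in the question.

```python
# Illustrative data only; not taken from the corpus in the question.
pairs = [["two", 166], ["plot", 128], ["film", 300]]

# sorted() builds and returns a new list; the original is left untouched.
result = sorted(pairs)
unchanged = pairs == [["two", 166], ["plot", 128], ["film", 300]]

# list.sort() sorts the list in place and returns None.
in_place = list(pairs)
returned = in_place.sort()
```

Either approach works as long as the sort happens once, after the loop, and the sorted list is the one that is returned.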

Secondly, to sort on the second item in each entry, you can give sorted() a key to use. In this case, it sorts first on the second item, the length x[1], and, if two entries have the same length, it sorts on the first entry, x[0].

    def printwordfrequencies(index, vocab):
        newlist = []
        for i in index:
            newlist.append([i, len(index[i])])
        # Sort once, at the end: by count first, then by word.
        return sorted(newlist, key=lambda x: [x[1], x[0]], reverse=True)
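One side effect of reverse=True is that it reverses the whole key, so words with equal counts come out in reverse alphabetical order. If alphabetical order is preferred for ties (an assumption about the desired output, not something the task states), one possible variant negates the count instead. The function name word_frequencies and the sample index below are hypothetical.

```python
def word_frequencies(index):
    """Return [word, count] pairs, most frequent first, ties alphabetical."""
    pairs = [[word, len(positions)] for word, positions in index.items()]
    # Negating the count sorts descending by frequency while keeping
    # the word itself in ascending (alphabetical) order.
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample index: word -> list of positions.
sample_index = {"plot": [3, 9], "two": [1, 4, 7], "film": [0, 5]}
ranked = word_frequencies(sample_index)
```

Here "film" and "plot" both occur twice, so they follow "two" in alphabetical order.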

This can be further simplified using a list comprehension:

    def printwordfrequencies(index, vocab):
        return sorted([[i, len(index[i])] for i in index], key=lambda x: [x[1], x[0]], reverse=True)
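Since the task asks for the list to be printed, a usage sketch might look like the following. It assumes every word in the corpus appears as a key of index, in which case vocab is not needed; the name print_word_frequencies, the tab-separated output format, and the sample index are all hypothetical.

```python
def print_word_frequencies(index):
    """Print each word and its count, most frequent first."""
    ranked = sorted(
        ([word, len(positions)] for word, positions in index.items()),
        key=lambda pair: [pair[1], pair[0]],
        reverse=True,
    )
    for word, count in ranked:
        # One word per line, followed by its frequency.
        print(f"{word}\t{count}")
    return ranked


# Hypothetical sample index: word -> list of positions.
sample_index = {"plot": [3, 9], "two": [1, 4, 7], "film": [0]}
ranked = print_word_frequencies(sample_index)
```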



