Python - counting word frequencies in a corpus is taking too long
My task is to:
Print out a list of the words that occur in the corpus, with their frequencies. The list should be sorted by word frequency in descending order (most frequent word first).
    def printwordfrequencies(index, vocab):
        # print("Your task 3: print out a list of the words that occur in the corpus, with their frequencies. The list should be sorted by word frequency in descending order (most frequent word first).")
        newlist = []
        for i in index:
            newlist.append([i, len(index[i])])
            sorted(newlist)
        return newlist
Here index is a dictionary (associative array) that maps words to a list of positions, and vocab is an alphabetically sorted list of the vocabulary used in the corpus.
The function I have written has two problems. Firstly, it does not sort properly. It gives the right word frequencies, e.g. [.., ['plot', 128], ['two', 166], ..], but it does not sort them by frequency.
Secondly, it takes far too long to run. I am assuming that is because it is trying to append to and sort such a long list, but I am not sure how to fix the issue.
One small thing: I am not sure why I need vocab as an input parameter.
Firstly, you only need to sort newlist once, at the end. Sorting on each iteration of the loop is why it is running slowly. Also, sorted() does not change the original list; it returns a new sorted list. So in your code the sorting has no effect, because you are not storing the list that is returned.
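As a small illustration of that difference, here is a sketch on a made-up three-entry list (the sample data is hypothetical, not taken from the corpus). It shows that discarding the result of sorted() leaves the list untouched, while either keeping the returned list or calling list.sort() gives a sorted result.

```python
# Hypothetical sample data, for illustration only.
pairs = [["two", 166], ["plot", 128], ["film", 300]]

sorted(pairs)                # return value discarded, so pairs is unchanged
after_discard = list(pairs)  # copy of pairs as it is right now

ordered = sorted(pairs)      # keep the new list that sorted() returns
pairs.sort()                 # or sort the list in place (returns None)

print(after_discard)  # still in the original order
print(ordered)        # sorted by word, the default ordering for these lists
```

Without a key, the entries are compared as lists, so they end up ordered by the word rather than by the count.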
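Secondly, to sort based on the second item in each entry, you can give sorted() a key to use. In this case, it sorts first on the second item, the length x[1], and, if two entries have the same length, it then sorts on the first item, the word x[0]. Because reverse=True is set, both comparisons run in descending order.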
    def printwordfrequencies(index, vocab):
        newlist = []
        for i in index:
            newlist.append([i, len(index[i])])
        # Sort once, after the loop, by count and then by word (both descending).
        return sorted(newlist, key=lambda x: [x[1], x[0]], reverse=True)
This can be further simplified using a list comprehension:
    def printwordfrequencies(index, vocab):
        return sorted([[i, len(index[i])] for i in index], key=lambda x: [x[1], x[0]], reverse=True)
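One side effect of reverse=True is that words with equal counts come out in reverse alphabetical order. If you would rather have ties in A-Z order, a possible variant is to negate the count in the key instead of reversing the whole sort. The sketch below assumes index has the shape described in the question (word mapped to a list of positions); the name word_frequencies and the sample index are made up for illustration, and vocab is left out because the computation does not use it.

```python
def word_frequencies(index):
    """Return [word, count] pairs, most frequent first, ties in A-Z order."""
    pairs = [[word, len(positions)] for word, positions in index.items()]
    # Negating the count sorts it in descending order while the word,
    # used as a tie-breaker, still sorts in ascending order.
    return sorted(pairs, key=lambda x: (-x[1], x[0]))


# Hypothetical sample index: each word maps to its list of positions.
index = {"plot": [3, 9], "two": [1, 4, 7], "film": [0, 5]}
result = word_frequencies(index)

for word, count in result:
    print(word, count)
```

The sort still runs only once over the whole list, so this is no slower than the reverse=True version.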