Above is the final poem I made.
It draws on two source texts, both English translations of ancient Chinese poems. One poet is male and the other is female.
The poem by the male poet is called Bring In The Wine; the poem by the female poet is called Little Overlapping Hills. Here they are.
The process I used to make the final poem:
- Use TextBlob to tag each word with its part of speech (noun, verb, and so on).
- Create a Python script that counts the 10 most frequent words in each poem, keeping only the words tagged as nouns.
- Create another Python script that combines the two word lists line by line, adding a “:” between them.
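The first two steps can be sketched without TextBlob installed. This is a minimal Python 3 sketch (the scripts in this post use Python 2 syntax), and it is not the script itself: `top_nouns` and `sample_tags` are hypothetical names, and the (word, tag) pairs are hand-written stand-ins for what TextBlob's `blob.tags` returns.

```python
from collections import Counter


def top_nouns(tagged_words, n=10):
    """Return the n most common words whose tag contains "NN".

    Hypothetical helper: tagged_words is a list of (word, tag) pairs,
    the same shape TextBlob's blob.tags produces.
    """
    # "NN" in tag matches NN, NNS, NNP and NNPS, i.e. every noun tag.
    nouns = [word for word, tag in tagged_words if "NN" in tag]
    return Counter(nouns).most_common(n)


# Made-up pairs for illustration only, not real tagger output.
sample_tags = [
    ("wine", "NN"), ("bring", "VB"), ("wine", "NN"),
    ("moon", "NN"), ("cups", "NNS"), ("golden", "JJ"),
]

for item in top_nouns(sample_tags, n=2):
    print(item)
```

Filtering on the tag before counting is what keeps verbs and adjectives such as "bring" and "golden" out of the list.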
These are the commands I used on the command line. The actual Python files are attached after them.
- python wordCount_textBlob_tag.py <bring_in_the_wine.txt >male.txt
- python wordCount_textBlob_tag.py <little_over_lapping_hills.txt >female.txt
- python combineTwoFiles.py male.txt female.txt >final.txt
Code for wordCount_textBlob_tag.py
import sys, string
from textblob import TextBlob
from collections import Counter
import codecs

whole_line = ""
word_list = []

# read the whole poem from stdin, lowercase it and strip punctuation
text = sys.stdin.read()
text = text.lower()
text = text.translate(string.maketrans("",""), string.punctuation)
# text = sys.stdin.read().decode('ascii', errors="replace")

# tag every word and keep only the nouns (tags containing "NN")
blob = TextBlob(text)
tags = blob.tags
for word,tag in blob.tags:
    if ("NN" in tag):
        word_list.append(word)
        # print word

# print the 10 most common nouns as (word, count) pairs
counter = Counter(word_list)
most_common = counter.most_common(10)
for item in most_common:
    print item
Code for combineTwoFiles.py
import sys

word_list = []
minLen = 0
# for n in sys.argv[1:]:
#     print n

# read both word-list files given on the command line
file1 = open(sys.argv[1])
file1_lines = file1.readlines()
file2 = open(sys.argv[2])
file2_lines = file2.readlines()

# only pair up as many lines as the shorter file has
if(len(file1_lines)<len(file2_lines)):
    minLen = len(file1_lines)
else:
    minLen = len(file2_lines)

# join line i of file 1 and line i of file 2 with a ":"
i = 0
while(i < minLen):
    word_list.append(file1_lines[i] +":"+file2_lines[i])
    i += 1
for item in word_list:
    item = item.replace('\n','')
    print item
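The same line-by-line pairing can also be written with `zip`, which stops at the shorter input and so replaces the manual length check. This is a minimal Python 3 sketch, not the script above: `combine_lines` is a hypothetical name, and the sample lines are made up to look like the `(word, count)` output of the counting script.

```python
def combine_lines(lines_a, lines_b, sep=":"):
    """Join lines from two lists by position, separated by sep.

    Hypothetical helper: zip() stops at the shorter list, which plays
    the role of the minLen check in the original script.
    """
    return [a.rstrip("\n") + sep + b.rstrip("\n")
            for a, b in zip(lines_a, lines_b)]


# Made-up sample lines, not the real contents of male.txt / female.txt.
male = ["('wine', 3)\n", "('cups', 2)\n", "('moon', 1)\n"]
female = ["('hills', 2)\n", "('spring', 1)\n"]

for line in combine_lines(male, female):
    print(line)
```

The third line of `male` has no partner in `female`, so it is dropped, just as it would be by the `while(i < minLen)` loop.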
>>>>>>>A detour I made before
At first I used a plain word count to list the top 10 words, without having TextBlob analyze the text. But the result was full of words like “the, a, of …”, so I was not happy with it.
The following is the pure word count script I made in Python.
import sys, string
from collections import Counter

whole_line = ""
word_list = []

# join all input lines into one string separated by spaces
for line in sys.stdin:
    line = line.split('\n')[0]
    whole_line += line+" "
whole_line = whole_line.lower()
#remove all the punctuations
whole_line = whole_line.translate(string.maketrans("",""), string.punctuation)

# count every word and print the 10 most common as (word, count) pairs
word_list = whole_line.split()
counter = Counter(word_list)
most_common = counter.most_common(10)
for item in most_common:
    print item
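To see why the plain count was disappointing, here is a minimal Python 3 sketch of the same idea run on one short sentence. `top_words` is a hypothetical name and the sample sentence is made up for illustration; it is not taken from either poem.

```python
import string
from collections import Counter


def top_words(text, n=10):
    """Return the n most common words after lowercasing and
    stripping punctuation (hypothetical helper, no tagging)."""
    # In Python 3 the deletion table is built with str.maketrans.
    table = str.maketrans("", "", string.punctuation)
    cleaned = text.lower().translate(table)
    return Counter(cleaned.split()).most_common(n)


# Made-up sentence for illustration only.
sample = "The cup of wine, and the moon of the night."
print(top_words(sample, n=2))
```

Even in this tiny example the two most frequent words are "the" and "of", which is the problem that led me to filter by part of speech instead.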