#08 Medium

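Problem Description

Read a text file, count how often each word occurs (ignoring case and punctuation), and print the top N words in descending order of frequency.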

Example

Input: a file containing: "Hello world! Hello Python. Hello everyone."

Output: hello: 3, world: 1, python: 1, everyone: 1

Hints

Use open() to read the file, split() to tokenize, and collections.Counter to count frequencies. Remember to strip punctuation before counting.
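
As a rough illustration of why the punctuation cleanup matters, the sketch below works on an in-memory string rather than a file. The helper name count_words is hypothetical and not part of the exercise. It strips punctuation only from the edges of each token, which is one possible cleaning strategy among several.

```python
from collections import Counter
import string

def count_words(text):
    """Hypothetical helper: count words, ignoring case and edge punctuation."""
    # split() alone leaves tokens such as 'world!' and 'python.',
    # so strip punctuation from both ends of every token.
    tokens = (token.strip(string.punctuation) for token in text.lower().split())
    # Drop tokens that were pure punctuation and became empty strings.
    return Counter(token for token in tokens if token)

sample = "Hello world! Hello Python. Hello everyone."
print(sample.split())            # raw tokens still carry punctuation
print(count_words(sample))       # cleaned, case-insensitive counts
```

Because only the edges are stripped, an inner apostrophe survives, so a token like "don't" is counted as one word.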

Reference Solution

from collections import Counter
import string

def word_frequency(filename, top_n=10):
    """Return the top_n most frequent words in a text file as (word, count) pairs."""
    with open(filename, 'r', encoding='utf-8') as f:
        text = f.read()
    
    # Convert to lowercase so that counting ignores case
    text = text.lower()
    # Replace every ASCII punctuation character with a space
    for char in string.punctuation:
        text = text.replace(char, ' ')
    
    # Split into words and count them
    words = text.split()
    counter = Counter(words)
    
    return counter.most_common(top_n)

# Example: create a test file (same encoding that word_frequency reads with)
with open('test.txt', 'w', encoding='utf-8') as f:
    f.write("Hello world! Hello Python. Hello everyone. Python is great!")

# Test
result = word_frequency('test.txt')
for word, count in result:
    print(f"{word}: {count}")
# Prints one pair per line: hello: 3, python: 2, world: 1, everyone: 1, is: 1, great: 1
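
The reference solution reads the whole file into memory and turns every punctuation character into a space, so a contraction such as "it's" is split into "it" and "s". The sketch below is one possible variant, not part of the original answer. It reads the file line by line and extracts words with a regular expression. The function name word_frequency_streaming and the pattern WORD_PATTERN are hypothetical, and the definition of a word as a run of ASCII letters, digits, or apostrophes is an assumption.

```python
import os
import re
import tempfile
from collections import Counter

# Assumption: a "word" is a run of ASCII letters, digits, or apostrophes.
WORD_PATTERN = re.compile(r"[a-z0-9']+")

def word_frequency_streaming(filename, top_n=10):
    """Hypothetical variant: count words one line at a time."""
    counter = Counter()
    with open(filename, 'r', encoding='utf-8') as f:
        for line in f:
            # Lowercase each line, then collect every match of the word pattern.
            counter.update(WORD_PATTERN.findall(line.lower()))
    return counter.most_common(top_n)

# Usage: write a small temporary file, count its words, then clean up.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write("Hello world! Hello Python.\nIt's great, isn't it?\n")
    path = tmp.name

try:
    top_words = word_frequency_streaming(path, top_n=3)
finally:
    os.remove(path)

for word, count in top_words:
    print(f"{word}: {count}")
```

Only one line is held in memory at a time, which may help with large files. The regular expression does not match non-ASCII letters, so text in other scripts would need a different pattern.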