Making the Micolog Home Page Generate Excerpts Automatically

Author: 包子
May 2, 2011

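Micolog is one of the best open-source blog engines on GAE, but its home page cannot generate excerpts automatically, which is a real headache. I happened to find an article that solves the problem. Keep up the good work, 徐明, and I hope more experts will write many more plugins.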

----------------------------

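I wonder whether fellow Micolog users have noticed this: if a post has no excerpt filled in, the home page shows the full content of the post by default. For example, if you configure 10 posts per page, every page shows the full content of all 10 posts, and when the posts are long the browsing experience is very poor.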

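So I decided to generate the excerpt automatically. I am only a beginner in Python; all the time I have spent on its syntax adds up to less than 3 hours. Fortunately the syntax is fairly simple, and after searching the web for a while I found that someone had already written a small function that can extract the text from HTML. A word of explanation here: some readers may think a simple substring of the post content would do the job. Unfortunately it is not that simple. If the post contains HTML, the substring can cut through the middle of a tag or leave tags unclosed, so the result displays as a mess and can sometimes even break your site layout. What we have to do is extract the text portion of the HTML.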
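
To make the problem concrete, here is a minimal sketch comparing a naive slice with a slice taken after the tags are removed. It is not part of Micolog: the sample string and the helper name strip_tags are made up for illustration, and the regex is a rough approximation rather than a real HTML parser.

```python
import re

html = '<p>Hello <a href="http://example.com/a-long-link">world</a>, welcome!</p>'

# Naive approach: cut the raw HTML at a fixed length.
# The cut lands inside the href attribute and leaves <p> and <a> unclosed.
naive = html[:20]


def strip_tags(text):
    """Remove anything that looks like an HTML tag (illustration only)."""
    return re.sub(r'<[^<>]*>', '', text)


# Safer approach: drop the tags first, then cut the plain text.
plain = strip_tags(html)[:20]

print(naive)
print(plain)
```

The first result ends in the middle of an attribute, which is exactly the kind of fragment that can break a page layout; the second contains only visible text.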

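With the HTML-handling function in hand, the next step is to find where to change Micolog. You could call the function when a post is published, or when the page is loaded; I chose the second strategy. The exact location is given below. It may differ between versions, but if you have managed to find this article, I trust you know how to locate the right spot.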
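
The two strategies can be sketched roughly as follows. Everything here is hypothetical and only illustrates the trade-off: Post is not Micolog's model class, and make_excerpt is a stand-in for the real truncation function.

```python
def make_excerpt(content, maxlen=120):
    """Stand-in for the real truncation helper: plain slice plus ellipsis."""
    if len(content) <= maxlen:
        return content
    return content[:maxlen] + '...'


class Post(object):
    """Hypothetical post model, not Micolog's actual class."""

    def __init__(self, content):
        self.content = content
        self.excerpt = ''

    def publish(self):
        # Strategy 1: compute the excerpt once at publish time and store it.
        self.excerpt = make_excerpt(self.content)

    def excerpt_for_display(self):
        # Strategy 2: compute the excerpt at display time if none is stored.
        return self.excerpt or make_excerpt(self.content)
```

Computing at publish time does the work once per post, but posts published earlier have to be saved again before they get an excerpt. Computing at display time repeats the work on every page view, but it covers existing posts immediately.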

Open model.py and find the following function:

    def get_content_excerpt(self,more='..more'):
        if g_blog.show_excerpt:
            if self.excerpt:
                # append a "more" link to the post (the anchor markup is an assumption)
                return self.excerpt+' <a href="/%s">%s</a>'%(self.link,more)
            else:
                sc=self.content.split('<!--more-->')
                if len(sc)>1:
                    return sc[0]+u' <a href="/%s">%s</a>'%(self.link,more)
                else:
                    return sc[0]
        else:
            return self.content

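Then change it to the following:

    # truncate_html uses the re module, so model.py needs "import re" at the top.
    def get_content_excerpt(self,more='..more'):
        if g_blog.show_excerpt:
            if self.excerpt:
                # append a "more" link to the post (the anchor markup is an assumption)
                return self.excerpt+' <a href="/%s">%s</a>'%(self.link,more)
            else:
                sc=self.content.split('<!--more-->')
                if len(sc)>1:
                    return sc[0]+u' <a href="/%s">%s</a>'%(self.link,more)
                else:
                    # no <!--more--> marker: fall back to an automatic excerpt
                    return self.truncate_html(sc[0],'120 html')+u' <a href="/%s">%s</a>'%(self.link,more)
        else:
            return self.content

    def truncate_html(self,content,args):
        # NOTE: the tag names in the patterns below and the &nbsp; entity are
        # assumptions; the published listing had lost its HTML markup.
        maxlen=100        # number of characters to show
        format='html'     # 'html': truncate but keep the markup; 'text': plain text only
        showimg=''        # '' drops images; 'img' keeps them
        end='...'         # ellipsis appended after truncation
        if args:
            bits=args.split(' ')
            if len(bits)==3:
                maxlen,format,showimg=bits
            elif len(bits)==2:
                maxlen,format=bits
            else:
                maxlen=args
        maxlen=int(maxlen)
        # strip paragraph-level tags
        content=re.sub(r'(?i)</?(p|br|hr)\b[^<>]*/?>','',content)
        # plain-text mode: drop every tag, then cut
        if format=='text':
            content=re.sub(r'<[^<>]*>?','',content)
            content=re.sub('&nbsp;','',content)
            if len(content)<=maxlen:
                return content
            return '%s%s'%(content[:maxlen],end)
        # strip image tags unless asked to keep them
        if showimg!='img':
            content=re.sub(r'(?i)</?img\b[^<>]*/?>','',content)
        n=0               # visible characters counted so far
        result=''
        isCode=False      # currently inside an HTML tag
        isHtml=False      # currently inside an HTML entity such as &nbsp;
        # copy characters until maxlen visible ones have been collected
        for temp in content:
            if temp=='<':
                isCode=True
            elif temp=='&':
                isHtml=True
            elif temp=='>' and isCode:
                n=n-1     # offsets the +1 below so the closing '>' is not counted
                isCode=False
            elif temp==';' and isHtml:
                isHtml=False   # the whole entity counts as one character
            if not isCode and not isHtml:
                n=n+1
            result+=temp
            if n>=maxlen:
                break
        # drop the text between tags so that only the markup is left
        temp_result=re.sub(r'(>)[^<>]*(<?)',r'\1\2',result)
        temp_result=temp_result.lower()
        if len(content)-len(temp_result)<maxlen:
            return content
        result+=str(end)
        # drop tags that need no closing tag
        rg=r'</?(area|base|basefont|body|br|col|colgroup|dd|dt|frame|head|hr|html|img|input|isindex|li|link|meta|option|p|param|tbody|td|tfoot|th|thead|tr)\b[^<>]*/?>'
        temp_result=re.sub(rg,'',temp_result)
        # drop tags that are already closed in pairs
        temp_result=re.sub(r'<([a-zA-Z][a-zA-Z0-9]*)\b[^<>]*>(.*?)</\1>','',temp_result)
        # collect the names of the tags that are still open
        arr=re.findall(r'<([a-zA-Z][a-zA-Z0-9]*)\b[^<>]*>',temp_result)
        # close them, innermost first
        for tag in reversed(arr):
            result+='</%s>'%tag
        return result
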
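
The last step of truncate_html, closing whatever tags the cut left open, is the part that keeps the page layout intact. As a rough standalone illustration of that idea, here is a sketch that uses a stack instead of the regex pair-removal in the listing. The function name close_open_tags and the short VOID_TAGS list are assumptions made for this example, not part of Micolog.

```python
import re

# Tags that never take a closing tag (a short, assumed list for the demo).
VOID_TAGS = ('br', 'hr', 'img', 'input', 'meta', 'link')


def close_open_tags(fragment):
    """Append closing tags for elements left open in a truncated fragment."""
    stack = []
    pattern = r'<(/?)([a-zA-Z][a-zA-Z0-9]*)[^<>]*>'
    for closing, name in re.findall(pattern, fragment):
        name = name.lower()
        if name in VOID_TAGS:
            continue
        if closing:
            # Pop back to the matching opener, if there is one.
            if name in stack:
                while stack.pop() != name:
                    pass
        else:
            stack.append(name)
    # Close the innermost element first.
    return fragment + ''.join('</%s>' % name for name in reversed(stack))


print(close_open_tags('<div><b>bold <i>text'))
```

A stack handles nested tags of the same name more predictably than removing pairs with a regex, at the cost of a few more lines.
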
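The core of truncate_html was taken from the web. Unfortunately I can no longer find a link to the original author; my apologies.
Then just save the file and upload it.
Of course, you can also simply use my copy of the file: http://www.xioxu.com/media/agp4aW94dXNpdGUxcg0LEgVNZWRpYRix4AcM/model.py.zip
One last gripe: I really dislike that Python has nothing like curly braces and relies on indentation to mark logical structure. It feels very uncomfortable.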

----------------------------
I tried it, and it really works!
Reposted from: http://www.xioxu.com/2010/10/24/micolog_summary.html
