Scraping every post on this blog in 7 lines of code

Lan

2021-01-05

I really did go all out just to pad the blog with one more post.

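If the script raises an error (typically a FileNotFoundError, because open() does not create missing directories), create a folder named abc in the working directory first.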
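The folder can also be created from code rather than by hand. A minimal sketch, assuming the same relative folder name `abc` that the script writes to; `ensure_output_dir` is a hypothetical helper name, not part of the original code:

```python
from pathlib import Path


def ensure_output_dir(name: str = "abc") -> Path:
    """Create the output folder if it is missing and return its path."""
    out_dir = Path(name)
    # exist_ok=True makes repeated runs harmless;
    # parents=True also creates any missing parent folders.
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

Calling `ensure_output_dir()` once before the download loop should be enough to avoid the error.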

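import requests, parsel
# The index pages page_1.html through page_36.html list the posts.
for i in range(1, 37):
    res = parsel.Selector(requests.get(f'https://gitlab.com/Vastsa/lanpicbed/-/raw/master/page_{i}.html').text)
    # Titles and links are read from the same entry-title anchors and are assumed to line up by index.
    titles = res.xpath("//h2[@class='entry-title']/a/text()").extract()
    for index, value in enumerate(res.xpath("//h2[@class='entry-title']/a/@href").extract()):
        # Save each post body as ./abc/<title>.html ('a+' appends if the file already exists).
        with open("./abc/" + titles[index] + '.html', 'a+', encoding='utf8') as f:
            f.write(parsel.Selector(requests.get(value).text).xpath("//div[@class='single-content']").extract_first())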
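The script uses post titles directly as file names, so a title containing a character such as `/` makes `open()` fail, and `extract_first()` returns `None` when no `single-content` div is found, which makes `f.write()` raise a `TypeError`. A hedged sketch of two helpers that guard against both cases; `safe_filename` and `save_post` are hypothetical names, and the set of replaced characters is an assumption (the ones Windows forbids in file names), not something the original post specifies:

```python
import re
from pathlib import Path
from typing import Optional


def safe_filename(title: str) -> str:
    """Replace characters that are not allowed in file names with underscores."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    # Fall back to a placeholder so an empty title still yields a usable name.
    return cleaned or "untitled"


def save_post(out_dir: str, title: str, html: Optional[str]) -> Optional[Path]:
    """Write one post body to <out_dir>/<title>.html; skip posts with no content."""
    if html is None:  # extract_first() found no matching element
        return None
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{safe_filename(title)}.html"
    # Unlike 'a+', write_text overwrites, so re-running does not duplicate content.
    path.write_text(html, encoding="utf8")
    return path
```

Inside the inner loop, the `with open(...)` block could then become `save_post("abc", titles[index], body)`, where `body` holds the result of the `extract_first()` call. The price is that the scraper no longer fits in 7 lines.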

Copyright: Lan

Permalink: https://lanol.cn/post/432.html

License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
