“Extract all text from a web page using BeautifulSoup and Python” code answer

Extract all text from a web page using BeautifulSoup and Python

from bs4 import BeautifulSoup
from bs4.element import Comment
import urllib.request


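def tag_visible(element):
    # Text inside these tags is never rendered as visible page content
    if element.parent.name in ['style', 'script', 'head', 'title', 'meta', '[document]']:
        return False
    # HTML comments (<!-- ... -->) are text nodes too, but they are not displayed
    if isinstance(element, Comment):
        return False
    return True


def text_from_html(body):
    soup = BeautifulSoup(body, 'html.parser')
    # find_all(string=True) returns every text node; findAll(text=True) is the older, deprecated spelling
    texts = soup.find_all(string=True)
    visible_texts = filter(tag_visible, texts)
    # Strip each node and skip whitespace-only ones so the result is not padded with runs of spaces
    return " ".join(t.strip() for t in visible_texts if t.strip())

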
html = urllib.request.urlopen('http://www.nytimes.com/2009/12/21/us/21storm.html').read()
print(text_from_html(html))

CoderHomie
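A related approach, sketched here as an illustration rather than as part of the original answer: instead of filtering text nodes one by one, remove the non-rendered elements from the parse tree and let BeautifulSoup's `get_text()` join whatever remains. It assumes the HTML is already available as a string; the function name `visible_text` and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment


def visible_text(html: str) -> str:
    """Hypothetical helper: return the human-visible text of an HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    # Detach elements whose contents are never rendered as page text.
    # extract() is safe even when a tag's parent (e.g. <head>) was already detached.
    for tag in soup.find_all(["style", "script", "head", "title", "meta"]):
        tag.extract()
    # Detach HTML comments, which are also text nodes in the parse tree
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()
    # strip=True strips every text node and skips whitespace-only ones;
    # separator=" " puts a single space between the remaining pieces
    return soup.get_text(separator=" ", strip=True)


if __name__ == "__main__":
    sample = """<html><head><title>T</title><style>p {color: red}</style></head>
    <body><h1>Hello</h1><!-- hidden --><p>World <b>again</b></p>
    <script>var x = 1;</script></body></html>"""
    print(visible_text(sample))  # prints "Hello World again"
```

To use it on a live page, pass it the bytes or string returned by `urllib.request.urlopen(...).read()`, just as `text_from_html` receives them above.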
