Started by Ronald RauheSep 11, 2021

Open
beautifulsoup div extract

0 VIEWES 0 LIKES 0 DISLIKES SHARE
0 LIKES 0 DISLIKES 0 VIEWES SHARE

I want to extract FIRMA, STADT,BEWORBEN FÜR POSITION,JAHR DER BEWERBUNG,ERGEBNIS information from ALL pages from below website. Here is code I used. It extracts needed data (from ALL pages) but duplicates output and keeps running. Is there any way to fix this solution? Maybe there is another solution?

data=[]
with requests.Session() as session:
session.headers = {
'x-requested-with': 'XMLHttpRequest'
}
page = 1
while True:
print(f"Processing page {page}..")
url = f'https://www.kununu.com/de/volkswagen/bewerbung/{page}'
response = session.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
new_comments = [
data.get_text()
for data in soup.find_all('div', {'class':'review-details user-content hidden-xs'})
]
if not new_comments:
print(f"No more comments. Page: {page}")
break
data += new_comments
print(data)
print(len(data))
page += 1
print(data)

0 Replies

You must be Logged in to reply
Trending Categories
15
Software39
DevOps45
Frontend Development24
Backend Development19
Server Administration17
Linux Administration24
Data Center24
Sentry24
Terraform21
Ansible29
Docker28
Penetration Testing14
Kubernetes16
NGINX19
JenkinsX17
Techiio

Techiio is on the journey to build an ocean of technical knowledge, scouring the emerging stars in process and proffering them to the corporate world.

Follow us on:

facebooklinkdeintwitter

Subscribe to get latest updates

You can unsubscribe anytime from getting updates from us
Copyright techiio.com @2020 Kolkata, India
made with by Abhishek & Priyanka Jalan
Copyright techiio.com @2020
made with by Abhishek & Priyanka Jalan