Free Friday 11/1/24

In this blog post, we will discuss a simple Python script that uses the BeautifulSoup library to scrape a web page, search its text for a word entered by the user, and print each occurrence along with a little surrounding context.

The Python script starts by importing the necessary libraries: BeautifulSoup and requests. It then prompts the user to enter a word and a URL to scrape. The script makes a GET request to the provided URL, parses the HTML content using BeautifulSoup, and extracts the text from the page.
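The fetch-and-parse step can be sketched on its own as below. This is a minimal sketch, not the script itself: the helper names `html_to_text` and `fetch_text` are hypothetical, and the 10-second timeout is an assumed value. Splitting the parsing from the download makes it possible to try the parsing on a literal HTML string without touching the network.

```python
from bs4 import BeautifulSoup
import requests


def html_to_text(html: str) -> str:
    """Parse an HTML string and return its text content (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # separator/strip keep text from adjacent tags from running together.
    return soup.get_text(separator=" ", strip=True)


def fetch_text(url: str) -> str:
    """Download a page and return its text (hypothetical helper)."""
    # The timeout value is an assumption; without one, requests can wait indefinitely.
    response = requests.get(url, timeout=10)
    # Raise an exception for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return html_to_text(response.text)
```

For example, `html_to_text("<h1>Hello</h1><p>World</p>")` gives the two pieces of text joined by a space.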

Next, it counts how many times the user-specified word appears in the text, ignoring case. Note that this is a substring match, so a search for "cat" also counts the "cat" in "category". If the word is found at least once, the script prints the total count and then displays each occurrence together with up to 10 characters of context before and after it.
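To make the counting and the context window concrete, here is a small worked example on a fixed string. The sample sentence is made up for illustration; the 10-character width matches the description above.

```python
text = "Cats chase cats; a category is not a cat."
word = "cat"

# Case-insensitive substring count: "Cats", "cats", "category" and "cat" all match.
count = text.lower().count(word.lower())

# Context window for the first match: up to 10 characters on each side,
# clamped so the slice never runs past either end of the string.
index = text.lower().find(word.lower())
start = max(0, index - 10)
end = min(len(text), index + len(word) + 10)
snippet = text[start:end]

print(count)    # 4
print(snippet)  # Cats chase ca
```

The first match sits at the very start of the string, so there is nothing to show before it and the window only extends to the right.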

If the word is not found anywhere in the text, the script prints a message saying so instead.

This Python script offers a practical example of web scraping and text analysis using the BeautifulSoup and requests libraries. By letting the user enter a word and a URL, it demonstrates how to search the text of a web page and show each match in context. The same approach can be expanded and customized to suit other web scraping and text processing needs.

Code Snippet

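from bs4 import BeautifulSoup
import requests

word = input("Enter word\n> ").strip()
url = input("Enter URL\n> ").strip()

# An empty search term matches at every position and would never advance the loop below.
if not word:
    raise SystemExit("No word entered.")

# Fetch the page; fail early on network errors or non-2xx responses.
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

# Lowercase once so that counting and searching are both case-insensitive.
text = soup.get_text()
lower_text = text.lower()
lower_word = word.lower()
instances = lower_text.count(lower_word)

if instances > 0:
    print(f"The word '{word}' was found {instances} times on the page.")
    index = lower_text.find(lower_word)
    while index != -1:
        # Show up to 10 characters of context on each side of the match.
        start_index = max(0, index - 10)
        end_index = min(len(text), index + len(word) + 10)
        print("Surrounding text:")
        print(text[start_index:end_index])
        # Continue searching just past this match.
        index = lower_text.find(lower_word, index + len(word))
else:
    print(f"The word '{word}' was not found on the page.")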
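
One possible way to extend the script is to match whole words only, so that a search for "cat" no longer matches inside "category". The sketch below swaps the plain substring search for a regular expression with word boundaries. The function name `find_whole_word` and its `width` parameter are hypothetical, and `\b` only behaves as expected when the search term starts and ends with a letter, digit, or underscore.

```python
import re


def find_whole_word(text: str, word: str, width: int = 10) -> list[str]:
    """Return a context snippet for each whole-word, case-insensitive match."""
    # re.escape treats the user's input as literal text, not as a pattern.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    snippets = []
    for match in pattern.finditer(text):
        # Clamp the window to the bounds of the string.
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        snippets.append(text[start:end])
    return snippets


sample = "Cats chase cats; a category is not a cat."
print(find_whole_word(sample, "cat", width=5))  # ['ot a cat.']
```

Here only the final "cat" counts, because "Cats", "cats" and "category" all continue with another letter after the match.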
