get_news() does not take into account the date range #99

Open
PashaM999 opened this issue Jul 26, 2022 · 7 comments

Comments

@PashaM999

Hi, I have been looking through the code and found that the function get_news generates its URL as follows:

self.url = 'https://news.google.com/search?q={}+when:{}&hl={}'.format(key,self.__period,self.__lang.lower())

(line 259 in __init__.py)

This uses the self.__period variable, which only handles relative periods such as 7d or 1m. Google also has search filters for specific dates (before: and after:), which can be implemented in your code as follows (the slicing assumes __start and __end are in MM/DD/YYYY format):

start = f'{self.__start[-4:]}-{self.__start[:2]}-{self.__start[3:5]}'
end = f'{self.__end[-4:]}-{self.__end[:2]}-{self.__end[3:5]}'
self.url = 'https://news.google.com/search?q={}+before:{}+after:{}&hl={}'.format(key,end, start, self.__lang.lower())

This is merely a suggestion, but I feel that if the __start and __end variables are set, you should prioritize them over your original solution.
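The prioritization suggested above could be sketched as a standalone function. This is only an illustration under stated assumptions, not the library's actual code: build_news_url and to_iso are hypothetical helper names, dates are assumed to be in MM/DD/YYYY format, and URL-encoding the keyword with quote_plus is an addition that the original snippet does not include.

```python
from datetime import datetime
from urllib.parse import quote_plus


def to_iso(date_str, fmt="%m/%d/%Y"):
    """Convert a date such as '01/31/2021' to '2021-01-31' (raises ValueError if malformed)."""
    return datetime.strptime(date_str, fmt).strftime("%Y-%m-%d")


def build_news_url(key, lang="en", period="", start="", end=""):
    """Hypothetical helper: prefer an explicit start/end range over a relative period."""
    query = quote_plus(key)  # assumption: the keyword has not been URL-encoded yet
    if start and end:
        # Explicit date range, using the before:/after: operators from the suggestion.
        query += "+before:{}+after:{}".format(to_iso(end), to_iso(start))
    elif period:
        # Fall back to the relative period (e.g. '7d'), as in the existing URL.
        query += "+when:{}".format(period)
    return "https://news.google.com/search?q={}&hl={}".format(query, lang.lower())


print(build_news_url("bitcoin", start="01/01/2021", end="12/31/2021"))
```

Parsing with strptime instead of string slicing means a malformed date fails loudly rather than silently producing a wrong query.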

Hope that will be useful for someone :)

@HurinHu
Member

HurinHu commented Jul 26, 2022

The time period does not always work with Google; sometimes Google returns results outside the specified range. Anyway, a start/end filter could probably be added. I will update it in the next release.

@guibolla

I came across the same issue. PashaM999's solution seems to work.

@zxdawn

zxdawn commented Sep 28, 2023

Nice solution! Any plan to fix it?

@HurinHu
Member

HurinHu commented Sep 28, 2023

> Nice solution! Any plan to fix it?

This behavior comes from Google; we can't do anything about it.

@zxdawn

zxdawn commented Sep 28, 2023

Em ... I tried adding before:{}+after:{} as @PashaM999 did, and it works well in my case. Maybe we could add this feature and mention in the README file that Google sometimes returns results outside the specified range?
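Since Google may still return items outside the requested range, one possible complement is to filter the results on the client side. The sketch below is an assumption-laden illustration: filter_by_range is a hypothetical helper, and it assumes each result is a dict carrying a naive datetime object under a "datetime" key, which may not match what the library actually returns.

```python
from datetime import datetime


def filter_by_range(items, start, end, fmt="%m/%d/%Y", key="datetime"):
    """Keep only items whose timestamp falls inside [start, end], both days inclusive."""
    lo = datetime.strptime(start, fmt)
    # Extend the upper bound to the last second of the end day.
    hi = datetime.strptime(end, fmt).replace(hour=23, minute=59, second=59)
    kept = []
    for item in items:
        stamp = item.get(key)
        # Skip entries whose timestamp is missing or was not parsed into a datetime.
        if isinstance(stamp, datetime) and lo <= stamp <= hi:
            kept.append(item)
    return kept


# Illustrative data only, not real search results.
sample = [
    {"title": "inside", "datetime": datetime(2021, 6, 15, 9, 30)},
    {"title": "outside", "datetime": datetime(2022, 1, 2, 8, 0)},
    {"title": "unknown", "datetime": None},
]
print([item["title"] for item in filter_by_range(sample, "01/01/2021", "12/31/2021")])
```

Items without a usable timestamp are dropped here; keeping them instead would be an equally defensible choice, depending on whether missing dates are common in practice.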

@matanton

matanton commented Feb 1, 2024

I am trying this on version 1.6.12, after finding that my code (shown below) only returned recent news.

import pandas as pd
from GoogleNews import GoogleNews
#googlenews = GoogleNews()
googlenews = GoogleNews()
googlenews.clear()
googlenews = GoogleNews(start='01/01/2021',end='12/31/2021')

#googlenews = GoogleNews(lang='pt', region='BR')
googlenews.set_lang('pt')
googlenews.set_time_range('01/01/2021','12/31/2021')

I am now editing __init__.py (it seems the self.url line is now on line 273).
Should I just copy and paste @PashaM999's solution into the __init__.py file over line 273?
Where do I set the start and end? In my own code?

@hanskwan

Is it possible to include @PashaM999's solution in a future update? That would be helpful.
