macicogna wrote: As modern search engines, like Google, return just a bunch of JavaScript, my first step was to capture "Captcha blocking", seeing it with a TWebBrowser, in order to inspect the response later and learn how to identify its content as HTML files.
Modern search engines provide REST APIs for performing searches in application code, returning machine-parsable results (usually XML or JSON). You should not be submitting HTML webforms and then scraping the resulting HTML/JavaScript for results.
macicogna wrote: Yes, even using TWebBrowser, the TWebBrowser::OnDocumentComplete approach is a better idea. My doubt is that subsequent TWebBrowser::Navigate() calls (inside a loop) might cancel the previous one.
If you use the OnDocumentComplete event (and use it correctly, i.e., no ProcessMessages() loop), then you won't be able to use a simple Navigate() loop anymore. You will have to wait until OnDocumentComplete fires before calling Navigate() again. You will have to break up your code logic into pieces, executing each piece at the proper time.
macicogna wrote: Did you verify with a packet sniffer that Navigate() is actually attempting to contact Google?
Yes, I was using TCPView.
TCPView is not a packet sniffer. It only shows you active connections, but not the actual data that is being transmitted on those connections. If you Navigate() to multiple URLs on the same server, connections might get reused. Use a real packet sniffer, like Wireshark or Fiddler, to look at the actual HTTP requests.
macicogna wrote: Your question about "[...] actually attempting to contact Google?" makes me think that this problem might be outside my code, as only Google behaves badly in Windows 7 and 10.
Web browsers don't treat Google differently than any other sites. Something else is going on.
macicogna wrote: So, I searched for information about Google's integration with web browsers and found different Google URL patterns. So, I changed it to this one:
Google=https://www.google.com.br/search?q=$(q)
And "Bingo"! Now it is working! The "/search?" makes the difference in Windows 7 and 10.
Whatever made you think that "https://www.google.com.br/#q=$(q)" would work in the first place? "#" is a bookmark (fragment) delimiter. Everything after "#" is not actually part of the requested URL itself.
When you navigate to "https://www.google.com.br/#q=bcbj", for example, the web browser will connect to "www.google.com.br" and send a request for "/". The web server will never see "q=bcbj". And Google's "/" page is fairly minimal, which could explain why your Navigate() loop ran so quickly. Only AFTER the web browser has fully processed the response will it look for a bookmark named "q=bcbj" within the HTML and, if found, scroll the display to that position.
When you navigate to "https://www.google.com.br/search?q=bcbj" instead, the web browser will connect to "www.google.com.br" and send a request for "/search?q=bcbj", which is a request for "/search" with "q=bcbj" as its input parameters, thus allowing search results to be queried and returned.