Google Trends Market Sentiment Analysis Tool
Overview
Traditional market data captures what has happened, but rarely explains why or what happens next. This project introduces a systematic framework that leverages alternative data—specifically online search volumes via Google Trends—as a leading indicator for tactical asset allocation and risk control.
By analyzing real-time shifts in collective investor attention, the tool quantifies market psychology before it fully materializes into trading decisions.
The Core Scaling Challenge & Solution
The Problem: Google Trends normalizes search volume to a relative 0 to 100 scale per individual request. As a result, values from different batch requests cannot be compared or chained together directly.
The Algorithmic Solution: This script implements an "Anchor-Logic" to establish a unified global scale. Every automated batch request includes a high-volume, neutral reference term (configurable via --anchor, default: 'weather'). The pipeline then rescales each batch using the median ratio of its anchor series to the anchor series of a reference batch over their overlapping dates:
\text{Scaling Factor} = \text{median}\left(\frac{\text{Anchor}_{\text{Target Batch}}}{\text{Anchor}_{\text{Reference Batch}}}\right)
This makes series retrieved in independent API calls comparable on a common scale.
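The rescaling step can be illustrated with a minimal sketch. It assumes each batch is a pandas DataFrame indexed by date that contains the anchor column, and that the factor is applied by dividing the target batch by it; the function name `rescale_to_reference` and the sample numbers are hypothetical and not taken from the prototype.

```python
import pandas as pd


def rescale_to_reference(target: pd.DataFrame, reference: pd.DataFrame,
                         anchor: str = "weather") -> pd.DataFrame:
    """Put `target` on the scale of `reference` via the shared anchor column."""
    # Align both anchor series on their overlapping dates.
    both = pd.concat([target[anchor], reference[anchor]], axis=1,
                     keys=["target", "reference"]).dropna()
    # Drop dates where the reference anchor is 0 to avoid division by zero.
    both = both[both["reference"] > 0]
    if both.empty:
        raise ValueError("no overlapping non-zero anchor observations")
    # Scaling factor = median(anchor_target / anchor_reference).
    factor = (both["target"] / both["reference"]).median()
    if factor == 0:
        raise ValueError("anchor in target batch is zero on the overlap")
    # Assumption: dividing by the factor maps target values to the reference scale.
    return target / factor


# Hypothetical sample data: the anchor appears twice as large in the target batch.
idx = pd.date_range("2024-01-07", periods=4, freq="W")
reference = pd.DataFrame({"weather": [50, 60, 40, 50],
                          "recession": [10, 12, 8, 9]}, index=idx)
target = pd.DataFrame({"weather": [100, 120, 80, 100],
                       "bull market": [20, 30, 10, 40]}, index=idx)

rescaled = rescale_to_reference(target, reference)
print(rescaled)
```

Using the median rather than the mean keeps a single noisy week in the anchor series from distorting the factor.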
Methodology & Pipeline Architecture
The prototype (google_trends_sentiment_prototype.py) is structured as a modular quantitative pipeline:
1. Data Ingestion (Anchor-Based)
Automated retrieval of pre-defined Risk-On, Risk-Off, and Macroeconomic keywords via the pytrends API. All batches are brought onto one global scale using the Anchor-Logic described above.
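A possible shape for the ingestion step is sketched below. It relies on the Google Trends limit of five terms per comparison, so each batch carries the anchor plus up to four keywords. The helper names `make_batches` and `fetch_batches`, the keyword list, and the timeframe are illustrative assumptions; the prototype's actual batching may differ.

```python
from typing import List

MAX_TERMS = 5  # Google Trends compares at most five terms per request


def make_batches(keywords: List[str], anchor: str = "weather") -> List[List[str]]:
    """Split keywords into batches of up to four terms, each led by the anchor."""
    terms = [k for k in keywords if k != anchor]
    step = MAX_TERMS - 1
    return [[anchor] + terms[i:i + step] for i in range(0, len(terms), step)]


def fetch_batches(batches: List[List[str]], timeframe: str = "today 5-y"):
    """Fetch one DataFrame per batch (requires network access and pytrends)."""
    # Imported lazily so make_batches works without pytrends installed.
    from pytrends.request import TrendReq

    client = TrendReq(hl="en-US", tz=0)
    frames = []
    for batch in batches:
        client.build_payload(batch, timeframe=timeframe)
        frame = client.interest_over_time()
        # Drop the bookkeeping column that marks incomplete periods, if present.
        frames.append(frame.drop(columns="isPartial", errors="ignore"))
    return frames


# Hypothetical keyword list, for illustration only.
keywords = ["buy stocks", "bull market", "recession",
            "market crash", "inflation", "interest rates"]
batches = make_batches(keywords)
print(batches)
```

Calling `fetch_batches(batches)` would then return one frame per batch, each containing the anchor column needed for rescaling.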
2. Normalization Layer
Applies a Z-score transformation to the rescaled raw data. This establishes statistical parity across keywords with vastly different structural search volumes by centering the mean at 0 and scaling variance to 1:
z = \frac{x - \mu}{\sigma}
Where:
- x is the anchor-adjusted search volume intensity.
- \mu is the historical mean of that specific keyword series.
- \sigma is the historical standard deviation of the series.
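As a small sketch of the normalization layer, the transformation can be applied column by column with pandas. The function name `zscore`, the sample values, and the choice of the population standard deviation (`ddof=0`) are assumptions; the source does not state which estimator the prototype uses.

```python
import pandas as pd


def zscore(frame: pd.DataFrame) -> pd.DataFrame:
    """Standardize each keyword column to mean 0 and standard deviation 1."""
    # Assumption: population standard deviation (ddof=0).
    std = frame.std(ddof=0)
    # A constant series has zero deviation; return NaN instead of dividing by 0.
    std = std.replace(0, float("nan"))
    return (frame - frame.mean()) / std


# Hypothetical anchor-adjusted volumes with very different levels.
rescaled = pd.DataFrame({"recession": [10.0, 20.0, 30.0],
                         "bull market": [2.0, 4.0, 6.0]})
z = zscore(rescaled)
print(z)
```

Both columns end up with identical Z-scores here because they differ only in level and scale, which is the parity the normalization is meant to establish.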
3. Index Construction & Signal Extraction
- Sentiment Spread: Measures the relative strength of optimism versus pessimism in the market:
\text{Sentiment Spread} = \left( \frac{1}{N} \sum_{i=1}^{N} z_{\text{Risk-On}, i} \right) - \left( \frac{1}{M} \sum_{j=1}^{M} z_{\text{Risk-Off}, j} \right)
- Macro PCA Factor: Extracts the first principal component (PC_1) from the combined Z-score feature matrix using Singular Value Decomposition (SVD) via scikit-learn:
\mathbf{Z} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T \implies PC_1 = \mathbf{Z}\mathbf{v}_1
This isolates the dominant underlying psychological driver capturing the highest common variance.
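Both signals can be sketched on synthetic data. The keyword groupings, the random data, and the variable names below are illustrative assumptions, not the prototype's actual configuration. `sklearn.decomposition.PCA` computes its components through an SVD of the centered matrix, so projecting onto the first component corresponds to Z v_1.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic stand-in for rescaled search volumes: one shared driver plus noise.
rng = np.random.default_rng(0)
n = 104
common = rng.standard_normal(n)
raw = pd.DataFrame({
    "bull market": common + 0.3 * rng.standard_normal(n),
    "buy stocks": common + 0.3 * rng.standard_normal(n),
    "recession": -common + 0.3 * rng.standard_normal(n),
    "market crash": -common + 0.3 * rng.standard_normal(n),
})
z = (raw - raw.mean()) / raw.std(ddof=0)

# Hypothetical keyword groups.
risk_on = ["bull market", "buy stocks"]
risk_off = ["recession", "market crash"]

# Sentiment spread: mean Risk-On Z-score minus mean Risk-Off Z-score per date.
spread = z[risk_on].mean(axis=1) - z[risk_off].mean(axis=1)

# First principal component of the combined Z-score matrix.
pca = PCA(n_components=1)
pc1 = pd.Series(pca.fit_transform(z.to_numpy())[:, 0], index=z.index, name="PC1")

print("explained variance ratio:", pca.explained_variance_ratio_[0])
print("correlation of spread and PC1:", spread.corr(pc1))
```

The sign of a principal component is arbitrary, so PC_1 may come out positively or negatively correlated with the spread; only the magnitude of that relationship is meaningful without a sign convention.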
4. Market Validation (Optional)
Resamples the extracted signals to a weekly frequency and correlates them with financial benchmark data retrieved via yfinance. The benchmark data is used for validation only and does not feed into signal construction, so the core signal stays independent of market prices.
Note: This prototype currently focuses on contemporaneous correlation as a proof-of-concept. Time horizons and keyword definitions are fixed in advance rather than optimized from data.
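The validation step might look like the following sketch. To stay runnable offline it uses a synthetic random-walk price series in place of benchmark closes downloaded with yfinance; the series names, the daily input frequency, and the Friday week-end convention are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
days = pd.date_range("2023-01-02", periods=260, freq="B")

# Synthetic stand-ins: a daily signal and a random-walk benchmark price.
signal_daily = pd.Series(rng.standard_normal(len(days)), index=days,
                         name="sentiment_spread")
prices = pd.Series(100 * np.exp(np.cumsum(0.01 * rng.standard_normal(len(days)))),
                   index=days, name="benchmark")

# Bring both series to the same weekly grid (weeks ending on Friday).
signal_weekly = signal_daily.resample("W-FRI").mean()
returns_weekly = prices.resample("W-FRI").last().pct_change()

# Contemporaneous correlation on the dates where both series are available.
aligned = pd.concat([signal_weekly, returns_weekly], axis=1).dropna()
corr = aligned["sentiment_spread"].corr(aligned["benchmark"])
print(f"weeks compared: {len(aligned)}, correlation: {corr:.3f}")
```

Because the inputs here are independent random series, the printed correlation carries no meaning; the sketch only shows the resampling and alignment mechanics.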
Getting Started
Dependencies
Install the required quantitative stack:
pip install pytrends pandas numpy scikit-learn yfinance matplotlib