Whether for investing or for a data science project, stock market data is frequently used for analysis. The Stock Exchange publishes daily stock transaction information on the government's open data platform. You can also query historical records through its API, but frequent requests can easily get you banned.
A common alternative is to download the data from Yahoo Finance. It is convenient to look up by stock symbol, such as “TSLA” for Tesla. To query other stock markets, just append the market suffix to the symbol. For example, you can get 0050 in Taiwan by querying “0050.TW”.
Many articles mention that the free API service of Yahoo Finance has been shut down. Indeed, I occasionally get an error message like the one shown below when I run my script, but it works normally again after a few minutes.
If you run a Python script manually, I don’t think this will be a big issue. However, if you use an automated task to retrieve data, wrapping the request in a try/except statement and retrying may be a good solution.
HTTPSConnectionPool(host='query1.finance.yahoo.com', port=443): Max retries exceeded with url: /v8/finance/chart/0050.TW?period1=-2208988800&period2=1606746234&interval=1d&includePrePost=False&events=div%2Csplits (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 51] Network is unreachable'))
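The try/except approach suggested above can be sketched as a small retry helper. This is only a minimal illustration, not part of the original script: the name `fetch_with_retry` and its `retries` and `delay` parameters are hypothetical, and it catches every exception for simplicity, whereas a real job would probably narrow that to the connection errors it actually sees.

```python
import time


def fetch_with_retry(fetch, retries=3, delay=60.0):
    """Call fetch() and return its result, retrying after a pause if it raises.

    fetch   -- a zero-argument callable that performs the download
    retries -- total number of attempts before giving up
    delay   -- seconds to wait between attempts
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except Exception as exc:
            # Out of attempts: let the last error propagate to the caller
            if attempt == retries:
                raise
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay} s")
            time.sleep(delay)


# Possible usage with yfinance (assumes yfinance is installed):
# import yfinance as yf
# df = fetch_with_retry(lambda: yf.Ticker("0050.TW").history(period="max"))
```

Because the error tends to disappear after a few minutes, a delay of a minute or more between attempts is probably more useful than retrying immediately.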
Besides, some companies also provide similar API services, such as IEX Cloud and Alpha Vantage, but these services may not cover the markets you invest in. I strongly recommend checking before you pay. The keyword to search for is “Supported Symbols”, where symbols are the stock codes; see, for example, the article 〈How to Find All Supported Symbols on IEX Cloud〉.
If you want to obtain historical data and keep accumulating new data, the best way is to use the code below to capture all historical data once, and then write a program to retrieve daily transactions going forward. You may also be interested in Schedule Python And R Script With Linux Crontab.
I ran the following code on June 25, 2020, to download historical data for the Taiwan market and saved it as an h5 file. The result has about 3.8 million rows and a file size of about 300 MB, which is not too big.
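For the second step, accumulating new daily rows on top of the stored history, one possible sketch is shown here. It is an assumption of mine rather than the original workflow: the function name `merge_daily` is hypothetical, and it assumes both frames carry `STOCK_ID` and `Date` as ordinary columns (for yfinance output, that would mean calling `reset_index()` first).

```python
import pandas as pd


def merge_daily(history: pd.DataFrame, daily: pd.DataFrame) -> pd.DataFrame:
    """Append newly fetched rows to the accumulated history.

    If a (STOCK_ID, Date) pair appears in both frames, the row from
    `daily` wins, so re-running the job on the same day is harmless.
    """
    combined = pd.concat([history, daily], ignore_index=True)
    combined = combined.drop_duplicates(subset=["STOCK_ID", "Date"], keep="last")
    return combined.sort_values(["STOCK_ID", "Date"]).reset_index(drop=True)
```

Dropping duplicates on the symbol and date keeps the stored data consistent even when a scheduled job overlaps with rows that were already downloaded.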
Process And Code
Get Stock Symbol List
First, you have to get the stock symbol list. The following code builds it from the daily transaction data of the Taiwan market.
import requests
import pandas as pd

path = '.'  # output directory; adjust as needed

link = 'https://quality.data.gov.tw/dq_download_json.php?nid=11549&md5_url=bb878d47ffbe7b83bfc1b41d0b24946e'
r = requests.get(link)
r.raise_for_status()  # stop early if the download failed
data = pd.DataFrame(r.json())
data.to_csv(path + '/stock_id.csv', index=False, header=True)
Download Stock Data With Yahoo Finance API
According to Free Stock Data for Python Using Yahoo Finance API, the public API (without authentication) is limited to 2,000 requests per hour per IP, or up to a total of 48,000 requests a day. I used the following code to retrieve the data of 1,116 symbols, and it did its job perfectly.
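# Remember to install yfinance; DataFrame.to_hdf also needs PyTables (the "tables" package)
import time

import pandas as pd
import yfinance as yf

path = '.'  # directory holding stock_id.csv; adjust as needed

# Read stock symbol list; dtype=str keeps leading zeros such as "0050"
# usecols assumes the first two columns hold the symbol and the name
stock_list = pd.read_csv(path + '/stock_id.csv', dtype=str, usecols=[0, 1])
stock_list.columns = ['STOCK_ID', 'NAME']

frames = []
for stock_id in stock_list['STOCK_ID']:
    # Retrieve data
    df = yf.Ticker(stock_id + '.TW').history(period="max")
    # Add stock symbol
    df['STOCK_ID'] = stock_id
    frames.append(df)
    # Pause between requests to stay under the rate limit
    time.sleep(0.8)

# Merge (DataFrame.append was removed in pandas 2.0, so concatenate once)
historical_data = pd.concat(frames)
historical_data.to_hdf(path + '/historical_data.h5', key='s')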
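As a rough check of the 0.8-second pause against the quoted limit of 2,000 requests per hour, the arithmetic can be written out as follows. The helper names are hypothetical and the limit is only the figure cited above, not something I have verified against Yahoo Finance.

```python
SECONDS_PER_HOUR = 3600


def min_sustained_delay(limit_per_hour: int) -> float:
    """Smallest pause between requests that never exceeds the hourly limit."""
    return SECONDS_PER_HOUR / limit_per_hour


def fits_in_one_hour(n_requests: int, limit_per_hour: int) -> bool:
    """True if the whole batch stays within a single hour's quota, whatever the pacing."""
    return n_requests <= limit_per_hour


def total_pause_seconds(n_requests: int, delay: float) -> float:
    """Time spent sleeping when pausing `delay` seconds after each request."""
    return n_requests * delay


print(min_sustained_delay(2000))        # seconds per request to run indefinitely
print(fits_in_one_hour(1116, 2000))     # does the Taiwan symbol list fit in one hour?
print(total_pause_seconds(1116, 0.8))   # seconds of sleeping for the full download
```

Under these numbers, 1,116 requests fit inside one hour's quota, so the 0.8-second pause is enough for a single run. A job that ran continuously past 2,000 requests would need roughly 1.8 seconds between requests to stay under the same limit.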