Advanced Python for Finance Technologies

Level: Advanced. Duration: 3 days

In this course you will learn to

  • Automatically extract financial data from common data providers
  • Clean, aggregate and manipulate financial data effectively
  • Conduct elementary time series analysis
  • Understand stochastic processes and common noise models
  • Construct models for inference and forecasting, such as ARIMA and linear and logistic regression
  • Generate powerful visualizations, such as candlestick charts
  • Extract financial data by scraping websites
  • Understand the fundamentals of supervised and unsupervised machine learning models as applied to finance
  • Apply Recurrent Neural Nets (RNNs) and Long Short-Term Memory Units (LSTMs) to financial time series and understand their limitations
  • Understand the principles behind blockchain technology

Training materials

All Python training students will receive comprehensive courseware.

Suggested attendees

Students who are familiar with fundamental Python syntax and concepts.

Course Outline

  • Crunching the numbers: numerical Python with NumPy
    • Introduction to the n-dimensional array (ndarray)
    • NumPy operations
    • Broadcasting
    • Missing data in NumPy (masked array)
    • NumPy structured arrays
    • Improving performance through vectorization
    • Random number generation
    • Introduction to Monte-Carlo methods
    • General approaches to implementing mathematical algorithms
  • Acquiring and manipulating financial data with pandas and pandas-datareader
    • Series vs. DataFrames
    • Overview of data types in pandas
    • Pandas I/O tools: CSV/Excel/SQL
    • Pandas I/O tools: pandas-datareader
    • Subsetting DataFrames
    • Creating and deleting variables
    • Discretization of continuous data
    • Scaling and standardizing data
    • Identifying duplicates
    • Dummy coding
  • Exploratory data analysis and advanced pandas methods
    • Uni- and multivariate statistical summaries and detecting outliers
    • Group-wise calculations using pandas
    • Pivot tables
    • Long to wide and back: pivoting, stacking and melting
    • Python visualization: Matplotlib and seaborn
    • Pandas visualization: histograms, bar and box plots
    • Pandas visualization: scatter plots and pie charts
    • Group-by plotting
    • Pandas plot formatting
    • mpl-finance and candlestick charts
    • Merging DataFrames
    • Pandas string methods
    • Implementing regular expressions in pandas
    • Handling missing data in pandas
  • Elementary time series analysis
    • Date/time formats in Python and pandas
    • Running/rolling aggregates
    • Resampling
  • Stochastic processes
    • Overview of noise models
    • Stationarity
    • Random walks and martingales
    • Brownian motion
    • Diffusion models
    • The Black-Scholes model—and its limitations
  • Time series forecasting
    • De-trending and seasonality
    • Interpolation and extrapolation
    • Auto-Regressive Integrated Moving Average (ARIMA) models
  • Measuring impact: testing for group differences
    • Null hypothesis testing and p-values
    • Group comparisons (p-values, t-tests, ANOVA, Chi-square tests)
    • Correlation
  • Progressing with regression models
    • Linear regression
    • Logistic regression
    • Regression on count outcomes (Poisson processes)
  • Optional: machine learning fundamentals for finance with scikit-learn
    • Requirements: NumPy, pandas. Time required: 4 hours
    • Machine learning approaches to multivariate statistics
    • Machine learning theory
    • Data pre-processing
    • Supervised versus unsupervised learning
    • Unsupervised learning: clustering
      • Clustering algorithms
      • Evaluating cluster performance
    • Dimensionality reduction
      • A priori
      • Principal component analysis (PCA)
      • Penalized regression
    • Supervised learning: regression
      • Linear regression
      • Penalized linear regression
      • Stochastic gradient descent
      • Scoring new data sets
      • Cross-validation
      • Bias-variance trade-off
      • Feature importance
    • Supervised learning: classification
      • Logistic regression
      • LASSO
      • Random forests
      • Ensemble methods
      • Feature importance
      • Scoring new data sets
      • Cross-validation
  • Optional: recurrent neural nets and LSTMs with PyTorch
    • Requirements: NumPy, pandas, Machine Learning fundamentals. Time required: 4 hours
    • Introduction to PyTorch
      • Introduction to tensor algebra and calculus
      • Tensor algebra in PyTorch
      • Training and validating models
    • Regression in PyTorch
      • Optimizers in PyTorch
      • Linear regression
      • Logistic regression
    • Artificial Neural Networks
      • Overview of Artificial Neural Networks (ANNs)
      • Recurrent Neural Networks (RNNs)
      • Sequence models and Long Short-Term Memory Networks (LSTMs)
    • RNNs/LSTMs with PyTorch
      • Building, training and validating a basic ANN
      • Creating an RNN
      • Building an LSTM
      • Applications to financial time series, and cautionary tales
  • Optional: scraping by—obtaining financial data from publicly accessible websites
    • Requirements: Base Python. Time required: 2 hours
    • Parsing HTML/CSS with BeautifulSoup
      • Navigating tree data structures
      • Selecting named node elements
      • Selecting by property
    • Establishing a Connection
      • urllib3 and connections
      • POST and GET directives
    • Building a Web Scraper
      • Parsing a list of websites
      • Collecting and storing data
    • Advanced Scraping: Building a Web Spider with Scrapy
  • Optional: blockchain technologies
    • Requirements: Basic Python, NumPy (useful, but not mandatory). Time required: 4 hours
    • The ingredients for a blockchain
      • Transaction records
      • The distributed ledger
      • Chain validation
      • Nonces
    • The hash function
      • Overview of hash functions and tables
      • Cryptographic hash functions
      • Proof-of-work
    • Advanced functions
      • Return statements
      • The JSON format
      • Exception trapping
      • Assertions
    • Constructing your own blockchain
      • Generating a block
      • The genesis block
      • Generating a chain through block validation
    • Shortcomings of current blockchain technologies
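
To give a flavour of the optional blockchain module, the ingredients it lists (transaction records, nonces, cryptographic hashes, proof-of-work, the genesis block and chain validation) can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the course's own code: the function names, the block layout, the sample transaction and the difficulty of two leading zeros are all assumptions made for the example.

```python
import hashlib
import json


def hash_block(block):
    """Return the SHA-256 hex digest of a block's JSON form (keys sorted)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def mine_block(index, transactions, previous_hash, difficulty=2):
    """Proof-of-work: try nonces until the hash starts with `difficulty` zeros."""
    nonce = 0
    while True:
        block = {
            "index": index,
            "transactions": transactions,
            "previous_hash": previous_hash,
            "nonce": nonce,
        }
        if hash_block(block).startswith("0" * difficulty):
            return block
        nonce += 1


def is_valid_chain(chain, difficulty=2):
    """Check the proof-of-work of every block and the hash links between blocks."""
    for i, block in enumerate(chain):
        if not hash_block(block).startswith("0" * difficulty):
            return False
        if i > 0 and block["previous_hash"] != hash_block(chain[i - 1]):
            return False
    return True


# The genesis block has no predecessor, so a placeholder hash is used (assumption).
genesis = mine_block(0, [], "0" * 64)
# Hypothetical sample transaction, for illustration only.
second = mine_block(1, [{"from": "alice", "to": "bob", "amount": 5}], hash_block(genesis))
chain = [genesis, second]
print(is_valid_chain(chain))
```

Altering any earlier block changes its hash, so the `previous_hash` stored in the next block no longer matches and validation fails. Real blockchains add networking, consensus and signatures, none of which is shown here.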

Technical requirements

  • Any Windows, Linux or macOS operating system
  • Python 3.x installed (Anaconda bundle recommended)
  • An IDE with Python support (Jupyter Notebook, Spyder or PyCharm Community Edition, which is free)