Advanced Python for Finance Technologies

Advanced, 3 days

In this course you will learn to

  • Automatically extract financial data from common data providers
  • Clean, aggregate and manipulate financial data effectively
  • Conduct elementary time series analysis
  • Understand stochastic processes and common noise models
  • Construct models for inference and forecasting, such as ARIMA and linear and logistic regression
  • Generate powerful visualizations, such as candlestick charts
  • Extract financial data by scraping websites
  • Understand the fundamentals of supervised and unsupervised machine learning models as applied to finance
  • Apply Recurrent Neural Nets (RNNs) and Long Short-Term Memory Units (LSTMs) to financial time series and understand their limitations
  • Understand the principles behind blockchain technology

Training materials

All Python training students will receive comprehensive courseware.

Suggested attendees

Students who are familiar with fundamental Python syntax and concepts.

Course Outline

  • Crunch numbers: numerical Python with NumPy
    • Introduce the n-dimensional array (ndarray)
    • NumPy operations
    • Broadcasting
    • Missing data in NumPy (masked array)
    • NumPy structured arrays
    • Improve performance through vectorization
    • Random number generation
    • Introduce Monte-Carlo methods
    • General approaches to implementing mathematical algorithms
  • Acquire and manipulate financial data with pandas and pandas-datareader
    • Series vs. DataFrames
    • Data types in pandas overview
    • Pandas I/O tools: CSV/Excel/SQL
    • Pandas I/O tools: pandas-datareader
    • Subset DataFrames
    • Create and delete variables
    • Discretize continuous data
    • Scale and standardize data
    • Identify duplicates
    • Dummy coding
  • Exploratory data analysis and advanced pandas methods
    • Uni- and multi-variate statistical summaries and detecting outliers
    • Group-wise calculations using pandas
    • Pivot tables
    • Long to wide and back: pivoting, stacking and melting
    • Python visualization: Matplotlib and seaborn
    • Pandas visualization: histograms, bar, box plots, scatter plots and pie charts
    • Group-by plotting
    • Pandas plot formatting
    • mpl-finance and candlestick charts
    • Merge DataFrames
    • Pandas string methods
    • Implement regular expressions in pandas
    • Handle missing data in pandas
  • Elementary time series analysis
    • Date/time formats in Python and pandas
    • Running and rolling aggregates
    • Resample
  • Stochastic processes
    • Noise models overview
    • Stationarity
    • Random walks and martingales
    • Brownian motion
    • Diffusion models
    • Black-Scholes model—and its limitations
  • Time series forecasting
    • De-trending and seasonality
    • Interpolation and extrapolation
    • Auto-Regressive Integrated Moving Average (ARIMA) models
  • Measure impact: test for group differences
    • Null hypothesis testing and p-values
    • Group comparisons (p-values, t-tests, ANOVA, Chi-square tests)
    • Correlation
  • Progress with regression models
    • Linear regression
    • Logistic regression
    • Regression on count outcomes (Poisson regression)
  • Optional: scraping by—obtain financial data from publicly accessible websites
    • Requirements: Base Python. Time required: 2 hours
    • Parse HTML/CSS with BeautifulSoup
      • Navigate tree data structures
      • Select named node elements
      • Select by property
    • Establish a connection
      • urllib3 and connections
      • POST and GET requests
    • Build a Web Scraper
      • Parse a list of websites
      • Collect and store data
    • Advanced scraping: Build a Web Spider with Scrapy
  • Optional: machine learning fundamentals for finance with scikit-learn
    • Requirements: NumPy, pandas. Time required: 4 hours
    • Machine learning approaches to multi-variate statistics
    • Machine learning theory
    • Data pre-processing
    • Supervised vs. unsupervised learning
    • Unsupervised learning: clustering
      • Clustering algorithms
      • Evaluate cluster performance
    • Dimensionality reduction
      • A priori
      • Principal component analysis (PCA)
      • Penalized regression
    • Supervised learning: regression
      • Linear regression
      • Penalized linear regression
      • Stochastic gradient descent
      • Scoring new data sets
      • Cross-validation
      • Bias-variance trade-off
      • Feature importance
    • Supervised learning: classification
      • Logistic regression
      • LASSO
      • Random forests
      • Ensemble methods
      • Feature importance
      • Score new data sets
      • Cross-validation
  • Optional: recurrent neural nets and LSTMs with PyTorch
    • Requirements: NumPy, pandas, machine learning fundamentals. Time required: 4 hours
    • Introduce PyTorch
      • Introduce tensor algebra and calculus
      • Tensor algebra in PyTorch
      • Train and validate models
    • Regression in PyTorch
      • Optimizers in PyTorch
      • Linear regression
      • Logistic regression
    • Artificial Neural Networks
      • Overview of Artificial Neural Networks (ANNs)
      • Recurrent Neural Networks (RNNs)
      • Sequence models and Long Short-Term Memory Networks (LSTMs)
    • RNNs/LSTMs with PyTorch
      • Build, train and validate a basic ANN
      • Create an RNN
      • Build an LSTM
      • Applications to financial time series and cautionary tales
  • Optional: blockchain technologies
    • Requirements: Basic Python, NumPy (useful, but not mandatory). Time required: 4 hours
    • The ingredients for a blockchain
      • Transaction records
      • The distributed ledger
      • Chain validation
      • Nonces
    • The hash function
      • Overview of hash functions and tables
      • Cryptographic hash functions
      • Proof-of-work
    • Advanced functions
      • Return statements
      • The JSON format
      • Exception trapping
      • Assertions
    • Construct your own blockchain
      • Generate a block
      • Genesis block
      • Generate a chain through block validation
    • Shortcomings of blockchain technologies
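
To give a flavour of the blockchain module, the sketch below strings together the ingredients listed in the outline: transaction records, a cryptographic hash, a nonce found by proof-of-work, a genesis block, and chain validation. It is an illustrative sketch rather than the course's own material. The function names, the block layout, the difficulty of three leading zeros, and the sample transaction between "alice" and "bob" are all assumptions made for this example.

```python
import hashlib
import json


def block_hash(block):
    """Return the SHA-256 hex digest of a block serialized as sorted-key JSON."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def mine_block(index, transactions, previous_hash, difficulty=3):
    """Proof-of-work: increment the nonce until the hash starts with `difficulty` zeros."""
    block = {
        "index": index,
        "transactions": transactions,
        "previous_hash": previous_hash,
        "nonce": 0,
    }
    prefix = "0" * difficulty
    while not block_hash(block).startswith(prefix):
        block["nonce"] += 1
    return block


def is_valid_chain(chain, difficulty=3):
    """Check every block's proof-of-work and its link to the preceding block."""
    prefix = "0" * difficulty
    for i, block in enumerate(chain):
        if not block_hash(block).startswith(prefix):
            return False
        if i > 0 and block["previous_hash"] != block_hash(chain[i - 1]):
            return False
    return True


# The genesis block has no predecessor, so it points at an all-zero hash (a convention assumed here).
genesis = mine_block(0, [], "0" * 64)

# Hypothetical transaction record, for illustration only.
second = mine_block(
    1,
    [{"from": "alice", "to": "bob", "amount": 5}],
    block_hash(genesis),
)

chain = [genesis, second]
print(is_valid_chain(chain))
```

Changing any field of an earlier block changes its hash, so the `previous_hash` stored in the next block no longer matches and validation fails. A real network adds a distributed ledger and a consensus rule on top of this, which the sketch leaves out.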

Setup requirements

  • Any Windows, Linux or macOS operating system
  • Python 3.x installed (Anaconda bundle recommended)
  • An IDE with Python support (Jupyter Notebook, Spyder or PyCharm Community Edition, which is free)
