GSoC 2017 - Proposal and basic structure of API

This is the basic outline of the design of the scipy.diff module. I have thoroughly investigated how statsmodels and numdifftools work, and I have also looked at scipy.misc.derivative. For scipy.diff, I propose adapting code mainly from numdifftools and statsmodels because of their accurate results, sophisticated computation techniques, and accessibility.

  • Platform for coding :

    Since the scipy code base is hosted on GitHub, and GitHub is widely used and easy to work with, I plan to develop the module on GitHub in my fork of the scipy repository.

  • Numerical Differentiation :

    For numerical differentiation, I will use the finite difference method together with Richardson extrapolation for better accuracy. The complex-step approximation will also be implemented in the latter part of the summer.
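To illustrate the idea, here is a minimal sketch of Richardson extrapolation applied to central differences. The function names, the starting step `h=0.1`, and the halving scheme are assumptions made for this example; they are not the numdifftools implementation or the final scipy.diff API.

```python
import numpy as np


def central_diff(f, x, h):
    """Second-order central difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def richardson_derivative(f, x, h=0.1, levels=4):
    """Extrapolate central differences computed with steps h, h/2, h/4, ...

    The central difference error expands in even powers of h, so column j
    of the table cancels the h**(2*j) term of the previous column.
    """
    table = np.zeros((levels, levels))
    for i in range(levels):
        table[i, 0] = central_diff(f, x, h / 2**i)
        for j in range(1, i + 1):
            factor = 4.0**j
            table[i, j] = (factor * table[i, j - 1] - table[i - 1, j - 1]) / (factor - 1.0)
    return table[levels - 1, levels - 1]


exact = np.cos(1.0)                          # derivative of sin at x = 1
plain = central_diff(np.sin, 1.0, 0.1)       # no extrapolation
extrap = richardson_derivative(np.sin, 1.0)  # same starting step, extrapolated

print("plain central error:", abs(plain - exact))
print("extrapolated error: ", abs(extrap - exact))
```

With the same starting step, the extrapolated value is far closer to the exact derivative than the plain central difference.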

    I have decided to implement the following:

    • Derivative
    • Gradient
    • Jacobian
    • Hessian
    • Hessdiag
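As a rough illustration of what the multivariate quantities in this list compute, the sketch below builds each of them from plain central differences. The function names, signatures, and fixed step sizes are hypothetical and chosen only for this example; the proposed module would add extrapolation and step generators on top.

```python
import numpy as np


def _unit_step(n, i, h):
    """Vector of length n with h in position i and zeros elsewhere."""
    e = np.zeros(n)
    e[i] = h
    return e


def gradient(f, x, h=1e-5):
    """Gradient of a scalar function f: R^n -> R."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros(x.size)
    for i in range(x.size):
        e = _unit_step(x.size, i, h)
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return grad


def jacobian(f, x, h=1e-5):
    """Jacobian of a vector function f: R^n -> R^m, returned with shape (m, n)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        e = _unit_step(x.size, i, h)
        cols.append((np.asarray(f(x + e)) - np.asarray(f(x - e))) / (2.0 * h))
    return np.stack(cols, axis=-1)


def hessdiag(f, x, h=1e-4):
    """Diagonal of the Hessian (pure second partial derivatives) of f: R^n -> R."""
    x = np.asarray(x, dtype=float)
    f0 = f(x)
    diag = np.zeros(x.size)
    for i in range(x.size):
        e = _unit_step(x.size, i, h)
        diag[i] = (f(x + e) - 2.0 * f0 + f(x - e)) / h**2
    return diag


def hessian(f, x, h=1e-4):
    """Full Hessian, computed here as the Jacobian of the gradient."""
    return jacobian(lambda z: gradient(f, z, h), x, h)


def f(x):
    return x[0] ** 2 * x[1] + np.sin(x[1])


def g(x):
    return np.array([x[0] * x[1], x[0] + x[1] ** 2])


x0 = [1.0, 2.0]
print(gradient(f, x0))
print(jacobian(g, x0))
print(hessdiag(f, x0))
print(hessian(f, x0))
```

Derivative is the one-dimensional case, and Hessdiag avoids the mixed partial derivatives that the full Hessian needs.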
  • Statsmodels

    Statsmodels provides fast and accurate numerical differentiation based on first differences. It also computes derivatives using the complex-step approximation. The code is simple and easy to understand. However, it is not modular and contains no classes, which is why it has to be refactored. The default method is forward differences, which I think should be changed to central differences because of their higher accuracy. There is also a concern about complex steps: the method gives correct results only when the function is analytic. I need to discuss this issue with the mentors and get their suggestions. Statsmodels can be used to compute the Derivative, Gradient, Jacobian and Hessian.
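The two points above (central versus forward accuracy, and the analyticity concern) can be checked with a small hand-written sketch. The helper functions and step sizes below are assumptions for illustration; they are not the statsmodels code.

```python
import numpy as np


def forward_diff(f, x, h=1e-5):
    """First-order forward difference."""
    return (f(x + h) - f(x)) / h


def central_diff(f, x, h=1e-5):
    """Second-order central difference."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def complex_step(f, x, h=1e-20):
    """Complex-step approximation Im(f(x + i*h)) / h.

    There is no subtraction of nearly equal numbers, so h can be tiny.
    """
    return np.imag(f(x + 1j * h)) / h


def f(x):
    return np.exp(x) * np.sin(x)


exact = np.exp(1.0) * (np.sin(1.0) + np.cos(1.0))

err_forward = abs(forward_diff(f, 1.0) - exact)
err_central = abs(central_diff(f, 1.0) - exact)
err_complex = abs(complex_step(f, 1.0) - exact)
print(err_forward, err_central, err_complex)

# abs() is not analytic: |x + i*h| is purely real, so the imaginary part
# carries no derivative information and the result is 0 instead of 1.
bad = complex_step(np.abs, 1.0)
print("complex step on abs at x = 1:", bad)
```

For the same step, the central difference is more accurate than the forward difference, and the complex step is accurate to near machine precision on the analytic function but silently wrong on `abs`.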

  • Numdifftools

    Numdifftools provides higher accuracy than statsmodels because it uses Richardson extrapolation and adaptive step sizes. However, because of looping, its methods are somewhat slower than those of statsmodels. The code is highly modular and implemented using OOP concepts. It provides a wide range of options, such as the order of accuracy, the order of differentiation, and step generator methods. Numdifftools uses adaptive step sizes to calculate finite differences, but it still faces a dilemma: the step must be small enough to reduce truncation error, yet not so small that subtractive cancellation dominates.
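The step-size dilemma is easy to reproduce. The following sketch (a plain forward difference on `sin`, chosen only for illustration) scans fixed steps from 1e-1 down to 1e-15 and records the error at each one.

```python
import numpy as np


def forward_diff(f, x, h):
    """First-order forward difference with a fixed step h."""
    return (f(x + h) - f(x)) / h


x0 = 1.0
exact = np.cos(x0)
steps = 10.0 ** -np.arange(1, 16)  # 1e-1, 1e-2, ..., 1e-15
errors = np.array([abs(forward_diff(np.sin, x0, h) - exact) for h in steps])
best = steps[np.argmin(errors)]

for h, err in zip(steps, errors):
    print(f"h = {h:.0e}   error = {err:.2e}")
print("best step in this scan:", best)
```

The error first shrinks as truncation error decreases, then grows again once subtractive cancellation takes over, which is why a fixed default step is a compromise and why adaptive steps and extrapolation help.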

  • Structure and Functions:

    I decided to write fully modular code using OOP concepts so that it is easy to read, understand, and maintain. The following is the basic set of inputs and the output for Derivative, Gradient, Jacobian, Hessian and Hessdiag:

    • Input

      fun: Input function
           function of one array.
               
      step: float, array-like or StepGenerator object, optional 
           default can be min_step_generator.
               
      method: {‘central’, ‘complex’, ‘multicomplex’, ‘forward’, ‘backward’}, optional
              default will be central.
                  
      order: int, optional
            Defines the order of the error term in the Taylor approximation used. For ‘central’ and
            ‘complex’ methods, it must be an even number. Default is 2.
                
      n: int, optional
            Order of the derivative. Default is 1.
                
      tool: {‘numdifftools’,‘statsmodels’}, optional
            default will be numdifftools.     
            *Note: This part needs to be discussed with the mentors.
      
    • Output

     der: ndarray
          array of derivatives
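To make the intended calling pattern concrete, here is a deliberately reduced, hypothetical sketch of a `Derivative` class. It covers only `fun`, a float `step`, and four of the `method` choices for first derivatives; `order`, `n`, `tool`, ‘multicomplex’, and StepGenerator objects are left out, and the default step of 1e-6 is an assumption for this example, not a design decision.

```python
import numpy as np


class Derivative:
    """Hypothetical, reduced sketch of the proposed interface (n = 1 only)."""

    _METHODS = ("central", "forward", "backward", "complex")

    def __init__(self, fun, step=1e-6, method="central"):
        if method not in self._METHODS:
            raise ValueError(f"unknown method: {method!r}")
        self.fun = fun
        self.step = step
        self.method = method

    def __call__(self, x):
        """Return an ndarray of derivatives of fun, evaluated elementwise at x."""
        x = np.asarray(x, dtype=float)
        f, h = self.fun, self.step
        if self.method == "central":
            der = (f(x + h) - f(x - h)) / (2.0 * h)
        elif self.method == "forward":
            der = (f(x + h) - f(x)) / h
        elif self.method == "backward":
            der = (f(x) - f(x - h)) / h
        else:  # "complex": requires fun to be analytic
            der = np.imag(f(x + 1j * h)) / h
        return np.asarray(der)


points = np.array([0.0, 1.0])
d_central = Derivative(np.exp)(points)
d_complex = Derivative(np.exp, method="complex")(points)
print(d_central, d_complex)
```

Keeping the configuration in the constructor and the evaluation in `__call__` means the same object can be reused at many points, which is the pattern the class-based design aims for.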
    

    The module will contain the following files:

    • extrapolation.py
    • derivative.py
    • gradient.py
    • jacobian.py
    • hessian.py
    • step_generator.py
    • tests (folder)
    • docs (folder)

    However, this is subject to change as per the requirements.
  • Project Timeline:

    For now, the project timeline is the same as in my GSoC proposal: link

  • To be discussed:

    • Which tool (Statsmodels or Numdifftools) should be given higher priority, and when should one be adopted over the other?
    • Should a tolerance be used to choose one tool over the other?
    • Is there any need to include naive implementations from scipy.misc?

NOTE: This proposal will be revised and will evolve based on discussions with the mentors.