
Multi-source bank aggregation: how does it work?

Providing users with a centralized view of their personal and corporate banking data is at the heart of what we do. At Budget Insight, we use multiple sources. Our goal is to retrieve the maximum amount of information that’s widely accessible through web scraping, an excellent complement to PSD2.
Find out what it’s all about and how we manage multi-source aggregation.

A pioneer in aggregation through web scraping 

Budget Insight was one of the first players to use web scraping to offer banking aggregation through an API, with the goal of opening up new uses and providing new services. Through web scraping, we connect to bank websites on behalf of users and automate the retrieval of data, which is then cleaned and stored in a database that’s updated daily.

The advantage of web scraping is that it can cover a wide area, from payment accounts to documents, wealth management products and cryptocurrencies.

PSD2, the cusp of a great change 

With the implementation of a new European legal framework came PSD2 APIs for banks to provide their data in an authenticated way. Three types of players can be distinguished: ASPSPs (banks providing the APIs), TPPs (consumers of the APIs, such as Budget Insight) and PSUs (holders of the accounts).

PSD2 comes with constraints for TPPs. On the one hand, there’s the obligation to use APIs instead of banking websites when it comes to payment accounts. On the other hand, there’s the obligation of strong customer authentication (SCA). In the past, PSUs only had to securely provide us with their credentials for us to establish a secure connection. With the generalization of SCA (both on APIs and on banking sites), connection must take place in two steps, which regularly confuses PSUs.

However, PSD2 also has two advantages. It makes aggregation through APIs more stable than through websites (the security layer is standardized for communication, and changes are announced upfront) and, with all European banks covered by PSD2, we can deploy our technology to integrate more banks.

Implementation of PSD2 into existing technology 

1. PSD2 API: our technical improvements for security and speed of development

After experimenting with a few banks in their sandboxes and in pre-production, we obtained the long-awaited eIDAS certificates to access production APIs. Since the certificates contain private keys that are considered sensitive, we decided to add a building block to our technical stack: a proxy that intercepts requests and presents the certificates when the bank’s connector requires it.

The proxy, in coordination with the bank’s module, also adds the signature headers required by the various PSD2 API standards. Despite API standards such as STET and the Berlin Group, the signature part was implemented differently by the various banks. For each new implementation, we needed to check whether it was compatible with a pre-existing signature and, if not, to create a new one.
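
As a rough illustration of what adding signature headers can involve, here is a minimal sketch. It assumes a scheme in which the request body is hashed into a Digest header and a list of headers is signed; the function name, the list of signed headers and the `demo_sign` stand-in are assumptions made for this example, not our production code. A real deployment would sign with the private key of the eIDAS certificate instead of the HMAC stand-in.

```python
import base64
import hashlib
import hmac
from typing import Callable, Dict


def build_signed_headers(
    method: str,
    path: str,
    body: bytes,
    request_id: str,
    key_id: str,
    algorithm: str,
    sign: Callable[[bytes], bytes],
) -> Dict[str, str]:
    """Build Digest and Signature headers for one outgoing request."""
    digest = "SHA-256=" + base64.b64encode(hashlib.sha256(body).digest()).decode()
    # The list of signed headers is the part that tends to differ per bank;
    # a per-bank signature profile would override it (assumption).
    signed = [
        ("(request-target)", f"{method.lower()} {path}"),
        ("digest", digest),
        ("x-request-id", request_id),
    ]
    signing_string = "\n".join(f"{name}: {value}" for name, value in signed)
    signature = base64.b64encode(sign(signing_string.encode())).decode()
    names = " ".join(name for name, _ in signed)
    return {
        "Digest": digest,
        "X-Request-ID": request_id,
        "Signature": (
            f'keyId="{key_id}",algorithm="{algorithm}",'
            f'headers="{names}",signature="{signature}"'
        ),
    }


def demo_sign(data: bytes) -> bytes:
    """Stand-in signer for the demo only (HMAC, not an eIDAS private key)."""
    return hmac.new(b"demo-secret", data, hashlib.sha256).digest()


headers = build_signed_headers(
    "POST", "/accounts", b"{}", "req-1", "demo-key", "hmac-sha256", demo_sign
)
```

Passing the signer as a callable keeps the private key inside the proxy: the connector only describes what must be signed.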

Subsequently, some banks made a new API route available for enrolling TPPs. Enrollment provides a client_id and a client_secret, the identifiers that allow us to connect to their APIs. We jumped at the opportunity to automate the enrollment of white-label customers who operate under their own approval. The administration console and our backend have been adapted to automate these enrollments.

Several types of enrollment are available depending on the bank:

  • Manual enrollment via the corresponding web portal or by sending an email to the bank. This may seem simple, but there are many traps to avoid, such as forgetting to subscribe to production or to certain APIs. Above all, it requires a significant amount of manual work, with which we can only assist you.
  • Automatic enrollment with a simple click on a button on the console.
  • Hybrid enrollments, where one part must be done on the portal and another part via the API. This is not our favorite method as it requires special support.
  • Approval numbers, where no enrollment is necessary. In this case, the certificates and the approval number as client_id are sufficient for recognizing the TPP. Most foreign connectors (mostly with the Berlin Group standard) work this way. This method has the advantage of being easy to implement, but it does not allow customization of the bank interface (e.g., the TPP name and logo cannot be displayed).
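
The four cases above can be summarized in a small sketch. The enum and the `client_credentials` helper are hypothetical names introduced for this example; the only rule taken from the text is that, in the approval number case, the approval number itself serves as client_id and no enrollment is needed.

```python
from enum import Enum
from typing import Optional, Tuple


class Enrollment(Enum):
    MANUAL = "manual"
    AUTOMATIC = "automatic"
    HYBRID = "hybrid"
    APPROVAL_NUMBER = "approval_number"


def client_credentials(
    mode: Enrollment,
    approval_number: str,
    enrolled: Optional[Tuple[str, str]] = None,
) -> Tuple[str, Optional[str]]:
    """Return the (client_id, client_secret) pair used to call the bank's API."""
    if mode is Enrollment.APPROVAL_NUMBER:
        # No enrollment step: the certificates plus this number identify the TPP.
        return approval_number, None
    if enrolled is None:
        raise ValueError(f"{mode.value} enrollment has not been completed")
    return enrolled
```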

It’s unfortunate that the PSD2 standards did not impose stronger requirements regarding enrollment. To avoid losing our customers during this step, documentation needs to be maintained for each bank for which we have developed a PSD2 connector. We are setting up a structure to manage this maintenance.

However, standardization has allowed us to establish family relationships between our various PSD2 connectors, which saves us from starting from scratch when implementing a new API. From a parent class that manages the OAuth2 path of our connectors, we were able to establish abstractions for each standard. Nevertheless, divergences (whether deviations from the standard or points left to the bank’s free interpretation) are handled directly in the bank’s connector.

The integration of the APIs into our model generally fell into one of two cases. Either the connector did not yet exist, or we already had a connector based on scraping. In the second case, we needed to maintain the continuity of the connections and ensure the consistency of the data. For this, we introduced the notion of source. According to this logic, each connection or connector is linked to sources (scraping or the PSD2 API). The sources must therefore coexist while remaining as transparent as possible for our customers.
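
The layering described here (OAuth2 parent class, one abstraction per standard, bank-specific overrides) could look roughly like the following sketch. All class names, URLs, scopes and paths are illustrative assumptions, not our actual connectors.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlencode


class OAuth2Connector(ABC):
    """Parent class: owns the OAuth2 authorization-code redirect."""

    authorize_url = ""

    def __init__(self, client_id: str, redirect_uri: str) -> None:
        self.client_id = client_id
        self.redirect_uri = redirect_uri

    def build_authorize_url(self, state: str) -> str:
        params = {
            "response_type": "code",
            "client_id": self.client_id,
            "redirect_uri": self.redirect_uri,
            "scope": self.scope(),
            "state": state,
        }
        return f"{self.authorize_url}?{urlencode(params)}"

    @abstractmethod
    def scope(self) -> str: ...

    @abstractmethod
    def accounts_path(self) -> str: ...


class StetConnector(OAuth2Connector):
    """Defaults shared by banks following one standard (illustrative values)."""

    def scope(self) -> str:
        return "aisp"

    def accounts_path(self) -> str:
        return "/accounts"


class ExampleBankConnector(StetConnector):
    """A bank that diverges from the standard overrides only what differs."""

    authorize_url = "https://bank.example/authorize"

    def scope(self) -> str:
        return "aisp extended_history"  # hypothetical bank-specific scope


connector = ExampleBankConnector("abc", "https://tpp.example/cb")
url = connector.build_authorize_url("xyz")
```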
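
To make the notion of source more concrete, here is a minimal data-model sketch. The class and field names are assumptions made for this example; the idea it illustrates is simply that one connection holds several sources and that the PSD2 API is preferred for the account types it covers.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import FrozenSet, List


class SourceKind(Enum):
    SCRAPING = "scraping"
    PSD2_API = "psd2_api"


@dataclass(frozen=True)
class Source:
    kind: SourceKind
    account_types: FrozenSet[str]  # account types this source can serve


@dataclass
class Connection:
    bank: str
    sources: List[Source] = field(default_factory=list)

    def sources_for(self, account_type: str) -> List[Source]:
        """Sources able to serve an account type, PSD2 API first."""
        able = [s for s in self.sources if account_type in s.account_types]
        # False sorts before True, so PSD2 API sources come first.
        return sorted(able, key=lambda s: s.kind is not SourceKind.PSD2_API)


connection = Connection(
    bank="example-bank",
    sources=[
        Source(SourceKind.SCRAPING, frozenset({"payment", "savings"})),
        Source(SourceKind.PSD2_API, frozenset({"payment"})),
    ],
)
```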

2. Authentication mechanisms according to use cases: adapting to the depth of data required

The arrival of new sources and the implementation of strong authentications on pre-existing sources have modified the path of the PSUs. First, regarding strong authentications on historical web scraping sources, several types of paths have been set up:

  • Cross-browser authentication: When the PSU validates its strong authentication on one device, it is not required again on any other device for 90 days. The advantage of this type of authentication is that the PSU can perform its strong authentication independently, without us. However, if authentication was not performed when establishing the connection with us, we will indicate to the PSU that an action is required. If we can manage the SCA without the PSU leaving our solution, we will, but it’s sometimes difficult to develop this part because the authentication only appears once every 90 days.

  • Non cross-browser authentication: The PSU must perform its strong authentication on each new device. The connector then manages this authentication. The path is smoother, but it can also be a source of errors, or the authentication mode may not yet be managed (or may simply not be manageable), as with certificate-based authentication. Moreover, some errors in the authentication path cannot be resolved without the help of the PSU.

  • Systematic authentication: This is usually established for corporate customers and is coupled with strong authentication every 24 hours on the bank’s API. We can handle these authentications, but their short validity period is a hindrance to automated data synchronization for PSUs as well as for our clients. Note that a recent EBA decision stated that it’s no longer possible to request an SCA from the PSU through a TPP more frequently than every 90 days.
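
The validity periods mentioned above (90 days in the general case, 24 hours for systematic authentication) boil down to a simple check before each synchronization. The sketch below uses a hypothetical `sca_required` helper; treating the boundary as already expired is a choice made for the example.

```python
from datetime import datetime, timedelta
from typing import Optional


def sca_required(
    last_sca: Optional[datetime],
    now: datetime,
    validity: timedelta = timedelta(days=90),
) -> bool:
    """Tell whether the PSU must redo its strong authentication."""
    if last_sca is None:
        return True  # never authenticated on this source
    return now - last_sca >= validity


now = datetime(2023, 1, 1, 12, 0)
needs_action = sca_required(now - timedelta(days=91), now)
```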

Let’s now add to this path the PSD2 source which also requires strong authentication every 90 days. Depending on the implementation of the connector, we propose two types of paths for this new source: 

  • The Webauth path: We fully redirect the PSU to the bank’s dedicated interface, where the PSU performs its strong authentication before the bank redirects back to us and grants access to the API data.
  • The Credentials path: We implement the scraping of the redirection, where our connector handles the strong authentication itself.

The advantage of the Webauth path is that it does not require us to manage the various authentication modes and therefore requires less maintenance on this part. Nevertheless, we have no control over the interface’s possible errors. Among those, we regularly notice problems with the App2App functionality. Where the interface should lead directly to the bank’s application on a mobile phone, thus facilitating authentication, we sometimes encounter compatibility problems that prevent the path from being completed, depending on the version of the application or the OS. Investigating any such error can be frustrating, because an error that occurs on the bank’s side is technically untraceable for us (no logs on our side). Nevertheless, the error can be reported to us, and we can raise it with the bank.

Depending on the customer’s needs, we can change the path. For example, if the customer is only interested in payment accounts, the Webauth path is useful because a PSU redirected to the bank’s interface is more easily reassured. If the customer is interested in account types outside the scope of PSD2 (such as savings accounts), the Credentials path should be preferred. If the customer has nevertheless chosen the Webauth path, the PSU must enter their credentials twice: once on the bank’s interface, and once on the Budget Insight (or customer’s) interface so that they can be used on the scraping source. The goal is still for the PSU to authenticate in the shortest possible timeframe and in the most fluid way. To make the user experience successful, we sometimes work with the banks to optimize their pathways. In the future, it’s possible that regulations will push us to generalize the Webauth path for PSD2 sources.
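
The rule of thumb for choosing a path can be written as a one-line decision. This is only a sketch: the function name, the account type labels and the assumption that the PSD2 scope is limited to "payment" are all made up for the example.

```python
from typing import FrozenSet, Iterable

PSD2_SCOPE: FrozenSet[str] = frozenset({"payment"})  # assumed label


def choose_path(account_types_needed: Iterable[str]) -> str:
    """Prefer Webauth when every requested account type is covered by PSD2."""
    needed = set(account_types_needed)
    return "webauth" if needed <= PSD2_SCOPE else "credentials"


path = choose_path({"payment", "savings"})
```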

3. A quality of service mission

When new PSD2 sources are put into production, our mission is to ensure the continuity of the service and of the data retrieved in the most transparent way possible. It’s important to remember that once the strong authentications have been carried out, our connections retrieve some of their accounts from the PSD2 source and the others from the banking site. If the PSD2 API does not respond, we must be able to retrieve the same information for the former accounts on the banking site, so we switch automatically until the failure is resolved.

When we build a new connector, we use our knowledge of banking data to ensure that the information is consistent on both sides: web scraping and the PSD2 API. For example, we ensure that the balances are identical, that the credit card payment dates are present and that we have access to the complete list of transactions. In the end, regardless of the source, and without even being aware of it, a person using our services must find the same data to follow their accounts.

Starting from a 100% web scraping setup, the challenge was to add the PSD2 source and, for payment accounts, to guarantee an identical result once these accounts are retrieved through the API instead of the site. This may seem straightforward but, in fact, in order to meet our quality objectives, building a new API connector follows several steps:
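
The automatic switch between sources can be pictured as trying each source in order of preference. In this sketch, `SourceUnavailable` and `fetch_with_fallback` are hypothetical names, and the two fetchers are stubs standing in for a real API call and a real scraping session.

```python
from typing import Callable, List, Sequence, Tuple


class SourceUnavailable(Exception):
    """Raised by a fetcher when its source cannot answer."""


Fetcher = Callable[[], List[dict]]


def fetch_with_fallback(fetchers: Sequence[Tuple[str, Fetcher]]) -> Tuple[str, List[dict]]:
    """Try each source in order and return the first successful result."""
    errors = []
    for name, fetch in fetchers:
        try:
            return name, fetch()
        except SourceUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise SourceUnavailable("; ".join(errors))


def broken_api() -> List[dict]:
    raise SourceUnavailable("HTTP 503")  # simulated outage


def scraping() -> List[dict]:
    return [{"id": "acc-1"}]


used, accounts = fetch_with_fallback([("psd2_api", broken_api), ("scraping", scraping)])
```

Because the order is re-evaluated on every synchronization, the PSD2 API is used again as soon as it responds.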
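
A consistency check of this kind can be sketched as a comparison of the two views of the same accounts. The data layout (accounts keyed by IBAN, with a balance and a set of transaction identifiers) and the `find_discrepancies` name are assumptions made for this example.

```python
from decimal import Decimal
from typing import Dict, List


def find_discrepancies(api: Dict[str, dict], scraped: Dict[str, dict]) -> List[str]:
    """Compare accounts from both sources and list every mismatch."""
    issues = []
    for iban in sorted(set(api) | set(scraped)):
        a, s = api.get(iban), scraped.get(iban)
        if a is None or s is None:
            missing = "API" if a is None else "scraping"
            issues.append(f"{iban}: missing from {missing} source")
            continue
        if a["balance"] != s["balance"]:
            issues.append(
                f"{iban}: balance {a['balance']} (API) != {s['balance']} (scraping)"
            )
        if a["transaction_ids"] != s["transaction_ids"]:
            issues.append(f"{iban}: transaction lists differ")
    return issues


api_view = {
    "IBAN-1": {"balance": Decimal("10.00"), "transaction_ids": {"t1", "t2"}},
    "IBAN-2": {"balance": Decimal("5.00"), "transaction_ids": set()},
}
scraped_view = {
    "IBAN-1": {"balance": Decimal("12.00"), "transaction_ids": {"t1", "t2"}},
}
issues = find_discrepancies(api_view, scraped_view)
```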

  1. Opening of accounts at the relevant banks. From our experience, very few banks offer functional sandboxes with realistic data whose API reacts in the same way as the production API. Therefore, we absolutely need real data.

  2. Thorough testing of the API: robustness, error codes, limitations, and data identical to that on the site. For example, do we get three months of past transactions in addition to those that have not yet been debited? Are the transaction and account labels legible and identical to those on the site? For deferred debit cards, are the total debit amounts at the end of each month available as well as the details of each transaction with the actual payment date? Are the IBANs present? We have already observed problems with each of these items at multiple banks.

  3. Tests in an API context. What is the quality of the webview? Is it stable? Does it provide clear messages to the PSU in the case of errors, such as a wrong password? Is App2App well implemented? One blocking path we have observed involves an interface that shows the PSU an image of a crying cloud with the obscure message “An error has occurred” for any error. We accept this level of auditing, even if it means reporting errors to the bank, rather than putting a connector into production and risking customer complaints later on.

  4. Auditing the behavior in integrated mode with our own API. This involves ensuring not only that one connection is possible, but that a hundred or so are also possible. This is where we can observe daily synchronization problems, possible load problems or even data retrieval problems that we could not see with our single test account. For example, the API may work perfectly for the consumer segment, but not at all for the business segment.

  5. Additional checks on the consistency of the data itself, as described above, starting with the first connections established.

You will easily understand that in order to develop new connectors while maintaining quality requirements, going through a significant beta testing stage is a must. This generally results in an early production launch on internal domains, and then with a volunteer customer before full deployment.


PSD2 represents both a challenge and a great opportunity. To integrate it, we have fully experimented with the new APIs while suffering a few setbacks. For the past three years, we have been testing and deploying the APIs of most French banks, and pointing out to regulators the difficulties caused by banks’ minimalist reading of the PSD2 framework. A recent victory was obtaining from the Authority an extension of the strong customer authentication (SCA) exemption from 90 to 180 days.

To ensure a complete and robust offering that suits as many people as possible, we believe in continuing to play on both fronts: offering only API sources to customers who are exclusively interested in aggregating bank data covered by PSD2, and offering our web scraping expertise with our dedicated sources for other data. We have also made our technology more resilient. Through multi-sourcing, we are able to retrieve data at any time and update connections in the most seamless way, even when a banking site or API is unavailable.

Above all, PSD2 APIs give us the opportunity to open up the European market very quickly. However, in the interest of quality, it’s not just a matter of connecting to the APIs, but of testing them in advance with clients and with our own accounts. We now have all the expertise to conquer Europe.

Authors: Damien Mat and Maxime Gasselin, Software Engineers
