In summer 2021, our perfect startup team developed telegram bot, which allows you to find the shortest path between two instagram users. As a result you'll get a chain of users subscribed for each others. In this article, I'm going to tell you how we achieved it using pgRouting – extension for PostgreSQL.
To use this extension, you have to install it, just use following guide.
# 1. Enter your container with PostgreSQL or skip this step # if you have postgres server running on your system directly dokku postgres:enter <service_name> # dokku style docker exec -it <container_name> bash # docker style # 2. Update apt packages apt update && apt upgrade # 3. Install pgRouting (for 12th version of postgresql) apt install postgresql-12-pgrouting # 4. Go to psql and enter your database psql -U postgres <your_db_name> # 5. Create extensions CREATE EXTENSION PostGIS; CREATE EXTENSION pgRouting; # 6. Check version SELECT PostGIS_Version(); SELECT pgr_version();
In our database we had user's relationships in following format (screenshot below). User from left column is in friendship with user from right column, i.e. they both subscribed for each other. By the way, we had about 4.2 million of such friendship edges 😯
To find shortest path between two user ids, we used pgRouting implementation of Dijkstra algorithm. You can find more examples of using on their page. In our case, we executed query below.
SELECT ig_us.user_id, ig_us.username, ig_us.profile_pic_url, row_number() over () as "order" FROM pgr_dijkstra( 'SELECT id, user_id_1 as source, user_id_2 as target, 1 as cost FROM instagram_friendship', 19010706, -- start user_id 295372316, -- target user_id FALSE -- oriented graph ) pgr LEFT JOIN instagram_user ig_us on pgr.node = ig_us.user_id;
To get result, which looks more prettier, we joined
pgr_dijkstra result with
instagram_user table. Below is final result of query, which is actually a path from
Path is found 🎉
With more than 200 million users in
instagram_user table and about 4.2 million friendship edges it took about 7 seconds to find a path in average. We had thoughts about switching to neo4j (graph DB) for storing data and searching paths, but couldn't betray PostgreSQL ❤️, which we use almost everywhere.