Content-based recommendation engine

A media news company has asked us to create a real-time recommendation engine based on what users are reading right now, without using any personal data. All recommendations should be based solely on the news article data available from the feed.

Datasets will be available only to validated users. Please sign up in the NeuroMarket for future collaboration.

Task description

Given the first news article a user opens, we need to recommend a list of further publications likely to interest that visitor, using only the following information from the article being read:

  • Title
  • News text
  • Date and time
  • Source
  • Category
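
One possible way to rank candidate articles from these fields is TF-IDF weighting with cosine similarity. The sketch below is a minimal illustration and not a prescribed solution: the function names, the `title` and `text` dictionary keys, and the use of only title and text are assumptions. Date, source, and category could be added as further signals.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; \w matches Cyrillic letters in Python 3.
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        # Smoothed IDF so that terms present in every document keep a weight.
        vectors.append({
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(current_index, articles, top_k=3):
    """Return indices of the top_k articles most similar to the current one."""
    texts = [a["title"] + " " + a["text"] for a in articles]
    vectors = tfidf_vectors(texts)
    scored = [
        (cosine(vectors[current_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != current_index
    ]
    scored.sort(reverse=True)
    return [i for _, i in scored[:top_k]]
```

Calling `recommend(0, articles)` on a list of article dictionaries returns the indices of the closest articles to the first one. At production scale the vectors would presumably be precomputed and indexed rather than rebuilt per request.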

The engine should support two languages: Ukrainian (primary) and Russian.

There is no need to show the user the same news story from a different author, so the solution should include duplicate filtering.
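
Duplicate filtering could, for example, compare word shingles with Jaccard similarity. The sketch below assumes that approach; the shingle size `k=3`, the `threshold=0.5` cutoff, and all function names are illustrative choices, not requirements from the task.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def filter_duplicates(current_text, candidates, threshold=0.5):
    """Drop candidates that nearly duplicate the current article or an
    already kept candidate."""
    seen = [shingles(current_text)]
    kept = []
    for text in candidates:
        candidate_shingles = shingles(text)
        if all(jaccard(candidate_shingles, other) < threshold for other in seen):
            kept.append(text)
            seen.append(candidate_shingles)
    return kept
```

Pairwise comparison like this is quadratic in the number of candidates, so at 25,000 items a day an approximate method such as MinHash with locality-sensitive hashing may be a better fit.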

Technical requirements

  • 25,000 news items appear every day
  • 100,000 unique users read the news every minute
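
These figures can be converted into rough per-second rates. The calculation below assumes a uniform load and one recommendation request per user per minute, neither of which is stated in the requirements, so real peaks are likely higher.

```python
NEWS_PER_DAY = 25_000
USERS_PER_MINUTE = 100_000

# Assumption: articles arrive evenly over 24 hours.
ingest_per_second = NEWS_PER_DAY / (24 * 60 * 60)

# Assumption: each user triggers one recommendation request per minute.
requests_per_second = USERS_PER_MINUTE / 60

print(f"ingest: {ingest_per_second:.2f} articles/s")      # ingest: 0.29 articles/s
print(f"serving: {requests_per_second:.0f} requests/s")   # serving: 1667 requests/s
```

Under these assumptions the serving path is the demanding side, with far more reads than feed updates.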

News feed format

<rss xmlns="http://..." version="2.0">
   <channel>
      <title>...</title>
      <link>https://.../</link>
      <image>
         <url>https://....png</url>
         <link>https://.../</link>
         <title>...</title>
      </image>
      <description>Новини України</description>
      <language>uk</language>
      <lastBuildDate>Thu, 14 Jul 2022 12:33:34 +0300</lastBuildDate>
      <item>
         <title>
            <![CDATA[ Теракт у Вінниці: зросла кількість загиблих ]]>
         </title>
         <link>https://.../.../1100...</link>
         <guid>https://.../post/1100...</guid>
         <pubDate>Thu, 14 Jul 2022 12:31:47 +0300</pubDate>
         <description>
            <![CDATA[ Внаслідок влучання російських ракет у Вінниці 8 людей загинуло]]>
         </description>
         <category>Події</category>
         ...
      </item>
      ...
   </channel>
</rss>
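
A feed in this format could be read with Python's standard library, as in the sketch below. The function names and the output dictionary keys are assumptions. Tags are matched by local name because the sample declares a default namespace on `<rss>`.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def local_name(tag):
    # "{namespace}item" -> "item"; tags without a namespace are unchanged.
    return tag.rsplit("}", 1)[-1]


def parse_feed(xml_text):
    """Extract one dictionary per <item> element of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    articles = []
    for element in root.iter():
        if local_name(element.tag) != "item":
            continue
        # CDATA sections arrive as ordinary text; strip surrounding whitespace.
        fields = {
            local_name(child.tag): (child.text or "").strip()
            for child in element
        }
        pub_date = fields.get("pubDate")
        articles.append({
            "title": fields.get("title", ""),
            "link": fields.get("link", ""),
            "guid": fields.get("guid", ""),
            # RSS dates use the RFC 822 format, e.g. "Thu, 14 Jul 2022 12:31:47 +0300".
            "published": parsedate_to_datetime(pub_date) if pub_date else None,
            "description": fields.get("description", ""),
            "category": fields.get("category", ""),
        })
    return articles
```

Each returned dictionary carries a timezone-aware `published` value, which makes it straightforward to order or filter articles by recency.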

Datasets will be available to validated users. Please sign up in our Marketplace to be able to collaborate with our clients.

 
