Parsing a long text and fetching desired pattern strings using Spark SQL

Pusapativna
3 min read · Sep 21, 2020

Recently, an analyst group needed to extract every desired word or substring from a long text column using Spark SQL.

The input text was the body of an email message that a seller sends out to customers, containing many URLs such as the seller's store webpage, Facebook page, etc.

The trivial solution that an analyst on the team tried was the built-in regexp_extract function.
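As a minimal sketch of how regexp_extract behaves on this kind of input, the query below runs it over a single made-up row. The table alias `emails`, the column name `body`, and both URLs are hypothetical and are not taken from the analysts' data. The pattern deliberately avoids backslashes so that it does not depend on any string-escaping setting.

```sql
-- Hypothetical one-row input: the alias `emails`, the column `body`,
-- and the URLs are illustrative assumptions only.
SELECT
  -- Index 0 asks for the whole match of the pattern.
  regexp_extract(body, 'https?://[^ ]+', 0) AS first_url
FROM VALUES
  ('Visit https://example.com/store or https://facebook.com/example-store today')
  AS emails(body);
```

Although the message contains two URLs, regexp_extract returns a single string per row, the first match, so on its own it cannot list every URL in the body.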

SET spark.sql.parser.escapedStringLiterals=true;
select

--

--