{
    "mode": "perldoc",
    "parameter": "DBD::SQLite::Fulltext_search",
    "section": "",
    "url": "https://www.chedong.com/phpMan.php/perldoc/DBD%3A%3ASQLite%3A%3AFulltext_search/json",
    "generated": "2026-06-14T13:44:57Z",
    "sections": {
        "NAME": {
            "content": "DBD::SQLite::Fulltextsearch - Using fulltext searches with DBD::SQLite\n",
            "subsections": []
        },
        "DESCRIPTION": {
            "content": "",
            "subsections": [
                {
                    "name": "Introduction",
                    "content": "SQLite is bundled with an extension module called \"FTS\" for full-text indexing. Tables with this\nfeature enabled can be efficiently queried to find rows that contain one or more instances of\nsome specified words (also called \"tokens\"), in any column, even if the table contains many\nlarge documents.\n\nThe first full-text search modules for SQLite were called \"FTS1\" and \"FTS2\" and are now\nobsolete. The latest version is \"FTS4\", but it shares many features with the former module\n\"FTS3\", which is why parts of the API and parts of the documentation still refer to \"FTS3\"; from\na client point of view, both can be considered largely equivalent. Detailed documentation can be\nfound at <http://www.sqlite.org/fts3.html>.\n"
                },
                {
                    "name": "Short example",
                    "content": "Here is a very short example of using FTS :\n\n$dbh->do(<<\"\") or die DBI::errstr;\nCREATE VIRTUAL TABLE ftsexample USING fts4(content)\n\nmy $sth = $dbh->prepare(\"INSERT INTO ftsexample(content) VALUES (?)\");\n$sth->execute($) foreach @docstoinsert;\n\nmy $results = $dbh->selectallarrayref(<<\"\");\nSELECT docid, snippet(ftsexample) FROM ftsexample WHERE content MATCH 'foo'\n\nThe key points in this example are :\n\n*   The syntax for creating FTS tables is\n\nCREATE VIRTUAL TABLE <tablename> USING fts4(<columns>)\n\nwhere \"<columns>\" is a list of column names. Columns may be typed, but the type information\nis ignored. If no columns are specified, the default is a single column named \"content\". In\naddition, FTS tables have an implicit column called \"docid\" (or also \"rowid\") for numbering\nthe stored documents.\n\n*   Statements for inserting, updating or deleting records use the same syntax as for regular\nSQLite tables.\n\n*   Full-text searches are specified with the \"MATCH\" operator, and an operand which may be a\nsingle word, a word prefix ending with '*', a list of words, a \"phrase query\" in double\nquotes, or a boolean combination of the above.\n\n*   The builtin function \"snippet(...)\" builds a formatted excerpt of the document text, where\nthe words pertaining to the query are highlighted.\n\nThere are many more details to building and searching FTS tables, so we strongly invite you to\nread the full documentation at <http://www.sqlite.org/fts3.html>.\n"
                }
            ]
        },
        "QUERY SYNTAX": {
            "content": "Here are some explanation about FTS queries, borrowed from the sqlite documentation.\n",
            "subsections": [
                {
                    "name": "Token or token prefix queries",
                    "content": "An FTS table may be queried for all documents that contain a specified term, or for all\ndocuments that contain a term with a specified prefix. The query expression for a specific term\nis simply the term itself. The query expression used to search for a term prefix is the prefix\nitself with a '*' character appended to it. For example:\n\n-- Virtual table declaration\nCREATE VIRTUAL TABLE docs USING fts3(title, body);\n\n-- Query for all documents containing the term \"linux\":\nSELECT * FROM docs WHERE docs MATCH 'linux';\n\n-- Query for all documents containing a term with the prefix \"lin\".\nSELECT * FROM docs WHERE docs MATCH 'lin*';\n\nIf a search token (on the right-hand side of the MATCH operator) begins with \"^\" then that token\nmust be the first in its field of the document : so for example \"^lin*\" matches 'linux kernel\nchanges ...' but does not match 'new linux implementation'.\n"
                },
                {
                    "name": "Column specifications",
                    "content": "Normally, a token or token prefix query is matched against the FTS table column specified as the\nright-hand side of the MATCH operator. Or, if the special column with the same name as the FTS\ntable itself is specified, against all columns. This may be overridden by specifying a\ncolumn-name followed by a \":\" character before a basic term query. There may be space between\nthe \":\" and the term to query for, but not between the column-name and the \":\" character. For\nexample:\n\n-- Query the database for documents for which the term \"linux\" appears in\n-- the document title, and the term \"problems\" appears in either the title\n-- or body of the document.\nSELECT * FROM docs WHERE docs MATCH 'title:linux problems';\n\n-- Query the database for documents for which the term \"linux\" appears in\n-- the document title, and the term \"driver\" appears in the body of the document\n-- (\"driver\" may also appear in the title, but this alone will not satisfy the.\n-- query criteria).\nSELECT * FROM docs WHERE body MATCH 'title:linux driver';\n"
                },
                {
                    "name": "Phrase queries",
                    "content": "A phrase query is a query that retrieves all documents that contain a nominated set of terms or\nterm prefixes in a specified order with no intervening tokens. Phrase queries are specified by\nenclosing a space separated sequence of terms or term prefixes in double quotes (\"). For\nexample:\n\n-- Query for all documents that contain the phrase \"linux applications\".\nSELECT * FROM docs WHERE docs MATCH '\"linux applications\"';\n\n-- Query for all documents that contain a phrase that matches \"lin* app*\".\n-- As well as \"linux applications\", this will match common phrases such\n-- as \"linoleum appliances\" or \"link apprentice\".\nSELECT * FROM docs WHERE docs MATCH '\"lin* app*\"';\n\nNEAR queries.\nA NEAR query is a query that returns documents that contain a two or more nominated terms or\nphrases within a specified proximity of each other (by default with 10 or less intervening\nterms). A NEAR query is specified by putting the keyword \"NEAR\" between two phrase, term or\nprefix queries. To specify a proximity other than the default, an operator of the form\n\"NEAR/<N>\" may be used, where <N> is the maximum number of intervening terms allowed. For\nexample:\n\n-- Virtual table declaration.\nCREATE VIRTUAL TABLE docs USING fts4();\n\n-- Virtual table data.\nINSERT INTO docs VALUES('SQLite is an ACID compliant embedded relational database management system');\n\n-- Search for a document that contains the terms \"sqlite\" and \"database\" with\n-- not more than 10 intervening terms. This matches the only document in\n-- table docs (since there are only six terms between \"SQLite\" and \"database\"\n-- in the document).\nSELECT * FROM docs WHERE docs MATCH 'sqlite NEAR database';\n\n-- Search for a document that contains the terms \"sqlite\" and \"database\" with\n-- not more than 6 intervening terms. This also matches the only document in\n-- table docs. Note that the order in which the terms appear in the document\n-- does not have to be the same as the order in which they appear in the query.\nSELECT * FROM docs WHERE docs MATCH 'database NEAR/6 sqlite';\n\n-- Search for a document that contains the terms \"sqlite\" and \"database\" with\n-- not more than 5 intervening terms. This query matches no documents.\nSELECT * FROM docs WHERE docs MATCH 'database NEAR/5 sqlite';\n\n-- Search for a document that contains the phrase \"ACID compliant\" and the term\n-- \"database\" with not more than 2 terms separating the two. This matches the\n-- document stored in table docs.\nSELECT * FROM docs WHERE docs MATCH 'database NEAR/2 \"ACID compliant\"';\n\n-- Search for a document that contains the phrase \"ACID compliant\" and the term\n-- \"sqlite\" with not more than 2 terms separating the two. This also matches\n-- the only document stored in table docs.\nSELECT * FROM docs WHERE docs MATCH '\"ACID compliant\" NEAR/2 sqlite';\n\nMore than one NEAR operator may appear in a single query. In this case each pair of terms or\nphrases separated by a NEAR operator must appear within the specified proximity of each other in\nthe document. Using the same table and data as in the block of examples above:\n\n-- The following query selects documents that contains an instance of the term\n-- \"sqlite\" separated by two or fewer terms from an instance of the term \"acid\",\n-- which is in turn separated by two or fewer terms from an instance of the term\n-- \"relational\".\nSELECT * FROM docs WHERE docs MATCH 'sqlite NEAR/2 acid NEAR/2 relational';\n\n-- This query matches no documents. There is an instance of the term \"sqlite\" with\n-- sufficient proximity to an instance of \"acid\" but it is not sufficiently close\n-- to an instance of the term \"relational\".\nSELECT * FROM docs WHERE docs MATCH 'acid NEAR/2 sqlite NEAR/2 relational';\n\nPhrase and NEAR queries may not span multiple columns within a row.\n"
                },
                {
                    "name": "Set operations",
                    "content": "The three basic query types described above may be used to query the full-text index for the set\nof documents that match the specified criteria. Using the FTS query expression language it is\npossible to perform various set operations on the results of basic queries. There are currently\nthree supported operations:\n\n*   The AND operator determines the intersection of two sets of documents.\n\n*   The OR operator calculates the union of two sets of documents.\n\n*   The NOT operator may be used to compute the relative complement of one set of documents with\nrespect to another.\n\nThe AND, OR and NOT binary set operators must be entered using capital letters; otherwise, they\nare interpreted as basic term queries instead of set operators. Each of the two operands to an\noperator may be a basic FTS query, or the result of another AND, OR or NOT set operation.\nParenthesis may be used to control precedence and grouping.\n\nThe AND operator is implicit for adjacent basic queries without any explicit operator. For\nexample, the query expression \"implicit operator\" is a more succinct version of \"implicit AND\noperator\".\n\nBoolean operations as just described correspond to the so-called \"enhanced query syntax\" of\nsqlite; this is the version compiled with \"DBD::SQLite\", starting from version 1.31. A former\nversion, called the \"standard query syntax\", used to support tokens prefixed with '+' or '-'\nsigns (for token inclusion or exclusion); if your application needs to support this old syntax,\nuse DBD::SQLite::FTS3Transitional (published in a separate distribution) for doing the\nconversion.\n"
                }
            ]
        },
        "TOKENIZERS": {
            "content": "",
            "subsections": [
                {
                    "name": "Concept",
                    "content": "The behaviour of full-text indexes strongly depends on how documents are split into *tokens*;\ntherefore FTS table declarations can explicitly specify how to perform tokenization:\n\nCREATE ... USING fts4(<columns>, tokenize=<tokenizer>)\n\nwhere \"<tokenizer>\" is a sequence of space-separated words that triggers a specific tokenizer.\nTokenizers can be SQLite builtins, written in C code, or Perl tokenizers. Both are as explained\nbelow.\n\nSQLite builtin tokenizers\nSQLite comes with some builtin tokenizers (see <http://www.sqlite.org/fts3.html#tokenizer>) :\n\nsimple\nUnder the *simple* tokenizer, a term is a contiguous sequence of eligible characters, where\neligible characters are all alphanumeric characters, the \"\" character, and all characters\nwith UTF codepoints greater than or equal to 128. All other characters are discarded when\nsplitting a document into terms. They serve only to separate adjacent terms.\n\nAll uppercase characters within the ASCII range (UTF codepoints less than 128), are\ntransformed to their lowercase equivalents as part of the tokenization process. Thus,\nfull-text queries are case-insensitive when using the simple tokenizer.\n\nporter\nThe *porter* tokenizer uses the same rules to separate the input document into terms, but as\nwell as folding all terms to lower case it uses the Porter Stemming algorithm to reduce\nrelated English language words to a common root.\n\nicu The *icu* tokenizer uses the ICU library to decide how to identify word characters in\ndifferent languages; however, this requires SQLite to be compiled with the\n\"SQLITEENABLEICU\" pre-processor symbol defined. So, to use this tokenizer, you need edit\nMakefile.PL to add this flag in @CCDEFINE, and then recompile \"DBD::SQLite\"; of course, the\nprerequisite is to have an ICU library available on your system.\n\nunicode61\nThe *unicode61* tokenizer works very much like \"simple\" except that it does full unicode\ncase folding according to rules in Unicode Version 6.1 and it recognizes unicode space and\npunctuation characters and uses those to separate tokens. By contrast, the simple tokenizer\nonly does case folding of ASCII characters and only recognizes ASCII space and punctuation\ncharacters as token separators.\n\nBy default, \"unicode61\" also removes all diacritics from Latin script characters. This\nbehaviour can be overridden by adding the tokenizer argument \"removediacritics=0\". For\nexample:\n\n-- Create tables that remove diacritics from Latin script characters\n-- as part of tokenization.\nCREATE VIRTUAL TABLE txt1 USING fts4(tokenize=unicode61);\nCREATE VIRTUAL TABLE txt2 USING fts4(tokenize=unicode61 \"removediacritics=1\");\n\n-- Create a table that does not remove diacritics from Latin script\n-- characters as part of tokenization.\nCREATE VIRTUAL TABLE txt3 USING fts4(tokenize=unicode61 \"removediacritics=0\");\n\nAdditional options can customize the set of codepoints that unicode61 treats as separator\ncharacters or as token characters -- see the documentation in\n<http://www.sqlite.org/fts3.html#unicode61>.\n\nIf a more complex tokenizing algorithm is required, for example to implement stemming, discard\npunctuation, or to recognize compound words, use the perl tokenizer to implement your own logic,\nas explained below.\n"
                },
                {
                    "name": "Perl tokenizers",
                    "content": "Declaring a perl tokenizer\nIn addition to the builtin SQLite tokenizers, \"DBD::SQLite\" implements a *perl* tokenizer, that\ncan hook to any tokenizing algorithm written in Perl. This is specified as follows :\n\nCREATE ... USING fts4(<columns>, tokenize=perl '<perlfunction>')\n\nwhere \"<perlfunction>\" is a fully qualified Perl function name (i.e. prefixed by the name of\nthe package in which that function is declared). So for example if the function is \"myfunc\" in\nthe main program, write\n\nCREATE ... USING fts4(<columns>, tokenize=perl 'main::myfunc')\n\nWriting a perl tokenizer by hand\nThat function should return a code reference that takes a string as single argument, and returns\nan iterator (another function), which returns a tuple \"($term, $len, $start, $end, $index)\" for\neach term. Here is a simple example that tokenizes on words according to the current perl locale\n\nsub localetokenizer {\nreturn sub {\nmy $string = shift;\n\nuse locale;\nmy $regex      = qr/\\w+/;\nmy $termindex = 0;\n\nreturn sub { # closure\n$string =~ /$regex/g or return; # either match, or no more token\nmy ($start, $end) = ($-[0], $+[0]);\nmy $len           = $end-$start;\nmy $term          = substr($string, $start, $len);\nreturn ($term, $len, $start, $end, $termindex++);\n}\n};\n}\n\nThere must be three levels of subs, in a kind of \"Russian dolls\" structure, because :\n\n*   the external, named sub is called whenever accessing a FTS table with that tokenizer\n\n*   the inner, anonymous sub is called whenever a new string needs to be tokenized (either for\ninserting new text into the table, or for analyzing a query).\n\n*   the innermost, anonymous sub is called repeatedly for retrieving all terms within that\nstring.\n\nUsing Search::Tokenizer\nInstead of writing tokenizers by hand, you can grab one of those already implemented in the\nSearch::Tokenizer module. For example, if you want ignore differences between accented\ncharacters, you can write :\n\nuse Search::Tokenizer;\n$dbh->do(<<\"\") or die DBI::errstr;\nCREATE ... USING fts4(<columns>,\ntokenize=perl 'Search::Tokenizer::unaccent')\n\nAlternatively, you can use \"new\" in Search::Tokenizer to build your own tokenizer. Here is an\nexample that treats compound words (words with an internal dash or dot) as single tokens :\n\nsub mytokenizer {\nreturn Search::Tokenizer->new(\nregex => qr{\\p{Word}+(?:[-./]\\p{Word}+)*},\n);\n}\n"
                }
            ]
        },
        "Fts4aux - Direct Access to the Full-Text Index": {
            "content": "The content of a full-text index can be accessed through the virtual table module \"fts4aux\". For\nexample, assuming that our database contains a full-text indexed table named \"ft\", we can\ndeclare :\n\nCREATE VIRTUAL TABLE ftterms USING fts4aux(ft)\n\nand then query the \"ftterms\" table to access the list of terms, their frequency, etc. Examples\nare documented in <http://www.sqlite.org/fts3.html#fts4aux>.\n",
            "subsections": []
        },
        "How to spare database space": {
            "content": "By default, FTS stores a complete copy of the indexed documents, together with the fulltext\nindex. On a large collection of documents, this can consume quite a lot of disk space. However,\nFTS has some options for compressing the documents, or even for not storing them at all -- see\n<http://www.sqlite.org/fts3.html#fts4options>.\n\nIn particular, the option for *contentless FTS tables* only stores the fulltext index, without\nthe original document content. This is specified as \"content=\"\"\", like in the following example\n:\n\nCREATE VIRTUAL TABLE t1 USING fts4(content=\"\", a, b)\n\nData can be inserted into such an FTS4 table using an INSERT statements. However, unlike\nordinary FTS4 tables, the user must supply an explicit integer docid value. For example:\n\n-- This statement is Ok:\nINSERT INTO t1(docid, a, b) VALUES(1, 'a b c', 'd e f');\n\n-- This statement causes an error, as no docid value has been provided:\nINSERT INTO t1(a, b) VALUES('j k l', 'm n o');\n\nOf course your application will need an algorithm for finding the external resource\ncorresponding to any *docid* stored within SQLite.\n\nWhen using placeholders, the docid must be explicitly typed to INTEGER, because this is a\n\"hidden column\" for which sqlite is not able to automatically infer the proper type. So the\nfollowing doesn't work :\n\nmy $sth = $dbh->prepare(\"INSERT INTO t1(docid, a, b) VALUES(?, ?, ?)\");\n$sth->execute(2, 'aa', 'bb'); # constraint error\n\nbut it works with an explicitly cast :\n\nmy $sql = \"INSERT INTO t1(docid, a, b) VALUES(CAST(? AS INTEGER), ?, ?)\",\nmy $sth = $dbh->prepare(sql);\n$sth->execute(2, 'aa', 'bb');\n\nor with an explicitly typed \"bindparam\" in DBI :\n\nuse DBI qw/SQLINTEGER/;\nmy $sql = \"INSERT INTO t1(docid, a, b) VALUES(?, ?, ?)\";\nmy $sth = $dbh->prepare(sql);\n$sth->bindparam(1, 2, SQLINTEGER);\n$sth->bindparam(2, \"aa\");\n$sth->bindparam(3, \"bb\");\n$sth->execute();\n\nIt is not possible to UPDATE or DELETE a row stored in a contentless FTS4 table. Attempting to\ndo so is an error.\n\nContentless FTS4 tables also support SELECT statements. However, it is an error to attempt to\nretrieve the value of any table column other than the docid column. The auxiliary function\n\"matchinfo()\" may be used, but \"snippet()\" and \"offsets()\" may not, so if such functionality is\nneeded, it has to be directly programmed within the Perl application.\n",
            "subsections": []
        },
        "AUTHOR": {
            "content": "Laurent Dami <dami@cpan.org>\n",
            "subsections": []
        },
        "COPYRIGHT": {
            "content": "Copyright 2014 Laurent Dami.\n\nSome parts borrowed from the <http://sqlite.org> documentation, copyright 2014.\n\nThis documentation is in the public domain; you can redistribute it and/or modify it under the\nsame terms as Perl itself.\n",
            "subsections": []
        }
    },
    "summary": "DBD::SQLite::Fulltextsearch - Using fulltext searches with DBD::SQLite",
    "flags": [],
    "examples": [],
    "see_also": []
}