NAME

unicode - command-line Unicode database query tool

SYNOPSIS

unicode [options] string

DESCRIPTION

This manual page documents the unicode command.
unicode is a command-line tool for querying the Unicode character database.
 

OPTIONS

-h, --help
Show help and exit.
-x, --hexadecimal
Assume string to be a hexadecimal number.
-d, --decimal
Assume string to be a decimal number.
-o, --octal
Assume string to be an octal number.
-b, --binary
Assume string to be a binary number.
-r, --regexp
Assume string to be a regular expression.
-s, --string
Assume string to be a sequence of characters.
-a, --auto
Try to guess the type of string from one of the above (default).
-m MAXCOUNT, --max=MAXCOUNT
Maximum number of codepoints to display (default: 20; use 0 for unlimited).
-i IOCHARSET, --io=IOCHARSET
I/O character set. For maximal pleasure, run unicode on a UTF-8 capable terminal and specify IOCHARSET to be UTF-8. unicode tries to guess this value from your locale, so with a properly set up locale you should not need to specify it.
--fcp=CHARSET, --fromcp=CHARSET
Convert numerical arguments from this encoding (default: no conversion). Multibyte encodings are supported. This option is ignored for non-numerical arguments.
-c ADDCHARSET, --charset-add=ADDCHARSET
Show the hexadecimal representation of displayed characters in this additional charset.
-C USE_COLOUR, --colour=USE_COLOUR
USE_COLOUR is one of on, off or auto. --colour=on uses ANSI colour codes to colourise the output; --colour=off does not use colours; --colour=auto tests whether standard output is a tty and uses colours only when it is. --color is a synonym of --colour.
-v, --verbose
Be more verbose about displayed characters, e.g. display Unihan information, if available.
-w, --wikipedia
Spawn a browser pointing to the English Wikipedia entry about the character.
--wt, --wiktionary
Spawn a browser pointing to the English Wiktionary entry about the character.
--brief
Display character information in brief format.
--format=fmt
Use your own format for character information display. See the README for details.
--list
List (approximately) all known encodings.
--download
Try to download UnicodeData.txt into ~/.unicode/
--ascii
Display the ASCII table.
--brexit-ascii, --brexit
Display the ASCII table (EU–UK Trade and Cooperation Agreement 2020 version).
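The --colour=auto rule amounts to an isatty() check on standard output. A minimal Python sketch of how the three USE_COLOUR values might be resolved follows; the function name use_colour and the escape sequence in the demo are illustrative assumptions, not taken from unicode's source.

```python
import sys


def use_colour(mode, stream=sys.stdout):
    """Map a --colour value (on, off, auto) to a yes/no decision.

    Hypothetical helper for illustration; not unicode's own code.
    """
    if mode == "on":
        return True
    if mode == "off":
        return False
    if mode == "auto":
        # Colour only when the output goes to an interactive terminal.
        return stream.isatty()
    raise ValueError("USE_COLOUR must be one of on, off, auto, got %r" % (mode,))


if __name__ == "__main__":
    colour = use_colour("auto")
    start = "\x1b[1m" if colour else ""  # ANSI bold, emitted only when colouring
    reset = "\x1b[0m" if colour else ""
    print(start + "U+00E1" + reset + " LATIN SMALL LETTER A WITH ACUTE")
```

Piping the output into another program (e.g. less) makes isatty() return False, so auto then behaves like off.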

USAGE

unicode tries to guess the type of an argument. In particular, if the argument looks like a valid hexadecimal representation of a Unicode codepoint, it will be treated as such. Using
 
unicode face
 
will display information about U+FACE CJK COMPATIBILITY IDEOGRAPH-FACE; it will not search for 'face' in character descriptions. For the latter, use:
 
unicode -r face
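With -r, the argument is matched as a regular expression against character names. A rough Python equivalent is sketched below, assuming Python's bundled unicodedata module as the name source (unicode itself reads UnicodeData.txt, and the Unicode version of unicodedata depends on the Python release). search_names is a hypothetical helper; its limit parameter mirrors --max, where 0 means unlimited.

```python
import re
import sys
import unicodedata


def search_names(pattern, limit=20):
    """Return (codepoint, name) pairs whose character name matches pattern.

    Hypothetical helper for illustration; limit=0 means no limit.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unnamed codepoints
        if name and regex.search(name):
            hits.append((cp, name))
            if limit and len(hits) >= limit:
                break
    return hits


if __name__ == "__main__":
    for cp, name in search_names("face"):
        print("U+%04X %s" % (cp, name))
```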
 
 
For example, you can use any of the following to display information about U+00E1 LATIN SMALL LETTER A WITH ACUTE (á):
 
unicode 00E1
 
unicode U+00E1
 
unicode á
 
unicode 'latin small letter a with acute'
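All four forms resolve to the same codepoint. The sketch below shows one possible order of interpretation (hexadecimal codepoint, then exact character name, then literal characters); this order is an assumption made for illustration, since unicode's real heuristic also handles ranges, regular expressions and the other number bases. It relies on Python's standard unicodedata.lookup; guess is a hypothetical name.

```python
import re
import unicodedata

# Optional "U+" or "0x" prefix followed by 1 to 6 hexadecimal digits.
HEX_CODEPOINT = re.compile(r"(?:[Uu]\+|0[xX])?([0-9A-Fa-f]{1,6})")


def guess(arg):
    """Resolve a command-line argument to a list of codepoints (sketch)."""
    match = HEX_CODEPOINT.fullmatch(arg)
    if match and int(match.group(1), 16) <= 0x10FFFF:
        return [int(match.group(1), 16)]  # e.g. "face", "00E1", "U+00E1"
    try:
        # Exact character name; upper-cased because names are stored that way.
        return [ord(ch) for ch in unicodedata.lookup(arg.upper())]
    except KeyError:
        pass
    return [ord(ch) for ch in arg]  # the characters themselves, e.g. "á"


if __name__ == "__main__":
    for arg in ["00E1", "U+00E1", "\u00e1", "latin small letter a with acute", "face"]:
        print(arg, ["U+%04X" % cp for cp in guess(arg)])
```

Because the hexadecimal check runs first, any word made only of the letters a to f (such as "face") is taken as a codepoint, which is why -r is needed to search names for it.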
 
 
You can specify a range of characters as arguments; unicode will show these characters in a tabular format, aligned to 256-codepoint boundaries. Use two dots ".." to indicate the range, e.g.
 
unicode 0450..0520
 
will display the whole range from U+0400 to U+05FF, which covers the Cyrillic, Cyrillic Supplement, Armenian and Hebrew blocks.
 
unicode 0400..
 
will display just the characters from U+0400 up to U+04FF.
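The alignment rule can be expressed with two bit operations: clear the low 8 bits of the start and set them on the end. A Python sketch follows, assuming hexadecimal endpoints and treating a missing end as "the rest of the starting 256-codepoint row"; parse_range and block_range are hypothetical names, not part of unicode.

```python
from typing import Optional


def block_range(start: int, end: Optional[int] = None) -> range:
    """Widen start..end outward to whole rows of 256 codepoints."""
    low = start & ~0xFF  # round down to U+xx00
    high = (start if end is None else end) | 0xFF  # round up to U+xxFF
    return range(low, high + 1)


def parse_range(arg: str) -> range:
    """Parse "0450..0520" or the open-ended "0400.." form."""
    first, _, last = arg.partition("..")
    return block_range(int(first, 16), int(last, 16) if last else None)


if __name__ == "__main__":
    codepoints = parse_range("0400..")
    for row in range(codepoints.start, codepoints.stop, 16):
        # Unprintable (e.g. unassigned) codepoints are shown as ".".
        cells = "".join(
            chr(cp) if chr(cp).isprintable() else "."
            for cp in range(row, row + 16)
        )
        print("U+%04X %s" % (row, cells))
```

Under this rule, 0450..0520 and 0400..05FF produce the same table.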
 
Use --fromcp to query codepoints from other encodings:
 
unicode --fromcp cp1250 -d 200
 
Multibyte encodings are supported:
 
unicode --fromcp big5 -x aff3
 
and multi-character strings are supported, too:
 
unicode --fromcp utf-8 -x c599c3adc5a5
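Conceptually, --fromcp turns the number into a byte string and decodes those bytes with the given codec. The Python sketch below assumes the number is read big-endian with no leading zero bytes; from_codepage is a hypothetical name, and this is not unicode's own implementation. Python's standard codecs supply cp1250, big5 and utf-8.

```python
import unicodedata


def from_codepage(number: str, base: int, encoding: str) -> str:
    """Read number in the given base as big-endian bytes and decode them."""
    value = int(number, base)
    length = max(1, (value.bit_length() + 7) // 8)  # fewest bytes that fit
    return value.to_bytes(length, "big").decode(encoding)


if __name__ == "__main__":
    # Mirrors "unicode --fromcp cp1250 -d 200" and
    # "unicode --fromcp utf-8 -x c599c3adc5a5".
    for number, base, encoding in [("200", 10, "cp1250"),
                                   ("c599c3adc5a5", 16, "utf-8")]:
        for ch in from_codepage(number, base, encoding):
            print("U+%04X %s" % (ord(ch), unicodedata.name(ch, "?")))
```

With these inputs the sketch yields U+010C LATIN CAPITAL LETTER C WITH CARON (Č) for the cp1250 example and the three characters ř, í, ť for the UTF-8 one; from_codepage("aff3", 16, "big5") exercises a multibyte codec in the same way.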
 

BUGS

Tabular format does not deal well with full-width, combining, control and RTL characters.
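The problem stems from characters that do not occupy exactly one terminal cell. A rough Python estimate of cell width, in the spirit of wcwidth(3) and based on unicodedata.category and unicodedata.east_asian_width, is sketched below; cell_width is a hypothetical helper, its rules are a simplification that terminals may not follow exactly, and right-to-left reordering is not addressed at all.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Approximate number of terminal columns a single character occupies."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cc", "Cf"):
        return 0  # combining marks, control and format characters
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # wide and full-width characters
    return 1  # everything else, including ambiguous-width ("A") characters


if __name__ == "__main__":
    # "a" + combining acute, a CJK ideograph, FULLWIDTH LATIN CAPITAL LETTER A
    sample = "a\u0301\u4e2d\uff21"
    print(sample, sum(cell_width(ch) for ch in sample))
```

A table that assumes one column per character misaligns as soon as a row contains characters with width 0 or 2.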
 

SEE ALSO

ascii(1)
 
 

AUTHOR

Radovan Garabík <garabik @ kassiopeia.juls.savba.sk>
 
 
