xdccbLister - converts offline XDCC listings into XDCC Browser (XCB) format.
xdccbLister --entity-hint Group^| --filter-include FavouriteSeries http://group.web.site/

# Note:
# In this case the name of the group (Group|WeRule) contains a "|"
# and symbols like "|" may need to be escaped in your OS's shell. On Windows
# command line it is done using the prefix "^".

with configuration file xdccbLister.conf:

network=Rizon
server=irc.rizon.net
channel=#group

# or

xdccbLister --type text --split-output --spaces --usage "/ctcp entity xdcc retrieve" downloaded-listing.txt another-listing.txt yet-another-listing.txt

with configuration file xdccbLister.conf:

network=EFnet
server=irc.efnet.org
port=6665
channel=#mp3
spaces=1
dns-lookup=0

# or

xdccbLister -N EFnet -S irc.efnet.fi -P 6668 -C #mp3 --filter-exclude "Avril Lavigne" "c:\listings\her horrible poppy mp3.html"

# Notes:
# A) No xdccbLister.conf file required since all required options specified,
# B) Also, note I used the quote marks around both the filter's mask and the
#    path to the html file because they both contained spaces.

# or

# Advanced Windows example:
#
# Layout on disk (xdccbLister in PATH or c:\listings directory):
#
# c:\listings - \config    - \Rizon.conf
#                          - \Rizon-myFavouriteChannel.conf
#                          - \EFnet.conf
#                          - \hqIRC.conf
#                          - \Evilnet.conf
#             - \processed - .\Rizon-Bot1.xcb
#                          - .\Rizon-Bot2.xcb
#                          - .\EFnet-Friend1.xcb
#                          - .\EFnet-Friend2.xcb
#                          - .\EFnet-Friend3.xcb
#                          - .\hqIRC-Stranger1.xcb
#                          - .\Evilnet-IRC-AnotherStranger1.xcb
#             - \loaded    - .\24-S4-18.xcb
#             - \saved     - .\savedFromXdccBrowser.xcb
#                          - Network1 - .\Channel1-Bot1.xcb
#                                     - .\Channel1-Bot2.xcb
#                                     - .\Channel1-Bot3.xcb
#                          - Network2 - .\Channel2-Bot1.xcb
#                                     - .\Channel3-Bot1.xcb
#                                     - .\channel4-Bot1.xcb
#
# then the listings for #myFavouriteChannel@Rizon change on their web site...
1) Windows taskbar: start->run->cmd
2) cd c:\listings
3) xdccbLister -c config\Rizon-myFavouriteChannel.conf -o simpsons-latest.xcb --filter-include "Simpsons" --filter-episode 112-
4) mirc->xdcc browser->import c:\listings\processed\simpsons-latest.xcb
5) move processed\simpsons-latest.xcb loaded\simpsons-112.xcb

with configuration file config\Rizon-myFavouriteChannel.conf:

network=Rizon
server=irc.rizon.net
port=6666
channel=#myFavouriteChannel
spaces=1
input=http://myFavouriteChannelWebSite/xdcc/listings.html
output-dir=processed

# Notes:
# A) Directories under \processed created from the single large file
#    .\savedFromXdccBrowser.xcb using --split-output, "--split-source network",
# B) All options have shortened versions and can be placed in configuration
#    files (perhaps one per network/channel/series/music group/whatever). So,
#    for example, --filter-include and --filter-episode can be written as -fi
#    and -fr, respectively.

# or

# Advanced unix example: running an XDCC server using iroffer.
#
# 1) Edit /home/dude/bin/iroffer/iroffer-dudesGroup.conf:
#    ...
#    filedir /home/dude/archive/xdcc/dudesGroup
#    xdcclistfile /home/dude/public_html/xdcc/listings.txt
#    ...
#
# 2) Run command: chmod +x /home/dude/bin/xdccbLister.pl
#
# 3) Edit user scheduled cron tasks: crontab -e:
#    ...
#    10 * * * * /home/dude/bin/xdccbLister.pl
#               -c /home/dude/etc/xdccbLister/dudesGroup.conf
#    10 * * * * cp /home/dude/archive/xdcc/dudesGroup/listings.xcb
#               /home/dude/public_html/xdcc/listings.xcb
#    ...

with configuration file /home/dude/etc/xdccbLister/dudesGroup.conf:

network=EFnet
server=irc.efnet.org
port=6667
channel=#dudesChannel
spaces=0
input=/home/dude/public_html/xdcc/listings.txt
output=/home/dude/archive/xdcc/dudesGroup/listings.xcb

# Notes:
# A) What the above does, if it is not clear :), is to schedule one task at
#    10 minutes past every hour to process the iroffer listing (listings.txt)
#    into an XCB listings.xcb file which becomes one of the bot's packs! The
#    other task copies the XCB listing to the public web directory,
# B) Yes, you can clearly combine the two crontab tasks above into a single
#    shell script and schedule that instead. It was just easier to illustrate
#    this way. :)

# You are only limited by your imagination!
xdccbLister is no longer a slightly complex script for a simple objective!
It is now pretty much a Swiss Army knife for turning offline XDCC listings into
XCB files, which can be directly imported into the XDCC Browser or sliced and
diced into the most convenient form before importing.
The original intent was to make it as easy as possible to use the great XDCC
Browser (an mIRC client script) with XDCC listings which are not advertised via
IRC, but are available from a web site or a downloaded file. The latest version
of xdccbLister does this and more.
Ok, intro over. How do we use it? The most convenient way is to just create a small config file (xdccbLister.conf typically) and put the required options in there with their default values:
network=<some irc network>
server=<some irc server>
port=<irc server port - only required if not port 6667>
channel=<some irc channel>
Then, call the script with a URL or file path like so:

xdccbLister http://hostname/path/to/html/page.html
xdccbLister --type text c:\path\to\my\downloaded\listing.txt
This will generate a file with the default name (xdccbLister.xcb) in the default XCB format in the current directory. This file should be easy to import into an XDCC client script like XDCC Browser.
Ok, how about any gotchas or caveats?
If the entity name is not in the page or file at all, then you must explicitly specify it using --force-entity.
If you firmly believe descriptions in a listing may contain spaces, and you require the full description, you will need to use the --spaces option. The problem with this option is that it can be too aggressive in assuming some text is part of the description (since it has few ways of knowing when a description ends, see BUGS). It is only recommended on well-formed HTML or text XDCC listings with summary statistics at the end! That said, descriptions are not critical and it is easy to filter out the occasional bogus entry with an odd description.

And that is pretty much all there is to it for basic usage! There are many other options one can use to tweak the output, e.g. splitting the output into multiple files based on entity name or across directories based on network name, or filtering the list based on network, entity, size, queue (a.k.a. gets), episode numbers (within the description!) etc., but these can be discovered later.
Each line of the configuration file has the form

key=value

followed by a newline character. White space at the start, end or around the ``='' delimiter is ignored. Similarly, lines beginning with ``#'' are treated as comments and ignored. This option cannot be specified in a configuration file.
Default: .\xdccbLister.conf
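The file format described above can be sketched in a few lines of Python. This is only an illustration of the stated rules (key=value lines, surrounding white space ignored, lines starting with ``#'' ignored), not the script's own parser. The function name parse_config is hypothetical, and skipping lines without ``='' is an assumption.

```python
def parse_config(text: str) -> dict:
    """Parse key=value lines into a dict (illustrative sketch only)."""
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()          # white space at start/end is ignored
        if not line or line.startswith("#"):
            continue                     # blank lines and comment lines
        key, sep, value = line.partition("=")
        if not sep:
            continue                     # assumption: skip lines without "="
        # White space around the "=" delimiter is ignored as well.
        options[key.strip()] = value.strip()
    return options


if __name__ == "__main__":
    sample = "network = Rizon\n# a comment\nserver=irc.rizon.net\nchannel=#group\n"
    print(parse_config(sample))
```

Only a ``#'' at the start of a line marks a comment here, so a value such as channel=#group is kept intact.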
Multiple --input options allowed.
Default: html.
Default: Mozilla/5.0.
The following options only apply to the input formats which have no well-defined structure (i.e. --type html or --type text). They have no effect when specified with other input formats.
Default: True (i.e. a DNS lookup is carried out).
Without this hint, there would be no way to pick up possible entity names which are not part of a trigger in the page or file (e.g. /msg Entity or /ctcp Entity). If there is no information at all in the page or file, please see the --force-entity option.
Default: 30 characters (Rizon network appears to have the longest nick).
Default: iso-8859-1 (also known as latin1).
Recognised triggers (not case sensitive):
/msg entity xdcc send
/msg entity xdcc get
/ctcp entity xdcc send
/ctcp entity xdcc get
Note that the word ``entity'' is replaced with the name of the entity in the output file.
This option should rarely be needed.
Default: 0/False (i.e. ``Usage='' lines are output if recognised triggers present).
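As a rough illustration of how the four recognised triggers could be spotted in listing text, here is a minimal Python sketch. The regular expression and the name find_trigger are assumptions made for this example. How xdccbLister itself detects triggers, and the exact form of the ``Usage='' line it writes, are not shown here.

```python
import re

# Matches "/msg <entity> xdcc send|get" and "/ctcp <entity> xdcc send|get",
# ignoring case, as the list of recognised triggers describes.
TRIGGER_RE = re.compile(
    r"/(msg|ctcp)\s+(\S+)\s+xdcc\s+(send|get)\b", re.IGNORECASE
)


def find_trigger(text: str):
    """Return (entity, normalised trigger) or None if no trigger is found."""
    match = TRIGGER_RE.search(text)
    if match is None:
        return None
    command, entity, verb = match.groups()
    # The entity name is kept as written; only the keywords are lower-cased.
    return entity, f"/{command.lower()} {entity} xdcc {verb.lower()}"


if __name__ == "__main__":
    print(find_trigger("Type /MSG Bot1 XDCC SEND #1 to request a pack"))
```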
Default: 6667
Default: 0/False (i.e. description does not contain spaces).
This option explicitly overrides the default behaviour of only outputting ``Usage='' lines if recognised triggers are present and forces the lines to be output with the set format.
An example trigger:
--usage "/ctcp entity xdcc retrieve"
This option should rarely be needed. Please contact the xdccbLister author
if this option is often used with a particular trigger format not already in
the catalogue of default triggers (see --nousage).
Default: .\xdccbLister.xcb
Default: xdccb-v3.40
Default: 0/False
Default: xcb.
These options apply filters to the complete list once loaded from the source. They are invaluable in slicing and dicing the list into chunks (which are themselves lists) according to your requirements. For example,
Wildcard patterns: the usual wildcards are available: ``*'' and ``?'' for matching zero or more characters, and a single character, respectively. Please note that when used from the Windows command line certain characters (|,^) must be escaped with ``^'', e.g. ``^|'' and ``^^''.
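For readers unfamiliar with these wildcards, the following Python sketch shows one way ``*'' and ``?'' can be translated into a regular expression. It assumes the pattern must match the whole string and that matching is case sensitive; whether xdccbLister anchors patterns or ignores case is not stated here, so treat both as assumptions. The names wildcard_to_regex and wildcard_match are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a wildcard pattern ("*" and "?") into a compiled regex."""
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(".*")             # zero or more characters
        elif char == "?":
            parts.append(".")              # exactly one character
        else:
            parts.append(re.escape(char))  # everything else is literal, e.g. "|"
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern: str, text: str) -> bool:
    """Assumption: the pattern has to cover the whole text."""
    return wildcard_to_regex(pattern).fullmatch(text) is not None


if __name__ == "__main__":
    print(wildcard_match("Group|*", "Group|WeRule"))
    print(wildcard_match("S??E*", "S04E18.avi"))
```

Note that the ``^'' escaping mentioned above is only needed to get the characters past the Windows command line; the pattern the script receives contains the plain ``|''.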
Range filter: <number | number1-number2 | number- | -number>. The ranges are separated with commas (note no spaces). In addition, with some range filters an additional item, such as ``n/a'', may be allowed within the list. This enables you to allow entries which may simply not have a number. Examples: 1,2,3-6 or 8,20- or -30,n/a or -5,8,11,13-15,24-,n/a.
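The range syntax above can be sketched as a small parser in Python. This is an illustration of the documented grammar only, not the script's implementation; the names parse_range_filter and in_filter are hypothetical, and a missing number is represented by None as an assumption of this example.

```python
from typing import List, Optional, Tuple

Range = Tuple[Optional[int], Optional[int]]  # (low, high); None means open-ended


def parse_range_filter(spec: str, extra: str = "n/a") -> Tuple[List[Range], bool]:
    """Parse e.g. "-5,8,11,13-15,24-,n/a" into ranges plus an 'extra allowed' flag."""
    ranges: List[Range] = []
    allow_extra = False
    for item in spec.split(","):         # items are comma separated, no spaces
        if item == extra:
            allow_extra = True           # entries without a number are allowed
        elif "-" in item:
            low, high = item.split("-", 1)
            ranges.append((int(low) if low else None, int(high) if high else None))
        else:
            number = int(item)
            ranges.append((number, number))
    return ranges, allow_extra


def in_filter(value: Optional[int], ranges: List[Range], allow_extra: bool) -> bool:
    """True if value falls in any range; None stands for an entry with no number."""
    if value is None:
        return allow_extra
    return any(
        (low is None or value >= low) and (high is None or value <= high)
        for low, high in ranges
    )


if __name__ == "__main__":
    ranges, allow_extra = parse_range_filter("-5,8,11,13-15,24-,n/a")
    print([n for n in range(1, 31) if in_filter(n, ranges, allow_extra)])
```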
Combining filters: when multiple filters are used at the same time, the filters apply with a specific priority, basically, broadest first (or the sledgehammer to jeweller's chisel): --filter-network, --filter-entity, --filter-exclude, --filter-include, --filter-episode, --filter-size, --filter-gets, --filter-pack.
That is, first whole networks are eliminated, then entities, then items based on their description, size and gets, and finally individual pack numbers.
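The priority order can be pictured as a fixed pipeline. The Python sketch below applies whichever filters are present in the documented order. The entry layout (a dict per pack) and the names FILTER_ORDER and apply_filters are assumptions made for illustration, not the script's internals.

```python
from typing import Callable, Dict, List

# Documented priority, broadest first.
FILTER_ORDER = [
    "network", "entity", "exclude", "include",
    "episode", "size", "gets", "pack",
]

Entry = Dict[str, object]


def apply_filters(
    entries: List[Entry], predicates: Dict[str, Callable[[Entry], bool]]
) -> List[Entry]:
    """Apply the supplied filters in priority order; absent filters are skipped."""
    for name in FILTER_ORDER:
        predicate = predicates.get(name)
        if predicate is not None:
            entries = [entry for entry in entries if predicate(entry)]
    return entries


if __name__ == "__main__":
    listing = [
        {"network": "Rizon", "entity": "Bot1", "pack": 1},
        {"network": "EFnet", "entity": "Friend1", "pack": 2},
        {"network": "Rizon", "entity": "Bot2", "pack": 3},
    ]
    kept = apply_filters(
        listing,
        {
            "pack": lambda e: e["pack"] > 1,
            "network": lambda e: e["network"] == "Rizon",
        },
    )
    print(kept)
```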
Default: alphabet.
Default: page.
Default: alphabet.
Splitting output can be handy in order to start taking control of parts of
lists. This is because these smaller lists can be further processed by
xdccbLister (with a performance improvement) and then they can be loaded into
an XDCC client script more quickly and with minimal or no further changes.
Default: 0/False (i.e. output is not split across directories).
Default: 0/False (i.e. output is not split).
Default: entity.
These are the only options which cannot be used in a configuration file. They are intended for use only on the command line.
Basically, there are no guarantees about any sort of non-space white space on a web page other than that explicitly allowed by HTML markup. This includes newline characters. Since the script ignores all HTML markup (it is never part of an ordinary text XDCC listing), this means it ignores newline characters at the end of an XDCC entry/pack's description. Therefore, there is no way of knowing for sure when a description ends if it contains spaces.
The script currently guesses when a description ends by making assumptions about its content and what may appear after it. The description is typically a unix/windows filename and thus would never contain a ``*'' character. Also, the statistics at the end of many XDCC listings begin with ``**''. Thus, this is a natural end of description (if available). The description would also naturally end with the start of the next entry/pack (which always begins with ``#'') or if the user specified an entity hint. If all else fails, the description is limited to 255 characters.
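The guessing rules above can be sketched as follows in Python. This is a simplified illustration, not the script's code: it assumes the input is the text starting at the description, treats the first ``*'' or ``#'' as the end marker, and uses the hypothetical name cut_description.

```python
from typing import Optional

MAX_DESCRIPTION_LENGTH = 255  # fallback limit stated in the text


def cut_description(text: str, entity_hint: Optional[str] = None) -> str:
    """Guess where a space-containing description ends (illustrative sketch)."""
    end = min(len(text), MAX_DESCRIPTION_LENGTH)
    # "*" never appears in a file name and also covers the "**" statistics
    # lines; "#" marks the start of the next entry/pack.
    markers = ["*", "#"]
    if entity_hint:
        markers.append(entity_hint)      # a user-supplied entity hint also ends it
    for marker in markers:
        position = text.find(marker)
        if position != -1:
            end = min(end, position)
    return text[:end].strip()


if __name__ == "__main__":
    print(cut_description("My Show ep 1.avi #2 5x [10M] Other Show.avi"))
    print(cut_description("Last file.mp3 ** 3 packs ** 2 of 5 slots open"))
```

A description that legitimately contains ``#'' would be cut short by this sketch, which is the kind of imperfect result the following paragraph says is tolerable.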
Note that after all this, since it is just a description, it should not be
the end of the world if a description is too short, contains odd characters,
or one ends up with a strange entry. In the latter case, for example, the entry
should be easy to filter out (either with xdccbLister or once imported
into the XDCC Browser).
Tasuki Yamamoto (sqzme at users dot sourceforge dot net).
Extreme thanks to Yochai Timmer who not only authored the best XDCC client script known to man, but also helped considerably in confirming all there is to know about the latest XCB formats!
Yochai Timmer's XDCC Browser for mIRC, latest version available at http://www.mircscripts.org.
Download from SourceForge: https://sourceforge.net/projects/xdccblister/